Goal:

Layouts (mfrow and layout)
Exporting figures (png, pdf, jpg and in Makdown)
Brief intro to ggplot

Using the World Development Indicators data from the last lab

wdi <- read.csv("WDI.csv")
wdi$region <- factor(wdi$region)
cols <- c("East Asia & Pacific"        = "steelblue",
          "Europe & Central Asia"      = "forestgreen",
          "Latin America & Caribbean"  = "goldenrod",
          "Middle East & North Africa" = "darkorange",
          "North America"              = "mediumpurple",
          "South Asia"                 = "firebrick",
          "Sub-Saharan Africa"         = "sienna")

Recall:

str(wdi)

## 'data.frame':    208 obs. of  16 variables:
##  $ country    : chr  "India" "China" "United States" "Indonesia" ...
##  $ iso2c      : chr  "IN" "CN" "US" "ID" ...
##  $ iso3c      : chr  "IND" "CHN" "USA" "IDN" ...
##  $ year       : int  2022 2022 2022 2022 2022 2022 2022 2022 2022 2022 ...
##  $ status     : logi  NA NA NA NA NA NA ...
##  $ lastupdated: chr  "2026-01-28" "2026-01-28" "2026-01-28" "2026-01-28" ...
##  $ LifeExp    : num  71.7 78.2 77.4 70.9 67.4 ...
##  $ GDP        : num  2347 12971 76657 4731 1538 ...
##  $ Area       : num  2973190 9388210 9147420 1892555 770880 ...
##  $ Population : int  1425423212 1412175000 334017321 278830529 243700667 223150896 210306415 169384897 144236933 128613117 ...
##  $ region     : Factor w/ 7 levels "East Asia & Pacific",..: 6 1 5 1 6 7 3 6 2 3 ...
##  $ capital    : chr  "New Delhi" "Beijing" "Washington D.C." "Jakarta" ...
##  $ longitude  : num  77.2 116.3 -77 106.8 72.8 ...
##  $ latitude   : num  28.6 40 38.9 -6.2 30.5 ...
##  $ income     : chr  "Lower middle income" "Upper middle income" "High income" "Upper middle income" ...
##  $ lending    : chr  "IBRD" "IBRD" "Not classified" "IBRD" ...

1. Controlloing layouts

with `mfrow()`

par(mfrow = c(2,1))
boxplot(GDP/1000 ~ region, data = wdi, log = "y", las = 1, col = cols)
boxplot(LifeExp ~ region, data = wdi, las = 1, col = cols)

Or prettier: modifying morgins with `oma` and `outer`

par(mfrow = c(2,1), 
    mar = c(0,4,1,2), # margins around each figure
    oma = c(3,0,2,0), # outer margins
    cex.axis = 0.8, 
    tck = 0.01, bty ="l", mgp = c(2,.25,0))

boxplot(GDP/1000 ~ region, data = wdi, log = "y", las = 1, 
        col = cols, 
        xlab = "", xaxt = "n")
boxplot(LifeExp ~ region, data = wdi, las = 1, col = cols)
title("Global Indicators across regions", outer = TRUE)

Issues with labelling?

par(mfrow = c(2,1), 
    mar = c(0,4,1,2), # margins around each figure
    oma = c(3,0,2,0), # outer margins
    cex.axis = 0.8, 
    tck = 0.01, bty ="l", mgp = c(2,.25,0))

boxplot(GDP/1000 ~ region, data = wdi, log = "y", las = 1, col = cols, 
        varwidth = TRUE, 
        xlab = "", xaxt = "n")  # no axis labelling
boxplot(LifeExp ~ region, data = wdi, las = 1, 
        varwidth = TRUE, 
        col = cols, xaxt = "n")
title("Global Indicators across regions", outer = TRUE)
regions <- levels(factor(wdi$region))
region_labels <- gsub("&", "&\n", regions, fixed = TRUE)
mtext(region_labels, side = 1, at = 1:7, line = 0, padj = 1)

More complicated layouts with `layout`

layout(rbind(c(1,2), c(1,3)))
layout.show(3)

Now do this:

require(scales)   # for transparency!
layout(rbind(c(1,2), c(1,3)))
par(cex.lab = 1.2, las = 1,
    mgp = c(2, .25, 0), tck = 0.01, 
    bty = "l",  mar = c(0,4,0,2), 
    oma = c(4,0,4,0), xpd = NA)

cont_col <- cols[match(wdi$region, levels(wdi$region))]
cex_pop  <- sqrt(wdi$Population / max(wdi$Population, na.rm = TRUE)) * 15

plot(wdi$GDP/1e3, wdi$LifeExp, log = "x",
     ylab = "Life expectancy (years)",
     xlab = "GDP per capita (1000 USD, log scale)", type = "n")
points(wdi$GDP/1e3, wdi$LifeExp,
       cex = cex_pop + .5, pch = 21, col = cont_col,
       bg  = alpha(cont_col, .7))
legend("bottomright", title = "Region",
       legend = levels(wdi$region),
       pt.bg = cols, col = "grey40",
       pch = 21, pt.cex = 1.8, bty = "n")

boxplot(GDP/1000 ~ region, data = wdi, log = "y", las = 1, 
        varwidth = TRUE, 
        col = alpha(cols, .7), border = cols, 
        xlab = "", xaxt = "n")  # no axis labelling
boxplot(LifeExp ~ region, data = wdi, las = 1, 
        varwidth = TRUE, 
        col = alpha(cols, .7), border = cols, 
        xaxt = "n", xlab = "Region")
title("Global Indicators across regions", outer = TRUE)

1b. Brief (unexpected) aside on functions

We’ll talk about functions more later in class. But one very basica pplication for using functions is bundling a bunch of code into a single quick line.

For example:

setPars <- function(){
  par(cex.lab = 1.2, las = 1,
    mgp = c(2, .25, 0), tck = 0.01, 
    bty = "l",  mar = c(0,4,0,2), 
    oma = c(4,0,4,0), xpd = NA)
}


plotWDI <- function(){
  cont_col <- cols[match(wdi$region, levels(wdi$region))]
  cex_pop  <- sqrt(wdi$Population / max(wdi$Population, na.rm = TRUE)) * 15
  
  plot(wdi$GDP/1e3, wdi$LifeExp, log = "x",
       ylab = "Life expectancy (years)",
       xlab = "GDP per capita (1000 USD, log scale)", type = "n")
  points(wdi$GDP/1e3, wdi$LifeExp,
         cex = cex_pop + .5, pch = 21, col = cont_col,
         bg  = alpha(cont_col, .7))
  legend("bottomright", title = "Region",
         legend = levels(wdi$region),
         pt.bg = cols, col = "grey40",
         pch = 21, pt.cex = 1.8, bty = "n")
}

boxplotWDI <- function(){
  boxplot(GDP/1000 ~ region, data = wdi, log = "y", las = 1, 
        varwidth = TRUE, 
        col = alpha(cols, .7), border = cols, 
        xlab = "", xaxt = "n")  # no axis labelling
  boxplot(LifeExp ~ region, data = wdi, las = 1, 
          varwidth = TRUE, 
          col = alpha(cols, .7), border = cols, 
          xaxt = "n", xlab = "Region")
}

Note - these functions ONLY do the plotting - none of the setup. Also they have no arguments and they return nothing. They just do the thing. Code then becomes very simple:

layout(rbind(c(1,2), c(1,3)))
setPars()
plotWDI()
boxplotWDI()

2. Outputting figures to files

What is a graphics device?

When R draws a plot, it sends output to a graphics device — a destination that knows how to render lines, points, text, and colors. By default that destination is your screen (the interactive device: windows() on Windows, quartz() on Mac). To save a figure to a file, you open a file device instead, draw your plot, then close the device with dev.off(). The pattern is always the same:

device_function("filename.ext", width = ..., height = ...)  # open
  # ... plotting code ...
dev.off()                                                    # close and write

Nothing appears on screen — output goes straight to the file. Forgetting dev.off() is the most common mistake: the file stays open and incomplete.

PDF vs. PNG: vector vs. raster

The two most useful formats are PDF (vector) and PNG (raster). They store images differently and suit different purposes.

	PDF	PNG
Type	Vector	Raster (bitmap)
Stores images as	Mathematical shape descriptions	Grid of pixels
Scaling	Perfect at any size	Degrades if enlarged
Best for	Publications, LaTeX, print	Web, Word, presentations
Resolution setting	Not applicable	Critical — use `res =`

PDF is resolution-independent: a line is stored as “from point A to point B,” so it renders crisp at any zoom level or print size. Ideal for journal submissions.

PNG stores a fixed pixel grid. At low resolution it looks blurry in print; at high resolution files get large. The res argument (dots per inch) controls the tradeoff.

Exporting to PDF

pdf("WDImegaplot.pdf", width = 12, height = 6)   # dimensions in inches

  layout(rbind(c(1,2), c(1,3)))
  setPars()
  plotWDI()
  boxplotWDI()

dev.off()

width and height are always inches
Default is 7 × 7 inches
Plotting in a loop produces a multi-page PDF — one page per plot

Exporting to PNG

png("WDImegaplot.png", width = 2400, height = 1200, res = 200)   # pixels

  layout(rbind(c(1,2), c(1,3)))
  setPars()
  plotWDI()
  boxplotWDI()
  
dev.off()

width and height are pixels by default
res sets dots per inch — controls how large text and points appear relative to the canvas
Effective plot size in inches = width / res, so width = 800, res = 150 gives a ~5.3 × 4 inch plot
For print quality use res = 300; for screen res = 96 or res = 150 is fine

Controlling size in R Markdown

In an .Rmd file you don’t open and close devices manually — knitr handles that. Instead, control figure size through chunk options:


``` r
# ... plotting code ...
```

fig.width / fig.height — dimensions in inches (like PDF)
dpi — resolution of the rendered PNG (knitr defaults to 72; use 150–300 for print)
Set defaults for the whole document in the setup chunk:

knitr::opts_chunk$set(fig.width = 7, fig.height = 5, dpi = 150)

`jpeg()` — when to use it

jpeg() works identically to png() but uses lossy compression:

jpeg("myplot.jpg", width = 800, height = 600, res = 150, quality = 90)
  # ... plotting code ...
dev.off()

quality ranges 0–100; lower = smaller file, more compression artifacts
Avoid for plots with sharp lines, text, or flat color regions — artifacts are visible
Reasonable for photographic content or where file size is the priority

3. Introduction to `ggplot2`

History and philosophy

ggplot2 was created by Hadley Wickham in 2005, based on Leland Wilkinson’s book The Grammar of Graphics (1999) — hence the “gg”. The core idea is that every statistical graphic can be described by a small set of components assembled in a consistent way:

data — a data frame
aesthetics — mappings from variables to visual properties (x, y, color, size, …)
geometries — the actual marks drawn (points, lines, bars, …)
scales, facets, themes — further refinements

In base R, you build a plot by issuing a sequence of imperative commands (plot(), then points(), then legend(), …). In ggplot2, you declare what you want and the package figures out the rendering. The result is a single object you can inspect, modify, and print.

Everything is built from data frames

This is the single most important constraint: every variable you want to plot must be a column in a data frame. You cannot pass a loose vector to ggplot2 the way you can to plot(). This is sometimes inconvenient, but it forces a discipline that pays off with complex data.

library(ggplot2)
wdi <- read.csv("WDI.csv")
wdi$region <- factor(wdi$region)

Aesthetics and geoms

A ggplot2 call has two required ingredients: ggplot() sets up the data and aesthetic mappings; a geom_*() function specifies what to draw.

ggplot(data = wdi, aes(x = GDP/1e3, y = LifeExp)) +
  geom_point()

aes() maps variables to visual channels:

aesthetic	visual property
`x`, `y`	position
`color`	point/line color (outline)
`fill`	filled shape color
`size`	point size
`alpha`	transparency
`shape`	point character

Common geom_* functions:

geom	what it draws
`geom_point()`	scatter plot
`geom_line()`	lines connecting observations
`geom_smooth()`	smoothed trend with confidence band
`geom_boxplot()`	box-and-whisker
`geom_histogram()`	histogram
`geom_bar()` / `geom_col()`	bar chart
`geom_text()`	text labels

Advantage 1 — adding dimensions is quick

In base R, encoding a third or fourth variable requires computing color vectors, size scalings, and legend calls by hand. In ggplot2, you just add aesthetics — scales and legends are generated automatically.

Compare adding color, size, and a log scale:

ggplot(wdi, aes(x = GDP/1e3, y = LifeExp, color = region, size  = Population/1e6)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +
  theme_bw()

Four variables (GDP, life expectancy, region, population) in about 4 lines, with automatic legends for both color and size.

BUT — it is extremely difficult to customize

ggplot2’s layered system makes common tasks easy but uncommon tasks surprisingly hard. Moving a legend to an exact coordinate, controlling tick positions on a secondary axis, mixing fonts within a plot title, nudging individual labels — all of these require either digging into theme() (which has ~100 arguments) or reaching for extension packages like ggtext, ggnewscale, or ggrepel.

In base R, arbitrary customization is usually a few par() arguments or a call to text() / mtext(). The same result in ggplot2 may require a stack of overrides.

# Moving legend inside the plot, changing its background,
# adjusting axis text size, removing the minor grid,
# and making the title left-aligned — all separate theme() calls
theme(
  legend.position      = c(0.15, 0.85),
  legend.background    = element_rect(fill = "white", color = "grey70"),
  axis.text            = element_text(size = 11),
  panel.grid.minor     = element_blank(),
  plot.title.position  = "plot"
)

Customizing a ggplot is one of the better uses of Large-Language Models

Faceting is extremely powerful / easy for data exploration

Faceting splits the data into subplots by one or more categorical variables — something that requires manual layout() calls and repeated plotting code in base R. In ggplot2 it is a single line.

ggplot(wdi, aes(x = GDP/1e3, y = LifeExp, size = Population/1e6)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_smooth(method = "lm", color = "firebrick",linewidth = 0.7) +
  scale_x_log10() +
  facet_wrap(~ region)   # key line of code

facet_wrap(~ region) produces one panel per region, all on the same scales, with a trend line in each. Doing this in base R would require a loop, careful margin management, and separate legend placement.

facet_grid(rows ~ cols) extends this to two categorical variables simultaneously.

The equivalent task in base R plotting is much harder.

ggplot2 vs. base R: a summary

	base R	ggplot2
Learning curve	Steeper for basics	Steeper for customization
Multi-variable plots	Manual (but explicit)	Fast (automatic legends/scales)
Faceting	Manual `layout()`	`facet_wrap()` / `facet_grid()`
Fine-grained control	Excellent	Difficult
Default aesthetics	Minimal, print-friendly	Ugly (IMO)
Data requirement	Any form of data is OK	Data frame required

The two approaches are complementary. ggplot2 is often faster for exploratory work and for panel figures; base R is often better when you need precise, publication-ready control over every element.

Base Plotting in R: Part III

EFB 654: R and Reproducible Research

Elie Gurarie

February 18, 2026

Goal:

1. Controlloing layouts

with `mfrow()`

Or prettier: modifying morgins with `oma` and `outer`

Issues with labelling?

More complicated layouts with `layout`

Now do this:

1b. Brief (unexpected) aside on functions

2. Outputting figures to files

What is a graphics device?

PDF vs. PNG: vector vs. raster

Exporting to PDF

Exporting to PNG

Controlling size in R Markdown

`jpeg()` — when to use it

3. Introduction to `ggplot2`

History and philosophy

Everything is built from data frames

Aesthetics and geoms

Advantage 1 — adding dimensions is quick

BUT — it is extremely difficult to customize

Faceting is extremely powerful / easy for data exploration

ggplot2 vs. base R: a summary

Base Plotting in R: Part III

EFB 654: R and Reproducible Research

Elie Gurarie

February 18, 2026

Goal:

1. Controlloing layouts

with mfrow()

Or prettier: modifying morgins with oma and outer

Issues with labelling?

More complicated layouts with layout

Now do this:

1b. Brief (unexpected) aside on functions

2. Outputting figures to files

What is a graphics device?

PDF vs. PNG: vector vs. raster

Exporting to PDF

Exporting to PNG

Controlling size in R Markdown

jpeg() — when to use it

3. Introduction to ggplot2

History and philosophy

Everything is built from data frames

Aesthetics and geoms

Advantage 1 — adding dimensions is quick

BUT — it is extremely difficult to customize

Faceting is extremely powerful / easy for data exploration

ggplot2 vs. base R: a summary

with `mfrow()`

Or prettier: modifying morgins with `oma` and `outer`

More complicated layouts with `layout`

`jpeg()` — when to use it

3. Introduction to `ggplot2`