March 18, 2026

Patterns

Homework errors fall into a few categories:

  • Code that runs but doesn’t answer the question
  • Logic errors (& vs |)
  • Indexing errors (negative indexing)
  • Formula interface misuse ($ inside aggregate())
  • Merge confusion (all vs all.x)
  • Incomplete execution

Your Code Runs. But Does It Answer the Question?

The most common issue across all homeworks. Valid syntax, wrong result.

# Q: "Display the first 3 rows of mtcars"
head(mtcars)  # Returns 6 rows -- the default!
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

head() with no second argument returns 6 rows. If the question asks for 3, pass n = 3.

# Correct: specify the number of rows
head(mtcars, 3)
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

The Most Common Logic Error

Question: “Subset cars with mpg > 25 or 4 cylinders.”

# WRONG: & means BOTH conditions must be true
nrow(mtcars[mtcars$mpg > 25 & mtcars$cyl == 4, ])
## [1] 6
# RIGHT: | means EITHER condition
nrow(mtcars[mtcars$mpg > 25 | mtcars$cyl == 4, ])
## [1] 11

& narrows your results. | broadens them.

Quick truth table

mpg > 25   cyl == 4   & (AND)   | (OR)
TRUE       TRUE       TRUE      TRUE
TRUE       FALSE      FALSE     TRUE
FALSE      TRUE       FALSE     TRUE
FALSE      FALSE      FALSE     FALSE

Rule of thumb:

  • “and” = both must be true = fewer rows = &
  • “or” = either can be true = more rows = |
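
The rule of thumb can be checked by counting TRUE values before subsetting. The sketch below stores each condition in its own vector (high_mpg and four_cyl are illustrative names) and uses sum(), which counts the TRUE entries of a logical vector:

```r
# Build each condition once so it can be inspected on its own
high_mpg <- mtcars$mpg > 25   # illustrative name
four_cyl <- mtcars$cyl == 4   # illustrative name

sum(high_mpg)              # rows meeting the first condition
## [1] 6
sum(four_cyl)              # rows meeting the second condition
## [1] 11
sum(high_mpg & four_cyl)   # rows meeting both
## [1] 6
sum(high_mpg | four_cyl)   # rows meeting at least one
## [1] 11
```

The & count can never exceed either individual count, and the | count can never be smaller than either. If the numbers break that pattern, the operator is probably the wrong one.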

Negative Indexing: What Are You Actually Removing?

temps <- c(freezing = 32, cold = 45, cool = 60,
           warm = 75, hot = 90, boiling = 212)
temps
## freezing     cold     cool     warm      hot  boiling 
##       32       45       60       75       90      212

Question: “Remove freezing and boiling (positions 1 and 6).”

# WRONG: removes the MIDDLE, keeps the extremes
temps[-c(2:5)]
## freezing  boiling 
##       32      212
# RIGHT: removes positions 1 and 6
temps[-c(1, 6)]
## cold cool warm  hot 
##   45   60   75   90
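
A habit that helps here is to index positively first, confirm that the selected elements are the ones to remove, and only then add the minus sign. A minimal sketch (drop_pos is an illustrative name):

```r
temps <- c(freezing = 32, cold = 45, cool = 60,
           warm = 75, hot = 90, boiling = 212)

drop_pos <- c(1, 6)   # illustrative name for the positions to remove

# Index positively first: these are the elements that will be removed
names(temps[drop_pos])
## [1] "freezing" "boiling"

# Once that looks right, negate the same positions
names(temps[-drop_pos])
## [1] "cold" "cool" "warm" "hot"

# Alternative for named vectors: drop by name instead of position
kept_by_name <- temps[!names(temps) %in% c("freezing", "boiling")]
identical(kept_by_name, temps[-drop_pos])
## [1] TRUE
```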

The $ Trap Inside aggregate()

From HW06. The formula interface looks up column names in the data argument, so the $ prefix is redundant and leaves awkward column names in the result.

# WRONG: $ notation inside a formula
aggregate(CO2$uptake ~ CO2$Type, data = CO2, FUN = mean)
# RIGHT: bare column names, data argument does the work
aggregate(uptake ~ Type, data = CO2, FUN = mean)
##          Type   uptake
## 1      Quebec 33.54286
## 2 Mississippi 20.88333
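
To see what the $ version changes, compare the column names of the two results. This is a sketch using the same CO2 data; with_dollar and bare_names are illustrative names. The $ call usually still runs and returns the same means, but its column names are built from the full CO2$... expressions, which makes later steps such as merge(..., by = "Type") harder.

```r
with_dollar <- aggregate(CO2$uptake ~ CO2$Type, data = CO2, FUN = mean)
bare_names  <- aggregate(uptake ~ Type, data = CO2, FUN = mean)

# Column names built from the full expressions used in the formula
names(with_dollar)

# Clean column names taken from the data frame
names(bare_names)
## [1] "Type"   "uptake"

# The computed means themselves are the same
all.equal(with_dollar[[2]], bare_names$uptake)
## [1] TRUE
```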

cbind() and Duplicate Columns

When you cbind() two aggregate() results that share a grouping column:

agg_mean <- aggregate(uptake ~ Type, data = CO2, FUN = mean)
agg_sd   <- aggregate(uptake ~ Type, data = CO2, FUN = sd)
cbind(agg_mean, agg_sd)
##          Type   uptake        Type   uptake
## 1      Quebec 33.54286      Quebec 9.673830
## 2 Mississippi 20.88333 Mississippi 7.815773

Type appears twice. This is messy and can cause problems downstream.

Fix: drop the duplicate before binding

cbind(agg_mean, agg_sd[, -1, drop = FALSE])
##          Type   uptake   uptake
## 1      Quebec 33.54286 9.673830
## 2 Mississippi 20.88333 7.815773

Or better yet, use merge():

merge(agg_mean, agg_sd, by = "Type", suffixes = c("_mean", "_sd"))
##          Type uptake_mean uptake_sd
## 1 Mississippi    20.88333  7.815773
## 2      Quebec    33.54286  9.673830
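
One reason merge() is the safer choice: cbind() pairs rows purely by position, so the drop-then-bind fix is only correct when both results list the groups in the same order. A minimal sketch of checking that assumption (merged is an illustrative name):

```r
agg_mean <- aggregate(uptake ~ Type, data = CO2, FUN = mean)
agg_sd   <- aggregate(uptake ~ Type, data = CO2, FUN = sd)

# cbind() matches rows by position, so confirm the grouping columns agree
identical(agg_mean$Type, agg_sd$Type)
## [1] TRUE

# merge() matches rows on the key column, so row order does not matter
merged <- merge(agg_mean, agg_sd, by = "Type", suffixes = c("_mean", "_sd"))
names(merged)
## [1] "Type"        "uptake_mean" "uptake_sd"
```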

merge(): A Common Confusion

The most frequent HW06 error: confusing all = TRUE with all.x = TRUE.

sites   <- data.frame(site = c("A", "B", "C"),
                       habitat = c("forest", "wetland", "field"))
surveys <- data.frame(site = c("B", "C", "D"),
                       count = c(12, 8, 15))
merge(sites, surveys)                  # Inner: only B and C
##   site habitat count
## 1    B wetland    12
## 2    C   field     8

Comparing the join arguments

merge(sites, surveys, all.x = TRUE)   # Left: all of sites
##   site habitat count
## 1    A  forest    NA
## 2    B wetland    12
## 3    C   field     8
merge(sites, surveys, all = TRUE)     # Full outer: everything
##   site habitat count
## 1    A  forest    NA
## 2    B wetland    12
## 3    C   field     8
## 4    D    <NA>    15

all.x = TRUE keeps all rows from the first data frame. all = TRUE keeps all rows from both. They are not the same.
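
For completeness, merge() also accepts all.y = TRUE, which keeps all rows from the second data frame (a right join). The sketch below reuses the sites and surveys data; right_join is an illustrative name. Counting rows for each variant is a quick way to confirm which join was actually produced.

```r
sites   <- data.frame(site = c("A", "B", "C"),
                      habitat = c("forest", "wetland", "field"))
surveys <- data.frame(site = c("B", "C", "D"),
                      count = c(12, 8, 15))

# Right join: keep every row of surveys, matched or not
right_join <- merge(sites, surveys, all.y = TRUE)
right_join
##   site habitat count
## 1    B wetland    12
## 2    C   field     8
## 3    D    <NA>    15

# Row counts distinguish the four variants
nrow(merge(sites, surveys))                 # inner
## [1] 2
nrow(merge(sites, surveys, all.x = TRUE))   # left
## [1] 3
nrow(merge(sites, surveys, all.y = TRUE))   # right
## [1] 3
nrow(merge(sites, surveys, all = TRUE))     # full outer
## [1] 4
```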

Non-Executing Code Chunks

Your .Rmd file must contain executable code chunks, or your code will not run when you knit.

 ```r
 # This is DISPLAY ONLY -- it does not execute
 head(mtcars, 3)
 ```
 ```{r}
 # This EXECUTES when you knit
 head(mtcars, 3)
 ```

The difference is the curly braces: {r} vs just r. Without them, R Markdown renders the code as a formatted block but never runs it. Also, code pasted from the console with its > prompts will not knit, because a line that starts with > is not valid R syntax.

The Completeness Pattern

A recurring theme: so close!

# Step 1: Create the logical vector (done)
high_temps <- temps >= 60 & temps <= 100

# Step 2: Extract with it (MISSING)
temps[high_temps]

# Step 1: Subset the data (done)
six_cyl <- mtcars[mtcars$cyl == 6, ]

# Step 2: Calculate the mean (MISSING)
mean(six_cyl$mpg)

Summary

Pitfall                                Fix
head() with no n argument              Always specify n when the question gives a number
& when you mean |                      “and” = & (fewer rows), “or” = | (more rows)
Wrong positions in negative indexing   Index positively first to verify, then negate
$ inside aggregate() formula           Use bare column names; data = handles the lookup
Duplicate columns from cbind()         Drop the duplicate or use merge()
all = TRUE vs all.x = TRUE             all = both sides; all.x = left side only
Overwriting built-in objects           Always use a new variable name
```r vs ```{r}                         Curly braces make it executable
Incomplete answers                     Reread the question after writing your code