- Indexing vectors (Today)
- Indexing matrices, data frames, and lists (Wednesday)
Feb 2, 2026
Download data from Blackboard:
I downloaded 2013’s most popular baby names from the Social Security Administration (http://www.ssa.gov/oact/babynames/limits.html).
load("Top300MF_names.Rdata")
str(top300) # What's the structure of top300?
## 'data.frame':    300 obs. of  3 variables:
##  $ name: chr  "Sophia" "Emma" "Olivia" "Isabella" ...
##  $ mf  : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
##  $ n   : int  21075 20788 18256 17490 15129 13066 13044 12313 10529 9345 ...
Create a small vector
y <- top300[1:20, "n"] # the vector holds the number of occurrences of each name
str(y) # Get used to using str()
## int [1:20] 21075 20788 18256 17490 15129 13066 13044 12313 10529 9345 ...
[: “Extract or Replace Parts of an Object”
[ and ] are used for almost all indexing. See ?"[".
y
##  [1] 21075 20788 18256 17490 15129 13066 13044 12313 10529  9345  9232  9121
## [13]  9108  8714  8370  8222  7979  7927  7677  7616
See the [1] on the far left? That’s R telling us how we can access that first value, 21075.
Since the counts are already sorted from largest to smallest, the top three are easy to find: access them by their index.
y[1]; y[2]; y[3]
# OR
y[1:3] # Use an expression within the index operator. Yields a vector.
## [1] 21075
## [1] 20788
## [1] 18256
## [1] 21075 20788 18256
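A minimal sketch of positional indexing that runs without the .Rdata file. The vector y_demo is a hand-typed stand-in (an assumption for illustration) holding the first six counts printed above.

```r
# Stand-in for y: the first six counts shown above, typed by hand
y_demo <- c(21075L, 20788L, 18256L, 17490L, 15129L, 13066L)

y_demo[2]           # a single position
y_demo[1:3]         # a range of positions, returned as a vector
y_demo[c(1, 3, 5)]  # any vector of positions works
head(y_demo, 3)     # head() is a shortcut for the first n elements
y_demo[10]          # a position past the end gives NA, not an error
```

Note that a single value is still a vector of length one, so y_demo[2] and y_demo[1:3] are the same kind of object.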
Let’s use the names in the name column of top300.
head(top300)
##       name mf     n
## 1   Sophia  F 21075
## 2     Emma  F 20788
## 3   Olivia  F 18256
## 4 Isabella  F 17490
## 5      Ava  F 15129
## 6      Mia  F 13066
names(y) <- top300[1:20, "name"] # make 'names' the vector names
str(y)
##  Named int [1:20] 21075 20788 18256 17490 15129 13066 13044 12313 10529 9345 ...
##  - attr(*, "names")= chr [1:20] "Sophia" "Emma" "Olivia" "Isabella" ...
y
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail
##     21075     20788     18256     17490     15129     13066     13044     12313
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper
##     10529      9345      9232      9121      9108      8714      8370      8222
##    Amelia    Aubrey   Addison    Evelyn
##      7979      7927      7677      7616
y["Harper"]
## Harper
##   8222
y[c("Sophia", "Sofia")]
## Sophia  Sofia
##  21075   9108
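A short sketch of a few details of indexing by name, using a hand-typed four-name stand-in for y (y_demo is an assumption for illustration; the counts are copied from the output above).

```r
# Stand-in for the named vector y, typed by hand
y_demo <- c(Sophia = 21075L, Emma = 20788L, Olivia = 18256L, Sofia = 9108L)

y_demo[c("Sofia", "Sophia")]  # the result follows the order of the request
y_demo[c("Emma", "Emma")]     # a name can be requested more than once
y_demo[["Sofia"]]             # double brackets return the bare value, name dropped
unname(y_demo["Sofia"])       # another way to drop the name
```

Name matching is exact, so "Sophia" and "Sofia" are two different elements.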
Take a minute to find the last 3 baby names in y. Assume you don’t already know that the length of y is 20.
yl <- length(y) # The length of a vector is also the index of the last element
y[(yl-2):yl]
##  Aubrey Addison  Evelyn
##    7927    7677    7616
What’s another way?
rev(rev(y)[1:3])
##  Aubrey Addison  Evelyn
##    7927    7677    7616
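As one more option, base R's tail() returns the last n elements directly. The sketch below uses a hand-typed five-name stand-in (y_demo, an assumption for illustration) and also shows why the parentheses in (yl-2):yl matter.

```r
# Stand-in for the end of y, typed by hand from the output above
y_demo <- c(Harper = 8222L, Amelia = 7979L, Aubrey = 7927L,
            Addison = 7677L, Evelyn = 7616L)

tail(y_demo, 3)    # last three elements, no length needed

n <- length(y_demo)
y_demo[(n - 2):n]  # same result, computed from the length

# Without parentheses, ':' is evaluated before '-',
# so n - 2:n means n - (2:n), which here is 3 2 1 0
n - 2:n
```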
Another way, if we know the length of the vector: use the minus sign to exclude items.
y[-c(1:17)]
##  Aubrey Addison  Evelyn
##    7927    7677    7616
y[c(-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17)]
##  Aubrey Addison  Evelyn
##    7927    7677    7616
Much better options…
y[-1:-17]
y[seq(-1,-17)]
##  Aubrey Addison  Evelyn
##    7927    7677    7616
##  Aubrey Addison  Evelyn
##    7927    7677    7616
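A small sketch of negative indexing and one common pitfall, using a hand-typed five-name stand-in (y_demo, an assumption for illustration).

```r
# Stand-in for the start of y, typed by hand
y_demo <- c(Sophia = 21075L, Emma = 20788L, Olivia = 18256L,
            Isabella = 17490L, Ava = 15129L)

y_demo[-1]         # everything except the first element
y_demo[-(1:3)]     # everything except the first three
y_demo[-1:-3]      # same thing: -1:-3 is c(-1, -2, -3)
tail(y_demo, -3)   # tail() with a negative n also drops the first three

# Pitfall: -1:3 is c(-1, 0, 1, 2, 3). Mixing negative and positive
# positions in one index is an error, so we catch it with try().
res <- try(y_demo[-1:3], silent = TRUE)
inherits(res, "try-error")
```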
Exclude the first two names using a logical vector.
log.vec <- c(FALSE, FALSE, rep(TRUE, times=18))
log.vec
##  [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
y[log.vec]
##    Olivia  Isabella       Ava       Mia     Emily   Abigail   Madison Elizabeth
##     18256     17490     15129     13066     13044     12313     10529      9345
## Charlotte     Avery     Sofia     Chloe      Ella    Harper    Amelia    Aubrey
##      9232      9121      9108      8714      8370      8222      7979      7927
##   Addison    Evelyn
##      7677      7616
length(log.vec) == length(y)
## [1] TRUE
If the logical vector is shorter than the data, it will be recycled.
y[c(TRUE, TRUE)]
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail
##     21075     20788     18256     17490     15129     13066     13044     12313
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper
##     10529      9345      9232      9121      9108      8714      8370      8222
##    Amelia    Aubrey   Addison    Evelyn
##      7979      7927      7677      7616
# same as
y[TRUE]
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail
##     21075     20788     18256     17490     15129     13066     13044     12313
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper
##     10529      9345      9232      9121      9108      8714      8370      8222
##    Amelia    Aubrey   Addison    Evelyn
##      7979      7927      7677      7616
# Different: this excludes every OTHER value.
y[c(TRUE, FALSE)]
##    Sophia    Olivia       Ava     Emily   Madison Charlotte     Sofia      Ella
##     21075     18256     15129     13044     10529      9232      9108      8370
##    Amelia   Addison
##      7979      7677
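A minimal sketch of how recycling plays out for logical indexes of different lengths. The six-name vector y_demo is a hand-typed stand-in (an assumption for illustration).

```r
# Stand-in for the start of y, typed by hand
y_demo <- c(Sophia = 21075L, Emma = 20788L, Olivia = 18256L,
            Isabella = 17490L, Ava = 15129L, Mia = 13066L)

y_demo[c(TRUE, FALSE)]         # pattern repeats 3 times: keeps positions 1, 3, 5
y_demo[c(FALSE, TRUE)]         # keeps positions 2, 4, 6
y_demo[c(TRUE, FALSE, FALSE)]  # pattern repeats twice: keeps positions 1, 4

# The pattern length does not have to divide the vector length.
# A length-4 pattern on 6 elements keeps positions 1 and 5.
y_demo[c(TRUE, FALSE, FALSE, FALSE)]
```

Because the recycling inside [ ] happens silently, it is worth checking the length of a logical index when the result looks surprising.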
The real power of indexing comes from using expressions within the indexing operators, but be sure the internal expression produces the value you expect.
y
y[y == 13066]
y[y > 13066]
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail
##     21075     20788     18256     17490     15129     13066     13044     12313
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper
##     10529      9345      9232      9121      9108      8714      8370      8222
##    Amelia    Aubrey   Addison    Evelyn
##      7979      7927      7677      7616
##   Mia
## 13066
##   Sophia     Emma   Olivia Isabella      Ava
##    21075    20788    18256    17490    15129
y[y != 13066]
##    Sophia      Emma    Olivia  Isabella       Ava     Emily   Abigail   Madison
##     21075     20788     18256     17490     15129     13044     12313     10529
## Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper    Amelia
##      9345      9232      9121      9108      8714      8370      8222      7979
##    Aubrey   Addison    Evelyn
##      7927      7677      7616
y[y != y["Mia"]]
##    Sophia      Emma    Olivia  Isabella       Ava     Emily   Abigail   Madison
##     21075     20788     18256     17490     15129     13044     12313     10529
## Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper    Amelia
##      9345      9232      9121      9108      8714      8370      8222      7979
##    Aubrey   Addison    Evelyn
##      7927      7677      7616
y[y > median(y)]
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail
##     21075     20788     18256     17490     15129     13066     13044     12313
##   Madison Elizabeth
##     10529      9345
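To check that an internal expression produces what you expect, it can help to look at the logical vector on its own before using it as an index. A minimal sketch with a hand-typed six-name stand-in (y_demo and the thresholds 15000 and 21000 are assumptions for illustration):

```r
# Stand-in for the start of y, typed by hand
y_demo <- c(Sophia = 21075L, Emma = 20788L, Olivia = 18256L,
            Isabella = 17490L, Ava = 15129L, Mia = 13066L)

keep <- y_demo > 15000  # the comparison is itself a logical vector
keep                    # inspect it before indexing
y_demo[keep]            # then use it as the index

which(keep)             # positions of the TRUE values
sum(keep)               # TRUE counts as 1, so this counts the matches

# Combine conditions element by element with & (and) or | (or)
y_demo[y_demo > 15000 & y_demo < 21000]
```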
%in%
%in% = “are found in”.
cnames <- c("Chloe", "Charlotte", "Crystal", "Candy", "CoCo")
y[cnames] # Not really what we want
##     Chloe Charlotte      <NA>      <NA>      <NA>
##      8714      9232        NA        NA        NA
y[names(y) %in% cnames]
## Charlotte     Chloe
##      9232      8714
Note that “Crystal”, “Candy”, and “CoCo” are in cnames but are not names in y. Indexing by name returns NA for them, while %in% simply never matches them.
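The direction of %in% matters, and so does the order of the result. A small sketch using a hand-typed four-name stand-in and a shortened cnames (both are assumptions for illustration):

```r
# Stand-ins typed by hand; counts copied from the output above
y_demo <- c(Charlotte = 9232L, Avery = 9121L, Sofia = 9108L, Chloe = 8714L)
cnames <- c("Chloe", "Charlotte", "Crystal")

names(y_demo) %in% cnames  # one TRUE/FALSE per element of y_demo
cnames %in% names(y_demo)  # one TRUE/FALSE per element of cnames

# Logical indexing keeps the order of y_demo
y_demo[names(y_demo) %in% cnames]

# intersect() keeps the order of its first argument, with no NA values
y_demo[intersect(cnames, names(y_demo))]
```

Either form avoids the NA entries; which one to use depends on whether the result should follow the order of the data or the order of the request.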
See also ?intersect.
1. Get the islands dataset.
data(islands)
2. Print the area of Vancouver and Victoria islands.
islands[c("Vancouver", "Victoria")]
## Vancouver  Victoria
##        12        82
3. Find the sum of the first five islands in the vector.
sum(islands[1:5])
## [1] 36978
4. Find the mean of every other island combined starting with Antarctica (i.e., Antarctica, Australia, etc.) using logical indexing. It will save you time to take advantage of R’s recycling of small vectors.
mean(islands[c(FALSE, TRUE)])
## [1] 453.4167
5. Do the same as above, but use the vector’s index instead. Remember the function seq()?
mean(islands[seq(from=2, to=length(islands), by=2)])
## [1] 453.4167
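A quick way to convince yourself that the logical and positional answers pick the same elements is to compare the two subsets directly. A minimal sketch using the built-in islands dataset (the variable names idx, by_logical, and by_position are assumptions for illustration):

```r
data(islands)

# Positions 2, 4, 6, ... up to the length of the vector
idx <- seq(from = 2, to = length(islands), by = 2)

by_logical  <- islands[c(FALSE, TRUE)]  # recycled logical pattern
by_position <- islands[idx]             # explicit positions

identical(by_logical, by_position)      # both approaches select the same elements
head(names(by_position), 2)             # starts with Antarctica, then Australia
```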