Feb 2, 2026

Overview

  1. Indexing vectors (Today)
  2. Indexing matrices, data frames, and lists (Wednesday)

Indexing

  • Indexing is how you access particular items within a data structure (e.g., vector, data frame).
  • Typically, the more complex the structure, the more work it takes to index the values you want.
  • By learning to index vectors you’ll learn how to index ~90% of all R objects.

Vector indexing

Download data from Blackboard:

I downloaded 2013’s most popular baby names from the Social Security Administration (http://www.ssa.gov/oact/babynames/limits.html).

load("Top300MF_names.Rdata")
str(top300) # What's the structure of top300?
## 'data.frame':    300 obs. of  3 variables:
##  $ name: chr  "Sophia" "Emma" "Olivia" "Isabella" ...
##  $ mf  : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
##  $ n   : int  21075 20788 18256 17490 15129 13066 13044 12313 10529 9345 ...

Vector indexing

Create a small vector

y<-top300[1:20, "n"] # the vector holds the number of occurrences of each name
str(y) # Get used to using str()
##  int [1:20] 21075 20788 18256 17490 15129 13066 13044 12313 10529 9345 ...

[: “Extract or Replace Parts of an Object”

  • [ and ] are used for almost all indexing. See ?"[".
  • Indexing means reaching within an object to pull out parts of it.
  • [ takes names, indices (integers), or logicals as arguments.
  • An element’s location within an object is its index; R chooses it.
  • The names are user-defined; we choose them.
  • Three ways to index = greater flexibility.
y
##  [1] 21075 20788 18256 17490 15129 13066 13044 12313 10529  9345  9232  9121
## [13]  9108  8714  8370  8222  7979  7927  7677  7616

See the [1] on the far left? That’s R telling us how we can access that first value, 21075.
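
As a minimal sketch of the three argument types listed above, the block below rebuilds the first five counts from the slides in a small vector. The name top5 is made up for this example, and the values are typed in by hand rather than loaded from the .Rdata file.

```r
# First five counts shown on the slides; `top5` is a hypothetical name.
top5 <- c(Sophia = 21075, Emma = 20788, Olivia = 18256,
          Isabella = 17490, Ava = 15129)

by_position <- top5[1]                                    # integer index
by_name     <- top5["Sophia"]                             # element name
by_logical  <- top5[c(TRUE, FALSE, FALSE, FALSE, FALSE)]  # logical mask

# All three reach the same element.
identical(by_position, by_name)
identical(by_name, by_logical)
```

Which form to use is mostly a matter of what you already know: the position, the label, or a condition the element satisfies.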

Occurrence of the top 3 most popular baby names?
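
One possible answer, sketched on a hand-typed copy of the first five counts (top5 is a hypothetical name). It assumes the vector is already sorted from most to least popular, as the slide output suggests.

```r
# First five counts shown on the slides; `top5` is a hypothetical name.
top5 <- c(Sophia = 21075, Emma = 20788, Olivia = 18256,
          Isabella = 17490, Ava = 15129)

top3_by_index <- top5[1:3]      # positions 1 through 3
top3_by_head  <- head(top5, 3)  # head() keeps the first n elements

identical(top3_by_index, top3_by_head)
```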

Name the elements of the vector

Let’s use the names in the name column of top300.

head(top300)
##       name mf     n
## 1   Sophia  F 21075
## 2     Emma  F 20788
## 3   Olivia  F 18256
## 4 Isabella  F 17490
## 5      Ava  F 15129
## 6      Mia  F 13066

Name the elements of the vector

names(y)<-top300[1:20, "name"] # use the 'name' column as the element names
str(y)
##  Named int [1:20] 21075 20788 18256 17490 15129 13066 13044 12313 10529 9345 ...
##  - attr(*, "names")= chr [1:20] "Sophia" "Emma" "Olivia" "Isabella" ...
y
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail 
##     21075     20788     18256     17490     15129     13066     13044     12313 
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper 
##     10529      9345      9232      9121      9108      8714      8370      8222 
##    Amelia    Aubrey   Addison    Evelyn 
##      7979      7927      7677      7616
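
For comparison, here is a small sketch of three ways to end up with the same named vector. The object names (counts, girl_names, named_a, and so on) are invented for this example; setNames() is part of base R's stats package.

```r
counts     <- c(21075, 20788, 18256)        # first three counts from the slides
girl_names <- c("Sophia", "Emma", "Olivia")

# 1. Assign names after the fact, as on the slide.
named_a <- counts
names(named_a) <- girl_names

# 2. Do it in one step with setNames().
named_b <- setNames(counts, girl_names)

# 3. Name the elements while building the vector.
named_c <- c(Sophia = 21075, Emma = 20788, Olivia = 18256)

identical(named_a, named_b)
identical(named_b, named_c)
```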

Accessing a single element by name

y
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail 
##     21075     20788     18256     17490     15129     13066     13044     12313 
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper 
##     10529      9345      9232      9121      9108      8714      8370      8222 
##    Amelia    Aubrey   Addison    Evelyn 
##      7979      7927      7677      7616
y["Harper"]
## Harper 
##   8222

Accessing more than one element by name

y
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail 
##     21075     20788     18256     17490     15129     13066     13044     12313 
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper 
##     10529      9345      9232      9121      9108      8714      8370      8222 
##    Amelia    Aubrey   Addison    Evelyn 
##      7979      7927      7677      7616
y[c("Sophia", "Sofia")]
## Sophia  Sofia 
##  21075   9108
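
A short sketch of one detail worth knowing: the result follows the order of the names you ask for, not the order in the vector, and a name may be requested more than once. The vector top5 is a hand-typed, hypothetical copy of the first five counts.

```r
# First five counts shown on the slides; `top5` is a hypothetical name.
top5 <- c(Sophia = 21075, Emma = 20788, Olivia = 18256,
          Isabella = 17490, Ava = 15129)

# Request order is preserved, and repeats are allowed.
picked <- top5[c("Ava", "Sophia", "Ava")]
names(picked)
```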

Last 3 names (in our short dataset, y)

Take a minute to find the last 3 baby names. Assume you don’t already know the length of y is 20.

yl<-length(y) # The length of a vector is also the index of its last element
y[(yl-2):yl] # From the third-to-last element through the last
##  Aubrey Addison  Evelyn 
##    7927    7677    7616

What’s another way?

rev(rev(y)[1:3])
##  Aubrey Addison  Evelyn 
##    7927    7677    7616
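
Yet another option is tail(), which returns the last n elements without needing the length. The sketch below uses a hand-typed, hypothetical vector top5 in place of y.

```r
# First five counts shown on the slides; `top5` is a hypothetical name.
top5 <- c(Sophia = 21075, Emma = 20788, Olivia = 18256,
          Isabella = 17490, Ava = 15129)

last3_tail  <- tail(top5, 3)       # last three elements, names kept

n           <- length(top5)
last3_index <- top5[(n - 2):n]     # same result via explicit positions

identical(last3_tail, last3_index)
```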

Excluding values

Another way works if we know the length of the vector: use the minus sign to exclude items by index.

y[-c(1:17)]
##  Aubrey Addison  Evelyn 
##    7927    7677    7616

Alternatively…

y[c(-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17)]
##  Aubrey Addison  Evelyn 
##    7927    7677    7616

Much better options…

y[-1:-17]
##  Aubrey Addison  Evelyn 
##    7927    7677    7616
y[seq(-1,-17)]
##  Aubrey Addison  Evelyn 
##    7927    7677    7616
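
Two things that may help with negative indexing, sketched on a hypothetical five-element vector top5: the exclusion can be computed from the length instead of hard-coding 17, and -1:2 is not the same as -(1:2) because the minus sign binds before the colon.

```r
# First five counts shown on the slides; `top5` is a hypothetical name.
top5 <- c(Sophia = 21075, Emma = 20788, Olivia = 18256,
          Isabella = 17490, Ava = 15129)

# Drop everything except the last three, without hard-coding the length.
n     <- length(top5)
last3 <- top5[-seq_len(n - 3)]

# Pitfall: -1:2 is (-1):2, i.e. -1, 0, 1, 2. Mixing negative and
# positive indices is an error, which tryCatch() turns into a flag here.
mixed <- tryCatch(top5[-1:2], error = function(e) "error")
mixed
```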

Using a vector of logicals to index a vector

Exclude the first two names

log.vec<-c(FALSE, FALSE, rep(TRUE, times=18))
log.vec
##  [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
y[log.vec]
##    Olivia  Isabella       Ava       Mia     Emily   Abigail   Madison Elizabeth 
##     18256     17490     15129     13066     13044     12313     10529      9345 
## Charlotte     Avery     Sofia     Chloe      Ella    Harper    Amelia    Aubrey 
##      9232      9121      9108      8714      8370      8222      7979      7927 
##   Addison    Evelyn 
##      7677      7616
y
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail 
##     21075     20788     18256     17490     15129     13066     13044     12313 
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper 
##     10529      9345      9232      9121      9108      8714      8370      8222 
##    Amelia    Aubrey   Addison    Evelyn 
##      7979      7927      7677      7616

length(log.vec) == length(y)
## [1] TRUE
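
Typing out a logical vector by hand gets tedious, so it is usually computed. A minimal sketch, assuming a hypothetical short vector top5 in place of y:

```r
# First five counts shown on the slides; `top5` is a hypothetical name.
top5 <- c(Sophia = 21075, Emma = 20788, Olivia = 18256,
          Isabella = 17490, Ava = 15129)

# The hand-built mask, sized from the vector itself.
mask_manual   <- c(FALSE, FALSE, rep(TRUE, times = length(top5) - 2))

# The same mask from a comparison on the positions 1, 2, 3, ...
mask_computed <- seq_along(top5) > 2

identical(mask_manual, mask_computed)
top5[mask_computed]   # everything except the first two names
```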

“Recycling”

If the logical vector is shorter than the data, it will be recycled.

y[c(TRUE, TRUE)]
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail 
##     21075     20788     18256     17490     15129     13066     13044     12313 
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper 
##     10529      9345      9232      9121      9108      8714      8370      8222 
##    Amelia    Aubrey   Addison    Evelyn 
##      7979      7927      7677      7616
# same as
y[TRUE]
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail 
##     21075     20788     18256     17490     15129     13066     13044     12313 
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper 
##     10529      9345      9232      9121      9108      8714      8370      8222 
##    Amelia    Aubrey   Addison    Evelyn 
##      7979      7927      7677      7616

Using a vector of logicals to index a vector

# Different from the above: c(TRUE, FALSE) is recycled, so every other value is dropped.
y[c(TRUE, FALSE)]
##    Sophia    Olivia       Ava     Emily   Madison Charlotte     Sofia      Ella 
##     21075     18256     15129     13044     10529      9232      9108      8370 
##    Amelia   Addison 
##      7979      7677
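
To make the recycling visible, the sketch below expands a three-element mask by hand with rep() and compares it with what indexing does on its own. The vector top5 is a hypothetical stand-in for y.

```r
# First five counts shown on the slides; `top5` is a hypothetical name.
top5 <- c(Sophia = 21075, Emma = 20788, Olivia = 18256,
          Isabella = 17490, Ava = 15129)

# Keep every third element, starting with the first.
every_third <- top5[c(TRUE, FALSE, FALSE)]

# What recycling amounts to: the mask repeated out to the vector's length.
expanded <- rep(c(TRUE, FALSE, FALSE), length.out = length(top5))
expanded

identical(every_third, top5[expanded])
```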

Vector Indexing

The real power of indexing comes from using expressions within the indexing operators, but be sure the internal expression produces the value you expect.

y
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail 
##     21075     20788     18256     17490     15129     13066     13044     12313 
##   Madison Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper 
##     10529      9345      9232      9121      9108      8714      8370      8222 
##    Amelia    Aubrey   Addison    Evelyn 
##      7979      7927      7677      7616
y[y == 13066]
##   Mia 
## 13066
y[y > 13066]
##   Sophia     Emma   Olivia Isabella      Ava 
##    21075    20788    18256    17490    15129
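
One way to check that the inner expression produces what you expect is to run it by itself before using it as an index. A small sketch on a hand-typed, hypothetical vector top6:

```r
# First six counts shown on the slides; `top6` is a hypothetical name.
top6 <- c(Sophia = 21075, Emma = 20788, Olivia = 18256,
          Isabella = 17490, Ava = 15129, Mia = 13066)

above <- top6 > 13066   # the expression on its own: a logical vector
above

sum(above)      # how many elements pass the test
which(above)    # the positions of those elements

top6[above]     # only now use it as an index
```

Because `>` is strict, Mia (exactly 13066) is FALSE in the mask, which is easy to miss if you only look at the final result.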

Vector Indexing

y[y != 13066]
##    Sophia      Emma    Olivia  Isabella       Ava     Emily   Abigail   Madison 
##     21075     20788     18256     17490     15129     13044     12313     10529 
## Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper    Amelia 
##      9345      9232      9121      9108      8714      8370      8222      7979 
##    Aubrey   Addison    Evelyn 
##      7927      7677      7616

Vector Indexing

y[y != y["Mia"]]
##    Sophia      Emma    Olivia  Isabella       Ava     Emily   Abigail   Madison 
##     21075     20788     18256     17490     15129     13044     12313     10529 
## Elizabeth Charlotte     Avery     Sofia     Chloe      Ella    Harper    Amelia 
##      9345      9232      9121      9108      8714      8370      8222      7979 
##    Aubrey   Addison    Evelyn 
##      7927      7677      7616

Vector Indexing

y[y > median(y)]
##    Sophia      Emma    Olivia  Isabella       Ava       Mia     Emily   Abigail 
##     21075     20788     18256     17490     15129     13066     13044     12313 
##   Madison Elizabeth 
##     10529      9345

%in%

  • %in% = “are found in”.
  • Useful for looking for a vector of values within another vector of values.
cnames<-c("Chloe", "Charlotte", "Crystal", "Candy", "CoCo")
y[cnames] # Not really what we want
##     Chloe Charlotte      <NA>      <NA>      <NA> 
##      8714      9232        NA        NA        NA
y[names(y) %in% cnames]
## Charlotte     Chloe 
##      9232      8714

Note that “Crystal”, “Candy”, and “CoCo” are in cnames but not in y, so indexing by name returns NA for them, while %in% simply skips them.

  • Also see ?intersect.
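
A sketch of the intersect() route, using four of the counts from the slides in a hypothetical vector called top. Here intersect() finds the names the two sets share, and those names are then used as an ordinary name index.

```r
# Four counts from the slides; `top` and `shared` are hypothetical names.
top    <- c(Charlotte = 9232, Avery = 9121, Sofia = 9108, Chloe = 8714)
cnames <- c("Chloe", "Charlotte", "Crystal")

shared <- intersect(names(top), cnames)   # names present in both
shared

top[shared]                               # no NA entries this time

# Same result as the %in% approach.
identical(top[shared], top[names(top) %in% cnames])
```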

Exercise

  1. Get the islands dataset.
  2. Print the area of Vancouver and Victoria islands.
  3. Find the sum of the first five islands in the vector.
  4. Find the mean of every other island combined starting with Antarctica (i.e., Antarctica, Australia, etc.) using logical indexing. It will save you time to take advantage of R’s recycling of small vectors.
  5. Do the same as above, but use the vector’s index instead. Remember the function seq()?

Exercise

1. Get the islands dataset.

data(islands)

2. Print the area of Vancouver and Victoria islands.

islands[c("Vancouver", "Victoria")]
## Vancouver  Victoria 
##        12        82

3. Find the sum of the first five islands in the vector.

sum(islands[1:5])
## [1] 36978

Exercise

4. Find the mean of every other island combined starting with Antarctica (i.e., Antarctica, Australia, etc.) using logical indexing. It will save you time to take advantage of R’s recycling of small vectors.

mean(islands[c(FALSE, TRUE)]) # spell out FALSE/TRUE; F and T can be overwritten
## [1] 453.4167

5. Do the same as above, but use the vector’s index instead. Remember the function seq()?

mean(islands[seq(from=2, to=length(islands), by=2)])
## [1] 453.4167
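
As a quick check that answers 4 and 5 really select the same islands, the two subsets can be compared directly. This sketch uses the built-in islands dataset; by_logical and by_index are names made up for the example.

```r
data(islands)

by_logical <- islands[c(FALSE, TRUE)]                             # recycled mask
by_index   <- islands[seq(from = 2, to = length(islands), by = 2)] # even positions

identical(by_logical, by_index)   # same elements, same names
names(by_index)[1:2]              # starts with Antarctica, then Australia
```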