Jan 28, 2026

Two-or-more-dimensional Data Structures and Classes

  • Data frames
  • Lists
  • Matrices
  • Arrays

Data Frames

“…tightly coupled collections of variables…used as the fundamental data structure by most of R’s modeling software.”—data.frame() help page.

  • Extremely common
  • Can combine columns of different classes (e.g., character, factor, numeric)
  • The mtcars and trees datasets are data frames
  • We can create our own with data.frame()
    • Arguments use the “tag = value” form
    • Data imported into R most often arrives as a data.frame
  • Most important rule for data frames: they must be rectangular.
    • All columns must have the same length, and so must all rows.
    • Missing values are recorded as NA so the rectangle stays intact.
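
A minimal sketch of the rectangular rule, using made-up values in base R. The names `ok` and `bad` are illustrative only and are not part of the course data.

```r
# Columns of equal length combine without complaint.
# NA holds the place of a missing value so the column keeps its length.
ok <- data.frame(id = 1:3, score = c(90, NA, 75))
dim(ok)  # rows, then columns

# Columns whose lengths do not fit together are rejected.
# tryCatch() captures the error message instead of stopping the script.
bad <- tryCatch(
  data.frame(id = 1:3, score = c(90, 75)),
  error = function(e) conditionMessage(e)
)
bad  # a message about differing numbers of rows
```

Note that data.frame() recycles a shorter column when its length divides the longest column evenly, so not every length mismatch raises an error.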

Birds <- data.frame(Type = c("BirdofPrey", "BirdofPrey", "SongBird", 
                             "Shorebird", "Flightless", "SongBird", 
                             "Flightless"),
                    Num. = c(3, 15, 42, 8, 0, 29, 17),
                    Extinct = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
                    row.names = c("Eagle", "Hawk", "Bluebird", 
                                "Egret", "Dodo", "Blue Jay", "Kiwi"))

Copy the Birds data frame from Blackboard for the remaining exercises.

Birds
##                Type Num. Extinct
## Eagle    BirdofPrey    3   FALSE
## Hawk     BirdofPrey   15   FALSE
## Bluebird   SongBird   42   FALSE
## Egret     Shorebird    8   FALSE
## Dodo     Flightless    0    TRUE
## Blue Jay   SongBird   29   FALSE
## Kiwi     Flightless   17   FALSE

Did data.frame() do what we wanted?

median(Num.)
## Error:
## ! object 'Num.' not found
median(Birds$Num.)
## [1] 15

class(Birds$Type)
## [1] "character"
class(Birds$Num.)
## [1] "numeric"
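
The error above arises because the column lives inside the data frame rather than in the workspace. A small sketch of three ways to reach columns, using a hypothetical two-row stand-in called `toy` rather than the full Birds data:

```r
# Toy data frame standing in for Birds (made-up subset of values).
toy <- data.frame(Type = c("SongBird", "Flightless"),
                  Num. = c(42, 0),
                  Extinct = c(FALSE, TRUE))

# $ pulls one column out of the data frame as a vector.
median(toy$Num.)

# with() evaluates an expression using the columns as variables.
with(toy, median(Num.))

# sapply() applies class() to every column at once.
sapply(toy, class)
```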

It’s extremely useful to be able to access the names of a data frame. For a data frame, names() or colnames() gives the column (variable) names; rownames() returns the row names.

names(Birds)
## [1] "Type"    "Num."    "Extinct"
colnames(Birds)
## [1] "Type"    "Num."    "Extinct"
rownames(Birds)
## [1] "Eagle"    "Hawk"     "Bluebird" "Egret"    "Dodo"     "Blue Jay" "Kiwi"

Lists

Data frames are actually lists of equal-length vectors. They’re just displayed differently.

class(Birds) # Can be user defined, but has defaults
## [1] "data.frame"
typeof(Birds) # Unchangeable
## [1] "list"
class(Birds)<-"BirdFrame" # Arbitrary, I just made that up.
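
A short sketch of what "a data frame is a list" means in practice, using a hypothetical data frame `toy` (not the Birds data). List tools work on it directly:

```r
toy <- data.frame(x = 1:3, y = c("a", "b", "c"))

is.list(toy)   # a data frame passes the list test
length(toy)    # length counts columns, just as it counts list elements
toy[["x"]]     # list-style extraction returns the column as a vector
```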

While data frames are probably the most common structure for temporary data storage and viewing, lists are probably most often used for more complex data. They are extremely flexible and useful, but a little tricky to work with.

Let’s look at our Birds data frame transformed to a list:

as.list(Birds)
## $Type
## [1] "BirdofPrey" "BirdofPrey" "SongBird"   "Shorebird"  "Flightless"
## [6] "SongBird"   "Flightless"
## 
## $Num.
## [1]  3 15 42  8  0 29 17
## 
## $Extinct
## [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## 
## attr(,"class")
## [1] "BirdFrame"
## attr(,"row.names")
## [1] "Eagle"    "Hawk"     "Bluebird" "Egret"    "Dodo"     "Blue Jay" "Kiwi"

It’s the same exact information, just not as pretty.

Side note: as. functions

?as.vector
?as.numeric
?as.list
# etc.
as.list(Birds)

An example:

as.numeric(Birds)
## Error:
## ! 'list' object cannot be coerced to type 'double'

Why doesn’t this work?
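
One way to approach the question, sketched with a hypothetical stand-in data frame `toy`: as.numeric() expects an atomic vector, while a data frame is a list of several vectors, so the conversion has to happen one column at a time.

```r
toy <- data.frame(Num. = c(3, 15), Extinct = c(FALSE, TRUE))

# A single logical column converts cleanly: FALSE becomes 0, TRUE becomes 1.
as.numeric(toy$Extinct)

# sapply() converts column by column; the result here is a numeric matrix.
converted <- sapply(toy, as.numeric)
converted
```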

Let’s create our own list. First, find some values to use.

letters
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
month.name
##  [1] "January"   "February"  "March"     "April"     "May"       "June"
##  [7] "July"      "August"    "September" "October"   "November"  "December"
pi
## [1] 3.141593
seq(5, 95, by=10)
##  [1]  5 15 25 35 45 55 65 75 85 95

Use list() to create a list and remember to use the “tag = value” structure.

list(alpha = letters, Months = month.name, PI = pi, fives = seq(5, 95, by=10))
## $alpha
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
## 
## $Months
##  [1] "January"   "February"  "March"     "April"     "May"       "June"     
##  [7] "July"      "August"    "September" "October"   "November"  "December" 
## 
## $PI
## [1] 3.141593
## 
## $fives
##  [1]  5 15 25 35 45 55 65 75 85 95
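
Once a list is stored in a variable, its elements can be pulled back out by tag. A minimal sketch, assuming the list is saved under the hypothetical name `my_list`:

```r
my_list <- list(alpha = letters, Months = month.name, PI = pi,
                fives = seq(5, 95, by = 10))

my_list$PI               # $ extracts one element by its tag
my_list[["Months"]][3]   # [[ ]] extracts the element, then [3] picks the third month
class(my_list["fives"])  # single brackets keep the list wrapper
sapply(my_list, length)  # unlike data frame columns, elements can differ in length
```

The difference between `[[ ]]` (returns the element itself) and `[ ]` (returns a shorter list) is one of the things that makes lists a little tricky.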

Matrices

  • Also rectangular
  • All values in a matrix must be of the same type (e.g., all character or all numeric).

matrix()

?matrix
nine<-matrix(1:9, nrow=3) # by column by default
nine
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
nine<-matrix(1:9, nrow=3, byrow=TRUE)
nine
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

nine<-matrix(as.character(1:9), nrow=3, byrow=TRUE) # Characters are fine too
nine
##      [,1] [,2] [,3]
## [1,] "1"  "2"  "3" 
## [2,] "4"  "5"  "6" 
## [3,] "7"  "8"  "9"
nine<-matrix(1:9, nrow=3, byrow=TRUE) # Change back to numeric
nine
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

Trying to mix data types results in (typically) unwanted coercion.

nine[1,1]
## [1] 1
nine[1,1]<-"Ten" 
nine
##      [,1]  [,2] [,3]
## [1,] "Ten" "2"  "3" 
## [2,] "4"   "5"  "6" 
## [3,] "7"   "8"  "9"
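
The coercion seen here follows a fixed ordering in R: logical, then integer, then double, then character, with the most general type winning. A short base R sketch (the matrix `m` is a made-up example):

```r
class(c(TRUE, 2L))     # logical mixed with integer gives "integer"
class(c(2L, 3.5))      # integer mixed with double gives "numeric"
class(c(3.5, "Ten"))   # anything mixed with character gives "character"

# The same rule applies when assigning into a matrix.
m <- matrix(1:4, nrow = 2)
typeof(m)              # starts as "integer"
m[1, 1] <- 2.5         # assigning a double converts every element
typeof(m)              # now "double"
```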

As another example, if we coerce our Birds data frame to a matrix, R converts every column to character so that the result is a valid matrix.

class(Birds)<-"data.frame" # Restore the class we changed earlier
as.matrix(Birds)
##          Type         Num. Extinct
## Eagle    "BirdofPrey" " 3" "FALSE"
## Hawk     "BirdofPrey" "15" "FALSE"
## Bluebird "SongBird"   "42" "FALSE"
## Egret    "Shorebird"  " 8" "FALSE"
## Dodo     "Flightless" " 0" "TRUE" 
## Blue Jay "SongBird"   "29" "FALSE"
## Kiwi     "Flightless" "17" "FALSE"

Matrix dimension names

As with a data frame, you can name the rows and columns. You can also name the dimensions themselves, but the names must be passed as a named list through the dimnames argument.

nine<-matrix(1:9, nrow=3, byrow=TRUE, 
  dimnames=list(Foo=LETTERS[1:3], Bar=month.abb[1:3]))
nine
##    Bar
## Foo Jan Feb Mar
##   A   1   2   3
##   B   4   5   6
##   C   7   8   9
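
Dimension names are useful for more than display: they can be used for indexing. A brief sketch that rebuilds the same matrix and looks values up by name:

```r
nine <- matrix(1:9, nrow = 3, byrow = TRUE,
               dimnames = list(Foo = LETTERS[1:3], Bar = month.abb[1:3]))

nine["B", "Feb"]   # the value in row B, column Feb
nine[, "Mar"]      # a whole column, returned as a named vector
dimnames(nine)     # the named list that was passed in
```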

Arrays

  • Can be one, two, three or more dimensional.
  • A two-dimensional array is the same thing as a matrix

Create an array by passing a vector of values and a vector of dimensions to array().

arr1<-array(1:(3^3), dim=rep(3,3), 
  dimnames=list(Foo=LETTERS[1:3], 
                Bar=month.abb[1:3], 
                Stooge=c("Larry", "Moe", "Curly")))

arr1
## , , Stooge = Larry
## 
##    Bar
## Foo Jan Feb Mar
##   A   1   4   7
##   B   2   5   8
##   C   3   6   9
## 
## , , Stooge = Moe
## 
##    Bar
## Foo Jan Feb Mar
##   A  10  13  16
##   B  11  14  17
##   C  12  15  18
## 
## , , Stooge = Curly
## 
##    Bar
## Foo Jan Feb Mar
##   A  19  22  25
##   B  20  23  26
##   C  21  24  27
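
Indexing an array works like indexing a matrix, with one position per dimension. A small sketch that rebuilds arr1 and extracts a single value and a whole slice:

```r
arr1 <- array(1:(3^3), dim = rep(3, 3),
              dimnames = list(Foo = LETTERS[1:3],
                              Bar = month.abb[1:3],
                              Stooge = c("Larry", "Moe", "Curly")))

dim(arr1)                   # three dimensions of length 3 each
arr1["B", "Feb", "Moe"]     # one value: row B, column Feb, slice Moe
curly <- arr1[, , "Curly"]  # leaving two positions empty returns a full slice
is.matrix(curly)            # a two-dimensional slice is a matrix
```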

Simple Exercise

Make an array from the environmental heterogeneity values in the BCI.env dataset available from the vegan package.

library(vegan) # may need install.packages("vegan") first
data(BCI.env)
head(BCI.env$EnvHet)
## [1] 0.6272 0.3936 0.0000 0.0000 0.4608 0.0768
length(BCI.env$EnvHet)
## [1] 50

Simple Exercise: One Possible Solution

array(BCI.env$EnvHet, dim=c(2,5,5))
## , , 1
## 
##        [,1] [,2]   [,3]   [,4] [,5]
## [1,] 0.6272    0 0.4608 0.3808    0
## [2,] 0.3936    0 0.0768 0.2112    0
## 
## , , 2
## 
##        [,1]   [,2]   [,3]   [,4]   [,5]
## [1,] 0.4032 0.6624 0.0000 0.0000 0.0768
## [2,] 0.0000 0.1472 0.4608 0.6592 0.2112
## 
## , , 3
## 
##        [,1]   [,2]   [,3]   [,4]   [,5]
## [1,] 0.2688 0.6240 0.6080 0.0000 0.6528
## [2,] 0.2112 0.4352 0.3648 0.3328 0.6144
## 
## , , 4
## 
##        [,1]   [,2]   [,3]   [,4]   [,5]
## [1,] 0.4928 0.0768 0.3328 0.3648 0.0000
## [2,] 0.7264 0.0000 0.4032 0.0000 0.6208
## 
## , , 5
## 
##        [,1]   [,2]   [,3]   [,4]   [,5]
## [1,] 0.4032 0.0768 0.3424 0.3648 0.4992
## [2,] 0.1472 0.5568 0.1472 0.4608 0.6368
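
The dimensions 2 x 5 x 5 work because their product equals the 50 values available; other choices such as c(5, 10) would also fit. A sketch of that check, using the hypothetical stand-in vector `x` in place of BCI.env$EnvHet so it runs without the vegan package:

```r
x <- seq_len(50)   # stand-in for the 50 EnvHet values (assumption for illustration)

dims <- c(2, 5, 5)
prod(dims) == length(x)   # the dimensions use every value exactly once

a <- array(x, dim = dims)
a[2, 1, 1]   # values fill down the first dimension first, so this is the 2nd value
a[1, 2, 1]   # then across the second dimension: the 3rd value

# array() does not warn when the sizes disagree: extra values are dropped
# and missing ones are filled by recycling, so checking prod(dims) is worthwhile.
short <- array(x, dim = c(2, 5, 4))
length(short)
```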