Note: this is an unfinished lecture on the utility of matrices in ecology for a Practical Seminar in Quantitative Wildlife Ecology graduate level course at SUNY-ESF. A theme of the class is looking under the hood of various statistical operations and tools - and at times building our own engines. With that in mind, matrices come up over and over again in various applications, and this is an attempt to bring together several “magical” ways that matrices concretely solve problems, fit models, and perform other useful operations on data. This includes: linear regression, principal component analysis, spatial transformations, and population modeling … so really a lot of what we do!

Part I. Ordinary Least Squares in Matrix Notation

In matrix notation, ordinary least squares (OLS) estimation of simple linear regression, factorial analysis, and their many extensions all starts from the same general model:

\[ {\bf y} = {\bf X} \beta + {\bf \epsilon} \]

Here, \({\bf y}\) is a \(n \times 1\) vector of observations1, \(\beta\) is a \(k \times 1\) vector of regression coefficients (intercepts, group means, slopes, etc.), \(X\) is an \(n \times k\) “design matrix” for the model (more on this later), and \({\bf \epsilon}\) is a vector of random “errors”2.

To sink into \(X\), the design matrix, a bit, consider the familiar intercept / slope model (\({\bf y} = \alpha + \beta {\bf x} + {\bf \epsilon}\)). Here \(X\) will have TWO columns \((1, {\bf x})\): one for the intercept, the second for the covariate \({\bf x}\). An even SIMPLER case is to have only an intercept column, where every element is 1. This is equivalent to just estimating a mean and standard deviation. Walking through the examples below may provide some intuition for why a column of 1’s is an “intercept”.

And never forget that our goal is to estimate all the \(\beta\) values (and also the ever-forgotten \(\sigma\), the standard deviation of the \(\epsilon\)).

1. Solving for \(\widehat{\beta}\)

Just a bit of algebra & calculus here. We really want to minimize the sum of those squared residuals, \({\bf \epsilon}^T {\bf \epsilon}\), which we do by taking the derivative with respect to \(\beta\) and setting it to 0. So:

\[{\partial \, {\bf \epsilon}^T {\bf \epsilon} \over \partial\beta} = {\partial \left( {\bf y} - X \beta \right)^T \left( {\bf y} - X \beta \right) \over \partial \beta}\]

It seems crazy that this is even possible, but once you get used to it - matrix / vector calculus is pretty similar to scalar (number) calculus. In any case, you CAN take that derivative, you CAN set it equal to 0 - which gives the so-called normal equations \(\mathbf{X}^T \mathbf{X} \, \beta = \mathbf{X}^T {\bf y}\) - and you CAN solve for \(\beta\) (even though it’s a vector and X is a FREAKING MATRIX[!!]), and you get a solution that looks like this:

\[\hat{\beta} = \left(\mathbf{X}^T \mathbf{X}\right)^{-1} \mathbf{X}^T y\]

This is general, and computable (i.e. no need to numerically look for a solution). But it does need some picking apart and explaining. We’ll do some simple examples below.

2. Mean only model

If there are no covariates, the model is simply:
\[Y \sim {\cal N}(\beta_0, \sigma^2) \]

(Obviously, \(\beta_0\) = \(\mu\) = \(\alpha\), as the intercept is the mean)

The OLS estimate of \(\beta_0\) is, of course, just the sample mean: \[\widehat{\beta_0} = {1\over n} \sum_{i=1}^n Y_i\]

Let’s try to apply the more general solution: \[ \hat{\beta}=\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^Ty\]

The design matrix \(X\) here is just a column vector of 1’s. In the following example, assume a vector \(Y = (1,2,3)\), and \(X = \left( \begin{array}{c}1\\1\\1\end{array} \right)\). In R:

(Y <- 1:3)
## [1] 1 2 3
(X <- matrix(1, nrow=3))
##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1

2b. Some important matrix operations

Matrices are basically just a very convenient / powerful shorthand to do bookkeeping and collapse sums on multi-dimensional vectors. You can define an \(n \times k\) matrix \(A\) just by enumerating the row and column elements \(A_{ij}\), where \(i \in \{1,2,...,n\}\) and \(j \in \{1,2, ...,k\}\). The following matrix operations are needed to obtain the estimates.
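
Just as a quick refresher on that notation (a minimal sketch, nothing specific to our model), here is how a matrix is built and indexed by those row and column subscripts in R:

(A <- matrix(1:6, nrow = 2, ncol = 3))   # a 2 x 3 matrix, filled column by column
A[2, 3]                                  # the element A_ij with i = 2, j = 3
dim(A)                                   # the number of rows (n) and columns (k)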

The transpose of \(A_{ij}\) is: \(A^T_{ij} = A_{ji}\). Thus: \(A^T\) is a \(k\times n\) matrix, and (in our example) \(X^T = (1,1,1)\) (a \(1 \times 3\) matrix). In R, the function is just t():

t(X)
##      [,1] [,2] [,3]
## [1,]    1    1    1

Matrix multiplication: The product of an \(n \times m\) matrix \(A\), consisting of elements \(a_{ij}\), multiplied by an \(m \times p\) matrix \(B\), consisting of elements \(b_{ij}\), is an \(n \times p\) matrix, the elements of which are: \[(AB)_{ij} = \sum_{k=1}^m a_{ik}\,b_{kj}\]

So, \(X^T X\) is a \(1 \times 1\) matrix, the element of which is \(\sum_{k=1}^3 X^T_{1,k} X_{k,1} = 1+1+1 = 3\). In R, matrix multiplication is denoted %*% (matrix operations generally are wrapped in % delimiters). Thus:

t(X) %*% X
##      [,1]
## [1,]    3

Compare with:

X %*% t(X)
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
## [3,]    1    1    1

And with element-wise multiplication (*), which fails because the dimensions do not conform:

t(X) * X
## Error in t(X) * X: non-conformable arrays

The inverse of a matrix: \(A^{-1}\) is defined according to the following identity: \[A^{-1} A = A \times A^{-1} = I\] Where \(I\) is the identity matrix: a square matrix with 1’s on the diagonal and 0’s everywhere else. The inversion of a matrix is a somewhat slippery and subtle thing. It is also an important step in lots of statistical calculations, and somewhat computationally intensive. If a regression procedure is slow, the inversion of the matrix is almost always the computationally limiting step. For example, inverting a \(1000 \times 1000\) matrix takes a noticeable amount of time (several seconds on my machine):

system.time(solve(matrix(rnorm(1e6),nrow=1e3)))
##    user  system elapsed 
##   1.548   0.026   1.577
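
Just to see that defining identity in action (a quick sketch with an arbitrary invertible matrix):

A <- rbind(c(2, 1),
           c(1, 3))    # an arbitrary (invertible) 2 x 2 matrix
solve(A) %*% A         # returns the 2 x 2 identity matrix (up to floating point error)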

3. Estimating the mean

Ok, back to our case. The inverse of \(X^T X\) is the matrix which returns the identity matrix when multiplied by \(X^T X\). Since \(X^T X = 3\), its inverse is just \(1/3\). In R, the inverse is obtained with the solve() function:

solve(t(X)%*%X)
##           [,1]
## [1,] 0.3333333

We multiply this back by \(X^T\) to obtain the matrix \(H = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\), which converts \(Y\) to \(\widehat{\beta}\). (Strictly speaking, the “hat matrix” that converts \(Y\) to \(\widehat{Y}\) is \(X H\), but the shorthand is convenient here.) This product (of a \(1\times1\) and a \(1\times 3\) matrix) just carries the 1/3 across all the elements of \(X^T\):

solve(t(X) %*% X) %*% t(X)
##           [,1]      [,2]      [,3]
## [1,] 0.3333333 0.3333333 0.3333333

When we multiply this by \(Y\) (the last step), we are performing the sum that appears in the definition of the sample mean, with each observation weighted by the \(1/n\) term, thus: \[ \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \, Y = H Y = {1\over 3} y_1 + {1\over 3} y_2 + {1\over 3} y_3 = {1\over3}(1+2+3) = 2.\] In R:

(beta <- solve(t(X)%*%X)%*%t(X)%*%Y)
##      [,1]
## [1,]    2

I.e. \(\widehat{\beta_0} = \widehat{\mu} = \overline{Y} = 2\). Voilà!

4. Estimating the variance

Note that the estimate of the variance of the residuals, expressed generally, is:

\[ \widehat{\sigma^2} = {1 \over n-k} (Y - \widehat{Y})^T (Y - \widehat{Y})\]
In this case, \(k=1\), and the product of the two matrices is just the regular sum of squared residuals in the definition of the sample variance:

\[ \widehat{\sigma^2} = {1 \over n-1} \sum_{i=1}^n (Y_i - \widehat{Y})^2\]

Thus, in R:

H <- solve(t(X)%*%X)%*%t(X)
beta.hat <- H%*%Y
Y.hat <-X%*%beta.hat
(sigma.hat <- t(Y-Y.hat) %*% (Y - Y.hat) / (nrow(X) - ncol(X)))
##      [,1]
## [1,]    1

Which is, of course, equal to the sample variance:

var(Y)
## [1] 1

5. Adding a column

Obviously, the advantage of the matrix notation is its great generality. We can perform a linear regression of \(Y = 1,2,3\) against some covariate \(X = -1,0,1\) using the exact same code. We need only to set up the design matrix:

(X <- cbind(c(1,1,1), -1:1))
##      [,1] [,2]
## [1,]    1   -1
## [2,]    1    0
## [3,]    1    1
(H <- solve(t(X)%*%X)%*%t(X))
##            [,1]      [,2]      [,3]
## [1,]  0.3333333 0.3333333 0.3333333
## [2,] -0.5000000 0.0000000 0.5000000
(beta.hat <- H%*%Y)
##      [,1]
## [1,]    2
## [2,]    1
Y.hat <-X%*%beta.hat
(sigma.hat <- t(Y-Y.hat) %*% (Y - Y.hat) / (nrow(X) - ncol(X)))
##      [,1]
## [1,]    0

So the regression coefficients are \(\beta_0 = 2\) and \(\beta_1 = 1\), and \(\sigma^2 = 0\). Obviously, that’s correct, but if you need proof:

plot(X[,2],Y, cex=2, pch=19)
abline(t(beta.hat), col=2, lwd=2)

6. Real data example

Maybe you’re unimpressed because that’s such a fake example? Here’s an all-matrix regression analysis of the famous iris data set. The figure shows petal length as a response variable against sepal length and species, with a separate fit for each species.

require(ggplot2)
data(iris)
ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + geom_point() +  geom_smooth(method = "lm")

What are the regression coefficients? Here’s your response variable:

y <- iris$Petal.Length

Here’s your design matrix:

X <- with(iris, 
          cbind(Sepal.Length = Sepal.Length,
                setosa = Species == "setosa",
                versicolor = Species == "versicolor",
                virginica = Species == "virginica"))

Here’s how that matrix looks:

head(X)
##      Sepal.Length setosa versicolor virginica
## [1,]          5.1      1          0         0
## [2,]          4.9      1          0         0
## [3,]          4.7      1          0         0
## [4,]          4.6      1          0         0
## [5,]          5.0      1          0         0
## [6,]          5.4      1          0         0

Here are the regression coefficients:

(beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y)
##                    [,1]
## Sepal.Length  0.6321099
## setosa       -1.7023422
## versicolor    0.5077956
## virginica     1.3876599

Compare those to the output of lm:

(mylm <- lm(Petal.Length ~ Sepal.Length + Species-1, data = iris))
## 
## Call:
## lm(formula = Petal.Length ~ Sepal.Length + Species - 1, data = iris)
## 
## Coefficients:
##      Sepal.Length      Speciessetosa  Speciesversicolor   Speciesvirginica  
##            0.6321            -1.7023             0.5078             1.3877

Voilà!

Note that the matrix X is created by R’s formula interface (y ~ x), and can be returned with the very useful model.matrix() function. Thus:

model.matrix(mylm) |> head()
##   Sepal.Length Speciessetosa Speciesversicolor Speciesvirginica
## 1          5.1             1                 0                0
## 2          4.9             1                 0                0
## 3          4.7             1                 0                0
## 4          4.6             1                 0                0
## 5          5.0             1                 0                0
## 6          5.4             1                 0                0

What about the model variance? I.e. that hidden \(\widehat{\sigma^2}\) that everyone always forgets about?

y.hat <- X %*% beta.hat
(sigma.hat <- t(y-y.hat) %*% (y - y.hat) / (nrow(X) - ncol(X)))
##            [,1]
## [1,] 0.07984347

And that little variance is in the model output here:

summary(mylm)$sigma^2
## [1] 0.07984347

Huzzah!

Mini-challenge: How would you write up the design matrix for the interaction model, where there is a unique slope for each species?

Part II: Matrices as transformations and rotations

A square (\(k \times k\)) matrix distorts a vector (or matrix) of length (or first dimension) \(k\) by twisting and stretching.

Here’s a 2D vector:

x <- rbind(c(1,1)) # note ... this has to be a "matrix" for everything to work

plot(0,0, xlim = c(-1,1), asp = 1, ylim = c(-1,1))
plotarrow <- function(x, l = .1, ...) 
  arrows(0,0, x[1], x[2], length = l, ...)
plotarrow(x)

Here’s a matrix that rotates everything by some angle \(\theta\):

theta <- pi/4 # this is 45 degrees counter-clockwise in radian-speak
M.rotate <- rbind(c(cos(theta),-sin(theta)), 
                  c(sin(theta), cos(theta)))

Here’s the rotation of that first vector by the matrix. Note - there’s a bit of transposing that has to happen to keep the dimensions lined up!:

x2 <- t(M.rotate %*% t(x))
plot(0,0,xlim = c(-2,2), asp = 1, ylim = c(-2,2))
plotarrow(x)
plotarrow(x2, col = "red")

You can go round and round and round:

narrows <- 100
palette(gplots::rich.colors(narrows))
theta <- 2*pi/narrows # a small rotation: 1/narrows of a full circle, in radians
M.rotate <- rbind(c(cos(theta),-sin(theta)), 
                  c(sin(theta), cos(theta)))

x <-  rbind(c(1,0))
plot(0,0,xlim = c(-2,2), asp = 1, ylim = c(-2,2))
for(i in 1:narrows){
  x <- t(M.rotate %*% t(x))
  plotarrow(x, col = i)
}

Other matrices simply stretch (or squeeze) - perhaps by different amounts across different dimensions.

d.x <- 0.9; d.y <- 0.8
M.squeeze <- rbind(c(d.x,0),c(0,d.y))

Stretch/squeeze once:

x <- rbind(c(1,1))
x2 <- t(M.squeeze %*% t(x))
plot(0,0,xlim = c(-2,2), asp = 1, ylim = c(-2,2))
plotarrow(x)
plotarrow(x2, col = 2)

Let’s do this a bunch of times:

narrows <- 20
palette(gplots::rich.colors(narrows))
x <-  rbind(c(1,1))
plot(0,0,xlim = c(0,1), asp = 1, ylim = c(0,1))
for(i in 1:20){
  x <- t(M.squeeze %*% t(x))
  plotarrow(x, col = i)
}

Q. What happens if we start with the vector (1,0)? Or the vector (0,1)?

Nice! We can combine both of these transformations into a single Squeezing & Rotating matrix. To do this, you matrix multiply the matrices:

narrows <- 20
palette(gplots::rich.colors(narrows))
theta <- 2*pi/narrows # 1/narrows of a full circle, in radians
M.rotate <- rbind(c(cos(theta),-sin(theta)), c(sin(theta), cos(theta)))
M <- M.rotate %*% M.squeeze
x <-  rbind(c(1,1))
plot(0,0,xlim = c(-.2,1), asp = 1, ylim = c(-.2,1))
for(i in 1:20){
  x <- t(M %*% t(x))
  plotarrow(x, col = i)
}

We can make a super general rotate, squeeze matrix function:

getDistortMatrix <- function(theta, d.x, d.y){
  M.rotate <- rbind(c(cos(theta),-sin(theta)), c(sin(theta), cos(theta)))
  M.squeeze <- rbind(c(d.x, 0), c(0,d.y))
  M.rotate %*% M.squeeze
}

And we can apply it in surprising ways. For example, here are some coordinates of a country:

require(mapdata)
mexico <- map("world", "Mexico", plot = FALSE) 
mexico.xy <- cbind(mexico$x, mexico$y) |> apply(2, scale)
plot(mexico.xy, type = "l")

Rotating Mexico!

n.frames <- 60
M <- getDistortMatrix(2*pi/n.frames, 1, 1)
mexico.t <- mexico.xy

for(i in 1:n.frames){
  mexico.t <- t(M %*% t(mexico.t))
  plot(mexico.t, type = "l",
       ylim = c(-2,2), xlim = c(-2,2),
       asp = 1, lwd = 2, col = "antiquewhite")
}

This Mexico is rotating and shrinking!

n.frames <- 60
M <- getDistortMatrix(3*pi/n.frames, .95, .95)
mexico.t <- mexico.xy

for(i in 1:n.frames){
  mexico.t <- t(M %*% t(mexico.t))
  plot(mexico.t, type = "l", 
       ylim = c(-2,2), xlim = c(-2,2),
       asp = 1, lwd = 2, col = i)
}

This may seem silly - BUT - underneath all of your spatial transformations of georeferenced data is more or less completely straightforward matrix multiplication.
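
As a toy illustration (a sketch only - this is not how any particular projection library is implemented, and the angle and offset here are arbitrary), re-projecting a set of x-y coordinates by a rotation plus a shift is one matrix multiplication and one addition:

theta <- pi/6                              # an arbitrary rotation angle
A <- rbind(c(cos(theta), -sin(theta)),
           c(sin(theta),  cos(theta)))     # the rotation part of an affine transformation
b <- c(100, 200)                           # the shift (translation) part, also arbitrary
xy <- cbind(x = runif(5), y = runif(5))    # some made-up coordinates
t(A %*% t(xy) + b)                         # the whole "re-projection" in one line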

Eigenvalues / eigenvectors

This is the fundamental eigenvalue equation:

\[M {\bf v} = \lambda {\bf v}\]

In words, this means that for some (not all!) SQUARE matrices, there is a magic vector (the eigenvector) which - when the matrix is applied to it - is ONLY stretched or shrunk by a scalar factor \(\lambda\) (the eigenvalue), not twisted.

This is important! This is not the first time you’ve seen \(\lambda\)!

A few points:

  1. there are (technically) as many eigenvector - eigenvalue pairs as there are columns/rows in M. Usually we only care about the first few.
  2. The applications of “eigen-values” are surprising and many.

Our friend the “distortion/squeeze” matrix has real eigenvectors and eigenvalues:

M.squeeze <- rbind(c(.9,0),
                   c(0,.8))
M.squeeze
##      [,1] [,2]
## [1,]  0.9  0.0
## [2,]  0.0  0.8
eigen(M.squeeze)
## eigen() decomposition
## $values
## [1] 0.9 0.8
## 
## $vectors
##      [,1] [,2]
## [1,]   -1    0
## [2,]    0   -1

What does this mean? It means there are two eigenvector / eigenvalue pairs. One is: (-1, 0) - paired with the value 0.9, and the other is (0, -1) paired with the value 0.8.

Let’s look at the first eigenvector:

x <- rbind(c(-1,0)) # note ... this has to be a "matrix" for everything to work

plot(0,0, xlim = c(-1,1), asp = 1, ylim = c(-1,1))
plotarrow(x)

What happens when we apply the matrix to this vector?

M.squeeze %*% t(x)
##      [,1]
## [1,] -0.9
## [2,]  0.0

It shrank! By exactly the first eigenvalue \(\lambda = 0.9\)! Same for the second pair:

M.squeeze %*% cbind(c(0,-1))
##      [,1]
## [1,]  0.0
## [2,] -0.8

In practice, what this means is that any vector along either of those two axes will only ever be shrunk by the corresponding eigenvalue - along those directions, the “distortion” matrix acts as a simple stretching (or shrinking) factor. Why this is important will make sense - maybe - later.
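
For instance (a trivial check), a longer vector along that same first axis gets exactly the same treatment:

M.squeeze %*% cbind(c(5, 0))   # a longer vector along the first axis: still just scaled by 0.9, giving (4.5, 0)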

Part III: Population ecology

The most fundamental / intuitive application of matrices as transformations (for an ecologist with any intuition for population biology) is the Leslie matrix, which “transforms” an age-structured population from one time step to the next. For now, just check out these population ecology slides here:

https://eligurarie.github.io/EFB370/lectures/ModuleIII/PopulationStructure_PartII.html
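
Just to give a flavor here (a minimal sketch - all of the fecundity and survival values below are made up): projecting an age-structured population forward is nothing more than matrix multiplication, and the dominant eigenvalue of the Leslie matrix is the asymptotic growth rate \(\lambda\), with its eigenvector giving the stable age distribution.

L <- rbind(c(0.0, 1.5, 2.0),   # top row: fecundities of age classes 1, 2, 3 (made up)
           c(0.5, 0.0, 0.0),   # survival from age class 1 to age class 2 (made up)
           c(0.0, 0.8, 0.0))   # survival from age class 2 to age class 3 (made up)
n <- c(100, 50, 20)            # an initial age-structured population vector
for(i in 1:20) n <- L %*% n    # projecting 20 time steps forward: just matrix multiplication
Re(eigen(L)$values[1])         # dominant eigenvalue = asymptotic growth rate lambda
v <- Re(eigen(L)$vectors[,1])  # its eigenvector ...
v / sum(v)                     # ... rescaled to proportions: the stable age distribution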

Part IV: Principal Components

… are basically the eigenvalue - eigenvector combinations of variance-covariance matrices. What the heck does that mean!? As usual, examples are our friends.

So … back to our iris data. 3

require(scales)
cols <- c("orange","purple","green")
data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

To decompose things properly, it is important to scale the variables you are ordinating. See how I use apply to do that below:

X.iris <- as.matrix(iris[,1:4] |> apply(2, scale))
pairs(X.iris, col = alpha(cols[iris$Species], .4), 
      pch = 19, upper.panel = NULL)

We will do a principal component analysis by hand. The goal will be to “rotate” the data onto PC axes that do the best job of accounting for the correlation among covariates.

1. The correlation matrix

We now obtain the correlation matrix:

M.corr <- cor(X.iris)
fields::image.plot(x = 1:4, y = 1:4, M.corr, yaxt = "n", ylab = "", 
           asp = 1, main = "correlation matrix of iris data")
mtext(side = 2, at = 1:4, colnames(X.iris), las = 1)

You can see that the correlation between petal length and petal width is very high, that both are strongly correlated with sepal length, and that sepal width is negatively correlated with the rest.

2. Compute eigenvalues and eigenvectors

Here’s the decomposition:

eigen(M.corr)
## eigen() decomposition
## $values
## [1] 2.91849782 0.91403047 0.14675688 0.02071484
## 
## $vectors
##            [,1]        [,2]       [,3]       [,4]
## [1,]  0.5210659 -0.37741762  0.7195664  0.2612863
## [2,] -0.2693474 -0.92329566 -0.2443818 -0.1235096
## [3,]  0.5804131 -0.02449161 -0.1421264 -0.8014492
## [4,]  0.5648565 -0.06694199 -0.6342727  0.5235971

Just to confirm that this is “true” according to the eigenvalue equation above:

(lambda1 <- eigen(M.corr)$value[1])
## [1] 2.918498
(v1 <- eigen(M.corr)$vector[,1])
## [1]  0.5210659 -0.2693474  0.5804131  0.5648565
lambda1 * v1
## [1]  1.5207297 -0.7860899  1.6939344  1.6485326
M.corr %*% v1
##                    [,1]
## Sepal.Length  1.5207297
## Sepal.Width  -0.7860899
## Petal.Length  1.6939344
## Petal.Width   1.6485326

Perfect. So there is a linear combination of these variables that, when the correlation matrix is “applied” to it, is simply expanded by a pretty large factor (\(\times\) 2.9).

Maybe that’s not too intuitive, but in practice it means that if we combine those four variables according to the ratios described in that eigenvector, we will have stretched the data out maximally along a core axis of those data. Or, more specifically, we’ve found the “axis” along which the variance of the observations is maximized.

3. Compute all four principal components

Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables, and those mixtures are given by the eigenvectors.

So, let’s take the first eigenvector:

(eigen1 <- eigen(M.corr)$vector[,1])
## [1]  0.5210659 -0.2693474  0.5804131  0.5648565

If we multiply these weightings by all the covariates and add them up (which is exactly what matrix multiplication does), we “transform” the data into an orthogonalized version of the data:

PC1 <- X.iris %*% eigen1   # eigen1 is treated as a column vector here

PC1 is a vector with one value per observation, and it has the largest possible variance of any unit-length linear combination of those (scaled) variables!

Check it:

sd(PC1)
## [1] 1.708361

And the standard deviation of each of our (scaled) variables is just 1.
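
To convince yourself of that claim about PC1, compare it against an arbitrary unit-length weighting of the same four variables (a quick sketch; the seed and the resulting weights mean nothing in particular):

set.seed(2)                # arbitrary seed, just for reproducibility
w <- rnorm(4)
w <- w / sqrt(sum(w^2))    # an arbitrary unit-length weighting of the four variables
sd(X.iris %*% w)           # smaller than sd(PC1), whichever w you happen to draw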

We can get all 4 components by doing all the multiplication and addition all at once with …. Matrix Multiplication!

PC <- X.iris %*% eigen(M.corr)$vector
colnames(PC) <- paste0("PC", 1:4)

Compare this pair of pairs plots:

pairs(PC, upper.panel = NULL, asp = 1, pch = 19, 
      col = alpha(cols[iris$Species], .5))

pairs(X.iris, upper.panel = NULL, pch = 19, 
       col = alpha(cols[iris$Species], .5))

The correlation among the principal components is now - basically - zero:

cor(PC) |> round(10)
##     PC1 PC2 PC3 PC4
## PC1   1   0   0   0
## PC2   0   1   0   0
## PC3   0   0   1   0
## PC4   0   0   0   1

4. Homemade ordination plot

When we plot these (& color-code by species), we see how well they separate:

plot(PC[,1], PC[,2], col = alpha(cols[iris$Species], .3), 
     pch = 19, asp = 1)

You can see how that 1st component has radically separated the data along a single axis (even though the analysis was naive to species).

We can also see how each of the measured traits (singly) maps on to these new principal component axes … that’s simply their respective weights. Thus:

plot(PC[,1], PC[,2], col = alpha(cols[iris$Species], .3), 
     pch = 19, asp = 1)
weights1 <- eigen(M.corr)$vector[,1]
weights2 <- eigen(M.corr)$vector[,2]

arrows(rep(0,4),rep(0,4),weights1, weights2, col = "red",length = 0.05)
text(weights1, weights2, colnames(X.iris), cex = 0.7, col = "darkred", font = 3)

Petal length and petal width do something very, very similar - they are so correlated it is hardly worth using both for any reason. Sepal length is similar, but loads a bit more heavily on the 2nd component, and sepal width is totally its own thing.
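
If you want the single number behind that claim, it is right there in the raw data:

cor(iris$Petal.Length, iris$Petal.Width)   # very nearly 1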


5. Compare to a “PCA”

There are lots of options for performing a principal components analysis in R. Just in base R there are two(!): princomp and prcomp. They are confusing, but trust me - they both just do things that eigen also does. Thus:

prcomp(X.iris)
## Standard deviations (1, .., p=4):
## [1] 1.7083611 0.9560494 0.3830886 0.1439265
## 
## Rotation (n x k) = (4 x 4):
##                     PC1         PC2        PC3        PC4
## Sepal.Length  0.5210659 -0.37741762  0.7195664  0.2612863
## Sepal.Width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
## Petal.Length  0.5804131 -0.02449161 -0.1421264 -0.8014492
## Petal.Width   0.5648565 -0.06694199 -0.6342727  0.5235971

This output is the standard deviations of the four components, and their actual weightings. We can compute those standard deviations directly from our principal component matrix above:

apply(PC, 2, sd)
##       PC1       PC2       PC3       PC4 
## 1.7083611 0.9560494 0.3830886 0.1439265

So they agree. Plotting these ordinations I find tricky to control (an argument for computing your own PCA’s). But this kind of works:

row.names(X.iris) <- substring(iris$Species, 1, 2)
biplot(princomp(X.iris))

Note that this ordination is flipped sign-wise (in the second component) relative to the one we did - that’s completely arbitrary.

Recall - again - that the standard deviation of the standardized raw covariates was always 1. So the first two principal components really absorbed pretty much ALL of the variation in the data.
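
A quick way to quantify that: the eigenvalues sum to the total variance of the scaled data (here 4), so each eigenvalue’s share of that sum is the proportion of variance its component absorbs:

lambda <- eigen(M.corr)$values
round(lambda / sum(lambda), 3)            # proportion of the total variance per component
round(cumsum(lambda) / sum(lambda), 3)    # cumulative: the first two account for about 96%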

See how the ordiplot (which I find a bit awkward to use) compares to our ordination plot above:

vegan::ordiplot(prcomp(X.iris), type = "n") |> 
  points("sites", col = alpha(cols[iris$Species], .3), pch = 19) |> 
  text("species", arrows = TRUE, col = "red")


  1. bold-faced means specifically \(n\times1\) vector, capitals refer to matrices↩︎

  2. which are, of course, not usually or even primarily errors, but more often unexplained variation.↩︎

  3. Note that while we usually use ordination in ecology to understand abundances or counts of many, many species (columns) across sites (rows), in this example, we will be understanding how the measurements (sepal width, sepal length, petal width, petal length) are distributed across individuals. That’s not fundamentally important, but it’s confusing since so many ordinations assume the columns represent species.↩︎