Note: this is an unfinished lecture on the utility of matrices in ecology for a Practical Seminar in Quantitative Wildlife Ecology graduate level course at SUNY-ESF. A theme of the class is looking under the hood of various statistical operations and tools - and at times building our own engines. With that in mind, matrices come up over and over again in various applications, and this is an attempt to bring together several “magical” ways that matrices concretely solve problems, fit models, and perform other useful operations on data. This includes: linear regression, principal component analysis, spatial transformations, and population modeling … so really a lot of what we do!
In matrix notation, the ordinary least squares (OLS) estimation of simple linear regression, factorial analysis, and other linear models is a straightforward generalization of:
\[ {\bf y} = {\bf X} {\bf \beta} + {\bf \epsilon} \]
Here, \({\bf y}\) is an \(n \times 1\) vector of observations[1], \({\bf \beta}\) is a \(k \times 1\) vector of regression coefficients (intercepts, group means, slopes, etc.), \({\bf X}\) is an \(n \times k\) “design matrix” for the model (more on this later), and \({\bf \epsilon}\) is a vector of random “errors”[2].
To dig into \(X\), the design matrix, a bit, consider the familiar intercept / slope model (\({\bf y} = \alpha + \beta {\bf x} + {\bf \epsilon}\)). Here \(X\) will have TWO columns \((1, {\bf x})\): one for the intercept, the second for the covariate \(x\). An even SIMPLER case is to have only an intercept column, where every element is 1. This is equivalent to just estimating a mean and standard deviation. Walking through the examples below may provide some intuition for why a column of 1’s is an “intercept”.
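For a quick preview of what those design matrices look like (a little illustration of my own, using R’s model.matrix(), which we return to later):
x <- c(2, 5, 9)
model.matrix(~ x)                  # intercept + slope design: a column of 1's and the column x
model.matrix(~ 1, data.frame(x))   # intercept-only design: just the column of 1's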
And never forget that our goal is to estimate all the \(\beta\) values (and also the ever-forgotten \(\sigma\) standard deviation of the \(\epsilon\)).
Just a bit of algebra & calculus here. We really want to minimize the sum of squared residuals \({\bf \epsilon}^T {\bf \epsilon}\), which we do by taking the derivative with respect to \(\beta\) and setting it to 0. So:
\[{\partial \, {\bf \epsilon}^T {\bf \epsilon} \over \partial\beta} = {\partial \left( {\bf y} - X \beta \right)^T \left( {\bf y} - X \beta \right) \over \partial \beta} = 0\]
It seems crazy that this is even possible, but once you get used to it - matrix / vector calculus is pretty similar to scalar (number) calculus. In any case, you CAN take that derivative, and you CAN set it equal to 0, and you CAN solve for \(\beta\) (even though it’s a vector and X is a FREAKING MATRIX[!!]).
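Setting that derivative to 0 and rearranging gives the so-called normal equations, \[X^T X \, \hat{\beta} = X^T {\bf y},\] and pre-multiplying both sides by \(\left(X^T X\right)^{-1}\) gives the closed-form solution: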
\[\hat{\beta} = \left(\mathbf{X}^T \mathbf{X}\right)^{-1} \mathbf{X}^T y\]
This is general, and computable (i.e. no need to numerically look for a solution). But it does need some picking apart and explaining. We’ll do some simple examples below.
If there are no covariates, the model is simply:
\[Y \sim {\cal N}(\beta_0, \sigma^2)\]
(Obviously, \(\beta_0\) = \(\mu\) = \(\alpha\), as the intercept is the mean)
The OLS estimate of \(\beta_0\) is just the familiar sample mean: \[\widehat{\beta_0} = {1\over n} \sum_{i=1}^n Y_i\]
Let’s try to apply the more general solution: \[ \hat{\beta}=\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^Ty\]
The design matrix \(X\) here is just a column vector of 1’s. In the following example, assume a vector \(Y = (1,2,3)\), and \(X = \left( \begin{array}{c}1\\1\\1\end{array} \right)\). In R:
(Y <- 1:3)
## [1] 1 2 3
(X <- matrix(1, nrow=3))
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
Matrices are basically just a very convenient / powerful shorthand to do bookkeeping and collapse sums on multi-dimensional vectors. You can define an \(n \times k\) matrix \(A\) just by enumerating the row and column elements \(A_{ij}\), where \(i \in \{1,2,\ldots,n\}\) and \(j \in \{1,2,\ldots,k\}\). The following matrix operations are needed to obtain the estimates.
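But first, a quick illustration (my own) of how such a matrix is laid out in R - note that R fills matrices column-wise by default:
(A <- matrix(1:6, nrow = 2, ncol = 3))   # a 2 x 3 matrix
A[1, 3]                                  # the element in row 1, column 3
dim(A)                                   # n = 2 rows, k = 3 columns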
The transpose of \(A\) is defined by: \(A^T_{ij} = A_{ji}\). Thus \(A^T\) is a \(k\times n\) matrix, and (in our example) \(X^T = (1,1,1)\) (a \(1 \times 3\) matrix). In R, the function is just t():
t(X)
## [,1] [,2] [,3]
## [1,] 1 1 1
Matrix multiplication: The product of an \(n \times m\) matrix \(A\), consisting of elements \(a_{ij}\), multiplied by an \(m \times p\) matrix \(B\), consisting of elements \(b_{ij}\), is an \(n \times p\) matrix, the elements of which are: \[(AB)_{ij} = \sum_{k=1}^m a_{ik}\,b_{kj}\]
So, \(X^T X\) is a \(1 \times 1\) matrix, the single element of which is \(\sum_{k=1}^3 X^T_{1,k} X_{k,1} = 1+1+1 = 3\). In R, matrix multiplication is denoted with the %*% operator (other matrix operations are similarly wrapped in % signs). Thus:
t(X) %*% X
## [,1]
## [1,] 3
Compare with:
X %*% t(X)
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
## [3,] 1 1 1
And with the element-wise product *, which fails because the two arrays have different shapes:
t(X) * X
## Error in t(X) * X: non-conformable arrays
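To drive home the dimension bookkeeping, here is a quick non-square illustration of my own:
A <- matrix(1:6, nrow = 2)   # a 2 x 3 matrix
B <- matrix(1:6, nrow = 3)   # a 3 x 2 matrix
A %*% B                      # a 2 x 2 matrix
B %*% A                      # a 3 x 3 matrix - order matters!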
The inverse of a matrix: \(A^{-1}\) is defined (for square, non-singular matrices) according to the following identity: \[A^{-1} A = A \, A^{-1} = I\] where \(I\) is the identity matrix, i.e. a matrix with 1’s on the diagonal and 0’s everywhere else. The inversion of a matrix is a somewhat slippery and subtle thing. It is also an important step in lots of statistical calculations, and somewhat computationally intensive. If a regression procedure is slow, the inversion of a matrix is almost always the computationally limiting step. For example, inverting a \(1000 \times 1000\) matrix takes a noticeable amount of time (about a second and a half on my machine):
system.time(solve(matrix(rnorm(1e6),nrow=1e3)))
## user system elapsed
## 1.548 0.026 1.577
Ok, back to our case. The inverse of \(X^T X\) is the matrix which returns the identity matrix when multiplied by \(X^T X\). Since \(X^T X = 3\), its inverse is just \(1/3\). In R, the inverse is obtained with the solve() function:
solve(t(X)%*%X)
## [,1]
## [1,] 0.3333333
We multiply this back by \(X^T\) to obtain the matrix \(H = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\), which converts \(Y\) directly to \(\widehat{\beta}\) (it is closely related to the “hat matrix” \(X(X^TX)^{-1}X^T\), which converts \(Y\) to \(\widehat{Y}\)). This product (of a \(1\times1\) and a \(1\times 3\) matrix) just carries the 1/3 across all the elements of \(X^T\):
solve(t(X) %*% X) %*% t(X)
## [,1] [,2] [,3]
## [1,] 0.3333333 0.3333333 0.3333333
When we multiply this by \(Y\) (the last step), we are performing the sum that is part of the definition of the sample mean, with each term weighted by \(1/n\), thus: \[ \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \, Y = H Y = {1\over 3} y_1 + {1\over 3} y_2 + {1\over 3} y_3 = {1\over3}(1+2+3) = 2.\] In R:
(beta <- solve(t(X)%*%X)%*%t(X)%*%Y)
## [,1]
## [1,] 2
I.e. \(\widehat{\beta_0} = \widehat{\mu} = \overline{Y} = 2\). Voilà!
Note that the estimate of the variance of the residuals, expressed generally, is:
\[ \widehat{\sigma^2} = {1 \over n-k} (Y - \widehat{Y})^T (Y - \widehat{Y})\]
In this case, \(k=1\), and the product of the two matrices is just the regular sum of squared residuals in the definition of the sample variance:
\[ \widehat{\sigma^2} = {1 \over n-1} \sum_{i=1}^n (Y_i - \widehat{Y})^2\]
Thus, in R:
H <- solve(t(X)%*%X)%*%t(X)
beta.hat <- H%*%Y
Y.hat <-X%*%beta.hat
(sigma.hat <- t(Y-Y.hat) %*% (Y - Y.hat) / (nrow(X) - ncol(X)))
## [,1]
## [1,] 1
Which is, of course, equal to the sample variance:
var(Y)
## [1] 1
Obviously, the advantage of the matrix notation is its great generality. We can perform a linear regression of \(Y = 1,2,3\) against some covariate \(X = -1,0,1\) using the exact same code. We need only to set up the design matrix:
(X <- cbind(c(1,1,1), -1:1))
## [,1] [,2]
## [1,] 1 -1
## [2,] 1 0
## [3,] 1 1
(H <- solve(t(X)%*%X)%*%t(X))
## [,1] [,2] [,3]
## [1,] 0.3333333 0.3333333 0.3333333
## [2,] -0.5000000 0.0000000 0.5000000
(beta.hat <- H%*%Y)
## [,1]
## [1,] 2
## [2,] 1
Y.hat <-X%*%beta.hat
(sigma.hat <- t(Y-Y.hat) %*% (Y - Y.hat) / (nrow(X) - ncol(X)))
## [,1]
## [1,] 0
So the regression coefficients are \(\beta_0 = 2\) and \(\beta_1 = 1\), and \(\sigma^2 = 0\). Obviously, that’s correct, but if you need proof:
plot(X[,2],Y, cex=2, pch=19)
abline(t(beta.hat), col=2, lwd=2)
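And, as a quick sanity check (my own addition), the same estimates fall out of lm():
lm(Y ~ X[, 2])   # intercept = 2, slope = 1, matching beta.hat above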
Maybe you’re unimpressed because that’s such a fake example? Here’s an all-matrix regression analysis of the famous iris data set. The figure shows petal length as a response variable against sepal length and species, with a separate fitted line for each species.
require(ggplot2)
data(iris)
ggplot(iris, aes(Sepal.Length, Petal.Length, col = Species)) + geom_point() + geom_smooth(method = "lm")
What are the regression coefficients? Here’s your response:
y <- iris$Petal.Length
And here’s your design matrix:
X <- with(iris,
          cbind(Sepal.Length = Sepal.Length,
                setosa = Species == "setosa",
                versicolor = Species == "versicolor",
                virginica = Species == "virginica"))
Here’s how that matrix looks:
head(X)
## Sepal.Length setosa versicolor virginica
## [1,] 5.1 1 0 0
## [2,] 4.9 1 0 0
## [3,] 4.7 1 0 0
## [4,] 4.6 1 0 0
## [5,] 5.0 1 0 0
## [6,] 5.4 1 0 0
Here are the regression coefficients:
(beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y)
## [,1]
## Sepal.Length 0.6321099
## setosa -1.7023422
## versicolor 0.5077956
## virginica 1.3876599
Compare those to the output of lm():
(mylm <- lm(Petal.Length ~ Sepal.Length + Species-1, data = iris))
##
## Call:
## lm(formula = Petal.Length ~ Sepal.Length + Species - 1, data = iris)
##
## Coefficients:
## Sepal.Length Speciessetosa Speciesversicolor Speciesvirginica
## 0.6321 -1.7023 0.5078 1.3877
Voilà!
Note that the matrix X is created by the formula syntax in R (y ~ x), and can be returned with the very useful model.matrix() function. Thus:
model.matrix(mylm) |> head()
## Sepal.Length Speciessetosa Speciesversicolor Speciesvirginica
## 1 5.1 1 0 0
## 2 4.9 1 0 0
## 3 4.7 1 0 0
## 4 4.6 1 0 0
## 5 5.0 1 0 0
## 6 5.4 1 0 0
What about the model variance? I.e. that hidden \(\widehat{\sigma^2}\) that everyone always forgets about?
y.hat <- X %*% beta.hat
(sigma.hat <- t(y-y.hat) %*% (y - y.hat) / (nrow(X) - ncol(X)))
## [,1]
## [1,] 0.07984347
And that little variance is in the model output here:
summary(mylm)$sigma^2
## [1] 0.07984347
Huzzah!
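In fact, we could wrap the whole recipe into a little function (a sketch of my own - in practice lm() is more robust, e.g. it uses a QR decomposition rather than explicitly inverting \(X^T X\)):
ols <- function(X, y){
  beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y     # the OLS formula
  y.hat <- X %*% beta.hat                           # fitted values
  sigma2.hat <- sum((y - y.hat)^2) / (nrow(X) - ncol(X))
  list(coefficients = beta.hat, sigma2 = sigma2.hat)
}
ols(X, y)   # should reproduce beta.hat and sigma.hat from above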
Mini-challenge: How would you write up the design matrix for the interaction model, where there is a unique slope for each species?
A square (\(k \times k\)) matrix distorts a vector of length \(k\) (or a matrix whose first dimension is \(k\)) by twisting and stretching it.
Here’s a 2D vector:
x <- rbind(c(1,1)) # note ... this has to be a "matrix" for everything to work
plot(0,0, xlim = c(-1,1), asp = 1, ylim = c(-1,1))
plotarrow <- function(x, l = .1, ...)
arrows(0,0, x[1], x[2], length = l, ...)
plotarrow(x)
Here’s a matrix that rotates everything by some angle \(\theta\):
theta <- pi/4 # this is 45 degrees counter-clockwise in radian-speak
M.rotate <- rbind(c(cos(theta),-sin(theta)),
c(sin(theta), cos(theta)))
Here’s that rotation matrix applied to the first vector. Note - there’s a lot of transposing that has to happen, since the rotation matrix acts on column vectors!
x2 <- t(M.rotate %*% t(x))
plot(0,0,xlim = c(-2,2), asp = 1, ylim = c(-2,2))
plotarrow(x)
plotarrow(x2, col = "red")
You can go round and round and round:
narrows <- 100
palette(gplots::rich.colors(narrows))
theta <- 2*pi/narrows # one full circle divided into narrows small steps
M.rotate <- rbind(c(cos(theta),-sin(theta)),
c(sin(theta), cos(theta)))
x <- rbind(c(1,0))
plot(0,0,xlim = c(-2,2), asp = 1, ylim = c(-2,2))
for(i in 1:narrows){
x <- t(M.rotate %*% t(x))
plotarrow(x, col = i)
}
Other matrices simply stretch. Perhaps in different ways across different dimensions.
d.x <- 0.9; d.y <- 0.8
M.squeeze <- rbind(c(d.x,0),c(0,d.y))
Stretch/squeeze once:
x <- rbind(c(1,1))
x2 <- t(M.squeeze %*% t(x))
plot(0,0,xlim = c(-2,2), asp = 1, ylim = c(-2,2))
plotarrow(x)
plotarrow(x2, col = 2)
Let’s do this a bunch of times:
narrows <- 20
palette(gplots::rich.colors(narrows))
x <- rbind(c(1,1))
plot(0,0,xlim = c(0,1), asp = 1, ylim = c(0,1))
for(i in 1:20){
x <- t(M.squeeze %*% t(x))
plotarrow(x, col = i)
}
Q. What happens if we start with the vector (1,0)? Or the vector (0,1)?
Nice! We can combine both of these transformations into a single squeezing & rotating matrix. To do this, you matrix multiply the two matrices:
narrows <- 20
palette(gplots::rich.colors(narrows))
theta <- 2*pi/narrows # one full circle divided into narrows steps
M.rotate <- rbind(c(cos(theta),-sin(theta)), c(sin(theta), cos(theta)))
M <- M.rotate %*% M.squeeze
x <- rbind(c(1,1))
plot(0,0,xlim = c(-.2,1), asp = 1, ylim = c(-.2,1))
for(i in 1:20){
x <- t(M %*% t(x))
plotarrow(x, col = i)
}
We can make a super general rotate, squeeze matrix function:
getDistortMatrix <- function(theta, d.x, d.y){
M.rotate <- rbind(c(cos(theta),-sin(theta)), c(sin(theta), cos(theta)))
M.squeeze <- rbind(c(d.x, 0), c(0,d.y))
M.rotate %*% M.squeeze
}
And we can apply it in surprising ways. For example, here are some coordinates of a country:
require(mapdata)
mexico <- map("world", "Mexico", plot = FALSE)
mexico.xy <- cbind(mexico$x, mexico$y) |> apply(2, scale)
plot(mexico.xy, type = "l")
n.frames <- 60
M <- getDistortMatrix(2*pi/n.frames, 1, 1)
mexico.t <- mexico.xy
for(i in 1:n.frames){
mexico.t <- t(M %*% t(mexico.t))
plot(mexico.t, type = "l",
ylim = c(-2,2), xlim = c(-2,2),
asp = 1, lwd = 2, col = "antiquewhite")
}
n.frames <- 60
M <- getDistortMatrix(3*pi/n.frames, .95, .95)
mexico.t <- mexico.xy
for(i in 1:n.frames){
mexico.t <- t(M %*% t(mexico.t))
plot(mexico.t, type = "l",
ylim = c(-2,2), xlim = c(-2,2),
asp = 1, lwd = 2, col = i)
}
This may seem silly - BUT - underneath all of your spatial transformations of georeferenced data is more or less completely straightforward matrix multiplication.
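For instance, the affine transformation that underlies a typical raster’s georeferencing is nothing but a 2 x 2 matrix multiplication plus an offset. Here’s a sketch of my own, where the rotation, “pixel size”, and origin are all made up:
theta <- 10 * pi/180                          # a made-up 10 degree rotation
A <- 30 * rbind(c(cos(theta), -sin(theta)),   # 30 = a made-up pixel size
                c(sin(theta),  cos(theta)))
offset <- c(500000, 4900000)                  # a made-up map origin

xy <- rbind(c(0,0), c(1,0), c(1,1), c(0,1))   # corners of a unit square ("pixel space")
t(A %*% t(xy) + offset)                       # the same corners in "map space"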
This is the fundamental eigenvalue equation:
\[M {\bf v} = \lambda {\bf v}\]
In words, this means that for some (not all!) SQUARE matrices, there is a magic vector (the eigenvector) which - when the matrix is applied to it - is ONLY stretched or shrunk by a scalar factor (the eigenvalue \(\lambda\)), not twisted.
This is important! This is not the first time you’ve seen \(\lambda\)!
A few points:
Our friend the “distortion/squeeze” matrix has a real eigenvector and eigenvalue:
M.squeeze <- rbind(c(.9,0),
c(0,.8))
M.squeeze
## [,1] [,2]
## [1,] 0.9 0.0
## [2,] 0.0 0.8
eigen(M.squeeze)
## eigen() decomposition
## $values
## [1] 0.9 0.8
##
## $vectors
## [,1] [,2]
## [1,] -1 0
## [2,] 0 -1
What does this mean? It means there are two eigenvector / eigenvalue pairs. One is: (-1, 0) - paired with the value 0.9, and the other is (0, -1) paired with the value 0.8.
Let’s look at the first eigenvector:
x <- rbind(c(-1,0)) # note ... this has to be a "matrix" for everything to work
plot(0, 0, xlim = c(-1,1), asp = 1, ylim = c(-1,1), type = "n")
plotarrow(x)
What happens when we apply the matrix to this vector?
M.squeeze %*% t(x)
## [,1]
## [1,] -0.9
## [2,] 0.0
It shrank! By exactly the first eigenvalue \(\lambda = 0.9\)! Same for the second pair:
M.squeeze %*% cbind(c(0,-1))
## [,1]
## [1,] 0.0
## [2,] -0.8
In practice what this means is that any vector along either of the two axes will always just be shrunk by these two respective numbers - for those vectors, the “distortion” matrix becomes just a stretching factor. Why this is important will make sense - maybe - later.
The most fundamental/intuitive interpretation (for an ecologist with any intuition for population biology) is with respect to Leslie matrices, and how they “transform” an age-structured population. For now, just check out these population ecology slides here:
https://eligurarie.github.io/EFB370/lectures/ModuleIII/PopulationStructure_PartII.html
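As a tiny teaser (a sketch of my own, with completely made-up vital rates): the dominant eigenvalue of a Leslie matrix is the asymptotic population growth rate \(\lambda\), and its eigenvector gives the stable age distribution.
# a hypothetical 3-age-class Leslie matrix: top row = fecundities, sub-diagonal = survivals
L <- rbind(c(0.0, 1.5, 2.0),
           c(0.5, 0.0, 0.0),
           c(0.0, 0.3, 0.0))
e <- eigen(L)
(lambda <- Re(e$values[1]))                   # asymptotic growth rate
Re(e$vectors[,1]) / sum(Re(e$vectors[,1]))    # stable age distribution

# check: project an arbitrary population forward many steps -
# the annual growth factor converges to lambda
n <- c(100, 50, 10)
for(i in 1:50) n <- L %*% n
sum(L %*% n) / sum(n)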
Principal components are basically the eigenvalue - eigenvector combinations of variance-covariance matrices. What the heck does that mean!? As usual, examples are our friends.
So … back to our iris data.[3]
require(scales)
cols <- c("orange","purple","green")
data(iris)
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
To decompose things properly, it is important to scale the variables you are ordinating. See how I use apply to do that below:
X.iris <- as.matrix(iris[,1:4] |> apply(2, scale))
pairs(X.iris, col = alpha(cols[iris$Species], .4),
pch = 19, upper.panel = NULL)
We will do a principal component analysis by hand. The goal will be to “rotate” the data onto PC axes that do the best job of accounting for the correlation among covariates.
We now obtain the correlation matrix:
M.corr <- cor(X.iris)
fields::image.plot(x = 1:4, y = 1:4, M.corr, yaxt = "n", ylab = "",
asp = 1, main = "correlation matrix of iris data")
mtext(side = 2, at = 1:4, colnames(X.iris), las = 1)
You can see that the correlation is very high for petal length & width, pretty high for sepal length, but negative for sepal width.
Here’s the decomposition:
eigen(M.corr)
## eigen() decomposition
## $values
## [1] 2.91849782 0.91403047 0.14675688 0.02071484
##
## $vectors
## [,1] [,2] [,3] [,4]
## [1,] 0.5210659 -0.37741762 0.7195664 0.2612863
## [2,] -0.2693474 -0.92329566 -0.2443818 -0.1235096
## [3,] 0.5804131 -0.02449161 -0.1421264 -0.8014492
## [4,] 0.5648565 -0.06694199 -0.6342727 0.5235971
Just to confirm that this is “true” according to the eigenvalue theorem:
(lambda1 <- eigen(M.corr)$value[1])
## [1] 2.918498
(v1 <- eigen(M.corr)$vector[,1])
## [1] 0.5210659 -0.2693474 0.5804131 0.5648565
lambda1 * v1
## [1] 1.5207297 -0.7860899 1.6939344 1.6485326
M.corr %*% v1
## [,1]
## Sepal.Length 1.5207297
## Sepal.Width -0.7860899
## Petal.Length 1.6939344
## Petal.Width 1.6485326
Perfect. So there is a linear combination of these variables which, when the correlation matrix is “applied” to it, is simply stretched by a pretty large factor (x 2.9).
Maybe that’s not too intuitive, but in practice it means that if we combine those four variables according to the ratios described in that eigenvector, we will have stretched the data out maximally along a core axis of those data. Or, more specifically, we have found the “axis” along which the variance of the observations is maximized.
Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables, and those mixtures are in the eigenvector.
So, let’s take the first eigenvector:
(eigen1 <- eigen(M.corr)$vector[,1])
## [1] 0.5210659 -0.2693474 0.5804131 0.5648565
If we multiply these weightings by all the covariates and add them up (exactly - matrix multiplication) we “transform” the data into an orthogonalized version of the data:
PC1 <- X.iris %*% t(t(eigen1))
PC1 is a vector of the same length as the data, but it has the largest possible variance of any linear combination of those vectors!
Check it:
sd(PC1)
## [1] 1.708361
And the standard deviation of all our (scaled) variables is just 1.
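A quick check of that claim (my own addition):
apply(X.iris, 2, sd)   # each scaled column has standard deviation exactly 1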
We can get all 4 components by doing all the multiplication and addition at once with … Matrix Multiplication!
PC <- X.iris %*% eigen(M.corr)$vector
colnames(PC) <- paste0("PC", 1:4)
Compare this pair of pairs plots:
pairs(PC, upper.panel = NULL, asp = 1, pch = 19,
col = alpha(cols[iris$Species], .5))
pairs(X.iris, upper.panel = NULL, pch = 19,
col = alpha(cols[iris$Species], .5))
The correlation among the principal components is now - basically - zero:
cor(PC) |> round(1e-10)
## PC1 PC2 PC3 PC4
## PC1 1 0 0 0
## PC2 0 1 0 0
## PC3 0 0 1 0
## PC4 0 0 0 1
When we plot these (& color-code by species), we see how well they separate:
plot(PC[,1], PC[,2], col = alpha(cols[iris$Species], .3),
pch = 19, asp = 1)
You can see how that 1st component has radically separated the data orthogonally along an axis (even though it was naive to species).
We can also see how each of the measured traits (singly) maps onto these new principal component axes … that’s simply their respective weights. Thus:
plot(PC[,1], PC[,2], col = alpha(cols[iris$Species], .3),
pch = 19, asp = 1)
weights1 <- eigen(M.corr)$vector[,1]
weights2 <- eigen(M.corr)$vector[,2]
arrows(rep(0,4),rep(0,4),weights1, weights2, col = "red",length = 0.05)
text(weights1, weights2, colnames(X.iris), cex = 0.7, col = "darkred", font = 3)
Petal length and petal width do something very, very similar - they are so correlated it is hardly worth using both for any reason. Sepal length is similar, but loads a bit more heavily on the 2nd component, and sepal width is totally its own thing.
There are lots of options for performing a principal components analysis in R. Just in base R there are two(!): princomp and prcomp. They are confusing, but trust me - they both just do things that eigen also does. Thus:
prcomp(X.iris)
## Standard deviations (1, .., p=4):
## [1] 1.7083611 0.9560494 0.3830886 0.1439265
##
## Rotation (n x k) = (4 x 4):
## PC1 PC2 PC3 PC4
## Sepal.Length 0.5210659 -0.37741762 0.7195664 0.2612863
## Sepal.Width -0.2693474 -0.92329566 -0.2443818 -0.1235096
## Petal.Length 0.5804131 -0.02449161 -0.1421264 -0.8014492
## Petal.Width 0.5648565 -0.06694199 -0.6342727 0.5235971
This output is the standard deviations of the four components, and their actual weightings. We can compute those standard deviations directly from our principle component matrix above:
apply(PC, 2, sd)
## PC1 PC2 PC3 PC4
## 1.7083611 0.9560494 0.3830886 0.1439265
So they agree. Plotting these ordinations is something I find tricky to control (an argument for computing your own PCAs). But this kind of works:
row.names(X.iris) <- substring(iris$Species, 1, 2)
biplot(princomp(X.iris))
Note that this ordination is flipped sign-wise (in the second component) relative to the one we did - that’s completely arbitrary.
Recall - again - that the standard deviation of the standardized raw covariates was always 1. So the first two principal components really absorbed pretty much ALL of the variation in the data.
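To put a number on “pretty much ALL” (a quick computation of my own):
evals <- eigen(M.corr)$values
round(cumsum(evals) / sum(evals), 3)   # the first two components capture ~96% of the total variance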
See how the ordiplot (which I find a bit awkward to use) compares to our ordination plot above:
vegan::ordiplot(prcomp(X.iris), type = "n") |>
points("sites", col = alpha(cols[iris$Species], .3), pch = 19) |>
text("species", arrows = TRUE, col = "red")
[1] Bold-face means specifically an \(n\times1\) vector; capitals refer to matrices.
[2] Which are, of course, not usually or even primarily errors, but more often unexplained variation.
[3] Note that while we usually use ordination in ecology to understand abundances or counts of many, many species (columns) across sites (rows), in this example we will be understanding how the measurements (sepal width, sepal length, petal width, petal length) are distributed across individuals. That’s not fundamentally important, but it’s confusing since so many ordinations assume the columns represent species.