There are THREE exercises in this lab to submit by midnight Friday, March 1.
The paramecium.csv data set contains daily observations of paramecium (a unicellular ciliate) density (in unspecified units) measured in a petri dish with limited resources.
paramecium <- read.csv("paramecium.csv")
Look at the columns in this data:
head(paramecium)
## Species Day Nt Nt1 dN r
## 1 Paramecium caudatum 2 5.849284 NA NA NA
## 2 Paramecium caudatum 3 13.672890 5.849284 7.823607 1.3375324
## 3 Paramecium caudatum 4 NA 13.672890 NA NA
## 4 Paramecium caudatum 5 35.697880 NA NA NA
## 5 Paramecium caudatum 6 73.947840 35.697880 38.249959 1.0714910
## 6 Paramecium caudatum 7 103.855191 73.947840 29.907352 0.4044385
Note the output of the following commands:
nrow(paramecium)   # number of rows
ncol(paramecium)   # number of columns
dim(paramecium)    # rows and columns together
names(paramecium)  # the column names
The columns are:
- Species - the species of paramecium
- Day - day of experiment
- Nt - number (or density) of paramecium now, \(N_t\)
- Nt1 - number (or density) one time step previous, \(N_{t-1}\)
- dN - change in number since the previous time step, \(\Delta N = N_t - N_{t-1}\)
- r - per capita growth rate, \(r = \Delta N / N_{t-1}\)

Open the script file logisticfitscript.R (you can click on the file name, and either download it or just paste it into a NEW script). It contains just the following code:
paramecium <- read.csv("paramecium.csv")
N.logistic <- function(x, N0, K, r0) K/(1 + ((K - N0)/N0)*exp(-r0*x))
# try different values!
N0 <- 10
K <- 150
r0 <- 1
plot(Nt ~ Day, data = paramecium)
curve(N.logistic(x, N0, K, r0), add = TRUE, col = 2, lwd = 2)
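For reference, the N.logistic function above codes the analytic solution of the logistic growth equation:
\[ N(t) = \frac{K}{1 + \frac{K - N_0}{N_0} e^{-r_0 t}} \]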
The initial values guessed for N0, K, and r0 are not very good.
Lab Exercise 1
Play around with those numbers and see how close you can come (visually) to fitting the curve. Report your estimates for \(N_0\), \(K\), and \(r_0\) and include a plot of the fit.
Let's plot the three figures we talked about in class: \(N\) as a function of time, \(\Delta N\) as a function of \(N\), and \(r\) as a function of \(N\).
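Here is one way to draw all three panels side by side in base R (a minimal sketch; the figures shown in class may have been formatted differently):

par(mfrow = c(1, 3))               # three panels in one row
plot(Nt ~ Day, data = paramecium)  # N as a function of time
plot(dN ~ Nt1, data = paramecium)  # change in N as a function of N
plot(r ~ Nt1, data = paramecium)   # per capita growth rate as a function of N
par(mfrow = c(1, 1))               # reset the plotting layout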
These roughly show the patterns we expect of a logistic growth model: sigmoidal growth in time, a parabolic (hump-shaped) growth increment as a function of density, and a linear decrease in per capita growth rate as a function of density.
One form of the logistic growth model simply says that:
\[ r(N) = r_0 - {r_0 \over K} N\]
So if we do a straightforward linear regression of \(r\) against \(N_{t-1}\), the intercept estimates \(r_0\) and the slope estimates \(-r_0/K\), which lets us recover both \(r_0\) and \(K\):
lm(r ~ Nt1, data = paramecium)
##
## Call:
## lm(formula = r ~ Nt1, data = paramecium)
##
## Coefficients:
## (Intercept) Nt1
## 1.088436 -0.005205
This suggests that \(r_0\) is roughly 1.09. We write: \[\widehat{r_0} = 1.09\]
What does the slope tell us? We can convert it to a carrying capacity, since \(\beta = -{r_0 \over K}\). Solving for \(\widehat{K}\) we get: \[\widehat{K} = 1.088 / 0.0052 = 209.1\]
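In code, that conversion might look like this (a sketch; r.lm here is the same fitted model object that is saved below):

r.lm <- lm(r ~ Nt1, data = paramecium)
r0.hat <- unname(coef(r.lm)[1])            # intercept: estimate of r0
K.hat  <- -r0.hat / unname(coef(r.lm)[2])  # slope is -r0/K, so K = -r0/slope
r0.hat
K.hat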
Here is a plot of that linear model:
plot(r ~ Nt1, data = paramecium, main = "Growth rate as a function of density")
abline(lm(r ~ Nt1, data = paramecium), col = 2, lwd = 2)
Lab Exercise 2a
Generate a plot of this fit of logistic growth over time on top of the paramecium data by using exactly the code in Exercise 1, but plugging in the estimates for \(K\) and \(r_0\). You’ll have to put in your own guess for \(N_0\). What is that guess?
Note that we can, once again, get standard errors around these coefficients:
r.lm <- lm(r ~ Nt1, data = paramecium)
summary(r.lm)
##
## Call:
## lm(formula = r ~ Nt1, data = paramecium)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30712 -0.07164 0.02884 0.11450 0.27954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0884356 0.1218525 8.932 4.43e-06 ***
## Nt1 -0.0052052 0.0007293 -7.137 3.15e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1899 on 10 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.8359, Adjusted R-squared: 0.8195
## F-statistic: 50.94 on 1 and 10 DF, p-value: 3.153e-05
Lab Exercise 2b
Report a point estimate and 95% confidence interval for the intrinsic growth rate \(r_0\).
Overall, this method is … OK. But it doesn’t provide an estimate for \(N_0\).
One method of fitting a curve is to use the “least squares” method, i.e. find the parameters for which the squared deviation of the data from the curve is minimized. This is an optimization problem that the computer solves numerically.
Here’s a fun illustration / simulation of how “least squares” works: https://seeing-theory.brown.edu/regression-analysis/index.html
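To make that concrete, here is a small sketch of the quantity nls() minimizes: the sum of squared deviations (the residual sum of squares) between the data and the curve, evaluated at one candidate set of parameter values (this uses the N.logistic function defined above):

# Residual sum of squares for a given set of logistic parameters
rss <- function(N0, K, r0) {
  pred <- N.logistic(paramecium$Day, N0, K, r0)
  sum((paramecium$Nt - pred)^2, na.rm = TRUE)
}
rss(N0 = 1, K = 200, r0 = 0.75)  # smaller is better; nls() searches for the minimum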
The function that does this in R is called nls(). It is “non-linear”, unlike the regression model above. To use it, we have to pass our function for the solution of the logistic curve and provide initial values. But the basic syntax is similar in that the response is Nt and the predictor is the logistic function of time (Day), with the parameters to be estimated following Day. And the formula symbol is always ~.
N.logistic <- function(x, N0, K, r0) K/(1 + ((K - N0)/N0)*exp(-r0*x))
nls(Nt ~ N.logistic(Day, N0, K, r0), data = paramecium,
start = list(N0 = 1, K = 200, r0 = 0.75))
## Nonlinear regression model
## model: Nt ~ N.logistic(Day, N0, K, r0)
## data: paramecium
## N0 K r0
## 1.3087 222.3641 0.7075
## residual sum-of-squares: 2172
##
## Number of iterations to convergence: 8
## Achieved convergence tolerance: 8.406e-06
As usual, we can save the results to an object and get a summary table with the fits:
logistic.fit <- nls(Nt ~ N.logistic(Day, N0, K, r0), data = paramecium,
    start = list(N0 = 1, K = 200, r0 = 0.75))
summary(logistic.fit)
##
## Formula: Nt ~ N.logistic(Day, N0, K, r0)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## N0 1.30870 0.85195 1.536 0.148
## K 222.36412 5.22413 42.565 2.4e-15 ***
## r0 0.70747 0.09028 7.836 2.8e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.92 on 13 degrees of freedom
##
## Number of iterations to convergence: 8
## Achieved convergence tolerance: 8.406e-06
## (8 observations deleted due to missingness)
Lab Exercise 3
- Generate a plot of this third fit using exactly the code in Exercise 1, but plugging in estimates for \(K\), \(r_0\) and \(N_0\).
- Report 95% confidence intervals around all three estimates
- Which is the “best” method? Why?