To work through this lab, you’ll need to download the SeaOtters.csv dataset by following the links (also available on Blackboard). For the exercise, you’ll also need to download BC_SeaOtters.csv. If you’ve forgotten how, follow the instructions from the previous lab.

The TLDR version of this lab is: lm(log(Y)~X) will give you the fitted estimates of an exponential model.

SeaOtters <- read.csv("SeaOtters.csv")
the sea otter teaches us not to be afraid to ask for help!

1 Loading and plotting the Sea Otters

Load the sea otter data. It has two columns: year and count:

SeaOtters

and plot the data. There are several ways to do this (as you might remember). ALl the following lines of code to the same thing:

## plot(SeaOtters$year, SeaOtters$count)
## plot(SeaOtters$count~SeaOtters$year)

look closely at the at code and understand the difference in syntax … basically that plot(X,Y) is the same as plot(Y~X). Given our goal of modeling, the best version is this:

plot(count~year, data = SeaOtters)

This says (in English) plot count (response) against year (the predictor) using the data frame SeaOtters.

It is quite easy to make plots on a log scale, by the way, which can be useful:

plot(count ~ year, data = SeaOtters, log = "y")

NOTE: You can skip the section below as it is non-essential. But arguably “fun”.

1.1 Installing packages and ggplot

Or, if you like, you can use the fancy-shmancy ggplot function to make plots. But you have to install a new “library” called ggplot2. To install a library, you use the install.packages() function:

install.packages("ggplot2")

You only need to install a package once on your computer. After that, it will always be there.

Alternatively, you can use the R interface to install new packages. To do so:

  • Click the Packages tab in the bottom right window of Rstudio and click Install
  • … or … go to Tools > Ìnstall Packages in the menu bar.
  • In the blank “packages” line, start typing the name of the package you want. RStudio should automatically fill this in with suggestions. Use a comma to separate the names of multiple package if installing more than one simultaneously.
  • Click Install. This will automatically put the correct code in the console for you.

From now on, whenever you need that package loaded you use the library function. You only have to do this once each time you open R, after that, a package stays in your library for the rest of your session.

library(ggplot2)

If you receive prewritten code from someone and open the file in RStudio, the software will automatically identify packages called by the library funtion. If you do not have these packages installed, Rstudio will ask you if you’d like to install them, which can save you time. However, you will still need to run the library function on each package before you can use them. 0 Now - finally - you have access to the funky / fancy library of ggplotting functions.

The ggplot function allows for some more flexible plotting, but (at least initialy) with more complicated syntax. For example, the equivalent of the plot above is:

ggplot(SeaOtters, aes(x = year, y = count)) + geom_point()

Now it is easy to add other features. Unfortunately, this data set only has the two columns, but you can still have some fun. For example:

ggplot(SeaOtters, aes(x = year, y = count, color = count, size = year)) + geom_point()

2 Fitting a simple linear Model

Let’s first start by fitting a straightforward linear model, which is to say we’re looking for values of \(\alpha\) (intercept) and \(\beta\) (slope) in this equation:

\[Y_i = \alpha + \beta\, X_i + \epsilon_i\].

where \(X\) is a predictor, and \(Y\) is a response, and \(\epsilon\) is unexplained (residual) variation. Remember, that the syntax is exactly the same as for plotting:

otter.fit <- lm(count ~ year, data = SeaOtters)

Note, we take a “model fit” (which is a complicated thing) and we put it into a new object called otter.fit. The output is just the estimated parameters for the intercept and the “Year effect” (i.e. slope of the curve).

otter.fit
## 
## Call:
## lm(formula = count ~ year, data = SeaOtters)
## 
## Coefficients:
## (Intercept)         year  
##   -65751.54        33.19

Again, there is a lot more information if you ask for a summary:

summary(otter.fit)
## 
## Call:
## lm(formula = count ~ year, data = SeaOtters)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -152.74  -78.18  -58.79   34.71  417.48 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -65751.54    6158.81  -10.68 2.19e-10 ***
## year            33.19       3.08   10.78 1.83e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 142.4 on 23 degrees of freedom
## Multiple R-squared:  0.8347, Adjusted R-squared:  0.8275 
## F-statistic: 116.1 on 1 and 23 DF,  p-value: 1.83e-10

Of importance to us is the Estimate (same as above) and the Std. Error which quantifies how precise our estimate is. Two standard errors is (roughly) the same as a 95% Confidence Interval. Thus, the slope on this fit is 33.2 new sea otters / year ± 6.18. This is definitely bigger than 0! We can also see that because the p-value (under column Pr(>|t|)) is very very very very small.

We can plot this linear model on our data:

plot(count ~ year, data = SeaOtters)
abline(otter.fit)

What do you observe about this fit!? Is it a good one?

3 Fitting an exponential model

We kind of knew the growth was not linear (otherwise this week’s topic wouldn’t have been called Exponential Growth). But how can we use a linear model to fit an exponential growth function?

With a simple transformation!

\[\log(Y_i) = \alpha + \beta\, X_i + \epsilon_i\]

How does this relate to an exponential growth model?

\[e^{\log(Y_i)} = e^{\alpha + \beta\, X_i + \epsilon_i}\] \[Y_t = N_0 e^{\beta\, X_t + \epsilon_t}\]

We replaced \(e^\alpha\) with \(N_0\), and \(i\) with \(t\). But \(\beta\) is EXACTLY the growth rate \(r\) (and \(e^\beta = \lambda\)). The \(\epsilon_i\) hanging on the end is a bit of randomness

Anyways, doing this with our linear modeling tools is easy. We just wrap count in a log() function.

logotter.fit <- lm(log(count) ~ year, data = SeaOtters)
summary(logotter.fit)
## 
## Call:
## lm(formula = log(count) ~ year, data = SeaOtters)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.191084 -0.062944 -0.005104  0.055518  0.231704 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.402e+02  4.732e+00  -29.63   <2e-16 ***
## year         7.325e-02  2.367e-03   30.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1094 on 23 degrees of freedom
## Multiple R-squared:  0.9766, Adjusted R-squared:  0.9755 
## F-statistic: 958.1 on 1 and 23 DF,  p-value: < 2.2e-16

Note how our statistics changed. Specifically: our \(\beta\) estimate (which is also \(r\)) is now 0.07325, which is a good estimate of the intrinsic growth rate. Even better, we now have a standard error around that estimate: 0.00237, which means our growth rate can be written as: \(\widehat{r} = (7.32 \pm 0.47)\times 10^{-2}\). We can convert that to an annual growth rate in R as well (do NOT worry about understanding the following code:

r.hat <- summary(logotter.fit)$coefficients[2,1]
r.sd <- summary(logotter.fit)$coefficients[2,2]
exp(c(r.hat, r.hat-2*r.sd, r.hat+2*r.sd))
## [1] 1.076001 1.070920 1.081106

So our estimate of annual population growth is 7.6% (95% Confidence Interval 7.1%-8.1%). Note how much higher the \(R^2\) value is! This is (by that measure also) a much better fit.

4 Visualizing the exponential fit

To visualize the fit:

plot(log(count) ~ year, data = SeaOtters)
abline(logotter.fit)

Much better!

Let’s make our final graph a little nicer

plot(log(count) ~ year, main = 'Sea Otter Population', 
     col = 'brown', pch = 4, data = SeaOtters, lwd = 2)
abline(logotter.fit, col = "blue", lwd = 2)

The commands we’re using within the plot function include setting the title of the graph (main =), changing the color of the symbols and the line (col =), and changing the shape of the symbols (pch =). There also many more alterations you can make to get your graph perfect - but that can be a deep dark Rabbit hole!

Finally, we can also draw the fit on non-log transformed data as follows:

plot(count ~ year, main = 'Sea Otter Population', 
     col = 'brown', pch = 4, data = SeaOtters, lwd = 2)
curve(exp(logotter.fit$coef[1] + 
            logotter.fit$coef[2] * x), col = "blue", add = TRUE, lwd = 2)

5 Lab Exercise

Analyze the population growth rate of sea otters in British Columbia, the population just to the north of Washington state. The data are here in a file called BC_SeaOtters.

  1. Load the data
  2. plot the data,
  3. report how many were released, in what year, how many are there in the final year of data,
  4. estimate the growth rate with confidence intervals.

NOTE: You can skip the section below as it is non-essential. But arguably “fun”.

5.1 More ggplot plotting

It can be a bit fussy to plot the confidence interval around our prediction. Below, we include a little snippet of code which does just that using ggplot. You do not need to understand this code (there’s actually quite a bit going on to get it to work right), but the output is pretty.

require(ggplot2)

ggplot(SeaOtters, aes(year, count)) + geom_point() + 
    stat_smooth(method = "glm", 
                method.args = list(family = gaussian(link = "log")))
## `geom_smooth()` using formula = 'y ~ x'