Introduction to R and Linear Models for Exponential Growth

EFB 370: Recitation II.1

Dr. Gurarie and Colton Moyer

2024-02-12

Part 1: Basics

The following examples should give you a first look at what R does and how it works.

Introduction

R is a command-line program, which means that commands are entered line by line at the prompt. Like any programming language, it is very finicky: everything has to be entered exactly right, including upper and lower case. So Plot is not the same as plot!

There are two ways of entering commands (telling R to do a certain thing): either typing them carefully into the “Console Window” (the lower-left window in RStudio) and hitting Enter, or writing and editing lines in the script window (the upper-left window in RStudio) and “passing” the code to the console by hitting Ctrl+Enter.

In general, it is better to do all of your coding in a script window and then save the raw code file as a text document, which you can revisit and re-run at any point later. To create a new R script document, go to the upper-left corner and choose File - New File - R Script, or press Ctrl + Shift + N.

R is a calculator

1+2
## [1] 3
3^6
## [1] 729
sqrt((20-19)^2 + (19-19)^2 + (19-18)^2)/2
## [1] 0.7071068
12345*54312
## [1] 670481640

Assigning variable names

The assignment operator is <-. It’s supposed to look like an arrow pointing left (the shortcut for entering it is Alt + -).

X <- 5      # sets X equal to 5 

Using the assignment operator sets the value of X but doesn’t print any output. To see what X is, you need to type:

X
## [1] 5

Note that X now appears in the upper-right panel of RStudio, letting you know that there is now an object called X in memory (also called the “Environment”).
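
The same information is available from the console. As a small sketch, the base R functions ls(), exists(), and rm() list, check for, and remove objects in memory; the name scratch is made up for this illustration only:

```r
X <- 5              # the same assignment as above
scratch <- 10       # 'scratch' is a hypothetical name used only in this example
ls()                # lists the names of all objects currently in memory
exists("scratch")   # TRUE
rm(scratch)         # removes 'scratch' from memory; X is untouched
exists("scratch")   # FALSE
```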

Now, you can use X as if it were a number:

X*2
## [1] 10
X^X
## [1] 3125

Note that you can name a variable almost anything, as long as the name starts with a letter and contains only letters, numbers, periods, and underscores (no spaces).

Fred <- 5
Nancy <- Fred*2
Fred + Nancy
## [1] 15
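
As an illustrative sketch of naming (all of the names below are made up for this example), variable names may combine letters, numbers, periods, and underscores, and R treats upper- and lower-case versions of a name as different objects:

```r
otters_1970 <- 60        # underscores and digits are fine after the first letter
growth.rate <- 1.07      # periods are allowed too
Otters_1970 <- 100       # different capitalization, so a different object

otters_1970 * growth.rate     # 64.2
otters_1970 == Otters_1970    # FALSE
```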

Vectors

Of course, X can be much more than a single number. The most important kind of object in R is a “vector”, which is a series of values (and therefore resembles “data”).

c() is a function - a very useful function that creates “vectors”. In all functions, arguments are passed within parentheses.

We can use the c() function as follows:

X <- c(3,4,5)   # sets X equal to the vector (3,4,5)
X
## [1] 3 4 5

Now, let’s do some arithmetic with this vector:

X + 1
## [1] 4 5 6
X*2
## [1]  6  8 10
X^2
## [1]  9 16 25
((X+X^2/2)/X)^2
## [1]  6.25  9.00 12.25

Note that in all of these cases, the arithmetic operations are performed on a term-by-term basis.
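
The same term-by-term rule applies when both sides are vectors of the same length. A short sketch, where W is a made-up vector introduced only for this illustration:

```r
X <- c(3, 4, 5)
W <- c(10, 20, 30)   # hypothetical second vector of the same length

X + W    # 13 24 35: first with first, second with second, third with third
X * W    # 30 80 150
```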

We can easily model some exponential growth. As an example, let’s use Washington sea otter numbers: in 1970, 60 were released, and we want to know how many there are in 2020, i.e. after 50 years, at an annual growth rate of 7%. The following code models this process:

years <- 0:50
lambda <- 1.07
N0 <- 60
N0*lambda^years
##  [1]   60.00000   64.20000   68.69400   73.50258   78.64776   84.15310
##  [7]   90.04382   96.34689  103.09117  110.30755  118.02908  126.29112
## [13]  135.13150  144.59070  154.71205  165.54189  177.12982  189.52891
## [19]  202.79594  216.99165  232.18107  248.43374  265.82410  284.43179
## [25]  304.34202  325.64596  348.44118  372.83206  398.93030  426.85542
## [31]  456.73530  488.70677  522.91625  559.52039  598.68681  640.59489
## [37]  685.43653  733.41709  784.75628  839.68922  898.46747  961.36019
## [43] 1028.65541 1100.66129 1177.70758 1260.14711 1348.35740 1442.74242
## [49] 1543.73439 1651.79580 1767.42150
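
The one-line formula is a shortcut for multiplying the population by lambda once per year. As a sketch of that year-by-year process (the names N and N_formula are introduced here for illustration only), a loop gives the same numbers:

```r
years <- 0:50
lambda <- 1.07
N0 <- 60

N <- numeric(length(years))   # empty vector to hold one value per year
N[1] <- N0                    # year 0: the 60 released otters
for (i in 2:length(years)) {
  N[i] <- lambda * N[i - 1]   # each year is 7% larger than the one before
}

N_formula <- N0 * lambda^years
all.equal(N, N_formula)       # TRUE: both approaches agree
N[years == 50]                # population after 50 years, about 1767
```

The vectorized formula is shorter and is usually the better choice in R; the loop is only meant to show what the formula is doing.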

Exercise 1: Calculate population growth

You can get some really quick population growth answers this way. Compute how many sea otters there will be by 2050 and 2100 (80 and 130 years after release). HINT: you can just replace the vector with a single number.

Multiple Vectors and Data Frames

Data most often come as multiple vectors of the same length. If we create a second vector Y, we can combine it with our first vector X using the data.frame() command. Both vectors then become columns in a new data frame!

Y <- c(1,2,3)
data.frame(X,Y)
##   X Y
## 1 3 1
## 2 4 2
## 3 5 3

Running that command as a single line just outputs the data and allows us to look at it. To perform operations with it, you should save it as another object:

mydata <- data.frame(X,Y)

A data frame has columns with names:

ncol(mydata) # ncol() gives us the number of columns in this data frame
## [1] 2
names(mydata) # names() lists all column names that this data frame has 
## [1] "X" "Y"

A column can be extracted (or called) from a data frame with a $:

mydata$X
## [1] 3 4 5
mydata$Y
## [1] 1 2 3
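
Because each extracted column is just a vector, the $ can also be used to do arithmetic on columns or to add a new one. A minimal sketch, in which the column name Z is made up for this example:

```r
X <- c(3, 4, 5)
Y <- c(1, 2, 3)
mydata <- data.frame(X, Y)

mydata$X + mydata$Y               # 4 6 8
mydata$Z <- mydata$X * mydata$Y   # adds a third column called Z
names(mydata)                     # "X" "Y" "Z"
nrow(mydata)                      # 3 rows
```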

Part 2: Loading and Exploring Data

The following examples should explain how to import data frames and to work with the data contained within them.

Loading Data

We will use Steller sea lion (Eumetopias jubatus) data as an example. These are weights, lengths, and girths (basically, under the arm/flipper pits) of sea lion pups about two months after birth, measured as part of a tagging mark-recapture study. These data were collected (in part by Dr. Gurarie) on five islands in the Russian North Pacific.

This is what sea lion pups look like:

[Image: Steller sea lion pups]

This dataset is available on Blackboard as SeaLions.csv, or at this link. Once you download it, you can use the File Explorer to determine its location and read it into R in a couple of ways:

  1. From the command line: you can download the dataset and modify the following line of code:
SeaLions <- read.csv("insert the directory instead of this sentence/SeaLions.csv")

A directory is another name for a folder or, simply, the location of a data file on your computer. To get the address of the directory, open the folder where you saved the file in File Explorer, right-click on the navigation bar, and select the Copy address as text option. Note: if you copy and paste the file directory in, you have to change the direction of the slashes from \ to /!

Note that CSV (Comma Separated Values) is a text-based file type: commas between entries indicate separate columns. When a program “reads” the file, it “knows” that a comma means the end of one column and the start of another. You can save any Excel file as a CSV using the Save As function. CSVs are by far the most common and convenient file type for loading into R.

  2. Alternatively, you can import datasets into R using the RStudio point-and-click interface. To do this:
     1. Navigate to the Files tab in the bottom right corner of RStudio.
     2. Click on SeaLions.csv.
     3. RStudio will prompt you to either view the file or import the dataset. You want to import, so hit Import File.
     4. A pop-up window will appear, showing you a preview of the data frame. Click Import and observe that your file is now loaded: it should appear in your Environment in the top right corner of RStudio.

This method does exactly the same thing as the line of code above: it automatically enters the proper code into the console and saves the data to your Environment. Note that by default the object takes the name of the file (here, SeaLions) rather than a name you choose for it.
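
To see what "comma separated" means in practice, the sketch below writes a tiny CSV to a temporary file and reads it back with read.csv(). The three data rows are copied from the first rows of the sea lion data shown in the next section; the object names csv_text, tmp, and pups are made up for this illustration and are not part of the course files.

```r
# Raw text of a small CSV: one header line, then one line per row
csv_text <- c("Island,Weight,Length,Sex",
              "Chirpoev,15.5,93,M",
              "Chirpoev,29.0,106,F",
              "Chirpoev,35.5,112,M")

tmp <- tempfile(fileext = ".csv")   # a throwaway file location
writeLines(csv_text, tmp)           # save the text to that file

pups <- read.csv(tmp)               # commas become column breaks
names(pups)                         # "Island" "Weight" "Length" "Sex"
pups$Weight                         # 15.5 29.0 35.5
```

Reading the real file works the same way, with the path to SeaLions.csv in place of tmp.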

Working with data frames

Look at some properties of this data frame with the following functions:

is(SeaLions) # tells us what type of object we have
## [1] "data.frame" "list"       "oldClass"   "vector"
names(SeaLions) # tells us the names of all the columns
## [1] "Island" "Weight" "Length" "Sex"
head(SeaLions) # shows the first several rows of the data frame
##     Island Weight Length Sex
## 1 Chirpoev   15.5     93   M
## 2 Chirpoev   29.0    106   F
## 3 Chirpoev   35.5    112   M
## 4 Chirpoev   32.0    107   M
## 5 Chirpoev   32.0    105   M
## 6 Chirpoev   33.5    111   M

Use a $ to extract a given column:

Length <- SeaLions$Length
Weight <- SeaLions$Weight
Island <- SeaLions$Island
Sex <- SeaLions$Sex

Summary Statistics

Some basic summary statistics include:

range(Length) # range
## [1]  93 126
median(Length) # median
## [1] 110
mean(Length) # mean
## [1] 109.8434
var(Length) # variance
## [1] 34.82854
sd(Length) # standard deviation
## [1] 5.901571
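
As a quick check on how these functions relate to one another, the sketch below uses only the six lengths visible in the head(SeaLions) output above, stored under the made-up name first_six, so its results differ from the full-data values.

```r
first_six <- c(93, 106, 112, 107, 105, 111)   # Length values from the first six rows

range(first_six)     # 93 112: the minimum and maximum
median(first_six)    # 106.5
mean(first_six)      # 105.6667

# The variance is the square of the standard deviation
all.equal(sd(first_six)^2, var(first_six))    # TRUE

summary(first_six)   # several of these statistics at once
```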

Graphical Summaries

Histogram

A histogram (invoked by the hist() command) shows the distribution of a single continuous variable:

hist(Length)
hist(Weight)

Boxplot

A boxplot shows the relationship between a continuous variable (like Length or Weight) and a discrete variable (like Island or Sex):

boxplot(Length ~ Island)
boxplot(Weight ~ Sex)
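
The formula can also refer to columns directly if the data frame is supplied through the data argument, which avoids extracting each column first. A minimal sketch, assuming a small stand-in data frame (called pups_demo here, a made-up name) built from the six rows shown by head(SeaLions); with the real data you would pass data = SeaLions instead:

```r
pups_demo <- data.frame(
  Weight = c(15.5, 29.0, 35.5, 32.0, 32.0, 33.5),
  Length = c(93, 106, 112, 107, 105, 111),
  Sex    = c("M", "F", "M", "M", "M", "M")
)

# Weight (continuous) split by Sex (discrete), with axis labels added
boxplot(Weight ~ Sex, data = pups_demo,
        xlab = "Sex", ylab = "Weight")
```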

Scatterplot

A scatterplot shows the relationship between two continuous variables, such as Length and Weight.
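
One common way to draw a scatterplot in base R is the plot() function, which takes the x-values first and the y-values second. A minimal sketch, using stand-in vectors (the made-up names len_demo and wt_demo) that hold only the six rows shown by head(SeaLions):

```r
len_demo <- c(93, 106, 112, 107, 105, 111)          # Length, first six rows
wt_demo  <- c(15.5, 29.0, 35.5, 32.0, 32.0, 33.5)   # Weight, first six rows

# x-axis variable first, y-axis variable second
plot(len_demo, wt_demo,
     xlab = "Length", ylab = "Weight")
```

With the full data, the Length and Weight vectors extracted earlier can be passed to plot() in the same way.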

Exercise 2: Download and plot the sea otter population growth data

The first step is to import our new dataset. It is called SeaOtters.csv and is here or on Blackboard. Plot the downloaded data, using the “Year” column on the x-axis and the “Count” column on the y-axis. For this exercise, provide the plotting code.