Part 1: Basics
The following examples should give you a first look at what R does and how it works.
Introduction
R is a command-line program, which means commands are entered line by line at the prompt. Like any programming language, it is very finicky: everything has to be entered exactly right, and names are case-sensitive. So Plot is different from plot!
There are two ways of entering commands (telling R to do a certain thing): either typing them out carefully into the “Console Window” (the lower-left window in RStudio) and hitting Enter, or writing and editing lines in the script window (upper-left window in RStudio) and “passing” the code into the console by hitting Ctrl+Enter.
In general, it is better to do all of your coding in a script window and then save the raw code file as a text document, which you can revisit and re-run at any point later. To create a new R script document, go to the upper-left corner and choose File - New file - R Script, or press Ctrl + Shift + N.
R is a calculator
## [1] 3
## [1] 729
## [1] 0.7071068
## [1] 670481640
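The commands that produced these results are not shown. As a minimal sketch, expressions like the following give the first three results above; the exact inputs are assumptions chosen to match the printed values.

```r
# Assumed inputs: any expressions with the same values would print the same results
1 + 2        # addition: 3
3^6          # exponentiation: 729
sqrt(1 / 2)  # square root: 0.7071068
```

Typing each line into the console and pressing Enter prints the result prefixed with [1], which is simply the index of the first value on that output line.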
Assigning variable names
The assignment operator is <-. It’s supposed to look like an arrow pointing left (the shortcut for entering it is Alt + -). Using the assignment operator, for example X <- 5, sets the value of X but doesn’t print any output. To see what X is, you need to type X on its own line:
## [1] 5
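A minimal sketch of these two steps, assuming the value 5 was assigned (which is what the printed result suggests):

```r
X <- 5   # assignment: stores 5 under the name X, prints nothing
X        # typing the name on its own prints the stored value: 5
```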
Note that X now appears in the upper-right panel of RStudio, letting you know that there is now an object in memory (also called the “Environment”) called X.
Now, you can use X as if it were a number:
## [1] 10
## [1] 3125
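For instance, with X equal to 5, the following expressions give the two results above. The original inputs are not shown, so these particular expressions are an assumption.

```r
X <- 5
X * 2   # 10
X^X     # 5 to the power 5: 3125
```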
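Note that you can name a variable almost anything, as long as it starts with a letter, contains only letters, numbers, periods, and underscores, and is not a reserved word such as TRUE or if.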
## [1] 15
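As an illustration, the sketch below uses two made-up variable names (both hypothetical, not taken from the original) to show that longer names work just like X, and that case matters.

```r
# Hypothetical names chosen for illustration only
my_value  <- 10
My.Value2 <- 5

my_value + My.Value2        # 15

# Names are case-sensitive: no object called MY_VALUE exists
exists("MY_VALUE")          # FALSE
```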
Vectors
Obviously, X can be many more things than just a single number. The most important kind of object in R is a “vector”, which is a series of inputs (and therefore resembles “data”).
c() is a function - a very useful function that creates “vectors”. In all functions, arguments are passed within parentheses.
We can use the c() function as follows:
## [1] 3 4 5
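A minimal sketch of the call that produces this vector, assuming it is stored under the name X as in the rest of this section:

```r
X <- c(3, 4, 5)  # combine three numbers into one vector
X                # prints: 3 4 5
length(X)        # number of elements: 3
```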
Now, let’s do some arithmetic with this vector:
## [1] 4 5 6
## [1] 6 8 10
## [1] 9 16 25
## [1] 6.25 9.00 12.25
Note that in all of these cases, the arithmetic operations are performed on a term-by-term basis.
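The expressions below reproduce the four results above when X is c(3, 4, 5). The original inputs are not shown, so treat these as one plausible set of expressions rather than the exact ones used.

```r
X <- c(3, 4, 5)

X + 1            # adds 1 to every element:      4 5 6
X * 2            # doubles every element:        6 8 10
X^2              # squares every element:        9 16 25
(X + 2)^2 / 4    # operations can be combined:   6.25 9.00 12.25
```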
We can easily model some exponential growth. As an example, let’s use Washington sea otter numbers: in 1970, 60 were released, and we want to know how many there are in 2020, i.e. after 50 years, at an annual growth rate of 7%. The following code models this process:
## [1] 60.00000 64.20000 68.69400 73.50258 78.64776 84.15310
## [7] 90.04382 96.34689 103.09117 110.30755 118.02908 126.29112
## [13] 135.13150 144.59070 154.71205 165.54189 177.12982 189.52891
## [19] 202.79594 216.99165 232.18107 248.43374 265.82410 284.43179
## [25] 304.34202 325.64596 348.44118 372.83206 398.93030 426.85542
## [31] 456.73530 488.70677 522.91625 559.52039 598.68681 640.59489
## [37] 685.43653 733.41709 784.75628 839.68922 898.46747 961.36019
## [43] 1028.65541 1100.66129 1177.70758 1260.14711 1348.35740 1442.74242
## [49] 1543.73439 1651.79580 1767.42150
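A sketch of this calculation, using the numbers given in the text (60 otters, 7% annual growth, years 0 through 50). The variable names N0, growth_rate, years, and N are illustrative assumptions.

```r
N0          <- 60     # otters released in 1970
growth_rate <- 1.07   # 7% annual growth means multiplying by 1.07 each year
years       <- 0:50   # year 0 is 1970, year 50 is 2020

# Exponential growth, computed term by term for every year at once
N <- N0 * growth_rate^years
N

# The last element is the projected population in 2020
N[length(N)]
```

Because years is a vector, the single expression N0 * growth_rate^years returns all 51 yearly values in one step.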
Exercise 1: Calculate population growth
You can get some really quick population growth answers this way. Compute how many sea otters there will be by 2050 and 2100 (80 and 130 years after release). HINT: you can just replace the vector with a single number.
Multiple Vectors and Data Frames
Data is most often multiple vectors of the same length. If we create a second vector Y, we can combine it with our first vector X using the data.frame() command. Both vectors become columns in our new data frame!
##   X Y
## 1 3 1
## 2 4 2
## 3 5 3
Running that command as a single line just outputs the data and allows us to look at it. To perform operations with it, you should save it as another object:
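A minimal sketch of both steps, using the X and Y values shown in the output above. The object name mydata is a hypothetical choice; any valid name works.

```r
X <- c(3, 4, 5)
Y <- c(1, 2, 3)

# Running data.frame() on its own only prints the result
data.frame(X, Y)

# Assigning it to a name keeps it in memory for later use
mydata <- data.frame(X, Y)
```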
A data frame has columns with names:
## [1] 2
## [1] "X" "Y"
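The two results above match what ncol() and names() return for a two-column data frame. A sketch, again assuming the hypothetical object name mydata:

```r
mydata <- data.frame(X = c(3, 4, 5), Y = c(1, 2, 3))

ncol(mydata)    # number of columns: 2
names(mydata)   # column names: "X" "Y"
```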
A column can be extracted (or called) from a data frame with a $:
## [1] 3 4 5
## [1] 1 2 3
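A sketch of the extraction, assuming the data frame is stored under the hypothetical name mydata:

```r
mydata <- data.frame(X = c(3, 4, 5), Y = c(1, 2, 3))

mydata$X   # the X column as a vector: 3 4 5
mydata$Y   # the Y column as a vector: 1 2 3
```

The extracted column is an ordinary vector, so all of the vector arithmetic from earlier applies to it.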
Part 2: Loading and Exploring Data
The following examples should explain how to import data frames and to work with the data contained within them.
Loading Data
We will use Steller sea lion (Eumetopias jubatus) data as an example. These are weights, lengths, and girths (measured, basically, under the arm/flipper pits) of sea lion pups, taken about two months after birth as part of a tagging mark-recapture study. These data were collected (in part by Dr. Gurarie) on five islands in the Russian North Pacific.
[Image: Steller sea lion pups]
This dataset is available on Blackboard as SeaLions.csv, or at this link. Once you download it, you can use the File Explorer to determine its location and read it into R in a couple of ways:
- From the command line: you can download the dataset and modify the following line of code:
A directory is another way to refer to a folder or, simply, the location of a data file on your computer. You can get the address of the directory if you open the folder where you saved the file in File Explorer, right-click on the navigation bar, and select the Copy address as text option. Note: If you copy and paste the file directory in, you have to change the direction of the slashes from \ to /!
Note that csv is a text-based file type (Comma Separated Values) - it just means that commas between entries indicate separate columns. When a program “reads” the file, it “knows” that a comma means the end of one column and the start of another one. You can save any Excel file as a csv using the Save As function. CSVs are by far the most common and convenient file type used for loading into R.
- Alternatively, you can import datasets into R using the RStudio point-and-click interface. To do this:
  - Navigate to the Files tab in the bottom right corner of RStudio
  - Click on SeaLions.csv
  - RStudio will prompt you to either view the file or import the dataset. You want to import, so hit Import File
  - A pop-up window will appear, showing you the preview of the data frame. Click Import and observe that your file is now loaded - it should have appeared in your Environment in the top right corner of RStudio.
This method does exactly the same thing as reading the file from the command line: it automatically enters the proper code into the console and saves the data frame to the environment. Note that by default the data frame is named after the file rather than a name you choose.
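The command-line route can be sketched with read.csv(). So that the example runs anywhere, it first writes a small stand-in file containing only the six rows displayed later in this tutorial. The object name SeaLions and the variable csv_path are assumptions. In practice, csv_path would be the location of your own download, written with forward slashes.

```r
# Stand-in for the downloaded file: six rows written to a temporary folder
csv_path <- file.path(tempdir(), "SeaLions.csv")
writeLines(c(
  "Island,Weight,Length,Sex",
  "Chirpoev,15.5,93,M",
  "Chirpoev,29.0,106,F",
  "Chirpoev,35.5,112,M",
  "Chirpoev,32.0,107,M",
  "Chirpoev,32.0,105,M",
  "Chirpoev,33.5,111,M"
), csv_path)

# Read the comma-separated file into a data frame
SeaLions <- read.csv(csv_path)

# Quick check that the import worked: rows and columns
dim(SeaLions)
```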
Working with data frames
Look at some properties of this data file, with the following functions:
## [1] "data.frame" "list" "oldClass" "vector"
## [1] "Island" "Weight" "Length" "Sex"
##     Island Weight Length Sex
## 1 Chirpoev   15.5     93   M
## 2 Chirpoev   29.0    106   F
## 3 Chirpoev   35.5    112   M
## 4 Chirpoev   32.0    107   M
## 5 Chirpoev   32.0    105   M
## 6 Chirpoev   33.5    111   M
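The three outputs above have the form returned by is(), names(), and head(). The sketch below assumes the data frame is called SeaLions and rebuilds only the six rows shown, so it can run without the file.

```r
library(methods)  # provides is(); usually attached already

# Stand-in built from the six rows displayed above (not the full dataset)
SeaLions <- data.frame(
  Island = rep("Chirpoev", 6),
  Weight = c(15.5, 29.0, 35.5, 32.0, 32.0, 33.5),
  Length = c(93, 106, 112, 107, 105, 111),
  Sex    = c("M", "F", "M", "M", "M", "M")
)

is(SeaLions)     # what kind of object this is
names(SeaLions)  # column names
head(SeaLions)   # first six rows
```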
Use a $ to extract a given column:
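For example, assuming the data frame is called SeaLions (rebuilt here from the six rows shown above so the sketch is self-contained):

```r
# Stand-in built from the six displayed rows (not the full dataset)
SeaLions <- data.frame(
  Island = rep("Chirpoev", 6),
  Weight = c(15.5, 29.0, 35.5, 32.0, 32.0, 33.5),
  Length = c(93, 106, 112, 107, 105, 111),
  Sex    = c("M", "F", "M", "M", "M", "M")
)

SeaLions$Length          # the Length column as a numeric vector

# Storing the column under a shorter (hypothetical) name saves typing later
Length <- SeaLions$Length
```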
Summary Statistics
Some basic summary statistics include:
## [1] 93 126
## [1] 110
## [1] 109.8434
## [1] 34.82854
## [1] 5.901571
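The five results above are consistent with the range, median, mean, variance, and standard deviation of the Length column (the last value is the square root of the one before it). The sketch below applies those functions to a six-row stand-in, so its numbers will differ from the full-dataset results printed above.

```r
# Stand-in: the six lengths displayed earlier (not the full dataset)
Length <- c(93, 106, 112, 107, 105, 111)

range(Length)    # smallest and largest value
median(Length)   # middle value
mean(Length)     # arithmetic average
var(Length)      # variance
sd(Length)       # standard deviation, equal to sqrt(var(Length))
```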
Graphical Summaries
Histogram
A histogram (invoked by the hist() command) can show us the distribution of a single continuous variable:
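A minimal sketch, assuming we want the distribution of pup lengths. It uses only the six lengths displayed earlier, so the shape will differ from a histogram of the full dataset. The title and axis label are illustrative.

```r
# Stand-in: the six lengths displayed earlier (not the full dataset)
Length <- c(93, 106, 112, 107, 105, 111)

# Draw the histogram; main and xlab set the title and x-axis label
h <- hist(Length, main = "Sea lion pup lengths", xlab = "Length")
```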
Boxplot
A boxplot shows us relationships between a continuous variable (like Length/Weight/Girth) and a discrete variable (like Island/Sex):
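A sketch using boxplot() with a formula, where the continuous variable goes on the left of ~ and the grouping variable on the right. The data frame name SeaLions is an assumption, and the six-row stand-in contains a single island, so Sex is used for grouping here.

```r
# Stand-in built from the six displayed rows (not the full dataset)
SeaLions <- data.frame(
  Island = rep("Chirpoev", 6),
  Weight = c(15.5, 29.0, 35.5, 32.0, 32.0, 33.5),
  Length = c(93, 106, 112, 107, 105, 111),
  Sex    = c("M", "F", "M", "M", "M", "M")
)

# One box per level of Sex, showing the spread of Length within each group
boxplot(Length ~ Sex, data = SeaLions, ylab = "Length")
```

With the full dataset, replacing Sex with Island in the formula would draw one box per island.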
Scatterplot
A scatterplot shows us relationships between two continuous variables:
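A sketch using plot() with two numeric vectors, assuming Length on the x-axis and Weight on the y-axis. The values are the six rows displayed earlier, not the full dataset.

```r
# Stand-in: six length and weight pairs displayed earlier
Length <- c(93, 106, 112, 107, 105, 111)
Weight <- c(15.5, 29.0, 35.5, 32.0, 32.0, 33.5)

# First argument goes on the x-axis, second on the y-axis
plot(Length, Weight, xlab = "Length", ylab = "Weight")
```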
Exercise 2: Download and plot the sea otter population growth data
The first step is to import our new dataset. It is called SeaOtters.csv and is here or on Blackboard. Plot the downloaded data, using the “Year” column on the x-axis and the “Count” column on the y-axis. For this exercise, provide the plotting code.