6  Movement data in R

Loading, exploring and processing

Author

O. Couriot, E. Gurarie

This is a tutorial that illustrates some approaches to performing certain programming tasks in R. Always remember, there are many ways to perform similar or identical tasks, and the single best workflow and approach to programming is the one that works best for you!

6.1 Preamble

The goal of this tutorial is to provide the tools to comprehend the structure of certain common types of animal movement data, as well as to load, process and visualize movement data using R in RStudio.

In particular we will be accessing data from Movebank, a free, online database of animal tracking data, where users can store, process, and share (or not share) location data. In the diverse and highly fragmented world of animal movement ecology, it is the closest thing to a “standard” format for movement data.

For this tutorial, several packages for downloading, processing and visualizing data are required. We will use: plyr, dplyr, magrittr, lubridate, move, sp, sf, mapview and ggplot2. Notably, the move package allows you to access and process data from Movebank directly from R.

Install and load these packages, for example via:

# create packages list
packages <- list("plyr","dplyr","magrittr", 
                 "lubridate","move","sp","sf", 
                 "mapview","ggplot2")
# load them all (returns TRUE for each package that loaded successfully)
sapply(packages, require, character.only = TRUE)
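If some of these packages are not yet installed, a minimal sketch for installing only the missing ones (re-using the packages list defined above) might look like:

# install just the packages that are not yet present (a one-time step)
installed <- sapply(packages, requireNamespace, quietly = TRUE)
if (any(!installed)) install.packages(unlist(packages[!installed]))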

6.2 Movebank Data

Here as an example we will use elk (Cervus elaphus1) movement data from the Ya Ha Tinda Long-Term Elk Monitoring Project (Hebblewhite et al. 2020), which are hosted on Movebank (https://www.movebank.org). The data set that we are using is a subset of the full data, and contains GPS locations of 10 female elk from Banff National Park, in Alberta. You can access the data as a .csv file at this link. But it is worth visiting Movebank, getting an account, and learning how to navigate the website. For example, you can log in to Movebank, click Data > Studies, change Studies where I am the Data Manager to All studies, and search for “Ya Ha Tinda elk project, Banff National Park, 2001-2020 (females)”. Click Download > Download data and follow the further instructions.

  • 1 Or is it Cervus canadensis?

6.2.1 Loading movement data in R

If you are working from data on Movebank - including data with controlled or limited access that you have been granted access to - you can load the data with the move::getMovebankData() function (recall, the move:: notation just means that the function comes from the move package), using your own login and password. This is a two-step process. First, you have to create an object that contains your login information:

    require(move)
    mylogin <- movebankLogin(username='XXX', password='XXX')

Replace the XXX’s with your own credentials. Next, you use that login object in the getMovebankData() function:

    elk_move <- getMovebankData(
      study = "Ya Ha Tinda elk project, Banff National Park, 2001-2020 (females)",
      login = mylogin, removeDuplicatedTimestamps=TRUE)

This step might take a few minutes, since the data set contains more than 1.5 million observations (GPS locations), and the download time depends on the quality and speed of your internet connection.

The resulting object is a MoveStack, which can be converted to a data.frame using the as.data.frame() function. Note that the as.data.frame() function converts the object to a data.frame by extracting hidden elements from the MoveStack object; data.frame(), on the contrary, would create a data.frame from the MoveStack object using only the visible elements.

    elk_df <- as.data.frame(elk_move)
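As a quick (optional) check of the distinction described above, comparing the column counts of the two conversions should show that as.data.frame() recovers more of the MoveStack’s contents:

# as.data.frame() should return more columns than data.frame()
ncol(as.data.frame(elk_move))
ncol(data.frame(elk_move))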

For all the examples in this chapter, we will only use a subset of the Ya Ha Tinda data set (i.e., 10 random individuals), select only the columns we are interested in, and rename them for simplicity.

elk_df <- elk_df %>% 
  mutate(ID = trackId, Time = timestamp, 
         Lon = location_long, Lat = location_lat) %>% 
  subset(select = c("ID", "Time", "Lon", "Lat", "sex", "number_of_events")) %>% 
  subset(ID %in% sample(unique(ID), 10, replace = FALSE))

The coordinate reference system of the data set is WGS84. To add coordinates in a metric system, we will convert the data.frame to an sf (i.e., simple feature) object, using the Lon, Lat coordinates with coordinate reference system EPSG:4326. We then reproject to a metric coordinate system (here UTM Zone 10N, EPSG:32610; see the visualization chapter for more details).

    elk_df <- elk_df %>% 
      st_as_sf(coords = c("Lon", "Lat"), crs = 4326) %>% 
      mutate(Lon = st_coordinates(.)[,1], Lat = st_coordinates(.)[,2]) %>% 
      st_transform(crs = 32610) %>% 
      mutate(X = st_coordinates(.)[,1], Y = st_coordinates(.)[,2]) %>% 
      as.data.frame %>% mutate(geometry = NULL)
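Since we now have coordinates in both systems, this is a good moment for a quick interactive look at the data with mapview (a sketch; the zcol and cex styling arguments are just illustrative):

# re-build an sf object from the UTM coordinates and map it, colored by ID
elk_df %>% 
  st_as_sf(coords = c("X", "Y"), crs = 32610) %>% 
  mapview(zcol = "ID", cex = 2)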

6.2.2 Loading the data from your computer

If you downloaded the data to your computer from the website, you can use the read.csv() function to load it into R, using the path of the file on your disk.

    elk_df <- read.csv("data/elk.csv")

    6.2.3 Data structure

Now that we have loaded the data, we can explore its structure, dimensions, data types, etc. The following functions all provide some important or, at least, useful information.

The “class” of the object is given by looking at the object in the Environment pane, or by the class() or, more informative still, is() functions:

    class(elk_df)
    [1] "data.frame"
    is(elk_df)
    [1] "data.frame" "list"       "oldClass"   "vector"    

    When using the getMovebankData() function, the resulting object is a MoveStack. We can use the as.data.frame() (or just data.frame()), to transform it into a data frame and see all the elements of the object. If you loaded the data from your computer, it is a data.frame.

The dimensions (number of rows and columns) are given by dim():

    dim(elk_df)
    [1] 97825     7

    The names of the columns with names():

    names(elk_df)
    [1] "ID"   "Time" "sex"  "Lon"  "Lat"  "X"    "Y"   

    The first few rows with head(), useful for a quick look:

    head(elk_df)
    ID Time sex Lon Lat X Y
    GR122 2006-03-06 16:22:41 f -115.5828 51.75451 597820.5 5734685
    GR122 2006-03-06 19:01:07 f -115.5830 51.75455 597806.6 5734689
    GR122 2006-03-06 23:01:48 f -115.5865 51.73794 597600.9 5732837
    GR122 2006-03-07 03:01:46 f -115.5774 51.73691 598231.4 5732735
    GR122 2006-03-07 11:01:46 f -115.5828 51.75442 597820.7 5734675
    GR122 2006-03-07 15:01:05 f -115.5828 51.75447 597820.6 5734680

    Movement data at the very minimum contains x and y (here, columns X and Y) or longitude/latitude (Lon and Lat) coordinates, and an associated timestamp (Time). If there is more than one individual, there will also be a unique identifier (ID). Additional columns might also be provided, as here with the sex of the individual.2

  • 2 A warning: you might have loaded data with the readr::read_csv() function instead of read.csv() - that is the default behavior if you click on a data file in newer versions of RStudio, for example. This function will convert the entire sex column to the Boolean FALSE - implying that all of these elk are abstinent. This is an example of new and improved versions of software occasionally being a wee bit too smart for their own good.
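    If you do use read_csv(), one workaround (a sketch, assuming the same data/elk.csv file) is to declare the column type explicitly so that “f” stays a character:

    # force readr to keep the sex column as character
    elk_df <- readr::read_csv("data/elk.csv", 
                              col_types = readr::cols(sex = readr::col_character()))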

More detail about the structure of the data frame is given by str():

    str(elk_df)
    'data.frame':   97825 obs. of  7 variables:
     $ ID  : chr  "GR122" "GR122" "GR122" "GR122" ...
     $ Time: chr  "2006-03-06 16:22:41" "2006-03-06 19:01:07" "2006-03-06 23:01:48" "2006-03-07 03:01:46" ...
     $ sex : chr  "f" "f" "f" "f" ...
     $ Lon : num  -116 -116 -116 -116 -116 ...
     $ Lat : num  51.8 51.8 51.7 51.7 51.8 ...
     $ X   : num  597821 597807 597601 598231 597821 ...
     $ Y   : num  5734685 5734689 5732837 5732735 5734675 ...

We see that all of the variables are character or numeric. For easy manipulation and visualization, it is more convenient if ID and sex are factors. To do so, we can use the mutate() function, from the plyr package, which allows you to create, delete or modify columns.

    # set ID and sex as factors
    elk_df <- elk_df %>% mutate(ID = as.factor(ID), sex = as.factor(sex))

    6.2.4 Manipulating time

A key characteristic of movement data is that they are spatio-temporal: each GPS location is associated with a date and time. In our case, the Time column is a character object. The function as.POSIXct() converts a time column (e.g., one read in as character) into a POSIXct object - a standard computer format for temporal data (see ?as.POSIXct for more information). Note that dates in R are always displayed in the format Year-Month-Day Hour:Minute:Second (yyyy-mm-dd hh:MM:ss). Just as when transforming our variables into factors, we use the plyr::mutate() function to transform Time into POSIXct format.

    elk_df <- elk_df %>% mutate(Time = as.POSIXct(Time, tz = "UTC"))

This can also be done with some of the generally easier-to-use functions of the lubridate package. For example, ymd_hms() will convert from this format to POSIXct:

    elk_df <- elk_df %>% mutate(Time = ymd_hms(Time, tz = "UTC"))

whereas if the data were loaded from the original file in some obnoxious format, like the U.S. standard mm/dd/yyyy, they could be converted via:

    elk_df <- elk_df %>% mutate(Time = mdy_hms(Time, tz = "UTC"))

    and so on.
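For reference, the base-R equivalent of the last conversion uses an explicit format string (the mm/dd/yyyy input shown in the comment is hypothetical):

# base-R equivalent for times stored like "03/06/2006 16:22:41"
elk_df <- elk_df %>% 
  mutate(Time = as.POSIXct(Time, format = "%m/%d/%Y %H:%M:%S", tz = "UTC"))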

The lubridate package also offers useful functions for extracting information from POSIX times: for example, the hour of day (hour()), the day of year (yday()), the month (month()) or the year (year()). A convenient feature of mutate() is that multiple commands can be strung together in a single call, for example:

    elk_df <- elk_df %>% mutate(doy = yday(Time),
                                Month = month(Time),
                                Year = year(Time))
    head(elk_df)
    ID Time sex Lon Lat X Y doy Month Year
    GR122 2006-03-06 16:22:41 f -115.5828 51.75451 597820.5 5734685 65 3 2006
    GR122 2006-03-06 19:01:07 f -115.5830 51.75455 597806.6 5734689 65 3 2006
    GR122 2006-03-06 23:01:48 f -115.5865 51.73794 597600.9 5732837 65 3 2006
    GR122 2006-03-07 03:01:46 f -115.5774 51.73691 598231.4 5732735 66 3 2006
    GR122 2006-03-07 11:01:46 f -115.5828 51.75442 597820.7 5734675 66 3 2006
    GR122 2006-03-07 15:01:05 f -115.5828 51.75447 597820.6 5734680 66 3 2006

Our data set is more or less as we want it. Depending on your research questions, you might have additional variables, and you should make sure that those variables are in the correct format. It is also worth noting that once you know what your raw data set looks like and how it should look before processing, all those manipulations (loading the data, redefining the class of the columns, adding new columns) can be done in a single chained command:

    elk_df <- read.csv("data/elk.csv") %>% 
      mutate(ID = as.factor(ID), sex = as.factor(sex),
             Time = ymd_hms(Time, tz = "UTC"),
             doy = yday(Time), Month = month(Time), Year=year(Time))

    6.3 Exploring raw movement data

Now that the data set is more manageable, we can explore the data: in particular, the number of individuals, their sex, the duration of their monitoring, the fix rate, the amount of missing data, etc.

However, since movement data are time series, it is important when manipulating them to FIRST order the data by individual and time. The arrange() function from the plyr (or dplyr) package is very handy:

    elk_df <- elk_df %>% arrange(ID, Time)

Here, we walk through a sequence of some basic questions to be asked of movement data.

    6.3.1 How many individuals are there?

And what are their IDs?

    length(unique(elk_df$ID)) 
    [1] 10
    unique(elk_df$ID)
     [1] GR122       GR133       GR159       GR513       OR39        YL115_OR34 
     [7] YL121_BL250 YL5         YL77        YL89       
    10 Levels: GR122 GR133 GR159 GR513 OR39 YL115_OR34 YL121_BL250 YL5 ... YL89

    6.3.2 How many females/males?

    There are several “base” R ways to count the males and females. For example the following three commands all give (essentially) the same output:

    unique(elk_df[,c("sex","ID")])
    unique(data.frame(elk_df$sex, elk_df$ID))
    with(elk_df, unique(data.frame(sex, ID)))
    sex ID
    1 f GR122
    21568 f GR133
    39147 f GR159
    45195 f GR513
    59780 f OR39
    67409 f YL115_OR34
    83906 f YL121_BL250
    87390 f YL5
    90251 f YL77
    92890 f YL89

    i.e., a unique combination of sex and ID. An (uninteresting) count of those sexes is given by:

    id.sex.count <- unique(elk_df[,c("sex","ID")])
    table(id.sex.count$sex)
    
     f 
    10 

    A more trendy approach is to use functions in the dplyr package, which allows you to summarize information per group.

    elk_df %>% group_by(ID) %>% summarize(sex = unique(sex))
    # A tibble: 10 × 2
       ID          sex  
       <fct>       <fct>
     1 GR122       f    
     2 GR133       f    
     3 GR159       f    
     4 GR513       f    
     5 OR39        f    
     6 YL115_OR34  f    
     7 YL121_BL250 f    
     8 YL5         f    
     9 YL77        f    
    10 YL89        f    

The resulting object is a tibble, which is very similar to a data frame but is the preferred output of dplyr manipulations.
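Equivalently, a compact sketch with dplyr’s distinct() and count() tallies the sexes directly:

# one row per animal, then count by sex
elk_df %>% distinct(ID, sex) %>% count(sex)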

    6.3.3 How many locations per individual?

We can look at the average number of GPS locations per individual, and other statistics (e.g., the standard deviation), using the mean(), sd(), and related functions.

But first, we want to make sure that there are no missing locations (missing longitude or latitude) in this data set. is.na() checks whether each element is an NA or not, and table() just counts:

    # missing longitude
    table(is.na(elk_df$Lon))
    
    FALSE 
    97825 
    # missing latitude
    table(is.na(elk_df$Lat))
    
    FALSE 
    97825 

    No missing locations! That’s a relief.
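Incidentally, a compact way to check every column at once is:

# count NAs in each column in one call
colSums(is.na(elk_df))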

    Let’s look at the number of locations per individual:

    # average number of GPS locations per individual
    table(elk_df$ID) %>% mean
    [1] 9782.5
    # standard deviation of the number of locations per individual
    table(elk_df$ID) %>% sd
    [1] 7059.185

On average, an individual has more than 9000 locations. However, the standard deviation is almost as large, which means that some individuals have many locations while others have far fewer.

    # maximum number of GPS locations per individual
    table(elk_df$ID) %>% max
    [1] 21567
    # minimum number of GPS locations per individual
    table(elk_df$ID) %>% min
    [1] 2639

You can also access (almost) all of these statistics by applying the all-purpose summary() function to the table() of the IDs. Note that you need to convert it to a data frame first (the output of table() in R is actually quite weird):

    # summary of the number of GPS locations per individual
    table(elk_df$ID) %>% data.frame %>% summary
             Var1        Freq      
     GR122     :1   Min.   : 2639  
     GR133     :1   1st Qu.: 3847  
     GR159     :1   Median : 6838  
     GR513     :1   Mean   : 9782  
     OR39      :1   3rd Qu.:16019  
     YL115_OR34:1   Max.   :21567  
     (Other)   :4                  

    6.3.4 What is the duration of monitoring?

The functions min(), max() and range() are self-explanatory, but importantly they work excellently with properly formatted time objects. The diff() function calculates differences between subsequent elements of a vector. But for time (POSIX) data, it is best to use the difftime(t1, t2) function, since it allows you to specify the units of the time difference (hours, days, etc.). Otherwise, strange things can happen. For example, naively looking at the range of time differences across the whole data set gives:

    range(diff(elk_df$Time))
    Time differences in secs
    [1] -256413590  293543997

The number of seconds is not SO helpful - and the negative value appears because the data are stacked by individual, so time jumps backwards at each new ID. But difftime() is a bit more useful:

    difftime(max(elk_df$Time), min(elk_df$Time), units = "days")
    Time difference of 5078.5 days

Better. That’s a lot of days - more than 10 years. Years is not an option with difftime(), so to manipulate the output statistically you need to convert it to numeric. Thus the number of years:

    (difftime(max(elk_df$Time), min(elk_df$Time), units = "days") %>% as.numeric)/365.25
    [1] 13.90417

    That’s a good, long-term dataset!
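For what it’s worth, lubridate offers an alternative route to the same number, via interval() and time_length() (a sketch):

# span of the study in years, computed directly
time_length(interval(min(elk_df$Time), max(elk_df$Time)), unit = "years")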

Anyways, what we really want is to figure this out for each individual. One approach is to use plyr::ddply() commands. This function allows you to apply functions to different groups (subsets) of a data set. Here’s an example:

    elk_df %>% ddply("ID", plyr::summarize, 
                     start= min(Time), end = max(Time)) %>% 
      mutate(duration = difftime(end, start, units = "days"))
    ID start end duration
    GR122 2006-03-06 16:22:41 2007-06-08 07:00:58 458.6099 days
    GR133 2006-03-17 18:26:03 2007-01-06 23:45:31 295.2219 days
    GR159 2005-03-27 16:33:15 2005-11-09 13:00:50 226.8525 days
    GR513 2015-02-28 01:00:47 2016-06-17 10:46:06 475.4065 days
    OR39 2014-02-26 01:00:42 2014-11-23 15:02:51 270.5848 days
    YL115_OR34 2014-02-23 01:01:02 2017-01-09 17:00:48 1051.6665 days

    The (near) equivalent with dplyr commands - just jumping straight to the duration:

    elk_df %>% group_by(ID) %>% 
      dplyr::summarize(duration = difftime(max(Time), min(Time), units = "days")) 
    # A tibble: 10 × 2
       ID          duration      
       <fct>       <drtn>        
     1 GR122        458.6099 days
     2 GR133        295.2219 days
     3 GR159        226.8525 days
     4 GR513        475.4065 days
     5 OR39         270.5848 days
     6 YL115_OR34  1051.6665 days
     7 YL121_BL250  378.1383 days
     8 YL5          300.4165 days
     9 YL77         250.0837 days
    10 YL89         229.8036 days

To get statistical summaries of these durations, you have to convert the time range object (which has its own difftime class) into numeric. Thus:

    elk_df %>% group_by(ID) %>% 
      dplyr::summarize(time_range = difftime(max(Time), min(Time), units ="days") %>% 
                         as.numeric) %>% summary
              ID      time_range    
     GR122     :1   Min.   : 226.9  
     GR133     :1   1st Qu.: 255.2  
     GR159     :1   Median : 297.8  
     GR513     :1   Mean   : 393.7  
     OR39      :1   3rd Qu.: 438.5  
     YL115_OR34:1   Max.   :1051.7  
     (Other)   :4                   

The median duration of monitoring is ~300 days, or a little less than a year. Some individual(s) have ~3 years of monitoring, and some under eight months.

We can visualize the monitoring duration for each individual on a plot, by extracting the start date and the end date of the monitoring for each individual.

    n.summary <- elk_df %>% group_by(ID) %>% 
      summarize(start = min(Time), end = max(Time)) 

Here’s a version using ggplot2:

    require(ggplot2)
    ggplot(n.summary, aes(y = ID, xmin = start, xmax = end)) + 
        geom_linerange() 

    It may make more sense to sort the individuals not alphabetically, but by the time of release. Here’s an approach, which relies on reordering the factor levels of the ID column (an often fussy task):

    n.summary <- elk_df %>% group_by(ID) %>% 
      summarize(start = min(Time), end = max(Time)) %>% 
      arrange(start) %>% 
      mutate(ID = factor(ID, levels = as.character(ID)))
      
    ggplot(n.summary, aes(y = ID, xmin = start, xmax = end)) + 
        geom_linerange() 
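Incidentally, base R’s reorder() can set the factor levels in a single step; a sketch equivalent to the arrange() + factor() combination above:

# order the ID levels by (numeric) start time
n.summary <- elk_df %>% group_by(ID) %>% 
  summarize(start = min(Time), end = max(Time)) %>% 
  mutate(ID = reorder(ID, as.numeric(start)))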

This is quick and easy and attractive enough. But, for the record, if you wanted to use base plotting functions (which, for many applications, can be much easier to customize), code for a similar plot would look something like this:

    with(n.summary, {
      plot(start, ID, xlim = range(start, end), 
           type = "n", yaxt = "n", ylab = "", xlab = "")
      segments(start, as.integer(ID), end, as.integer(ID), lwd = 2)
      mtext(side = 2, at = 1:nrow(n.summary), ID, cex = 0.7, las = 1, line = .2)
      })

    In any case, on this figure each line represents the duration of the monitoring (x axis) for a given individual (y axis). While we see the beginning and end of the monitoring for each individual, we cannot see if there are any gaps in the monitoring.

To see whether there are one or multiple gaps, we can create a vector of dates for each individual (i.e., containing only the date and not the time, to simplify things and get only one row per day per individual). To get the date from a time vector, we use the as.Date() function. We then use the slice() command to keep only one row per day for each individual. Do not forget to arrange by ID and date, as all these manipulations can sometimes mess up the ordering of your data.

    elk_days <- elk_df %>% mutate(date = as.Date(Time)) %>% 
      group_by(ID, date) %>% 
      slice(1) %>% arrange(ID, date) 
    ggplot(elk_days, aes(y = ID, x = date)) + 
        geom_point(shape = 20, size = .5, alpha = .1) 

Similar to the previous figure, each line represents the monitoring dates for a given individual. But from this figure, we can see that there are some gaps in the monitoring of some individuals.

    6.3.5 What is the fix rate?

The fix rate, or the time lag between successive locations, can be extracted by applying the difftime() function to the Time column. Again, this function needs to be applied to each individual separately. Here, we subset the data set by ID, and apply a function which adds a column dtime to each subset. Note that since the vector of time differences is one element shorter than the vector of times, we add a missing value at the beginning of each vector, so that each value represents the difference in time from the previous location.

elk_df <- elk_df %>% ddply("ID", mutate, 
        dtime = c(NA, difftime(Time[-1], Time[-length(Time)], units = "hours")))
head(elk_df)
ID Time sex Lon Lat X Y doy Month Year dtime
GR122 2006-03-06 16:22:41 f -115.5828 51.75451 597820.5 5734685 65 3 2006 NA
GR122 2006-03-06 19:01:07 f -115.5830 51.75455 597806.6 5734689 65 3 2006 2.640556
GR122 2006-03-06 23:01:48 f -115.5865 51.73794 597600.9 5732837 65 3 2006 4.011389
GR122 2006-03-07 03:01:46 f -115.5774 51.73691 598231.4 5732735 66 3 2006 3.999444
GR122 2006-03-07 11:01:46 f -115.5828 51.75442 597820.7 5734675 66 3 2006 8.000000
GR122 2006-03-07 15:01:05 f -115.5828 51.75447 597820.6 5734680 66 3 2006 3.988611
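For the record, an equivalent dplyr sketch uses lag() within each individual:

# time difference (hours) to the previous location, by individual
elk_df <- elk_df %>% group_by(ID) %>% 
  mutate(dtime = as.numeric(difftime(Time, lag(Time), units = "hours"))) %>% 
  ungroup() %>% as.data.frame()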

    What are the statistics (min, max, mean, median, …) of this fix rate?

    elk_df$dtime %>% summary

On average, the fix rate is ~1 hour (the median is 15 minutes). The smallest time lag is 1 second and the largest is a little more than a year (10664 hours ≈ 444 days). The latter probably comes from an animal that was captured and equipped once, then recaptured about a year after the end of its monitoring. Understanding the sources of these gaps underscores the importance of having a relationship with the people who actually collected the data, in order to understand their monitoring strategy and the structure of the data, and to decide how to process the data given your research questions.
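To see which animal that largest gap belongs to, a quick sketch using the dtime column we just created:

# row with the single largest time gap
elk_df[which.max(elk_df$dtime), c("ID", "Time", "dtime")]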

    6.4 Processing data

The previous section illustrated a few typical approaches to exploring a movement data set. Processing the data - broadly speaking - implies that we will be organizing it, filtering it, or adding information to the data frame in ways that contribute meaningfully to some key research question. By that definition, for example, the inclusion of the individual-specific fix rate above is a key bit of data processing.

    As an example, we might investigate the following question: Is there a difference in movement rates between winter and summer, for female elk?

    6.4.1 Subsetting by season

    To answer this question, we need to focus on movement data of female elk during winter and summer only. Let’s say (arbitrarily) that winter is only January and February, and summer is just July and August. We can use month() to subset the data accordingly.

elk_winter_summer <- elk_df %>% mutate(Month = month(Time)) %>%
  subset(Month %in% c(1, 2, 7, 8))

To add a column “season”, we can use the ifelse() function, which returns different values depending on whether a given element in a vector satisfies a condition.

    elk_winter_summer <- elk_winter_summer %>% 
      mutate(season = ifelse(Month < 3, "Winter", "Summer"))
    table(elk_winter_summer$season)
    
    Summer Winter 
     29959   2306 

You could also perform this operation with a vector of days of the year and the useful cut() function, which transforms numeric data into ordered factors. Thus, to obtain the breaks:

    season.dates <- c(winter.start = yday("2023-01-01"), 
                      winter.end = yday("2023-02-28"),
                      summer.start = yday("2023-07-01"),
                      summer.end = yday("2023-08-31"))
    season.dates
    winter.start   winter.end summer.start   summer.end 
               1           59          182          243 
    cut.dates <- c(season.dates, 367)
    elk_winter_summer <- elk_df %>% 
      mutate(season = cut(yday(Time), cut.dates, labels = c("winter","other","summer","other"))) %>% 
      subset(season != "other") %>% droplevels
    table(elk_winter_summer$season)
    
    winter summer 
      2176  17212 

    6.4.2 Estimating movement rate

To estimate the movement rate between subsequent steps for each individual and each season, we will use what we learned in Section 4.1:

    1. Create a Z vector combining the X and Y coordinates
    2. Calculate the step lengths (SL)
    3. Calculate the time difference of the steps
4. Calculate each step’s movement rate

As we need to do this for each individual, year and season separately, we will use ddply(), as before. Note that, to do this most effectively, it is nice to write our own function that performs all the computations we need. The key in this function is that the movement rate (MR) is the step length divided by the time difference, converted to km/hour. Here’s one such function:

getMoveStats <- function(df){
  # df - a generic data frame that contains X, Y and Time columns
  Z <- df$X + 1i*df$Y
  Time <- df$Time
  Step <- c(NA, diff(Z)) # we add the extra NA because there is no step to the first location
  dT <- c(NA, difftime(Time[-1], Time[-length(Time)], units = "hours") %>% as.numeric)
  
  SL <- Mod(Step)/1e3 # convert to km - so the speed is in km/hour
  MR <- SL/dT # computing the movement rate

  # this is what the function returns
  data.frame(df, Z, dT, Step, SL, MR)
}

We took care to pick this function apart into individual pieces, and to understand that it returns a new data frame with the additional columns appended to the original. The ddply() command will apply this function to every individual in every season in every year. This is now very quick:

    elk_winter_summer <- elk_winter_summer %>% 
      plyr::ddply(c("ID", "Year", "season"), getMoveStats)
    ID Time sex Lon Lat X Y doy Month Year season Z dT Step SL MR
    GR122 2006-07-02 00:00:19 f -115.5844 51.68669 597856.4 5727141 183 7 2006 summer 597856+5727141i NA NA NA NA
    GR122 2006-07-02 00:15:17 f -115.5841 51.68658 597877.4 5727129 183 7 2006 summer 597877+5727129i 14.96667 20.9744-11.830i 0.0240806 0.0016089
    GR122 2006-07-02 00:30:16 f -115.5836 51.68594 597913.3 5727058 183 7 2006 summer 597913+5727058i 14.98333 35.9429-70.504i 0.0791373 0.0052817
    GR122 2006-07-02 00:45:16 f -115.5828 51.68509 597970.4 5726965 183 7 2006 summer 597970+5726965i 15.00000 57.1349-93.454i 0.1095356 0.0073024
    GR122 2006-07-02 01:00:48 f -115.5822 51.68451 598013.2 5726901 183 7 2006 summer 598013+5726901i 15.53333 42.7284-63.696i 0.0767000 0.0049378
    GR122 2006-07-02 02:00:28 f -115.5820 51.68433 598027.4 5726881 183 7 2006 summer 598027+5726881i 59.66667 14.2142-19.749i 0.0243324 0.0004078

    6.4.3 Quick analysis of movement rates

We are now ready (finally) to answer the question: is there a difference in movement rate between winter and summer? We can start with a quick boxplot of the movement rates against individual IDs and season. The distributions are highly skewed and quite variable, with some animals really on the move, and some spending a lot of time not moving at all.

    ggplot(elk_winter_summer, aes(ID, MR)) + geom_boxplot() + facet_wrap(.~season)
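Because the distributions are so skewed, it can help to redraw the boxplot on a log scale (a sketch; note that the zero movement rates must be dropped first):

ggplot(subset(elk_winter_summer, MR > 0), aes(ID, MR)) + 
  geom_boxplot() + scale_y_log10() + facet_wrap(.~season)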

    Some season & individual summary stats:

mr_summarystats <- elk_winter_summer %>% ddply(c("ID","season"), plyr::summarize,
  min = min(MR, na.rm = TRUE), max = max(MR, na.rm = TRUE),
  n = length(MR), NA.count = sum(is.na(MR)),
  Zero.count = sum(MR == 0, na.rm = TRUE))
mr_summarystats
    ID season min max n NA.count Zero.count
    GR122 summer 0.0000000 0.0827437 3806 1 96
    GR133 winter 0.0000000 0.0777082 314 1 10
    GR133 summer 0.0000000 0.0871294 3782 1 44
    GR159 summer 0.0000000 0.1078212 3012 1 21
    GR513 winter 0.0000000 0.6543712 677 2 10
    GR513 summer 0.0000093 0.0271023 711 1 0
    OR39 winter 0.0000000 0.5450862 35 1 1
    OR39 summer 0.0005555 1.8531111 572 1 0
    YL115_OR34 winter 0.0000000 0.8679974 270 3 1
    YL115_OR34 summer 0.0000000 2.6096342 1279 2 2
    YL121_BL250 winter 0.0000000 1.6492821 539 1 1
    YL121_BL250 summer 0.0011136 2.7874279 541 1 0
    YL5 winter 0.0000000 0.0761834 251 1 3
    YL5 summer 0.0000183 0.0480011 515 1 0
    YL77 winter 0.0005551 1.3129208 90 1 0
    YL77 summer 0.0011128 2.0078617 622 1 0
    YL89 summer 0.0000000 0.1279309 2372 1 14

Note there are a lot of 0’s in the data, which is rather remarkable considering the relative imprecision of GPS data - certainly high enough to show variation at the 1 meter scale even for a stationary animal. This is an evident red flag … one might be inclined to simply remove those data entirely.

    Without going into too much detail of the statistical analysis of these data, the most appropriate statistical test would use a linear mixed effects model with individuals as random effects, and look something like this:

    fit <- lme4::lmer(log(MR) ~ season + (1 + season|ID), 
                data = elk_winter_summer %>% subset(MR != 0 & !is.na(MR)))
    summary(fit)
    Linear mixed model fit by REML ['lmerMod']
    Formula: log(MR) ~ season + (1 + season | ID)
       Data: elk_winter_summer %>% subset(MR != 0 & !is.na(MR))
    
    REML criterion at convergence: 70408.4
    
    Scaled residuals: 
        Min      1Q  Median      3Q     Max 
    -4.2795 -0.6753  0.0264  0.7044  3.2786 
    
    Random effects:
     Groups   Name         Variance Std.Dev. Corr 
     ID       (Intercept)  3.355    1.832         
              seasonsummer 1.001    1.000    -0.14
     Residual              2.295    1.515         
    Number of obs: 19164, groups:  ID, 10
    
    Fixed effects:
                 Estimate Std. Error t value
    (Intercept)   -4.6853     0.6123  -7.653
    seasonsummer  -0.2238     0.3738  -0.599
    
    Correlation of Fixed Effects:
                (Intr)
    seasonsummr -0.285

Movement rates are slightly lower in the summer - but, perhaps surprisingly, not with meaningful statistical significance (|t value| < 1).

    Note that the residuals are quite well-behaved with the log-transformation.

    plot(fit)

    6.5 Conclusions

    6.6 References

    Hebblewhite, Mark, Evelyn H. Merrill, Hans Martin, Jodi E. Berg, Holger Bohm, and Scott L. Eggeman. 2020. “Data from: Study "Ya Ha Tinda Elk Project, Banff National Park, 2001-2020 (Females)".” Movebank Data Repository. https://doi.org/10.5441/001/1.5G4H5T6C.