I. More manipulation of movement data

Pick a movement data set of your choice with multiple individuals (but no more than 20). Ideally, a dataset that you plan to investigate yourself. Alternatively, work with some subset of the Ya Ha Tinda elk dataset on Movebank, following the instructions in Chapter: Movement Data in R.

We’ll work with a dataset of fishers (Martes pennanti) from Albany, NY, that ships with the amt package. We’ll simplify again to x, y, time and id, and turn it from a “tibble” into a data frame. The details of the code below aren’t important, but they make this example fully replicable.

require(amt)
data(amt_fisher)
# keep only the coordinates, timestamp and individual id, as a plain data frame
fisher <- data.frame(amt_fisher[,c("x_","y_","t_","id")])
names(fisher) <- c("x","y","time","id")
str(fisher)
## 'data.frame':    14230 obs. of  4 variables:
##  $ x   : num  1782673 1782680 1782683 1782686 1782681 ...
##  $ y   : num  2402297 2402297 2402292 2402305 2402297 ...
##  $ time: POSIXct, format: "2009-02-11 12:16:45" "2009-02-11 12:31:38" ...
##  $ id  : chr  "M1" "M1" "M1" "M1" ...
  1. Create a table that reports for each individual (i) the number of locations, (ii) the start time, end time and duration of the monitoring, (iii) the median, mean and standard deviation of the duty cycle (i.e. the duration between subsequent observations).

First, let’s see how many individuals we have:

table(fisher$id)
## 
##   F1   F2   M1   M4 
## 1349 3004  919 8958

Just four, two females and two males. We can now summarize these data per individual. I like plyr commands for this, i.e. ddply combined with summarize:

require(plyr); require(magrittr)
duration.table <- fisher %>% ddply("id", summarize, 
                 n.locs = length(x), 
                 start = min(time) %>% as.Date, 
                 end = max(time) %>% as.Date,
                 duration = difftime(end, start, units = "days") %>% round(1))
duration.table
##   id n.locs      start        end duration
## 1 F1   1349 2011-02-11 2011-03-03  20 days
## 2 F2   3004 2010-12-16 2011-01-04  19 days
## 3 M1    919 2009-02-11 2009-03-04  21 days
## 4 M4   8958 2010-02-09 2010-03-31  50 days
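Note that start and end are converted with as.Date before difftime is called, so duration is always a whole number of days and round(1) has nothing left to round. A minimal base-R sketch with two made-up timestamps (hypothetical, not taken from the fisher data) shows the difference between differencing the dates and differencing the full POSIXct times:

```r
# Two hypothetical fix times (UTC), standing in for min(time) and max(time)
first.fix <- as.POSIXct("2010-02-09 18:00:00", tz = "UTC")
last.fix  <- as.POSIXct("2010-03-01 06:00:00", tz = "UTC")

# Differencing Dates drops the time of day, so the result is always whole days
d.dates <- difftime(as.Date(last.fix), as.Date(first.fix), units = "days")

# Differencing the full timestamps keeps the fraction of a day
d.full <- round(difftime(last.fix, first.fix, units = "days"), 1)

d.dates  # Time difference of 20 days
d.full   # Time difference of 19.5 days
```

Inside the ddply call, the same idea would mean computing duration from min(time) and max(time) directly, rather than from the converted dates.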

That duration table is quite tidy! To get some of the details of the duty cycles, I’ll first compute the intervals by “id”, and then summarize:

dutycycle.table <- fisher %>% ddply("id", summarize, 
                 dT = difftime(time[-1],time[-length(time)], units = "mins")) %>% 
  ddply("id", summarize, 
        dT.median = median(dT), dT.mean = mean(dT), dT.sd = sd(dT))
dutycycle.table
##   id      dT.median        dT.mean     dT.sd
## 1 F1 10.016625 mins 20.693027 mins  95.18721
## 2 F2  2.033350 mins  8.930941 mins  48.05628
## 3 M1 15.033350 mins 32.745370 mins 104.20486
## 4 M4  2.033333 mins  8.041761 mins  43.98099

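Just from these numbers it is clear that there are some VERY large gaps in the data. How can I tell? The means are between two and four and a half times the medians, and the standard deviations are three to five and a half times the means. That is the signature of a strongly right-skewed distribution, in which a few very long intervals pull the mean and standard deviation up.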
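One way to make “very large gaps” concrete is to count, per individual, the intervals that are much longer than that individual’s typical duty cycle. The sketch below uses base R on a small made-up track; the ids, times and the 5-times-the-median threshold are hypothetical choices, not part of the original analysis:

```r
# Hypothetical toy track: fixes every 2 minutes, but "A" has one ~6-hour hole
t0  <- as.POSIXct("2010-02-09 00:00:00", tz = "UTC")
toy <- data.frame(
  id   = rep(c("A", "B"), each = 10),
  time = c(t0 + 120 * c(0:4, 180:184),   # A: long interval between 5th and 6th fix
           t0 + 120 * 0:9)               # B: regular 2-minute fixes
)

# Intervals between consecutive fixes (minutes), per individual
dT <- lapply(split(toy$time, toy$id), function(tt)
  as.numeric(diff(sort(tt)), units = "mins"))

# Hypothetical rule: a "gap" is an interval longer than 5 times the median interval
gap.count <- sapply(dT, function(x) sum(x > 5 * median(x)))
gap.count        # A: 1, B: 0
sapply(dT, max)  # longest interval: 352 min for A, 2 min for B
```

On the real data, the same two steps could be applied to split(fisher$time, fisher$id).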

  2. Report the following statistics:

    1. The mean, standard deviation, median, range, and interquartile range (25% and 75% quantiles) of the number of locations per individual;

Quick & easy with the table we had before, and the handy summary function:

summary(duration.table$n.locs)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     919    1242    2176    3558    4492    8958

Ok, technically the standard deviation is missing:

sd(duration.table$n.locs)
## [1] 3710.836

Not very informative with only 4 individuals. But still, these are basically excellent datasets.
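summary() reports the quartiles but not the range or interquartile range as such, and it rounds its printout to four significant digits by default, which is why the median of 2176.5 is shown as 2176. A base-R sketch computing each requested statistic explicitly, using the per-individual counts from the table(fisher$id) output above:

```r
# Number of locations per individual, copied from the table(fisher$id) output
n.locs <- c(F1 = 1349, F2 = 3004, M1 = 919, M4 = 8958)

range(n.locs)                    # 919 8958
diff(range(n.locs))              # 8039, the range as a single number
quantile(n.locs, c(0.25, 0.75))  # 1241.5 4492.5 (default type-7 quantiles)
IQR(n.locs)                      # 3251
sd(n.locs)                       # about 3710.836
```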

    2. The median, mean, and standard deviation of the location frequency (aka "duty cycle") over all the individuals.

Here we shouldn’t use the per-individual summary table from the previous section, but ALL the raw time intervals pooled together, so that every interval counts once:

dT.table <- fisher %>% ddply("id", summarize, 
                 dT = difftime(time[-1],time[-length(time)], units = "mins"))
summary(dT.table$dT %>% as.numeric)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.100    1.967    2.100   11.022    9.283 1650.317

Ok, so MAINLY an ungodly 2 minutes between fixes: the median is 2.1 minutes, and three quarters of all intervals are under about 9.3 minutes. But also, in at least one place, a leap of 1650 minutes, or 27.5 hours. There will be holes!
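Why pool the raw intervals rather than summarizing the per-individual table? Individuals contribute very different numbers of intervals, so a median of medians is not the median of all intervals. A toy illustration with made-up intervals (hypothetical numbers), which also computes the standard deviation that summary() leaves out:

```r
# Hypothetical intervals (minutes) for two individuals with very different sample sizes
dT.list <- list(A = c(2, 2, 2, 2, 2, 2, 2, 30),   # many short intervals, one long
                B = c(15, 15, 15))                # few, longer intervals

# Median of the per-individual medians: each individual counts once
median(sapply(dT.list, median))   # 8.5

# Median of all pooled intervals: each interval counts once
dT.all <- unlist(dT.list)
median(dT.all)                    # 2
sd(dT.all)                        # the standard deviation summary() does not report
```

For the fisher data, the missing standard deviation would be sd(as.numeric(dT.table$dT)).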

II. Some Figures

Produce the following plots:

  1. A boxplot of the number of locations per individual

This question does not make much sense with only four individuals. But technically, it is a single, uninteresting box with whiskers:

boxplot(duration.table$n.locs)

Beautiful!

  2. A plot of the overall duration of each individual’s data (like the ones in the chapter).

Using the ggplot example from Ophélie’s notes:

require(ggplot2)
fisher %>% ddply("id", summarize, start = min(time), end = max(time)) %>% 
  ggplot(aes(xmin = start, xmax = end, y = id)) + 
  geom_linerange()

Ok, these animals were followed for very short, very intense periods spread over about two years (three winters, 2009 to 2011). We might get a slightly better idea if we facet by animal and free the time scales:

fisher %>% ddply("id", summarize, start = min(time), end = max(time)) %>% 
  ggplot(aes(xmin = start, xmax = end, y = id)) + 
  geom_linerange() + facet_wrap(.~id, ncol = 2, scales = "free_x")

Not an extremely useful plot! But mainly we can see that the monitoring took place in February and March, except for one female (F2), tracked from mid-December to early January.

  3. A plot of the duration for each individual’s data that also illustrates the gaps in the monitoring.

For these data, this is actually a bit tricky because the observation frequency is SO high. Here’s my version:

fisher %>% 
  ggplot(aes(x = time, y = id)) + 
  geom_point(size = 0.5, alpha = 0.4) + 
  facet_wrap(.~id, ncol = 2, scales = "free_x")

Lots of little gaps!
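An alternative to plotting every point is to split each track into continuous “bouts” wherever the interval between fixes exceeds some threshold, and draw one segment per bout. A base-R sketch on a hypothetical track; the 60-minute threshold is an arbitrary assumption:

```r
# Hypothetical track: 2-minute fixes with one 3-hour hole
t0  <- as.POSIXct("2011-02-11 00:00:00", tz = "UTC")
trk <- data.frame(id = "F1", time = t0 + 60 * c(0, 2, 4, 6, 186, 188, 190))

gap.threshold <- 60  # minutes; a hypothetical cut-off for "a gap"

# A new bout starts at the first fix and after every interval longer than the threshold
dT.min   <- c(Inf, as.numeric(diff(trk$time), units = "mins"))
trk$bout <- cumsum(dT.min > gap.threshold)

# One row per bout: its start and end time
bouts <- do.call(rbind, lapply(split(trk, trk$bout), function(b)
  data.frame(id = b$id[1], bout = b$bout[1], start = min(b$time), end = max(b$time))))
bouts
```

Applied per id to the fisher data, a table like bouts could be passed to ggplot(aes(xmin = start, xmax = end, y = id)) + geom_linerange(), as in the duration plot above, so that gaps longer than the threshold show up as breaks in the lines.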

III. Some Movement Statistics

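  1. For each location, compute the step length preceding that location, the time interval of that step, and the movement rate.

# z = x + iy stores each location as a complex number; step = displacement from the previous fix,
# sl = step length, dT = time since the previous fix (minutes), move.rate = sl / dT (m/min)
fisher.mr <- fisher %>% ddply("id", mutate, z= x+1i*y, step = c(NA, diff(z)), sl = Mod(step),
                 dT = c(NA, difftime(time[-1], time[-length(time)], units = "min")) %>% as.numeric,
                 move.rate = sl/dT)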
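The code above leans on a compact trick: storing each location as a complex number z = x + iy, so that diff(z) gives the displacement vectors, Mod() their lengths (the step lengths) and Arg() their headings. A tiny check on three made-up points (hypothetical coordinates in meters, times in minutes):

```r
# Three hypothetical locations (meters) and fix times (minutes)
x <- c(0, 3, 3); y <- c(0, 4, 10); t.min <- c(0, 2, 4)

z       <- x + 1i * y          # locations as complex numbers
step    <- diff(z)             # displacement vectors: 3+4i, 0+6i
sl      <- Mod(step)           # step lengths: 5, 6
rate    <- sl / diff(t.min)    # movement rates (m/min): 2.5, 3
heading <- Arg(step)           # headings in radians, measured from the x-axis

# The same step lengths via Pythagoras
sl.check <- sqrt(diff(x)^2 + diff(y)^2)
```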
  2. Produce a box plot of the movement rate across all individuals against month of year, i.e. not just “Jan-Feb” and “July-August” like in the example, but for each month.

All of the tracking here took place in winter. So instead, we’ll break the movement rates down by hour of day (also an interesting question):

require(lubridate) 
fisher.mr %>% ggplot(aes(factor(hour(time)), move.rate)) + 
  geom_boxplot() + ylab("movement rate: m/min")

Even from this plot it looks like the fishers aren’t very active in the afternoons. HOWEVER, it is VERY VERY important to note the time zone of the data collection! Thankfully, that is stored in the time data:

tz(fisher$time)
## [1] "UTC"

Yikes! Albany, NY, is not in UTC. We need to change the time zone to local time, and then recreate the figure:

fisher$time <- with_tz(fisher$time, tzone = "America/New_York")
fisher.mr <- fisher %>% ddply("id", mutate, z= x+1i*y, step = c(NA, diff(z)),
                 dT = c(NA, difftime(time[-1], time[-length(time)], units = "min")) %>% as.numeric,
                 move.rate = Mod(step)/dT)
fisher.mr %>% ggplot(aes(factor(hour(time)), move.rate)) + 
  geom_boxplot() + ylab("movement rate: m/min")
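For a POSIXct vector, with_tz changes only the time zone used to display (and extract hours from) the same instants; it does not shift the underlying times. lubridate’s force_tz, by contrast, keeps the clock time and changes the instant, which would be wrong here. A base-R sketch of the with_tz idea, using the first M1 fix from the str() output (which was printed in UTC):

```r
# First M1 fix as printed by str(): recorded in UTC
t.utc <- as.POSIXct("2009-02-11 12:16:45", tz = "UTC")

# with_tz-style conversion: same instant, new display time zone
t.local <- t.utc
attr(t.local, "tzone") <- "America/New_York"

format(t.local, usetz = TRUE)   # "2009-02-11 07:16:45 EST"
as.POSIXlt(t.local)$hour        # 7, the hour that now goes into the boxplots
t.local == t.utc                # TRUE: the underlying instant has not moved
```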

The movement rate distributions show the extremely typical right skew, along with some outlying points. The right skew is perhaps easiest to deal with using a log-scaled axis. Here is another version that does that, and also breaks the patterns down by individual:

fisher.mr %>% ggplot(aes(factor(lubridate::hour(time)), move.rate, fill = id, col = id)) + 
  geom_boxplot(alpha = 0.5) + ylab("movement rate: m/min") +
  scale_y_log10() + facet_wrap(.~id)
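One caveat with scale_y_log10(): a movement rate of exactly zero (two identical consecutive fixes, if the data contain any) becomes -Inf on the log scale, and ggplot2 drops those rows, along with the NA first step of each individual, with a warning. The boxplots then describe only the fixes with positive rates. A quick count of what is affected, sketched on a hypothetical vector of rates:

```r
# Hypothetical movement rates (m/min): an NA first step and some zeros
move.rate <- c(NA, 0, 0.5, 12, 0, 3.2, 25)

n.na   <- sum(is.na(move.rate))              # first steps, which have no rate
n.zero <- sum(move.rate == 0, na.rm = TRUE)  # zero rates, -Inf on a log axis
n.kept <- sum(is.finite(log10(move.rate)))   # values a log axis can actually show
c(na = n.na, zero = n.zero, kept = n.kept)
```

On the real data, the same counts come from fisher.mr$move.rate.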

  3. Can you detect any patterns? Discuss.

There is considerable inter-individual variation (welcome to animal behavior!), but the patterns are essentially similar, and extremely pronounced for F2. Note the wide range of movement rates: from well under 0.1 m/min to well over 20 m/min, with consistent medians (when active) of around 10 m/min. Given the extremely high resolution of the data, there is a high likelihood that GPS errors are artificially increasing speeds: at 2-minute intervals, a few meters of location error is comparable to the real distance moved between fixes, and for a resting animal any error shows up as spurious movement.
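To see why location error matters so much at a 2-minute duty cycle, here is a small simulation of an animal that does not move at all, observed with Gaussian location error. The 10 m error standard deviation is a hypothetical value, not a property of these collars. With independent errors of standard deviation sigma in each coordinate, the apparent step length between two fixes follows a Rayleigh distribution with mean sigma * sqrt(pi), about 17.7 m for sigma = 10 m, or roughly 9 m/min at 2-minute intervals:

```r
set.seed(1)
sigma <- 10      # hypothetical location error (m) in each coordinate
n     <- 10000   # number of fixes
dt    <- 2       # minutes between fixes

# True position is fixed at (0, 0); observed positions are pure error
x.obs <- rnorm(n, 0, sigma)
y.obs <- rnorm(n, 0, sigma)

# Apparent step lengths and "movement rates" of a motionless animal
sl   <- Mod(diff(x.obs + 1i * y.obs))
rate <- sl / dt

mean(sl)           # close to sigma * sqrt(pi)
sigma * sqrt(pi)   # theoretical mean apparent step length
mean(rate)         # roughly 9 m/min, from error alone
```

Under that assumed error, pure noise already produces rates of the same order as the roughly 10 m/min rates seen in the boxplots.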

  4. Extremely valuable extra credit: Fit some smoother (e.g. a GAM model or a LOESS smoother) to those movement rates across the year.

Here’s a nice comparison of patterns across individuals:

require(lubridate)
fisher.mr %>% mutate(timeofday = hour(time) + minute(time)/60) %>% 
  ggplot(aes(timeofday, move.rate, col = id)) + 
  ylab("movement rate: m/min") + 
  geom_smooth(method = "gam")  + 
  scale_y_log10()

The geom_smooth function plops down a smoother (in this case “gam”) in a very naive way. In fact, the confidence intervals here are WAY too narrow because of autocorrelation in the movement data, something we’ll learn more about later in the class. But the overall pattern is clear: if you want to find an active fisher in the winter, you’re much better off looking just about any time between 6pm and 5am. But don’t expect to find anyone at noon.
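One refinement, sketched under assumptions: time of day is circular, so 23:59 and 00:00 should get nearly the same fitted value, which an ordinary smoother does not enforce. mgcv, the package geom_smooth calls for method = "gam", offers a cyclic cubic regression spline (bs = "cc"). The sketch below fits it to simulated log10 movement rates (the night-high, noon-low pattern is made up); because scale_y_log10() transforms the data before the smoother is fitted, a model of log rates is the closer analogue of the plot. It does nothing about autocorrelation, so on real tracks its confidence intervals would still be too narrow.

```r
library(mgcv)

set.seed(42)
# Hypothetical data: log10 movement rate peaks at midnight and dips at noon
tod      <- runif(2000, 0, 24)                                # time of day (hours)
log.rate <- 0.5 + 0.8 * cos(2 * pi * tod / 24) + rnorm(2000, sd = 0.3)
d        <- data.frame(tod = tod, log.rate = log.rate)

# Cyclic cubic regression spline; the knots at 0 and 24 make the two ends of the day meet
fit <- gam(log.rate ~ s(tod, bs = "cc", k = 10), data = d,
           knots = list(tod = c(0, 24)))

# Fitted values at midnight (start of day), noon, and midnight (end of day)
pred <- predict(fit, newdata = data.frame(tod = c(0, 12, 24)))
pred
```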

Fisher With A Broken Tail. Jackson Beardy (Anishinaabe), 1972.

According to Ojibwe legend, the fisher saved humanity from endless winter, and was immortalized in a constellation (Ursa Major) when the sky people broke his tail.