Pick a movement data set of your choice with multiple individuals (but no more than 20). Ideally, a dataset that you plan to investigate yourself. Alternatively, work with some subset of the Ya Ha Tinda elk dataset on Movebank, following the instructions in Chapter: Movement Data in R.
We’ll work with a dataset in the amt package of fishers (Martes pennanti) from Albany, NY. We’ll simplify again to x, y, time and id, and turn it from a “tibble” to a data frame. The details of the code below aren’t important, but they allow this example to be fully reproducible.
require(amt)
data(amt_fisher)
fisher <- data.frame(amt_fisher[,c("x_","y_","t_","id")])
names(fisher) <- c("x","y","time","id")
str(fisher)
## 'data.frame': 14230 obs. of 4 variables:
## $ x : num 1782673 1782680 1782683 1782686 1782681 ...
## $ y : num 2402297 2402297 2402292 2402305 2402297 ...
## $ time: POSIXct, format: "2009-02-11 12:16:45" "2009-02-11 12:31:38" ...
## $ id : chr "M1" "M1" "M1" "M1" ...
First, let’s see how many individuals we have:
table(fisher$id)
##
## F1 F2 M1 M4
## 1349 3004 919 8958
Just four, two females and two males. We can now summarize these data per individual. I like plyr commands for this, i.e. ddply combined with summarize:
require(plyr); require(magrittr)
duration.table <- fisher %>% ddply("id", summarize,
n.locs = length(x),
start = min(time) %>% as.Date,
end = max(time) %>% as.Date,
duration = difftime(end, start, units = "days") %>% round(1))
duration.table
## id n.locs start end duration
## 1 F1 1349 2011-02-11 2011-03-03 20 days
## 2 F2 3004 2010-12-16 2011-01-04 19 days
## 3 M1 919 2009-02-11 2009-03-04 21 days
## 4 M4 8958 2010-02-09 2010-03-31 50 days
That’s quite tidy! To get some of the details of the duty cycles, I’ll first compute the intervals by “id”, and then summarize:
dutycycle.table <- fisher %>% ddply("id", summarize,
dT = difftime(time[-1],time[-length(time)], units = "mins")) %>%
ddply("id", summarize,
dT.median = median(dT), dT.mean = mean(dT), dT.sd = sd(dT))
dutycycle.table
## id dT.median dT.mean dT.sd
## 1 F1 10.016625 mins 20.693027 mins 95.18721
## 2 F2 2.033350 mins 8.930941 mins 48.05628
## 3 M1 15.033350 mins 32.745370 mins 104.20486
## 4 M4 2.033333 mins 8.041761 mins 43.98099
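Just from these numbers it is clear that there are some VERY large gaps in the data. How can I tell? The means are two to four times larger than the medians, and the standard deviations are larger still, which only happens when a few intervals are far longer than the typical one.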
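To check for gaps more directly, one can compare the mean interval with the median and count the intervals that sit far above the median. The sketch below is a minimal base-R illustration on made-up timestamps. The data frame `toy`, the helper `gap_check`, and the cut-off of ten times the median are all assumptions for this example, not part of the fisher analysis.

```r
# Toy data (hypothetical): one animal sampled every 2 minutes, with one long gap
toy <- data.frame(
  id   = "A1",
  time = as.POSIXct("2010-01-01 00:00:00", tz = "UTC") +
           60 * c(0, 2, 4, 6, 8, 308, 310, 312)   # offsets in seconds
)

# Hypothetical helper: summarise the sampling intervals of one individual
gap_check <- function(time, factor = 10) {
  dT <- as.numeric(diff(time), units = "mins")    # intervals in minutes
  c(median = median(dT),
    mean   = mean(dT),
    ratio  = mean(dT) / median(dT),               # far above 1 suggests long gaps
    n.gaps = sum(dT > factor * median(dT)))       # intervals flagged as gaps
}

# One column per individual, one row per statistic
gap.table <- sapply(split(toy$time, toy$id), gap_check)
gap.table
```

With the real data, the same helper could be applied to `split(fisher$time, fisher$id)`, assuming the rows are sorted by time within each individual.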
Report the following statistics:
For the number of locations per individual, this is quick and easy with the table we had before and the handy summary function:
summary(duration.table$n.locs)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 919 1242 2176 3558 4492 8958
Ok, technically the standard deviation is missing:
sd(duration.table$n.locs)
## [1] 3710.836
Not very informative with only 4 individuals. But still, these are basically excellent datasets.
2. The median, mean, and standard deviation of the location frequency (aka "duty cycle") over all the individuals.
We shouldn’t use the summary table we put together in the previous section here, but ALL the raw time intervals:
dT.table <- fisher %>% ddply("id", summarize,
dT = difftime(time[-1],time[-length(time)], units = "mins"))
summary(dT.table$dT %>% as.numeric)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.100 1.967 2.100 11.022 9.283 1650.317
OK, so MAINLY an ungodly 2 minutes between locations. But also, at least in one place, a leap of 1650 minutes, or 27.5 hours. There will be holes!
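To see where the holes are, and not just that they exist, it can help to list every interval above some threshold together with the individual and the times that bracket it. The following is a minimal base-R sketch on made-up data. The helper `find_gaps` and the 60-minute threshold are hypothetical choices for illustration.

```r
# Toy data (hypothetical): two animals, each with one long gap
t0 <- as.POSIXct("2010-01-01 00:00:00", tz = "UTC")
toy <- data.frame(
  id   = rep(c("A1", "A2"), each = 5),
  time = t0 + 60 * c(0, 2, 4, 124, 126,    # A1: one 120-minute gap
                     0, 10, 20, 30, 530)   # A2: one 500-minute gap
)

# Hypothetical helper: one row per interval longer than `threshold` minutes
find_gaps <- function(df, threshold = 60) {
  pieces <- lapply(split(df, df$id), function(d) {
    d   <- d[order(d$time), ]                       # sort within individual
    dT  <- as.numeric(diff(d$time), units = "mins") # intervals in minutes
    big <- which(dT > threshold)                    # index of fix before each gap
    data.frame(id = d$id[big], gap.start = d$time[big],
               gap.end = d$time[big + 1], gap.mins = dT[big])
  })
  gaps <- do.call(rbind, pieces)
  rownames(gaps) <- NULL
  gaps[order(-gaps$gap.mins), ]                     # longest gap first
}

gaps <- find_gaps(toy)
gaps
```

Passing the `fisher` data frame in place of `toy` should work the same way, since it has the same `id` and `time` columns.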
Produce the following plots:
This question did not make very much sense. But technically, it is a single, uninteresting box with whiskers.
boxplot(duration.table$n.locs)
Beautiful!
Using the ggplot example from Ophélie’s notes:
require(ggplot2)
fisher %>% ddply("id", summarize, start = min(time), end = max(time)) %>%
ggplot(aes(xmin = start, xmax = end, y = id)) +
geom_linerange()
OK, so these animals were followed for very short, very intense periods spread over roughly two years (2009 to 2011). We might get a slightly better idea if we facet by animal and free the time scales:
fisher %>% ddply("id", summarize, start = min(time), end = max(time)) %>%
ggplot(aes(xmin = start, xmax = end, y = id)) +
geom_linerange() + facet_wrap(.~id, ncol = 2, scales = "free_x")
Not an extremely useful plot! But mainly we can see that the studies occurred primarily in February and March, with one female (F2) tracked from mid-December to early January.
For these data, this is actually a bit tricky because the observation frequency is SO high. Here’s my version:
fisher %>%
ggplot(aes(time, y = id)) +
geom_point(cex = 0.5, alpha = 0.4) + facet_wrap(.~id, ncol = 2, scales = "free_x")
Lots of little gaps!
fisher.mr <- fisher %>% ddply("id", mutate, z= x+1i*y, step = c(NA, diff(z)),
dT = c(NA, difftime(time[-1], time[-length(time)], units = "min")) %>% as.numeric,
move.rate = Mod(step)/dT)
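The code above stores each location as a complex number z = x + iy, so that diff(z) is the displacement between consecutive fixes and Mod() is its length. As a small check of that idea, here is a sketch on a made-up track (the coordinates and times are hypothetical) showing that Mod(diff(z)) matches the usual Euclidean distance formula:

```r
# Toy track (hypothetical): coordinates in metres, times in minutes
x  <- c(0, 3, 3, 6)
y  <- c(0, 4, 4, 8)
tm <- c(0, 2, 4, 14)             # minutes since the first fix

z    <- x + 1i * y               # locations as complex numbers
step <- diff(z)                  # displacement between consecutive fixes
dT   <- diff(tm)                 # time between fixes (min)

step.length <- Mod(step)         # Euclidean step length (m)
move.rate   <- step.length / dT  # movement rate (m/min)

# The same step lengths from the distance formula
step.length.check <- sqrt(diff(x)^2 + diff(y)^2)

data.frame(step.length, step.length.check, dT, move.rate)
```

The complex representation is a convenience: Arg(step) would give the step direction in radians from the same object, with no extra bookkeeping.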
All of the movements here occurred in winter. So instead, we’ll break down by hour of day (also an interesting question):
require(lubridate)
fisher.mr %>% ggplot(aes(factor(hour(time)), move.rate)) +
geom_boxplot() + ylab("movement rate: m/min")
Even from these plots it is clear that the fishers aren’t so active in the afternoons. HOWEVER, it is VERY VERY important to note the time zone of the data collection! Thankfully, that is stored in the time data:
tz(fisher$time)
## [1] "UTC"
Yikes! Albany, NY, is not in UTC. We need to change the time zone to local time, and then recreate the figure:
fisher$time <- with_tz(fisher$time, tzone = "America/New_York")
fisher.mr <- fisher %>% ddply("id", mutate, z= x+1i*y, step = c(NA, diff(z)),
dT = c(NA, difftime(time[-1], time[-length(time)], units = "min")) %>% as.numeric,
move.rate = Mod(step)/dT)
fisher.mr %>% ggplot(aes(factor(hour(time)), move.rate)) +
geom_boxplot() + ylab("movement rate: m/min")
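A time-zone conversion of this kind changes only how the clock time is displayed, not the instant that was recorded. The base-R sketch below illustrates this with the first timestamp shown in the str() output earlier. Setting the "tzone" attribute is used here as a stand-in for the with_tz call, and the variable names are hypothetical.

```r
# First timestamp in the data, recorded in UTC
t.utc <- as.POSIXct("2009-02-11 12:16:45", tz = "UTC")

# Re-express the same instant in local time by changing only the
# time-zone attribute
t.local <- t.utc
attr(t.local, "tzone") <- "America/New_York"

format(t.utc,   "%Y-%m-%d %H:%M:%S %Z")
format(t.local, "%Y-%m-%d %H:%M:%S %Z")

# The instant is unchanged; only the clock hour differs
same.instant <- as.numeric(t.utc) == as.numeric(t.local)
hour.utc     <- as.POSIXlt(t.utc)$hour
hour.local   <- as.POSIXlt(t.local)$hour
c(hour.utc = hour.utc, hour.local = hour.local)
```

In February, Albany is five hours behind UTC, so every hour-of-day label shifts by five. If the timestamps had instead been local clock times mislabelled as UTC, lubridate's force_tz would be the appropriate tool, but that is not the situation described here.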
There is an extremely typical right skew to the movement rate distributions, along with some evident outliers. The right skew is perhaps easiest to handle with a log-scaled axis. Here is another version that does that, and also breaks the patterns down by individual:
fisher.mr %>% ggplot(aes(factor(lubridate::hour(time)), move.rate, fill = id, col = id)) +
geom_boxplot(alpha = 0.5) + ylab("movement rate: m/min") +
scale_y_log10() + facet_wrap(.~id)
Considerable inter-individual variation (welcome to animal behavior!). The patterns are essentially similar, but extremely pronounced for F2. Note the wide range of movement rates: from well under 0.1 m/min to well over 20 m/min, with consistent typical values (when active) of around 10 m/min. Given the extremely high resolution of the data, there is a high likelihood that GPS location errors are artificially increasing speeds, because the error can be large relative to the true distance moved in two minutes.
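One caveat with a log-scaled axis is that a movement rate of exactly zero (two consecutive fixes at the same coordinates) has no finite logarithm, so ggplot2 drops such values with a warning. Whether the fisher data contain any zeros is not shown here. The sketch below, on a made-up vector, shows how one might count what would be lost before plotting.

```r
# Toy movement rates (hypothetical, m/min): includes zeros and a missing value
move.rate <- c(NA, 0, 0.05, 2.4, 11, 0, 25)

log.rate <- log10(move.rate)                     # zeros become -Inf, NA stays NA

n.zero    <- sum(move.rate == 0, na.rm = TRUE)   # cannot be drawn on a log axis
n.missing <- sum(is.na(move.rate))               # e.g. first fix of each animal
n.plotted <- sum(is.finite(log.rate))            # values that can be drawn

c(n.zero = n.zero, n.missing = n.missing, n.plotted = n.plotted)
```

The same three counts could be computed on `fisher.mr$move.rate` to see how much, if anything, the log-scaled boxplots leave out.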
Here’s a nice comparison of patterns across individuals:
require(lubridate)
fisher.mr %>% mutate(timeofday = hour(time) + minute(time)/60) %>%
ggplot(aes(timeofday, move.rate, col = id)) +
ylab("movement rate: m/min") +
geom_smooth(method = "gam") +
scale_y_log10()
The geom_smooth function plops down a smoother (in this case “gam”) in a very naive way. In fact, the confidence intervals here are WAY too narrow because of autocorrelation in the movement data, something we’ll learn more about later in the class. But the overall pattern is clear. If you want to find an active fisher in the winter, you’re much better off just about any time between 6pm and 5am. But don’t expect to find anyone at noon.
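Another sense in which the default smoother is naive is that it treats time of day as a straight line, so the fitted curve at 24:00 need not match the curve at 00:00. A possible refinement, sketched here on simulated data and not taken from the original analysis, is a cyclic cubic regression spline (bs = "cc") from the mgcv package. The variable names and the simulated activity pattern are assumptions, and this does nothing about the autocorrelation problem noted above.

```r
library(mgcv)   # provides gam() and the cyclic spline basis "cc"

# Simulated data (hypothetical): log movement rate peaking around midnight
set.seed(42)
timeofday <- runif(400, 0, 24)
log.rate  <- 0.5 + 0.8 * cos(2 * pi * timeofday / 24) + rnorm(400, sd = 0.3)

# Cyclic smooth: the fitted curve is constrained to join up at 0 and 24 h
fit <- gam(log.rate ~ s(timeofday, bs = "cc"),
           knots = list(timeofday = c(0, 24)))

# Predictions at the two ends of the day should coincide
edge <- as.numeric(predict(fit, newdata = data.frame(timeofday = c(0, 24))))
edge
```

In a ggplot2 call, the analogous change would be to pass a formula such as y ~ s(x, bs = "cc") to geom_smooth alongside method = "gam".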