10  Mixed models (with autocorrelation!)

Author

E. Gurarie

Figure 10.1: This painting is amazingly called Dash for Liberty, and reminds us that chicks are both highly individual, and capable of movement. (source: Wikimedia commons)

10.1 Some motivation

Way back in chapter Chapter 3, we argued that movement data was so complicated and fancy that we absolutely needed fancy statistical concepts just to wrap our modeling around the data. Somewhat dismissively, we contrasted movement data to the kind of experimental, design-based statistics that are usually summed up with ANOVA. And - in particular - we used an example of an experiment with the weights of chicks (Gallus gallus domesticus) as modeled against various diets. Very simple, controlled experimental design: A bunch of chicks, several diets, measure weights, apply ANOVA - and your simple questions, namely: what diet leads to larger chicks?, is immediately answered.

In fact, nothing is quite so simple. This chapter - which has nothing directly to do with spatial data or movement data - mainly serves to systematically illustrate via an example what mixed models (that combines fixed effects and random effects) actually are, and when and why we might choose to use them. It also illustrates a somewhat non-trivial, but very important, context for incorporating auto-correlation in a linear model. And all of that in the context of asking a very simple question, that emerges directly from a simple experimental design in a controlled experiment, namely: under what diet do chicks grow fastest?

10.2 Chick Weights

The data are located in all R installations, and called ChickWeight:

data("ChickWeight")
weight Time Chick Diet
42 0 1 1
51 2 1 1
59 4 1 1
64 6 1 1
76 8 1 1
93 10 1 1
106 12 1 1
125 14 1 1
149 16 1 1
171 18 1 1
199 20 1 1
205 21 1 1
40 0 2 1
49 2 2 1
58 4 2 1
72 6 2 1
84 8 2 1
103 10 2 1
122 12 2 1
138 14 2 1
162 16 2 1
187 18 2 1
209 20 2 1
215 21 2 1
43 0 3 1
39 2 3 1
55 4 3 1
67 6 3 1
84 8 3 1
99 10 3 1
115 12 3 1
138 14 3 1
163 16 3 1
187 18 3 1
198 20 3 1
202 21 3 1
42 0 4 1
49 2 4 1
56 4 4 1
67 6 4 1
74 8 4 1
87 10 4 1
102 12 4 1
108 14 4 1
136 16 4 1
154 18 4 1
160 20 4 1
157 21 4 1
41 0 5 1
42 2 5 1
48 4 5 1
60 6 5 1
79 8 5 1
106 10 5 1
141 12 5 1
164 14 5 1
197 16 5 1
199 18 5 1
220 20 5 1
223 21 5 1
41 0 6 1
49 2 6 1
59 4 6 1
74 6 6 1
97 8 6 1
124 10 6 1
141 12 6 1
148 14 6 1
155 16 6 1
160 18 6 1
160 20 6 1
157 21 6 1
41 0 7 1
49 2 7 1
57 4 7 1
71 6 7 1
89 8 7 1
112 10 7 1
146 12 7 1
174 14 7 1
218 16 7 1
250 18 7 1
288 20 7 1
305 21 7 1
42 0 8 1
50 2 8 1
61 4 8 1
71 6 8 1
84 8 8 1
93 10 8 1
110 12 8 1
116 14 8 1
126 16 8 1
134 18 8 1
125 20 8 1
42 0 9 1
51 2 9 1
59 4 9 1
68 6 9 1
85 8 9 1
96 10 9 1
90 12 9 1
92 14 9 1
93 16 9 1
100 18 9 1
100 20 9 1
98 21 9 1
41 0 10 1
44 2 10 1
52 4 10 1
63 6 10 1
74 8 10 1
81 10 10 1
89 12 10 1
96 14 10 1
101 16 10 1
112 18 10 1
120 20 10 1
124 21 10 1
43 0 11 1
51 2 11 1
63 4 11 1
84 6 11 1
112 8 11 1
139 10 11 1
168 12 11 1
177 14 11 1
182 16 11 1
184 18 11 1
181 20 11 1
175 21 11 1
41 0 12 1
49 2 12 1
56 4 12 1
62 6 12 1
72 8 12 1
88 10 12 1
119 12 12 1
135 14 12 1
162 16 12 1
185 18 12 1
195 20 12 1
205 21 12 1
41 0 13 1
48 2 13 1
53 4 13 1
60 6 13 1
65 8 13 1
67 10 13 1
71 12 13 1
70 14 13 1
71 16 13 1
81 18 13 1
91 20 13 1
96 21 13 1
41 0 14 1
49 2 14 1
62 4 14 1
79 6 14 1
101 8 14 1
128 10 14 1
164 12 14 1
192 14 14 1
227 16 14 1
248 18 14 1
259 20 14 1
266 21 14 1
41 0 15 1
49 2 15 1
56 4 15 1
64 6 15 1
68 8 15 1
68 10 15 1
67 12 15 1
68 14 15 1
41 0 16 1
45 2 16 1
49 4 16 1
51 6 16 1
57 8 16 1
51 10 16 1
54 12 16 1
42 0 17 1
51 2 17 1
61 4 17 1
72 6 17 1
83 8 17 1
89 10 17 1
98 12 17 1
103 14 17 1
113 16 17 1
123 18 17 1
133 20 17 1
142 21 17 1
39 0 18 1
35 2 18 1
43 0 19 1
48 2 19 1
55 4 19 1
62 6 19 1
65 8 19 1
71 10 19 1
82 12 19 1
88 14 19 1
106 16 19 1
120 18 19 1
144 20 19 1
157 21 19 1
41 0 20 1
47 2 20 1
54 4 20 1
58 6 20 1
65 8 20 1
73 10 20 1
77 12 20 1
89 14 20 1
98 16 20 1
107 18 20 1
115 20 20 1
117 21 20 1
40 0 21 2
50 2 21 2
62 4 21 2
86 6 21 2
125 8 21 2
163 10 21 2
217 12 21 2
240 14 21 2
275 16 21 2
307 18 21 2
318 20 21 2
331 21 21 2
41 0 22 2
55 2 22 2
64 4 22 2
77 6 22 2
90 8 22 2
95 10 22 2
108 12 22 2
111 14 22 2
131 16 22 2
148 18 22 2
164 20 22 2
167 21 22 2
43 0 23 2
52 2 23 2
61 4 23 2
73 6 23 2
90 8 23 2
103 10 23 2
127 12 23 2
135 14 23 2
145 16 23 2
163 18 23 2
170 20 23 2
175 21 23 2
42 0 24 2
52 2 24 2
58 4 24 2
74 6 24 2
66 8 24 2
68 10 24 2
70 12 24 2
71 14 24 2
72 16 24 2
72 18 24 2
76 20 24 2
74 21 24 2
40 0 25 2
49 2 25 2
62 4 25 2
78 6 25 2
102 8 25 2
124 10 25 2
146 12 25 2
164 14 25 2
197 16 25 2
231 18 25 2
259 20 25 2
265 21 25 2
42 0 26 2
48 2 26 2
57 4 26 2
74 6 26 2
93 8 26 2
114 10 26 2
136 12 26 2
147 14 26 2
169 16 26 2
205 18 26 2
236 20 26 2
251 21 26 2
39 0 27 2
46 2 27 2
58 4 27 2
73 6 27 2
87 8 27 2
100 10 27 2
115 12 27 2
123 14 27 2
144 16 27 2
163 18 27 2
185 20 27 2
192 21 27 2
39 0 28 2
46 2 28 2
58 4 28 2
73 6 28 2
92 8 28 2
114 10 28 2
145 12 28 2
156 14 28 2
184 16 28 2
207 18 28 2
212 20 28 2
233 21 28 2
39 0 29 2
48 2 29 2
59 4 29 2
74 6 29 2
87 8 29 2
106 10 29 2
134 12 29 2
150 14 29 2
187 16 29 2
230 18 29 2
279 20 29 2
309 21 29 2
42 0 30 2
48 2 30 2
59 4 30 2
72 6 30 2
85 8 30 2
98 10 30 2
115 12 30 2
122 14 30 2
143 16 30 2
151 18 30 2
157 20 30 2
150 21 30 2
42 0 31 3
53 2 31 3
62 4 31 3
73 6 31 3
85 8 31 3
102 10 31 3
123 12 31 3
138 14 31 3
170 16 31 3
204 18 31 3
235 20 31 3
256 21 31 3
41 0 32 3
49 2 32 3
65 4 32 3
82 6 32 3
107 8 32 3
129 10 32 3
159 12 32 3
179 14 32 3
221 16 32 3
263 18 32 3
291 20 32 3
305 21 32 3
39 0 33 3
50 2 33 3
63 4 33 3
77 6 33 3
96 8 33 3
111 10 33 3
137 12 33 3
144 14 33 3
151 16 33 3
146 18 33 3
156 20 33 3
147 21 33 3
41 0 34 3
49 2 34 3
63 4 34 3
85 6 34 3
107 8 34 3
134 10 34 3
164 12 34 3
186 14 34 3
235 16 34 3
294 18 34 3
327 20 34 3
341 21 34 3
41 0 35 3
53 2 35 3
64 4 35 3
87 6 35 3
123 8 35 3
158 10 35 3
201 12 35 3
238 14 35 3
287 16 35 3
332 18 35 3
361 20 35 3
373 21 35 3
39 0 36 3
48 2 36 3
61 4 36 3
76 6 36 3
98 8 36 3
116 10 36 3
145 12 36 3
166 14 36 3
198 16 36 3
227 18 36 3
225 20 36 3
220 21 36 3
41 0 37 3
48 2 37 3
56 4 37 3
68 6 37 3
80 8 37 3
83 10 37 3
103 12 37 3
112 14 37 3
135 16 37 3
157 18 37 3
169 20 37 3
178 21 37 3
41 0 38 3
49 2 38 3
61 4 38 3
74 6 38 3
98 8 38 3
109 10 38 3
128 12 38 3
154 14 38 3
192 16 38 3
232 18 38 3
280 20 38 3
290 21 38 3
42 0 39 3
50 2 39 3
61 4 39 3
78 6 39 3
89 8 39 3
109 10 39 3
130 12 39 3
146 14 39 3
170 16 39 3
214 18 39 3
250 20 39 3
272 21 39 3
41 0 40 3
55 2 40 3
66 4 40 3
79 6 40 3
101 8 40 3
120 10 40 3
154 12 40 3
182 14 40 3
215 16 40 3
262 18 40 3
295 20 40 3
321 21 40 3
42 0 41 4
51 2 41 4
66 4 41 4
85 6 41 4
103 8 41 4
124 10 41 4
155 12 41 4
153 14 41 4
175 16 41 4
184 18 41 4
199 20 41 4
204 21 41 4
42 0 42 4
49 2 42 4
63 4 42 4
84 6 42 4
103 8 42 4
126 10 42 4
160 12 42 4
174 14 42 4
204 16 42 4
234 18 42 4
269 20 42 4
281 21 42 4
42 0 43 4
55 2 43 4
69 4 43 4
96 6 43 4
131 8 43 4
157 10 43 4
184 12 43 4
188 14 43 4
197 16 43 4
198 18 43 4
199 20 43 4
200 21 43 4
42 0 44 4
51 2 44 4
65 4 44 4
86 6 44 4
103 8 44 4
118 10 44 4
127 12 44 4
138 14 44 4
145 16 44 4
146 18 44 4
41 0 45 4
50 2 45 4
61 4 45 4
78 6 45 4
98 8 45 4
117 10 45 4
135 12 45 4
141 14 45 4
147 16 45 4
174 18 45 4
197 20 45 4
196 21 45 4
40 0 46 4
52 2 46 4
62 4 46 4
82 6 46 4
101 8 46 4
120 10 46 4
144 12 46 4
156 14 46 4
173 16 46 4
210 18 46 4
231 20 46 4
238 21 46 4
41 0 47 4
53 2 47 4
66 4 47 4
79 6 47 4
100 8 47 4
123 10 47 4
148 12 47 4
157 14 47 4
168 16 47 4
185 18 47 4
210 20 47 4
205 21 47 4
39 0 48 4
50 2 48 4
62 4 48 4
80 6 48 4
104 8 48 4
125 10 48 4
154 12 48 4
170 14 48 4
222 16 48 4
261 18 48 4
303 20 48 4
322 21 48 4
40 0 49 4
53 2 49 4
64 4 49 4
85 6 49 4
108 8 49 4
128 10 49 4
152 12 49 4
166 14 49 4
184 16 49 4
203 18 49 4
233 20 49 4
237 21 49 4
41 0 50 4
54 2 50 4
67 4 50 4
84 6 50 4
105 8 50 4
122 10 50 4
155 12 50 4
175 14 50 4
205 16 50 4
234 18 50 4
264 20 50 4
264 21 50 4

There are just three variables here: Four diet treatments (1-4)1, a bunch of chicks (48) and weight measurements over 20 days of feeding.

  • 1 The actual identities of the chick diets are lost in the mists of used and re-use as a classic statistics example.

  • For data like these (but not all data!) ggplot is very handy:

    require(ggplot2)
    ggplot(ChickWeight, aes(x = Time, y = weight, col=Chick)) + geom_line ()+ geom_point() + facet_wrap(~Diet)

    Clearly, there is variability among individuals, and among treatments, in the slope of growth.

    10.3 Some linear models

    10.3.1 the simplest

    Here’s a linear model with main and interaction effects of time and diet: \[Y_{ij} = (\beta + \gamma_j) \, T_i + \epsilon_{ij}\]

    In this formulation, intercept is 0, but the slope has a mean (\(\beta\)) and a diet specific effect (\(\gamma_j\)).

    Chick.lm <- lm(weight ~ Time:Diet - 1, data = ChickWeight)
    summary(Chick.lm)$coef
                Estimate Std. Error  t value      Pr(>|t|)
    Time:Diet1  8.929575  0.2010880 44.40631 8.087458e-188
    Time:Diet2 10.502625  0.2640756 39.77128 4.507463e-167
    Time:Diet3 12.629732  0.2640756 47.82620 2.201654e-202
    Time:Diet4 11.774316  0.2698661 43.63022 1.996372e-184

    Note that the “-1” coerced all of the intercepts to 0. Also, all the effects are significant: Weight increases with Time on average at rate 6.85 g/day, some diets lead to a higher overall mean.

    Pop quiz: What’s the difference between the following models? Which one matches the model above?

    Chick.lm1 <- lm(weight ~ Time * Diet, data = ChickWeight)
    Chick.lm2 <- lm(weight ~ Time + Diet, data = ChickWeight)
    Chick.lm3 <- lm(weight ~ Time : Diet, data = ChickWeight)

    Visualizing the regression lines

    ggplot(ChickWeight, aes(x = Time, y = weight, col=factor(Chick))) + 
      geom_line ()+ geom_point() + facet_wrap(~Diet) + 
      geom_line(data = cbind(ChickWeight, y.hat = predict(Chick.lm)), 
                aes(x = Time, y = y.hat), col="darkred", lwd = 1.5) + 
      theme(legend.position="none")

    We will be plotting this kind of prediction plot alot, so let’s make a function that makes it quicker. It will be function of a model fit:

    plotFit <- function(ChickModel, ...)
    ggplot(ChickWeight, aes(x = Time, y = weight, col=factor(Chick))) +
      geom_point(alpha = 0.5) + facet_wrap(~Diet) + 
           geom_line(data = cbind(ChickWeight, y.hat = predict(ChickModel)), 
                     aes(x = Time, y = y.hat), ...) + theme(legend.position="none")
    anova(Chick.lm)
    Analysis of Variance Table
    
    Response: weight
               Df  Sum Sq Mean Sq  F value    Pr(>F)    
    Time        1 2042344 2042344 1759.757 < 2.2e-16 ***
    Diet        3  129876   43292   37.302 < 2.2e-16 ***
    Time:Diet   3   80804   26935   23.208 3.474e-14 ***
    Residuals 570  661532    1161                       
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    ANOVA tells us everything is significant … but we need to check diagnostic plots.

    plot(Chick.lm, 1:2)

    Both heteroskedasticity and kurtosis2 in the residuals are big problems. The main reason for that is that each individual bird is quite different from the others (see figure Figure 10.1).

  • 2 Knock your socks off statsy vocab words!

  • 10.3.2 Adding individual chicks

    One option to deal with this is to simple include “individual bird” in the model:

    Chick.lm2 <- lm(weight ~ Time : Diet + Chick, data = ChickWeight)
    anova(Chick.lm2)
    Analysis of Variance Table
    
    Response: weight
               Df  Sum Sq Mean Sq F value    Pr(>F)    
    Chick      49  530105   10818  16.806 < 2.2e-16 ***
    Time:Diet   4 2047136  511784 795.029 < 2.2e-16 ***
    Residuals 524  337315     644                      
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    plotFit(Chick.lm2)

    plot(Chick.lm, 1:2)

    We’ve absorbed a lot of the variability, but there’s still strong structure to the residuals.

    10.3.3 Individual slopes (and getting curvy)

    Somewhat better would be to let each individual have a unique slope as well. Let’s also not be naive, and a second order polynomial term to the model - because, you know, growth is not linear and all that:

    Chick.lm3 <- lm(weight ~ (Diet * poly(Time,2)) +  Time : (Chick-1), data = ChickWeight)
    anova(Chick.lm3)
    Analysis of Variance Table
    
    Response: weight
                        Df  Sum Sq Mean Sq   F value    Pr(>F)    
    Diet                 4 8733214 2183303 13985.650 < 2.2e-16 ***
    poly(Time, 2)        2 2038109 1019055  6527.788 < 2.2e-16 ***
    Diet:poly(Time, 2)   6   84264   14044    89.962 < 2.2e-16 ***
    Time:Chick          46  555143   12068    77.306 < 2.2e-16 ***
    Residuals          520   81177     156                        
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    plotFit(Chick.lm3)

    Again, checking our residual diagnostics:

    plot(Chick.lm3, 1:2)

    Well - this model is begginning to look mighty fine! Check out that astronomical R^2 (summary(Chick.lm3)$r.squared = 0.9929361) and decent looking residuals. The big problem, however, is that this has cost us ENORMOUS degrees of freedom! We’ve estimated length(Chick.lm3$coef) = 62 coefficients.

    10.3.4 Quick comment on “updating”

    Before we fix this problem, a quick comment on a “smarter” workflow for iteratively building models using the update function. This function basically takes your model and adds or removes features as desired.

    For example, the Chick.lm1 model above was given by:

    Chick.lm1$call
    lm(formula = weight ~ Time * Diet, data = ChickWeight)

    Whereas:

    Chick.lm2$call
    lm(formula = weight ~ Time:Diet + Chick, data = ChickWeight)
    Chick.lm3$call
    lm(formula = weight ~ (Diet * poly(Time, 2)) + Time:(Chick - 
        1), data = ChickWeight)

    You can generate the second model by updating the first one:

    Chick.lm2b <- update(Chick.lm1, . ~ . - 1 - Time - Diet + Chick)
    Chick.lm2b$call
    lm(formula = weight ~ Chick + Time:Diet - 1, data = ChickWeight)

    As typical in R formula notation, the ‘.’ means whatever was there, the minuses mean remove the variable, and plus means add the variable.

    Generate the third model by updating the second one:

    Chick.lm3b <- update(Chick.lm2b, . ~ . + poly(Time, 2))

    The update command lets you add additional structural elements, like correlations and random effects (as below), change the response variable, or change the data - without rewriting the entire model call.

    10.4 Mixed effects models

    The model fitted above looks something like this: \[Y_{ijk} = \alpha_i + (\beta_i + \kappa_j) T_{ijk} + \gamma_j T^2_{ijk} + \epsilon_{ij}\]

    Where \(i \in \{1,2,3,4\}\) refers to diet, \(j \in {1,2, ..., 48}\) refers to individual chicken (each of which has a unique slope - but not intercept) and \(k\) to replicates.

    In fact, we don’t ACTUALLY care about individual birds! Wouldn’t it be great if we could take all of those \(\kappa_j\)’s and model them as a random variable? I.e.: \(\kappa_j \sim {\cal N}(\mu_\kappa, \sigma_\kappa)\)? Out of curiosity, lets look at all of the coefficients for the individual chicks:

    require(plyr) # for mutate
    betas <- data.frame(summary(Chick.lm3)$coef)
    betas <- mutate(betas, name = row.names(betas), 
                    beta = Estimate, lo  = beta - 2*Std..Error, hi = beta + 2*Std..Error)
    betas <- subset(betas, grepl("Chick", name))
    ggplot(betas, aes(y=beta, x=name, ymin=lo, ymax=hi)) + geom_pointrange() + coord_flip()

    That is exactly what a mixed effects model does! It’s called “mixed” because there are “fixed effects” (here, the diet, and the time are fixed effects, and the individual chick will be “random”).

    10.4.1 Mixed effect notation in general

    Formula:

    \[{\bf y}_i = {\bf X}_i\beta + Z_i{\bf b}_i + \epsilon_i; \,\,\, i = 1, ..., M\] \[{\bf b}_i = {\cal N}(0, \Phi); \,\,\, \epsilon_i \sim {\cal N}(0, \sigma^2 {\bf I})\]

    where \(\beta\) is the (p-dimensional) vector of fixed effects \({\bf b}_i\) is the q-dimensional vector of random effects, \({\bf X}_i\) (\(n \times p\)) is a the known fixed effect regressor matrix, and \({\bf Z}_i\) (\(n \times q\)) is the random effects data matrix.

    10.4.2 Applied to the chicks

    The nlme package allows you to fit mixed effects models. So does lme4 - which is in some ways faster and more modern, but does NOT model heteroskedasticity or (!spoiler alert!) autocorrelation.

    Let’s try a model that looks just like our best model above, but rather than have a unique Time slope for each chick, it will be a random variable. Thus: \[Y_{ijk} = \alpha_i + (\beta_i + K) T_{ijk} + \gamma_i T^2_{ijk} + \epsilon_{ij}\]

    Where the slope on Time is given by the random parameter K, which has some distribution: \(K \sim {\cal N}(0, \sigma_k^2)\)

    require(nlme)
    Chick.lme1 <- lme(fixed = weight ~ Diet * poly(Time,2), 
                     random =  ~ (Time-1)|Chick, data = ChickWeight, method = "ML")

    The summary output of this function is massive … below a somewhat truncated version:

    ## summary(Chick.lme1)
    Linear mixed-effects model fit by maximum likelihood
      Data: ChickWeight 
          AIC      BIC   logLik
      4789.38 4850.414 -2380.69
    
    Random effects:
     Formula: ~(Time - 1) | Chick
               Time Residual
    StdDev: 2.46184  12.3999
    
    Fixed effects:  weight ~ Diet * poly(Time, 2) 
                             Value Std.Error  DF   t-value p-value
    (Intercept)           101.2062   6.15721 520 16.437042  0.0000
    Diet2                  19.6553  10.50328  46  1.871350  0.0677
    Diet3                  39.3727  10.50328  46  3.748609  0.0005
    Diet4                  33.0455  10.50499  46  3.145699  0.0029
    poly(Time, 2)1       1033.6519  95.04407 520 10.875501  0.0000
    poly(Time, 2)2         70.4572  20.65100 520  3.411808  0.0007
    Diet2:poly(Time, 2)1  360.4877 161.54458 520  2.231506  0.0261
    Diet3:poly(Time, 2)1  812.8318 161.54458 520  5.031626  0.0000
    Diet4:poly(Time, 2)1  519.1017 161.67971 520  3.210679  0.0014
    Diet2:poly(Time, 2)2   52.6077  34.38622 520  1.529907  0.1266
    Diet3:poly(Time, 2)2  209.4174  34.38622 520  6.090154  0.0000
    Diet4:poly(Time, 2)2   -6.3517  34.92972 520 -0.181842  0.8558

    The fixed effects estimates should be similar as in the linear model, but here we also have a standard deviation (2.46) around the time slopes.

    10.4.3 Plotting Mixed-Effects fits and diagnostics

    Plot the fit identically as above:

    plotFit(Chick.lme1)

    There is a complicated set of plotting functions for lme objects. For example, you can just plot the residuals against fitted values:

    plot(Chick.lme1)

    plot(Chick.lme1, resid(., type = "p") ~ fitted(.) | Diet, abline = 0)

    Or observed weights versus fitted weights by Diet:

    plot(Chick.lme1, weight ~ fitted(.) | Diet, abline = c(0,1), cex=0.5)

    Or - just the residuals of each chick:

    plot(Chick.lme1, Chick ~ resid(.))

    10.4.4 AUTOCORRELATION WARNING!

    acf(Chick.lm3$residuals)

    Oh no!

    10.4.5 LME with serial autocorrelation

    The following model adds 1st order autocorrelation to a 2nd order time model with a unique diet effect:

    Chick.lme2 <- lme(fixed = weight ~ Diet + poly(Time,2)-1, 
                     random =  ~ (Time-1)|Chick, data = ChickWeight,
                     correlation = corAR1())

    Notice that the random grouping term is Time - 1 | Chick, i.e. the slopes for each chick is unique and modeled as a random variable, but the intercepts are always 0. Furthermore the intercept of the main effect variable is also set to 0 (via the -1). This makes sense biologically, but it is also the only way I could make a model with a serial autocorrelation converge.

    This is - anyway - an excellent model. The summary:

    ## summary(Chick.lme2)
    Linear mixed-effects model fit by REML
      Data: ChickWeight 
           AIC      BIC    logLik
      4208.346 4247.488 -2095.173
    
    Random effects:
     Formula: ~(Time - 1) | Chick
                Time Residual
    StdDev: 3.291071 15.73268
    
    Correlation Structure: AR(1)
     Formula: ~1 | Chick 
     Parameter estimate(s):
         Phi 
    0.869144 
    Fixed effects:  weight ~ Diet + poly(Time, 2) - 1 
                       Value Std.Error  DF   t-value p-value
    Diet1           119.1190   5.93250  46 20.079049       0
    Diet2           118.7396   6.83475  46 17.372931       0
    Diet3           116.6206   6.83475  46 17.062892       0
    Diet4           120.8724   6.83473  46 17.685023       0
    poly(Time, 2)1 1327.5352  79.13447 527 16.775690       0
    poly(Time, 2)2  128.1690  14.11702 527  9.079045       0

    reveals the additional estimate of the serial autocorrelation coefficient: 0.87.

    10.4.6 So - is diet a significant covariate?

    This is a “simple” question that is not, actually, that instant to answer, since it is difficult to interpret the output of the summary.

    The best solution is just to use AIC, comparing a model with and without diet (and, say, with and without autocorrelation), e.g.:

    Chick.lme <- lme.formula(fixed = weight ~ Diet * poly(Time, 2), data = ChickWeight, random = ~(Time - 1) | Chick, method = "ML")
    
    Chick.lmes <- list(ChickModel.Diet = Chick.lme, 
         ChickModel.NoDiet = update(Chick.lme, fixed = .~.-Diet:poly(Time,2)),
         ChickModel.Diet.AutoCorr = update(Chick.lme, correlation = corAR1()),
         ChickModel.NoDiet.AutoCorr = update(Chick.lme, fixed = .~.-Diet:poly(Time,2), correlation = corAR1()))
    
    ldply(Chick.lmes, AIC)
                             .id       V1
    1            ChickModel.Diet 4789.380
    2          ChickModel.NoDiet 4842.816
    3   ChickModel.Diet.AutoCorr 4206.544
    4 ChickModel.NoDiet.AutoCorr 4245.153

    If this “AIC table” is unsatisfactory, there’s a nice function called aictab in the package AICcmodavg:

    require(AICcmodavg)
    aictab(Chick.lmes)
    
    Model selection based on AICc:
    
                                K    AICc Delta_AICc AICcWt Cum.Wt       LL
    ChickModel.Diet.AutoCorr   15 4207.40       0.00      1      1 -2088.27
    ChickModel.NoDiet.AutoCorr  9 4245.47      38.07      0      1 -2113.58
    ChickModel.Diet            14 4790.13     582.73      0      1 -2380.69
    ChickModel.NoDiet           8 4843.07     635.67      0      1 -2413.41

    This is the commonly encountered delta AIC” table which sorts the models by increasing information criterion, and provides useful ancillary summaries. These tables are often reported in journal articles to provide support for model slection. Note that the function automatically chose the AICc (corrected AIC) as a criterion, which has better statistical properties for smaller sample sizes and lots of degrees of freedom. But the ultimate conclusion is the same.

    10.5 Exercise: Does Diet Matter?

    1. Summarize the conclusion of this analysis.
    2. How many parameters are estimated in the “best” model? Report all of their values.
    3. Expand the model selection to include models without the second order Diet term. Are any of those “better”.
    4. Expand the model selection to include corresponding linear models, with individual chick fixed effects. How do they compare?