class: center, middle, white, title-slide .title[ # How to model just about anything
(but especially habitat) Part II ] .subtitle[ ## EFB 390: Wildlife Ecology and Management ] .author[ ### Dr. Elie Gurarie ] .date[ ### October 3, 2024 ]

---

<!-- https://bookdown.org/yihui/rmarkdown/xaringan-format.html -->

.pull-left-70[

# Linear modeling

... is a very general method for quantifying relationships among variables.

.pull-left[
![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-3-1.png)<!-- -->
]

.pull-right[

`\(X_i\)` - is called:
- covariate
- independent variable
- explanatory variable

`\(Y_i\)` - is the property we are interested in modeling:
- response variable
- dependent variable

## Goals:

- Fitting a model (usually with Maximum Likelihood)
- Selecting a model (often with AIC)
]
]

.pull-right-30[
![](images/pups_small.jpg)

Steller sea lion (*Eumetopias jubatus*) pups.
]

---

# Models of Sea Lion Weight

.pull-left-70[

### Null (linear) model

![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

``` r
mean(pups$Weight)
```

```
## [1] 33.51004
```

``` r
sd(pups$Weight)
```

```
## [1] 5.661695
```
]

.pull-right-30[
![](images/pups_small.jpg)

This suggests a model!

`$$W \sim {\cal N}(\mu = 33.5\,\text{kg}, \sigma = 5.7\,\text{kg})$$`

With no covariates.
]

---

.pull-left-70[

# Simple linear model

*Probably* there is a relationship between length and weight. The simplest relationship is linear.

`$$\large Y \sim { N (\text{mean} = \beta_0 + \beta_1 X,\,\, \text{sd} = \sigma)}$$`

![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-6-1.png)<!-- -->
]

.pull-right-30[
![](images/pups_small.jpg)

Steller sea lion (*Eumetopias jubatus*) pups.
]

---

.pull-left[

## Deterministic model:

`$$Y_i = \beta_0 + \beta_1 X_i$$`

- `\(\beta_0\)` - intercept
- `\(\beta_1\)` - slope

This is the **functional form of the predictor**
]

.pull-right[

## Statistical model:

**Version 1:**

`$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$`

where `\(\epsilon_i \sim {\cal N}(0, \sigma)\)`

or **Version 2:**

`$$Y_i \sim {\cal N}(\beta_0 + \beta_1 X_i, \sigma)$$`
]

V2 is better because it is more transparent about the number of parameters!

- Two (intercept | slope) are part of the **functional form**
- One (residual standard deviation) is part of the **random component**.

---

## But other variables might influence pup size

.pull-left-30[
Lots of competing models with different **main** and **interaction** effects.
]

.pull-right-70[
![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-9-1.png)<!-- -->
]

---

.pull-left-60.small[

## **Model Selection**

<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Model </th> <th style="text-align:right;"> k </th> <th style="text-align:right;"> R2 </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> dAIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Weight ~ Length * Sex + Island </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 8 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.818 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> -1144.6 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 2307.3 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.0 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length * Sex * Island </td> <td style="text-align:right;"> 20 </td> <td
style="text-align:right;"> 0.824 </td> <td style="text-align:right;"> -1137.1 </td> <td style="text-align:right;"> 2316.1 </td> <td style="text-align:right;"> 8.8 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length + Sex + Island </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.811 </td> <td style="text-align:right;"> -1155.0 </td> <td style="text-align:right;"> 2325.9 </td> <td style="text-align:right;"> 18.6 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length * Sex </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.803 </td> <td style="text-align:right;"> -1164.5 </td> <td style="text-align:right;"> 2339.0 </td> <td style="text-align:right;"> 31.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length + Sex </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.795 </td> <td style="text-align:right;"> -1174.5 </td> <td style="text-align:right;"> 2357.0 </td> <td style="text-align:right;"> 49.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.779 </td> <td style="text-align:right;"> -1193.4 </td> <td style="text-align:right;"> 2392.8 </td> <td style="text-align:right;"> 85.5 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Sex </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.293 </td> <td style="text-align:right;"> -1483.2 </td> <td style="text-align:right;"> 2972.4 </td> <td style="text-align:right;"> 665.1 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Island </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.028 </td> <td style="text-align:right;"> -1562.5 </td> <td style="text-align:right;"> 3137.0 </td> <td style="text-align:right;"> 829.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.000 </td> <td 
style="text-align:right;"> -1569.5 </td> <td style="text-align:right;"> 3143.1 </td> <td style="text-align:right;"> 835.8 </td> </tr> </tbody> </table>

> More or less what we expect ... the interaction between **sex** and **length** is consistent across islands, but there are some main effect differences across islands (mainly because of the time we sampled).
]

--

.pull-right-40.small[

### Components of a `\(\Delta\)`AIC table

**Degrees of freedom *k*:** Number of estimated parameters. Measure of *complexity*.

**Coefficient of determination R<sup>2</sup>:** Proportion of variation explained. It NEVER decreases when terms are added to a model.<br> Always zero for the **NULL** model.

**log-likelihood `\(\log({\cal L})\)`:** Total probability score of model. It NEVER decreases when terms are added to a model.

**Akaike Information Criterion:**

- `\(AIC = -2 \log({\cal L}) + 2\,k\)`
- Smaller is better
- Grows if **model fit is bad**
- Grows if **model complexity is too high**.
- .red[**The lowest AIC value is the "best" model.**]
- (but any model within 2 `\(\Delta AIC\)` is pretty much equivalent to the best)
]

---

## Model selection vs. parameter estimates

The best model:

`$$Y_{ijk} = \beta_{sex,i} + \beta_{island,j} + \beta_{length,i} \times \text{Length}_{ijk} + \epsilon_{ijk}$$`

where *i* indexes sex, *j* island, and *k* the individual pup; the length slope `\(\beta_{length,i}\)` differs between sexes (the Length × Sex interaction).

.pull-left[

What are the **parameter estimates** (effect sizes) of the selected model?
.small[ <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> SexFemale </td> <td style="text-align:right;"> -39.34 </td> <td style="text-align:right;"> 3.69 </td> <td style="text-align:right;"> -10.67 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> SexMale </td> <td style="text-align:right;"> -59.29 </td> <td style="text-align:right;"> 3.06 </td> <td style="text-align:right;"> -19.37 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Length </td> <td style="text-align:right;"> 0.66 </td> <td style="text-align:right;"> 0.03 </td> <td style="text-align:right;"> 19.11 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> IslandChirpoev </td> <td style="text-align:right;"> -2.00 </td> <td style="text-align:right;"> 0.34 </td> <td style="text-align:right;"> -5.81 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> IslandLovushki </td> <td style="text-align:right;"> -0.47 </td> <td style="text-align:right;"> 0.34 </td> <td style="text-align:right;"> -1.35 </td> <td style="text-align:right;"> 0.18 </td> </tr> <tr> <td style="text-align:left;"> IslandRaykoke </td> <td style="text-align:right;"> -0.45 </td> <td style="text-align:right;"> 0.35 </td> <td style="text-align:right;"> -1.31 </td> <td style="text-align:right;"> 0.19 </td> </tr> <tr> <td style="text-align:left;"> IslandSrednova </td> <td style="text-align:right;"> -0.35 </td> <td style="text-align:right;"> 0.35 </td> <td style="text-align:right;"> -1.00 </td> <td style="text-align:right;"> 0.32 </td> </tr> <tr> <td style="text-align:left;"> SexMale:Length </td> <td style="text-align:right;"> 0.20 </td> <td 
style="text-align:right;"> 0.04 </td> <td style="text-align:right;"> 4.55 </td> <td style="text-align:right;"> 0.00 </td> </tr> </tbody> </table>
]
]

.pull-right[
![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-12-1.png)<!-- -->
]

---
background-image: url('images/AfricanUngulatesBiomass.png')
background-size: cover

## AIC in action: **What predicts ungulate body size?**

Quality (Nitrogen)? Or type (browse/grass)?

---
background-image: url('images/GurarieSpringMigrations0.png')
background-size: cover

.pull-left-60[
<iframe src="https://esf0-my.sharepoint.com/personal/egurarie_esf_edu/_layouts/15/embed.aspx?UniqueId=75126850-958b-4f5a-8e01-69137cacd16a&embed=%7B%22ust%22%3Atrue%2C%22hv%22%3A%22CopyEmbedCode%22%7D&referrer=StreamWebApp&referrerScenario=EmbedDialog.Create" width="640" height="360" frameborder="0" scrolling="no" allowfullscreen title="migrationanimation6.mp4"></iframe>
]

.pull-right-40[

## Caribou spring migrations

Remarkable temporal synchrony at a continental scale.
]

---
background-image: url('images/GurarieSpringMigrations.png')
background-size: cover

## Could the synchrony be caused by global climate drivers?

.pull-right-70[
The Pacific Decadal Oscillation, Arctic Oscillation, and North Atlantic Oscillation determine whether the winter is wet & snowy or dry & cold.
]

---
background-image: url('images/GurarieSpringMigrations2.png')
background-size: cover

## `\(\Delta\)`AIC Table 1: **Departure time**

.pull-right[... driven by LARGE climate oscillations.]

---
background-image: url('images/GurarieSpringMigrations3.png')
background-size: cover

## `\(\Delta\)`AIC Table 2: **Arrival time**

.pull-right[... completely independent of climate!]

---
class: small

.pull-left[

## **Generalized** linear modeling

### Normal Model

.large[$$Y_i \sim {\cal Normal}(\beta_0 + \beta_1 X_i, \sigma)$$]

Models continuous data with a "normal-like" distribution.
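A minimal sketch of fitting the Normal model, assuming simulated data rather than the pup measurements: the "true" parameter values and the names `beta0_true`, `beta1_true` and `sigma_true` are made up for illustration, and `lm` should recover them approximately.

``` r
set.seed(1)  # make the simulation reproducible

# Hypothetical "true" parameter values, chosen only for illustration
beta0_true <- 2
beta1_true <- 0.5
sigma_true <- 1

# Simulate Y_i ~ Normal(beta0 + beta1 * X_i, sigma)
x <- runif(200, min = 0, max = 10)
y <- rnorm(200, mean = beta0_true + beta1_true * x, sd = sigma_true)

# Fit the Normal linear model and inspect the three estimated parameters
fit <- lm(y ~ x)
print(coef(fit))   # estimates of the intercept and slope
print(sigma(fit))  # estimate of the residual standard deviation
```

The two coefficients belong to the functional form and the residual standard deviation to the random component, so the fit returns three parameters in total.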
![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-13-1.png)<!-- -->
]

--

.pull-right[

### Binomial model

.large[$$Y_i \sim {\cal Bernoulli}\left( \frac{\exp(\alpha + \beta X_i)}{1 + \exp(\alpha + \beta X_i)} \right)$$]

There's some *probability* of something happening that depends on the predictor `\(X\)`. **Bernoulli** just means the data are all 0 or 1.

![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

This models **presence/absence**, **dead/alive**, **male/female**, and other response variables with **2** possible outcomes.
]

---
background-image: url('images/SoleaSolea.png')
background-size: cover

### What factors predict occurrence of *Solea solea* larvae?

Sampled in the estuary of the Tejo river in Portugal

- Lots of environmental factors in the data

.pull-right-70[.footnotesize[
<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> depth </th> <th style="text-align:right;"> temp </th> <th style="text-align:right;"> salinity </th> <th style="text-align:right;"> transp </th> <th style="text-align:right;"> gravel </th> <th style="text-align:right;"> large_sand </th> <th style="text-align:right;"> fine_sand </th> <th style="text-align:right;"> mud </th> <th style="text-align:right;"> presence </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 3.74 </td> <td style="text-align:right;"> 13.15 </td> <td style="text-align:right;"> 11.93 </td> <td style="text-align:right;"> 71.18 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 2.6 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 1.94 </td> <td style="text-align:right;"> 4.99 </td> <td
style="text-align:right;"> 5.43 </td> <td style="text-align:right;"> 87.63 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 2.6 </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 2.88 </td> <td style="text-align:right;"> 8.98 </td> <td style="text-align:right;"> 16.85 </td> <td style="text-align:right;"> 71.29 </td> <td style="text-align:right;font-weight: bold;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 2.1 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 11.06 </td> <td style="text-align:right;"> 11.96 </td> <td style="text-align:right;"> 21.95 </td> <td style="text-align:right;"> 55.03 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 3.2 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 9.87 </td> <td style="text-align:right;"> 28.60 </td> <td style="text-align:right;"> 19.49 </td> <td style="text-align:right;"> 42.04 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 32.45 </td> <td style="text-align:right;"> 7.39 </td> <td style="text-align:right;"> 9.43 </td> <td style="text-align:right;"> 50.72 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> </tbody> </table> ]] --- # Presence of *Solea solea* against **salinity** .pull-left-40[ ![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-16-1.png)<!-- --> ] .pull-right-60[ Modeling is EXACTLY the same as **linear regression** except: - `glm` - 
for **generalized** linear model (instead of `lm`)
- `family = 'binomial'` is the instruction to fit a logistic regression

`glm(presence ~ salinity, family = 'binomial')`

<table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Estimate </th> <th style="text-align:right;"> Std. Error </th> <th style="text-align:right;"> z value </th> <th style="text-align:right;"> Pr(>|z|) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 2.661 </td> <td style="text-align:right;"> 0.902 </td> <td style="text-align:right;"> 2.951 </td> <td style="text-align:right;"> 0.003 </td> </tr> <tr> <td style="text-align:left;"> salinity </td> <td style="text-align:right;"> -0.130 </td> <td style="text-align:right;"> 0.035 </td> <td style="text-align:right;"> -3.716 </td> <td style="text-align:right;"> 0.000 </td> </tr> </tbody> </table>

Clearly, *Solea solea* presence is significantly and *negatively* related to salinity (*p* < 0.001).
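The two estimates can be turned into predicted probabilities with the inverse-logit formula from the Binomial model slide. A minimal sketch, assuming only the coefficients in the table above; the salinity values and the names `alpha`, `beta`, `eta` and `p_hat` are made up for illustration.

``` r
# Coefficients copied from the table above
alpha <- 2.661   # (Intercept)
beta  <- -0.130  # salinity slope

# Hypothetical salinity values, chosen only for illustration
salinity <- c(0, 10, 20, 30)

# Linear predictor on the logit scale, then the inverse logit
# exp(eta) / (1 + exp(eta)); plogis(eta) is the built-in equivalent
eta   <- alpha + beta * salinity
p_hat <- exp(eta) / (1 + exp(eta))

print(round(data.frame(salinity, p_hat), 3))
```

With these coefficients the predicted probability of presence falls from about 0.93 at salinity 0 to about 0.22 at salinity 30, which is the negative relationship in probability terms.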
] --- ### Out of this model we can make predictions <img src="Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-19-1.png" width="80%" /> --- .pull-left[ ## `\(\Delta\)`AIC analysis - and coefficients <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Model </th> <th style="text-align:right;"> k </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> dAIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> M9 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> salinity + gravel </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> -33.2 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 72.5 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 0.0 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M2 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen 
!important;"> salinity </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 2 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 72.6 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 0.1 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M7 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> temp + salinity </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.0 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 74.0 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 1.5 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M5 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> depth + salinity </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.1 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 74.3 </td> <td 
style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 1.8 </td> </tr> <tr> <td style="text-align:left;"> M11 </td> <td style="text-align:left;"> depth + temp + salinity </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> -33.9 </td> <td style="text-align:right;"> 75.8 </td> <td style="text-align:right;"> 3.3 </td> </tr> <tr> <td style="text-align:left;"> M0 </td> <td style="text-align:left;"> depth </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -38.1 </td> <td style="text-align:right;"> 80.1 </td> <td style="text-align:right;"> 7.6 </td> </tr> <tr> <td style="text-align:left;"> M4 </td> <td style="text-align:left;"> depth + temp </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -38.0 </td> <td style="text-align:right;"> 81.9 </td> <td style="text-align:right;"> 9.4 </td> </tr> <tr> <td style="text-align:left;"> M6 </td> <td style="text-align:left;"> depth + gravel </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -38.0 </td> <td style="text-align:right;"> 82.0 </td> <td style="text-align:right;"> 9.5 </td> </tr> <tr> <td style="text-align:left;"> M10 </td> <td style="text-align:left;"> depth + temp + gravel </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> -37.8 </td> <td style="text-align:right;"> 83.7 </td> <td style="text-align:right;"> 11.2 </td> </tr> <tr> <td style="text-align:left;"> M1 </td> <td style="text-align:left;"> temp </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -43.3 </td> <td style="text-align:right;"> 90.6 </td> <td style="text-align:right;"> 18.1 </td> </tr> <tr> <td style="text-align:left;"> M3 </td> <td style="text-align:left;"> gravel </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> -43.7 </td> <td style="text-align:right;"> 91.3 </td> <td style="text-align:right;"> 18.8 </td> </tr> <tr> <td 
style="text-align:left;"> M8 </td> <td style="text-align:left;"> temp + gravel </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -43.3 </td> <td style="text-align:right;"> 92.6 </td> <td style="text-align:right;"> 20.1 </td> </tr> </tbody> </table>
]

.pull-right[
**Salinity** is clearly among the most important covariates: it appears in all of the top 5 models.

![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-22-1.png)<!-- -->
]

---
class: inverse

.pull-left-20[
### We interrupt your regularly scheduled lecture for a special guest presentation...
]

.pull-right-80[![](PCH_RSF.png)]

---

# **Poisson regression**

.pull-left-30[
![](images/Poisson1.png)
![](images/Poisson4.png)
![](images/Poisson10.png)
]

.pull-right-70[
.large[$$Y_i \sim {\cal Poisson}\left(\lambda = \exp(\alpha + \beta X_i) \right)$$]

- We are **counting** something ... the data are whole numbers between 0 and `\(\infty\)`
- `\(\lambda\)` is a **density**; **densities** vary across habitat types (covariate **X**).

.center[<img src='images/Moose2.png' width='70%'/>]
]

---

## Field flags

.large[**Did flag densities vary with region?**]

.pull-left-40[

Approximate areas:

region | area
--|--
**North:** | 82 m<sup>2</sup>
**South:** | 82 m<sup>2</sup>
**Perimeter:** | 196 m<sup>2</sup>
--|--
**Sampling square (Circle)** | 0.5 m<sup>2</sup>
]

.pull-right-60[
![](images/court.png)]

---

.pull-left[

### Count data

```
##      Section
## Count  N  P  S
##     0  9 10  9
##     1  8  2  6
##     2  4  2  4
##     3  2  0  2
##     4  0  0  1
```

![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-24-1.png)<!-- -->
]

.pull-right[

### Fitting models

.large[
`glm(count ~ region, family = 'poisson')`

Exactly the same syntax as before, except the "family" is **Poisson.**
]]

---

.pull-left[

### Count data

```
##      Section
## Count  N  P  S
##     0  9 10  9
##     1  8  2  6
##     2  4  2  4
##     3  2  0  2
##     4  0  0  1
```

![](Lecture_Modeling_PartII_files/figure-html/unnamed-chunk-26-1.png)<!-- -->
]

.pull-right[

### Fitting models

<table> <thead> <tr> <th style="text-align:left;">
</th> <th style="text-align:right;"> Estimate </th> <th style="text-align:right;"> Std. Error </th> <th style="text-align:right;"> z value </th> <th style="text-align:right;"> Pr(>|z|) </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> -0.044 </td> <td style="text-align:right;"> 0.213 </td> <td style="text-align:right;"> -0.208 </td> <td style="text-align:right;"> 0.835 </td> </tr> <tr> <td style="text-align:left;"> RegionPerimeter </td> <td style="text-align:right;"> -0.803 </td> <td style="text-align:right;"> 0.461 </td> <td style="text-align:right;"> -1.743 </td> <td style="text-align:right;"> 0.081 </td> </tr> <tr> <td style="text-align:left;"> RegionSouth </td> <td style="text-align:right;"> 0.131 </td> <td style="text-align:right;"> 0.295 </td> <td style="text-align:right;"> 0.445 </td> <td style="text-align:right;"> 0.656 </td> </tr> </tbody> </table>

The **intercept** here is "North", and the other rows (with their *p*-values) compare each region with North. So **Perimeter** has a somewhat lower density, but not significantly so (*p* = 0.081).
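Because the Poisson model uses a log link, exponentiating sums of coefficients gives the expected count per sampling square in each region. A minimal sketch using only the estimates in the table above; the variable names are mine.

``` r
# Coefficients copied from the table above (North is the reference level)
b0          <- -0.044  # (Intercept): log intensity in North
b_perimeter <- -0.803  # Perimeter relative to North
b_south     <-  0.131  # South relative to North

# Log-scale fit for each region, then back-transform with exp()
log_lambda <- c(North     = b0,
                Perimeter = b0 + b_perimeter,
                South     = b0 + b_south)
lambda <- exp(log_lambda)

print(round(lambda, 3))
```

The back-transformed values come out near 0.96 (North), 0.43 (Perimeter) and 1.09 (South) flags per sampling square.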
#### `\(\Delta AIC\)` table

<table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> AIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Null.model </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 153.69 </td> </tr> <tr> <td style="text-align:left;"> Region.model </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 152.50 </td> </tr> </tbody> </table>

These models have very similar AIC (a difference of about 1.2, which is less than 2), so the Region model is only *slightly* better supported than the null model.
]

---

### Making predictions

<table> <thead> <tr> <th style="text-align:left;"> Region </th> <th style="text-align:right;"> area </th> <th style="text-align:right;"> fit </th> <th style="text-align:right;"> se.fit </th> <th style="text-align:right;"> d.hat </th> <th style="text-align:right;"> d.low </th> <th style="text-align:right;"> d.high </th> <th style="text-align:right;"> N.hat </th> <th style="text-align:right;"> N.low </th> <th style="text-align:right;"> N.high </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> South </td> <td style="text-align:right;"> 82 </td> <td style="text-align:right;"> 0.087 </td> <td style="text-align:right;"> 0.204 </td> <td style="text-align:right;"> 1.091 </td> <td style="text-align:right;"> 0.725 </td> <td style="text-align:right;"> 1.641 </td> <td style="text-align:right;"> 89.5 </td> <td style="text-align:right;"> 59.4 </td> <td style="text-align:right;"> 134.6 </td> </tr> <tr> <td style="text-align:left;"> North </td> <td style="text-align:right;"> 82 </td> <td style="text-align:right;"> -0.044 </td> <td style="text-align:right;"> 0.213 </td> <td style="text-align:right;"> 0.957 </td> <td style="text-align:right;"> 0.624 </td> <td style="text-align:right;"> 1.465 </td> <td style="text-align:right;"> 78.5 </td> <td style="text-align:right;"> 51.2 </td> <td style="text-align:right;"> 120.1 </td> </tr> <tr> <td style="text-align:left;"> Perimeter </td> <td
style="text-align:right;"> 196 </td> <td style="text-align:right;"> -0.847 </td> <td style="text-align:right;"> 0.408 </td> <td style="text-align:right;"> 0.429 </td> <td style="text-align:right;"> 0.189 </td> <td style="text-align:right;"> 0.970 </td> <td style="text-align:right;"> 84.1 </td> <td style="text-align:right;"> 37.0 </td> <td style="text-align:right;"> 190.1 </td> </tr> </tbody> </table>

.small[
- **fit** and **se.fit** are on the log scale, so they need to be transformed via `\(\exp\)` to intensities `\(\lambda\)`.
- `l.hat` is the Poisson intensity `\(\lambda\)` of the sampling square (**hula hoop**), which we turn into an actual density by dividing by its area **0.5 m<sup>2</sup>**.
- `d.hat` (and `d.low` and `d.high`) are the density estimates & confidence intervals, which we then turn into our numerical predictions by multiplying by area.
]

### Total estimate

.large.green[
`$$\widehat{N} = 252 \,\, (95\%\, \text{C.I.}: 182 - 322)$$`
]

**Pretty ok!** The true total was ~210.

---
background-image: url('images/LEGO.jpg')
background-size: cover

.content-box-blue[

## .darkred[**Take-aways on (linear, statistical) modeling**]

1. **Linear modeling** separates **patterns** (the model) from "**randomness**" (unexplained variation).
2. We structure our models to have a **response variable** and one or more **predictors** or **covariates**.
3. Depending on the response variable, a different **family** is chosen:
    - if **continuous** and symmetric: **Normal** family
    - if two values (presence/absence, dead/alive): **Binomial** family
    - if count data: **Poisson** family.
4. An important task is **Model selection**, identifying which model is "best"
    - Best means *"explains the most variation without overfitting"*
    - Very common criterion is **AIC.**
5. Once a model is "selected", we can:
    - analyze the results by examining the **effect sizes** (magnitude of coefficients, aka *slopes*) and **directions** (signs of coefficients)
    - make **inferential predictions** by "spreading" our model over a larger landscape.
6. **Well over 90% of habitat modeling is done this way!**
]
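---

### Checking the AIC arithmetic

The AIC formula from the model selection slides can be checked by hand. A minimal sketch using three rows of `k` and `logLik` from the *Solea solea* table; the helper name `aic` is mine, and small mismatches with the printed AIC column are expected because the table's log-likelihoods are rounded to one decimal.

``` r
# Hypothetical helper implementing AIC = -2 log(L) + 2 k
aic <- function(loglik, k) -2 * loglik + 2 * k

# Three rows copied from the Solea solea table (rounded values)
models <- data.frame(
  model  = c("salinity + gravel", "salinity", "depth"),
  k      = c(3, 2, 2),
  loglik = c(-33.2, -34.3, -38.1)
)

models$AIC  <- aic(models$loglik, models$k)
models$dAIC <- models$AIC - min(models$AIC)  # difference from the best model

print(models[order(models$AIC), ])
```

The first two models land within 2 `\(\Delta AIC\)` of each other, so the extra gravel term buys almost nothing, while the depth-only model is clearly worse.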