HEAD ======= >>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc
class: center, middle, white, title-slide .title[ # How to model just about anything
(Part I) ] .subtitle[ ## EFB 390: Wildlife Ecology and Management ] .author[ ### Dr. Elie Gurarie ] .date[ ### October 3, 2024 ] --- <!-- https://bookdown.org/yihui/rmarkdown/xaringan-format.html --> ## Super fast primer on statistical modeling Everything you need to know to do 95% of all wildlife modeling in less than an hour and **FOUR** (or **FIVE**) easy steps!! .pull-left.large[ **I.** Linear modeling **II.** Multivariate modeling **III.** Model selection ] .pull-right.large[ **IV.** Generalized linear modeling - Poisson; Binomial **V.** Prediction ] --- .pull-left-70[ # **Step I:** Linear modeling ... is a very general method to quantifying relationships among variables. .pull-left[ ![](Lecture_Modeling_files/figure-html/unnamed-chunk-3-1.png)<!-- --> ] .pull-right[ `\(X_i\)` - is called: - covariate - independent variable - explanatory variable `\(Y_i\)` - is the property we are interested in modeling: - response variable - dependent variable .small[Note: There actually can be interest in wildlife studies to have models for **length** and **weight**, since **length** is easy to measure (e.g. from drones), but **weight** tells us more about physical condition and energetics.] ] ] .pull-right-30[ ![](images/pups_small.jpg) Steller sea lion (*Eumatopias jubatus*) pups. ] --- # Linear Models .pull-left[ #### Deterministic: `$$Y_i = a + bX_i$$` `\(a\)` - intercept; `\(b\)` - slope ] .pull-right[ #### Probabilistic: `$$Y_i = \alpha + \beta X_i + \epsilon_i$$` `\(\alpha\)` - intercept; `\(\beta\)` - slope; `\(\epsilon\)` - **randomness!**: `\(\epsilon_i \sim {\cal N}(0, \sigma)\)` ] .pull-left[ ![](Lecture_Modeling_files/figure-html/unnamed-chunk-4-1.png)<!-- --> ] .pull-right[ ![](Lecture_Modeling_files/figure-html/unnamed-chunk-5-1.png)<!-- --> ] --- # Fitting linear models is very easy in ![](images/R.png)! .pull-left[ **Point Estimate** This command fits a model: .small[ ``` r lm(Weight ~ Length, data = pups) ``` ``` ## ## Call: ## lm(formula = Weight ~ Length, data = pups) ## ## Coefficients: ## (Intercept) Length ## -49.1422 0.7535 ``` ] So for **each 1 cm** of length, add another **754 grams**, i.e. `\(\widehat{\beta} = 0.754\)` ] .pull-right[ ``` r plot(Weight ~ Length, data = pups) abline(my_model) ``` ![](Lecture_Modeling_files/figure-html/unnamed-chunk-8-1.png)<!-- --> The `abline` puts a line, with intercept `a` and slope `b` onto a figure. ] --- .pull-left-60[ # Statistical inference **Statistical inference** is the *science / art* of observings *something* from a **portion of a population** and making statements about the **entire population**. In practice - this is done by taking **data** and **estimating parameters** of a **model**. (This is also called *fitting* a model). Two related goals: 1. obtaining a **point estimate** and a **confidence interval** (precision) of the parameter estimate. 2. Assessing whether particular (combinations of) factors, i.e. **models**, provide any **explanatory power**. This is (almost always) done using **Maximum Likelihood Estimation**, i.e. an algorithm searches through possible values of the parameters that make the model **MOST LIKELY** (have the highest probability) given the data. ] .pull-right-40[ ![](images/SSL_withpup.jpg) .small[Another gratuitous sea lion picture.] ] --- # All models have these pieces: `$$\Huge Y = f({\bf X} | \bf{\Theta})$$` - **Y** - response | dependent variable. The thing we want to model / predict / understand. The **effect** (maybe). - **X** - predictor(s) | independent variable(s) | covariate(s). The thing(s) that "explain(s)" **Y**. The **cause** (maybe). - **f** - the model structure. This includes: some **deterministic functional form** form (*linear? periodic? polynomial? exponential?*) AND some **probabilistic assumptions**, i.e. a way to characterize the variability / randomness / unpredictability of the process. - `\(\bf \Theta\)` - the parameters of the model. There are usually some parameters associated with the **predictors**, and some associated with the **random bit**. --- # Goals (Art / Science) of Modeling `$$\Huge Y = f({\bf X} | \bf{\Theta})$$` .pull-left[ ### 1. Model fitting .darkred[ What are the **best** `\(\bf \Theta\)` values given `\(f, {\bf X}, Y\)`? ] **Fitting the model** = **estimating the parameters**. Usually according to some criterion (almost always **Maximum Likelihood**. ] -- .pull-right[ ### 2. Model selection .darkred[ What are the **best** of a set of models `\(f_1\)`, `\(f_2\)`, `\(f_3\)` given `\(\bf X\)` and `\(Y\)`? ] Different models *usually* vary by what particular variables go into **X**, but can also vary by **functional form** and **distribution assumptions** Use some **Criterion** (e.g. AIC) to "select" the best model, which balances **how many parameters you estimated** verses **how good the fit is**. ] --- # Whoa! What is "Maximum Likelihood"!? .pull-left[ ### Oakie ![](images/likelihoods/blackmorph.jpg) ] .pull-right[ ### Orange ![](images/likelihoods/greysquirrels.jpg) ] .red.center.large[**Q:** What is the "best model" for squirrel morph distribution?] --- ## Data and Models .pull-left[ ### Data / observations: `\(X_{ij}\)` | island | a. Orange | b. Oakie| |---|---|---| | squirrel 1: | `\(X_{a,1}\)` = 1 | `\(X_{b,1}\)` = 1 | | squirrel 2: | `\(X_{a,2}\)` = 1 | `\(X_{b,2}\)` = 0 | - 1 = light morph - 0 = dark morph ] .pull-right[ ### Models | | model | k| |---|:--|:---| | M1: | `\(P(X_{ij} = 1) = p = 0.5\)` | 1 | M2: | `\(p = 0.75\)` | 1 | M3: | `\(p_a = 1\)`; `\(p_b = .5\)` | 2 | M4: | `\(p_{a,1} = 1;\,\,\, p_{b,1} = 1 \\ p_{a,2} = 1;\,\,\, p_{b,2} = 0\)` | 4 **very important to keep track of the number of parameters!** ] --- ## Likelihoood (of a model) .large[ **Product** of **probabilities** of **data** given **model**. `$$\Large {\cal L}(model) = \prod_{i = 1}^n \text{Pr}(X | model)$$` - We **never** care about the **absolute** value of the likelihood! - Only the *relative* value of the likelihood. ] --- ## Four different Squirrel Models: Data: `$$X_{a1} = 1;\, X_{a2} = 1; X_{b1} = 1;\, X_{b2} = 0$$` | model | | likelihood | | |---|---|---|---| | M1 | `\(p = 0.5\)` | `\({1\over2} \times {1\over2} \times {1\over2} \times {1\over2}\)` | 0.0625 | | M2 | `\(p = 0.75\)` | `\({3\over4} \times {3\over4} \times {3\over4} \times {1\over4}\)` | 0.1055 | | M3 | `\(p_a = .5\)`; `\(p_b = 1\)` | `\(1\times1\times{1\over2}\times{1\over2}\)` | 0.25 | | M4 | `\(p_{a,1} = 1;\, p_{b,1} = 1; \, p_{a,2} = 1;\, p_{b,2} = 0\)` | `\(1\times1\times1\times1\)` | 1 | `$$\cal{L}(M4) > \cal{L}(M3) > \cal{L}(M2) > \cal{L}(M1)$$` .center[*M4** has the higest likelihood! But is this a useful model?] --- ## A(kaike) Information Criterion A good fit is great! But it is useless if it uses too much information (too many parameters). This is *overfitting*. **One parameter per data point is TOO MANY parameters!** .pull-left[![](images/likelihoods/akaike.jpg) Hirotugo Akaike 赤池 弘次 (1927-2006) ] .pull-right[ Simple formula: `$$AIC = -2 \log({\cal L})+ 2k$$` (where `\(k\)` is the number of parameters) - Better fit = higher `\(\cal L\)` = lower AIC. - Too complicated = more k = higher AIC. **Lowest AIC is "best" model** ] --- ## Compute AIC | model | likelihood | log-likelihood | k | AIC |---|---|---|---|---| | **M1:** coin flip | 0.0625 | -2.77 | 1 | 7.55 | **M2:** proportional odds| 0.1055 | -2.25 | 1 | **6.50** | **M3:** island specific| 0.25 | -1.39 | 2 | 6.77 | **M4:** individual specific| 1 | 0 | 4 | 8 .center[AIC2 < AIC3 < AIC1 < AIC4 .large[Most *parsimonious* model is M2!] **Conclusion:** not enough evidence to identify a difference between islands. ] --- ## Let's add one more observation ... .pull-left[ ### Oakie Island ![](images/likelihoods/blackmorph.jpg) ] .pull-right[ ### Orange Island ![](images/likelihoods/greysquirrels2.jpg) ] `$$X_{b,3} = dark$$` --- ## Updated squirrel models: model | probs | Likelihood | | k | AIC| ---|---|---|---|---|---| M1 | `\(p = {1\over2}\)` | `\({1\over2} \times {1\over2} \times {1\over2} \times {1\over2} \times {1\over2}\)` | = 0.03125 | 1 | 8.93 M2 | `\(p = {3\over 4}\)` | `\({3\over4} \times {3\over4} \times {3\over4} \times {1\over4} \times {1\over4}\)` | = 0.02637 | 1 | 9.27 M2b | `\(p = {3\over 5}\)` | `\({3\over5} \times {3\over5} \times {3\over5} \times {2\over5} \times {2\over5}\)` | = 0.0346 | 1 | 8.73 M3 | `\(p_a = .5; p_b = 1\)` | `\(1\times1\times{1\over2}\times{1\over2}\times0\)` | = 0 (!!) | 2 | `\(\infty\)` M3b | `\(p_a = {1\over2}; p_b = {2\over3}\)` | `\({1\over2} \times {1\over2} \times {2\over3} \times {2\over3} \times {1\over3}\)` | = 0.037 | 2 | 10.6 --- ## Updated (1 parameter) squirrel models: .pull-left[ model | probs | | `\({\cal L}\)` ---|---|---|---| M1 | `\(p = {1\over2}\)` | `\({1\over2} \times {1\over2} \times {1\over2} \times {1\over2} \times {1\over2}\)` | 0.03125 M2 | `\(p = {3\over 4}\)` | `\({3\over4} \times {3\over4} \times {3\over4} \times {1\over4} \times {1\over4}\)` | 0.02637 M2b | `\(p = {3\over 5}\)` | `\({3\over5} \times {3\over5} \times {3\over5} \times {2\over5} \times {2\over5}\)` | 0.0346 If you sweep through all possible values of `\(p\)`, you find that `\(\widehat p = 3/5\)` leads to the highest likelihood. This is the **maximum likelihood estimate** (MLE) of the probability that a squirrel is light morph. But you can also get (good) Confidence Intervals from looking at the curve of the profile. ] .pull-right[ ### Likelihood profile ![](Lecture_Modeling_files/figure-html/unnamed-chunk-10-1.png)<!-- --> ] --- # **Step II:** More Models of Sea Lion Weights .pull-left-70[ ### Null (linear) model ![](Lecture_Modeling_files/figure-html/unnamed-chunk-11-1.png)<!-- --> <<<<<<< HEAD ``` r ======= ```r >>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc mean(pups$Weight) ``` ``` ## [1] 33.51004 ``` <<<<<<< HEAD ``` r ======= ```r >>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc sd(pups$Weight) ``` ``` ## [1] 5.661695 ``` ] .pull-right-30[ ![](images/pups_small.jpg) This suggests a model! `$$W \sim {\cal N}(\mu = 33\,kg, \sigma = 5.7)$$` With no covariates. ] --- .pull-left-70[ # Simple linear model *Probably* there is a relationship between length and weight. The simplest relationship is linear. `$$\large Y \sim { N (\text{mean} = \beta_0 + \beta_1 X,\,\, \text{sd} = \sigma)}$$` ![](Lecture_Modeling_files/figure-html/unnamed-chunk-13-1.png)<!-- --> ] .pull-right-30[ ![](images/pups_small.jpg) Steller sea lion (*Eumatopias jubatus*) pups. ] --- .pull-left[ ## Deterministic model: `$$Y_i = \beta_0 + \beta_1 X_i$$` - `\(\beta_0\)` - intercept - `\(\beta_1\)` - slope This is the **functional form of the predictor** ] .pull-right[ ## Statistical model: **Version 1:** `$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$` where `\(\epsilon_i \sim {\cal N}(0, \sigma)\)` or **Version 2:** `$$Y_i \sim {\cal N}(\beta_0 + \beta_1 X_i, \sigma)$$` ] V2 is better because it is more transparent about the number of parameters! - Two (intercept | slope) are part of the **functional form** - One (residual standard deviation) is part of the **random component**. --- ## But other variables might influence pup size .pull-left-30[ Lots of competing models with different **main** and **interaction** effects. ] .pull-right-70[ ![](Lecture_Modeling_files/figure-html/unnamed-chunk-16-1.png)<!-- --> ] --- ## Fitting and model selection .pull-left-70[ <<<<<<< HEAD <table class="table" style="margin-left: auto; margin-right: auto;"> ======= <table class="table" style="color: black; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Model </th> <th style="text-align:right;"> k </th> <th style="text-align:right;"> R2 </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> dAIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Weight ~ Length * Sex + Island </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 8 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.818 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> -1144.6 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 2307.3 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.0 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length * Sex * Island </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 0.824 </td> <td style="text-align:right;"> -1137.1 </td> <td style="text-align:right;"> 2316.1 </td> <td style="text-align:right;"> 8.8 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length + Sex + Island </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.811 </td> <td style="text-align:right;"> -1155.0 </td> <td style="text-align:right;"> 2325.9 </td> <td style="text-align:right;"> 18.6 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length * Sex </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.803 </td> <td style="text-align:right;"> -1164.5 </td> <td style="text-align:right;"> 2339.0 </td> <td style="text-align:right;"> 31.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length + Sex </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.795 </td> <td style="text-align:right;"> -1174.5 </td> <td style="text-align:right;"> 2357.0 </td> <td style="text-align:right;"> 49.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.779 </td> <td style="text-align:right;"> -1193.4 </td> <td style="text-align:right;"> 2392.8 </td> <td style="text-align:right;"> 85.5 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Sex </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.293 </td> <td style="text-align:right;"> -1483.2 </td> <td style="text-align:right;"> 2972.4 </td> <td style="text-align:right;"> 665.1 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Island </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.028 </td> <td style="text-align:right;"> -1562.5 </td> <td style="text-align:right;"> 3137.0 </td> <td style="text-align:right;"> 829.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.000 </td> <td style="text-align:right;"> -1569.5 </td> <td style="text-align:right;"> 3143.1 </td> <td style="text-align:right;"> 835.8 </td> </tr> </tbody> </table> ] .pull-right-30[ This is what we expect ... the interaction between **sex** and **length** is consistent across islands, but there are some main effect differences across islands (mainly because of the time we sampled). ] --- ## Model selection vs. parameter estimates The best model: `$$Y_{ijk} = \beta_0 + \beta_{1i} \text{Island}_{ijk} + (\beta_2 + \beta_{3j} \text{Sex}_{ijk})\times(\text{Length}_{ijk}) + \epsilon_{ijk}$$` .pull-left[ What are the **parameter estimates** (effect sizes) of the selected model? .small[ |term | estimate| std.error| statistic| p.value| |:--------------|--------:|---------:|---------:|-------:| |SexFemale | -39.34| 3.69| -10.67| 0.00| |SexMale | -59.29| 3.06| -19.37| 0.00| |Length | 0.66| 0.03| 19.11| 0.00| |IslandChirpoev | -2.00| 0.34| -5.81| 0.00| |IslandLovushki | -0.47| 0.34| -1.35| 0.18| |IslandRaykoke | -0.45| 0.35| -1.31| 0.19| |IslandSrednova | -0.35| 0.35| -1.00| 0.32| |SexMale:Length | 0.20| 0.04| 4.55| 0.00| ]] .pull-right[ ![](Lecture_Modeling_files/figure-html/unnamed-chunk-19-1.png)<!-- --> ] --- .pull-left[ # **Step III:** Model Selection ANOVA is helpful for "nested" models, where each one is a subset of another more complex one. For comparing a **set of competing,non-nested** models, we use . .footnotesize[ ### `\(\Delta\)`AIC table <table class="table" style="color: black; margin-left: auto; margin-right: auto;"> >>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc <thead> <tr> <th style="text-align:left;"> Model </th> <th style="text-align:right;"> k </th> <th style="text-align:right;"> R2 </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> dAIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Weight ~ Length * Sex + Island </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 8 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.818 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> -1144.6 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 2307.3 </td> <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.0 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length * Sex * Island </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 0.824 </td> <td style="text-align:right;"> -1137.1 </td> <td style="text-align:right;"> 2316.1 </td> <td style="text-align:right;"> 8.8 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length + Sex + Island </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 0.811 </td> <td style="text-align:right;"> -1155.0 </td> <td style="text-align:right;"> 2325.9 </td> <td style="text-align:right;"> 18.6 </td> </tr> <tr> <<<<<<< HEAD <td style="text-align:left;"> Weight ~ Length * Sex </td> ======= <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> M7 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> Length * Sex + Island </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 8 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 0.818 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> -1144.6 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 2307.3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 0.0 </td> </tr> <tr> <td style="text-align:left;"> M8 </td> <td style="text-align:left;"> Length * Sex * Island </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 0.824 </td> <td style="text-align:right;"> -1137.1 </td> <td style="text-align:right;"> 2316.1 </td> <td style="text-align:right;"> 8.8 </td> </tr> </tbody> </table> ] ] .pull-right[ **Degrees of freedom *k*:** - Number of estimated parameters. Measure of *complexity*. **Coefficient of determination R<sup>2</sup>:** - Percent variation explained. It ALWAYS increases the more complex the model. - Is is always zero for the **NULL** model. **log-likelihood `\(\log({\cal L})\)`:** - Total probability score of model. It ALWAYS increases the more complex the model. **Akaike Information Criterion:** - `\(AIC = -2 \log({\cal L}) + 2\,k\)` - A measure of model quality. - Smaller is better It starts getting bigger if the model complexity gets too high. - .red[**The lowest AIC value is the "best" model.**] - (but within 2 `\(\Delta AIC\)` is pretty much equivalent to best) ] --- background-image: url('images/AfricanUngulatesBiomass.png') background-size: cover ## AIC in action: **What predicts ungulate body size?** Quality (Nitrogen)? or Type (browse/grass)? --- background-image: url('images/GurarieSpringMigrations0.png') background-size: cover .pull-left-60[ <!-- <video width="100%" controls="controls"> --> <iframe src="https://esf0-my.sharepoint.com/personal/egurarie_esf_edu/_layouts/15/embed.aspx?UniqueId=75126850-958b-4f5a-8e01-69137cacd16a&embed=%7B%22ust%22%3Atrue%2C%22hv%22%3A%22CopyEmbedCode%22%7D&referrer=StreamWebApp&referrerScenario=EmbedDialog.Create" width="640" height="360" frameborder="0" scrolling="no" allowfullscreen title="migrationanimation6.mp4"></iframe> </video> ] .pull-right-40[ ## Caribou spring migrations Remarkable temporal synchrony at a continental scale. ] --- background-image: url('images/GurarieSpringMigrations.png') background-size: cover ## Could the synchrony be driven by global weather drivers? .pull-right-70[ Pacific Decadal Oscillation, Arctic Oscillation, North Atlantic Oscillation: determine whether the winter is wet & snowy or dry & cold. ] --- background-image: url('images/GurarieSpringMigrations2.png') background-size: cover ## `\(\Delta\)`AIC Table 1: **Departure time** .pull-right[... driven by LARGE climate oscillations.] --- background-image: url('images/GurarieSpringMigrations3.png') background-size: cover ## `\(\Delta\)`AIC Table 2: **Arrival time** .pull-right[... completely independent of climate!] --- class: small .pull-left[ ## ***Step IV:*** *Generalized* linear modeling ### Normal Model .large[$$Y_i \sim {\cal Normal}(\alpha_0 + \beta_1 X_i, \sigma)$$] Models continuous data with a "normal-like" distribution. ![](Lecture_Modeling_files/figure-html/unnamed-chunk-21-1.png)<!-- --> ] -- .pull-right[ ### Binomial model .large[$$Y_i \sim {\cal Bernoulli}\left( \frac{\exp(\alpha + \beta X_i)}{1 + \exp(\alpha + \beta X_i)} \right)$$] There's some *probability* of something happening that depends on the predictor `\(X\)`. **Bernoulli** just means the data are all 0 or 1. ![](Lecture_Modeling_files/figure-html/unnamed-chunk-22-1.png)<!-- --> This models **presence/absence**, **dead/alive**, **male/female** other response variables with **2** possible outcomes. ] --- background-image: url('images/SoleaSolea.png') background-size: cover ### What factors predict occurence of *Solea solea* larvae? Sampled in the estuary of the Tejo river in Portugal - Lots of environmental factors in data .pull-right-70[.footnotesize[ <table class="table" style="color: black; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> depth </th> <th style="text-align:right;"> temp </th> <th style="text-align:right;"> salinity </th> <th style="text-align:right;"> transp </th> <th style="text-align:right;"> gravel </th> <th style="text-align:right;"> large_sand </th> <th style="text-align:right;"> fine_sand </th> <th style="text-align:right;"> mud </th> <th style="text-align:right;"> presence </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 3.74 </td> <td style="text-align:right;"> 13.15 </td> <td style="text-align:right;"> 11.93 </td> <td style="text-align:right;"> 71.18 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 2.6 </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 1.94 </td> <td style="text-align:right;"> 4.99 </td> <td style="text-align:right;"> 5.43 </td> <td style="text-align:right;"> 87.63 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 2.6 </td> <td style="text-align:right;"> 19 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 2.88 </td> <td style="text-align:right;"> 8.98 </td> <td style="text-align:right;"> 16.85 </td> <td style="text-align:right;"> 71.29 </td> <td style="text-align:right;font-weight: bold;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 2.1 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 29 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 11.06 </td> <td style="text-align:right;"> 11.96 </td> <td style="text-align:right;"> 21.95 </td> <td style="text-align:right;"> 55.03 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 3.2 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 9.87 </td> <td style="text-align:right;"> 28.60 </td> <td style="text-align:right;"> 19.49 </td> <td style="text-align:right;"> 42.04 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> <tr> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 20 </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 32.45 </td> <td style="text-align:right;"> 7.39 </td> <td style="text-align:right;"> 9.43 </td> <td style="text-align:right;"> 50.72 </td> <td style="text-align:right;font-weight: bold;"> 0 </td> </tr> </tbody> </table> ]] --- # Presence of *Solea solea* against **salinity** .pull-left-40[ ![](Lecture_Modeling_files/figure-html/unnamed-chunk-24-1.png)<!-- --> ] .pull-right-60[ Modeling is EXACTLY the same as **linear regression** except: - `glm` - for **generalized** linear model (instead of `lm`) - `family = 'binomial'` is the instruction to fit the logistic regression `glm(presence ~ salinity, family ='binomial')` | | Estimate| Std. Error| z value| Pr(>|z|)| |:-----------|--------:|----------:|-------:|------------------:| |(Intercept) | 2.661| 0.902| 2.951| 0.003| |salinity | -0.130| 0.035| -3.716| 0.000| Clearly - *Solea solea* presence is very significantly *negatively* related to salinity. ] --- ### Out of this model we can make predictions <img src="Lecture_Modeling_files/figure-html/unnamed-chunk-27-1.png" width="80%" /> --- .pull-left[ ## `\(\Delta\)`AIC analysis - and coefficients <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> Model </th> <th style="text-align:right;"> k </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> dAIC </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> M9 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> salinity + gravel </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> -33.2 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 72.5 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 0.0 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M2 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> salinity </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 2 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 72.6 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 0.1 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M7 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> temp + salinity </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.0 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 74.0 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 1.5 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M5 </td> <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> depth + salinity </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.1 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 74.3 </td> <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 1.8 </td> </tr> <tr> <td style="text-align:left;"> M11 </td> <td style="text-align:left;"> depth + temp + salinity </td> >>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0.803 </td> <td style="text-align:right;"> -1164.5 </td> <td style="text-align:right;"> 2339.0 </td> <td style="text-align:right;"> 31.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length + Sex </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0.795 </td> <td style="text-align:right;"> -1174.5 </td> <td style="text-align:right;"> 2357.0 </td> <td style="text-align:right;"> 49.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Length </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.779 </td> <td style="text-align:right;"> -1193.4 </td> <td style="text-align:right;"> 2392.8 </td> <td style="text-align:right;"> 85.5 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ Sex </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0.293 </td> <td style="text-align:right;"> -1483.2 </td> <td style="text-align:right;"> 2972.4 </td> <td style="text-align:right;"> 665.1 </td> </tr> <tr> <<<<<<< HEAD <td style="text-align:left;"> Weight ~ Island </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 0.028 </td> <td style="text-align:right;"> -1562.5 </td> <td style="text-align:right;"> 3137.0 </td> <td style="text-align:right;"> 829.7 </td> </tr> <tr> <td style="text-align:left;"> Weight ~ 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.000 </td> <td style="text-align:right;"> -1569.5 </td> <td style="text-align:right;"> 3143.1 </td> <td style="text-align:right;"> 835.8 </td> </tr> </tbody> </table> ======= <td style="text-align:left;"> M8 </td> <td style="text-align:left;"> temp + gravel </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> -43.3 </td> <td style="text-align:right;"> 92.6 </td> <td style="text-align:right;"> 20.1 </td> </tr> </tbody> </table> ] .pull-right[ **Salinity** clearly among the more important covariates (in the top 4 models). ![](Lecture_Modeling_files/figure-html/unnamed-chunk-30-1.png)<!-- --> ] --- ## Poisson regression .pull-left-30[ ![](images/Poisson1.png) ![](images/Poisson4.png) ![](images/Poisson10.png) ] .pull-right-70[ .large[$$Y_i \sim {\cal Poisson}\left(\lambda = \exp(\alpha + \beta X_i) \right)$$] - We are **counting** something ... the data are between 0 and `\(\infty\)` - `\(\lambda\)` is a **density**; **densities** vary across habitat types (covariate **X**). .center[<img src='images/Moose2.png' width='70%'/>] ] --- ## Field flags .large[**Did flag densities vary with region?**] .pull-left-40[ Approximate areas: region | area --|-- **North:** | 82 m<sup>2</sup> **South:** | 82 m<sup>2</sup> **Perimeter:** | 196 m<sup>2</sup> --|-- **Sampling square (hula hoop)** | 0.5 m<sup>2</sup> ] .pull-right-60[ ![](images/court.png)] --- .pull-left[ ### Count data Lots of 0's, some 1's, and just one 2 count. ``` ## Region ## Count North Perimeter South ## 0 15 12 9 ## 1 1 1 6 ## 2 0 0 1 ``` ![](Lecture_Modeling_files/figure-html/unnamed-chunk-32-1.png)<!-- --> ] .pull-right[ ### Fitting models .large[ `glm(count ~ region, family = 'poisson')` Exact same syntax as before, except the "family" is **Poisson.** ]] --- .pull-left[ ### Count data Lots of 0's, some 1's, and just one 2 count. ``` ## Region ## Count North Perimeter South ## 0 15 12 9 ## 1 1 1 6 ## 2 0 0 1 ``` ![](Lecture_Modeling_files/figure-html/unnamed-chunk-34-1.png)<!-- --> ] .pull-right[ ### Fitting models | | Estimate| Std. Error| z value| Pr(>|z|)| |:---------------|--------:|----------:|-------:|------------------:| |(Intercept) | -2.773| 1.000| -2.773| 0.006| |RegionPerimeter | 0.208| 1.414| 0.147| 0.883| |RegionSouth | 2.079| 1.061| 1.961| 0.050| The **intercept** here is "North", the *p*-values compare with North. So **South** has - borderline - significantly more #### `\(\Delta AIC\)` table | | df| AIC| |:------------|--:|-----:| |Null.model | 1| 53.47| |Region.model | 3| 49.15| Model that includes **Region** has lower AIC >>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc ] .pull-right-30[ This is what we expect ... the interaction between **sex** and **length** is consistent across islands, but there are some main effect differences across islands (mainly because of the time we sampled). ] --- ## Model selection vs. parameter estimates <<<<<<< HEAD The best model: `$$Y_{ijk} = \beta_{sex} + \beta_{island} \text{Island}_{ijk} + (\beta_{length} \times\text{Length}_{ijk}) + \epsilon_{ijk}$$` .pull-left[ What are the **parameter estimates** (effect sizes) of the selected model? ======= |Region | area| fit| se.fit| l.hat| l.low| l.high| d.hat| d.low| d.high| N.hat| N.low| N.high| |:---------|----:|------:|------:|-----:|-----:|------:|-----:|-----:|------:|-----:|-----:|------:| |South | 82| -0.693| 0.354| 0.500| 0.247| 1.014| 1.000| 0.493| 2.028| 82.0| 40.4| 166.3| |North | 82| -2.773| 1.000| 0.063| 0.008| 0.462| 0.125| 0.017| 0.924| 10.2| 1.4| 75.8| |Perimeter | 196| -2.565| 1.000| 0.077| 0.010| 0.568| 0.154| 0.021| 1.137| 30.2| 4.1| 222.9| >>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc .small[ <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> SexFemale </td> <td style="text-align:right;"> -39.34 </td> <td style="text-align:right;"> 3.69 </td> <td style="text-align:right;"> -10.67 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> SexMale </td> <td style="text-align:right;"> -59.29 </td> <td style="text-align:right;"> 3.06 </td> <td style="text-align:right;"> -19.37 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> Length </td> <td style="text-align:right;"> 0.66 </td> <td style="text-align:right;"> 0.03 </td> <td style="text-align:right;"> 19.11 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> IslandChirpoev </td> <td style="text-align:right;"> -2.00 </td> <td style="text-align:right;"> 0.34 </td> <td style="text-align:right;"> -5.81 </td> <td style="text-align:right;"> 0.00 </td> </tr> <tr> <td style="text-align:left;"> IslandLovushki </td> <td style="text-align:right;"> -0.47 </td> <td style="text-align:right;"> 0.34 </td> <td style="text-align:right;"> -1.35 </td> <td style="text-align:right;"> 0.18 </td> </tr> <tr> <td style="text-align:left;"> IslandRaykoke </td> <td style="text-align:right;"> -0.45 </td> <td style="text-align:right;"> 0.35 </td> <td style="text-align:right;"> -1.31 </td> <td style="text-align:right;"> 0.19 </td> </tr> <tr> <td style="text-align:left;"> IslandSrednova </td> <td style="text-align:right;"> -0.35 </td> <td style="text-align:right;"> 0.35 </td> <td style="text-align:right;"> -1.00 </td> <td style="text-align:right;"> 0.32 </td> </tr> <tr> <td style="text-align:left;"> SexMale:Length </td> <td style="text-align:right;"> 0.20 </td> <td style="text-align:right;"> 0.04 </td> <td style="text-align:right;"> 4.55 </td> <td style="text-align:right;"> 0.00 </td> </tr> </tbody> </table> ]] .pull-right[ ![](Lecture_Modeling_files/figure-html/unnamed-chunk-19-1.png)<!-- --> ]