How to model just about anything (Part I)

<<<<<<< HEAD ======= >>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc

class: center, middle, white, title-slide

.title[
# How to model just about anything<br>(Part I)
]
.subtitle[
## EFB 390: Wildlife Ecology and Management
]
.author[
### Dr. Elie Gurarie
]
.date[
### October 3, 2024
]

---

## Super fast primer on statistical modeling

Everything you need to know to do 95% of all wildlife modeling in less than an hour and **FOUR** (or **FIVE**) easy steps!!

.pull-left.large[

**I.** Linear modeling

**II.** Multivariate modeling

**III.** Model selection 
]

.pull-right.large[

**IV.** Generalized linear modeling 
  - Poisson; Binomial

**V.** Prediction
]

---

.pull-left-70[
# **Step I:** Linear modeling

... is a very general method to quantifying relationships among variables.

.pull-left[
![](Lecture_Modeling_files/figure-html/unnamed-chunk-3-1.png)
]
.pull-right[

`$X_i$` - is called:
- covariate
- independent variable
- explanatory variable

`$Y_i$` - is the property we are interested in modeling:
- response variable
- dependent variable

.small[Note: There actually can be interest in wildlife studies to have models for **length** and **weight**, since **length** is easy to measure (e.g. from drones), but **weight** tells us more about physical condition and energetics.]
]
]

.pull-right-30[
![](images/pups_small.jpg)

Steller sea lion (*Eumatopias jubatus*) pups.

]

---

# Linear Models
.pull-left[
#### Deterministic:

`$$Y_i = a + bX_i$$`

`$a$` - intercept; `$b$` - slope
]

.pull-right[
#### Probabilistic:

`$$Y_i = \alpha + \beta X_i + \epsilon_i$$`

`$\alpha$` - intercept; `$\beta$` - slope; `$\epsilon$` - **randomness!**: `$\epsilon_i \sim {\cal N}(0, \sigma)$`
]

.pull-left[
![](Lecture_Modeling_files/figure-html/unnamed-chunk-4-1.png)
]

.pull-right[
![](Lecture_Modeling_files/figure-html/unnamed-chunk-5-1.png)
]

---

# Fitting linear models is very easy in ![](images/R.png)!

.pull-left[

**Point Estimate**

This command fits a model:
.small[

``` r
lm(Weight ~ Length, data = pups)
```

```
## 
## Call:
## lm(formula = Weight ~ Length, data = pups)
## 
## Coefficients:
## (Intercept)       Length  
##    -49.1422       0.7535
```
]

So for **each 1 cm** of length, add another **754 grams**, i.e. `$\widehat{\beta} = 0.754$`

]

.pull-right[

``` r
plot(Weight ~ Length, data = pups)
abline(my_model)
```

![](Lecture_Modeling_files/figure-html/unnamed-chunk-8-1.png)

The `abline` puts a line, with intercept `a` and slope `b` onto a figure.
]

---

.pull-left-60[

# Statistical inference

**Statistical inference** is the *science / art* of observings *something* from a **portion of a population** and making statements about the **entire population**.

In practice - this is done by taking  **data** and **estimating  parameters** of a **model**.  (This is also called *fitting* a model).

Two related goals:

1. obtaining a **point estimate** and a **confidence interval** (precision) of the parameter estimate. 
2. Assessing whether particular (combinations of) factors, i.e. **models**, provide any **explanatory power**.

This is (almost always) done using **Maximum Likelihood Estimation**, i.e. an algorithm searches through possible values of the parameters that make the model **MOST LIKELY** (have the highest probability) given the data. 
]

.pull-right-40[
![](images/SSL_withpup.jpg)

.small[Another gratuitous sea lion picture.]
]

---

# All models have these pieces:

`$$\Huge Y = f({\bf X} | \bf{\Theta})$$`
 - **Y** - response | dependent variable.  The thing we want to model / predict / understand. The **effect** (maybe). 
 
- **X** - predictor(s) | independent variable(s) | covariate(s).  The thing(s) that "explain(s)" **Y**. The **cause** (maybe).

- **f** - the model structure.  This includes: some **deterministic functional form** form (*linear? periodic? polynomial? exponential?*) AND some **probabilistic assumptions**, i.e. a way to characterize the variability / randomness / unpredictability of the process.

- `$\bf \Theta$` - the parameters of the model.  There are usually some parameters associated with the **predictors**, and some associated with the **random bit**.

---

# Goals (Art / Science)  of Modeling

`$$\Huge Y = f({\bf X} | \bf{\Theta})$$`

.pull-left[

### 1. Model fitting

.darkred[
What are the **best** `$\bf \Theta$` values given `$f, {\bf X}, Y$`?
]

**Fitting the model** = **estimating the parameters**.

Usually according to some criterion (almost always **Maximum Likelihood**.

]

.pull-right[
### 2. Model selection

.darkred[

What are the **best** of a set of models `$f_1$`, `$f_2$`, `$f_3$` given `$\bf X$` and `$Y$`?

]

Different models *usually* vary by what particular variables go into **X**, but can also vary by **functional form** and **distribution assumptions**

Use some **Criterion** (e.g. AIC) to "select" the best model, which balances **how many parameters you estimated**  verses **how good the fit is**.

]

---

# Whoa!   What is "Maximum Likelihood"!?

.pull-left[

### Oakie

![](images/likelihoods/blackmorph.jpg)

]

.pull-right[

### Orange

![](images/likelihoods/greysquirrels.jpg)

]

.red.center.large[**Q:** What is the "best model" for squirrel morph distribution?]

---

## Data and Models

.pull-left[

### Data / observations: `$X_{ij}$`

| island | a. Orange | b. Oakie|
|---|---|---|
| squirrel 1: | `$X_{a,1}$` = 1 | `$X_{b,1}$` = 1 |
| squirrel 2: | `$X_{a,2}$` = 1 | `$X_{b,2}$` = 0 |

- 1 = light morph
- 0 = dark morph
]

.pull-right[

### Models

| | model | k|
|---|:--|:---|
| M1: | `$P(X_{ij} = 1) = p = 0.5$` | 1
| M2: | `$p = 0.75$` | 1
| M3: | `$p_a = 1$`; `$p_b = .5$` | 2
| M4: | `$p_{a,1} = 1;\,\,\, p_{b,1} = 1 \\ p_{a,2} = 1;\,\,\, p_{b,2} = 0$` | 4

**very important to keep track of the number of parameters!**
]

---

## Likelihoood (of a model)

.large[
**Product** of **probabilities** of **data** given **model**.

`$$\Large {\cal L}(model) = \prod_{i = 1}^n \text{Pr}(X | model)$$`

- We **never** care about the **absolute** value of the likelihood! 
- Only the *relative* value of the likelihood. 
]

---

## Four different Squirrel Models:

Data: `$$X_{a1} = 1;\, X_{a2} = 1; X_{b1} = 1;\, X_{b2} = 0$$`

| model | | likelihood | |
|---|---|---|---|
| M1 | `$p = 0.5$` | `${1\over2} \times {1\over2} \times {1\over2} \times {1\over2}$` | 0.0625 |
| M2 | `$p = 0.75$` | `${3\over4} \times {3\over4} \times {3\over4} \times {1\over4}$` | 0.1055 |
| M3 | `$p_a = .5$`; `$p_b = 1$` | `$1\times1\times{1\over2}\times{1\over2}$` | 0.25 |
| M4 | `$p_{a,1} = 1;\, p_{b,1} = 1; \, p_{a,2} = 1;\, p_{b,2} = 0$` | `$1\times1\times1\times1$` | 1 |

`$$\cal{L}(M4) > \cal{L}(M3) > \cal{L}(M2) > \cal{L}(M1)$$`

.center[*M4** has the higest likelihood!  But is this a useful model?]

---

## A(kaike) Information Criterion

A good fit is great!  But it is useless if it uses too much information (too many parameters).  This is *overfitting*.  **One parameter per data point is TOO MANY parameters!**

.pull-left[![](images/likelihoods/akaike.jpg)

Hirotugo Akaike 赤池 弘次 (1927-2006)
]

.pull-right[

Simple formula:

`$$AIC = -2 \log({\cal L})+ 2k$$`

(where `$k$` is the number of parameters)

- Better fit = higher `$\cal L$` = lower AIC.  
- Too complicated = more k = higher AIC.

**Lowest AIC is "best" model**
]

---

## Compute AIC

| model | likelihood | log-likelihood | k | AIC
|---|---|---|---|---|
| **M1:** coin flip |  0.0625 | -2.77 | 1 | 7.55
| **M2:** proportional odds| 0.1055  | -2.25 | 1 | **6.50**
| **M3:** island specific| 0.25    | -1.39 | 2 | 6.77
| **M4:** individual specific| 1       | 0     | 4 | 8

.center[AIC2 < AIC3 < AIC1 < AIC4

.large[Most *parsimonious* model is M2!]

**Conclusion:** not enough evidence to identify a difference between islands.   
]

---

## Let's add one more observation ...
.pull-left[

### Oakie Island

![](images/likelihoods/blackmorph.jpg)

]

.pull-right[

### Orange Island
![](images/likelihoods/greysquirrels2.jpg)
]

`$$X_{b,3} = dark$$`

---

## Updated squirrel models:

model | probs | Likelihood |  | k |  AIC|
---|---|---|---|---|---|
 M1 | `$p = {1\over2}$` | `${1\over2} \times {1\over2} \times {1\over2} \times {1\over2} \times {1\over2}$` | = 0.03125 | 1 | 8.93
 M2 | `$p = {3\over 4}$` | `${3\over4} \times {3\over4} \times {3\over4} \times {1\over4} \times {1\over4}$` | = 0.02637 | 1 | 9.27
 M2b | `$p = {3\over 5}$` | `${3\over5} \times {3\over5} \times {3\over5} \times {2\over5} \times {2\over5}$` | = 0.0346 | 1 | 8.73
 M3 | `$p_a = .5; p_b = 1$` | `$1\times1\times{1\over2}\times{1\over2}\times0$` | = 0 (!!) | 2 | `$\infty$`
 M3b | `$p_a = {1\over2}; p_b = {2\over3}$` | `${1\over2} \times {1\over2} \times {2\over3} \times {2\over3} \times {1\over3}$` | = 0.037 | 2 | 10.6

---

## Updated (1 parameter) squirrel models:

.pull-left[
 model | probs | | `${\cal L}$` 
---|---|---|---|
 M1 | `$p = {1\over2}$` | `${1\over2} \times {1\over2} \times {1\over2} \times {1\over2} \times {1\over2}$` | 0.03125 
 M2 | `$p = {3\over 4}$` | `${3\over4} \times {3\over4} \times {3\over4} \times {1\over4} \times {1\over4}$` | 0.02637 
 M2b | `$p = {3\over 5}$` | `${3\over5} \times {3\over5} \times {3\over5} \times {2\over5} \times {2\over5}$` | 0.0346

If you sweep through all possible values of `$p$`, you find that
`$\widehat p = 3/5$` leads to the highest likelihood.

This is the **maximum likelihood estimate** (MLE) of the probability that a squirrel is light morph.

But you can also get (good) Confidence Intervals from looking at the curve of the profile.

]

.pull-right[

### Likelihood profile

![](Lecture_Modeling_files/figure-html/unnamed-chunk-10-1.png)
]

---
# **Step II:** More Models of Sea Lion Weights

.pull-left-70[

### Null (linear) model

![](Lecture_Modeling_files/figure-html/unnamed-chunk-11-1.png)

<<<<<<< HEAD
``` r
=======
```r
>>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc
mean(pups$Weight)
```

```
## [1] 33.51004
```

<<<<<<< HEAD
``` r
=======
```r
>>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc
sd(pups$Weight)
```

```
## [1] 5.661695
```
]

.pull-right-30[

![](images/pups_small.jpg)

This suggests a model!

`$$W \sim {\cal N}(\mu = 33\,kg, \sigma = 5.7)$$`
With no covariates.

]

---
.pull-left-70[

# Simple linear model

*Probably* there is a relationship between length and weight.  The simplest relationship is linear.

`$$\large Y \sim {  N (\text{mean} = \beta_0 + \beta_1 X,\,\, \text{sd} = \sigma)}$$`

![](Lecture_Modeling_files/figure-html/unnamed-chunk-13-1.png)
]

.pull-right-30[
![](images/pups_small.jpg)

Steller sea lion (*Eumatopias jubatus*) pups.  
]

---

.pull-left[
## Deterministic model:

`$$Y_i = \beta_0 + \beta_1 X_i$$`

- `$\beta_0$` - intercept
- `$\beta_1$` - slope

This is the **functional form of the predictor**
]

.pull-right[
## Statistical model:

**Version 1:**

`$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$`
where `$\epsilon_i \sim {\cal N}(0, \sigma)$`

**Version 2:**

`$$Y_i \sim {\cal N}(\beta_0 + \beta_1 X_i, \sigma)$$`
]

V2 is better because it is more transparent about the number of parameters!

- Two (intercept | slope) are part of the **functional form** 
- One (residual standard deviation) is part of the **random component**.

---

## But other variables might influence pup size

.pull-left-30[
Lots of competing models with different **main** and **interaction** effects. 
]

.pull-right-70[
![](Lecture_Modeling_files/figure-html/unnamed-chunk-16-1.png)
]

---

## Fitting and model selection

.pull-left-70[
<<<<<<< HEAD
<table class="table" style="margin-left: auto; margin-right: auto;">
=======
<table class="table" style="color: black; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Model </th>
   <th style="text-align:right;"> k </th>
   <th style="text-align:right;"> R2 </th>
   <th style="text-align:right;"> logLik </th>
   <th style="text-align:right;"> AIC </th>
   <th style="text-align:right;"> dAIC </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Weight ~ Length * Sex + Island </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 8 </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.818 </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> -1144.6 </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 2307.3 </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length * Sex * Island </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 0.824 </td>
   <td style="text-align:right;"> -1137.1 </td>
   <td style="text-align:right;"> 2316.1 </td>
   <td style="text-align:right;"> 8.8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length + Sex + Island </td>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:right;"> 0.811 </td>
   <td style="text-align:right;"> -1155.0 </td>
   <td style="text-align:right;"> 2325.9 </td>
   <td style="text-align:right;"> 18.6 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length * Sex </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 0.803 </td>
   <td style="text-align:right;"> -1164.5 </td>
   <td style="text-align:right;"> 2339.0 </td>
   <td style="text-align:right;"> 31.7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length + Sex </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 0.795 </td>
   <td style="text-align:right;"> -1174.5 </td>
   <td style="text-align:right;"> 2357.0 </td>
   <td style="text-align:right;"> 49.7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 0.779 </td>
   <td style="text-align:right;"> -1193.4 </td>
   <td style="text-align:right;"> 2392.8 </td>
   <td style="text-align:right;"> 85.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Sex </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 0.293 </td>
   <td style="text-align:right;"> -1483.2 </td>
   <td style="text-align:right;"> 2972.4 </td>
   <td style="text-align:right;"> 665.1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Island </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 0.028 </td>
   <td style="text-align:right;"> -1562.5 </td>
   <td style="text-align:right;"> 3137.0 </td>
   <td style="text-align:right;"> 829.7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> -1569.5 </td>
   <td style="text-align:right;"> 3143.1 </td>
   <td style="text-align:right;"> 835.8 </td>
  </tr>
</tbody>
</table>
]

.pull-right-30[

This is what we expect ... the interaction between **sex** and **length** is consistent across islands, but there are some main effect differences across islands (mainly because of the time we sampled). 
]
---

## Model selection vs. parameter estimates

The best model:

`$$Y_{ijk} = \beta_0 + \beta_{1i} \text{Island}_{ijk} + (\beta_2 + \beta_{3j} \text{Sex}_{ijk})\times(\text{Length}_{ijk})  + \epsilon_{ijk}$$`

.pull-left[

What are the **parameter estimates** (effect sizes) of the selected model?

.small[

|term           | estimate| std.error| statistic| p.value|
|:--------------|--------:|---------:|---------:|-------:|
|SexFemale      |   -39.34|      3.69|    -10.67|    0.00|
|SexMale        |   -59.29|      3.06|    -19.37|    0.00|
|Length         |     0.66|      0.03|     19.11|    0.00|
|IslandChirpoev |    -2.00|      0.34|     -5.81|    0.00|
|IslandLovushki |    -0.47|      0.34|     -1.35|    0.18|
|IslandRaykoke  |    -0.45|      0.35|     -1.31|    0.19|
|IslandSrednova |    -0.35|      0.35|     -1.00|    0.32|
|SexMale:Length |     0.20|      0.04|      4.55|    0.00|

]]

.pull-right[
![](Lecture_Modeling_files/figure-html/unnamed-chunk-19-1.png)
]

---

.pull-left[
# **Step III:** Model Selection

ANOVA is helpful for "nested" models, where each one is a subset of another more complex one.  For comparing a **set of competing,non-nested** models, we use .

.footnotesize[

### `$\Delta$`AIC table

<table class="table" style="color: black; margin-left: auto; margin-right: auto;">
>>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc
 <thead>
  <tr>
   <th style="text-align:left;"> Model </th>
   <th style="text-align:right;"> k </th>
   <th style="text-align:right;"> R2 </th>
   <th style="text-align:right;"> logLik </th>
   <th style="text-align:right;"> AIC </th>
   <th style="text-align:right;"> dAIC </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;background-color: yellow !important;"> Weight ~ Length * Sex + Island </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 8 </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.818 </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> -1144.6 </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 2307.3 </td>
   <td style="text-align:right;font-weight: bold;background-color: yellow !important;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length * Sex * Island </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 0.824 </td>
   <td style="text-align:right;"> -1137.1 </td>
   <td style="text-align:right;"> 2316.1 </td>
   <td style="text-align:right;"> 8.8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length + Sex + Island </td>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:right;"> 0.811 </td>
   <td style="text-align:right;"> -1155.0 </td>
   <td style="text-align:right;"> 2325.9 </td>
   <td style="text-align:right;"> 18.6 </td>
  </tr>
  <tr>
<<<<<<< HEAD
   <td style="text-align:left;"> Weight ~ Length * Sex </td>
=======
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> M7 </td>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> Length * Sex + Island </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 8 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 0.818 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> -1144.6 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 2307.3 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> M8 </td>
   <td style="text-align:left;"> Length * Sex * Island </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 0.824 </td>
   <td style="text-align:right;"> -1137.1 </td>
   <td style="text-align:right;"> 2316.1 </td>
   <td style="text-align:right;"> 8.8 </td>
  </tr>
</tbody>
</table>
]
]

.pull-right[

**Degrees of freedom *k*:**

- Number of estimated parameters. Measure of *complexity*.

**Coefficient of determination R<sup>2</sup>:**

- Percent variation explained. It ALWAYS increases the more complex the model.
- Is is always zero for the **NULL** model.

**log-likelihood `$\log({\cal L})$`:**

- Total probability score of model. It ALWAYS increases the more complex the model.

**Akaike Information Criterion:**

- `$AIC = -2 \log({\cal L}) + 2\,k$`
- A measure of model quality. 
- Smaller is better It starts getting bigger if the model complexity gets too high. 
- .red[**The lowest AIC value is the "best" model.**] 
- (but within 2 `$\Delta AIC$` is pretty much equivalent to best)
]

---
background-image: url('images/AfricanUngulatesBiomass.png')
background-size: cover

## AIC in action: **What predicts ungulate body size?**

Quality (Nitrogen)? or Type (browse/grass)?

---
background-image: url('images/GurarieSpringMigrations0.png')
background-size: cover

.pull-left-60[

</video>
]

.pull-right-40[
## Caribou spring migrations

Remarkable temporal synchrony at a continental scale. 
]

---
background-image: url('images/GurarieSpringMigrations.png')
background-size: cover

## Could the synchrony be driven by global weather drivers?

.pull-right-70[
Pacific Decadal Oscillation, Arctic Oscillation, North Atlantic Oscillation:  determine whether the winter is wet & snowy or dry & cold. 
]

---
background-image: url('images/GurarieSpringMigrations2.png')
background-size: cover

## `$\Delta$`AIC Table 1: **Departure time**

.pull-right[... driven by LARGE climate oscillations.]

---
background-image: url('images/GurarieSpringMigrations3.png')
background-size: cover

## `$\Delta$`AIC Table 2: **Arrival time**

.pull-right[... completely independent of climate!]

---
class: small

.pull-left[
## ***Step IV:*** *Generalized* linear modeling

### Normal Model

.large[$$Y_i \sim {\cal Normal}(\alpha_0 + \beta_1 X_i, \sigma)$$]
Models continuous data with a "normal-like" distribution.

![](Lecture_Modeling_files/figure-html/unnamed-chunk-21-1.png)
]

.pull-right[
### Binomial model

.large[$$Y_i \sim {\cal Bernoulli}\left( \frac{\exp(\alpha + \beta X_i)}{1 + \exp(\alpha + \beta X_i)} \right)$$]

There's some *probability* of something happening that depends on the predictor `$X$`.

**Bernoulli** just means the data are all 0 or 1.

![](Lecture_Modeling_files/figure-html/unnamed-chunk-22-1.png)

This models **presence/absence**, **dead/alive**, **male/female** other response variables with **2** possible outcomes. 
]

---
background-image: url('images/SoleaSolea.png')
background-size: cover

### What factors predict occurence of *Solea solea* larvae?

Sampled in the estuary of the Tejo river in Portugal

- Lots of environmental factors in data

.pull-right-70[.footnotesize[
<table class="table" style="color: black; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> depth </th>
   <th style="text-align:right;"> temp </th>
   <th style="text-align:right;"> salinity </th>
   <th style="text-align:right;"> transp </th>
   <th style="text-align:right;"> gravel </th>
   <th style="text-align:right;"> large_sand </th>
   <th style="text-align:right;"> fine_sand </th>
   <th style="text-align:right;"> mud </th>
   <th style="text-align:right;"> presence </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 3.0 </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 30 </td>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:right;"> 3.74 </td>
   <td style="text-align:right;"> 13.15 </td>
   <td style="text-align:right;"> 11.93 </td>
   <td style="text-align:right;"> 71.18 </td>
   <td style="text-align:right;font-weight: bold;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2.6 </td>
   <td style="text-align:right;"> 18 </td>
   <td style="text-align:right;"> 29 </td>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:right;"> 1.94 </td>
   <td style="text-align:right;"> 4.99 </td>
   <td style="text-align:right;"> 5.43 </td>
   <td style="text-align:right;"> 87.63 </td>
   <td style="text-align:right;font-weight: bold;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2.6 </td>
   <td style="text-align:right;"> 19 </td>
   <td style="text-align:right;"> 30 </td>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:right;"> 2.88 </td>
   <td style="text-align:right;"> 8.98 </td>
   <td style="text-align:right;"> 16.85 </td>
   <td style="text-align:right;"> 71.29 </td>
   <td style="text-align:right;font-weight: bold;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2.1 </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 29 </td>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:right;"> 11.06 </td>
   <td style="text-align:right;"> 11.96 </td>
   <td style="text-align:right;"> 21.95 </td>
   <td style="text-align:right;"> 55.03 </td>
   <td style="text-align:right;font-weight: bold;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3.2 </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 30 </td>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:right;"> 9.87 </td>
   <td style="text-align:right;"> 28.60 </td>
   <td style="text-align:right;"> 19.49 </td>
   <td style="text-align:right;"> 42.04 </td>
   <td style="text-align:right;font-weight: bold;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3.5 </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 32 </td>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:right;"> 32.45 </td>
   <td style="text-align:right;"> 7.39 </td>
   <td style="text-align:right;"> 9.43 </td>
   <td style="text-align:right;"> 50.72 </td>
   <td style="text-align:right;font-weight: bold;"> 0 </td>
  </tr>
</tbody>
</table>
]]

---

# Presence of *Solea solea* against **salinity**

.pull-left-40[
![](Lecture_Modeling_files/figure-html/unnamed-chunk-24-1.png)
]

.pull-right-60[

Modeling is EXACTLY the same as **linear regression** except:
- `glm` - for **generalized** linear model (instead of `lm`)
- `family = 'binomial'` is the instruction to fit the logistic regression

`glm(presence ~ salinity, family ='binomial')`

|            | Estimate| Std. Error| z value| Pr(>&#124;z&#124;)|
|:-----------|--------:|----------:|-------:|------------------:|
|(Intercept) |    2.661|      0.902|   2.951|              0.003|
|salinity    |   -0.130|      0.035|  -3.716|              0.000|

Clearly - *Solea solea* presence is very significantly *negatively* related to salinity.  
]

---
### Out of this model we can make predictions

---

.pull-left[

## `$\Delta$`AIC analysis - and coefficients

<table>
 <thead>
  <tr>
   <th style="text-align:left;">  </th>
   <th style="text-align:left;"> Model </th>
   <th style="text-align:right;"> k </th>
   <th style="text-align:right;"> logLik </th>
   <th style="text-align:right;"> AIC </th>
   <th style="text-align:right;"> dAIC </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> M9 </td>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> salinity + gravel </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 3 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> -33.2 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 72.5 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;font-weight: bold;color: darkblue !important;background-color: yellow !important;"> 0.0 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M2 </td>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> salinity </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 2 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.3 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 72.6 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 0.1 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M7 </td>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> temp + salinity </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 3 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.0 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 74.0 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 1.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> M5 </td>
   <td style="text-align:left;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> depth + salinity </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 3 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> -34.1 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 74.3 </td>
   <td style="text-align:right;font-weight: bold;color: darkblue !important;background-color: lightgreen !important;"> 1.8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> M11 </td>
   <td style="text-align:left;"> depth + temp + salinity </td>
>>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 0.803 </td>
   <td style="text-align:right;"> -1164.5 </td>
   <td style="text-align:right;"> 2339.0 </td>
   <td style="text-align:right;"> 31.7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length + Sex </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 0.795 </td>
   <td style="text-align:right;"> -1174.5 </td>
   <td style="text-align:right;"> 2357.0 </td>
   <td style="text-align:right;"> 49.7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Length </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 0.779 </td>
   <td style="text-align:right;"> -1193.4 </td>
   <td style="text-align:right;"> 2392.8 </td>
   <td style="text-align:right;"> 85.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ Sex </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 0.293 </td>
   <td style="text-align:right;"> -1483.2 </td>
   <td style="text-align:right;"> 2972.4 </td>
   <td style="text-align:right;"> 665.1 </td>
  </tr>
  <tr>
<<<<<<< HEAD
   <td style="text-align:left;"> Weight ~ Island </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 0.028 </td>
   <td style="text-align:right;"> -1562.5 </td>
   <td style="text-align:right;"> 3137.0 </td>
   <td style="text-align:right;"> 829.7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Weight ~ 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> -1569.5 </td>
   <td style="text-align:right;"> 3143.1 </td>
   <td style="text-align:right;"> 835.8 </td>
  </tr>
</tbody>
</table>
=======
   <td style="text-align:left;"> M8 </td>
   <td style="text-align:left;"> temp + gravel </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> -43.3 </td>
   <td style="text-align:right;"> 92.6 </td>
   <td style="text-align:right;"> 20.1 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
**Salinity** clearly among the more important covariates (in the top 4 models).

![](Lecture_Modeling_files/figure-html/unnamed-chunk-30-1.png)

]

---
## Poisson regression

.pull-left-30[

![](images/Poisson1.png)
![](images/Poisson4.png)
![](images/Poisson10.png)

]

.pull-right-70[

.large[$$Y_i \sim {\cal Poisson}\left(\lambda = \exp(\alpha + \beta X_i) \right)$$]

- We are **counting** something ... the data are between 0 and `$\infty$` 
- `$\lambda$` is a **density**;  **densities** vary across habitat types (covariate **X**).

.center[<img src='images/Moose2.png' width='70%'/>]
]

---

## Field flags

.large[**Did flag densities vary with region?**]

.pull-left-40[

Approximate areas:

![](images/court.png)]

---

.pull-left[

### Count data

Lots of 0's, some 1's, and just one 2 count.

```
##      Region
## Count North Perimeter South
##     0    15        12     9
##     1     1         1     6
##     2     0         0     1
```

![](Lecture_Modeling_files/figure-html/unnamed-chunk-32-1.png)
]

.pull-right[

### Fitting models

.large[
`glm(count ~ region, family = 'poisson')`

Exact same syntax as before, except the "family" is **Poisson.** 
]]

---

.pull-left[

### Count data

Lots of 0's, some 1's, and just one 2 count.

```
##      Region
## Count North Perimeter South
##     0    15        12     9
##     1     1         1     6
##     2     0         0     1
```

![](Lecture_Modeling_files/figure-html/unnamed-chunk-34-1.png)
]

.pull-right[

### Fitting models

|                | Estimate| Std. Error| z value| Pr(>&#124;z&#124;)|
|:---------------|--------:|----------:|-------:|------------------:|
|(Intercept)     |   -2.773|      1.000|  -2.773|              0.006|
|RegionPerimeter |    0.208|      1.414|   0.147|              0.883|
|RegionSouth     |    2.079|      1.061|   1.961|              0.050|

The **intercept** here is "North", the *p*-values compare with North.  So **South** has - borderline - significantly more

#### `$\Delta AIC$` table

|             | df|   AIC|
|:------------|--:|-----:|
|Null.model   |  1| 53.47|
|Region.model |  3| 49.15|

Model that includes **Region** has lower AIC

>>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc
]

.pull-right-30[

## Model selection vs. parameter estimates

<<<<<<< HEAD
The best model:

`$$Y_{ijk} = \beta_{sex} + \beta_{island} \text{Island}_{ijk} + (\beta_{length} \times\text{Length}_{ijk}) + \epsilon_{ijk}$$`

.pull-left[

What are the **parameter estimates** (effect sizes) of the selected model? 
=======

|Region    | area|    fit| se.fit| l.hat| l.low| l.high| d.hat| d.low| d.high| N.hat| N.low| N.high|
|:---------|----:|------:|------:|-----:|-----:|------:|-----:|-----:|------:|-----:|-----:|------:|
|South     |   82| -0.693|  0.354| 0.500| 0.247|  1.014| 1.000| 0.493|  2.028|  82.0|  40.4|  166.3|
|North     |   82| -2.773|  1.000| 0.063| 0.008|  0.462| 0.125| 0.017|  0.924|  10.2|   1.4|   75.8|
|Perimeter |  196| -2.565|  1.000| 0.077| 0.010|  0.568| 0.154| 0.021|  1.137|  30.2|   4.1|  222.9|
>>>>>>> 966bacd07be1e372fa38e963594a45bcdfa97dfc

.small[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> SexFemale </td>
   <td style="text-align:right;"> -39.34 </td>
   <td style="text-align:right;"> 3.69 </td>
   <td style="text-align:right;"> -10.67 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> SexMale </td>
   <td style="text-align:right;"> -59.29 </td>
   <td style="text-align:right;"> 3.06 </td>
   <td style="text-align:right;"> -19.37 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Length </td>
   <td style="text-align:right;"> 0.66 </td>
   <td style="text-align:right;"> 0.03 </td>
   <td style="text-align:right;"> 19.11 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> IslandChirpoev </td>
   <td style="text-align:right;"> -2.00 </td>
   <td style="text-align:right;"> 0.34 </td>
   <td style="text-align:right;"> -5.81 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> IslandLovushki </td>
   <td style="text-align:right;"> -0.47 </td>
   <td style="text-align:right;"> 0.34 </td>
   <td style="text-align:right;"> -1.35 </td>
   <td style="text-align:right;"> 0.18 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> IslandRaykoke </td>
   <td style="text-align:right;"> -0.45 </td>
   <td style="text-align:right;"> 0.35 </td>
   <td style="text-align:right;"> -1.31 </td>
   <td style="text-align:right;"> 0.19 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> IslandSrednova </td>
   <td style="text-align:right;"> -0.35 </td>
   <td style="text-align:right;"> 0.35 </td>
   <td style="text-align:right;"> -1.00 </td>
   <td style="text-align:right;"> 0.32 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> SexMale:Length </td>
   <td style="text-align:right;"> 0.20 </td>
   <td style="text-align:right;"> 0.04 </td>
   <td style="text-align:right;"> 4.55 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
</tbody>
</table>

]]

.pull-right[
![](Lecture_Modeling_files/figure-html/unnamed-chunk-19-1.png)
]