---
title: "APTS Statistical Modelling"
subtitle: "Lecture Discussion 3"
date: "April 2021"
author: Helen Ogden
output:
beamer_presentation:
includes:
in_header: preamble.tex
urlcolor: blue
---
## Lecture discussion 3
- Please mute your microphone and turn off your video, unless you are asking a question.
- Feel free to ask questions about any of the lecture content so far.
- Use the chat to ask questions directly, or to state that you have a question.
## Return to toxoplasmosis example
Recall the dataset `toxo` in `SMPracticals` provides
data on the number of people testing positive
for toxoplasmosis (`r`) out of the number of people
tested (`m`) in 34 cities in El Salvador,
along with the annual rainfall in mm (`rain`) in those
cities.
```{r, echo = FALSE, message = FALSE}
library(SMPracticals)
```
```{r}
head(toxo)
```
## Logistic regression
We can fit various logistic regression models relating
toxoplasmosis incidence to rainfall, of the form
\[R_i \sim \text{Binomial}(m_i, \mu_i), \quad \text{logit}(\mu_i) = \eta_i.\]
```{r}
mod0 <- glm(r/m ~ 1, data = toxo, weights = m,
family = "binomial")
mod3 <- glm(r/m ~ poly(rain, 3), data = toxo,
weights = m, family = "binomial")
```
## Logistic regression with random intercepts
Alternatively, we might want to consider models with random intercepts:
```{r}
library(lme4)
toxo$city <- factor(1:nrow(toxo))
mod0_ri <- glmer(r/m ~ (1 | city),
data = toxo, weights = m,
family = "binomial")
mod3_ri <- glmer(r/m ~ poly(rain, 3) + (1 | city),
data = toxo, weights = m,
family = "binomial")
```
## Comparing models via information criteria
```{r}
AICs <- c(AIC(mod0), AIC(mod3),
AIC(mod0_ri), AIC(mod3_ri))
AICs
BICs <- c(BIC(mod0), BIC(mod3),
BIC(mod0_ri), BIC(mod3_ri))
BICs
```
## Individual-level toxoplasmosis data
We could "expand out" the toxoplasmosis dataset, to give $\sum_{i=1}^{34} m_i = 697$
records on whether each individual in each city tests positive for toxoplasmosis:
```{r, echo = FALSE}
expand_to_binary <- function(r, m) {
c(rep(1, r), rep(0, m - r))
}
toxo_ind <- data.frame(
rain = unlist(mapply(rep, times = toxo$m, x = toxo$rain)),
y = unlist(mapply(expand_to_binary, r = toxo$r, m = toxo$m)),
city = unlist(mapply(rep, times = toxo$m, x = toxo$city))
)
```
```{r}
head(toxo_ind)
```
## Models for individual-level data
We can fit various logistic regression models relating
toxoplasmosis incidence to rainfall, of the form
\[Y_{ij} \sim \text{Bernoulli}(\mu_i), \quad \text{logit}(\mu_i) = \eta_i.\]
```{r}
mod0_ind <- glm(y ~ 1, data = toxo_ind,
family = "binomial")
mod3_ind <- glm(y ~ poly(rain, 3), data = toxo_ind,
family = "binomial")
mod0_ri_ind <- glmer(y ~ (1 | city), data = toxo_ind,
family = "binomial")
mod3_ri_ind <- glmer(y ~ poly(rain, 3) + (1 | city),
data = toxo_ind,
family = "binomial")
```
## Parameter estimates from individual-level data
```{r}
coef(mod0)
```
Is `coef(mod0_ind)` also `r format(coef(mod0), 8)`?
- Yes, the two estimates are the same
- No, the two estimates are different
- Not sure or need more information
\pause
```{r}
coef(mod0_ind)
```
## AICs from individual-level data
```{r}
AIC(mod0)
```
Recall $\text{AIC} = 2\left( p-\hat\ell\right)$.
Is `AIC(mod0_ind)` also `r format(AIC(mod0), 4)`?
- Yes, the two AICs are the same
- No, the two AICs are different
- Not sure or need more information
\pause
```{r}
AIC(mod0_ind)
```
## Why is this happening?
For the original data, we have $R_i \sim \text{Binomial}(m_i, \mu_i(\beta))$
with likelihood
\small
\[L(\beta) = \prod_{i=1}^{34} {m_i \choose r_i} \mu_i(\beta)^{r_i} (1 - \mu_i(\beta))^{m_i - r_i}.\]
\normalsize
For the individual-level data, we have $Y_{ij} \sim \text{Bernoulli}(\mu_i(\beta))$
with likelihood
\small
\[L_\text{ind} (\beta) = \prod_{i=1}^{34} \prod_{j=1}^{m_i} \mu_i(\beta)^{y_{ij}} (1 - \mu_i(\beta))^{1 - y_{ij}}
= \prod_{i=1}^{34} \mu_i(\beta)^{r_i} (1 - \mu_i(\beta))^{m_i - r_i},\]
\normalsize
so
\small
\[\ell(\beta) = \sum_{i=1}^{34} \log {m_i \choose r_i} + \ell_\text{ind}(\beta). \]
\normalsize
The difference is constant in $\beta$, but it does change the value of $\hat \ell$
that appears in the AIC.
## Differences in AICs
```{r}
AIC(mod3) - AIC(mod0)
```
Is `AIC(mod3_ind) - AIC(mod0_ind)` also `r format(AIC(mod3) - AIC(mod0), 6)`?
- Yes, the two differences in AICs are the same
- No, the two differences in AICs are different
- Not sure or need more information
\pause
```{r}
AIC(mod3_ind) - AIC(mod0_ind)
```
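## Why do the AIC differences agree?
Writing $c = \sum_{i=1}^{34} \log {m_i \choose r_i}$ for the constant, the two
maximised log-likelihoods satisfy $\hat\ell = c + \hat\ell_\text{ind}$, so each
model's AIC is shifted by the same amount:
\small
\[\text{AIC}_\text{ind} = 2\left(p - \hat\ell_\text{ind}\right) = 2\left(p - \hat\ell\right) + 2c = \text{AIC} + 2c.\]
\normalsize
The constant $2c$ therefore cancels in any difference of AICs between models.
As a quick numerical check (assuming the models above have been fitted),
the shift can be computed directly with `lchoose`:
```{r}
# 2c: twice the sum of the log binomial coefficients over the 34 cities
2 * sum(lchoose(toxo$m, toxo$r))
AIC(mod0_ind) - AIC(mod0)
```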
## Differences in AICs
```{r}
AICs_ind <- c(AIC(mod0_ind), AIC(mod3_ind),
AIC(mod0_ri_ind), AIC(mod3_ri_ind))
```
```{r}
AICs - min(AICs)
AICs_ind - min(AICs_ind)
```
## Differences in BICs
```{r}
BIC(mod3) - BIC(mod0)
```
Recall $\text{BIC} = 2\left(\frac{1}{2} p\log n-\hat\ell\right)$.
Is
`BIC(mod3_ind) - BIC(mod0_ind)` also `r format(BIC(mod3) - BIC(mod0), 6)`?
- Yes, the two differences in BICs are the same
- No, the two differences in BICs are different
- Not sure or need more information
\pause
```{r}
BIC(mod3_ind) - BIC(mod0_ind)
```
## Comparing BICs
```{r}
BICs_ind <- c(BIC(mod0_ind), BIC(mod3_ind),
BIC(mod0_ri_ind), BIC(mod3_ri_ind))
```
```{r}
BICs - min(BICs)
BICs_ind - min(BICs_ind)
```
Differences in BIC are not the same in the two equivalent forms of the model!
## Why is this happening?
Recall
\[\text{BIC} = 2\left(\frac{1}{2} p\log n-\hat\ell\right).\]
How is $n$ measured here?
```{r}
nobs(mod0)
nobs(mod0_ind)
```
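Since the maximised log-likelihoods differ only by the constant
$c = \sum_{i=1}^{34} \log {m_i \choose r_i}$, a BIC difference between two models
changes only through the $p \log n$ term. As a sketch (`mod3` has three more
parameters than `mod0`):
\small
\[\left\{\text{BIC}_\text{ind}(\texttt{mod3}) - \text{BIC}_\text{ind}(\texttt{mod0})\right\} - \left\{\text{BIC}(\texttt{mod3}) - \text{BIC}(\texttt{mod0})\right\} = 3\left(\log 697 - \log 34\right) \approx 9.06.\]
\normalsize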
## What should $n$ be here?
The theoretical development of BIC (not covered in lectures)
assumes that $n$ is the rate at which information grows, i.e. that
the Fisher information matrix $I(\theta)$ grows at rate $n$.
See
[Neath and Cavanaugh (2011)](https://www.doi.org/10.1002/wics.199)
for a nice review.
If we believe the logistic regression models, we should therefore
take $n =$ `r nobs(mod0_ind)`.
\pause
What if there is really clustering in the data, as in the
mixed effects models?
The story is then much more complex: different components of the
Fisher information matrix might grow at different rates!
See
[Delattre et al. (2014)](http://www.doi.org/10.1214/14-EJS890)
for some discussion.
## Summary
- Be careful when applying information criteria: you should use
the same form of response in all models being compared, and make
sure the full loglikelihood (including all constant terms) is used.
- BIC requires particular care, and the defaults in R might give
unexpected results.
- Exercise even more caution when using BIC with mixed effects models.
Questions on lecture content?