---
title: "APTS Statistical Modelling"
subtitle: "Lecture Discussion 3"
date: "April 2021"
author: Helen Ogden
output:
beamer_presentation:
includes:
in_header: preamble.tex
urlcolor: blue
---
## Lecture discussion 3
- Please mute your microphone and turn off your video, unless you are asking a question.
- Feel free to ask questions about any of the lecture content so far.
- Use the chat to ask questions directly, or to state that you have a question.
## Return to toxoplasmosis example
Recall the dataset `toxo` in `SMPracticals` provides
data on the number of people testing positive
for toxoplasmosis (`r`) out of the number of people
tested (`m`) in 34 cities in El Salvador,
along with the annual rainfall in mm (`rain`) in those
cities.
```{r, echo = FALSE, message = FALSE}
library(SMPracticals)
```
```{r}
head(toxo)
```
## Logistic regression
We can fit various logistic regression models relating
toxoplasmosis incidence to rainfall, of the form
\[R_i \sim \text{Binomial}(m_i, \mu_i), \quad \text{logit}(\mu_i) = \eta_i.\]
```{r}
mod0 <- glm(r/m ~ 1, data = toxo, weights = m,
family = "binomial")
mod3 <- glm(r/m ~ poly(rain, 3), data = toxo,
weights = m, family = "binomial")
```
## Logistic regression with random intercepts
Alternatively, we might want to consider models with random intercepts:
```{r}
library(lme4)
toxo$city <- factor(1:nrow(toxo))
mod0_ri <- glmer(r/m ~ (1 | city),
data = toxo, weights = m,
family = "binomial")
mod3_ri <- glmer(r/m ~ poly(rain, 3) + (1 | city),
data = toxo, weights = m,
family = "binomial")
```
## Comparing models via information criteria
```{r}
AICs <- c(AIC(mod0), AIC(mod3),
AIC(mod0_ri), AIC(mod3_ri))
AICs
BICs <- c(BIC(mod0), BIC(mod3),
BIC(mod0_ri), BIC(mod3_ri))
BICs
```
## Individual-level toxoplasmosis data
We could "expand out" the toxoplasmosis dataset, to give $\sum_{i=1}^{34} m_i = 697$
records on whether each individual in each city tests positive for toxoplasmosis:
```{r, echo = FALSE}
expand_to_binary <- function(r, m) {
c(rep(1, r), rep(0, m - r))
}
toxo_ind <- data.frame(
rain = unlist(mapply(rep, times = toxo$m, x = toxo$rain)),
y = unlist(mapply(expand_to_binary, r = toxo$r, m = toxo$m)),
city = unlist(mapply(rep, times = toxo$m, x = toxo$city))
)
```
```{r}
head(toxo_ind)
```
## Models for individual-level data
We can fit various logistic regression models relating
toxoplasmosis incidence to rainfall, of the form
\[Y_{ij} \sim \text{Bernoulli}(\mu_i), \quad \text{logit}(\mu_i) = \eta_i.\]
```{r}
mod0_ind <- glm(y ~ 1, data = toxo_ind,
family = "binomial")
mod3_ind <- glm(y ~ poly(rain, 3), data = toxo_ind,
family = "binomial")
mod0_ri_ind <- glmer(y ~ (1 | city), data = toxo_ind,
family = "binomial")
mod3_ri_ind <- glmer(y ~ poly(rain, 3) + (1 | city),
data = toxo_ind,
family = "binomial")
```
## Parameter estimates from individual-level data
```{r}
coef(mod0)
```
Is `coef(mod0_ind)` also `r format(coef(mod0), 8)`?
- Yes, the two estimates are the same
- No, the two estimates are different
- Not sure or need more information
\pause
```{r}
coef(mod0_ind)
```
## AICs from individual-level data
```{r}
AIC(mod0)
```
Recall $\text{AIC} = 2\left( p-\hat\ell\right)$.
Is `AIC(mod0_ind)` also `r format(AIC(mod0), 4)`?
- Yes, the two AICs are the same
- No, the two AICs are different
- Not sure or need more information
\pause
```{r}
AIC(mod0_ind)
```
## Why is this happening?
For the original data, we have $R_i \sim \text{Binomial}(m_i, \mu_i(\beta))$
with likelihood
\small
\[L(\beta) = \prod_{i=1}^{34} {m_i \choose r_i} \mu_i(\beta)^{r_i} (1 - \mu_i(\beta))^{m_i - r_i}.\]
\normalsize
For the individual-level data, we have $Y_{ij} \sim \text{Bernoulli}(\mu_i(\beta))$
with likelihood
\small
\[L_\text{ind} (\beta) = \prod_{i=1}^{34} \prod_{j=1}^{m_i} \mu_i(\beta)^{y_{ij}} (1 - \mu_i(\beta))^{1 - y_{ij}}
= \prod_{i=1}^{34} \mu_i(\beta)^{r_i} (1 - \mu_i(\beta))^{m_i - r_i},\]
\normalsize
so
\small
\[\ell(\beta) = \sum_{i=1}^{34} \log {m_i \choose r_i} + \ell_\text{ind}(\beta). \]
\normalsize
The difference is constant in $\beta$, but it does change the value of $\hat \ell$
that appears in the AIC.
## Differences in AICs
```{r}
AIC(mod3) - AIC(mod0)
```
Is `AIC(mod3_ind) - AIC(mod0_ind)` also `r format(AIC(mod3) - AIC(mod0), 6)`?
- Yes, the two differences in AICs are the same
- No, the two differences in AICs are different
- Not sure or need more information
\pause
```{r}
AIC(mod3_ind) - AIC(mod0_ind)
```
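## Why do the AIC differences agree?
Writing $c = \sum_{i=1}^{34} \log {m_i \choose r_i}$ for the constant, the two
maximised log-likelihoods satisfy $\hat\ell = c + \hat\ell_\text{ind}$, so each
model's AIC is shifted by the same amount:
\small
\[\text{AIC}_\text{ind} = 2\left(p - \hat\ell_\text{ind}\right) = 2\left(p - \hat\ell\right) + 2c = \text{AIC} + 2c.\]
\normalsize
The constant $2c$ therefore cancels in any difference of AICs between models.
As a quick numerical check (assuming the models above have been fitted),
the shift can be computed directly with `lchoose`:
```{r}
# 2c: twice the sum of the log binomial coefficients over the 34 cities
2 * sum(lchoose(toxo$m, toxo$r))
AIC(mod0_ind) - AIC(mod0)
```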
## Differences in AICs
```{r}
AICs_ind <- c(AIC(mod0_ind), AIC(mod3_ind),
AIC(mod0_ri_ind), AIC(mod3_ri_ind))
```
```{r}
AICs - min(AICs)
AICs_ind - min(AICs_ind)
```
## Differences in BICs
```{r}
BIC(mod3) - BIC(mod0)
```
Recall $\text{BIC} = 2\left(\frac{1}{2} p\log n-\hat\ell\right)$.
Is
`BIC(mod3_ind) - BIC(mod0_ind)` also `r format(BIC(mod3) - BIC(mod0), 6)`?
- Yes, the two differences in BICs are the same
- No, the two differences in BICs are different
- Not sure or need more information
\pause
```{r}
BIC(mod3_ind) - BIC(mod0_ind)
```
## Comparing BICs
```{r}
BICs_ind <- c(BIC(mod0_ind), BIC(mod3_ind),
BIC(mod0_ri_ind), BIC(mod3_ri_ind))
```
```{r}
BICs - min(BICs)
BICs_ind - min(BICs_ind)
```
Differences in BIC are not the same in the two equivalent forms of the model!
## Why is this happening?
Recall
\[\text{BIC} = 2\left(\frac{1}{2} p\log n-\hat\ell\right).\]
How is $n$ measured here?
```{r}
nobs(mod0)
nobs(mod0_ind)
```
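Since the maximised log-likelihoods differ only by the constant
$c = \sum_{i=1}^{34} \log {m_i \choose r_i}$, a BIC difference between two models
changes only through the $p \log n$ term. As a sketch (`mod3` has three more
parameters than `mod0`):
\small
\[\left\{\text{BIC}_\text{ind}(\texttt{mod3}) - \text{BIC}_\text{ind}(\texttt{mod0})\right\} - \left\{\text{BIC}(\texttt{mod3}) - \text{BIC}(\texttt{mod0})\right\} = 3\left(\log 697 - \log 34\right) \approx 9.06.\]
\normalsize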
## What should $n$ be here?
The theoretical development of BIC (not covered in lectures)
assumes that $n$ is the rate at which information grows, i.e. that
the Fisher information matrix $I(\theta)$ grows at rate $n$.
See
[Neath and Cavanaugh (2011)](https://www.doi.org/10.1002/wics.199)
for a nice review.
If we believe the logistic regression models, we should therefore
take $n =$ `r nobs(mod0_ind)`.
\pause
What if there is really clustering in the data, as in the
mixed effects models?
The story is then much more complex: different components of the
Fisher information matrix might grow at different rates!
See
[Delattre et al. (2014)](http://www.doi.org/10.1214/14-EJS890)
for some discussion.
## Summary
- Be careful when applying information criteria: you should use
the same form of response in all models being compared, and make
sure the full loglikelihood (including all constant terms) is used.
- BIC requires particular care, and the defaults in R might give
unexpected results.
- Exercise even more caution when using BIC with mixed effects models.
Questions on lecture content?