BIOS 667 - Lecture 14: Generalized Linear Mixed Effects Models (Ch. 14)

Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis

Naim Rashid

Lecture Objectives

Position GLMMs within the marginal-mixed-transition modeling triad
Derive the GLMM likelihood and understand why integrals over random effects drive computation
Fit and interpret subject-specific effects for binary and count outcomes in R
Master the critical distinction between marginal and conditional interpretations
Work a real FLW Ch. 14 Case Study (seizure-count data) end to end
Diagnose model fit, test variance components correctly, and report results honestly

Notation reference

Symbol	Meaning
\(Y_{ij}\)	response for subject \(i\) at occasion \(j\) (scalar)
\(\mathbf{Y}_i\)	response vector for subject \(i\) \((n_i \times 1)\)
\(\mathbf{X}_i,\ \boldsymbol\beta\)	fixed-effect design \(n_i \times p\) and coefficients
\(\mathbf{Z}_i,\ \mathbf{b}_i\)	random-effect design and random effects, \(\mathbf{b}_i \sim N(0, G)\)
\(G\)	between-subject random-effects covariance (FLW’s letter, Ch. 8)
\(R_i\)	within-subject residual covariance, \(\boldsymbol\varepsilon_i \sim N(0, R_i)\)
\(\Sigma_i = \mathbf{Z}_i G \mathbf{Z}_i^\top + R_i\)	marginal (total) covariance \(\operatorname{Cov}(\mathbf{Y}_i)\) for subject \(i\)
\(\rho\)	correlation parameter (AR(1), exchangeable)

Indices: \(i\) subject \(1,\dots,N\); \(j\) occasion/time \(1,\dots,n_i\) (\(n\) when balanced); \(k\) state (transition models); \(g\) group. Bold = vector or matrix.

Notation (this lecture)

Symbol	Meaning
\(Y_{ij}\mid \mathbf{b}_i\)	outcome conditional on the random effects \(\mathbf{b}_i\)
\(g(\cdot)\)	link: \(g\!\left(E[Y_{ij}\mid \mathbf{b}_i]\right) = X_{ij}\beta + Z_{ij}\mathbf{b}_i\)
\(\mathbf{b}_i \in \mathbb{R}^q\)	subject-specific random effects, \(\mathbf{b}_i \sim N_q(0, G)\)
\(\pi_{ij}\)	binary response probability (recall L11)
\(\phi\)	dispersion parameter (recall L11)
\(f_G(\mathbf{b}_i)\)	random-effects density, \(N(\mathbf{b}_i;\,0,G)\)

Roadmap

Part I (Day 1) - Concepts

Motivating examples and the model hierarchy
GEE vs GLMM: terminology, missing-data asymmetry
GLMM anatomy and the likelihood integral
Binary random-intercept GLMM: subject-specific interpretation
Conditional vs marginal effects; Jensen; the Zeger approximation

Part II (Day 2) - Worked machinery

Predictions taxonomy (subject / at-RE / true marginal)
Real Case Study: FLW seizure-count data (Table 14.6)
Overdispersion, variance-component testing, diagnostics
Convergence recipe, pitfalls, reporting

Part I (Day 1): Concepts

Where We Are

Chapter	Topic	Focus
11	Generalized linear models	Discrete outcomes, one observation
12-13	Marginal models + GEE	Population-average inference
14	GLMMs	Subject-specific inference for discrete outcomes

GLMMs Excel When

Subject heterogeneity drives the association
Covariate effects are interpreted conditional on latent subject traits
Predicted subject trajectories or shrinkage estimates are needed
Interest is at the individual level

GEE vs GLMM: Translation Guide

GEE (marginal)	GLMM (conditional)
Population-averaged effect	Subject-specific effect
Working correlation \(\operatorname{Corr}_i(\alpha)\)	Correlation induced from \(G\)
Robust (sandwich) SE	Model-based SE
\(\text{OR}_{\text{population}}\)	\(\text{OR}_{\text{conditional}}\)
“Average over people”	“Within a person, holding \(\mathbf{b}_i\) fixed”
Valid inference needs MCAR	Valid inference under MAR

Key insight: Both answer valid but different scientific questions.

When to Choose Which

Choose GLMM when	Choose GEE when
Subject-specific effect is the question	Population-averaged effect is the question
Need individual trajectory predictions	Covariance is a nuisance
Heterogeneity is scientifically interesting	Want robust, low-assumption inference
Cluster sizes adequate (\(n_i \geq 5\) for binary)	Very small clusters (\(n_i = 2\)-\(3\)), sparse binary
Missing data plausibly MAR	Missing data plausibly MCAR

Why GLMMs Matter: Asthma Example

Analysis	Treatment OR	Interpretation
GEE (marginal)	1.8	Population: steroid raises odds of control 80%
GLMM (conditional)	2.4	Within child: steroid raises odds 140%

Why do they differ? Averaging the nonlinear logit over between-child variability shrinks the population-averaged OR toward 1 (non-collapsibility / Jensen). The subject-specific OR is larger. This is attenuation, not differential benefit.

Note

Forward reference: we make this precise on the Jensen slides and quantify it with the Zeger approximation (a Ch. 16 result).

Non-collapsibility (define it now)

Non-collapsibility: an odds ratio computed within strata can differ from the odds ratio computed in the pooled (marginal) population even when there is no confounding.

Risk differences and (log) rate ratios are collapsible; odds ratios are not
Mixing subjects with different baseline risk (\(\mathbf{b}_i\)) shifts the marginal OR toward 1
This is a property of the odds-ratio scale, not a bias

Model Hierarchy

Level 1 (conditional on \(\mathbf{b}_i \in \mathbb{R}^q\)): \[Y_{ij} \mid \mathbf{b}_i \sim \text{EF}(\mu_{ij}, \phi), \quad g(\mu_{ij}) = \eta_{ij} = X_{ij}\beta + Z_{ij}\mathbf{b}_i\]

Level 2: \[\mathbf{b}_i \sim N(0, G)\]

EF = exponential-family distribution (binomial, Poisson, …; recall L11)
Grounding links: logit for binary \(\pi_{ij}\), log for Poisson rates

Jensen’s Inequality and Marginal Means

Marginal mean: \(E(Y_{ij}) \neq g^{-1}(X_{ij}\beta)\) because of Jensen’s inequality:

\[E\!\left[g^{-1}(\eta + b)\right] \neq g^{-1}\!\left(E[\eta + b]\right) \quad \text{for nonlinear } g^{-1}\]

Key: random effects create heterogeneity; averaging over it attenuates effects on the probability scale.

Note

The explicit conditional-vs-marginal numeric contrast (next two slides) previews FLW Ch. 16 (Fig. 16.1). Ch. 14 itself establishes the conditional interpretation; we show the contrast now to motivate it.

Jensen’s Inequality: Concrete Numerical Example

Setup: logistic model, random intercept \(b_i\) taking three equally likely values \(\{-2, 0, 2\}\).

At \(\eta = 0\):

\(b_i\)	\(\eta + b_i\)	\(P(Y=1 \mid b_i) = \text{expit}(\eta + b_i)\)
-2	-2	0.119
0	0	0.500
+2	+2	0.881

Plug-in at \(b=0\): 0.500. Average of probabilities: \((0.119+0.500+0.881)/3 = 0.500\). They match here because \(\eta=0\) is symmetric.

Jensen’s Inequality: The Calculation

Now try \(\eta = 1\):

\(b_i\)	\(\eta + b_i\)	\(P(Y=1 \mid b_i)\)
-2	-1	0.269
0	1	0.731
+2	3	0.953

Plug-in at \(b=0\): \(\text{expit}(1) = 0.731\)
Average the probabilities: \((0.269+0.731+0.953)/3 = 0.651\)

Difference: 8 percentage points. This is why re.form=NA does NOT give marginal predictions.

Quick Check

Why is the true marginal probability (0.651) lower than the plug-in at \(b=0\) (0.731)?

Answer: above \(\eta = 0\) the logistic curve is concave, so \(E[g(X)] \leq g(E[X])\). The \(b_i=-2\) subject pulls the average down more than \(b_i=+2\) can pull it up (the curve flattens near 1).

Likelihood Challenge

Subject-specific contribution: \[L_i(\beta, G) = \int_{\mathbb{R}^q} \prod_{j=1}^{n_i} f(y_{ij} \mid \mathbf{b}_i;\, \beta)\; f_G(\mathbf{b}_i)\; d\mathbf{b}_i\]

\(f_G(\mathbf{b}_i) = N(\mathbf{b}_i;\, 0, G)\) is the random-effects density
No closed form when \(f(\cdot\mid\mathbf{b}_i)\) is non-Gaussian
This integral is what makes GLMM fitting hard

Estimation: ML via Numerical Integration

Random intercept only, moderate n_i (>=5)?
  -> Laplace approximation (lme4 default, nAGQ = 1): fast, usually adequate
Random slopes, or small/sparse clusters?
  -> Adaptive Gauss-Hermite quadrature (nAGQ = 5-50): more accurate
Complex G, sparse data, full uncertainty wanted?
  -> Bayesian MCMC (brms, MCMCglmm)

Note

Recipe: start at Laplace, confirm with a few quadrature points, escalate only if estimates move. FLW’s Ch. 14 case studies all use ML via 50-point adaptive Gaussian quadrature (PROC GLIMMIX).

Warning

PQL / MQL are Ch. 15 (approximate methods), not Ch. 14 estimation. They are biased for sparse binary data; we treat them next chapter.

R Setup

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(ggplot2)
  library(lme4)
  library(emmeans)
  library(broom.mixed)
  library(patchwork)
})
theme_set(theme_minimal(base_size = 16))

R Setup

Example 1: Binary Asthma Control Trial (simulated)

Warning

Simulated to isolate a teaching point (the conditional-vs-marginal gap on the binary/logit scale). The real FLW Ch. 14 Case Study is the seizure data, worked in Part II.

120 children, quarterly visits for 2 years
Outcome: controlled asthma (1) vs uncontrolled (0)
Predictors: treatment (steroid vs usual care), visit time, baseline severity
Random intercept captures latent propensity for control

Example 1: Data Simulation

set.seed(667)
n_subj <- 120; visits <- 8
asthma_dat <- tibble(
  id    = rep(1:n_subj, each = visits),
  visit = rep(0:(visits - 1), times = n_subj),
  weeks = rep(0:(visits - 1), times = n_subj) * 12,
  treat = factor(rep(sample(c("Steroid","Usual"), n_subj, replace = TRUE), each = visits)),
  severity = rep(sample(c("Mild","Moderate","Severe"), n_subj, replace = TRUE,
                        prob = c(0.5, 0.35, 0.15)), each = visits)
) %>%
  mutate(
    sev_score = case_when(severity == "Mild" ~ 0, severity == "Moderate" ~ 0.6, TRUE ~ 1.2),
    treat_num = if_else(treat == "Steroid", 0.6, 0),   # TRUTH: steroid log-OR = 0.6
    time_eff  = 0.05 * weeks / 12,                      # TRUTH: time log-OR per yr = 0.05
    rand_int  = rep(rnorm(n_subj, sd = 0.9), each = visits),  # TRUTH: RE intercept SD = 0.9
    linpred   = -1 + treat_num + time_eff - 0.8 * sev_score + # TRUTH: severity log-OR = -0.8
                0.15 * treat_num * (weeks / 12) + rand_int,    # TRUTH: treat x time = 0.15
    prob      = plogis(linpred),
    control   = rbinom(n_subj * visits, size = 1, prob = prob)
  )
asthma_dat$treat <- relevel(asthma_dat$treat, ref = "Usual")

Example 1: Data Simulation

Fit the Random-Intercept Logistic GLMM

glmm_fit <- glmer(
  control ~ treat * weeks + severity + (1 | id),
  data = asthma_dat,
  family = binomial(link = "logit")
)
broom.mixed::tidy(glmm_fit, effects = "fixed", conf.int = TRUE) %>%
  dplyr::select(term, estimate, std.error, conf.low, conf.high)

Note

Recovered steroid log-OR 1 (truth 0.6) and random-intercept SD 1.04 (truth 0.9): the conditional estimates track the DGP.

Fit the Random-Intercept Logistic GLMM

# A tibble: 6 × 5
  term               estimate std.error conf.low conf.high
  <chr>                 <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)        -1.20      0.218   -1.63     -0.771  
2 treatSteroid        1.00      0.306    0.404     1.60   
3 weeks               0.00712   0.00116  0.00484   0.00939
4 severityModerate   -0.0986    0.272   -0.632     0.435  
5 severitySevere     -1.29      0.389   -2.05     -0.523  
6 treatSteroid:weeks -0.00409   0.00421 -0.0123    0.00417

Interpreting the Fixed Effects (conditional)

\(\exp(\beta)\) is an odds ratio conditional on the random intercept (within a given child)
Positive steroid effect = better control for the same child
The interaction tests diverging within-child trajectories
Critical: these are NOT population-averaged effects

Conditional vs Marginal: Why They Differ

Reason	Explanation
Non-collapsibility	Odds ratios do not average over subgroups (unlike risk differences / log rate ratios)
Jensen’s inequality	\(E[g^{-1}(\eta + b)] \neq g^{-1}(E[\eta + b])\) for nonlinear link \(g\)
Averaging direction	Averaging the nonlinear link over \(\mathbf{b}_i\) pulls the marginal effect toward the null

Converting Conditional to Marginal (Zeger approx.)

For random-intercept logistic models (Zeger, Liang & Albert 1988; FLW Ch. 16 / p. 477):

\[\beta^{\text{marg}} \approx \frac{\beta^{\text{cond}}}{\sqrt{1 + c_2^{2}\,\sigma_b^2}}, \qquad c_2^{2} \approx 0.346\]

Note

The multiplier under the root is \(c_2^2 \approx 0.346\), where \(c_2 = 16\sqrt{3}/(15\pi) \approx 0.588\). FLW p. 477 also uses \(k = 0.588\) directly; \(0.346 = (1/1.7)^2\) is the probit-matching teaching form.

Where Does 0.346 Come From?

Warning

Derivation sketch (not examinable). The point is that averaging attenuates, by roughly this factor.

The logistic CDF \(F_{\text{logit}}(x) = 1/(1+e^{-x})\) is well approximated by the probit \(\Phi(x/1.7)\), where \(\Phi\) is the standard-normal CDF.
For a probit link the marginalization integral is closed form: \[\int_{-\infty}^{\infty} \Phi\!\left(\frac{\eta + b}{c}\right) f_G(b)\, db = \Phi\!\left(\frac{\eta}{\sqrt{c^2 + \sigma_b^2}}\right)\]
Matching the logistic to the probit gives \(c = 1.7\), so the slope attenuation multiplier is \(c^{-2} = (1.7)^{-2} \approx 0.346\).

Applying the Zeger Approximation

beta_cond <- fixef(glmm_fit)["treatSteroid"]
var_u     <- as.numeric(attr(VarCorr(glmm_fit)$id, "stddev"))^2
atten     <- sqrt(1 + 0.346 * var_u)      # 0.346 = c_2^2 multiplier
beta_marg <- beta_cond / atten

cat("Conditional beta (GLMM):", round(beta_cond, 3),
    " -> OR =", round(exp(beta_cond), 2), "\n")

Applying the Zeger Approximation

Conditional beta (GLMM): 1.004  -> OR = 2.73

cat("Marginal beta (Zeger):  ", round(beta_marg, 3),
    " -> OR =", round(exp(beta_marg), 2), "\n")

Marginal beta (Zeger):   0.857  -> OR = 2.36

cat("Attenuation factor:     ", round(atten, 3), "\n")

Attenuation factor:      1.172

Side-by-Side: Conditional vs Marginal

Analysis	Treatment OR	Interpretation
GLMM (Conditional)	2.73	Within-child effect: child’s latent propensity held fixed
Zeger-approximated marginal	2.36	Population-average effect (analytic approximation)

Check Your Understanding: Zeger

A GLMM for smoking cessation gives conditional log-OR \(\beta^{\text{cond}} = 1.1\) (OR = 3.0) and \(\sigma_b^2 = 2.0\). Approximate marginal OR?

\[\beta^{\text{marg}} = \frac{1.1}{\sqrt{1 + 0.346 \times 2.0}} = \frac{1.1}{\sqrt{1.692}} = \frac{1.1}{1.30} = 0.85,\quad \text{OR} = e^{0.85} = 2.34\]

The within-person OR (3.0) attenuates to a population-average OR (2.34) because of baseline-propensity heterogeneity.

Part II (Day 2): Worked Machinery

Recap of Part I

GLMM: \(g(E[Y_{ij}\mid\mathbf{b}_i]) = X_{ij}\beta + Z_{ij}\mathbf{b}_i\), \(\mathbf{b}_i\sim N(0,G)\)
\(\exp(\beta)\) is a conditional (within-subject) odds/rate ratio
Marginal effect is attenuated by Jensen / non-collapsibility (Zeger approx., Ch. 16)
Fit by ML (Laplace / adaptive quadrature); PQL/MQL are Ch. 15
GLMM valid under MAR; standard GEE needs MCAR

How to Report GLMM Results

Correct:

“Within a given child, steroid increased odds of control by OR = 2.73 (95% CI …)”
“At the population level (marginal), the OR is approximately 2.36”

Incorrect:

“Steroid increased the odds of control” (conditional or marginal?)

Always state the frame of reference.

Three Types of Predictions

Type	R code	Interpretation
Subject-specific	`predict(fit, type="response")`	Uses predicted \(\hat{\mathbf{b}}_i\) (BLUPs)
At \(\mathbf{b}=0\)	`predict(fit, re.form=NA)`	Sets \(\mathbf{b}_i=0\); still conditional
True marginal	Monte Carlo integration	\(E(Y) = \int E(Y\mid \mathbf{b})\,f_G(\mathbf{b})\,d\mathbf{b}\)

For nonlinear links, “at \(\mathbf{b}=0\)” \(\neq\) “marginal mean.”

Common Misconception: re.form = NA

# WRONG to call this a marginal prediction.
# It is E(Y | b = 0): conditional at the median latent subject.
pred_at_b0 <- predict(glmm_fit,
  newdata = data.frame(treat = "Steroid", weeks = 48, severity = "Mild"),
  re.form = NA, type = "response")
cat("Prediction at b=0:", round(pred_at_b0, 4), "  (this is E(Y|b=0), NOT E(Y))\n")

Common Misconception: re.form = NA

Prediction at b=0: 0.4877   (this is E(Y|b=0), NOT E(Y))

True Marginal Predictions via Monte Carlo

sigma_b   <- as.numeric(attr(VarCorr(glmm_fit)$id, "stddev"))
nd        <- data.frame(treat = "Steroid", weeks = 48, severity = "Mild")
eta_fixed <- predict(glmm_fit, newdata = nd, re.form = NA, type = "link")

set.seed(667)
b_sim       <- rnorm(5000, mean = 0, sd = sigma_b)   # draw from N(0, G)
marginal_pred <- mean(plogis(eta_fixed + b_sim))      # average expit over b

cat("At b=0 (NOT marginal):           ", round(pred_at_b0, 4), "\n")

True Marginal Predictions via Monte Carlo

At b=0 (NOT marginal):            0.4877

cat("True marginal (Monte Carlo):     ", round(marginal_pred, 4), "\n")

True marginal (Monte Carlo):      0.4873

cat("Difference:                      ", round(marginal_pred - pred_at_b0, 4), "\n")

Difference:                       -3e-04

Why the Difference Matters

Note

Reading it: the marginal curve (averaged over \(\mathbf{b}\)) is flatter / pulled toward 0.5 than the plug-in curve. The gap is largest where the logistic curve is most nonlinear and shrinks at the extremes. Flatter marginal = attenuated population-average effect.

A Real FLW Case Study: Seizure Counts

Note

FLW Ch. 14 canonical example (Leppik et al. 1987; Thall & Vail 1990). Placebo-controlled trial of progabide in 59 epileptics. Baseline 8-week seizure count, then four 2-week post-randomization counts. Model and target match FLW Table 14.5 / 14.6.

Outcome: seizure count per period, Poisson with offset \(\log(T_{ij})\) (\(T = 8\) wk baseline, \(2\) wk follow-up)
Random intercept and slope on Time (baseline vs follow-up): \(q = 2\)
Question: does progabide reduce the subject-specific seizure rate?

Load and Reshape the Seizure Data

sz_url  <- "https://content.sph.harvard.edu/fitzmaur/ala2e/epilepsy.txt"
sz_file <- "../../data/epilepsy-data.txt"     # local copy (header stripped)
if (!file.exists(sz_file)) {
  raw <- readLines(url(sz_url))               # FLW file has a comment header
  dat <- raw[grepl("^\\s*[0-9]", raw)]        # keep numeric rows only
  writeLines(dat, sz_file)
}
sz_wide <- read.table(sz_file, header = FALSE,
  col.names = c("id", "trt", "age", "baseline", "y1", "y2", "y3", "y4"))

sz_long <- sz_wide %>%
  pivot_longer(c(baseline, y1, y2, y3, y4),
               names_to = "period_lab", values_to = "count") %>%
  mutate(
    period = recode(period_lab, baseline = 0L, y1 = 1L, y2 = 2L, y3 = 3L, y4 = 4L),
    Time   = if_else(period == 0L, 0L, 1L),         # 0 = baseline, 1 = follow-up
    Tij    = if_else(period == 0L, 8, 2),           # exposure weeks
    logT   = log(Tij),
    Trt    = trt                                    # 0 = placebo, 1 = progabide
  ) %>% arrange(id, period)
cat("Rows:", nrow(sz_long), " Subjects:", dplyr::n_distinct(sz_long$id), "\n")

Load and Reshape the Seizure Data

Rows: 295  Subjects: 59

Fit the Seizure GLMM (FLW Table 14.6)

sz_fit <- glmer(
  count ~ Trt * Time + (1 + Time | id),
  offset = logT, data = sz_long,
  family = poisson(link = "log"),
  control = glmerControl(optimizer = "bobyqa")
)
broom.mixed::tidy(sz_fit, effects = "fixed", conf.int = TRUE) %>%
  dplyr::select(term, estimate, std.error, conf.low, conf.high)

Note

Reproduces FLW Table 14.6: Intercept \(\approx\) 1.07, Time \(\approx\) 0.00, Trt \(\approx\) 0.05, Trt\(\times\)Time \(\approx\) -0.306 (FLW -0.3065), \(\widehat{g}_{11}\approx\) 0.5 (FLW 0.50). Laplace matches the book’s 50-point quadrature here.

Fit the Seizure GLMM (FLW Table 14.6)

# A tibble: 4 × 5
  term         estimate std.error conf.low conf.high
  <chr>           <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)  1.07         0.140    0.796    1.35  
2 Trt          0.0512       0.193   -0.326    0.429 
3 Time        -0.000474     0.109   -0.214    0.213 
4 Trt:Time    -0.306        0.150   -0.601   -0.0114

Interpreting the Seizure Fit (subject-specific)

All covariates are dichotomous, so \(\exp(\beta)\) are subject-specific (log) rate ratios (FLW Table 14.5)
\(\exp(\beta_4)\) (Trt\(\times\)Time) is a ratio of rate ratios: progabide vs placebo change from baseline
\(\widehat{\beta}_4 = -0.306 < 0\): progabide reduces the seizure rate from baseline
For a “typical” progabide patient (\(b_{2i}=0\)): rate ratio \(e^{\widehat\beta_2 + \widehat\beta_4} \approx 0.74\) (about a 26% drop)

Seizure Variance Components

print(VarCorr(sz_fit), comp = c("Variance", "Std.Dev."))

Note

\(\widehat{g}_{11} \approx\) 0.5 (baseline-rate heterogeneity) and a positive random-slope variance: patients differ in baseline rate and in how their rate changes over time, the FLW motivation for both random effects.

Seizure Variance Components

 Groups Name        Variance Std.Dev. Corr  
 id     (Intercept) 0.49987  0.70701        
        Time        0.23195  0.48161  0.160

The Same GLMM in SAS (reference)

Static reference (not run here): the seizure log-linear GLMM in PROC GLIMMIX, mirroring FLW Table 14.11 (ML via adaptive quadrature).

proc glimmix data=seizure method=quad(qpoints=50);
   class id trt;
   model count = trt time trt*time / dist=poisson link=log solution
                 offset=logT;
   random intercept time / subject=id type=un;
run;

Note

method=quad(qpoints=50) is adaptive Gaussian quadrature (FLW’s choice); random intercept time / subject=id type=un is lme4’s (1 + time | id) with an unstructured \(G\); offset=logT matches the R offset.

Overdispersion Check

pr  <- residuals(sz_fit, type = "pearson")
od  <- sum(pr^2) / df.residual(sz_fit)
cat("Overdispersion ratio (Pearson X^2 / df):", round(od, 2), "\n")

Ratio	Reading
\(\approx 1\)	Poisson variance adequate
\(> 1.5\)	Overdispersion likely
\(> 3\)	Serious overdispersion

Overdispersion Check

Overdispersion ratio (Pearson X^2 / df): 1.34

Overdispersion Remedies

Option 1 - observation-level random effect (OLRE):

sz_long$obs <- factor(seq_len(nrow(sz_long)))
glmer(count ~ Trt * Time + (1 + Time | id) + (1 | obs),
      offset = logT, data = sz_long, family = poisson)

Option 2 - negative binomial (FLW Table 14.8):

glmmTMB::glmmTMB(count ~ Trt * Time + (1 + Time | id),
                 offset = logT, data = sz_long, family = nbinom2)

Hypothesis Testing: Fixed Effects

Wald tests (from summary()) are asymptotic; small clusters \(\rightarrow\) prefer profile likelihood

confint(sz_fit, parm = "beta_", method = "profile")  # profile CIs for fixed effects

Hypothesis Testing: Variance Components

Testing \(H_0\): random slope dropped ((1+Time|id) \(\to\) (1|id)) removes two parameters (slope variance and intercept-slope covariance).

Warning

Two boundary cases (Stram-Lee):

Drop a single random-intercept variance (\(\sigma^2 = 0\)): null is \(\tfrac12\chi^2_0 + \tfrac12\chi^2_1\)
Drop a random slope from a RI+RS model (this Case Study): null is \(\tfrac12\chi^2_1 + \tfrac12\chi^2_2\)

lme4/anova() report the naive \(\chi^2_2\) p-value; the mixture correction must be applied by hand.

Warning

RLRsim::exactRLRT() is for linear (Gaussian) mixed models only and is not valid for GLMMs. Use a parametric-bootstrap LRT instead.

Parametric-Bootstrap LRT for the Random Slope

null_fit <- glmer(count ~ Trt * Time + (1 | id), offset = logT,
                  data = sz_long, family = poisson,
                  control = glmerControl(optimizer = "bobyqa"))
lr_obs <- as.numeric(2 * (logLik(sz_fit) - logLik(null_fit)))

set.seed(667)
B <- 100
sim_y   <- simulate(null_fit, nsim = B)          # simulate under the null
lr_boot <- vapply(seq_len(B), function(b) {
  f0 <- refit(null_fit, newresp = sim_y[[b]])
  f1 <- refit(sz_fit,  newresp = sim_y[[b]])
  as.numeric(2 * (logLik(f1) - logLik(f0)))
}, numeric(1))

p_boot <- (1 + sum(lr_boot >= lr_obs)) / (1 + B)
p_mix  <- 0.5 * pchisq(lr_obs, 1, lower.tail = FALSE) +
          0.5 * pchisq(lr_obs, 2, lower.tail = FALSE)
cat("LR statistic:        ", round(lr_obs, 2), "\n")

Parametric-Bootstrap LRT for the Random Slope

LR statistic:         171.87

cat("Naive chi^2_2 p:     ", round(pchisq(lr_obs, 2, lower.tail = FALSE), 4), "\n")

Naive chi^2_2 p:      0

cat("Stram-Lee mixture p: ", round(p_mix, 4), "\n")

Stram-Lee mixture p:  0

cat("Parametric bootstrap p (B=", B, "): ", round(p_boot, 4), "\n", sep = "")

Parametric bootstrap p (B=100): 0.0099

Post-Hoc Contrasts with emmeans

emm_contrasts <- emmeans(glmm_fit, specs = pairwise ~ treat | weeks,
                         at = list(weeks = c(0, 48)), type = "response")
summary(emm_contrasts$contrasts)

Post-Hoc Contrasts with emmeans

weeks = 0:
 contrast        odds.ratio    SE  df null z.ratio p.value
 Usual / Steroid      0.366 0.112 Inf    1  -3.279  0.0010

weeks = 48:
 contrast        odds.ratio    SE  df null z.ratio p.value
 Usual / Steroid      0.446 0.113 Inf    1  -3.196  0.0014

Results are averaged over the levels of: severity 
Tests are performed on the log odds ratio scale

Interpreting emmeans Contrasts

These are subject-specific odds ratios evaluated at \(\mathbf{b}_i = 0\) (the median subject on the logit scale), NOT the population-average OR.

At \(\mathbf{b}_i = 0\) = conditional on the median latent subject
The population-average OR is attenuated (Jensen); see the Three Types of Predictions table
Do not read these as “the average subject in the population”

Random-Effects Heterogeneity (seizure slopes)

Note

Reading it: spread along the vertical axis = baseline-rate heterogeneity (\(g_{11}\)); spread along the horizontal = heterogeneity in the rate change (\(g_{22}\)). The mild positive tilt is the small positive \(g_{12}\) (FLW 0.054).

Diagnostics: Residuals vs Fitted

Note

Reading it: good = a flat, structureless band around the red zero line. Bad = a funnel (variance grows with the mean \(\to\) overdispersion) or curvature in the blue smoother (mean misspecified). A few large positive residuals here trace to the high-count patients (e.g. patient 49).

Diagnostics: Random-Effects Q-Q

Warning

Reading it (with a caveat): points near the line support \(\mathbf{b}_i \sim N(0,G)\); heavy tails or curvature suggest non-normal random effects. But these are empirical-Bayes (BLUP) predictions, which are shrunk toward 0 and toward normality, so a clean Q-Q plot understates non-normality. Treat it as a weak check (FLW Ch. 14.2).

Convergence Troubleshooting Recipe

Warning	First fix
“failed to converge”	Rescale continuous predictors (`scale()`)
“nearly unidentifiable”	Simplify \(G\): drop a near-zero random slope
“singular fit” (variance \(\approx 0\))	Use uncorrelated REs: `(1\|id) + (0 + x\|id)`
slow / stuck	Switch optimizer: `glmerControl(optimizer = "bobyqa")`
separation (binary)	Bayesian priors (`brms`) or penalized likelihood (`brglm`)

Rule of thumb: if a random-slope variance is \(< 0.01\) and convergence struggles, drop it.

Convergence: A Worked Fix

# Rescaling the continuous predictor often resolves convergence trouble.
asthma_dat$weeks_z <- as.numeric(scale(asthma_dat$weeks))
fit_scaled <- glmer(control ~ treat * weeks_z + severity + (weeks_z | id),
                    data = asthma_dat, family = binomial,
                    control = glmerControl(optimizer = "bobyqa"))
cat("Converged:", is.null(fit_scaled@optinfo$conv$lme4$messages), "\n")

Convergence: A Worked Fix

Converged: FALSE

cat("Random-slope SD:",
    round(as.data.frame(VarCorr(fit_scaled))$sdcor[2], 3), "\n")

Random-slope SD: 0.178

Common Pitfalls and Remedies

Pitfall	Remedy
Treating \(\beta\) as a marginal effect	It is conditional; use Zeger (Ch. 16) for the marginal effect
`re.form=NA` read as marginal mean	It is \(E(Y\mid\mathbf{b}=0)\); integrate over \(\mathbf{b}\) (Monte Carlo)
Comparing GLMM and GEE ORs as if equal	Different estimands; both valid for their question
`exactRLRT` for a GLMM variance test	Gaussian-LMM only; use a parametric bootstrap
Wrong boundary mixture	Match params dropped: 1 var \(\to \tfrac12\chi^2_0+\tfrac12\chi^2_1\); slope \(\to \tfrac12\chi^2_1+\tfrac12\chi^2_2\)
Near-singular \(G\)	Reduce random slopes; center/scale covariates
Ignoring convergence warnings	Rescale, simplify \(G\), switch optimizer

Reporting Checklist

State “conditional on the random effects” (or “within-subject”) when interpreting \(\beta\)
Report the estimand: conditional OR/RR, and the marginal version if relevant
Report variance components with CIs
State the estimation method (Laplace, nAGQ = X, or MCMC)
For variance-component tests, state the mixture / bootstrap used

Quick Check

A colleague reports “treatment increased the odds 2.5-fold (OR = 2.5)” from a logistic GLMM. What is missing?

Answer: the frame of reference. It is a conditional (within-subject) OR. Complete: “Within a given subject, treatment increased the odds 2.5-fold (conditional OR = 2.5, 95% CI …); the population-average OR is approximately [Zeger].”

Extensions and Software

Extension (Ch. 14)	Software note
Negative-binomial / zero-inflated counts	`glmmTMB` (FLW Table 14.8)
Crossed random effects	`lme4` `(1\|a) + (1\|b)`
Bayesian GLMMs (full posterior)	`brms`, `MCMCglmm`
Selection of fixed + random effects	`glmmPen` (penalized MCEM)

Wrap-Up

GLMM = GLM plus latent subject effects; inference is subject-specific (conditional)
The marginal effect is attenuated (Jensen / non-collapsibility); Zeger approximates it (Ch. 16)
Fit by ML (Laplace / adaptive quadrature); valid under MAR
re.form=NA is conditional at \(\mathbf{b}=0\), not marginal; integrate for true marginal
Test variance components with the right mixture or a parametric bootstrap (never exactRLRT for a GLMM)
Next chapter (Ch. 15): approximate likelihood methods (PQL/MQL) and their tradeoffs

Reading

Required: Fitzmaurice et al. Ch. 14, sections 14.1-14.6, plus the Appendix on adaptive Gaussian quadrature.

Optional:

Zeger, Liang & Albert (1988), marginal vs subject-specific approximation (the 0.346 factor; see FLW Ch. 16)
Bolker et al. (2009), GLMMs: a practical guide

Session Info

sessionInfo()

Session Info

R version 4.6.0 (2026-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] patchwork_1.3.2     broom.mixed_0.2.9.7 emmeans_2.0.3      
[4] lme4_2.0-1          Matrix_1.7-5        ggplot2_4.0.3      
[7] tidyr_1.3.2         dplyr_1.2.1        

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       xfun_0.57          lattice_0.22-9     vctrs_0.7.3       
 [5] tools_4.6.0        Rdpack_2.6.6       generics_0.1.4     parallel_4.6.0    
 [9] sandwich_3.1-1     tibble_3.3.1       pkgconfig_2.0.3    RColorBrewer_1.1-3
[13] S7_0.2.2           lifecycle_1.0.5    compiler_4.6.0     farver_2.1.2      
[17] codetools_0.2-20   htmltools_0.5.9    yaml_2.3.12        pillar_1.11.1     
[21] furrr_0.4.0        nloptr_2.2.1       MASS_7.3-65        reformulas_0.4.4  
[25] boot_1.3-32        multcomp_1.4-30    nlme_3.1-169       parallelly_1.47.0 
[29] tidyselect_1.2.1   digest_0.6.39      mvtnorm_1.3-7      future_1.70.0     
[33] purrr_1.2.2        listenv_0.10.1     labeling_0.4.3     forcats_1.0.1     
[37] splines_4.6.0      fastmap_1.2.0      grid_4.6.0         cli_3.6.6         
[41] magrittr_2.0.5     survival_3.8-6     utf8_1.2.6         broom_1.0.12      
[45] TH.data_1.1-5      withr_3.0.2        scales_1.4.0       backports_1.5.1   
[49] estimability_1.5.1 rmarkdown_2.31     globals_0.19.1     otel_0.2.0        
[53] zoo_1.8-15         coda_0.19-4.1      evaluate_1.0.5     knitr_1.51        
[57] rbibutils_2.4.1    mgcv_1.9-4         rlang_1.2.0        Rcpp_1.1.1-1.1    
[61] xtable_1.8-8       glue_1.8.1         minqa_1.2.8        jsonlite_2.0.0    
[65] R6_2.6.1