BIOS 667 - Lecture 14: Generalized Linear Mixed Effects Models (Ch. 14)

Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis

Naim Rashid

Lecture Objectives

  • Position GLMMs within the marginal-mixed-transition modeling triad
  • Derive the GLMM likelihood and understand why integrals over random effects drive computation
  • Fit and interpret subject-specific effects for binary and count outcomes in R
  • Master the critical distinction between marginal and conditional interpretations
  • Work a real FLW Ch. 14 Case Study (seizure-count data) end to end
  • Diagnose model fit, test variance components correctly, and report results honestly

Notation reference

Symbol Meaning
\(Y_{ij}\) response for subject \(i\) at occasion \(j\) (scalar)
\(\mathbf{Y}_i\) response vector for subject \(i\) \((n_i \times 1)\)
\(\mathbf{X}_i,\ \boldsymbol\beta\) fixed-effect design \(n_i \times p\) and coefficients
\(\mathbf{Z}_i,\ \mathbf{b}_i\) random-effect design and random effects, \(\mathbf{b}_i \sim N(0, G)\)
\(G\) between-subject random-effects covariance (FLW’s letter, Ch. 8)
\(R_i\) within-subject residual covariance, \(\boldsymbol\varepsilon_i \sim N(0, R_i)\)
\(\Sigma_i = \mathbf{Z}_i G \mathbf{Z}_i^\top + R_i\) marginal (total) covariance \(\operatorname{Cov}(\mathbf{Y}_i)\) for subject \(i\)
\(\rho\) correlation parameter (AR(1), exchangeable)

Indices: \(i\) subject \(1,\dots,N\); \(j\) occasion/time \(1,\dots,n_i\) (\(n\) when balanced); \(k\) state (transition models); \(g\) group. Bold = vector or matrix.

Notation (this lecture)

Symbol Meaning
\(Y_{ij}\mid \mathbf{b}_i\) outcome conditional on the random effects \(\mathbf{b}_i\)
\(g(\cdot)\) link: \(g\!\left(E[Y_{ij}\mid \mathbf{b}_i]\right) = X_{ij}\beta + Z_{ij}\mathbf{b}_i\)
\(\mathbf{b}_i \in \mathbb{R}^q\) subject-specific random effects, \(\mathbf{b}_i \sim N_q(0, G)\)
\(\pi_{ij}\) binary response probability (recall L11)
\(\phi\) dispersion parameter (recall L11)
\(f_G(\mathbf{b}_i)\) random-effects density, \(N(\mathbf{b}_i;\,0,G)\)

Roadmap

Part I (Day 1) - Concepts

  1. Motivating examples and the model hierarchy
  2. GEE vs GLMM: terminology, missing-data asymmetry
  3. GLMM anatomy and the likelihood integral
  4. Binary random-intercept GLMM: subject-specific interpretation
  5. Conditional vs marginal effects; Jensen; the Zeger approximation

Part II (Day 2) - Worked machinery

  1. Predictions taxonomy (subject / at-RE / true marginal)
  2. Real Case Study: FLW seizure-count data (Table 14.6)
  3. Overdispersion, variance-component testing, diagnostics
  4. Convergence recipe, pitfalls, reporting

Part I (Day 1): Concepts

Where We Are

Chapter Topic Focus
11 Generalized linear models Discrete outcomes, one observation
12-13 Marginal models + GEE Population-average inference
14 GLMMs Subject-specific inference for discrete outcomes

GLMMs Excel When

  • Subject heterogeneity drives the association
  • Covariate effects are interpreted conditional on latent subject traits
  • Predicted subject trajectories or shrinkage estimates are needed
  • Interest is at the individual level

GEE vs GLMM: Translation Guide

GEE (marginal) GLMM (conditional)
Population-averaged effect Subject-specific effect
Working correlation \(\operatorname{Corr}_i(\alpha)\) Correlation induced from \(G\)
Robust (sandwich) SE Model-based SE
\(\text{OR}_{\text{population}}\) \(\text{OR}_{\text{conditional}}\)
“Average over people” “Within a person, holding \(\mathbf{b}_i\) fixed”
Valid inference needs MCAR Valid inference under MAR

Key insight: Both answer valid but different scientific questions.

When to Choose Which

Choose GLMM when Choose GEE when
Subject-specific effect is the question Population-averaged effect is the question
Need individual trajectory predictions Covariance is a nuisance
Heterogeneity is scientifically interesting Want robust, low-assumption inference
Cluster sizes adequate (\(n_i \geq 5\) for binary) Very small clusters (\(n_i = 2\)-\(3\)), sparse binary
Missing data plausibly MAR Missing data plausibly MCAR

Why GLMMs Matter: Asthma Example

Analysis Treatment OR Interpretation
GEE (marginal) 1.8 Population: steroid raises odds of control 80%
GLMM (conditional) 2.4 Within child: steroid raises odds 140%

Why do they differ? Averaging the nonlinear logit over between-child variability shrinks the population-averaged OR toward 1 (non-collapsibility / Jensen). The subject-specific OR is larger. This is attenuation, not differential benefit.

Note

Forward reference: we make this precise on the Jensen slides and quantify it with the Zeger approximation (a Ch. 16 result).

Non-collapsibility (define it now)

Non-collapsibility: an odds ratio computed within strata can differ from the odds ratio computed in the pooled (marginal) population even when there is no confounding.

  • Risk differences and (log) rate ratios are collapsible; odds ratios are not
  • Mixing subjects with different baseline risk (\(\mathbf{b}_i\)) shifts the marginal OR toward 1
  • This is a property of the odds-ratio scale, not a bias

Model Hierarchy

Level 1 (conditional on \(\mathbf{b}_i \in \mathbb{R}^q\)): \[Y_{ij} \mid \mathbf{b}_i \sim \text{EF}(\mu_{ij}, \phi), \quad g(\mu_{ij}) = \eta_{ij} = X_{ij}\beta + Z_{ij}\mathbf{b}_i\]

Level 2: \[\mathbf{b}_i \sim N(0, G)\]

  • EF = exponential-family distribution (binomial, Poisson, …; recall L11)
  • Grounding links: logit for binary \(\pi_{ij}\), log for Poisson rates

Jensen’s Inequality and Marginal Means

Marginal mean: \(E(Y_{ij}) \neq g^{-1}(X_{ij}\beta)\) because of Jensen’s inequality:

\[E\!\left[g^{-1}(\eta + b)\right] \neq g^{-1}\!\left(E[\eta + b]\right) \quad \text{for nonlinear } g^{-1}\]

Key: random effects create heterogeneity; averaging over it attenuates effects on the probability scale.

Note

The explicit conditional-vs-marginal numeric contrast (next two slides) previews FLW Ch. 16 (Fig. 16.1). Ch. 14 itself establishes the conditional interpretation; we show the contrast now to motivate it.

Jensen’s Inequality: Concrete Numerical Example

Setup: logistic model, random intercept \(b_i\) taking three equally likely values \(\{-2, 0, 2\}\).

At \(\eta = 0\):

\(b_i\) \(\eta + b_i\) \(P(Y=1 \mid b_i) = \text{expit}(\eta + b_i)\)
-2 -2 0.119
0 0 0.500
+2 +2 0.881

Plug-in at \(b=0\): 0.500. Average of probabilities: \((0.119+0.500+0.881)/3 = 0.500\). They match here because \(\eta=0\) is symmetric.

Jensen’s Inequality: The Calculation

Now try \(\eta = 1\):

\(b_i\) \(\eta + b_i\) \(P(Y=1 \mid b_i)\)
-2 -1 0.269
0 1 0.731
+2 3 0.953
  • Plug-in at \(b=0\): \(\text{expit}(1) = 0.731\)
  • Average the probabilities: \((0.269+0.731+0.953)/3 = 0.651\)

Difference: 8 percentage points. This is why re.form=NA does NOT give marginal predictions.

Quick Check

Why is the true marginal probability (0.651) lower than the plug-in at \(b=0\) (0.731)?

Answer: above \(\eta = 0\) the logistic curve is concave, so \(E[g(X)] \leq g(E[X])\). The \(b_i=-2\) subject pulls the average down more than \(b_i=+2\) can pull it up (the curve flattens near 1).

Likelihood Challenge

Subject-specific contribution: \[L_i(\beta, G) = \int_{\mathbb{R}^q} \prod_{j=1}^{n_i} f(y_{ij} \mid \mathbf{b}_i;\, \beta)\; f_G(\mathbf{b}_i)\; d\mathbf{b}_i\]

  • \(f_G(\mathbf{b}_i) = N(\mathbf{b}_i;\, 0, G)\) is the random-effects density
  • No closed form when \(f(\cdot\mid\mathbf{b}_i)\) is non-Gaussian
  • This integral is what makes GLMM fitting hard

Estimation: ML via Numerical Integration

Random intercept only, moderate n_i (>=5)?
  -> Laplace approximation (lme4 default, nAGQ = 1): fast, usually adequate
Random slopes, or small/sparse clusters?
  -> Adaptive Gauss-Hermite quadrature (nAGQ = 5-50): more accurate
Complex G, sparse data, full uncertainty wanted?
  -> Bayesian MCMC (brms, MCMCglmm)

Note

Recipe: start at Laplace, confirm with a few quadrature points, escalate only if estimates move. FLW’s Ch. 14 case studies all use ML via 50-point adaptive Gaussian quadrature (PROC GLIMMIX).

Warning

PQL / MQL are Ch. 15 (approximate methods), not Ch. 14 estimation. They are biased for sparse binary data; we treat them next chapter.

R Setup

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(ggplot2)
  library(lme4)
  library(emmeans)
  library(broom.mixed)
  library(patchwork)
})
theme_set(theme_minimal(base_size = 16))

R Setup

Example 1: Binary Asthma Control Trial (simulated)

Warning

Simulated to isolate a teaching point (the conditional-vs-marginal gap on the binary/logit scale). The real FLW Ch. 14 Case Study is the seizure data, worked in Part II.

  • 120 children, quarterly visits for 2 years
  • Outcome: controlled asthma (1) vs uncontrolled (0)
  • Predictors: treatment (steroid vs usual care), visit time, baseline severity
  • Random intercept captures latent propensity for control

Example 1: Data Simulation

set.seed(667)
n_subj <- 120; visits <- 8
asthma_dat <- tibble(
  id    = rep(1:n_subj, each = visits),
  visit = rep(0:(visits - 1), times = n_subj),
  weeks = rep(0:(visits - 1), times = n_subj) * 12,
  treat = factor(rep(sample(c("Steroid","Usual"), n_subj, replace = TRUE), each = visits)),
  severity = rep(sample(c("Mild","Moderate","Severe"), n_subj, replace = TRUE,
                        prob = c(0.5, 0.35, 0.15)), each = visits)
) %>%
  mutate(
    sev_score = case_when(severity == "Mild" ~ 0, severity == "Moderate" ~ 0.6, TRUE ~ 1.2),
    treat_num = if_else(treat == "Steroid", 0.6, 0),   # TRUTH: steroid log-OR = 0.6
    time_eff  = 0.05 * weeks / 12,                      # TRUTH: time log-OR per yr = 0.05
    rand_int  = rep(rnorm(n_subj, sd = 0.9), each = visits),  # TRUTH: RE intercept SD = 0.9
    linpred   = -1 + treat_num + time_eff - 0.8 * sev_score + # TRUTH: severity log-OR = -0.8
                0.15 * treat_num * (weeks / 12) + rand_int,    # TRUTH: treat x time = 0.15
    prob      = plogis(linpred),
    control   = rbinom(n_subj * visits, size = 1, prob = prob)
  )
asthma_dat$treat <- relevel(asthma_dat$treat, ref = "Usual")

Example 1: Data Simulation

Fit the Random-Intercept Logistic GLMM

glmm_fit <- glmer(
  control ~ treat * weeks + severity + (1 | id),
  data = asthma_dat,
  family = binomial(link = "logit")
)
broom.mixed::tidy(glmm_fit, effects = "fixed", conf.int = TRUE) %>%
  dplyr::select(term, estimate, std.error, conf.low, conf.high)

Note

Recovered steroid log-OR 1 (truth 0.6) and random-intercept SD 1.04 (truth 0.9): the conditional estimates track the DGP.

Fit the Random-Intercept Logistic GLMM

# A tibble: 6 × 5
  term               estimate std.error conf.low conf.high
  <chr>                 <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)        -1.20      0.218   -1.63     -0.771  
2 treatSteroid        1.00      0.306    0.404     1.60   
3 weeks               0.00712   0.00116  0.00484   0.00939
4 severityModerate   -0.0986    0.272   -0.632     0.435  
5 severitySevere     -1.29      0.389   -2.05     -0.523  
6 treatSteroid:weeks -0.00409   0.00421 -0.0123    0.00417

Interpreting the Fixed Effects (conditional)

  • \(\exp(\beta)\) is an odds ratio conditional on the random intercept (within a given child)
  • Positive steroid effect = better control for the same child
  • The interaction tests diverging within-child trajectories
  • Critical: these are NOT population-averaged effects

Conditional vs Marginal: Why They Differ

Reason Explanation
Non-collapsibility Odds ratios do not average over subgroups (unlike risk differences / log rate ratios)
Jensen’s inequality \(E[g^{-1}(\eta + b)] \neq g^{-1}(E[\eta + b])\) for nonlinear link \(g\)
Averaging direction Averaging the nonlinear link over \(\mathbf{b}_i\) pulls the marginal effect toward the null

Converting Conditional to Marginal (Zeger approx.)

For random-intercept logistic models (Zeger, Liang & Albert 1988; FLW Ch. 16 / p. 477):

\[\beta^{\text{marg}} \approx \frac{\beta^{\text{cond}}}{\sqrt{1 + c_2^{2}\,\sigma_b^2}}, \qquad c_2^{2} \approx 0.346\]

Note

The multiplier under the root is \(c_2^2 \approx 0.346\), where \(c_2 = 16\sqrt{3}/(15\pi) \approx 0.588\). FLW p. 477 also uses \(k = 0.588\) directly; \(0.346 = (1/1.7)^2\) is the probit-matching teaching form.

Where Does 0.346 Come From?

Warning

Derivation sketch (not examinable). The point is that averaging attenuates, by roughly this factor.

  1. The logistic CDF \(F_{\text{logit}}(x) = 1/(1+e^{-x})\) is well approximated by the probit \(\Phi(x/1.7)\), where \(\Phi\) is the standard-normal CDF.
  2. For a probit link the marginalization integral is closed form: \[\int_{-\infty}^{\infty} \Phi\!\left(\frac{\eta + b}{c}\right) f_G(b)\, db = \Phi\!\left(\frac{\eta}{\sqrt{c^2 + \sigma_b^2}}\right)\]
  3. Matching the logistic to the probit gives \(c = 1.7\), so the slope attenuation multiplier is \(c^{-2} = (1.7)^{-2} \approx 0.346\).

Applying the Zeger Approximation

beta_cond <- fixef(glmm_fit)["treatSteroid"]
var_u     <- as.numeric(attr(VarCorr(glmm_fit)$id, "stddev"))^2
atten     <- sqrt(1 + 0.346 * var_u)      # 0.346 = c_2^2 multiplier
beta_marg <- beta_cond / atten

cat("Conditional beta (GLMM):", round(beta_cond, 3),
    " -> OR =", round(exp(beta_cond), 2), "\n")

Applying the Zeger Approximation

Conditional beta (GLMM): 1.004  -> OR = 2.73 
cat("Marginal beta (Zeger):  ", round(beta_marg, 3),
    " -> OR =", round(exp(beta_marg), 2), "\n")
Marginal beta (Zeger):   0.857  -> OR = 2.36 
cat("Attenuation factor:     ", round(atten, 3), "\n")
Attenuation factor:      1.172 

Side-by-Side: Conditional vs Marginal

Analysis Treatment OR Interpretation
GLMM (Conditional) 2.73 Within-child effect: child’s latent propensity held fixed
Zeger-approximated marginal 2.36 Population-average effect (analytic approximation)

Check Your Understanding: Zeger

A GLMM for smoking cessation gives conditional log-OR \(\beta^{\text{cond}} = 1.1\) (OR = 3.0) and \(\sigma_b^2 = 2.0\). Approximate marginal OR?

\[\beta^{\text{marg}} = \frac{1.1}{\sqrt{1 + 0.346 \times 2.0}} = \frac{1.1}{\sqrt{1.692}} = \frac{1.1}{1.30} = 0.85,\quad \text{OR} = e^{0.85} = 2.34\]

The within-person OR (3.0) attenuates to a population-average OR (2.34) because of baseline-propensity heterogeneity.

Part II (Day 2): Worked Machinery

Recap of Part I

  • GLMM: \(g(E[Y_{ij}\mid\mathbf{b}_i]) = X_{ij}\beta + Z_{ij}\mathbf{b}_i\), \(\mathbf{b}_i\sim N(0,G)\)
  • \(\exp(\beta)\) is a conditional (within-subject) odds/rate ratio
  • Marginal effect is attenuated by Jensen / non-collapsibility (Zeger approx., Ch. 16)
  • Fit by ML (Laplace / adaptive quadrature); PQL/MQL are Ch. 15
  • GLMM valid under MAR; standard GEE needs MCAR

How to Report GLMM Results

Correct:

  • “Within a given child, steroid increased odds of control by OR = 2.73 (95% CI …)”
  • “At the population level (marginal), the OR is approximately 2.36”

Incorrect:

  • “Steroid increased the odds of control” (conditional or marginal?)

Always state the frame of reference.

Three Types of Predictions

Type R code Interpretation
Subject-specific predict(fit, type="response") Uses predicted \(\hat{\mathbf{b}}_i\) (BLUPs)
At \(\mathbf{b}=0\) predict(fit, re.form=NA) Sets \(\mathbf{b}_i=0\); still conditional
True marginal Monte Carlo integration \(E(Y) = \int E(Y\mid \mathbf{b})\,f_G(\mathbf{b})\,d\mathbf{b}\)

For nonlinear links, “at \(\mathbf{b}=0\)\(\neq\) “marginal mean.”

Common Misconception: re.form = NA

# WRONG to call this a marginal prediction.
# It is E(Y | b = 0): conditional at the median latent subject.
pred_at_b0 <- predict(glmm_fit,
  newdata = data.frame(treat = "Steroid", weeks = 48, severity = "Mild"),
  re.form = NA, type = "response")
cat("Prediction at b=0:", round(pred_at_b0, 4), "  (this is E(Y|b=0), NOT E(Y))\n")

Common Misconception: re.form = NA

Prediction at b=0: 0.4877   (this is E(Y|b=0), NOT E(Y))

True Marginal Predictions via Monte Carlo

sigma_b   <- as.numeric(attr(VarCorr(glmm_fit)$id, "stddev"))
nd        <- data.frame(treat = "Steroid", weeks = 48, severity = "Mild")
eta_fixed <- predict(glmm_fit, newdata = nd, re.form = NA, type = "link")

set.seed(667)
b_sim       <- rnorm(5000, mean = 0, sd = sigma_b)   # draw from N(0, G)
marginal_pred <- mean(plogis(eta_fixed + b_sim))      # average expit over b

cat("At b=0 (NOT marginal):           ", round(pred_at_b0, 4), "\n")

True Marginal Predictions via Monte Carlo

At b=0 (NOT marginal):            0.4877 
cat("True marginal (Monte Carlo):     ", round(marginal_pred, 4), "\n")
True marginal (Monte Carlo):      0.4873 
cat("Difference:                      ", round(marginal_pred - pred_at_b0, 4), "\n")
Difference:                       -3e-04 

Why the Difference Matters

Note

Reading it: the marginal curve (averaged over \(\mathbf{b}\)) is flatter / pulled toward 0.5 than the plug-in curve. The gap is largest where the logistic curve is most nonlinear and shrinks at the extremes. Flatter marginal = attenuated population-average effect.

A Real FLW Case Study: Seizure Counts

Note

FLW Ch. 14 canonical example (Leppik et al. 1987; Thall & Vail 1990). Placebo-controlled trial of progabide in 59 epileptics. Baseline 8-week seizure count, then four 2-week post-randomization counts. Model and target match FLW Table 14.5 / 14.6.

  • Outcome: seizure count per period, Poisson with offset \(\log(T_{ij})\) (\(T = 8\) wk baseline, \(2\) wk follow-up)
  • Random intercept and slope on Time (baseline vs follow-up): \(q = 2\)
  • Question: does progabide reduce the subject-specific seizure rate?

Load and Reshape the Seizure Data

sz_url  <- "https://content.sph.harvard.edu/fitzmaur/ala2e/epilepsy.txt"
sz_file <- "../../data/epilepsy-data.txt"     # local copy (header stripped)
if (!file.exists(sz_file)) {
  raw <- readLines(url(sz_url))               # FLW file has a comment header
  dat <- raw[grepl("^\\s*[0-9]", raw)]        # keep numeric rows only
  writeLines(dat, sz_file)
}
sz_wide <- read.table(sz_file, header = FALSE,
  col.names = c("id", "trt", "age", "baseline", "y1", "y2", "y3", "y4"))

sz_long <- sz_wide %>%
  pivot_longer(c(baseline, y1, y2, y3, y4),
               names_to = "period_lab", values_to = "count") %>%
  mutate(
    period = recode(period_lab, baseline = 0L, y1 = 1L, y2 = 2L, y3 = 3L, y4 = 4L),
    Time   = if_else(period == 0L, 0L, 1L),         # 0 = baseline, 1 = follow-up
    Tij    = if_else(period == 0L, 8, 2),           # exposure weeks
    logT   = log(Tij),
    Trt    = trt                                    # 0 = placebo, 1 = progabide
  ) %>% arrange(id, period)
cat("Rows:", nrow(sz_long), " Subjects:", dplyr::n_distinct(sz_long$id), "\n")

Load and Reshape the Seizure Data

Rows: 295  Subjects: 59 

Fit the Seizure GLMM (FLW Table 14.6)

sz_fit <- glmer(
  count ~ Trt * Time + (1 + Time | id),
  offset = logT, data = sz_long,
  family = poisson(link = "log"),
  control = glmerControl(optimizer = "bobyqa")
)
broom.mixed::tidy(sz_fit, effects = "fixed", conf.int = TRUE) %>%
  dplyr::select(term, estimate, std.error, conf.low, conf.high)

Note

Reproduces FLW Table 14.6: Intercept \(\approx\) 1.07, Time \(\approx\) 0.00, Trt \(\approx\) 0.05, Trt\(\times\)Time \(\approx\) -0.306 (FLW -0.3065), \(\widehat{g}_{11}\approx\) 0.5 (FLW 0.50). Laplace matches the book’s 50-point quadrature here.

Fit the Seizure GLMM (FLW Table 14.6)

# A tibble: 4 × 5
  term         estimate std.error conf.low conf.high
  <chr>           <dbl>     <dbl>    <dbl>     <dbl>
1 (Intercept)  1.07         0.140    0.796    1.35  
2 Trt          0.0512       0.193   -0.326    0.429 
3 Time        -0.000474     0.109   -0.214    0.213 
4 Trt:Time    -0.306        0.150   -0.601   -0.0114

Interpreting the Seizure Fit (subject-specific)

  • All covariates are dichotomous, so \(\exp(\beta)\) are subject-specific (log) rate ratios (FLW Table 14.5)
  • \(\exp(\beta_4)\) (Trt\(\times\)Time) is a ratio of rate ratios: progabide vs placebo change from baseline
  • \(\widehat{\beta}_4 = -0.306 < 0\): progabide reduces the seizure rate from baseline
  • For a “typical” progabide patient (\(b_{2i}=0\)): rate ratio \(e^{\widehat\beta_2 + \widehat\beta_4} \approx 0.74\) (about a 26% drop)

Seizure Variance Components

print(VarCorr(sz_fit), comp = c("Variance", "Std.Dev."))

Note

\(\widehat{g}_{11} \approx\) 0.5 (baseline-rate heterogeneity) and a positive random-slope variance: patients differ in baseline rate and in how their rate changes over time, the FLW motivation for both random effects.

Seizure Variance Components

 Groups Name        Variance Std.Dev. Corr  
 id     (Intercept) 0.49987  0.70701        
        Time        0.23195  0.48161  0.160 

The Same GLMM in SAS (reference)

Static reference (not run here): the seizure log-linear GLMM in PROC GLIMMIX, mirroring FLW Table 14.11 (ML via adaptive quadrature).

proc glimmix data=seizure method=quad(qpoints=50);
   class id trt;
   model count = trt time trt*time / dist=poisson link=log solution
                 offset=logT;
   random intercept time / subject=id type=un;
run;

Note

method=quad(qpoints=50) is adaptive Gaussian quadrature (FLW’s choice); random intercept time / subject=id type=un is lme4’s (1 + time | id) with an unstructured \(G\); offset=logT matches the R offset.

Overdispersion Check

pr  <- residuals(sz_fit, type = "pearson")
od  <- sum(pr^2) / df.residual(sz_fit)
cat("Overdispersion ratio (Pearson X^2 / df):", round(od, 2), "\n")
Ratio Reading
\(\approx 1\) Poisson variance adequate
\(> 1.5\) Overdispersion likely
\(> 3\) Serious overdispersion

Overdispersion Check

Overdispersion ratio (Pearson X^2 / df): 1.34 

Overdispersion Remedies

Option 1 - observation-level random effect (OLRE):

sz_long$obs <- factor(seq_len(nrow(sz_long)))
glmer(count ~ Trt * Time + (1 + Time | id) + (1 | obs),
      offset = logT, data = sz_long, family = poisson)

Option 2 - negative binomial (FLW Table 14.8):

glmmTMB::glmmTMB(count ~ Trt * Time + (1 + Time | id),
                 offset = logT, data = sz_long, family = nbinom2)

Hypothesis Testing: Fixed Effects

  • Wald tests (from summary()) are asymptotic; small clusters \(\rightarrow\) prefer profile likelihood
confint(sz_fit, parm = "beta_", method = "profile")  # profile CIs for fixed effects

Hypothesis Testing: Variance Components

Testing \(H_0\): random slope dropped ((1+Time|id) \(\to\) (1|id)) removes two parameters (slope variance and intercept-slope covariance).

Warning

Two boundary cases (Stram-Lee):

  • Drop a single random-intercept variance (\(\sigma^2 = 0\)): null is \(\tfrac12\chi^2_0 + \tfrac12\chi^2_1\)
  • Drop a random slope from a RI+RS model (this Case Study): null is \(\tfrac12\chi^2_1 + \tfrac12\chi^2_2\)

lme4/anova() report the naive \(\chi^2_2\) p-value; the mixture correction must be applied by hand.

Warning

RLRsim::exactRLRT() is for linear (Gaussian) mixed models only and is not valid for GLMMs. Use a parametric-bootstrap LRT instead.

Parametric-Bootstrap LRT for the Random Slope

null_fit <- glmer(count ~ Trt * Time + (1 | id), offset = logT,
                  data = sz_long, family = poisson,
                  control = glmerControl(optimizer = "bobyqa"))
lr_obs <- as.numeric(2 * (logLik(sz_fit) - logLik(null_fit)))

set.seed(667)
B <- 100
sim_y   <- simulate(null_fit, nsim = B)          # simulate under the null
lr_boot <- vapply(seq_len(B), function(b) {
  f0 <- refit(null_fit, newresp = sim_y[[b]])
  f1 <- refit(sz_fit,  newresp = sim_y[[b]])
  as.numeric(2 * (logLik(f1) - logLik(f0)))
}, numeric(1))

p_boot <- (1 + sum(lr_boot >= lr_obs)) / (1 + B)
p_mix  <- 0.5 * pchisq(lr_obs, 1, lower.tail = FALSE) +
          0.5 * pchisq(lr_obs, 2, lower.tail = FALSE)
cat("LR statistic:        ", round(lr_obs, 2), "\n")

Parametric-Bootstrap LRT for the Random Slope

LR statistic:         171.87 
cat("Naive chi^2_2 p:     ", round(pchisq(lr_obs, 2, lower.tail = FALSE), 4), "\n")
Naive chi^2_2 p:      0 
cat("Stram-Lee mixture p: ", round(p_mix, 4), "\n")
Stram-Lee mixture p:  0 
cat("Parametric bootstrap p (B=", B, "): ", round(p_boot, 4), "\n", sep = "")
Parametric bootstrap p (B=100): 0.0099

Post-Hoc Contrasts with emmeans

emm_contrasts <- emmeans(glmm_fit, specs = pairwise ~ treat | weeks,
                         at = list(weeks = c(0, 48)), type = "response")
summary(emm_contrasts$contrasts)

Post-Hoc Contrasts with emmeans

weeks = 0:
 contrast        odds.ratio    SE  df null z.ratio p.value
 Usual / Steroid      0.366 0.112 Inf    1  -3.279  0.0010

weeks = 48:
 contrast        odds.ratio    SE  df null z.ratio p.value
 Usual / Steroid      0.446 0.113 Inf    1  -3.196  0.0014

Results are averaged over the levels of: severity 
Tests are performed on the log odds ratio scale 

Interpreting emmeans Contrasts

These are subject-specific odds ratios evaluated at \(\mathbf{b}_i = 0\) (the median subject on the logit scale), NOT the population-average OR.

  • At \(\mathbf{b}_i = 0\) = conditional on the median latent subject
  • The population-average OR is attenuated (Jensen); see the Three Types of Predictions table
  • Do not read these as “the average subject in the population”

Random-Effects Heterogeneity (seizure slopes)

Note

Reading it: spread along the vertical axis = baseline-rate heterogeneity (\(g_{11}\)); spread along the horizontal = heterogeneity in the rate change (\(g_{22}\)). The mild positive tilt is the small positive \(g_{12}\) (FLW 0.054).

Diagnostics: Residuals vs Fitted

Note

Reading it: good = a flat, structureless band around the red zero line. Bad = a funnel (variance grows with the mean \(\to\) overdispersion) or curvature in the blue smoother (mean misspecified). A few large positive residuals here trace to the high-count patients (e.g. patient 49).

Diagnostics: Random-Effects Q-Q

Warning

Reading it (with a caveat): points near the line support \(\mathbf{b}_i \sim N(0,G)\); heavy tails or curvature suggest non-normal random effects. But these are empirical-Bayes (BLUP) predictions, which are shrunk toward 0 and toward normality, so a clean Q-Q plot understates non-normality. Treat it as a weak check (FLW Ch. 14.2).

Convergence Troubleshooting Recipe

Warning First fix
“failed to converge” Rescale continuous predictors (scale())
“nearly unidentifiable” Simplify \(G\): drop a near-zero random slope
“singular fit” (variance \(\approx 0\)) Use uncorrelated REs: (1|id) + (0 + x|id)
slow / stuck Switch optimizer: glmerControl(optimizer = "bobyqa")
separation (binary) Bayesian priors (brms) or penalized likelihood (brglm)

Rule of thumb: if a random-slope variance is \(< 0.01\) and convergence struggles, drop it.

Convergence: A Worked Fix

# Rescaling the continuous predictor often resolves convergence trouble.
asthma_dat$weeks_z <- as.numeric(scale(asthma_dat$weeks))
fit_scaled <- glmer(control ~ treat * weeks_z + severity + (weeks_z | id),
                    data = asthma_dat, family = binomial,
                    control = glmerControl(optimizer = "bobyqa"))
cat("Converged:", is.null(fit_scaled@optinfo$conv$lme4$messages), "\n")

Convergence: A Worked Fix

Converged: FALSE 
cat("Random-slope SD:",
    round(as.data.frame(VarCorr(fit_scaled))$sdcor[2], 3), "\n")
Random-slope SD: 0.178 

Common Pitfalls and Remedies

Pitfall Remedy
Treating \(\beta\) as a marginal effect It is conditional; use Zeger (Ch. 16) for the marginal effect
re.form=NA read as marginal mean It is \(E(Y\mid\mathbf{b}=0)\); integrate over \(\mathbf{b}\) (Monte Carlo)
Comparing GLMM and GEE ORs as if equal Different estimands; both valid for their question
exactRLRT for a GLMM variance test Gaussian-LMM only; use a parametric bootstrap
Wrong boundary mixture Match params dropped: 1 var \(\to \tfrac12\chi^2_0+\tfrac12\chi^2_1\); slope \(\to \tfrac12\chi^2_1+\tfrac12\chi^2_2\)
Near-singular \(G\) Reduce random slopes; center/scale covariates
Ignoring convergence warnings Rescale, simplify \(G\), switch optimizer

Reporting Checklist

  • State “conditional on the random effects” (or “within-subject”) when interpreting \(\beta\)
  • Report the estimand: conditional OR/RR, and the marginal version if relevant
  • Report variance components with CIs
  • State the estimation method (Laplace, nAGQ = X, or MCMC)
  • For variance-component tests, state the mixture / bootstrap used

Quick Check

A colleague reports “treatment increased the odds 2.5-fold (OR = 2.5)” from a logistic GLMM. What is missing?

Answer: the frame of reference. It is a conditional (within-subject) OR. Complete: “Within a given subject, treatment increased the odds 2.5-fold (conditional OR = 2.5, 95% CI …); the population-average OR is approximately [Zeger].”

Extensions and Software

Extension (Ch. 14) Software note
Negative-binomial / zero-inflated counts glmmTMB (FLW Table 14.8)
Crossed random effects lme4 (1|a) + (1|b)
Bayesian GLMMs (full posterior) brms, MCMCglmm
Selection of fixed + random effects glmmPen (penalized MCEM)

Wrap-Up

  • GLMM = GLM plus latent subject effects; inference is subject-specific (conditional)
  • The marginal effect is attenuated (Jensen / non-collapsibility); Zeger approximates it (Ch. 16)
  • Fit by ML (Laplace / adaptive quadrature); valid under MAR
  • re.form=NA is conditional at \(\mathbf{b}=0\), not marginal; integrate for true marginal
  • Test variance components with the right mixture or a parametric bootstrap (never exactRLRT for a GLMM)
  • Next chapter (Ch. 15): approximate likelihood methods (PQL/MQL) and their tradeoffs

Reading

Required: Fitzmaurice et al. Ch. 14, sections 14.1-14.6, plus the Appendix on adaptive Gaussian quadrature.

Optional:

  • Zeger, Liang & Albert (1988), marginal vs subject-specific approximation (the 0.346 factor; see FLW Ch. 16)
  • Bolker et al. (2009), GLMMs: a practical guide

Session Info

sessionInfo()

Session Info

R version 4.6.0 (2026-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] patchwork_1.3.2     broom.mixed_0.2.9.7 emmeans_2.0.3      
[4] lme4_2.0-1          Matrix_1.7-5        ggplot2_4.0.3      
[7] tidyr_1.3.2         dplyr_1.2.1        

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       xfun_0.57          lattice_0.22-9     vctrs_0.7.3       
 [5] tools_4.6.0        Rdpack_2.6.6       generics_0.1.4     parallel_4.6.0    
 [9] sandwich_3.1-1     tibble_3.3.1       pkgconfig_2.0.3    RColorBrewer_1.1-3
[13] S7_0.2.2           lifecycle_1.0.5    compiler_4.6.0     farver_2.1.2      
[17] codetools_0.2-20   htmltools_0.5.9    yaml_2.3.12        pillar_1.11.1     
[21] furrr_0.4.0        nloptr_2.2.1       MASS_7.3-65        reformulas_0.4.4  
[25] boot_1.3-32        multcomp_1.4-30    nlme_3.1-169       parallelly_1.47.0 
[29] tidyselect_1.2.1   digest_0.6.39      mvtnorm_1.3-7      future_1.70.0     
[33] purrr_1.2.2        listenv_0.10.1     labeling_0.4.3     forcats_1.0.1     
[37] splines_4.6.0      fastmap_1.2.0      grid_4.6.0         cli_3.6.6         
[41] magrittr_2.0.5     survival_3.8-6     utf8_1.2.6         broom_1.0.12      
[45] TH.data_1.1-5      withr_3.0.2        scales_1.4.0       backports_1.5.1   
[49] estimability_1.5.1 rmarkdown_2.31     globals_0.19.1     otel_0.2.0        
[53] zoo_1.8-15         coda_0.19-4.1      evaluate_1.0.5     knitr_1.51        
[57] rbibutils_2.4.1    mgcv_1.9-4         rlang_1.2.0        Rcpp_1.1.1-1.1    
[61] xtable_1.8-8       glue_1.8.1         minqa_1.2.8        jsonlite_2.0.0    
[65] R6_2.6.1