BIOS 667 - Lecture 15: GLMM Random Slopes (Ch. 14)

Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis

Naim Rashid

Lecture Objectives

  • Extend random-intercept GLMMs to include random slopes
  • Specify random slope models for binary and count outcomes
  • Interpret the variance-covariance structure \(G\) of random effects
  • Compute and interpret empirical Bayes predictors for subject-specific inference
  • Understand computational approaches: Laplace, adaptive Gauss-Hermite quadrature
  • Apply model selection criteria for comparing GLMMs

A Note on FLW Chapter Numbering

Note

This lecture’s content is FLW Chapter 14 (Generalized Linear Mixed Effects Models): random-slope specification, subject-specific interpretation, empirical Bayes prediction, and adaptive Gaussian quadrature, including the very epilepsy Poisson random-slope model used here (FLW Examples, Sec. 14.3-14.7, Tables 14.5-14.8).

The course slot is “Lecture 15,” but FLW’s physical Chapter 15 is a different topic: Approximate Methods of Estimation (penalized and marginal quasi-likelihood, PQL/MQL). We touch PQL/MQL and the Laplace approximation only briefly, and flag them as FLW Ch. 15 material when we do.

Roadmap

Part I (Day 1, ~75 min): specification and interpretation

Part Topic
I From Random Intercepts to Random Slopes
II Model Specification for Discrete Outcomes
III Interpretation of Random Slope Variance
IV Prediction and Empirical Bayes Predictors

Part II (Day 2, ~75 min): computation, selection, full case study

Part Topic
V Computational Considerations
VI Model Selection and Comparison
VII Applied Case Study: Epilepsy Data

Notation reference

Symbol Meaning
\(Y_{ij}\) response for subject \(i\) at occasion \(j\) (scalar)
\(\mathbf{Y}_i\) response vector for subject \(i\) \((n_i \times 1)\)
\(\mathbf{X}_i,\ \boldsymbol\beta\) fixed-effect design \(n_i \times p\) and coefficients
\(\mathbf{Z}_i,\ \mathbf{b}_i\) random-effect design and random effects, \(\mathbf{b}_i \sim N(0, G)\)
\(G\) between-subject random-effects covariance (FLW’s letter, Ch. 8)
\(R_i\) within-subject residual covariance, \(\boldsymbol\varepsilon_i \sim N(0, R_i)\)
\(\Sigma_i = \mathbf{Z}_i G \mathbf{Z}_i^\top + R_i\) marginal (total) covariance \(\operatorname{Cov}(\mathbf{Y}_i)\) for subject \(i\)
\(\rho\) correlation parameter (AR(1), exchangeable)

Indices: \(i\) subject \(1,\dots,N\); \(j\) occasion/time \(1,\dots,n_i\) (\(n\) when balanced); \(k\) state (transition models); \(g\) group. Bold = vector or matrix.

Notation (this lecture)

Symbol Meaning
\(G = \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\) \(2\times2\) random-effects covariance (intercept, slope)
\(\sigma_0^2,\ \sigma_1^2\) between-subject variance of random intercept, random slope
\(\sigma_{01}\) intercept-slope covariance; \(\rho_{01} = \sigma_{01}/(\sigma_0\sigma_1)\)
\(\hat{\mathbf{b}}_i = E[\mathbf{b}_i \mid \mathbf{Y}_i; \hat{\boldsymbol\beta}, \hat G]\) empirical Bayes (EB) predictor of subject \(i\)’s random effects
\(v(\mu)\) GLM mean-variance function (scalar)

Part I: From Random Intercepts to Random Slopes

Review: Random Intercept GLMM

In Lecture 14, we modeled:

\[g(E[Y_{ij}|\mathbf{b}_i]) = X_{ij}\beta + b_{0i}\]

where \(b_{0i} \sim N(0, \sigma_0^2)\)

Limitation: All subjects share the same covariate effects (slopes).

Why Random Slopes?

In longitudinal studies, subjects often differ not just in baseline levels but also in rates of change.

Aspect Random Intercept Random Slope
Subject heterogeneity Baseline only Baseline + trajectory
Covariate effect Same for all subjects Varies across subjects
Correlation structure Induced by shared intercept More flexible

Motivating Example: Epilepsy Trial

  • 59 epileptic patients randomized to progabide vs placebo
  • Baseline seizure count over an 8-week interval, then counts at 4 post-baseline visits (2-week intervals)
  • Patients may differ in:
    • Baseline seizure rate (random intercept)
    • Response to treatment over time (random slope for time)

Note

This is FLW’s Ch. 14 case study (Sec. 14.7, Tables 14.5-14.8; Thall and Vail 1990; Leppik et al. 1987). We carry it through to the full worked analysis in Part VII.

Random Slope Model: Concept

\[g(\mu_{ij}) = X_{ij}\beta + b_{0i} + b_{1i}\, t_{ij}\]

where:

  • \(b_{0i}\) = subject-specific intercept deviation
  • \(b_{1i}\) = subject-specific slope deviation for time \(t_{ij}\) (a column of \(Z_{ij}\))

\[\mathbf{b}_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, G = \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\right)\]

What the Covariance Matrix G Captures

\[G = \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\]

Parameter Interpretation
\(\sigma_0^2\) Between-subject variance in intercepts
\(\sigma_1^2\) Between-subject variance in slopes
\(\sigma_{01}\) Covariance between intercepts and slopes
\(\rho_{01} = \sigma_{01}/(\sigma_0 \sigma_1)\) Correlation between intercepts and slopes

Correlation Interpretation

Sign of \(\rho_{01}\) Interpretation
\(\rho_{01} > 0\) Subjects with higher baseline also have steeper slopes
\(\rho_{01} < 0\) Subjects with higher baseline have flatter slopes
\(\rho_{01} = 0\) Intercepts and slopes are independent

This correlation can be scientifically meaningful, but it is often weak and hard to estimate.

Warning

Do not assume a strong intercept-slope correlation. For the epilepsy data, FLW find \(\rho_{01}\) near zero (Table 14.6: \(g_{12}=0.054\), indistinguishable from 0), and our own fit in Part VII reproduces this. Interpret the table above as the meaning of the sign, not a prediction.

Part II: Model Specification for Discrete Outcomes

Binary Outcome: Random Slope Logistic GLMM

Model:

\[\log\left(\frac{P(Y_{ij}=1|b_i)}{1-P(Y_{ij}=1|b_i)}\right) = X_{ij}\beta + b_{0i} + b_{1i} \cdot \text{time}_{ij}\]

Key features:

  • Subject-specific log-odds trajectory
  • Coefficients \(\beta\) are conditional on \((b_{0i}, b_{1i})\)
  • Allows subjects to have different treatment response rates

Count Outcome: Random Slope Poisson GLMM

Model:

\[\log(E[Y_{ij}|\mathbf{b}_i]) = X_{ij}\beta + b_{0i} + b_{1i} \cdot t_{ij}\]

For epilepsy data: allows each patient to have:

  • Own baseline seizure rate (intercept)
  • Own time trend (slope)

General Random Slope GLMM Structure

Level 1 (observation level):

\[Y_{ij} | \mathbf{b}_i \sim \text{EF}(\mu_{ij}, \phi), \quad g(\mu_{ij}) = X_{ij}\beta + Z_{ij} \mathbf{b}_i\]

Level 2 (subject level):

\[\mathbf{b}_i \sim N_q(0, G)\]

where \(q\) = dimension of random effects vector.

Note

\(\text{EF}(\mu,\phi)\) is the exponential family from Ch. 11 (mean \(\mu\), dispersion \(\phi\), variance \(\phi\, v(\mu)\)). For the Poisson count outcome, \(g^{-1}=\exp\) and \(\phi=1\) (no separate dispersion parameter), so do not look for a \(\phi\) estimate in the output.

Design Matrices for Random Slopes

Fixed effects design: \(X_i\) (typically includes all predictors)

Random effects design: \(Z_i\) (subset of \(X_i\))

Common specification \(Z_i\) columns
Random intercept only 1 (constant)
Random intercept + time slope 1, time
Random intercept + treatment slope 1, treatment indicator

Datasets and Packages (load cold)

Dataset: epilepsy.txt (FLW Ch. 14 case study). Load local first, else download from the FLW site.

Packages: dplyr, tidyr, ggplot2, lme4 (fitting), broom.mixed (tidy output), patchwork (plot layout), glmmTMB (negative binomial mixed model, called qualified). MASS is also called qualified (MASS::mvrnorm) so it does not mask dplyr::select.

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(ggplot2)
  library(lme4)
  library(broom.mixed)
  library(patchwork)
})
theme_set(theme_minimal(base_size = 16))

Datasets and Packages (load cold)

Load Epilepsy Data (download fallback)

# Local first, else download from the FLW site (CLAUDE.md data-loading pattern)
data_url  <- "https://content.sph.harvard.edu/fitzmaur/ala2e/epilepsy.txt"
data_file <- "../../data/epilepsy.txt"
if (!file.exists(data_file)) {
  dir.create(dirname(data_file), showWarnings = FALSE, recursive = TRUE)
  download.file(data_url, data_file)
}

# The file has a 36-line comment header; columns are wide (one row per patient)
epi_raw <- read.table(data_file, skip = 36, header = FALSE,
                      col.names = c("id", "trt", "age", "base",
                                    "y1", "y2", "y3", "y4"))

# Reshape to long format
epi_long <- epi_raw %>%
  pivot_longer(cols = y1:y4, names_to = "visit_num", values_to = "seizures") %>%
  mutate(
    visit = as.integer(gsub("y", "", visit_num)),
    time = (visit - 1) * 2,  # weeks since first post-baseline visit
    trt = factor(trt, levels = c(0, 1), labels = c("Placebo", "Progabide")),
    log_base = log(base / 4),  # log of baseline rate per 2-week period
    age_c = age - mean(age)    # centered age
  )

glimpse(epi_long)

Warning

Instructor variant (flag for the class). FLW’s Ch. 14 model (p. 422, Table 14.5) includes the baseline period as period 0 with an offset \(\log(T_{ij})\) (8 weeks baseline vs 2 weeks follow-up) and a binary Time indicator. Our reshaping keeps only the 4 post-baseline visits, uses continuous visit, and adds \(\log(\text{base}/4)\) and centered age as covariates with no offset. This is the standard Thall-Vail / Breslow-Clayton epilepsy parameterization, a legitimate variant, but it is not FLW’s exact Ch. 14 model. We reproduce FLW’s offset model in the case study.

Load Epilepsy Data (download fallback)

Rows: 236
Columns: 10
$ id        <int> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, …
$ trt       <fct> Placebo, Placebo, Placebo, Placebo, Placebo, Placebo, Placeb…
$ age       <int> 31, 31, 31, 31, 30, 30, 30, 30, 25, 25, 25, 25, 36, 36, 36, …
$ base      <int> 11, 11, 11, 11, 11, 11, 11, 11, 6, 6, 6, 6, 8, 8, 8, 8, 66, …
$ visit_num <chr> "y1", "y2", "y3", "y4", "y1", "y2", "y3", "y4", "y1", "y2", …
$ seizures  <int> 5, 3, 3, 3, 3, 5, 3, 3, 2, 4, 0, 5, 4, 4, 1, 4, 7, 18, 9, 21…
$ visit     <int> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, …
$ time      <dbl> 0, 2, 4, 6, 0, 2, 4, 6, 0, 2, 4, 6, 0, 2, 4, 6, 0, 2, 4, 6, …
$ log_base  <dbl> 1.0116009, 1.0116009, 1.0116009, 1.0116009, 1.0116009, 1.011…
$ age_c     <dbl> 2.1694915, 2.1694915, 2.1694915, 2.1694915, 1.1694915, 1.169…

Explore the Data Structure

# Summary by treatment
epi_long %>%
  group_by(trt, visit) %>%
  summarise(
    mean_seizures = mean(seizures),
    sd_seizures = sd(seizures),
    median = median(seizures),
    n = n(),
    .groups = "drop"
  )

Explore the Data Structure

# A tibble: 8 × 6
  trt       visit mean_seizures sd_seizures median     n
  <fct>     <int>         <dbl>       <dbl>  <dbl> <int>
1 Placebo       1          9.36       10.1     5      28
2 Placebo       2          8.29        8.16    4.5    28
3 Placebo       3          8.79       14.7     5      28
4 Placebo       4          8           7.61    5      28
5 Progabide     1          8.58       18.2     4      31
6 Progabide     2          8.42       11.9     5      31
7 Progabide     3          8.13       13.9     4      31
8 Progabide     4          6.74       11.3     4      31

Visualize Individual Trajectories

ggplot(epi_long, aes(x = visit, y = seizures, group = id, color = trt)) +
  geom_line(alpha = 0.4) +
  stat_summary(aes(group = trt), fun = mean, geom = "line",
               linewidth = 1.5, linetype = "dashed") +
  facet_wrap(~trt) +
  labs(x = "Visit (2-week periods)", y = "Seizure Count",
       title = "Individual Seizure Trajectories by Treatment") +
  scale_y_log10() +
  theme(legend.position = "none")

Visualize Individual Trajectories

Key Observations

  • High variability between subjects (need random intercepts)
  • Some subjects show decreasing trends, others increasing (need random slopes?)
  • Potential outliers with very high counts (overdispersion concern)
  • Treatment effect may evolve over time

Part III: Interpretation of Random Slope Variance

What Does \(\sigma_1^2\) Tell Us?

\(\sigma_1^2\) = variance of subject-specific slopes (on the link scale)

  • Small \(\sigma_1^2\): Subjects respond similarly to time/treatment
  • Large \(\sigma_1^2\): Heterogeneous responses across subjects

Tip

Anchor on the multiplicative (exp) scale. For a log link, \(\sigma_1\) is the SD of per-time-unit log-rate-of-change. \(\sigma_1 = 0.5\) means subject slopes vary by about a factor of \(\exp(0.5) \approx 1.65\times\) per time unit (roughly 95% within \(\exp(\pm 2\sigma_1)\)). The “\(\sigma_1 > 0.5\)” cutoff is an instructor heuristic, not an FLW threshold; always read \(\sigma_1\) through the inverse link in context.

Interpreting the Correlation \(\rho_{01}\)

The sign of \(\rho_{01}\) has a clinical reading:

\(\rho_{01}\) Interpretation (illustrative)
\(\rho_{01} > 0\) Patients with more baseline seizures also have steeper time trends
\(\rho_{01} < 0\) Patients with more baseline seizures have flatter/lower time trends

Warning

This is the meaning of the sign, not the epilepsy finding. For these data FLW estimate \(\rho_{01}\) near zero (Table 14.6) and set the covariance to zero in their negative-binomial extension. Do not present a strong correlation as the result before seeing the estimate.

Conditional vs Marginal Effects (Revisited)

Because subjects fan out, the average of the subject curves is not a single curve with the same slope. With a non-linear link, the population-average (marginal) effect is not the conditional (\(b=0\)) effect.

Note

Recall (L14): the marginal coefficient is an attenuated version of the conditional one. The attenuation gets harder to summarize with a single factor once slopes are random (it varies by covariate level). We quantify the gap on the real fit in Part VII.

Part IV: Prediction and Empirical Bayes Predictors

What Are Empirical Bayes Predictors?

Definition (once, then reused): the empirical Bayes (EB) predictor of subject \(i\)’s random effects is the posterior mean given that subject’s data,

\[\hat{\mathbf{b}}_i = E[\mathbf{b}_i \mid \mathbf{Y}_i;\, \hat{\boldsymbol\beta}, \hat\phi, \hat G]\]

evaluated at the ML estimates (hence “empirical”). It requires integrating over \(\mathbf{b}_i\), so it is computed numerically.

Note

FLW use “empirical Bayes” and “empirical BLUP” interchangeably for GLMM random-effect predictions (Sec. 14.5, pp. 411-412): \(\hat{\mathbf{b}}_i\) “coincides with the empirical Bayes or BLUP used in Chapter 8.” The “linear/unbiased” properties of BLUP are exact only in the LMM; for a GLMM the term is FLW’s convention, not a claim of exact linearity.

Properties of Empirical Bayes Predictors

Property Description
Shrinkage Extreme observations pulled toward population mean
Reliability More shrinkage with fewer observations per subject
Conditioning Based on observed data for that subject
Normality leverage EB predictors inherit the assumed normal shape (FLW p. 412)

Shrinkage Effect (LMM vs GLMM)

GLMM takeaway: shrinkage still happens. A subject’s EB predictor is pulled toward the population mean, and more so when that subject has few observations or when the between-subject variance is small. We see this directly in the epilepsy shrinkage plot (Part VII).

Note

In the linear (LMM) case this is exact: the shrinkage factor has a closed form, \[\frac{\sigma^2_{\text{within}}}{\sigma^2_{\text{within}} + n_i\, \sigma^2_{\text{between}}}.\] For a GLMM with a non-linear link the same forces operate but there is no simple closed form.

Computing Empirical Bayes Predictors in R

# After fitting a model:
ranef(model)              # Random effects estimates (EB predictors)
coef(model)               # Fixed + random effects combined
predict(model, type = "response")  # Subject-specific predictions

Subject-Specific Predictions

For subject \(i\) at covariate value \(x^*\) (with random-effect design row \(z^*\)):

\[\hat{\mu}_i = g^{-1}\!\left(x^{*}\hat{\boldsymbol\beta} + z^{*}\hat{\mathbf{b}}_i\right)\]

This gives individual-level predictions incorporating the subject’s EB-predicted random effects.

Part II (Day 2, ~75 min): Computation, Selection, Case Study

Recap of Day 1

On Day 1 we covered specification and interpretation:

  • Random slopes let each subject have an individual rate of change, with \(\mathbf{b}_i \sim N(0, G)\)
  • \(G\) holds \(\sigma_0^2\) (intercept var), \(\sigma_1^2\) (slope var), \(\sigma_{01}\) (their covariance)
  • Read variance components through the inverse link (exp for log); \(\rho_{01}\) is often weak
  • \(\beta\) is conditional on \(\mathbf{b}_i\); the marginal effect is attenuated (non-linear link)
  • Empirical Bayes predictors give shrinkage-based subject-specific estimates

Part V: Computational Considerations

The Integration Challenge

Marginal (integrated) likelihood (FLW Sec. 14.5):

\[L(\beta, \phi, G) = \prod_{i=1}^N \int \left[\prod_{j=1}^{n_i} f(y_{ij}\mid \mathbf{b}_i; \beta)\right] \phi(\mathbf{b}_i; G)\, d\mathbf{b}_i\]

Unlike the LMM (Ch. 8), this integral has no closed form for a non-linear link. With random slopes \(\mathbf{b}_i \in \mathbb{R}^q\) (typically \(q = 2\)), it is a \(q\)-dimensional integral per subject.

Adaptive Gauss-Hermite Quadrature (the FLW Ch. 14 method)

Idea: approximate the integral by a weighted sum of integrand evaluations.

\[L(\beta,\phi,G) \approx \prod_{i=1}^N \sum_{k=1}^K f(\mathbf{Y}_i \mid \mathbf{b}_i = v_k)\, w_k\]

Adaptive: centers the quadrature points \(v_k\) near the mode of each subject’s integrand (more accurate than fixed points).

Note

This is FLW’s true-likelihood method (Tables 14.2/14.6 used 50-point adaptive Gaussian quadrature). FLW Table 14.3 shows estimates stabilize once \(K\) exceeds about 30; using too few points (5-10) gave misleading values. Recommendation: increase \(K\) until estimates and SEs stop moving.

Laplace Approximation (beyond FLW Ch. 14)

Warning

Beyond FLW Ch. 14 (this is FLW Ch. 15 / approximate-methods material). FLW Ch. 14 names only generic quadrature; it does not derive the Laplace formula and explicitly defers the less-demanding approximate methods (PQL, Laplace-type) to Chapter 15.

Idea: approximate the integrand with a single Gaussian centered at the mode.

\[\int f(b)\, db \approx f(\hat{b}) \cdot (2\pi)^{q/2} |H(\hat{b})|^{-1/2}\]

In lme4, nAGQ = 1 is the Laplace approximation (one quadrature point at the mode). It is fast and the only option for \(q \geq 2\) in lme4, but can be biased for sparse/binary data.

Choosing the Number of Quadrature Points

nAGQ Accuracy When
1 (Laplace) Lowest Random slopes in lme4 (\(q \geq 2\)), large \(n_i\)
5-10 Moderate Random intercept, small \(n_i\)
15-30+ High Sparse binary data; raise until stable (FLW Table 14.3)

Cost: time scales as \(O(K^q)\), \(q\) = number of random effects per subject.

Note

lme4 limit: nAGQ > 1 works only for a single random effect (\(q=1\)). For random slopes (\(q \geq 2\)) lme4 uses Laplace; for higher accuracy with \(q \geq 2\), use glmmTMB or a Bayesian fit.

Fit Random Intercept Model (Baseline)

# Random intercept Poisson GLMM
fit_ri <- glmer(
  seizures ~ trt * visit + log_base + age_c + (1 | id),
  data = epi_long,
  family = poisson(link = "log")
)

summary(fit_ri)

Fit Random Intercept Model (Baseline)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: seizures ~ trt * visit + log_base + age_c + (1 | id)
   Data: epi_long

      AIC       BIC    logLik -2*log(L)  df.resid 
   1349.1    1373.4    -667.6    1335.1       229 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.3385 -0.8659 -0.0989  0.5806  7.1501 

Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.2702   0.5198  
Number of obs: 236, groups:  id, 59

Fixed effects:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)         0.102483   0.223561   0.458    0.647    
trtProgabide       -0.253713   0.179299  -1.415    0.157    
visit              -0.041517   0.028680  -1.448    0.148    
log_base            1.020695   0.101777  10.029   <2e-16 ***
age_c               0.008282   0.010443   0.793    0.428    
trtProgabide:visit -0.031466   0.040345  -0.780    0.435    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtPrg visit  log_bs age_c 
trtProgabid -0.369                            
visit       -0.314  0.392                     
log_base    -0.819 -0.048  0.000              
age_c       -0.156  0.016  0.000  0.184       
trtPrgbd:vs  0.223 -0.546 -0.711  0.000  0.000

Fit Random Slope Model

# Random intercept + slope for visit
fit_rs <- glmer(
  seizures ~ trt * visit + log_base + age_c + (1 + visit | id),
  data = epi_long,
  family = poisson(link = "log"),
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5))
)

summary(fit_rs)

Fit Random Slope Model

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: seizures ~ trt * visit + log_base + age_c + (1 + visit | id)
   Data: epi_long
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))

      AIC       BIC    logLik -2*log(L)  df.resid 
   1333.8    1365.0    -657.9    1315.8       227 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.9935 -0.7827 -0.0961  0.5458  6.3786 

Random effects:
 Groups Name        Variance Std.Dev. Corr  
 id     (Intercept) 0.42239  0.6499         
        visit       0.02133  0.1461   -0.61 
Number of obs: 236, groups:  id, 59

Fixed effects:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)         0.091225   0.238537   0.382    0.702    
trtProgabide       -0.300367   0.218408  -1.375    0.169    
visit              -0.043466   0.044081  -0.986    0.324    
log_base            1.020904   0.101389  10.069   <2e-16 ***
age_c               0.007992   0.010419   0.767    0.443    
trtProgabide:visit -0.010119   0.062746  -0.161    0.872    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtPrg visit  log_bs age_c 
trtProgabid -0.423                            
visit       -0.462  0.496                     
log_base    -0.759 -0.039 -0.015              
age_c       -0.142  0.016 -0.009  0.184       
trtPrgbd:vs  0.318 -0.729 -0.673  0.000 -0.005

Understanding VarCorr() Output

vc <- VarCorr(fit_rs)
print(vc)                 # human-readable: SDs on the diagonal, Corr off-diagonal

Note

VarCorr(fit) returns a list keyed by grouping factor, so VarCorr(fit)$id is the \(2\times2\) covariance \(\hat G\) for the id grouping. print() shows standard deviations (not variances) and the correlation. as.data.frame(vc)$sdcor is ordered: intercept SD, slope SD, then the correlation.

Understanding VarCorr() Output

 Groups Name        Std.Dev. Corr   
 id     (Intercept) 0.64991         
        visit       0.14606  -0.607 

Interpret the Random Effects Structure

vc_df <- as.data.frame(vc)
sigma_int   <- vc_df$sdcor[1]   # intercept SD
sigma_slope <- vc_df$sdcor[2]   # slope SD
rho         <- vc_df$sdcor[3]   # intercept-slope correlation

cat("Random intercept SD:", round(sigma_int, 3), "\n")

Interpret the Random Effects Structure

Random intercept SD: 0.65 
cat("Random slope SD:    ", round(sigma_slope, 3), "\n")
Random slope SD:     0.146 
cat("Intercept-slope correlation:", round(rho, 3), "\n")
Intercept-slope correlation: -0.607 

Interpretation of Correlation (what the data actually say)

For these data the intercept-slope correlation is -0.607: essentially zero.

  • This matches FLW Table 14.6, where \(g_{12} = 0.054\) and the LRT of \(H_0: g_{12}=0\) gives \(G^2 = 0.08\) (1 df, \(p > 0.95\)). FLW set the covariance to zero in their negative-binomial extension.
  • So there is no evidence that baseline seizure rate predicts the rate of change. The “high starters decline fastest” regression-to-the-mean story does not hold here.

Warning

This is the corrected reading. An earlier instinct to narrate a strong negative correlation (regression to the mean) is not supported by either our fit or FLW’s. With only 4 occasions per subject, the intercept-slope covariance is barely identified.

Check Your Understanding: Variance Components

Question: A Poisson GLMM for hospital readmission counts yields:

  • Random intercept SD: \(\sigma_0 = 0.8\)
  • Random slope SD (for time): \(\sigma_1 = 0.3\)
  • Intercept-slope correlation: \(\rho_{01} = -0.6\)

Interpret these results clinically.

Answer:

  1. \(\sigma_0 = 0.8\): Substantial between-patient variability in baseline readmission rates. On the log scale, patients vary by roughly \(\pm 1.6\) (i.e., \(\pm 2 \times 0.8\)), meaning some patients have baseline rates \(\exp(1.6) \approx 5\) times higher than others.

  2. \(\sigma_1 = 0.3\): Patients differ in how their readmission risk changes over time. Some improve faster, others slower.

  3. \(\rho_{01} = -0.6\): Patients with higher baseline readmission rates tend to show more improvement (steeper decline) over time. This could reflect regression to the mean or that high-risk patients benefit more from interventions.

Increase Quadrature Points for Accuracy

# Fit with adaptive Gauss-Hermite (nAGQ > 1)
# Note: nAGQ > 1 only works with 1 random effect dimension in lme4
# For 2+ random effects, use glmmTMB or stay with Laplace

fit_ri_agq <- glmer(
  seizures ~ trt * visit + log_base + age_c + (1 | id),
  data = epi_long,
  family = poisson(link = "log"),
  nAGQ = 10
)

# Compare coefficient estimates
cbind(Laplace = fixef(fit_ri), AGQ10 = fixef(fit_ri_agq)) %>% round(4)

Increase Quadrature Points for Accuracy

                   Laplace   AGQ10
(Intercept)         0.1025  0.1023
trtProgabide       -0.2537 -0.2539
visit              -0.0415 -0.0415
log_base            1.0207  1.0207
age_c               0.0083  0.0083
trtProgabide:visit -0.0315 -0.0315

A Word on PQL and MQL (FLW Ch. 15)

Note

Penalized quasi-likelihood (PQL) and marginal quasi-likelihood (MQL) are approximate estimation methods (a Laplace/Taylor-expansion approach), the subject of FLW Chapter 15, not Chapter 14.

  • PQL (MASS::glmmPQL) is fast and the only “legitimate” approximate method of the two, but biased, worst for binary outcomes and few-cluster data.
  • MQL does not even estimate the GLMM parameters and is generally not recommended.
  • Benchmark: adaptive Gaussian quadrature (the Ch. 14 method we use) is the more accurate integrated-likelihood estimator. Prefer it when feasible; reach for PQL only when AGQ is computationally out of reach.

Software for Flexible GLMMs

For more complex models (negative binomial, zero-inflation), consider:

Package Features
lme4::glmer ML via Laplace / AGQ (used here); Poisson, binomial
glmmTMB Negative binomial, zero-inflation, flexible covariance
brms Bayesian GLMMs with Stan (full posterior)
MCMCglmm Bayesian with multivariate response
MASS::glmmPQL Penalized quasi-likelihood (PQL); fast, approximate, biased (FLW Ch. 15)

Part VI: Model Selection and Comparison

Model Comparison Approaches

Method Use Case Notes
LRT Nested models Chi-square approximation may fail at boundary
AIC/BIC Non-nested OK BIC more conservative
Profile CI Single parameters More accurate than Wald

Likelihood Ratio Test: Random Slope

# Compare random intercept vs random intercept + slope
# (this drops TWO parameters: the slope variance AND the intercept-slope covariance)
anova(fit_ri, fit_rs)

Likelihood Ratio Test: Random Slope

Data: epi_long
Models:
fit_ri: seizures ~ trt * visit + log_base + age_c + (1 | id)
fit_rs: seizures ~ trt * visit + log_base + age_c + (1 + visit | id)
       npar    AIC    BIC  logLik -2*log(L) Chisq Df Pr(>Chisq)    
fit_ri    7 1349.1 1373.4 -667.56    1335.1                        
fit_rs    9 1333.8 1365.0 -657.92    1315.8 19.28  2  6.506e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

LRT Caveat: Boundary Mixture (Stram-Lee)

Testing whether to add the random slope puts the null on the boundary of the parameter space, so the reference distribution is not a plain \(\chi^2\).

Warning

This test drops TWO parameters (\(\sigma_1^2\) and \(\sigma_{01}\)), so the correct reference is the Stram-Lee 50:50 mixture \[\tfrac12\chi^2_1 + \tfrac12\chi^2_2,\] NOT “halve the p-value” (which is right only for a single 1-df boundary test). Dropping just a 1-df variance would use \(\tfrac12\chi^2_0 + \tfrac12\chi^2_1\); dropping a slope from a random-intercept model is the 1-vs-2 case above.

# anova() reports a naive 2-df chi-square; recompute under the Stram-Lee mixture
lrt <- anova(fit_ri, fit_rs)
G2  <- lrt$Chisq[2]
p_naive   <- lrt$`Pr(>Chisq)`[2]                            # naive chi^2_2
p_mixture <- 0.5 * pchisq(G2, 1, lower.tail = FALSE) +
             0.5 * pchisq(G2, 2, lower.tail = FALSE)          # 1/2 chi^2_1 + 1/2 chi^2_2
cat("LRT statistic G^2:", round(G2, 2), "\n")

Note

Gold standard for a GLMM: the exact-LRT simulation in RLRsim is Gaussian-LMM only. For this Poisson GLMM use a parametric bootstrap instead (pbkrtest::PBmodcomp(fit_rs, fit_ri)), which simulates the null reference distribution directly and needs no boundary formula.

LRT Caveat: Boundary Mixture (Stram-Lee)

LRT statistic G^2: 19.28 
cat("Naive 2-df p-value:        ", signif(p_naive, 3), "\n")
Naive 2-df p-value:         6.51e-05 
cat("Stram-Lee mixture p-value: ", signif(p_mixture, 3), "\n")
Stram-Lee mixture p-value:  3.82e-05 

AIC/BIC Comparison

model_comparison <- data.frame(
  Model = c("Random Intercept (Poisson)", "Random Slope (Poisson)"),
  AIC = c(AIC(fit_ri), AIC(fit_rs)),
  BIC = c(BIC(fit_ri), BIC(fit_rs))
)
model_comparison

AIC/BIC Comparison

                       Model      AIC      BIC
1 Random Intercept (Poisson) 1349.117 1373.364
2     Random Slope (Poisson) 1333.837 1365.012

Check Overdispersion in Poisson Model

# Check overdispersion in Poisson model
pearson_resid <- residuals(fit_rs, type = "pearson")
overdispersion <- sum(pearson_resid^2) / df.residual(fit_rs)
cat("Overdispersion ratio (Poisson):", round(overdispersion, 2), "\n")

Check Overdispersion in Poisson Model

Overdispersion ratio (Poisson): 1.28 

When to Choose Negative Binomial

Overdispersion Ratio Recommendation
< 1.5 Poisson adequate
1.5 - 2.5 Consider NB
> 2.5 Strongly prefer NB

Random effects can absorb some overdispersion, but not all.

Likelihood (MAR) vs GEE (MCAR)

A signature GLMM advantage (FLW Sec. 14.5; Table 14.4 footnote, p. 420):

Note

Likelihood-based GLMM estimation (what glmer does) gives valid inference under MAR (missing at random), provided the model is correctly specified. GEE / marginal models require the stronger MCAR (missing completely at random).

  • Epilepsy and most longitudinal trials have dropout, so this matters in practice.
  • It is a reason to prefer the likelihood-based mixed model when the scientific target is subject-specific and data are incomplete.

Part VII: Applied Case Study - Complete Analysis

Final Model: Poisson with Random Slopes

# Refit with centered visit for interpretability
epi_long <- epi_long %>%
  mutate(visit_c = visit - mean(visit))

final_model <- glmer(
  seizures ~ trt * visit_c + log_base + age_c + (1 + visit_c | id),
  data = epi_long,
  family = poisson(link = "log"),
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5))
)

broom.mixed::tidy(final_model, effects = "fixed", conf.int = TRUE)

Final Model: Poisson with Random Slopes

# A tibble: 6 × 8
  effect term           estimate std.error statistic  p.value conf.low conf.high
  <chr>  <chr>             <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 fixed  (Intercept)    -0.0174     0.212    -0.0824 9.34e- 1  -0.432     0.397 
2 fixed  trtProgabide   -0.326      0.150    -2.18   2.95e- 2  -0.619    -0.0324
3 fixed  visit_c        -0.0435     0.0441   -0.986  3.24e- 1  -0.130     0.0429
4 fixed  log_base        1.02       0.101    10.1    7.56e-24   0.822     1.22  
5 fixed  age_c           0.00799    0.0104    0.767  4.43e- 1  -0.0124    0.0284
6 fixed  trtProgabide:… -0.0101     0.0627   -0.161  8.72e- 1  -0.133     0.113 

Fixed Effects Interpretation

# Exponentiate for rate ratios
fe <- broom.mixed::tidy(final_model, effects = "fixed", conf.int = TRUE) %>%
  mutate(
    RR = exp(estimate),
    RR_lower = exp(conf.low),
    RR_upper = exp(conf.high)
  ) %>%
  dplyr::select(term, estimate, RR, RR_lower, RR_upper, p.value)

fe

Fixed Effects Interpretation

# A tibble: 6 × 6
  term                 estimate    RR RR_lower RR_upper  p.value
  <chr>                   <dbl> <dbl>    <dbl>    <dbl>    <dbl>
1 (Intercept)          -0.0174  0.983    0.649    1.49  9.34e- 1
2 trtProgabide         -0.326   0.722    0.539    0.968 2.95e- 2
3 visit_c              -0.0435  0.957    0.878    1.04  3.24e- 1
4 log_base              1.02    2.78     2.28     3.39  7.56e-24
5 age_c                 0.00799 1.01     0.988    1.03  4.43e- 1
6 trtProgabide:visit_c -0.0101  0.990    0.875    1.12  8.72e- 1

Rate Ratio Interpretation

Effect Interpretation
trtProgabide Treatment effect at average visit
visit_c Time trend for placebo group
log_base Per-unit increase in log baseline rate
trt:visit_c Differential time trend for progabide vs placebo

All effects are conditional on random effects (subject-specific).

Reproducing FLW’s Exact Ch. 14 Model

FLW’s actual model uses an offset \(\log(T_{ij})\) (8 wk baseline, 2 wk follow-up), the baseline period included (\(j=0\)), and a binary Time indicator:

\[\log E(Y_{ij}\mid \mathbf{b}_i) = \log(T_{ij}) + (\beta_1 + b_{1i}) + (\beta_2 + b_{2i})\,\text{Time}_{ij} + \beta_3 \text{Trt}_i + \beta_4 \text{Trt}_i\!\times\!\text{Time}_{ij}\]

# Rebuild in FLW's layout: baseline as period 0, offset for time-at-risk, binary Time
flw <- epi_raw %>%
  pivot_longer(cols = c(base, y1, y2, y3, y4),
               names_to = "period", values_to = "count") %>%
  mutate(
    Time = ifelse(period == "base", 0, 1),          # binary baseline/follow-up
    Tij  = ifelse(period == "base", 8, 2),           # weeks at risk
    Trt  = factor(trt, levels = c(0, 1), labels = c("Placebo", "Progabide"))
  )

flw_fit <- glmer(count ~ Time * Trt + (1 + Time | id),
  data = flw, family = poisson(link = "log"), offset = log(Tij),
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

round(fixef(flw_fit), 4)        # compare to FLW Table 14.6

Note

This matches FLW Table 14.6 closely: Intercept \(\approx 1.07\), Time \(\approx 0.00\), Trt \(\approx 0.05\), Trt\(\times\)Time \(\approx -0.31\) (FLW: 1.0707, -0.0004, 0.0513, -0.3065). The intercept-slope correlation is again near zero (\(g_{12}=0.054\), LRT \(p>0.95\)).

Reproducing FLW’s Exact Ch. 14 Model

      (Intercept)              Time      TrtProgabide Time:TrtProgabide 
           1.0709           -0.0005            0.0512           -0.3062 
print(VarCorr(flw_fit))
 Groups Name        Std.Dev. Corr  
 id     (Intercept) 0.70701        
        Time        0.48161  0.160 

Outlier and Overdispersion: FLW’s Resolution

# (1) Refit excluding patient 49 (the 151/302-count outlier; FLW Table 14.7)
flw_fit49 <- glmer(count ~ Time * Trt + (1 + Time | id),
  data = subset(flw, id != 49), family = poisson(link = "log"),
  offset = log(Tij),
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

# (2) Negative binomial mixed model, excluding 49 (FLW Table 14.8); needs glmmTMB
nb_fit <- glmmTMB::glmmTMB(count ~ Time * Trt + (1 + Time || id),
  data = subset(flw, id != 49), family = glmmTMB::nbinom2,
  offset = log(Tij))

rbind(
  `Poisson, full (T14.6)`      = fixef(flw_fit),
  `Poisson, excl 49 (T14.7)`   = fixef(flw_fit49),
  `Neg. binom, excl 49 (T14.8)`= glmmTMB::fixef(nb_fit)$cond
) |> round(4)

Note

Patient 49 (151 baseline, 302 follow-up seizures) inflates the variance components. Excluding it (Table 14.7) and switching to a negative binomial mixed model (Table 14.8) is FLW’s actual overdispersion remedy. The Trt\(\times\)Time effect is stable across all three (about \(-0.31\) to \(-0.36\)), so the treatment conclusion is robust.

Outlier and Overdispersion: FLW’s Resolution

                            (Intercept)    Time TrtProgabide Time:TrtProgabide
Poisson, full (T14.6)            1.0709 -0.0005       0.0512           -0.3062
Poisson, excl 49 (T14.7)         1.0694  0.0078      -0.0080           -0.3459
Neg. binom, excl 49 (T14.8)      1.0988  0.0056       0.0034           -0.3584

Extract and Visualize Empirical Bayes Predictors

# Get random effects
re <- ranef(final_model)$id
colnames(re) <- c("Intercept", "Slope")
re$id <- rownames(re)

ggplot(re, aes(x = Intercept, y = Slope)) +
  geom_point(alpha = 0.7, size = 2) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red") +
  labs(x = "Random Intercept (log scale)",
       y = "Random Slope for Visit",
       title = "Subject-Specific Deviations from Population Mean") +
  annotate("text", x = max(re$Intercept) * 0.8, y = min(re$Slope) * 0.8,
           label = paste("Correlation:", round(cor(re$Intercept, re$Slope), 2)))

Extract and Visualize Empirical Bayes Predictors

Subject-Specific Predictions

# Predictions for each subject
epi_long$pred <- predict(final_model, type = "response")

# Plot for 9 random subjects
set.seed(42)
sample_ids <- sample(unique(epi_long$id), 9)

epi_long %>%
  filter(id %in% sample_ids) %>%
  ggplot(aes(x = visit)) +
  geom_point(aes(y = seizures), alpha = 0.7) +
  geom_line(aes(y = pred), color = "blue", linewidth = 1) +
  facet_wrap(~id, scales = "free_y", ncol = 3) +
  labs(x = "Visit", y = "Seizure Count",
       title = "Observed (points) vs Subject-Specific Predictions (lines)")

Subject-Specific Predictions

Empirical Bayes Shrinkage Illustration

# Compare observed means to predicted (shrunk) means
subject_summary <- epi_long %>%
  group_by(id, trt) %>%
  summarise(
    observed_mean = mean(seizures),
    predicted_mean = mean(pred),
    n_obs = n(),
    .groups = "drop"
  )

ggplot(subject_summary, aes(x = observed_mean, y = predicted_mean, color = trt)) +
  geom_point(alpha = 0.7, size = 2) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Observed Mean Seizures", y = "Predicted Mean (Shrunk)",
       title = "Empirical Bayes Shrinkage: Predictions vs Observed",
       subtitle = "Points below diagonal show shrinkage toward population mean") +
  coord_equal()

Note

Reading it: the dashed line is “no shrinkage” (predicted = observed). Points pulled toward the population mean (off the line, especially extreme subjects with few/noisy observations) show empirical Bayes shrinkage in action.

Empirical Bayes Shrinkage Illustration

Diagnostic: Residuals vs Fitted, Residuals by Visit

Note

Reading it. Left (residuals vs fitted): the red line at 0 is the target; a flat blue smoother and a roughly symmetric, constant-width band is good. A funnel (spread growing with the fitted value) signals overdispersion, expected here and the reason we consider negative binomial. Right (residuals by visit): each box should center on 0 with similar spread; a drifting median means the time trend is mis-specified.

Diagnostic: Random Effects Q-Q Plots

Note

Reading it. Points on the red line support normality of the random effects. Points curving up at the right are a heavy right tail (positive skew); points curving down at the left are a heavy left tail. A few extreme points at the very ends are usually fine.

Warning

Caveat (FLW p. 412): empirical BLUPs are shrunk toward a normal, so a clean Q-Q plot does not by itself confirm normality. Fixed-effect estimates are fairly robust to non-normal random effects; treat these plots as a rough screen, not a formal test.

Q-Q Plot Pattern Reference

Quick reference for random-effects Q-Q patterns:

Pattern Interpretation Concern
Points on the line Normality satisfied None
Slight S-curve Light tails Minor
Curving up at right Heavy right tail (positive skew) Moderate
Curving down at left Heavy left tail (negative skew) Moderate
Systematic departure Severe non-normality Serious

Identifying Influential Subjects

# Subjects with extreme random effects
extreme_subjects <- re %>%
  mutate(
    extreme_int = abs(Intercept) > 2 * sd(Intercept),
    extreme_slope = abs(Slope) > 2 * sd(Slope)
  ) %>%
  filter(extreme_int | extreme_slope)

cat("Subjects with extreme random effects:\n")

Identifying Influential Subjects

Subjects with extreme random effects:
print(extreme_subjects)
    Intercept       Slope id extreme_int extreme_slope
8   0.4494711 -0.22187529  8       FALSE          TRUE
10  1.0164095 -0.28675332 10        TRUE          TRUE
16 -0.9587330 -0.10099345 16        TRUE         FALSE
25  0.8700290  0.19686056 25       FALSE          TRUE
35  1.0996060 -0.03093807 35        TRUE         FALSE
49  0.9906770 -0.08582488 49        TRUE         FALSE
56  1.0846777  0.10252224 56        TRUE         FALSE
58 -0.9936508  0.01924798 58        TRUE         FALSE

Predictions at \(b_i = 0\) (re.form = NA): the Concept

Critical distinction: re.form = NA gives predictions at \(\mathbf{b}_i = 0\), which is NOT the true marginal (population-average) prediction.

Per FLW (p. 477): for non-linear links, \(E[Y \mid \mathbf{b} = 0] \neq E[Y]\).

  • re.form = NA: a hypothetical subject with \(\mathbf{b}_i = 0\) (the “typical” subject)
  • True marginal: \(E[Y] = \displaystyle\int g^{-1}(X_{ij}\beta + Z_{ij}\mathbf{b})\, \phi(\mathbf{b}; G)\, d\mathbf{b}\)

Warning

Reporting re.form = NA as a population-average mean is a common error. For a log link the true marginal is larger (Jensen). We quantify the gap two slides on.

Predictions at \(b_i = 0\): the Code

# Create prediction grid
newdata <- expand.grid(
  trt = c("Placebo", "Progabide"),
  visit_c = seq(-1.5, 1.5, length.out = 50),
  log_base = mean(epi_long$log_base),
  age_c = 0
)

# Predictions at b_i = 0 (NOT marginal/population-average!)
newdata$pred <- predict(final_model, newdata = newdata,
                        re.form = NA, type = "response")

ggplot(newdata, aes(x = visit_c + 2.5, y = pred, color = trt)) +
  geom_line(linewidth = 1.2) +
  labs(x = "Visit", y = "Predicted Seizure Rate",
       title = "Predictions at b = 0 (Conditional on Zero Random Effects)",
       subtitle = "Note: This is NOT the same as true marginal/population-average predictions",
       color = "Treatment")

Predictions at \(b_i = 0\): the Code

True Marginal Predictions Require Integration

For GLMMs with non-linear links, the true marginal mean is

\[E[Y_{ij}] = \int g^{-1}(X_{ij}\beta + Z_{ij}\mathbf{b})\, \phi(\mathbf{b}; G)\, d\mathbf{b}\]

Ways to evaluate it:

  1. Monte Carlo integration: simulate many \(\mathbf{b}\) from \(N(0, G)\), predict, average (we use this)
  2. Numerical quadrature: Gauss-Hermite approximation of the integral

Note

A closed-form attenuation factor (Zeger-Liang) exists for some links but is a Ch. 16 / Zeger, Liang & Albert (1988) result, beyond FLW Ch. 14. Monte Carlo is exact up to simulation error and needs no such formula.

Monte Carlo Marginal Predictions: the Recipe

The marginal mean is \(E[Y] = \int g^{-1}(X_{ij}\beta + Z_{ij}\mathbf{b})\,\phi(\mathbf{b};G)\,d\mathbf{b}\). Approximate it:

1. Simulate b_1, ..., b_M  ~  N(0, G_hat)         # M = 2000 draws from the fitted G
2. For each prediction point:
     eta_fixed = X*beta_hat                         # linear predictor at b = 0 (re.form = NA)
     for each draw b_m:  eta_m = eta_fixed + z' b_m  # add the random part, z = (1, visit_c)
3. Average exp(eta_m) over the M draws              # inverse link (exp) then Monte Carlo mean

Note

Assumptions: a single grouping factor and a Poisson/log link (\(g^{-1} = \exp\)). For other families swap the inverse link in step 3; for crossed/nested grouping factors the simulation is more involved.

Monte Carlo Marginal Predictions: the Function

# Monte Carlo approach for true marginal predictions with random slopes
compute_marginal_poisson <- function(fit, newdata, n_sim = 2000, seed = 667) {
  set.seed(seed)

  # --- Step 1: draw b ~ N(0, G_hat) ------------------------------------------
  # VarCorr(fit) is a LIST keyed by grouping factor; $id is the 2x2 G_hat
  # (assumes a SINGLE grouping factor named "id")
  G_hat <- as.matrix(VarCorr(fit)$id)
  b_sim <- MASS::mvrnorm(n_sim, mu = c(0, 0), Sigma = G_hat)   # n_sim x 2

  # Fixed-effect linear predictor at b = 0 (the X*beta term), one per grid point
  eta_fixed <- predict(fit, newdata = newdata, re.form = NA, type = "link")

  # --- Step 2 & 3: add z'b per draw, inverse-link, then average --------------
  marginal_preds <- sapply(seq_along(eta_fixed), function(i) {
    z_i <- c(1, newdata$visit_c[i])     # random-effects design row z = (intercept, slope)
    eta_with_re <- eta_fixed[i] + b_sim %*% z_i   # Step 2: X*beta + z'b for all draws
    mean(exp(eta_with_re))              # Step 3: exp() is g^{-1}; mean() is the MC integral
  })

  return(marginal_preds)
}

# Apply to prediction grid
newdata_marginal <- expand.grid(
  trt = c("Placebo", "Progabide"),
  visit_c = c(-1.5, 0, 1.5),
  log_base = mean(epi_long$log_base),
  age_c = 0
)

newdata_marginal$pred_b0 <- predict(final_model, newdata = newdata_marginal,
                                     re.form = NA, type = "response")
newdata_marginal$pred_marginal <- compute_marginal_poisson(final_model, newdata_marginal)

# Compare predictions
contrast_tbl <- newdata_marginal %>%
  mutate(visit = visit_c + 2.5,
         difference = pred_marginal - pred_b0,
         pct_diff = round(100 * difference / pred_b0, 1)) %>%
  dplyr::select(trt, visit, pred_b0, pred_marginal, pct_diff)
contrast_tbl

Monte Carlo Marginal Predictions: the Function

        trt visit  pred_b0 pred_marginal pct_diff
1   Placebo   1.0 6.379507      7.484249     17.3
2 Progabide   1.0 4.676739      5.486612     17.3
3   Placebo   2.5 5.976849      6.825309     14.2
4 Progabide   2.5 4.315537      4.928161     14.2
5   Placebo   4.0 5.599607      6.517867     16.4
6 Progabide   4.0 3.982233      4.635265     16.4

Conditional vs Marginal: the Number

For the epilepsy fit, the true marginal (population-average) seizure rate is about 16% higher than the conditional (\(b=0\)) prediction. For the placebo group at the average visit: conditional rate 5.98, marginal rate 6.83 (a 14.2% gap).

Note

This is FLW’s Ch. 14 signature point (Table 14.1 illustration: the population-average effect is roughly 15-25% smaller/larger than the subject-specific one). Here the gap is driven by the random-intercept variance through Jensen (\(E[\exp(\cdot)] > \exp(E[\cdot])\)). The treatment rate ratio is nearly unchanged, because a shared-scale random intercept multiplies both arms equally; the gap shows up in the level, not the ratio.

Interpreting the Marginal vs b=0 Difference

# Visualize the difference
newdata_plot <- expand.grid(
  trt = c("Placebo", "Progabide"),
  visit_c = seq(-1.5, 1.5, length.out = 20),
  log_base = mean(epi_long$log_base),
  age_c = 0
)

newdata_plot$pred_b0 <- predict(final_model, newdata = newdata_plot,
                                 re.form = NA, type = "response")
newdata_plot$pred_marginal <- compute_marginal_poisson(final_model, newdata_plot)

newdata_plot %>%
  pivot_longer(cols = c(pred_b0, pred_marginal),
               names_to = "type", values_to = "pred") %>%
  mutate(type = recode(type,
                       pred_b0 = "At b=0 (re.form=NA)",
                       pred_marginal = "True Marginal (MC)"),
         visit = visit_c + 2.5) %>%
  ggplot(aes(x = visit, y = pred, color = trt, linetype = type)) +
  geom_line(linewidth = 1) +
  labs(x = "Visit", y = "Predicted Seizure Rate",
       title = "True Marginal vs b=0 Predictions",
       subtitle = "True marginal predictions are higher due to Jensen's inequality (log link)",
       color = "Treatment", linetype = "Prediction Type") +
  theme(legend.position = "bottom")

Key insight: For log link, true marginal > b=0 prediction because \(E[\exp(X)] > \exp(E[X])\) (Jensen).

Interpreting the Marginal vs b=0 Difference

Uncorrelated Random Effects (Alternative)

# Sometimes useful when correlation is near-zero or causing convergence issues
fit_uncorr <- glmer(
  seizures ~ trt * visit_c + log_base + age_c +
    (1 | id) + (0 + visit_c | id),  # Uncorrelated
  data = epi_long,
  family = poisson(link = "log"),
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5))
)

# Compare AIC
cat("Correlated RE AIC:", round(AIC(final_model), 1), "\n")

Uncorrelated Random Effects (Alternative)

Correlated RE AIC: 1333.8 
cat("Uncorrelated RE AIC:", round(AIC(fit_uncorr), 1), "\n")
Uncorrelated RE AIC: 1331.9 

Why Use Uncorrelated Random Effects?

Situation Recommendation
Correlation near 0 Uncorrelated may be sufficient
Convergence issues Uncorrelated often more stable
Few observations per subject Correlation hard to estimate
Scientific interest in correlation Keep correlated structure

Understanding the Uncorrelated Syntax

Syntax: (1 | id) + (0 + visit_c | id)

Let us break this down:

  • (1 | id) - random intercept for each subject
  • (0 + visit_c | id) - random slope for visit_c, without an intercept in this term

Why 0 +? The 0 + removes the implicit intercept from the second random effects term. Without it, (visit_c | id) would estimate both an intercept AND a slope, leading to redundancy.

The key insight: When you specify two separate random effects terms for the same grouping factor, lme4 estimates them independently (i.e., assumes zero correlation).

Comparison:

  • (1 + visit_c | id) - estimates \(\sigma_0^2\), \(\sigma_1^2\), and \(\rho_{01}\) (3 parameters)
  • (1 | id) + (0 + visit_c | id) - estimates \(\sigma_0^2\) and \(\sigma_1^2\) only (2 parameters, \(\rho_{01} = 0\))

Convergence Troubleshooting Summary

Issue Solution
Model fails to converge Center continuous predictors
Singular fit (variance = 0) Drop the problematic random effect
Near-singular Hessian Simplify random effects structure
Very slow fitting Reduce nAGQ or use Laplace

SAS Equivalent (reference only)

FLW Tables 14.2/14.6 use PROC GLIMMIX with adaptive quadrature. The standalone program for our random-slope Poisson GLMM:

proc glimmix data=epilepsy method=quad(qpoints=50);
  class id;
  model seizures = trt|visit_c log_base age_c
        / dist=poisson link=log solution;
  random intercept visit_c / subject=id type=un;
run;

Note

trt|visit_c expands to the main effects and their interaction. type=un gives the unstructured \(2\times2\) \(G\) (the correlated form); use type=vc for the uncorrelated version. method=quad(qpoints=50) requests 50-point adaptive Gaussian quadrature, matching FLW; /solution prints the fixed-effect estimates.

Practical Workflow for Random Slope GLMMs

  1. Start with the random intercept model
  2. Add random slopes one at a time
  3. Check convergence and variance estimates (watch for singular fits)
  4. Justify complexity with LRT (boundary mixture) or AIC
  5. Check overdispersion (counts); consider negative binomial
  6. Examine empirical Bayes predictor distributions
  7. Run diagnostics on residuals and random effects
  8. Report both conditional and marginal effects

Summary

  • Random slopes allow subject-specific covariate effects
  • The correlation \(\rho_{01}\) between intercepts and slopes is often interpretable
  • Empirical Bayes predictors provide shrinkage-based predictions for individual subjects
  • re.form = NA gives predictions at \(b_i = 0\), not true marginal predictions
  • Computation is harder: Laplace for speed, AGQ for accuracy
  • Model selection balances fit (AIC/LRT) with interpretability
  • Always check for overdispersion in count data

Key Equations

Random Slope GLMM:

\[g(\mu_{ij}) = X_{ij}\beta + Z_{ij} b_i, \quad b_i \sim N_q(0, G)\]

Variance-Covariance Matrix:

\[G = \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\]

Empirical Bayes Predictor: \(\hat{\mathbf{b}}_i = E[\mathbf{b}_i \mid \mathbf{Y}_i, \hat{\boldsymbol\beta}, \hat{G}]\)

Reporting Random Slope Models

When reporting, include:

  1. Fixed effects with CIs (state: conditional on random effects)
  2. Random effects variance components (\(\sigma_0^2\), \(\sigma_1^2\)) on an interpretable scale
  3. Intercept-slope correlation and whether it is distinguishable from zero
  4. Model comparison (AIC; LRT with the boundary mixture for variance components)
  5. Estimation method (Laplace vs AGQ, number of quadrature points)
  6. Software and version

Common Mistakes in Random Slope GLMMs

Mistake Why It Is Wrong Correction
Adding random slopes with too few observations per subject Cannot reliably estimate slope variance with \(n_i < 5\) Use random intercept only, or collect more data
Ignoring singular fit warnings Variance estimate at boundary (0) indicates overfitting Simplify random effects structure
Using one combined random-slope term when you want uncorrelated REs A single combined term estimates the intercept-slope correlation \(\rho_{01}\) Use the two-separate-terms (uncorrelated) form shown earlier to force \(\rho_{01}=0\)
Not centering time variables Can cause convergence issues and make intercept hard to interpret Center time so 0 is meaningful
Forgetting that re.form=NA is NOT marginal For Poisson/logistic, \(E[Y \mid b=0] \neq E[Y]\) Use Monte Carlo integration for true marginal
Using Laplace when nAGQ would help Laplace can be biased with small clusters or random slopes Increase nAGQ to 5-10 for random slope models (lme4: only for q=1)
Halving the p-value for a random-slope LRT Adding a slope drops TWO parameters (variance + covariance) Use the Stram-Lee mixture \(\tfrac12\chi^2_1+\tfrac12\chi^2_2\), or a parametric bootstrap

Check Your Understanding: Model Selection

Question: You fit two models to seizure count data:

  • Model A: Random intercept only (AIC = 1245)
  • Model B: Random intercept + slope for time (AIC = 1238)

anova(A, B) reports \(\chi^2 = 9.0\) on 2 df (\(p = 0.011\)). Model B shows a singular fit with \(\hat{\sigma}_1^2 = 0.001\).

Which model should you report, and what is the correct boundary correction?

Answer: Report Model A (random intercept only).

  1. Going A \(\to\) B adds two parameters (\(\sigma_1^2\) and \(\sigma_{01}\)), so the boundary reference is the Stram-Lee mixture \(\tfrac12\chi^2_1 + \tfrac12\chi^2_2\), not “halve the p-value.” Under the mixture, \(p = 0.5\,P(\chi^2_1 > 9) + 0.5\,P(\chi^2_2 > 9) \approx 0.007\), still nominally significant.
  2. But the variance estimate is essentially zero (\(\hat{\sigma}_1^2 = 0.001\), singular fit): the random slope buys nothing.
  3. Rule: when a variance component is at the boundary, prefer the simpler model regardless of the test. A “significant” LRT against a near-zero variance is an artifact of the boundary, not evidence the slope is real.

For Next Time

Topics:

  • Transition models for longitudinal data
  • Markov structures for discrete outcomes
  • Comparison of marginal, mixed, and transition approaches

Reading:

  • Fitzmaurice et al. Chapter 16

Further Reading

  • Fitzmaurice, Laird & Ware (2011), Applied Longitudinal Analysis, Ch. 14 (GLMMs; this lecture), Sec. 14.7 case studies (Tables 14.5-14.8). Ch. 15 for PQL/MQL approximate estimation; Ch. 16 for marginal-vs-mixed coefficient comparison.
  • Breslow & Clayton (1993), “Approximate inference in generalized linear mixed models,” JASA 88, 9-25 (PQL; the epilepsy parameterization).
  • Thall & Vail (1990), Biometrics 46, 657-671 (the seizure data).
  • Zeger, Liang & Albert (1988), Biometrics 44, 1049-1060 (marginal-vs-conditional attenuation; beyond Ch. 14).
  • Stram & Lee (1994), Biometrics 50, 1171-1177 (boundary mixture for variance-component LRTs).
  • R: lme4 (Bates et al. 2015, JSS), glmmTMB (Brooks et al. 2017) for negative binomial mixed models.

Session Info

sessionInfo()

Session Info

R version 4.6.0 (2026-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] patchwork_1.3.2     broom.mixed_0.2.9.7 lme4_2.0-1         
[4] Matrix_1.7-5        ggplot2_4.0.3       tidyr_1.3.2        
[7] dplyr_1.2.1        

loaded via a namespace (and not attached):
 [1] gtable_0.3.6        TMB_1.9.21          xfun_0.57          
 [4] lattice_0.22-9      numDeriv_2016.8-1.1 vctrs_0.7.3        
 [7] tools_4.6.0         Rdpack_2.6.6        generics_0.1.4     
[10] stats4_4.6.0        parallel_4.6.0      sandwich_3.1-1     
[13] tibble_3.3.1        pkgconfig_2.0.3     RColorBrewer_1.1-3 
[16] S7_0.2.2            lifecycle_1.0.5     compiler_4.6.0     
[19] farver_2.1.2        codetools_0.2-20    htmltools_0.5.9    
[22] yaml_2.3.12         pillar_1.11.1       furrr_0.4.0        
[25] nloptr_2.2.1        MASS_7.3-65         reformulas_0.4.4   
[28] multcomp_1.4-30     boot_1.3-32         nlme_3.1-169       
[31] parallelly_1.47.0   tidyselect_1.2.1    digest_0.6.39      
[34] mvtnorm_1.3-7       future_1.70.0       purrr_1.2.2        
[37] listenv_0.10.1      labeling_0.4.3      forcats_1.0.1      
[40] splines_4.6.0       fastmap_1.2.0       grid_4.6.0         
[43] cli_3.6.6           magrittr_2.0.5      survival_3.8-6     
[46] utf8_1.2.6          TH.data_1.1-5       broom_1.0.12       
[49] withr_3.0.2         scales_1.4.0        backports_1.5.1    
[52] estimability_1.5.1  rmarkdown_2.31      emmeans_2.0.3      
[55] globals_0.19.1      otel_0.2.0          zoo_1.8-15         
[58] coda_0.19-4.1       evaluate_1.0.5      knitr_1.51         
[61] glmmTMB_1.1.14      rbibutils_2.4.1     mgcv_1.9-4         
[64] rlang_1.2.0         Rcpp_1.1.1-1.1      xtable_1.8-8       
[67] glue_1.8.1          minqa_1.2.8         jsonlite_2.0.0     
[70] R6_2.6.1