BIOS 667 - Lecture 8: Mixed-Effects Models (Ch. 8)

Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis

Naim Rashid

Lecture Objectives

By the end of this lecture, you will be able to:

  • Distinguish between marginal (GEE) and subject-specific (mixed-effects) approaches
  • Formulate random-intercept and random-slope models for longitudinal data
  • Understand how random effects induce covariance structures (CS, nonstationary)
  • Apply shrinkage/EBLUP concepts for subject-specific predictions
  • Choose between ML and REML for appropriate model comparisons

Note

Reading. FLW Ch. 8 (8.1-8.10). Assessed by: HW3 (LME with CS/AR(1)/UN covariance; marginal vs conditional interpretation) and the longitudinal-modeling component of the final project.

Roadmap

Section Topics
Foundations Why mixed-effects? RI vs RS models
Estimation ML/REML, BLUPs, shrinkage
TLC Application Data wrangling, model fitting, diagnostics
Model Selection Comparing structures, boundary tests
Advanced Topics Heterogeneous variances, AR(1), splines
Summary Reporting checklist, practical guidance

Note

Pacing (2 class periods). Day 1 (Part I): foundations through BLUPs and shrinkage, plus the core RI/RS fits on the TLC data. Day 2 (Part II): heterogeneous variances, AR(1), diagnostics, testing, and practical guidance.

Prerequisites / recall

Before this lecture you should be able to:

  • Describe the CS, AR(1), and UN covariance structures and what each assumes (Ch. 7).
  • Interpret a marginal (population-average) regression coefficient.
  • Fit a basic linear model in R and read its summary() output.

Notation reference

Symbol Meaning
\(Y_{ij}\) response for subject \(i\) at occasion \(j\) (scalar)
\(\mathbf{Y}_i\) response vector for subject \(i\) \((n_i \times 1)\)
\(\mathbf{X}_i,\ \boldsymbol\beta\) fixed-effect design \(n_i \times p\) and coefficients
\(\mathbf{Z}_i,\ \mathbf{b}_i\) random-effect design and random effects, \(\mathbf{b}_i \sim N(0, G)\)
\(G\) between-subject random-effects covariance (FLW’s letter, Ch. 8)
\(R_i\) within-subject residual covariance, \(\boldsymbol\varepsilon_i \sim N(0, R_i)\)
\(\Sigma_i = \mathbf{Z}_i G \mathbf{Z}_i^\top + R_i\) marginal (total) covariance \(\operatorname{Cov}(\mathbf{Y}_i)\) for subject \(i\)
\(\rho\) correlation parameter (AR(1), exchangeable)

Indices: \(i\) subject \(1,\dots,N\); \(j\) occasion/time \(1,\dots,n_i\) (\(n\) when balanced); \(k\) state (transition models); \(g\) group. Bold = vector or matrix.

Part I (Day 1): Models, Fitting, and Interpretation

Why Mixed-Effects?

Longitudinal data: same person measured repeatedly leads to correlated outcomes.

Mixed-effects = fixed effects (population-average) + random effects (person-specific deviations).

What Mixed-Effects Captures

Feature Description
Between-subject heterogeneity Different baselines/slopes across individuals
Within-subject correlation Implied by shared random effects
Unbalanced designs Handles varying numbers of observations per subject
Mistimed visits Continuous time naturally accommodated

Subject-Specific vs Marginal Views

“Marginal” vs “subject-specific” is a contrast of estimands, not methods. The LME delivers both:

Estimand Quantity From the LME
Subject-specific \(E(Y_i \mid b_i) = X_i\boldsymbol\beta + Z_i b_i\) Conditional on the subject’s \(b_i\)
Marginal \(E(Y_i) = X_i\boldsymbol\beta\), \(\operatorname{Cov}(Y_i) = Z_i G Z_i^\top + \sigma^2 I\) Averaging over \(b_i\) (FLW pp. 190-191)

Averaging the subject-specific mean over \(b_i \sim N(0, G)\) gives the marginal mean \(X_i\boldsymbol\beta\), so the LME also has a marginal interpretation. GEE is one way to target that marginal mean directly (with robust SEs); the LME targets it while also predicting subject trajectories and leveraging all available data under MAR.

Model Building Blocks

Let \(Y_{ij}\) be the response for subject \(i\) at time \(t_{ij}\).

Random-Intercept (RI) Model

\[ Y_{ij} = \beta_0 + X_{ij}\boldsymbol\beta + b_{0i} + \varepsilon_{ij} \]

where:

  • \(b_{0i} \sim N(0, \sigma_b^2)\) is the subject-specific intercept
  • \(\varepsilon_{ij} \sim N(0, \sigma^2)\) is the residual error

Random-Slope (RS) Model

\[ Y_{ij} = \beta_0 + \beta_1 t_{ij} + X_{ij}\boldsymbol\beta + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij} \]

where the random effects follow:

\[ \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N\!\left(\mathbf{0}, G\right), \quad G = \begin{pmatrix} \sigma_{b0}^2 & \sigma_{b01} \\ \sigma_{b01} & \sigma_{b1}^2 \end{pmatrix} \]

Implied Covariance (RI Model)

For the random-intercept model:

\[ \mathrm{Var}(Y_{ij}) = \sigma_b^2 + \sigma^2 \]

The shared random intercept \(b_{0i}\) is the only thing two observations on subject \(i\) have in common, so the off-diagonal covariance is

\[ \mathrm{Cov}(Y_{ij}, Y_{ik}) = \sigma_b^2 \quad (j \ne k) \]

Dividing by the variance gives the correlation, which is the same for every pair of times:

\[ \mathrm{Corr}(Y_{ij}, Y_{ik}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2} \quad \text{(Compound Symmetry)} \]

Implied Covariance (RS Model)

For the random-slope model the implied variance changes with time (nonstationary):

\[ \mathrm{Var}(Y_{ij}) = \sigma_{b0}^2 + 2\, t_{ij}\, \sigma_{b01} + t_{ij}^2\, \sigma_{b1}^2 + \sigma^2 \]

This is a quadratic in \(t_{ij}\): trajectories fan out (or in) over time, unlike the constant variance of compound symmetry. This is the nonstationary, “linear-in-time” covariance shown in the RI+RS row of the Link to Chapter 7 table below (it is introduced here, not in Ch. 7).

RI vs RS: When and Why

Model Use When
RI Subjects have different baselines but common trajectory shape
RS Subjects have different rates of change; trajectories fan in/out

If slopes vary across persons (common in biology), RS is often essential.

Two-Stage Random Effects Formulation (FLW 8.4)

FLW motivate the LME model in two stages, which makes clear where the random effects come from:

  • Stage 1 (within-subject): \(\mathbf{Y}_i = \mathbf{Z}_i \boldsymbol\beta_i + \boldsymbol\varepsilon_i\), with \(\boldsymbol\varepsilon_i \sim N(0, R_i)\). Each subject follows their own regression with coefficients \(\boldsymbol\beta_i\).
  • Stage 2 (between-subject): \(\boldsymbol\beta_i = \mathbf{A}_i \boldsymbol\beta + \mathbf{b}_i\), with \(\mathbf{b}_i \sim N(0, G)\). Here \(\mathbf{A}_i\) is a known between-subject design matrix linking each subject’s coefficients to the population means \(\boldsymbol\beta\) (often an identity, or a map of subject-level covariates). The subject-specific coefficients vary around \(\boldsymbol\beta\).

Substituting Stage 2 into Stage 1 gives the familiar single-equation form

\[ \mathbf{Y}_i = \mathbf{X}_i \boldsymbol\beta + \mathbf{Z}_i \mathbf{b}_i + \boldsymbol\varepsilon_i, \qquad \Sigma_i = \mathbf{Z}_i G \mathbf{Z}_i^\top + R_i . \]

The random-intercept and random-slope models are special cases: the two-stage view simply makes explicit that \(G\) is the between-subject covariance of the \(\mathbf{b}_i\).

Check Your Understanding: Random Effects Basics

  1. In the random-intercept model, what is the implied correlation between two observations \(Y_{ij}\) and \(Y_{ik}\) from the same subject?

  2. When would you choose a random-slope model over a random-intercept model?

  3. A random-intercept model is equivalent to which covariance pattern from Chapter 7?

Answers:

  1. \(\text{Corr}(Y_{ij}, Y_{ik}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}\). This is the intraclass correlation (ICC), representing the proportion of total variance attributable to between-subject differences.

  2. When you expect subjects to have different rates of change over time, not just different starting points. Visual signs include trajectories that “fan out” or cross over time in spaghetti plots.

  3. Compound Symmetry (CS). The random intercept induces constant correlation between any two time points for the same subject.

Estimation Overview

Method Use For
ML Comparing different fixed effects
REML Comparing different random-effects structures (same fixed effects)

Mixed-model equations (Henderson) yield BLUPs of random effects; with unknown variance components they become EBLUPs (empirical BLUPs, the FLW term for predicted random effects: BLUPs computed using the estimated variance components rather than known ones).

Shrinkage Intuition (EBLUP)

Borrowing strength: subject-specific intercepts/slopes are pulled toward population means.

Shrinkage is stronger when:

  • Subject has few measurements
  • Measurement error \(\sigma^2\) is large
  • Random-effect variance is small

This prevents overfitting noisy within-subject patterns.

TLC Data Setup

library(dplyr); library(tidyr); library(nlme); library(ggplot2); library(emmeans)

path <- "../../data/tlc.csv"
tlc_wide <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
names(tlc_wide) <- trimws(names(tlc_wide))

tlc <- tlc_wide %>%
  rename(
    id  = ID,
    trt = `Treatment Group`,
    w0  = `Lead Level Week 0`,
    w1  = `Lead Level Week 1`,
    w4  = `Lead Level Week 4`,
    w6  = `Lead Level Week 6`
  ) %>%
  mutate(trt = factor(trt, levels = c("P","A"), labels = c("Placebo","Succimer"))) %>%
  pivot_longer(c(w0,w1,w4,w6), names_to = "wk", values_to = "y") %>%
  mutate(
    time     = recode(wk, w0="0", w1="1", w4="4", w6="6"),
    time_f   = factor(time, levels = c("0","1","4","6")),
    time_num = as.numeric(as.character(time)),
    id       = factor(id),
    group    = trt
  ) %>%
  select(id, group, time_f, time_num, y) %>%
  arrange(id, time_num)

stopifnot(all(!is.na(tlc$y)))
dplyr::count(tlc, id)

TLC Data Setup

# A tibble: 100 × 2
   id        n
   <fct> <int>
 1 1         4
 2 2         4
 3 3         4
 4 4         4
 5 5         4
 6 6         4
 7 7         4
 8 8         4
 9 9         4
10 10        4
# ℹ 90 more rows

TLC Visuals: Spaghetti Plot

ggplot(tlc, aes(time_num, y, group = id, color = group)) +
  geom_line(alpha = 0.25) +
  stat_summary(aes(group = group), fun = mean, geom = "line", linewidth = 1.1) +
  labs(title = "Lead levels over time (TLC)", x = "Week", y = "Lead") +
  theme_minimal() + guides(color = guide_legend(title = "Group"))

TLC Visuals: Spaghetti Plot

Random-Intercept Model (RI)

lme_ri <- lme(y ~ group * time_f, random = ~ 1 | id, data = tlc, method = "REML")
summary(lme_ri)$tTable[1:8, , drop = FALSE]

Random-Intercept Model (RI)

                        Value Std.Error  DF    t-value      p-value
(Intercept)            26.272 0.9370175 294 28.0378980 4.595088e-85
groupSuccimer           0.268 1.3251428  98  0.2022424 8.401465e-01
time_f1                -1.612 0.8428574 294 -1.9125418 5.677827e-02
time_f4                -2.202 0.8428574 294 -2.6125416 9.449112e-03
time_f6                -2.626 0.8428574 294 -3.1155923 2.017079e-03
groupSuccimer:time_f1 -11.406 1.1919804 294 -9.5689496 4.631638e-19
groupSuccimer:time_f4  -8.824 1.1919804 294 -7.4028065 1.411814e-12
groupSuccimer:time_f6  -3.152 1.1919804 294 -2.6443389 8.624648e-03
VarCorr(lme_ri)
id = pdLogChol(1) 
            Variance StdDev  
(Intercept) 26.13987 5.112717
Residual    17.76021 4.214287

Reading VarCorr Output: Random-Intercept Model

How to Interpret VarCorr for RI Model

The VarCorr() output shows:

Row Meaning Interpretation
(Intercept) under id \(\sigma_b\) (StdDev) and \(\sigma_b^2\) (Variance) Between-subject SD: How much subjects differ at baseline
Residual \(\sigma\) (StdDev) and \(\sigma^2\) (Variance) Within-subject SD: Variability around each subject’s trajectory

Derived quantities:

  • ICC (intraclass correlation) = \(\frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}\) = proportion of variance due to between-subject differences
  • Total variance = \(\sigma_b^2 + \sigma^2\)

Example interpretation: If VarCorr shows (Intercept) StdDev = 8.5 and Residual StdDev = 5.2, then ICC = \(8.5^2/(8.5^2 + 5.2^2) = 72.25/99.29 \approx 0.73\). About 73% of the total variance is between subjects.

RI Model Specification

  • Fixed: group x time (factor) = maximal mean for covariance selection
  • Random: intercept only (\(b_{0i}\))

Random-Slope Model (RS)

A random slope needs numeric time (time_num) so each subject has a single slope coefficient. (We keep the factor time_f for the population mean in the next model; the consequence of mixing factor and numeric time for model comparison shows up on the AIC slide.)

ctrl <- lmeControl(
  niterEM     = 200,
  msMaxIter   = 200,
  msMaxEval   = 400,
  tolerance   = 1e-6,
  returnObject= TRUE
)

lme_rs <- lme(y ~ group * time_num, random = ~ time_num | id, data = tlc, method = "REML", control = ctrl)
summary(lme_rs)$tTable

Random coefficients allow individual-specific baseline and slope.

Note

ctrl (defined above via lmeControl) just relaxes nlme’s convergence settings and is reused via control = ctrl in the later fits. If a fit warns about convergence this usually helps. Note that returnObject = TRUE returns the fit even if it did not fully converge, so always confirm the fit converged before trusting it.

Random-Slope Model (RS)

                             Value Std.Error  DF    t-value      p-value
(Intercept)            25.68536264 0.8268135 298 31.0654847 1.854745e-95
groupSuccimer          -5.41872527 1.1692909  98 -4.6341978 1.104758e-05
time_num               -0.37213187 0.1766540 298 -2.1065573 3.599182e-02
groupSuccimer:time_num -0.05773626 0.2498265 298 -0.2311054 8.173915e-01
VarCorr(lme_rs)
id = pdLogChol(time_num) 
            Variance   StdDev    Corr  
(Intercept) 14.8548089 3.8541937 (Intr)
time_num     0.1017498 0.3189825 1     
Residual    33.1827562 5.7604476       

Reading VarCorr Output: Random-Slope Model

How to Interpret VarCorr for RS Model

The VarCorr() output for a random-slope model shows:

Row Meaning Interpretation
(Intercept) under id \(\sigma_{b0}\) Between-subject SD in intercepts: How much starting points vary
time_num under id \(\sigma_{b1}\) Between-subject SD in slopes: How much rates of change vary
Corr \(\rho_{01}\) Intercept-slope correlation: Do high-baseline subjects change faster or slower?
Residual \(\sigma\) Within-subject residual SD: Unexplained variability

Interpreting the correlation:

  • \(\rho_{01} > 0\): Subjects with higher intercepts tend to have steeper slopes
  • \(\rho_{01} < 0\): Subjects with higher intercepts tend to have flatter (or declining) slopes
  • \(\rho_{01} \approx 0\): Baseline and rate of change are unrelated

Note: The intercept-slope correlation is easier to interpret when time is centered (e.g., at baseline or mid-study).

RI vs RS: AIC/REML Comparison

data.frame(
  model = c("RI (time_f)", "RS (time_num)"),
  AIC   = c(AIC(lme_ri), AIC(lme_rs)),
  logLik= c(logLik(lme_ri), logLik(lme_rs))
)

Not strictly comparable if fixed effects differ (time factor vs numeric): REML AIC compares random structures only when the fixed effects match (see Estimation Overview).

RI vs RS: AIC/REML Comparison

          model      AIC    logLik
1   RI (time_f) 2480.621 -1230.311
2 RS (time_num) 2681.281 -1332.641

Factor-Time Fixed Effects with RS Random

Strategy: keep time as factor in fixed effects (rich mean), use time_num in random part to permit RS.

lme_rifs <- lme(y ~ group * time_f, random = ~ time_num | id, data = tlc, method = "REML", control = ctrl)
data.frame(
  model = c("RI", "RI+RS"),
  AIC   = c(AIC(lme_ri), AIC(lme_rifs)),
  logLik= c(logLik(lme_ri), logLik(lme_rifs))
)

Factor-Time Fixed Effects with RS Random

  model      AIC    logLik
1    RI 2480.621 -1230.311
2 RI+RS 2471.239 -1223.620

Interpreting Fixed Effects with RS

Component Interpretation
Fixed effects Summarize average trajectory
Random effects Capture individual departures
Interaction group x time Difference-in-change averaged across subjects

EMMs for Contrasts at Planned Times

emm_ri <- emmeans(lme_ri, ~ group | time_f)
pairs(emm_ri)

Same use of emmeans as in Ch. 7; SEs now reflect random-effects structure.

EMMs for Contrasts at Planned Times

time_f = 0:
 contrast           estimate   SE df t.ratio p.value
 Placebo - Succimer   -0.268 1.33 98  -0.202  0.8401

time_f = 1:
 contrast           estimate   SE df t.ratio p.value
 Placebo - Succimer   11.138 1.33 98   8.405 <0.0001

time_f = 4:
 contrast           estimate   SE df t.ratio p.value
 Placebo - Succimer    8.556 1.33 98   6.457 <0.0001

time_f = 6:
 contrast           estimate   SE df t.ratio p.value
 Placebo - Succimer    2.884 1.33 98   2.176  0.0319

Degrees-of-freedom method: containment 

BLUPs: Subject-Specific Intercepts/Slopes

re <- ranef(lme_rifs)
head(re)

So what? A wide spread of these intercept BLUPs \(b_{0i}\) means substantial between-subject variation in baseline, consistent with a large \(\sigma_{b0}\) in VarCorr: the histogram is the picture of that variance component. (The QQ-plot slide recomputes re from the same fit.)

BLUPs: Subject-Specific Intercepts/Slopes

  (Intercept)    time_num
1   1.3275459  0.10983619
2   0.9802875  0.08112388
3   2.5109979  0.20778267
4  -0.8697587 -0.07197501
5  -7.1566619 -0.59223765
6  -5.9970622 -0.49627353
re_df <- as.data.frame(re)
ggplot(re_df, aes(x = `(Intercept)`)) + geom_histogram(bins = 20) +
  labs(title = "BLUPs of random intercepts", x = expression(b[0*i]), y = "Count") +
  theme_minimal()

Part II (Day 2): Extensions, Diagnostics, and Practice

Heterogeneous Residual Variances + RI/RS

Mixed models can combine random effects + a within-subject residual covariance \(R_i\):

  • Heterogeneous residual SDs via weights = varIdent(~ 1 | time_f) (a separate residual SD per visit, instead of one constant \(\sigma\))
  • Additional within-subject correlation via corAR1(~ time_num | id) (residual AR(1): correlation that decays with the time gap) if needed
lme_rifs_het <- lme(y ~ group * time_f, random = ~ time_num | id,
                    weights = varIdent(~ 1 | time_f),
                    data = tlc, method = "REML", control = ctrl)
AIC(lme_rifs, lme_rifs_het)

Heterogeneous Residual Variances + RI/RS

             df      AIC
lme_rifs     12 2471.239
lme_rifs_het 12 2471.239

Adding Residual AR(1) to RI/RS

lme_rifs_ar1 <- lme(y ~ group * time_f, random = ~ time_num | id,
                    correlation = corAR1(form = ~ time_num | id),
                    data = tlc, method = "REML", control = ctrl)
data.frame(model = c("RI+RS", "RI+RS + AR(1)"),
           AIC   = c(AIC(lme_rifs), AIC(lme_rifs_ar1)))

Only add AR(1) if random effects do not remove residual serial correlation.

Adding Residual AR(1) to RI/RS

          model      AIC
1         RI+RS 2471.239
2 RI+RS + AR(1) 2473.233

Diagnostics: Residuals vs Fitted

res <- resid(lme_rifs, type = "normalized")
fit <- fitted(lme_rifs)

ggplot(data.frame(fit, res), aes(fit, res)) +
  geom_point(alpha = 0.5) + geom_hline(yintercept = 0, linewidth = 0.3) +
  labs(title = "Residuals vs fitted (normalized)", x = "Fitted", y = "Residual") +
  theme_minimal()

Normalized residuals are Cholesky-standardized: they strip out the modeled within-subject correlation, which is why we check these (not raw residuals) for a correlated-data model.

Diagnostics: Residuals vs Fitted

Diagnostics: ACF of Residuals

acf_out <- ACF(lme_rifs, resType = "normalized")
ggplot(acf_out, aes(lag, ACF)) + geom_col(width = 0.1) +
  geom_hline(yintercept = 0, linewidth = 0.3) +
  labs(title = "ACF of normalized residuals", x = "Lag", y = "ACF",
       subtitle = "Reading rule: bars within the band (flat) = good; slow decay = leftover serial correlation") +
  theme_minimal()

Diagnostics: ACF of Residuals

Diagnostics: Semivariogram

vg <- Variogram(lme_rifs, form = ~ time_num | id, resType = "normalized")
ggplot(vg, aes(dist, variog)) + geom_point() +
  geom_smooth(se = FALSE, span = 0.9) +
  geom_hline(yintercept = 1, linetype = 3) +
  labs(title = "Semivariogram of normalized residuals", x = "Lag (weeks apart)", y = expression(hat(gamma)),
       subtitle = "Reading rule: flat near 1 = correlation captured; rising from a low value = consider AR(1)") +
  theme_minimal()

Diagnostics: Semivariogram

Diagnostics: Random-Effects QQ Plot

re <- ranef(lme_rifs)
qq <- as.data.frame(re)
ggplot(qq, aes(sample = `(Intercept)`)) +
  stat_qq() + stat_qq_line() +
  labs(title = "QQ plot: predicted random intercepts (BLUPs)", x = "Theoretical", y = "Sample") +
  theme_minimal()

Do NOT use this to test random-effects normality

The predicted random effects (empirical BLUPs) are shrunk toward 0, so their distribution understates the true random-effects variance and is not a valid check of the \(N(0, G)\) assumption (FLW p. 273). Use the BLUP QQ plot only to spot individual subjects with unusual profiles (outliers/influence), never as a normality test.

Diagnostics: Random-Effects QQ Plot

Testing Fixed Effects

Test Type Method
Wald tests (z/F) Large-sample z (FLW); KR/Satterthwaite df is a modern add-on
Likelihood-ratio tests Between nested fixed-effects models (requires ML fits)

FLW Ch. 8 uses large-sample inference: Wald z-statistics and LRTs without a small-sample df adjustment (Tables 8.5 and 8.12 report z-statistics).

Warning

Beyond FLW (standard modern add-on). Kenward-Roger and Satterthwaite df correct Wald tests in small samples but are not part of FLW Ch. 8. nlme::lme (used here) provides neither: it reports containment df. For KR/Satterthwaite, fit with lme4::lmer and use lmerTest (Satterthwaite) or pbkrtest (Kenward-Roger). The LRT route below works directly in nlme once both models are refit with ML.

lme_rifs_ML <- update(lme_rifs, method = "ML")
lme_rifs_noInt_ML <- lme(y ~ group + time_f, random = ~ time_num | id, data = tlc, method = "ML", control = ctrl)
anova(lme_rifs_noInt_ML, lme_rifs_ML)

Testing Fixed Effects

                  Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lme_rifs_noInt_ML     1  9 2578.232 2614.155 -1280.116                        
lme_rifs_ML           2 12 2481.167 2529.064 -1228.583 1 vs 2 103.0653  <.0001

Testing Variance Components (Boundary)

Variance-component tests are boundary tests (a variance cannot be negative), so the null distribution is a 50:50 mixture, not a plain \(\chi^2\). Which mixture depends on how many parameters you drop:

  • One variance (e.g. \(\sigma_{b0}^2 = 0\): is a random intercept needed at all?): \(\tfrac12\chi^2_0 + \tfrac12\chi^2_1\). Only here is the corrected p the naive \(\chi^2_1\) p-value halved.
  • A random slope, dropped from a model that already has a random intercept (our example below: removes \(\sigma_{b1}^2\) and the intercept-slope covariance \(\sigma_{b01}\), two parameters): \(\tfrac12\chi^2_1 + \tfrac12\chi^2_2\) (Stram & Lee). Not a simple halving.
lrt <- anova(lme_ri, lme_rifs)   # REML; same fixed effects, drop the random slope
lrt

The naive anova() \(\chi^2_2\) p-value is conservative (too large). The mixture above is the correct adjustment for this two-parameter test.

Note

FLW’s procedure (Section 8.5, p. 209). Compare the LRT statistic against the 50:50 chi-squared mixture critical values in FLW Appendix Table C.1. For comparing two correlated random effects (random intercept + slope) versus one (random intercept only) – exactly this random-slope test, the \(\tfrac12\chi^2_1 + \tfrac12\chi^2_2\) mixture – the \(\alpha = 0.05\) critical value is the \(q=1\) row of Table C.1, 5.14 (smaller than the naive \(\chi^2_2\) value of 5.99). FLW recommend \(\alpha = 0.10\) instead of \(0.05\) only for more complex comparisons (\(q\) vs \(q+k\) correlated random effects with \(k > 1\)), where the null distribution is less well understood. (Testing a single variance component at 0 is the \(\tfrac12\chi^2_0 + \tfrac12\chi^2_1\) case, \(\alpha = 0.05\) critical value about 2.71.)

Tip

Beyond FLW (modern extension). For exact inference on a single variance component, a parametric bootstrap of the restricted LRT is available via RLRsim::exactRLRT; pbkrtest offers a parametric bootstrap LRT. These are not the chapter’s method (FLW uses the tabulated mixture above), but they avoid the asymptotic mixture approximation.

Testing Variance Components (Boundary)

         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
lme_ri       1 10 2480.621 2520.334 -1230.311                        
lme_rifs     2 12 2471.239 2518.894 -1223.620 1 vs 2 13.38185  0.0012
# Dropping the random slope removes TWO parameters: use the 1/2 chi^2_1 + 1/2 chi^2_2 mixture
LR <- lrt[["L.Ratio"]][2]
p_mix <- 0.5 * pchisq(LR, 1, lower.tail = FALSE) + 0.5 * pchisq(LR, 2, lower.tail = FALSE)
c(naive_chisq2 = lrt[["p-value"]][2], mixture_corrected = p_mix)
     naive_chisq2 mixture_corrected 
      0.001242135       0.000748103 

Centering Time for Interpretation

With RS, centering time at a meaningful point (e.g., baseline or mid-study) stabilizes covariance and aids interpretation:

  • Intercept = mean response at the centering time
  • Random-intercept/slope correlation becomes more interpretable
tlc <- tlc %>% mutate(time_c = time_num - mean(time_num))
lme_rifs_c <- lme(y ~ group * time_c, random = ~ time_c | id, data = tlc, method = "REML", control = ctrl)
VarCorr(lme_rifs_c)

Centering Time for Interpretation

id = pdLogChol(time_c) 
            Variance   StdDev    Corr  
(Intercept) 22.3728335 4.7299930 (Intr)
time_c       0.1017683 0.3190114 1     
Residual    33.1869097 5.7608081       

Check Your Understanding: Model Selection and Testing

  1. You want to test whether the group * time_f interaction is significant. Should you use ML or REML for this comparison?

  2. When testing whether to drop the random slope (i.e., testing \(\sigma_{b1}^2 = 0\)), why is the standard \(\chi^2\) test inappropriate?

  3. A colleague compares two models: one with random = ~ 1 | id and one with random = ~ time | id, but the fixed effects differ (one uses time_f, the other uses time_num). Can they use REML AIC to compare these models?

Answers:

  1. ML. Use ML when comparing models with different fixed effects. REML likelihoods are not comparable across models with different fixed-effects structures.

  2. Testing \(\sigma_{b1}^2 = 0\) is a boundary test because the null places the parameter on the edge of its space (variance cannot be negative), so the null distribution is a 50:50 mixture, not a plain \(\chi^2\). Dropping the random slope from a model that already has a random intercept removes two parameters (\(\sigma_{b1}^2\) and the intercept-slope covariance \(\sigma_{b01}\)), so the correct mixture is \(\frac{1}{2}\chi^2_1 + \frac{1}{2}\chi^2_2\), not the standard \(\chi^2_2\). (Testing a single variance, e.g. whether to include a random intercept at all, gives \(\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1\), where halving the p-value is exact.) The standard \(\chi^2\) test is conservative.

  3. No. REML AIC is only valid for comparing models with identical fixed effects. Since the fixed effects differ, they cannot use REML AIC. They would need to either (a) make the fixed effects the same, or (b) use ML for both models if comparing fixed effects is the goal.

Beyond Linear Time: Splines

library(splines)
lme_spl <- lme(y ~ group * ns(time_num, df = 3),
               random = ~ time_num | id, data = tlc, method = "REML", control = ctrl)
AIC(lme_rifs, lme_spl)

Flexible mean over time; keep random slopes (or random spline coefficients if enough data).

Beyond Linear Time: Splines

         df      AIC
lme_rifs 12 2471.239
lme_spl  12 2465.472

Time-Varying Covariates (TVCs)

Mixed models handle TVCs naturally.

Watch out for endogeneity (covariate depends on past \(Y_{ij}\)).

Interactions with time (fixed and/or random slopes) often required.

Missing Data (MAR) and LMMs

Under MAR, ML/REML estimates remain valid with an appropriate mean + random-effects structure.

LMMs use all available data without ad hoc imputation.

Simulation: Shrinkage and EBLUPs

set.seed(8)
N <- 200; m <- 3
ids <- factor(seq_len(N))
times <- 0:2
G <- matrix(c(1.0, 0.3, 0.3, 0.4), 2, 2)  # random-effects covariance (FLW's G)
sigma2 <- 1.2^2                            # residual variance

sim <- do.call(rbind, lapply(ids, function(i){
  b <- MASS::mvrnorm(1, mu = c(0,0), Sigma = G)
  tibble::tibble(id = i, time = times,
                 y = 5 + 0.6*time + b[1] + b[2]*time + rnorm(length(times), sd = sqrt(sigma2)))
}))

fit <- lme(y ~ time, random = ~ time | id, data = sim, method = "REML", control = ctrl)
blup <- ranef(fit)
head(blup)

Note BLUPs near 0 when subject has little information (shrinkage).

Simulation: Shrinkage and EBLUPs

  (Intercept)        time
1  -0.1305327 -0.04763182
2  -0.9065980 -0.33083520
3   0.3090307  0.11277080
4   0.5303020  0.19351415
5  -0.6292974 -0.22963803
6   0.9350150  0.34119200

Visual: Shrinkage with Few vs Many Points

set.seed(81)
ids_few <- sample(levels(sim$id), 6)
pred <- expand.grid(id = ids_few, time = seq(0,2,by=.05))
pred$pop <- predict(fit, newdata = pred, level = 0)
pred$ss  <- predict(fit, newdata = pred, level = 1)

ggplot(sim %>% dplyr::filter(id %in% ids_few),
       aes(time, y, group = id)) +
  geom_point() +
  geom_line(aes(y = pop, group = id), data = pred, linetype = 2) +
  geom_line(aes(y = ss,  group = id), data = pred) +
  facet_wrap(~ id, ncol = 3) +
  labs(title = "Shrinkage stronger with sparse/noisy subjects", x = "Time", y = "Y") +
  theme_minimal()

Visual: Shrinkage with Few vs Many Points

Check Your Understanding: Shrinkage and BLUPs

  1. What are the three conditions that make shrinkage stronger (i.e., pull subject-specific estimates more toward the population mean)?

  2. What is the difference between a BLUP and an EBLUP?

  3. In predict(fit, level = 0) vs predict(fit, level = 1), which one incorporates the subject-specific random effects?

Answers:

  1. Shrinkage is stronger when: (a) the subject has few measurements, (b) the residual error \(\sigma^2\) is large, and (c) the random-effect variance is small. In each case, subject-specific data is less informative, so we borrow more from the population.

  2. BLUP (Best Linear Unbiased Predictor) assumes variance components are known. EBLUP (Empirical BLUP) uses estimated variance components from the data. In practice, we always compute EBLUPs since true variances are unknown.

  3. level = 1 gives subject-specific predictions that include the BLUPs. level = 0 gives population-average predictions using only the fixed effects. The solid lines (level=1) in the shrinkage plot follow individual patterns; dashed lines (level=0) show the overall population trend.

Prediction and Intervals

Level What It Uses
Population-level (level = 0) Fixed effects only
Subject-specific (level = 1) Includes BLUPs

Prediction intervals require accounting for both residual and random-effect uncertainty, so there is no one-line predict() interval for a subject-specific prediction. The standard route (stated as reference, not run here):

Note

Parametric-bootstrap prediction interval (recipe). (1) Simulate new random effects \(\mathbf{b}^\ast \sim N(0,\hat G)\) and residuals \(\varepsilon^\ast \sim N(0,\hat\sigma^2)\) from the fitted model; (2) form \(y^\ast = \mathbf{X}\hat{\boldsymbol\beta} + \mathbf{Z}\mathbf{b}^\ast + \varepsilon^\ast\) for the target subject and time; (3) repeat many times and take empirical quantiles. lme4::bootMer automates this for lmer fits; for nlme, simulate from VarCorr/ranef by hand.

Influence and Outliers

  • Check standardized residuals, random-effect outliers, leverage
  • Consider robust mixed models or transformations if heavy tails/outliers

Random-Effects Covariance Interpretation

Component Interpretation
\(\sigma_{b0}^2\) Between-subject variance at the centered time
\(\sigma_{b1}^2\) Heterogeneity in slopes
\(\sigma_{b01}\) Intercept-slope association (positive means high baseline with steeper increase)

When to Add Residual Correlation

If RI/RS + heterogeneous residual variances still leave serial correlation in residuals (ACF/variogram), add corAR1 or similar.

Do not over-layer: random slopes often absorb a lot of serial correlation.

Practical Modeling Sequence

Step Action
1 Choose rich mean (time as factor or spline; group x time)
2 Start with RI; consider RS if trajectories differ in slope
3 Add heterogeneous residual SDs if needed; check residual ACF/variogram
4 Only then consider additional residual correlation (e.g., AR(1))
5 Compare REML AIC for random/residual structures; ML for fixed effects
6 Validate with residual and random-effect diagnostics

Class Exercise Prompt (TLC)

  1. Fit: RI, RI+RS, RI+RS+varIdent
  2. Compare by REML AIC; interpret variance components
  3. Report group differences at each visit (EMMs) with 95% CIs

Code Template for Exercise

# 1) RI
fit_ri <- lme(y ~ group * time_f, random = ~ 1 | id, data = tlc, method = "REML")

# 2) RI+RS (time_num in random part)
fit_rs <- lme(y ~ group * time_f, random = ~ time_num | id, data = tlc, method = "REML", control = ctrl)

# 3) Add hetero residual SDs by visit
fit_rs_het <- update(fit_rs, weights = varIdent(~ 1 | time_f))

# Compare REML AIC
data.frame(model=c("RI","RI+RS","RI+RS+het"),
           AIC=c(AIC(fit_ri), AIC(fit_rs), AIC(fit_rs_het)))

Code Template for Exercise

      model      AIC
1        RI 2480.621
2     RI+RS 2471.239
3 RI+RS+het 2471.239
# EMMs: group contrasts by visit
emmeans(fit_rs_het, ~ group | time_f) |> pairs()
time_f = 0:
 contrast           estimate   SE df t.ratio p.value
 Placebo - Succimer   -0.268 1.17 98  -0.229  0.8196

time_f = 1:
 contrast           estimate   SE df t.ratio p.value
 Placebo - Succimer   11.138 1.22 98   9.111 <0.0001

time_f = 4:
 contrast           estimate   SE df t.ratio p.value
 Placebo - Succimer    8.556 1.38 98   6.185 <0.0001

time_f = 6:
 contrast           estimate   SE df t.ratio p.value
 Placebo - Succimer    2.884 1.50 98   1.926  0.0570

Degrees-of-freedom method: containment 

Common Pitfalls

Pitfall Correct Approach
Comparing random structures under ML Use REML
Boundary tests with plain \(\chi^2\) Use mixture or bootstrap
Ignoring centering for RS Leads to awkward intercept/slope correlation
High-degree polynomials for time Prefer splines

Common Mistakes to Avoid

Mistakes That Lead to Wrong Results

Mistake Why It Is Wrong Correct Approach
Using REML to compare different fixed effects REML likelihoods not comparable Refit both models with ML, then compare
Using ML to select random-effects structure ML underestimates variances Use REML when fixed effects are the same
Using standard \(\chi^2\) for variance component tests Boundary problem: variance \(\geq 0\) Use the right 50:50 mixture (\(\frac{1}{2}\chi^2_0+\frac{1}{2}\chi^2_1\) for one variance; \(\frac{1}{2}\chi^2_1+\frac{1}{2}\chi^2_2\) to drop a random slope), comparing to FLW Appendix Table C.1 critical values (a parametric bootstrap is a beyond-FLW option)
Not centering time in RS models Hard-to-interpret intercept-slope correlation Center at baseline or mid-study
Adding AR(1) without checking if RS is sufficient Random slopes often absorb serial correlation Check residual ACF after fitting RS first
Interpreting VarCorr “StdDev” as variance Column shows standard deviation, not variance Square the StdDev to get variance
Forgetting that BLUPs are shrunk BLUPs are biased toward population mean Use for prediction, not unbiased estimation
Extrapolating subject-specific predictions BLUPs unreliable beyond observed data Limit predictions to data range

RI/RS vs Pure Covariance Models (Ch. 7)

  • Many covariance patterns can be mimicked by random effects
  • Mixed models interpret \(\Sigma_i\) through heterogeneity in baseline/slopes
  • For prediction and patient-specific effects, mixed models are natural

Reporting Checklist (LMM)

Element Details
Mean structure Time coding, interactions, rationale
Random structure RI/RS; covariance of random effects \(G\)
Residual structure Heterogeneous SDs, AR(1) if any
Estimation method ML/REML and model comparisons
Estimates Fixed effects with CIs; variance components
Diagnostics Residual ACF/variogram; BLUP plot for unusual subjects (not RE normality); influence
Sensitivity Additional analyses

Quick Reference: nlme::lme Syntax

# Random-intercept
lme(y ~ fixed_terms, random = ~ 1 | id, data = dat)

# Random-slope (linear time)
lme(y ~ fixed_terms, random = ~ time | id, data = dat)

# Add hetero residual SDs by visit
lme(y ~ fixed_terms, random = ~ time | id,
    weights = varIdent(~ 1 | time_factor), data = dat)

# Add residual AR(1)
lme(y ~ fixed_terms, random = ~ time | id,
    correlation = corAR1(~ time | id), data = dat)

The Same Model in SAS (PROC MIXED)

R is our runnable language (used in the slides above and in the homework). For students who work in SAS, here is the equivalent, shown for reference only and not executed. This mirrors FLW’s Computing section (8.9).

/* Not run here. SAS counterpart of the random-slope fit above, on the TLC data. */
PROC MIXED DATA=tlc METHOD=REML;
  CLASS id trt;
  MODEL y = time trt time*trt / SOLUTION;
  RANDOM intercept time / SUBJECT=id TYPE=UN G;
RUN;
  • RANDOM ... / SUBJECT=id is the SAS counterpart of R’s random = ~ time | id (or (time | id) in lmer).
  • METHOD=REML matches the nlme::lme/lmer default; use METHOD=ML for LRTs of fixed effects.
  • TYPE=UN is an unstructured \(G\) (correlated intercept and slope); TYPE=VC makes them independent.

Quick Reference: ML vs REML

Goal Method
Compare fixed effects Refit both models with ML; LRT/AIC on ML
Choose random/residual structure Fit with REML; compare REML AIC/LRT

Side Note: lme vs lmer

Package Strengths
nlme::lme Residual correlation structures and heteroscedastic weights; no built-in Satterthwaite
lme4::lmer Fast, robust; pair with lmerTest for df; no built-in residual AR(1)/varIdent

Choose based on needs; results for fixed effects are often similar.

Worked Example: ML vs REML

fit_full_ML <- lme(y ~ group * time_f, random = ~ time_num | id, data = tlc, method = "ML", control = ctrl)
fit_noint_ML <- lme(y ~ group + time_f, random = ~ time_num | id, data = tlc, method = "ML", control = ctrl)
anova(fit_noint_ML, fit_full_ML)

Worked Example: ML vs REML

             Model df      AIC      BIC    logLik   Test  L.Ratio p-value
fit_noint_ML     1  9 2578.232 2614.155 -1280.116                        
fit_full_ML      2 12 2481.167 2529.064 -1228.583 1 vs 2 103.0653  <.0001

Subject-Specific Predictions Table

sub_ids <- head(levels(tlc$id), 6)
new6 <- tlc %>% distinct(id, group) %>% filter(id %in% sub_ids) %>%
  mutate(time_num = 6, time_f = factor("6", levels = c("0","1","4","6")))
data.frame(
  id = sub_ids,
  pop = predict(lme_rifs, newdata = new6, level = 0),
  ss  = predict(lme_rifs, newdata = new6, level = 1)
)

Subject-Specific Predictions Table

  id    pop       ss
1  1 23.646 25.63256
2  2 20.762 22.22903
3  3 20.762 24.51969
4  4 23.646 22.34439
5  5 20.762 10.05191
6  6 20.762 11.78730

Model Criticism: Overdispersion/Heavy Tails

If residuals or random effects are heavy-tailed, consider:

  • Transformations, variance functions, robust LMMs (e.g., t-distributed random effects/errors)
  • Check sensitivity to outliers

Time as Factor vs Continuous: Guidance

Approach Best For
Factor Flexible at planned visits; good for EMMs at specific times
Continuous Interpretable global slope; good with many time points or mistiming
Combined Factor in fixed, continuous in random (RS) is often optimal

Growth Curves vs Spline LMMs

  • Polynomial growth curves are easy but can oscillate
  • Natural cubic splines are more stable
  • Choose df by AIC/CV and subject-matter considerations

TVCs and Lagged Effects

  • Incorporate lagged covariates if plausible causal ordering
  • Beware of post-treatment covariates when estimating treatment effects

Beyond Gaussian LMMs

  • GLMMs for non-Gaussian outcomes (binary, counts)
  • Similar principles; estimation via Laplace/AGQ
  • Not covered in this chapter

Cheat-Sheet: Model Choice

Question Solution
Slopes differ across subjects? Add RS
Residual SD varies by visit? Add varIdent
Leftover serial correlation? Add AR(1)
Need predictions per subject? Use Mixed models
Only marginal contrasts? Consider GEE as sensitivity

Cheat-Sheet: Testing

Test Type Method
Fixed effects ML for model comparison; large-sample Wald z / LRT (FLW). KR/Satterthwaite df is a modern add-on
Random effects REML LRT vs the 50:50 mixture (FLW Appendix Table C.1; use \(\alpha = 0.10\) for \(q\)-vs-\(q+k\), \(k>1\)). Bootstrap is beyond FLW

Do not mix ML and REML across compared models.

What to Include in Your Report

  • Data structure and missingness; time coding
  • Final mean, random, residual structures; estimation method
  • Key estimates (fixed) with CIs; variance components
  • Diagnostics; sensitivity; clear interpretations

Recipe Card: Fitting Mixed-Effects Models

Step-by-Step Workflow for LMM Analysis

Step 1: Data Preparation

# Ensure data is in long format with:
# - Subject ID as factor
# - Time as both factor (for rich mean) and numeric (for slopes)
dat <- dat %>%
  mutate(id = factor(id),
         time_f = factor(time),
         time_num = as.numeric(time),
         time_c = time_num - mean(time_num))  # centered

Step 2: Fit Candidate Models (REML)

# Start with random intercept
fit_ri <- lme(y ~ group * time_f, random = ~ 1 | id,
              data = dat, method = "REML")

# Add random slope
fit_rs <- lme(y ~ group * time_f, random = ~ time_num | id,
              data = dat, method = "REML")

# Add heterogeneous residual variances
fit_rs_het <- update(fit_rs, weights = varIdent(~ 1 | time_f))

Step 3: Compare Random Structures (REML AIC)

AIC(fit_ri, fit_rs, fit_rs_het)  # Lower is better

Step 4: Check Diagnostics

plot(fit_rs)                    # Residuals vs fitted
qqnorm(resid(fit_rs, type = "normalized"))
ACF(fit_rs)                     # Check for leftover correlation
qqnorm(ranef(fit_rs)[,1])       # BLUPs: spot unusual subjects, NOT a normality test (FLW p.273)

Step 5: Test Fixed Effects (refit with ML)

fit_full_ML <- update(fit_rs, method = "ML")
fit_reduced_ML <- lme(y ~ group + time_f, random = ~ time_num | id,
                      data = dat, method = "ML")
anova(fit_reduced_ML, fit_full_ML)  # LRT for interaction

Step 6: Report Final Model

summary(fit_rs)$tTable           # Fixed effects
VarCorr(fit_rs)                  # Variance components
emmeans(fit_rs, ~ group | time_f) %>% pairs()  # Contrasts

Summary: Big Ideas

  1. Mixed-effects = flexible, interpretable way to model individual trajectories and covariance
  2. Shrinkage improves estimation for sparse/noisy subjects
  3. Careful model comparison (ML vs REML) and boundary-aware tests are crucial

Further Reading and For Next Time

  • FLW Ch. 8: the Case Studies (8.8), the Computing section on PROC MIXED (8.9), and Further Reading (8.10).
  • Laird & Ware (1982), the original two-stage random-effects formulation behind 8.4.
  • Next: FLW Ch. 9, Fixed Effects versus Random Effects Models and the Hausman test (Lecture 9).

Note

A note on the data. TLC is our course running example (carried over from Ch. 5-7 for continuity), not a Ch. 8 dataset. FLW Ch. 8’s own case studies (Section 8.8) are the Six Cities log(FEV1) data, fit with hybrid random-effects + serial-correlation models, and the ACTG 193A log(CD4) counts, fit with a linear-spline LME whose empirical-BLUP trajectories appear in Fig. 8.10.