Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis
Naim Rashid
Lecture Objectives
Understand why standard GLMs fail for longitudinal data
Define marginal models and when to use them
Distinguish marginal vs mixed model interpretations of \(\beta\)
Describe the three-part marginal model: mean, variance, association
Preview GEE estimation (Ch. 13) and apply it to a real Case Study
Prerequisites: What You Should Recall
Recall before we start
L11 (Ch. 11): the GLM building blocks: a link\(g(\mu)\), a variance function\(v(\mu)\), and a dispersion\(\phi\). Logistic, Poisson, and linear models are all special cases.
L07/L08 (Ch. 7-8): modeling within-subject correlation with a working/covariance structure (we used gls() with corAR1/corCompSymm). Same idea returns here as the working correlation.
L05 (Ch. 5): population-average mean trajectories by group.
Today we glue these together: a GLM mean model plus a within-subject association structure, fit so that inference stays valid.
We write the working structure as \(V_i(\alpha)\), and we do not reuse the symbol \(R_i\) for it: \(R_i\) is reserved for the residual covariance of the Gaussian LME. The ordinal cumulative-logit thresholds are written \(\kappa_k\) to avoid colliding with the working-correlation \(\alpha\).
Motivation
Repeated measures within a subject are correlated.
A naive GLM assumes independence, so its inference is invalid: the standard errors are incorrect (FLW pp. 43-44).
Note
For positively correlated within-subject data, ignoring correlation typically makes SEs too small for within-subject (time-varying) effects, inflating Type I error. It can make SEs for between-subject effects too large. Either way, the inference is not trustworthy.
For discrete data, how we model correlation also changes the meaning of \(\beta\).
Two Model Families (FLW Ch. 12)
FLW Ch. 12 contrasts two families: marginal (population-average) vs mixed (subject-specific).
Family
Target
\(\beta\) Interpretation
Marginal
Population mean
Population-average effect
Mixed
Individual trajectory
Subject-specific (given \(b_i\))
Transition (preview)
Conditional on past \(Y\)
Effect given previous outcomes
Note
Transition models are a third family but are not part of FLW Ch. 12. Shown greyed as a forward pointer (transition-models lecture / FLW transition chapter).
Within-subject dependence via the working covariance \(V_i(\alpha)\) (exchangeable, AR(1), etc.).
Key Insight: Working Correlation
Changing \(V_i(\alpha)\) affects precision, not the interpretation of \(\beta\).
This is the defining feature of marginal models: \(\beta\) targets the same population-average quantity regardless of the working correlation, and robust (sandwich) SEs stay valid even if \(V_i(\alpha)\) is misspecified.
Check Your Understanding: Marginal Model Foundations
What is the key difference between marginal and mixed model interpretations of \(\beta\)?
Why does ignoring within-subject correlation give invalid inference for longitudinal data?
If you change the working correlation structure in a marginal model, what changes and what stays the same?
Answers:
Marginal models give population-average effects: \(\beta\) describes how the average response in the population changes with covariates. Mixed models give subject-specific effects: \(\beta\) describes how an individual’s response changes, conditional on their random effects.
A naive GLM assumes observations are independent. When within-subject observations are positively correlated, the effective sample size is smaller than the nominal one. Ignoring this distorts the standard errors (typically too small for within-subject effects, sometimes too large for between-subject effects), so the inference is invalid.
The coefficient estimates (\(\hat{\beta}\)) remain stable because they target the same population-average quantity. The standard errors change because different working correlations have different efficiency. Robust (sandwich) SEs protect against misspecification of \(V_i(\alpha)\).
12.2 Notation and Data Structure
\(N\) subjects; subject \(i\) has \(n_i\) measurements at times \(t_{ij}\)
\(X_{ij}\) is the \(1\times p\) covariate row; stack the rows to \(X_i\) (\(n_i\times p\))
Type
Examples
Time-invariant
Treatment, sex (between-subject)
Time-varying
Age, exposure (within-subject)
Discrete Outcomes: Correlation Is Constrained
For binary outcomes the feasible correlation depends on the marginal means. With \(p_1=E(Y_1)=0.2\) and \(p_2=E(Y_2)=0.8\):
The largest possible \(\Pr(Y_1=1, Y_2=1)\) is \(\min(p_1,p_2)=0.2\) (the Frechet bound):
\(Y_2=0\)
\(Y_2=1\)
margin
\(Y_1=0\)
\(0.2\)
\(0.6\)
\(0.8\)
\(Y_1=1\)
\(0.0\)
\(0.2\)
\(0.2\)
margin
\(0.2\)
\(0.8\)
At that maximum, the correlation is \[\rho_{12}^{\max}=\sqrt{\frac{p_1(1-p_2)}{p_2(1-p_1)}}=\sqrt{\frac{0.2\cdot 0.2}{0.8\cdot 0.8}}=\sqrt{\tfrac{0.04}{0.64}}=0.25.\]
So \(\rho_{12}\le 0.25\): the means alone cap the correlation.
Discrete Outcomes: Odds Ratios Are More Natural
For a pair of binary outcomes, define the four joint cell probabilities:
\(Y_k=0\)
\(Y_k=1\)
\(Y_j=0\)
\(P_{00}\)
\(P_{01}\)
\(Y_j=1\)
\(P_{10}\)
\(P_{11}\)
The pairwise log odds ratio uses these cells directly:
Unlike the correlation, the odds ratio is not mechanically bounded by the marginal means, so it is the more natural association measure for binary data.
Master Simulated Dataset: continuous outcome
A small simulated dataset to illustrate the long format and the continuous fit. The continuous outcome carries subject random effects (\(b_0, b_1\)), so it is genuinely correlated within subject.
suppressPackageStartupMessages({library(dplyr); library(tidyr); library(tibble)library(ggplot2); library(nlme); library(broom); library(geepack)})set.seed(667)N <-120n_i <-sample(3:6, N, replace =TRUE)id <-rep(1:N, times = n_i)time <-unlist(lapply(n_i, function(n) 0:(n -1)))trt <-rep(rbinom(N, 1, 0.5), times = n_i)# subject random effects induce within-subject correlationb0 <-rep(rnorm(N, 0, 1.2), times = n_i)b1 <-rep(rnorm(N, 0, 0.4), times = n_i)y <-10+0.6* time +1.0* trt +0.5* (trt * time) + b0 + b1 * time +rnorm(length(id), 0, 1)
y_bin and y_count are generated independent within subject (no random effect), so they understate the correlation the lecture motivates. They illustrate the specification only. The real correlated discrete analysis is the epilepsy Case Study below.
Long Format: One Subject’s Rows
head(dat, 8)
Long Format: One Subject’s Rows
# A tibble: 8 × 7
id time trt y y_bin y_count ordinal_y
<fct> <int> <fct> <dbl> <int> <int> <ord>
1 1 0 Ctl 11.5 1 3 M
2 1 1 Ctl 12.4 1 5 H
3 1 2 Ctl 13.0 1 2 H
4 2 0 Tx 10.8 0 2 M
5 2 1 Tx 12.7 0 3 H
6 2 2 Tx 13.2 0 8 H
7 2 3 Tx 16.0 0 6 VH
8 3 0 Tx 8.77 0 7 L
Visualizing Imbalance
ggplot(dat, aes(time, y, group = id, color = trt)) +geom_line(alpha =0.15) +stat_summary(aes(group = trt), fun = mean, geom ="line", linewidth =1.2) +labs(title ="Unbalanced Trajectories by Treatment (continuous y)",y ="Continuous outcome y") +theme_minimal()
Visualizing Imbalance
Implicit Mean Assumption
The marginal mean depends only on the current covariate:
\(\kappa_k\): category-specific thresholds (cut-points), \(\kappa_1<\dots<\kappa_{K-1}\) (FLW writes these \(\alpha_k\); we use \(\kappa_k\) to keep \(\alpha\) for the GEE working correlation)
\(\beta\): population-average log cumulative-odds ratios, shared across cut-points (the proportional-odds assumption); the minus-sign form matches L11 and R’s polr, so a positive \(\beta\) pushes toward higher categories
Warning
A proper marginal ordinal fit needs GEE for cumulative logits (e.g. multgee::ordLORgee, repolr) to account for within-subject clustering. An ordinary MASS::polr fit assumes independence and gives the wrong SEs, so we defer the worked ordinal fit to L13. Here we present only the specification.
GEE bridge: how to read geeglm (estimation is L13)
Reading a geeglm call before we derive GEE
The estimation machinery (estimating equations, the sandwich variance) is next lecture (L13). For now, read a geepack::geeglm fit as a black box:
id = is the clustering variable (which rows belong to the same subject)
corstr = is the working correlation ("exchangeable", "ar1", "independence", …)
the printed Std.err IS the robust / sandwich SE (valid even if corstr is misspecified)
Continuous Outcome: Fit (gls)
dat_cont <- dat |> dplyr::select(id, time, trt, y)fit_cont <-gls( y ~ time * trt,data = dat_cont,correlation =corAR1(form =~ time | id),method ="REML")coef(summary(fit_cont))
In a sentence: both groups increase over time, but the treatment group starts higher and increases faster, so the separation grows.
Case Study: Epilepsy Seizure Counts
The data (FLW Ch. 12-14 canonical discrete example)
A placebo-controlled trial of the anti-epileptic drug progabide in 59 patients with partial seizures (Thall & Vail 1990; FLW Ch. 12-14). Seizure counts were recorded over four successive 2-week post-randomization intervals, plus an 8-week baseline count.
The naive GLM SE for treatment is roughly one-quarter of the GEE robust SE: ignoring the within-subject correlation gives invalid (overconfident) inference. The point estimate barely moves; only the SEs change.
Case Study: Model-Based vs Robust SEs
term naive_SE robust_SE ratio
trtProgabide trtProgabide 0.04527 0.1940 4.29
lbase lbase 0.03084 0.1487 4.82
The Same Model in SAS
Static reference (not run here): the marginal Poisson GEE in PROC GENMOD with a REPEATED statement.
proc genmod data=epi;
class id trt;
model seizures = trt lbase / dist=poisson link=log;
repeated subject=id / type=exch corrw;
run;
Note
repeated subject=id is SAS’s id=; type=exch is corstr="exchangeable". corrw prints the working correlation. SAS reports empirical (robust) standard errors by default, the same sandwich SE as geeglm’s Std.err.
Why Full Likelihood Is Hard
needed_terms <-function(m) 2^m - m -1dfc <-data.frame(m =4:12, extra_assoc =sapply(4:12, needed_terms))ggplot(dfc, aes(m, extra_assoc)) +geom_point() +geom_line() +scale_y_continuous(labels = scales::comma) +labs(title ="Explosion of Association Parameters",x ="Number of repeats (m)",y ="Parameters beyond means (2^m - m - 1)") +theme_minimal()
Why Full Likelihood Is Hard
Full Likelihood vs Marginal + GEE
Approach
Challenge
Full likelihood
Needs 2-, 3-, … way associations (infeasible)
Marginal + GEE
Specify mean, variance, working \(V_i(\alpha)\); robust SEs
GEE wins for practicality and interpretability.
Check Your Understanding: Specifications and the Case Study
For a count outcome with a log link, what does \(\beta\) represent in a marginal model?
In the epilepsy Case Study, the naive and robust SEs differed by ~4x but \(\hat\beta\) barely moved. Why?
Why did we present the ordinal model but not fit it with polr?
Answers:
A population-average log rate ratio: \(\exp(\beta)\) is the multiplicative change in the average event rate per unit covariate, across the population.
The point estimate is consistent regardless of the working correlation, so it is stable. The naive GLM assumes independence and badly underestimates the SE because the four visits per patient are strongly correlated; the GEE robust SE corrects this. Association affects precision, not the estimate.
polr assumes independence and ignores the within-subject clustering, giving the wrong (overconfident) SEs - the exact error this lecture warns against. A proper marginal ordinal fit needs GEE for cumulative logits (L13).
Practical Decisions: Model Selection
Question Type
Best Model
Population-average effects
Marginal (GEE)
Subject-specific trajectories
Mixed effects
Dependence on past \(Y\) (preview)
Transition model
Note
The transition row is a forward pointer (not FLW Ch. 12); transition models are covered in their own lecture / FLW transition chapter.
Looking Ahead (beyond Ch. 12): Missing Data
Missing-data caveat (covered in L17, FLW Ch. 17-18)
Standard GEE requires data MCAR (missing completely at random) for valid inference.
IPW-GEE (inverse-probability-weighted GEE; Robins, Rotnitzky & Zhao 1995) extends validity to MAR by weighting observations by the inverse probability of being observed and modeling the dropout process.
The MCAR / MAR definitions and the IPW weighting mechanics are developed in L17 (FLW Ch. 17-18). FLW Ch. 12 (p. 352) explicitly defers missing-data issues to those chapters.
Common Mistakes in Marginal Models
Mistake
Why It Is Wrong
Correct Approach
Interpreting \(\beta\) as subject-specific
Marginal \(\beta\) is population-average
Use mixed models for subject-specific effects
Ignoring within-subject correlation
Gives invalid inference / incorrect SEs
Account for it via GEE or mixed models
Using standard GEE with MAR dropout
Standard GEE assumes MCAR (L17)
IPW-GEE or likelihood-based methods
Treating all outcome types the same
Binary/ordinal have constrained associations
Use the appropriate link and variance functions
Confusing efficiency with consistency
Wrong \(V_i(\alpha)\) loses efficiency, not consistency
Choose sensible \(V_i(\alpha)\); robust SEs protect
Assuming marginal \(=\) conditional for non-linear links
They differ for logistic/Poisson (marginal attenuated vs conditional; quantified in L14/L16, FLW Ch. 16)
Be explicit about which target you mean
Key Takeaways
Marginal model = mean + variance + association (\(g\), \(v(\mu)\), \(V_i(\alpha)\))
\(\beta\) = population-average effect (log rate/odds ratio for count/binary), robust to the working correlation
The working correlation \(V_i(\alpha)\) affects precision, not interpretation; robust SEs stay valid
Case Study (epilepsy): progabide \(\widehat{RR}\approx 0.90\) (not significant); robust SEs ~4x the naive ones
For discrete data, full joint likelihoods are intractable; use GEE (L13)
Standard GEE assumes MCAR; missing-data extensions in L17
Further Reading
FLW (2011) Ch. 12, “Marginal Models: Introduction and Overview” (pp. 341-352), and its Further Reading section.
Liang, K.-Y. & Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22. (The original GEE paper; estimation in L13.)
Robins, J.M., Rotnitzky, A. & Zhao, L.P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. JASA 90, 106-121. (IPW-GEE; missing data in L17.)
Thall, P.F. & Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics 46, 657-671. (The epilepsy data.)
Next lecture (L13): GEE estimation - the estimating equations, the sandwich variance, and choosing the working correlation.