BIOS 667 - Lecture 12: Marginal Models (Ch. 12)

Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis

Naim Rashid

Lecture Objectives

  • Understand why standard GLMs fail for longitudinal data
  • Define marginal models and when to use them
  • Distinguish marginal vs mixed model interpretations of \(\beta\)
  • Describe the three-part marginal model: mean, variance, association
  • Preview GEE estimation (Ch. 13) and apply it to a real Case Study

Prerequisites: What You Should Recall

Recall before we start

  • L11 (Ch. 11): the GLM building blocks: a link \(g(\mu)\), a variance function \(v(\mu)\), and a dispersion \(\phi\). Logistic, Poisson, and linear models are all special cases.
  • L07/L08 (Ch. 7-8): modeling within-subject correlation with a working/covariance structure (we used gls() with corAR1/corCompSymm). Same idea returns here as the working correlation.
  • L05 (Ch. 5): population-average mean trajectories by group.

Today we glue these together: a GLM mean model plus a within-subject association structure, fit so that inference stays valid.

Roadmap

  1. Why GLMs fail for correlated data
  2. Marginal model: mean | variance | association
  3. Outcome-type specifications (continuous, count, binary, ordinal)
  4. Case Study: epilepsy seizure counts (Poisson GEE)
  5. Why full likelihood is hard for discrete data
  6. GEE preview, missing-data caveat, practical decisions

Notation reference

Symbol Meaning
\(Y_{ij}\) response for subject \(i\) at occasion \(j\) (scalar)
\(\mathbf{Y}_i\) response vector for subject \(i\) \((n_i \times 1)\)
\(\mathbf{X}_i,\ \boldsymbol\beta\) fixed-effect design \(n_i \times p\) and coefficients
\(\mathbf{Z}_i,\ \mathbf{b}_i\) random-effect design and random effects, \(\mathbf{b}_i \sim N(0, G)\)
\(G\) between-subject random-effects covariance (FLW’s letter, Ch. 8)
\(R_i\) within-subject residual covariance, \(\boldsymbol\varepsilon_i \sim N(0, R_i)\)
\(\Sigma_i = \mathbf{Z}_i G \mathbf{Z}_i^\top + R_i\) marginal (total) covariance \(\operatorname{Cov}(\mathbf{Y}_i)\) for subject \(i\)
\(\rho\) correlation parameter (AR(1), exchangeable)

Indices: \(i\) subject \(1,\dots,N\); \(j\) occasion/time \(1,\dots,n_i\) (\(n\) when balanced); \(k\) state (transition models); \(g\) group. Bold = vector or matrix.

Notation (this lecture): GEE working covariance

This is the deck that introduces the GEE working-covariance notation. Reserved symbols (\(R_i\) = within-subject residual covariance) are unchanged.

Symbol Meaning
\(\pi_{ij}\) binary response probability for subject \(i\) at occasion \(j\)
\(\phi\) dispersion (scale) parameter
\(\alpha\) working-correlation parameter (e.g. exchangeable \(\rho\), AR(1) decay)
\(\operatorname{Corr}_i(\alpha)\) working correlation matrix \((n_i \times n_i)\)
\(A_i\) diagonal matrix of marginal variances \(\phi\, v(\mu_{ij})\)
\(V_i(\alpha) = A_i^{1/2}\,\operatorname{Corr}_i(\alpha)\,A_i^{1/2}\) GEE working covariance of \(\mathbf{Y}_i\)

Warning

We write the working structure as \(V_i(\alpha)\), and we do not reuse the symbol \(R_i\) for it: \(R_i\) is reserved for the residual covariance of the Gaussian LME. The ordinal cumulative-logit thresholds are written \(\kappa_k\) to avoid colliding with the working-correlation \(\alpha\).

Motivation

Repeated measures within a subject are correlated.

A naive GLM assumes independence, so its inference is invalid: the standard errors are incorrect (FLW pp. 43-44).

Note

For positively correlated within-subject data, ignoring correlation typically makes SEs too small for within-subject (time-varying) effects, inflating Type I error. It can make SEs for between-subject effects too large. Either way, the inference is not trustworthy.

For discrete data, how we model correlation also changes the meaning of \(\beta\).

Two Model Families (FLW Ch. 12)

FLW Ch. 12 contrasts two families: marginal (population-average) vs mixed (subject-specific).

Family Target \(\beta\) Interpretation
Marginal Population mean Population-average effect
Mixed Individual trajectory Subject-specific (given \(b_i\))
Transition (preview) Conditional on past \(Y\) Effect given previous outcomes

Note

Transition models are a third family but are not part of FLW Ch. 12. Shown greyed as a forward pointer (transition-models lecture / FLW transition chapter).

Marginal Model = Mean + Variance + Association

Mean:

\[E(Y_{ij}\mid X_{ij})=\mu_{ij}, \quad g(\mu_{ij})=X_{ij}\beta\]

Variance:

\[\operatorname{Var}(Y_{ij}\mid X_{ij})=\phi\,v(\mu_{ij})\]

Association:

Within-subject dependence via the working covariance \(V_i(\alpha)\) (exchangeable, AR(1), etc.).

Key Insight: Working Correlation

Changing \(V_i(\alpha)\) affects precision, not the interpretation of \(\beta\).

This is the defining feature of marginal models: \(\beta\) targets the same population-average quantity regardless of the working correlation, and robust (sandwich) SEs stay valid even if \(V_i(\alpha)\) is misspecified.

Check Your Understanding: Marginal Model Foundations

  1. What is the key difference between marginal and mixed model interpretations of \(\beta\)?

  2. Why does ignoring within-subject correlation give invalid inference for longitudinal data?

  3. If you change the working correlation structure in a marginal model, what changes and what stays the same?

Answers:

  1. Marginal models give population-average effects: \(\beta\) describes how the average response in the population changes with covariates. Mixed models give subject-specific effects: \(\beta\) describes how an individual’s response changes, conditional on their random effects.

  2. A naive GLM assumes observations are independent. When within-subject observations are positively correlated, the effective sample size is smaller than the nominal one. Ignoring this distorts the standard errors (typically too small for within-subject effects, sometimes too large for between-subject effects), so the inference is invalid.

  3. The coefficient estimates (\(\hat{\beta}\)) remain stable because they target the same population-average quantity. The standard errors change because different working correlations have different efficiency. Robust (sandwich) SEs protect against misspecification of \(V_i(\alpha)\).

12.2 Notation and Data Structure

  • \(N\) subjects; subject \(i\) has \(n_i\) measurements at times \(t_{ij}\)
  • Stack responses: \(\mathbf{Y}_i=(Y_{i1},\dots,Y_{in_i})^\top\)
  • Across-subject independence
  • Within-subject dependence

Covariates

\(X_{ij}\) is the \(1\times p\) covariate row; stack the rows to \(X_i\) (\(n_i\times p\))

Type Examples
Time-invariant Treatment, sex (between-subject)
Time-varying Age, exposure (within-subject)

Discrete Outcomes: Correlation Is Constrained

For binary outcomes the feasible correlation depends on the marginal means. With \(p_1=E(Y_1)=0.2\) and \(p_2=E(Y_2)=0.8\):

The largest possible \(\Pr(Y_1=1, Y_2=1)\) is \(\min(p_1,p_2)=0.2\) (the Frechet bound):

\(Y_2=0\) \(Y_2=1\) margin
\(Y_1=0\) \(0.2\) \(0.6\) \(0.8\)
\(Y_1=1\) \(0.0\) \(0.2\) \(0.2\)
margin \(0.2\) \(0.8\)

At that maximum, the correlation is \[\rho_{12}^{\max}=\sqrt{\frac{p_1(1-p_2)}{p_2(1-p_1)}}=\sqrt{\frac{0.2\cdot 0.2}{0.8\cdot 0.8}}=\sqrt{\tfrac{0.04}{0.64}}=0.25.\]

So \(\rho_{12}\le 0.25\): the means alone cap the correlation.

Discrete Outcomes: Odds Ratios Are More Natural

For a pair of binary outcomes, define the four joint cell probabilities:

\(Y_k=0\) \(Y_k=1\)
\(Y_j=0\) \(P_{00}\) \(P_{01}\)
\(Y_j=1\) \(P_{10}\) \(P_{11}\)

The pairwise log odds ratio uses these cells directly:

\[ \theta_{jk}=\log\frac{P_{11}\,P_{00}}{P_{10}\,P_{01}} \]

Unlike the correlation, the odds ratio is not mechanically bounded by the marginal means, so it is the more natural association measure for binary data.

Master Simulated Dataset: continuous outcome

A small simulated dataset to illustrate the long format and the continuous fit. The continuous outcome carries subject random effects (\(b_0, b_1\)), so it is genuinely correlated within subject.

suppressPackageStartupMessages({
  library(dplyr); library(tidyr); library(tibble)
  library(ggplot2); library(nlme); library(broom); library(geepack)
})

set.seed(667)
N <- 120
n_i  <- sample(3:6, N, replace = TRUE)
id   <- rep(1:N, times = n_i)
time <- unlist(lapply(n_i, function(n) 0:(n - 1)))
trt  <- rep(rbinom(N, 1, 0.5), times = n_i)

# subject random effects induce within-subject correlation
b0 <- rep(rnorm(N, 0, 1.2), times = n_i)
b1 <- rep(rnorm(N, 0, 0.4), times = n_i)
y  <- 10 + 0.6 * time + 1.0 * trt + 0.5 * (trt * time) +
      b0 + b1 * time + rnorm(length(id), 0, 1)

Master Simulated Dataset: discrete outcomes

# Discrete outcomes generated INDEPENDENT within subject (illustration only)
lp_bin  <- -1 + 0.25 * time + 0.7 * trt
y_bin   <- rbinom(length(id), 1, plogis(lp_bin))

mu_cnt  <- exp(1.2 + 0.15 * time + 0.3 * trt)
y_count <- rpois(length(id), mu_cnt)

cuts <- quantile(y, probs = c(.25, .5, .75))
ordinal_y <- cut(y, breaks = c(-Inf, cuts, Inf),
                 labels = c("L", "M", "H", "VH"), ordered_result = TRUE)

dat <- tibble(
  id   = factor(id),
  time = as.integer(time),
  trt  = factor(trt, levels = c(0, 1), labels = c("Ctl", "Tx")),
  y = y, y_bin = as.integer(y_bin),
  y_count = as.integer(y_count), ordinal_y = ordered(ordinal_y)
)

Warning

y_bin and y_count are generated independent within subject (no random effect), so they understate the correlation the lecture motivates. They illustrate the specification only. The real correlated discrete analysis is the epilepsy Case Study below.

Long Format: One Subject’s Rows

head(dat, 8)

Long Format: One Subject’s Rows

# A tibble: 8 × 7
  id     time trt       y y_bin y_count ordinal_y
  <fct> <int> <fct> <dbl> <int>   <int> <ord>    
1 1         0 Ctl   11.5      1       3 M        
2 1         1 Ctl   12.4      1       5 H        
3 1         2 Ctl   13.0      1       2 H        
4 2         0 Tx    10.8      0       2 M        
5 2         1 Tx    12.7      0       3 H        
6 2         2 Tx    13.2      0       8 H        
7 2         3 Tx    16.0      0       6 VH       
8 3         0 Tx     8.77     0       7 L        

Visualizing Imbalance

ggplot(dat, aes(time, y, group = id, color = trt)) +
  geom_line(alpha = 0.15) +
  stat_summary(aes(group = trt), fun = mean, geom = "line", linewidth = 1.2) +
  labs(title = "Unbalanced Trajectories by Treatment (continuous y)",
       y = "Continuous outcome y") +
  theme_minimal()

Visualizing Imbalance

Implicit Mean Assumption

The marginal mean depends only on the current covariate:

\[ E(Y_{ij}\mid X_{i1},\dots,X_{in_i})=E(Y_{ij}\mid X_{ij}) \]

This holds for design-fixed covariates (fixed by the study design, e.g. randomized treatment, planned visit time).

   time ->        X_1  ->  Y_1     X_2  ->  Y_2     X_3  ->  Y_3
                            |                |
   FORBIDDEN feedback:      +----> X_2        +----> X_3   (Y_(j-1) -> X_j)

It fails when a future covariate responds to a past outcome (feedback), e.g. a clinician raising the dose after a poor prior response.

Outcome-Type Specifications (one table)

All four use the same three-part recipe; only the link and variance function change.

Outcome Link \(g(\mu)\) Variance \(v(\mu)\) \(\beta\) interpretation
Continuous identity \(\mu\) \(\phi\) (constant) population-avg mean difference / slope
Count log \(\log\mu\) \(\phi\,\mu\) population-avg log rate ratio
Binary logit \(\log\frac{\mu}{1-\mu}\) \(\mu(1-\mu)\) population-avg log odds ratio
Ordinal cumulative logit (multinomial) population-avg log cumulative-odds ratio

Association in every row: a working covariance \(V_i(\alpha)\) (exchangeable, AR(1), unstructured, …).

Ordinal Specification (Proportional Odds)

Genuinely new structure: cumulative dichotomizations with shared slopes.

\[\Pr(Y_{ij}\le k\mid X_{ij})=F_{ijk}, \qquad \log\frac{F_{ijk}}{1-F_{ijk}}=\kappa_k - X_{ij}\beta\]

  • \(\kappa_k\): category-specific thresholds (cut-points), \(\kappa_1<\dots<\kappa_{K-1}\) (FLW writes these \(\alpha_k\); we use \(\kappa_k\) to keep \(\alpha\) for the GEE working correlation)
  • \(\beta\): population-average log cumulative-odds ratios, shared across cut-points (the proportional-odds assumption); the minus-sign form matches L11 and R’s polr, so a positive \(\beta\) pushes toward higher categories

Warning

A proper marginal ordinal fit needs GEE for cumulative logits (e.g. multgee::ordLORgee, repolr) to account for within-subject clustering. An ordinary MASS::polr fit assumes independence and gives the wrong SEs, so we defer the worked ordinal fit to L13. Here we present only the specification.

GEE bridge: how to read geeglm (estimation is L13)

Reading a geeglm call before we derive GEE

The estimation machinery (estimating equations, the sandwich variance) is next lecture (L13). For now, read a geepack::geeglm fit as a black box:

  • id = is the clustering variable (which rows belong to the same subject)
  • corstr = is the working correlation ("exchangeable", "ar1", "independence", …)
  • the printed Std.err IS the robust / sandwich SE (valid even if corstr is misspecified)

Continuous Outcome: Fit (gls)

dat_cont <- dat |> dplyr::select(id, time, trt, y)

fit_cont <- gls(
  y ~ time * trt,
  data = dat_cont,
  correlation = corAR1(form = ~ time | id),
  method = "REML"
)

coef(summary(fit_cont))

Continuous Outcome: Fit (gls)

                Value  Std.Error   t-value       p-value
(Intercept) 9.8480676 0.25177579 39.114434 6.794428e-159
time        0.6358133 0.08095063  7.854335  2.221246e-14
trtTx       1.0093787 0.35894546  2.812067  5.103470e-03
time:trtTx  0.6150262 0.11373305  5.407630  9.648125e-08

Continuous: Plain-Language Interpretation

Term Estimate Interpretation
(Intercept) ~10 Mean outcome at time 0 for control group
time ~0.6 Control group increases ~0.6 units per period
trtTx ~1.0 At time 0, treatment group ~1.0 units higher
time:trtTx ~0.5 Treatment group increases ~0.5 units/time faster

In a sentence: both groups increase over time, but the treatment group starts higher and increases faster, so the separation grows.

Case Study: Epilepsy Seizure Counts

The data (FLW Ch. 12-14 canonical discrete example)

A placebo-controlled trial of the anti-epileptic drug progabide in 59 patients with partial seizures (Thall & Vail 1990; FLW Ch. 12-14). Seizure counts were recorded over four successive 2-week post-randomization intervals, plus an 8-week baseline count.

  • Outcome: seizures (count) at visits 1-4
  • Covariates: trt (placebo / progabide), log baseline rate, visit
  • Question: does progabide reduce the seizure rate vs placebo?

Case Study: Load and Reshape (wide -> long)

epi_url  <- "https://content.sph.harvard.edu/fitzmaur/ala2e/epilepsy.txt"
epi_file <- "../../data/epilepsy.txt"
if (!file.exists(epi_file)) {
  dir.create(dirname(epi_file), showWarnings = FALSE, recursive = TRUE)
  download.file(epi_url, epi_file)
}

# FLW .txt has a comment header; data rows start at the first "<int> <int> ..."
raw <- readLines(epi_file)
data_start <- grep("^\\s*[0-9]+\\s+[0-9]", raw)[1]
epi_wide <- read.table(text = raw[data_start:length(raw)])
names(epi_wide) <- c("id", "trt", "age", "baseline", "c1", "c2", "c3", "c4")

epi <- epi_wide |>
  tidyr::pivot_longer(c(c1, c2, c3, c4),
                      names_to = "visit", values_to = "seizures") |>
  dplyr::mutate(
    visit = as.integer(gsub("c", "", visit)),
    trt   = factor(trt, levels = c(0, 1),
                   labels = c("Placebo", "Progabide")),
    lbase = log(baseline / 4)   # 8-wk baseline -> per-2-wk rate scale
  )

head(epi, 6)

Case Study: Load and Reshape (wide -> long)

# A tibble: 6 × 7
     id trt       age baseline visit seizures lbase
  <int> <fct>   <int>    <int> <int>    <int> <dbl>
1     1 Placebo    31       11     1        5  1.01
2     1 Placebo    31       11     2        3  1.01
3     1 Placebo    31       11     3        3  1.01
4     1 Placebo    31       11     4        3  1.01
5     2 Placebo    30       11     1        3  1.01
6     2 Placebo    30       11     2        5  1.01

Case Study: Marginal Poisson GEE

fit_epi <- geeglm(
  seizures ~ trt + lbase,
  id      = id,
  data    = epi,
  family  = poisson(link = "log"),
  corstr  = "exchangeable"
)

summary(fit_epi)

Case Study: Marginal Poisson GEE


Call:
geeglm(formula = seizures ~ trt + lbase, family = poisson(link = "log"), 
    data = epi, id = id, corstr = "exchangeable")

 Coefficients:
             Estimate Std.err   Wald Pr(>|W|)    
(Intercept)   -0.3042  0.3651  0.694    0.405    
trtProgabide  -0.1065  0.1940  0.301    0.583    
lbase          1.1744  0.1487 62.390 2.78e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     4.82   1.155
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha   0.4051 0.07368
Number of clusters:   59  Maximum cluster size: 4 

Case Study: Rate Ratios and Interpretation

est <- coef(summary(fit_epi))
rr  <- exp(cbind(RR = est[, "Estimate"],
                 Lower = est[, "Estimate"] - 1.96 * est[, "Std.err"],
                 Upper = est[, "Estimate"] + 1.96 * est[, "Std.err"]))
round(rr, 3)

Interpretation (rate ratios, population-average):

  • Progabide vs placebo: \(\widehat{RR}\approx 0.90\) (95% CI crosses 1) - no significant reduction in seizure rate once baseline is accounted for.
  • Log baseline: \(\widehat{RR}\approx 3.2\) per unit - the dominant predictor; higher baseline rate strongly predicts more post-randomization seizures.

Case Study: Rate Ratios and Interpretation

        RR Lower Upper
[1,] 0.738 0.361 1.509
[2,] 0.899 0.615 1.315
[3,] 3.236 2.418 4.331

Case Study: Model-Based vs Robust SEs

# Naive Poisson GLM ignores clustering: model-based SEs
fit_naive <- glm(seizures ~ trt + lbase, data = epi,
                 family = poisson(link = "log"))

data.frame(
  term       = c("trtProgabide", "lbase"),
  naive_SE   = summary(fit_naive)$coef[c("trtProgabide", "lbase"), "Std. Error"],
  robust_SE  = coef(summary(fit_epi))[c("trtProgabide", "lbase"), "Std.err"]
) |> dplyr::mutate(ratio = round(robust_SE / naive_SE, 2))

Warning

The naive GLM SE for treatment is roughly one-quarter of the GEE robust SE: ignoring the within-subject correlation gives invalid (overconfident) inference. The point estimate barely moves; only the SEs change.

Case Study: Model-Based vs Robust SEs

                     term naive_SE robust_SE ratio
trtProgabide trtProgabide  0.04527    0.1940  4.29
lbase               lbase  0.03084    0.1487  4.82

The Same Model in SAS

Static reference (not run here): the marginal Poisson GEE in PROC GENMOD with a REPEATED statement.

proc genmod data=epi;
   class id trt;
   model seizures = trt lbase / dist=poisson link=log;
   repeated subject=id / type=exch corrw;
run;

Note

repeated subject=id is SAS’s id=; type=exch is corstr="exchangeable". corrw prints the working correlation. SAS reports empirical (robust) standard errors by default, the same sandwich SE as geeglm’s Std.err.

Why Full Likelihood Is Hard

needed_terms <- function(m) 2^m - m - 1
dfc <- data.frame(m = 4:12, extra_assoc = sapply(4:12, needed_terms))

ggplot(dfc, aes(m, extra_assoc)) +
  geom_point() + geom_line() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Explosion of Association Parameters",
       x = "Number of repeats (m)",
       y = "Parameters beyond means (2^m - m - 1)") +
  theme_minimal()

Why Full Likelihood Is Hard

Full Likelihood vs Marginal + GEE

Approach Challenge
Full likelihood Needs 2-, 3-, … way associations (infeasible)
Marginal + GEE Specify mean, variance, working \(V_i(\alpha)\); robust SEs

GEE wins for practicality and interpretability.

Check Your Understanding: Specifications and the Case Study

  1. For a count outcome with a log link, what does \(\beta\) represent in a marginal model?

  2. In the epilepsy Case Study, the naive and robust SEs differed by ~4x but \(\hat\beta\) barely moved. Why?

  3. Why did we present the ordinal model but not fit it with polr?

Answers:

  1. A population-average log rate ratio: \(\exp(\beta)\) is the multiplicative change in the average event rate per unit covariate, across the population.

  2. The point estimate is consistent regardless of the working correlation, so it is stable. The naive GLM assumes independence and badly underestimates the SE because the four visits per patient are strongly correlated; the GEE robust SE corrects this. Association affects precision, not the estimate.

  3. polr assumes independence and ignores the within-subject clustering, giving the wrong (overconfident) SEs - the exact error this lecture warns against. A proper marginal ordinal fit needs GEE for cumulative logits (L13).

Practical Decisions: Model Selection

Question Type Best Model
Population-average effects Marginal (GEE)
Subject-specific trajectories Mixed effects
Dependence on past \(Y\) (preview) Transition model

Note

The transition row is a forward pointer (not FLW Ch. 12); transition models are covered in their own lecture / FLW transition chapter.

Looking Ahead (beyond Ch. 12): Missing Data

Missing-data caveat (covered in L17, FLW Ch. 17-18)

  • Standard GEE requires data MCAR (missing completely at random) for valid inference.
  • IPW-GEE (inverse-probability-weighted GEE; Robins, Rotnitzky & Zhao 1995) extends validity to MAR by weighting observations by the inverse probability of being observed and modeling the dropout process.
  • The MCAR / MAR definitions and the IPW weighting mechanics are developed in L17 (FLW Ch. 17-18). FLW Ch. 12 (p. 352) explicitly defers missing-data issues to those chapters.

Common Mistakes in Marginal Models

Mistake Why It Is Wrong Correct Approach
Interpreting \(\beta\) as subject-specific Marginal \(\beta\) is population-average Use mixed models for subject-specific effects
Ignoring within-subject correlation Gives invalid inference / incorrect SEs Account for it via GEE or mixed models
Using standard GEE with MAR dropout Standard GEE assumes MCAR (L17) IPW-GEE or likelihood-based methods
Treating all outcome types the same Binary/ordinal have constrained associations Use the appropriate link and variance functions
Confusing efficiency with consistency Wrong \(V_i(\alpha)\) loses efficiency, not consistency Choose sensible \(V_i(\alpha)\); robust SEs protect
Assuming marginal \(=\) conditional for non-linear links They differ for logistic/Poisson (marginal attenuated vs conditional; quantified in L14/L16, FLW Ch. 16) Be explicit about which target you mean

Key Takeaways

  • Marginal model = mean + variance + association (\(g\), \(v(\mu)\), \(V_i(\alpha)\))
  • \(\beta\) = population-average effect (log rate/odds ratio for count/binary), robust to the working correlation
  • The working correlation \(V_i(\alpha)\) affects precision, not interpretation; robust SEs stay valid
  • Case Study (epilepsy): progabide \(\widehat{RR}\approx 0.90\) (not significant); robust SEs ~4x the naive ones
  • For discrete data, full joint likelihoods are intractable; use GEE (L13)
  • Standard GEE assumes MCAR; missing-data extensions in L17

Further Reading

  • FLW (2011) Ch. 12, “Marginal Models: Introduction and Overview” (pp. 341-352), and its Further Reading section.
  • Liang, K.-Y. & Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22. (The original GEE paper; estimation in L13.)
  • Robins, J.M., Rotnitzky, A. & Zhao, L.P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. JASA 90, 106-121. (IPW-GEE; missing data in L17.)
  • Thall, P.F. & Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics 46, 657-671. (The epilepsy data.)

Next lecture (L13): GEE estimation - the estimating equations, the sandwich variance, and choosing the working correlation.