random-intercept variance (\(=g_{11}\), the single element of \(G\) with one random intercept)
\(h(\cdot)\)
inverse link \(h=g^{-1}\); for logistic \(h=\operatorname{expit}=\operatorname{logit}^{-1}=\texttt{plogis}\)
\(c\)
attenuation factor relating \(\boldsymbol\beta\) to \(\boldsymbol\beta^*\) (logistic random intercept)
\(\pi_{ij}\)
\(P(Y_{ij}=1)\), binary success probability (from L11)
\(\alpha\)
GEE working-correlation / log-OR association parameter (from L13)
Key Terminology
Term
Definition
Estimand
The population quantity we seek to estimate (e.g., population odds ratio)
Estimator
The statistical method/procedure that produces a number (e.g., GEE coefficient)
Estimate
The actual number obtained from applying the estimator to data
Non-collapsibility\(^\dagger\)
Marginal OR \(\neq\) conditional OR even without confounding (binary outcomes)
Attenuation
Reduction in marginal effect magnitude vs conditional effect due to averaging over random effects
\(^\dagger\) Beyond FLW Ch. 16 (the chapter frames the same gap purely as different targets of inference plus attenuation of \(\boldsymbol\beta^*\) relative to \(\boldsymbol\beta\)). “Non-collapsibility” is the standard term from Greenland, Robins & Pearl (1999) and Agresti (2002, Section 12.2, which FLW cites); we use it as a labeled aside.
Visual Decision Framework
Research Question
|
What is the target of inference?
/ \
Population Effect? Individual Effect?
| |
Marginal Model (GEE) Mixed Model (GLMM)
| |
Population OR/RR Subject-specific OR/RR
Key insight: Choose your estimand BEFORE choosing your software.
The Fundamental Distinction
Imagine 1000 patients with hypertension starting a new drug.
Question Type
Question
Model
Population-average
If we give this drug to all 1000 patients, what is the average reduction in systolic BP?
Marginal (GEE)
Subject-specific
For Patient #247 in my clinic, how much will her BP change?
Mixed (GLMM)
When Differences Are Largest
These are different questions requiring different models.
Divergence is greatest when:
Outcomes are non-linear (binary, count data)
There is high between-subject heterogeneity (\(\sigma_b^2\) is large)
Concrete Example: Smoking Cessation
A new smoking cessation program is being evaluated:
Model
Odds Ratio
Interpretation
GLMM
3.0
Individual’s odds of quitting triple
GEE
2.0
Population odds of quitting double
Why the difference? High variability in individual responses.
Concrete Example: Who Uses Which?
Stakeholder
Uses OR
Reason
FDA
2.0 (GEE)
“What’s the population benefit?”
Clinician
3.0 (GLMM)
“Will this help my patient?”
Both estimates are correct for their intended purpose.
\(\beta\) has marginal interpretation; GLS and LMM give same population inference
L12-13
GEE
Population-average approach; focus on \(\mu_i = \operatorname{E}(Y_i)\)
L14-15
GLMMs
Subject-specific approach; focus on \(\operatorname{E}(Y_i \mid b_i)\)
Today
Contrast
Why these yield DIFFERENT answers for non-linear models
The Key Realization
The “equivalence” we enjoyed with linear models (GLS \(\approx\) LME for \(\beta\)) BREAKS DOWN completely once we introduce non-linear link functions.
Central question: When do population-average and subject-specific parameters diverge, by how much, and which one should you report?
Part I: Concepts and the Linear Special Case
16.1 The Main Idea
Marginal and mixed models differ beyond correlation handling; they answer different questions.
Linear models mask this difference because \(\operatorname{E}(Y_i)=X_i\beta\) either way.
Once the link is non-linear, \(\beta\) (population) and \(\beta^*\) (subject) diverge.
Always start by writing down the estimand before picking software.
16.2 Linear Special Case: Why Equivalence Holds
With identity link, both GLS and LME imply \(\operatorname{E}(Y_i)=X_i\beta\).
Why? Averaging a linear function of \(b_i\) is straightforward: \[\operatorname{E}[Z_i b_i] = Z_i \operatorname{E}(b_i) = 0\]
Random effects only restructure the covariance: \[\boldsymbol\Sigma_i = \mathbf{Z}_i G \mathbf{Z}_i^\top + \sigma^2 \mathbf{I}\]
where \(G\) is the between-subject random-effects covariance and \(\sigma^2 \mathbf{I}\) is the within-subject measurement (error) covariance.
Therefore \(\boldsymbol\beta\) retains a marginal interpretation in linear mixed models: here \(\boldsymbol\beta = \boldsymbol\beta^*\).
16.2 Quick Derivation
Start from the LME conditional mean: \[\operatorname{E}(\mathbf{Y}_i \mid \mathbf{b}_i) = \mathbf{X}_i\boldsymbol\beta + \mathbf{Z}_i \mathbf{b}_i\]
Apply law of total expectation: \[\operatorname{E}(\mathbf{Y}_i) = \operatorname{E}\big[\operatorname{E}(\mathbf{Y}_i \mid \mathbf{b}_i)\big]\]
Marginal mean requires integrating the inverse link \(h\) over \(\mathbf{b}_i\): \[\mu_i = \int h(\mathbf{X}_i\boldsymbol\beta^* + \mathbf{Z}_i \mathbf{b}_i)\, f(\mathbf{b}_i)\,d\mathbf{b}_i\]
Result: \(g(\mu_i) \neq \mathbf{X}_i \boldsymbol\beta^*\) in general; the link form is not preserved.
16.3 Attenuation Insight
For a logistic GLMM with a random intercept: \[\text{logit}\{\operatorname{E}(Y_{ij}\mid b_i)\} = X_{ij}\boldsymbol\beta^* + b_i, \qquad b_i \sim N(0,\sigma_b^2)\]
Approximate marginal relationship (Zeger, Liang & Albert, 1988; FLW p. 477): \[\text{logit}\{\operatorname{E}(Y_{ij})\} \approx c \, X_{ij}\boldsymbol\beta^*, \qquad c = \left(1 + c_2^{\,2}\,\sigma_b^2\right)^{-1/2}\]
where the named constant is \(c_2 = \dfrac{16\sqrt{3}}{15\pi} \approx 0.588\), so the multiplier of \(\sigma_b^2\) is \(c_2^{\,2} \approx 0.346\).
Two numbers, one constant. \(c_2 = 16\sqrt{3}/(15\pi) \approx 0.588\) is the named Zeger-Liang-Albert constant; the quantity that multiplies \(\sigma_b^2\) inside the attenuation factor is \(c_2^{\,2} \approx 0.346\). Some texts (and FLW p. 477) report the constant as \(0.588\) and write the multiplier as \(c_2^2 = 0.346\); the deck uses \(0.346\) as the multiplier throughout.
Important Note on the Attenuation Constant
The multiplier \(c_2^{\,2} \approx 0.346\) is an approximation specific to:
Worked example (\(\boldsymbol\beta^* = 1.5\), \(\sigma_b^2 = 4\)): \(\boldsymbol\beta_{\text{marg}} = 1.5/\sqrt{1 + 0.346(4)} = 1.5/1.544 \approx 0.97\) (factor \(c=0.65\)). Population OR \(\approx 2.6\) vs individual OR \(\approx 4.5\): same direction, weaker population effect.
Check Your Understanding: Attenuation
Question: A researcher reports that a GLMM for diabetes progression yields a treatment OR of 0.5 (protective effect) with random intercept variance \(\sigma_b^2 = 3.0\). They conclude: “Treatment cuts the population odds in half.”
What is wrong with this conclusion, and what is the correct population OR?
Answer: The researcher incorrectly interpreted the conditional OR as a marginal (population) effect.
Correct conclusion: “Within individuals, treatment cuts odds in half (OR = 0.5). At the population level, treatment reduces odds by about 38% (OR = 0.62).”
The population effect is attenuated toward the null due to averaging over heterogeneity.
Marginal mean factorizes: \[\mu_{ij} = \exp(X_{ij}\boldsymbol\beta^*)\, \operatorname{E}\{\exp(Z_{ij} b_i)\}\]
For a single random intercept, \(\log(\mu_{ij}) = X_{ij}\boldsymbol\beta^* + g_{11}/2\): fixed effects not in \(Z_{ij}\) carry over unchanged (only the intercept shifts by \(g_{11}/2\)).
Warning
Qualifier (same slide, do not skip): this clean carry-over holds only when the covariate has no random slope (it does not overlap \(Z_{ij}\)) and is independent of \(b_i\). With a random slope present, fixed-effect rate ratios do NOT carry over unchanged.
Important Clarification: Marginal Predictions from GLMMs
Common misconception: predict(glmer_fit, re.form = NA) gives marginal predictions.
Reality (FLW textbook, p. 477):
re.form = NA gives conditional predictions at \(b_i = 0\) (the “typical” subject)
For non-linear links: \(\operatorname{E}(Y \mid b = 0) \neq \operatorname{E}(Y)\)
True marginal predictions require integrating over the random effects distribution
Helper Function for True Marginal Predictions
# CAVEAT: this helper assumes a SINGLE random intercept and a BINOMIAL family.# It does NOT handle random slopes or non-binomial GLMMs.glmm_marginal_prob <-function(fit, newdata, draws =2000, seed =667) {stopifnot(family(fit)$family =="binomial")set.seed(seed) # reproducible Monte Carlo (course seed 667)# Fixed-part linear predictor at b = 0 (re.form = NA): X*beta-star only eta_fix <-predict(fit, newdata = newdata, re.form =NA, type ="link")# VarCorr(fit)[[1]] is the covariance of the FIRST grouping factor; its# "stddev" attribute is the random-intercept SD (a 1x1 here -> sigma_b). sd_b <-as.numeric(attr(VarCorr(fit)[[1]], "stddev"))# Monte Carlo integration over b ~ N(0, sigma_b^2): for each of `draws`# random intercepts, push eta_fix + b through plogis (= expit = h), then# average the probabilities across draws (rowMeans over the draw columns).rowMeans(sapply(1:draws, function(...) plogis(eta_fix +rnorm(1, 0, sd_b))))}
Helper Function for True Marginal Predictions
Logistic Attenuation: Simulation Setup
set.seed(667)# Subject-specific parameters (conditional on random effects)beta0 <--1.5# Log-odds at time=0 for average individualbeta1 <-0.75# Subject-specific slope (change per unit time)sigma_b <-2# SD of random intercepts (heterogeneity); variance = 4# Time grid for plottingtime_grid <-seq(-4, 8, by =0.25)# Simulate 25 individuals with varying interceptssubject_b <-rnorm(25, mean =0, sd = sigma_b)# Generate subject-specific curvessubject_curves <- tidyr::expand_grid(time = time_grid, b = subject_b) |>mutate(linear_predictor = beta0 + beta1 * time + b,prob =plogis(linear_predictor),curve_id =factor(round(b, 2)) )# Generate marginal curve by Monte Carlo integrationmarginal_curve <-tibble(time = time_grid,prob =vapply( time_grid,function(t) mean(plogis(beta0 + beta1 * t +rnorm(2000, 0, sigma_b))),numeric(1) ))
Logistic Attenuation: Simulation Setup
Logistic Attenuation: Visualization
logit_plot <-ggplot(subject_curves, aes(x = time, y = prob, group = curve_id)) +geom_line(alpha =0.35, color ="#1f77b4") +geom_line(data = marginal_curve, aes(x = time, y = prob),inherit.aes =FALSE, color ="#d62728", linewidth =1.2) +coord_cartesian(ylim =c(0, 1)) +labs(x ="Time",y ="Probability of success",title ="Subject-specific vs marginal logistic curves",subtitle ="Random intercept variance = 4; slopes attenuate after averaging" ) +theme_minimal(base_size =16)print(logit_plot)
Logistic Attenuation: Visualization
Conditional curves (blue) rise faster than the marginal mean (red). Subject-specific slope = 0.75 yields flatter population slope.
Effect of Heterogeneity on Marginal Curves
# Low heterogeneity (variance = 0.5)sigma_b_low <-sqrt(0.5)marginal_low <-tibble(time =seq(-4, 8, 0.5),prob =vapply(seq(-4, 8, 0.5),function(t) mean(plogis(beta0 + beta1 * t +rnorm(1000, 0, sigma_b_low))),numeric(1)),heterogeneity ="Low (var=0.5)")# High heterogeneity (variance = 9)sigma_b_high <-3marginal_high <-tibble(time =seq(-4, 8, 0.5),prob =vapply(seq(-4, 8, 0.5),function(t) mean(plogis(beta0 + beta1 * t +rnorm(1000, 0, sigma_b_high))),numeric(1)),heterogeneity ="High (var=9)")# Plot comparisoncomparison_plot <-ggplot() +geom_line(data = marginal_low, aes(x = time, y = prob),color ="#2ca02c", linewidth =1.5) +geom_line(data = marginal_high, aes(x = time, y = prob),color ="#ff7f0e", linewidth =1.5) +labs(title ="Impact of heterogeneity on marginal curves",subtitle ="Green: Low heterogeneity (steep) | Orange: High heterogeneity (flat)",x ="Time", y ="Probability") +theme_minimal(base_size =14)print(comparison_plot)
Effect of Heterogeneity on Marginal Curves
Logistic Attenuation: Key Observations
Observation
Explanation
Subject curves (blue) are steeper
They condition on individual \(b_i\)
Marginal curve (red) is flatter
Averaging shrinks the slope because logit is non-linear
Higher \(\sigma_b^2\) = more attenuation
More heterogeneity to average over
Direction preserved
Magnitude shrinks toward null
16.4 Simple Numerical Illustration
Consider a simple setting: binary responses depend only on time with a random intercept.
ggplot(comparison_df, aes(x = time)) +geom_line(aes(y = prob_b0), color ="#1f77b4", linetype ="dashed", linewidth =1.2) +geom_line(aes(y = prob_marginal), color ="#d62728", linewidth =1.2) +annotate("text", x =2, y =0.9, label ="P(Y=1 | b=0)", color ="#1f77b4", size =5) +annotate("text", x =2, y =0.65, label ="True E[Y]", color ="#d62728", size =5) +labs(x ="Time",y ="Probability",title ="Conditional at b=0 vs True Marginal Probability",subtitle =expression(paste("GLMM with ", sigma[b], " = 2; b=0 overestimates probabilities near 0.5")) ) +coord_cartesian(ylim =c(0, 1)) +theme_minimal(base_size =14)
16.4 Visual Comparison
The b=0 curve (blue dashed) is NOT the marginal curve (red solid). The difference is largest in the middle range.
Why Is the Difference Largest Near p = 0.5?
Note
Direction of bias (Jensen): expit is concave above \(p=0.5\) and convex below, so averaging over \(b_i\) pulls the marginal probability toward 0.5. The marginal slope is therefore flatter than the subject-specific slope: \(|\boldsymbol\beta| < |\boldsymbol\beta^*|\).
Difference Near p = 0.5: When It Matters
Region
Curve shape
Effect of averaging over \(b_i\)
Near \(p = 0.5\) (\(\eta \approx 0\))
Steep
Large: marginal pulled toward 0.5, slope flattens
Near \(p = 0\) or \(1\)
Flat (saturated)
Small: marginal \(\approx\) prediction at \(b = 0\)
Takeaway: the marginal-vs-conditional gap is largest for steep regions of the link and large \(\sigma_b^2\).
16.4 Key Takeaway
Prediction Type
What It Represents
How to Compute
re.form = NA
\(E[Y \mid b = 0]\) (typical subject)
predict(fit, re.form = NA)
True marginal
\(E[Y]\) (population average)
Integrate over \(f(b)\)
The difference matters most when:
\(\sigma_b^2\) is large (high heterogeneity)
Predictions are near probability 0.5 (steepest part of logistic curve)
This bridges to the ECG crossover case study where we compare GEE vs GLMM estimates.
Part II: ECG Case Study and Choosing a Model
16.5 ECG Crossover Overview
Study design (FLW Table 16.1; Jones & Kenward 1989):
67 patients randomized to sequences: Placebo\(\to\)Active (34) or Active\(\to\)Placebo (33)
Binary response each period: abnormal ECG (1) vs normal (0)
Sampling zero in Sequence 1, pattern (1,0) - sparse cell issue
Goal: Compare treatment effect estimates from marginal vs mixed models.
ECG Crossover: Key Questions
Stakeholder
Question
FDA
“What’s the population risk increase if we approve this drug?”
Clinician
“Should I prescribe this to my patient with cardiac history?”
16.5 Crossover Design: Sequence x Period -> Treatment
Period 1 Period 2
+-------------+ +-------------+
Seq P->A | Placebo | | Active | treatment = 0, then 1
(34 pts) | (trt = 0) | | (trt = 1) |
+-------------+ +-------------+
Seq A->P | Active | | Placebo | treatment = 1, then 0
(33 pts) | (trt = 1) | | (trt = 0) |
+-------------+ +-------------+
Each subject contributes two binary ECG responses; treatment is the active/placebo indicator for that period, decoded from (sequence, period).
16.5 Data Reconstruction
ecg_counts <-tribble(~sequence, ~resp_p1, ~resp_p2, ~count,"P-A", 1, 1, 6,"P-A", 1, 0, 0, # Sampling zero"P-A", 0, 1, 6,"P-A", 0, 0, 22,"A-P", 1, 1, 9,"A-P", 1, 0, 4,"A-P", 0, 1, 2,"A-P", 0, 0, 18)ecg_long <- ecg_counts |> tidyr::uncount(count) |># one UNIQUE subject id per expanded row (67 patients, 2 rows each).# Do NOT reuse uncount's within-cell .id: that repeats ids across cells.mutate(id =factor(row_number())) |> tidyr::pivot_longer(cols =c(resp_p1, resp_p2),names_to ="period",values_to ="response" ) |>mutate(period =if_else(period =="resp_p1", 1L, 2L),# decode active-drug indicator from (sequence, period); see schematictreatment =case_when( sequence =="P-A"& period ==1L ~0L, sequence =="P-A"& period ==2L ~1L, sequence =="A-P"& period ==1L ~1L,TRUE~0L ) ) |>arrange(id, period)cat("Subjects:", length(unique(ecg_long$id)),"| rows:", nrow(ecg_long), "\n\n")
# Marginal model (GEE), exchangeable working correlationgee_fit <- geepack::geeglm( response ~ treatment + period,id = id,family = binomial,corstr ="exchangeable",data = ecg_long)# Note: only 2 obs/subject, so robust SEs can be biased downward# (small-sample corrections shown later).summary(gee_fit)
Note
Fidelity note. FLW Table 16.2 fits the marginal model with the within-subject association modeled as a common log odds ratio\(\alpha\) (an ALR-type GEE), reporting \(\hat\alpha = 3.56\). We use an exchangeable working correlation as a close software substitute; the treatment estimate (log-odds \(\approx 0.57\), OR \(\approx 1.77\)) reproduces FLW Table 16.2, but the association is parameterized differently.
# Mixed model (GLMM): 100-point adaptive Gaussian quadrature (FLW Table 16.3).# Laplace (nAGQ = 1) is inadequate here because Var(b) is very large (~24).glmm_fit <- lme4::glmer( response ~ treatment + period + (1| id),family = binomial,data = ecg_long,nAGQ =100,control = lme4::glmerControl(optimizer ="bobyqa", optCtrl =list(maxfun =2e5)))summary(glmm_fit)
Note
Fidelity note. FLW Table 16.3 uses 100-point adaptive Gaussian quadrature (nAGQ = 100). We match it. The default Laplace approximation (nAGQ = 1) is unreliable when \(\widehat{\operatorname{Var}}(b_i)\) is as large as here (about 24), so it is not used.
16.5 Model Fits: GLMM
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 100) [glmerMod]
Family: binomial ( logit )
Formula: response ~ treatment + period + (1 | id)
Data: ecg_long
Control:
lme4::glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
AIC BIC logLik -2*log(L) df.resid
144.3 155.9 -68.1 136.3 130
Scaled residuals:
Min 1Q Median 3Q Max
-1.108 -0.236 -0.122 0.262 1.364
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 24.4 4.94
Number of obs: 134, groups: id, 67
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.119 2.276 -2.25 0.024 *
treatment 1.863 0.927 2.01 0.044 *
period 1.038 0.819 1.27 0.205
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) trtmnt
treatment -0.691
period -0.820 0.423
16.5 The Same Two Fits in SAS
Marginal model (GEE) – PROC GENMOD with a REPEATED statement:
proc genmod data=ecg descending;
class id period;
model response = treatment period / dist=bin link=logit;
repeated subject=id / type=exch; /* exchangeable working correlation */
run;
Mixed model (GLMM) – PROC GLIMMIX with a RANDOM intercept:
proc glimmix data=ecg method=quad(qpoints=100); /* adaptive Gaussian quadrature */
class id;
model response = treatment period / dist=bin link=logit solution;
random intercept / subject=id;
run;
This slide is reference only (not executed). qpoints=100 mirrors FLW Table 16.3; type=exch mirrors the working-correlation GEE.
cat("\nNote: prob_marginal < prob_conditional due to averaging over heterogeneity\n")
Note: prob_marginal < prob_conditional due to averaging over heterogeneity
16.5 Treatment Effect Comparison
Results interpretation (from computed values):
Model
OR (approx)
Interpretation
GEE
1.77
In the study population, active drug raises the odds of an abnormal ECG by roughly 77%
GLMM
6.44
For a given patient, active drug raises that patient’s odds by a larger factor (conditional on their propensity \(b_i\))
Why the difference? Marginal model averages over heterogeneity; mixed model conditions on \(b_i\). With \(\widehat{\operatorname{Var}}(b_i)\) very large and only 2 obs/subject, the GLMM OR is a point estimate from a weakly identified\(\sigma_b\), so read it as illustrative of direction and order of magnitude.
Interpreting the ECG Results: A Template
For the FDA (population-average perspective):
“Active drug increased the population odds of abnormal ECG by approximately 77% (OR = 1.77, 95% CI: 1.12 to 2.79). This population-level estimate accounts for the average effect across all patients regardless of their baseline cardiac status.”
For the clinician (subject-specific perspective):
“For an individual patient, the active drug raises that patient’s personal odds of an abnormal ECG by a substantial factor (OR \(\approx\) 6.44, 95% CI: 1.05 to 39.64), holding their baseline cardiac propensity constant. The wide interval reflects very large, imprecisely estimated between-patient variability.”
Key point: Neither interpretation is “wrong” – they answer different questions! Report the conditional OR with its uncertainty, not as a precise percentage.
Check Your Understanding: ECG Case Study
Question: The ECG study shows a GEE OR of 1.77 and a GLMM OR of 6.44. A colleague asks: “Which one is the true effect?”
How should you respond?
Answer: There is no single “true effect” – both estimates are correct for their respective estimands.
Key points to explain:
Different questions: GEE answers “What happens on average in the population?” while GLMM answers “What happens within a given individual?”
Non-collapsibility: For binary outcomes, marginal and conditional ORs are mathematically different even without confounding. This is a property of odds ratios, not a flaw in either method.
Practical guidance: Report the GEE estimate for regulatory submissions (population benefit). Report the GLMM estimate for clinical decision-making (individual risk). If unsure, report both with clear interpretation.
The difference is expected: The GLMM OR being larger is consistent with attenuation theory. The marginal effect is always closer to 1 (null) due to averaging over heterogeneity.
Non-collapsibility in Numbers (no confounding)
Two subgroups of equal size, conditional (within-subgroup) OR = \(9\) in both (so nothing is confounded):
Subgroup
Treated \(P(Y=1)\)
Control \(P(Y=1)\)
Conditional OR
Low-risk
0.50
0.10
\(\frac{.5/.5}{.1/.9}=9\)
High-risk
0.90
0.50
\(\frac{.9/.1}{.5/.5}=9\)
Pooled (marginal)
0.70
0.30
\(\frac{.7/.3}{.3/.7}=5.44\)
Marginal OR \(= 5.44 \neq 9 =\) conditional OR, with no confounder. The odds ratio is non-collapsible\(^\dagger\).
Which to Report?
Context
Report
Reason
Clinical Study Report (CSR) for FDA
GEE estimate
Population-level inference
Patient counseling
GLMM estimate
Individual-level inference
Both are correct – they answer different questions!
16.5 Small-Sample GEE Variance Adjustments
With only 2 observations per subject, standard GEE robust SEs can be biased downward.
Key point: The 0.25 correction is a diagnostic probe, not a solution.
16.5 Robustness Check with Conditional Logistic
# Conditional logistic (fixed-effects / within-subject), stratified by subject.# In a 2-period crossover, treatment and period are confounded WITHIN subject,# so we condition on subject and estimate the treatment effect alone; adding# period to the conditional model is not separately identified here.clogit_fit <- survival::clogit( response ~ treatment +strata(id),data = ecg_long)clogit_or <-exp(coef(clogit_fit)["treatment"])cat("Conditional logistic OR for treatment:", round(clogit_or, 2), "\n")
Note
In the Section 16.5 case study, FLW instead uses conditional ML (the method of Section 14.6) after adding \(1/4\) to the sampling-zero cell, obtaining \(\widehat\beta_2^* = 1.94\) (close to the GLMM’s \(1.86\)), with a Hausman-type test (Tchetgen & Coull 2006, Section 14.6) giving \(Z = 0.40\) (\(p > 0.65\)): no evidence against the normal random-effects assumption.
16.5 Robustness Check with Conditional Logistic
Conditional logistic OR for treatment: 5
cat("Compare to GLMM OR:", round(contrast_tbl$odds_ratio[2], 2), "\n")
Compare to GLMM OR: 6.44
cat("\nSame direction and order of magnitude as the GLMM:","the subject-specific effect is robust to the normal-b_i assumption.\n")
Same direction and order of magnitude as the GLMM: the subject-specific effect is robust to the normal-b_i assumption.
16.5 Additional Diagnostics
For sparse data with small \(n_i\):
Check
Purpose
Separation issues
Logistic models can fail to converge
Exact logistic regression
For small samples
Random effects distribution
Sensitivity to normality assumption
Profile likelihood CIs
When Wald approximation is questionable
Key Messages for Sparse Data
Distributional assumptions on \(b_i\) matter most when \(n_i\) is small
With only 2 observations/subject, both approaches have limitations
Report sensitivity analyses alongside main results
16.6 Conclusion Highlights
Approach
Focus
Use Case
Marginal
\(\mu_i\); model mean and association separately
Policy, regulatory
Mixed
Individual trajectories; random effects explain correlation
Personalized medicine
Key insight: Mixed models do NOT automatically provide marginal summaries without extra integration.
16.6 No Grand Unified Model
FLW’s verdict: do not chase one model that answers every question. Different scientific questions demand different models.
“The megalomaniacal strategy of fitting a grand unified model… is dangerous, unnecessary and counterproductive.” – Drum & McCullagh (1993), quoted approvingly by FLW (p. 486)
One size does not fit all. Choose marginal or mixed by the question, and report both when the audience needs both.
Why GLMMs Do Not Solve Everything
Issue
Explanation
Marginal means derivable in principle
But link forms change
Misspecified random effects
Lead to biased population averages
Marginal models
Avoid that risk when target is population inference
Bottom line: One size does not fit all.
When to Choose Marginal vs Mixed Models
Scenario
Use Marginal (GEE)
Use Mixed (GLMM)
FDA submission
X
Public health policy
X
Population attributable risk
X
Personalized medicine
X
Risk calculators
X
Individual prediction
X
Mechanistic understanding
X
Can you fit both? YES! Report both when stakeholders need both estimands.
Common Misconceptions
Misconception
Reality
“Mixed models are always better”
They answer different questions; GEE optimal for marginal inference
“The GLMM \(\beta^*\) is biased because it’s larger”
Neither biased – they estimate different parameters
“re.form = NA gives marginal predictions”
NO! It gives \(E[Y \mid b=0]\), not \(E[Y]\). Must integrate over \(f(b)\).
More Misconceptions
Misconception
Reality
“Link function preserved after integration”
Only for identity (linear) and partially for log link
“Large SEs in GLMM indicate problems”
Expected with small \(n_i\) or large \(\sigma_b^2\)
Gives \(E[Y \mid b=0]\), not \(E[Y]\); use Monte Carlo integration
Convergence warnings with small clusters
Try different optimizers
Sparse cells
Require sensitivity analyses
High \(\sigma_b^2\)
Check quadrature points in GLMMs
Best Practices
State estimand before opening software
Check for sparse cells in contingency tables
Use proper integration for marginal predictions from GLMMs
Apply small-sample corrections for GEE with few clusters
Report both estimates when relevant
Common Mistakes When Contrasting Models
Mistake
Why It Is Wrong
Correction
Claiming GLMM is “biased” because OR differs from GEE
They estimate different parameters (conditional vs marginal)
Both are unbiased for their respective estimands
Expecting GEE and GLMM to match for binary outcomes
Non-collapsibility makes this impossible
Accept the difference as mathematically expected
Using re.form=NA for “marginal” predictions
This gives \(E[Y \mid b=0]\), not \(E[Y]\)
Use Monte Carlo integration over \(f(b)\)
Choosing model based on which gives “better” p-value
P-value shopping invalidates inference
Choose model based on research question before seeing results
Ignoring small-sample issues in GEE
Robust SE can be biased downward with few clusters
Use jackknife, CR2, or Fay corrections
Fitting only one model type
Different audiences need different estimands
Fit both GEE and GLMM as sensitivity analysis
Check Your Understanding: Model Selection
Scenario: You are analyzing a clinical trial of a new antidepressant. The outcome is binary (remission yes/no) measured at 4 visits. You must present results to:
The FDA for drug approval
Psychiatrists who will prescribe the drug
Question: What model(s) should you fit, and what should you report to each audience?
Answer:
Fit both models:
GEE with exchangeable working correlation and robust SE
GLMM with random intercept (and possibly random slope for time)
Report to FDA:
GEE estimate (population-average OR)
“Treatment increased the population odds of remission by X% (OR = Y, 95% CI: …)”
Focus on: Did it work for the population?
Report to psychiatrists:
GLMM estimate (subject-specific OR)
“For any given patient, treatment increases their odds of remission by X% (OR = Y, 95% CI: …)”
Also mention random intercept SD to convey patient-to-patient variability
Focus on: Will it work for my patient?
In the paper: Report both, clearly labeled, with explanation of why they differ.
Advanced (beyond FLW): Marginalized Mixed Models
The challenge: want BOTH random effects (heterogeneity) AND a direct marginal interpretation. FLW notes this is possible only with a non-standard conditional-mean specification (footnote, p. 478); the constructions themselves are beyond Ch. 16.
Approach
Reference
Marginalized multilevel / mixed models
Heagerty (1999); Heagerty & Zeger (2000)
h-likelihood methods
Lee & Nelder (2004)
Implementations
Swihart et al. (2014)
Trade-off: best of both worlds in theory, but complex, specialized software, computationally intensive.
Recipe Card: Decision Table
Step 1 - Define the estimand, then read across:
Factor
Favors Marginal (GEE)
Favors Mixed (GLMM)
Target of inference
Population-average
Subject-specific
Typical audience
FDA, public health
Clinicians, precision medicine
Key output
Population OR/RR, attributable risk
Conditional OR/RR, RE variance
Cluster size \(n_i\)
Small (2-4)
Larger (5+)
Missing data
MCAR (or IPW-GEE for MAR)
MAR (likelihood)
Random effects of interest
No
Yes
Concern about RE distribution
Yes
Less so
Warning
Missing data (FLW p. 528).Standard GEE requires MCAR. IPW-GEE (correct dropout model) extends GEE to MAR; likelihood-based GLMMs are valid under MAR without weighting. “MCAR favors GEE” means standard GEE.
Recipe Card: Decision Flowchart
START
|
What is your estimand?
/ \
Population effect Individual effect
| |
Use GEE Use GLMM
| |
Report population OR/RR Report conditional OR/RR
| |
Consider: robust SE, Consider: nAGQ, random-
working correlation, MCAR effects structure, MAR
| |
+----------+----------------+
|
Report BOTH if the audience includes
clinicians AND regulators
Methods-section template:“We used [GEE/GLMM] because our primary interest was [population-average/subject-specific] effects.”
Synthesis and Key Takeaways
Estimand clarity prevents misinterpretation – Know your audience
Linear models are special – Only place where \(\beta = \beta^*\)
Attenuation is predictable – Use formula (for logistic random-intercept models)
Both approaches are valid – Different questions need different models
Marginal predictions require integration – re.form = NA gives \(E[Y \mid b=0]\), NOT \(E[Y]\)
Remember: The model serves the question, not vice versa.
16.7 Further Reading
Most accessible:
Reference
Best for
Hubbard et al. (2010)
Clinical researchers
Agresti (2002, Section 12.2)
Clearest binary data explanation
Comprehensive coverage:
Reference
Topic
Fitzmaurice et al. (2009, Ch. 7)
Detailed targets of inference
Gardiner et al. (2009)
Assumption comparison checklist
Further Reading: Technical Foundations
Reference
Topic
Neuhaus et al. (1991)
Original comparison paper
Zeger et al. (1988)
Population-average GEE theory
Heagerty & Zeger (2000)
Marginalized mixed models
Summary Checklist
Before analyzing longitudinal/clustered data, ask:
Final Thought
Understanding the distinction between marginal and conditional models is essential for correct inference in longitudinal data analysis.