BIOS 667 - Homework 5

Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis, Chapters 14-16

Author

Your Name Here

Published

June 23, 2026

1 Instructions

Collaboration Policy: You may discuss problems with classmates, but all submitted work must be your own.

AI Use Policy: AI tools may be used for code assistance. Disclose any AI use in your submission.

Submission: Submit a rendered HTML file to Gradescope by the due date.

Due Date: See Canvas for due date

2 Grading Rubric

Category Weight Description
Interpretation & Reasoning 40% Conceptual understanding, correct interpretations
Analysis & Setup 40% Code correctness, model specification
Clarity & Presentation 20% Writing quality, formatting, organization

3 Setup


4 Question 1: Foundations of GLMMs (20 points)

The epilepsy dataset from Thall and Vail (1990) contains seizure counts for 59 epileptics in a placebo-controlled clinical trial of progabide. Patients were randomized to receive either progabide or placebo as an adjuvant to standard anti-epileptic chemotherapy. Baseline seizure counts (8-week interval) and counts during four 2-week post-randomization intervals were recorded.

Code
# Load and prepare epilepsy data with download fallback
data_url <- "https://content.sph.harvard.edu/fitzmaur/ala2e/epilepsy.txt"
data_file <- if (file.exists("../../data/epilepsy.txt")) "../../data/epilepsy.txt" else "data/epilepsy.txt"
if (!file.exists(data_file)) {
  dir.create("data", showWarnings = FALSE)
  download.file(data_url, data_file)
}

epilepsy_raw <- read.table(data_file,
                           skip = 35,  # Skip header text
                           col.names = c("id", "trt", "age", "base",
                                        "y1", "y2", "y3", "y4"))

# Convert to long format
epilepsy <- epilepsy_raw %>%
  pivot_longer(cols = y1:y4,
               names_to = "visit",
               values_to = "seizures") %>%
  mutate(
    visit_num = as.numeric(gsub("y", "", visit)),
    time = (visit_num - 1) * 2,  # weeks since randomization
    trt = factor(trt, levels = c(0, 1), labels = c("Placebo", "Progabide")),
    log_base = log(base / 4),  # log baseline rate per 2 weeks
    age_c = age - mean(age)    # centered age
  )

glimpse(epilepsy)
Rows: 236
Columns: 10
$ id        <int> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, …
$ trt       <fct> Placebo, Placebo, Placebo, Placebo, Placebo, Placebo, Placeb…
$ age       <int> 31, 31, 31, 31, 30, 30, 30, 30, 25, 25, 25, 25, 36, 36, 36, …
$ base      <int> 11, 11, 11, 11, 11, 11, 11, 11, 6, 6, 6, 6, 8, 8, 8, 8, 66, …
$ visit     <chr> "y1", "y2", "y3", "y4", "y1", "y2", "y3", "y4", "y1", "y2", …
$ seizures  <int> 5, 3, 3, 3, 3, 5, 3, 3, 2, 4, 0, 5, 4, 4, 1, 4, 7, 18, 9, 21…
$ visit_num <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, …
$ time      <dbl> 0, 2, 4, 6, 0, 2, 4, 6, 0, 2, 4, 6, 0, 2, 4, 6, 0, 2, 4, 6, …
$ log_base  <dbl> 1.0116009, 1.0116009, 1.0116009, 1.0116009, 1.0116009, 1.011…
$ age_c     <dbl> 2.1694915, 2.1694915, 2.1694915, 2.1694915, 1.1694915, 1.169…

4.1 Part (a) - WRITE: Define the GLMM and explain random effects (5 pts)

Write the mathematical formulation for a Poisson GLMM with a random intercept for the epilepsy seizure count data. Define each component of the model clearly.

Your answer here

4.2 Part (b) - CODE: Fit random-intercept Poisson GLMM (5 pts)

Fit a Poisson GLMM with a random intercept using lme4::glmer(). Include treatment, time (in weeks), log baseline rate, and centered age as predictors.

Code
# Your code here

4.3 Part (c) - WRITE: Interpret the treatment effect (5 pts)

Interpret the treatment coefficient from the GLMM. Be explicit about whether this is a marginal or conditional (subject-specific) effect.

Code
# Your code here

Your answer here

4.4 Part (d) - CODE: Extract and interpret variance components (5 pts)

Extract the random intercept variance and calculate the implied ICC on the log scale. What does this tell us about between-subject heterogeneity?

Code
# Your code here

Your answer here


5 Question 2: GEE vs GLMM Comparison (20 points)

5.1 Part (a) - CODE: Fit GEE with exchangeable correlation (5 pts)

Using the same predictor set, fit a GEE model with exchangeable working correlation and robust standard errors.

Code
# Your code here

5.2 Part (b) - CODE: Compare coefficient estimates (5 pts)

Create a table comparing the treatment coefficient (on the log scale) and standard errors from GLMM and GEE.

Code
# Your code here

5.3 Part (c) - WRITE: Explain the coefficient differences (5 pts)

Why do the GLMM coefficients differ from the GEE coefficients? Specifically address the concept of non-collapsibility and the marginal vs conditional interpretation.

Your answer here

5.4 Part (d) - WRITE: When to use each approach (5 pts)

Describe clinical or policy scenarios where you would prefer GEE over GLMM, and vice versa.

Your answer here


6 Question 3: Marginal Effects from GLMMs and the Attenuation Formula (20 points)

6.1 Part (a) - WRITE: Derive the Zeger attenuation formula for logistic models (5 pts)

For a random-intercept logistic GLMM, the Zeger et al. (1988) approximation relates conditional to marginal coefficients: \[\beta^{\text{marg}} \approx \frac{\beta^{\text{cond}}}{\sqrt{1 + c_2^2 \cdot \sigma_b^2}}\]

where, for the logistic link, \(c_2 = 16\sqrt{3}/(15\pi) \approx 0.588\) is the single attenuation constant, so the multiplier on \(\sigma_b^2\) under the root is \(c_2^2 \approx 0.346\) (equivalently \((1/1.7)^2\), the probit-matching form). Explain the intuition behind this attenuation and why \(\beta^{\text{marg}} < \beta^{\text{cond}}\) in magnitude.

Your answer here

6.2 Part (b) - CODE: Apply attenuation to a binary outcome GLMM (7 pts)

Create a binary outcome from the epilepsy data (seizure-free = 1 if seizures = 0, else 0) and fit a random-intercept logistic GLMM. Apply the Zeger approximation to obtain marginal coefficients.

Code
# Your code here

6.3 Part (c) - CODE: Compare to GEE estimate (5 pts)

Fit a GEE with the same specification and compare the marginal treatment OR to the Zeger-approximated marginal OR from the GLMM.

Code
# Your code here

Your answer here

6.4 Part (d) - WRITE: Limitations of the Zeger approximation (3 pts)

What are the limitations of the Zeger attenuation formula?

Your answer here


7 Question 4: Random-Slope Logistic GLMM (20 points)

For Questions 4 and 5 we use the toenail dataset (De Backer et al., 1998), the canonical binary marginal-vs-mixed example from FLW Chapter 14 (Problem 14.1). In a randomized, double-blind trial, 294 patients with toenail onychomycosis received one of two oral antifungals (Itraconazole or Terbinafine) and were evaluated for the degree of onycholysis (separation of the nail plate from the nail bed) at seven scheduled visits over 48 weeks. The binary outcome is moderate-or-severe onycholysis (1) versus none-or-mild (0).

Code
# Load toenail data with download fallback (clean numeric .dat, no header)
toe_url  <- "https://content.sph.harvard.edu/fitzmaur/ala2e/toenail.dat"
toe_file <- if (file.exists("../../data/toenail.dat")) "../../data/toenail.dat" else "data/toenail.dat"
if (!file.exists(toe_file)) {
  dir.create("data", showWarnings = FALSE)
  download.file(toe_url, toe_file)
}

# Variable list (FLW Ch.14): id, response, treatment, month, visit
# response: 0 = none/mild, 1 = moderate/severe
# treatment: 0 = Itraconazole, 1 = Terbinafine
toenail <- read.table(
  toe_file,
  col.names = c("id", "y", "treatment", "month", "visit")
)

cat("Patients:", length(unique(toenail$id)),
    "| Measurements:", nrow(toenail), "\n")
Patients: 294 | Measurements: 1908 

7.1 Part (a) - WRITE: Specify the random-slope GLMM (5 pts)

Write the random-intercept-and-slope logistic GLMM for the toenail data, with a linear trend in month and a treatment-by-month interaction. Define the random-effects covariance matrix \(G\) and each of its elements.

Your answer here

7.2 Part (b) - CODE: Fit the random-slope GLMM (5 pts)

Fit the random-slope logistic GLMM with lme4::glmer(). (For a random slope, glmer uses the Laplace approximation, nAGQ = 1; adaptive quadrature is restricted to a single scalar random effect.) Report the fixed effects.

Code
# Your code here

7.3 Part (c) - CODE + WRITE: Interpret the random-effects covariance \(G\) (5 pts)

Extract the estimated \(G\) and report (i) the variance of the random intercepts, (ii) the variance of the random slopes, and (iii) the intercept-slope correlation. Interpret each.

Code
# Your code here

Your answer here

7.4 Part (d) - CODE + WRITE: Test whether the random slope is needed (5 pts)

Test \(H_0: \sigma_1^2 = 0,\ \sigma_{01} = 0\) (drop the random slope). Because this is a GLMM boundary test, do not use exactRLRT (which is valid only for the Gaussian LMM). Use a parametric bootstrap of the likelihood-ratio statistic, and compare with the conservative Stram-Lee \(\tfrac12\chi^2_1 + \tfrac12\chi^2_2\) mixture as an approximation.

Why not exactRLRT? exactRLRT/RLRsim exploit the exact restricted-likelihood null distribution of a Gaussian LMM. For a logistic GLMM they are not valid; bootstrap the LRT instead.

Pseudocode:

1. Fit the null (random-intercept) and alternative (random-slope) models.
2. Compute the observed LRT = 2 * (logLik(alt) - logLik(null)).
3. Repeat n_boot times:
   a. Simulate a new response under the fitted NULL model (simulate()).
   b. Refit BOTH the null and alternative models to the simulated data.
   c. Record the bootstrap LRT.
4. Bootstrap p-value = (1 + #{boot LRT >= observed}) / (1 + n_boot).
5. Compare with the Stram-Lee mixture:
   0.5 * pchisq(LRT, 1, lower=FALSE) + 0.5 * pchisq(LRT, 2, lower=FALSE).

Keep n_boot modest (e.g. 200) so it runs in reasonable time; parallel::mclapply and glmerControl(calc.derivs = FALSE) speed it up. Remember that dropping a random slope removes TWO parameters (\(\sigma_1^2\) and \(\sigma_{01}\)), and \(\sigma_1^2 = 0\) is on the boundary.

Code
# Your code here

Your answer here


8 Question 5: Quadrature Sensitivity and Marginal vs Mixed Contrast (20 points)

This challenge question has two parts on the same toenail data: the numerical accuracy of GLMM estimation (adaptive Gaussian quadrature, FLW Ch.14 / Problem 14.1.10) and the contrast between the marginal (GEE) and conditional (GLMM) estimands (FLW Ch.16).

8.1 Part (a) - CODE + WRITE: Adaptive Gaussian quadrature sensitivity (7 pts)

Fit the random-intercept logistic GLMM (FLW Problem 14.1.5) at an increasing number of adaptive Gauss-Hermite quadrature points (nAGQ = 1, 2, 5, 10, 20, 30, 50) and tabulate how the variance component and fixed effects move. At how many points do the estimates stabilize?

Code
# Your code here

Your answer here

8.2 Part (b) - CODE: Marginal (GEE) vs conditional (GLMM) contrast (7 pts)

Fit a GEE (exchangeable working correlation, robust SEs) and the random-intercept GLMM with the same mean model, and contrast the treatment-by-month coefficient. Then apply the Zeger logistic attenuation factor to the GLMM coefficient and compare with the GEE estimate.

The Zeger logistic attenuation relates conditional to marginal coefficients as \(\beta^{\text{marg}} \approx \beta^{\text{cond}}/\sqrt{1 + c_2^2\,\sigma_b^2}\), with the single constant \(c_2 = 16\sqrt{3}/(15\pi) \approx 0.588\) (so the multiplier on \(\sigma_b^2\) under the root is \(c_2^2 \approx 0.346\)).

Code
# Your code here

8.3 Part (c) - WRITE: Explain the contrast and non-collapsibility (3 pts)

Interpret the gap between the marginal and conditional treatment-by-month effects. Why is the conditional effect larger in magnitude, and does the Zeger approximation recover the GEE estimate here?

Your answer here

8.4 Part (d) - WRITE: Which estimand for which question? (3 pts)

A clinician and a health-policy analyst each ask you to summarize the toenail trial. Which model (GEE or GLMM) answers each, and why?

9 Your answer here

10 Peer Review Section

After submission, you will be assigned peer reviews. Use this rubric:

Criterion Excellent (5) Good (3-4) Needs Work (1-2)
Code correctness All code runs, correct output Minor errors Major errors
Interpretations Clear, accurate, in context Mostly correct Missing key points
Presentation Well-organized, readable Adequate Difficult to follow
Marginal vs Conditional Clearly distinguishes Sometimes confuses Conflates the two

11 Reminders


12 Point Summary

Question Topic Points
Q1 Foundations of GLMMs 20
Q2 GEE vs GLMM Comparison 20
Q3 Attenuation Formula & Marginal Effects 20
Q4 Random-Slope Logistic GLMM 20
Q5 Quadrature Sensitivity & Marginal vs Mixed Contrast 20
Total 100