BIOS 667 - Lecture 13: GEE Extensions (Ch. 13)

Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis

Naim Rashid

Lecture Objectives

Understand GEE estimation for binary and count outcomes
Compare working correlation structures and their practical implications
Distinguish model-based vs robust (sandwich) standard errors
Apply small-sample corrections for limited cluster sizes
Perform GEE diagnostics and model assessment
Distinguish standard GEE (MCAR) from IPW-GEE (MAR) for missing data

Roadmap

Day 1 (Parts I-IV): estimation, working correlation, standard errors, small samples. Day 2 (Parts V-VIII): diagnostics, GEE vs mixed, IPW-GEE, applied case study.

Part	Topic	Day
I	GEE Review and Estimation	1
II	Working Correlation Structures	1
III	Model-Based vs Robust Standard Errors	1
IV	Small-Sample Corrections	1
V	GEE Diagnostics and Model Assessment	2
VI	Comparison with Likelihood Methods	2
VII	IPW-GEE for MAR Missing Data	2
VIII	Applied Case Study: Epilepsy Data	2

Notation reference

Symbol	Meaning
\(Y_{ij}\)	response for subject \(i\) at occasion \(j\) (scalar)
\(\mathbf{Y}_i\)	response vector for subject \(i\) \((n_i \times 1)\)
\(\mathbf{X}_i,\ \boldsymbol\beta\)	fixed-effect design \(n_i \times p\) and coefficients
\(\mathbf{Z}_i,\ \mathbf{b}_i\)	random-effect design and random effects, \(\mathbf{b}_i \sim N(0, G)\)
\(G\)	between-subject random-effects covariance (FLW’s letter, Ch. 8)
\(R_i\)	within-subject residual covariance, \(\boldsymbol\varepsilon_i \sim N(0, R_i)\)
\(\Sigma_i = \mathbf{Z}_i G \mathbf{Z}_i^\top + R_i\)	marginal (total) covariance \(\operatorname{Cov}(\mathbf{Y}_i)\) for subject \(i\)
\(\rho\)	correlation parameter (AR(1), exchangeable)

Indices: \(i\) subject \(1,\dots,N\); \(j\) occasion/time \(1,\dots,n_i\) (\(n\) when balanced); \(k\) state (transition models); \(g\) group. Bold = vector or matrix.

Notation (this lecture): GEE symbols

Symbol	Meaning
\(V_i = \mathbf{A}_i^{1/2}\operatorname{Corr}_i(\alpha)\,\mathbf{A}_i^{1/2}\)	GEE working covariance for subject \(i\) (Ch. 12-13 only)
\(\operatorname{Corr}_i(\alpha)\)	working correlation matrix, a function of \(\alpha\) (FLW eq for \(V_i\))
\(\alpha\)	working-correlation parameter(s); scalar for exchangeable/AR(1), a vector for unstructured
\(\mathbf{A}_i = \operatorname{diag}\{\phi\,v(\mu_{ij})\}\)	diagonal of marginal variances \(\operatorname{Var}(Y_{ij})=\phi\,v(\mu_{ij})\) (\(v\) = GLM variance function, \(\phi\) = dispersion; Ch. 11)
\(\mathbf{D}_i = \partial\boldsymbol\mu_i/\partial\boldsymbol\beta\)	derivative of the mean wrt \(\boldsymbol\beta\) (\(n_i \times p\)), the mean’s sensitivity to \(\boldsymbol\beta\)
\(\phi\)	dispersion (scale) parameter, the same \(\phi\) from the Ch. 11 GLM lecture
\(R_{ij}\) (IPW only)	response/observation indicator: \(R_{ij}=1\) if \(Y_{ij}\) is observed. Distinct from \(\operatorname{Corr}_i(\alpha)\)

Part I (Day 1): GEE Review and Estimation

Marginal Models Recap

From Lecture 12, marginal models have three parts:

Mean: \[E(Y_{ij}\mid X_{ij}) = \mu_{ij}, \quad g(\mu_{ij}) = X_{ij} \boldsymbol\beta\]

Variance: \[\operatorname{Var}(Y_{ij}\mid X_{ij}) = \phi \, v(\mu_{ij})\]

Association: Working correlation \(\operatorname{Corr}_i(\alpha)\) for within-subject dependence

Why GEE?

Challenge	GEE Solution
Full likelihood intractable	Quasi-likelihood
Unknown true correlation	Working correlation + robust SEs
Binary/count data	No multivariate distribution needed
Population-average inference	\(\boldsymbol\beta\) interpretation unchanged

GEE Estimating Equations

The GEE for \(\boldsymbol\beta\) is:

\[ U(\boldsymbol\beta) = \sum_{i=1}^{N} \mathbf{D}_i^\top \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol\mu_i) = 0 \]

where:

\(\mathbf{D}_i = \partial \boldsymbol\mu_i / \partial \boldsymbol\beta\) is the \(n_i \times p\) derivative of the mean (its sensitivity to \(\boldsymbol\beta\))
\(\mathbf{V}_i = \mathbf{A}_i^{1/2} \operatorname{Corr}_i(\alpha)\, \mathbf{A}_i^{1/2}\) is the working covariance (precision)
\(\mathbf{A}_i = \operatorname{diag}(\phi\,v(\mu_{ij}))\) contains the marginal variances \(\operatorname{Var}(Y_{ij})=\phi\,v(\mu_{ij})\) (\(\phi\) = dispersion, the same \(\phi\) from Ch. 11)

In words: each subject contributes a residual \((\mathbf{Y}_i - \boldsymbol\mu_i)\), weighted by how the mean responds to \(\boldsymbol\beta\) (\(\mathbf{D}_i\)) and down-weighted by its covariance (\(\mathbf{V}_i^{-1}\)). The sum is set to zero:

\[ \underbrace{\mathbf{D}_i^\top}_{\text{sensitivity}}\ \underbrace{\mathbf{V}_i^{-1}}_{\text{precision}}\ \underbrace{(\mathbf{Y}_i - \boldsymbol\mu_i)}_{\text{residual}} \]

GEE Algorithm (FLW two-stage)

FLW (p. 357) fits GEE by alternating two stages until convergence:

Initialize \(\hat{\boldsymbol\beta}^{(0)}\) (e.g., from the independence fit)
Given \(\hat\alpha, \hat\phi\): update \(\hat{\boldsymbol\beta}\) by solving the estimating equation (iteratively reweighted least squares)
Given \(\hat{\boldsymbol\beta}\): update \(\hat\alpha, \hat\phi\) from the standardized residuals \(e_{ij}\)
Iterate steps 2-3 until convergence

Key insight: \(\hat{\boldsymbol\beta}\) converges to a consistent estimate even if \(\operatorname{Corr}_i(\alpha)\) is wrong (so long as the mean model is correct).

Setup: Load Packages and Data

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(ggplot2)
  library(geepack)
  library(lme4)
  library(broom)
})

# Read epilepsy data
epi_raw <- read.table(
  "../../data/epilepsy.txt",
  skip = 36,
  col.names = c("id", "trt", "age", "base", "y1", "y2", "y3", "y4")
)

# Reshape to long format
epi <- epi_raw |>
  pivot_longer(
    cols = y1:y4,
    names_to = "visit",
    values_to = "seizures"
  ) |>
  mutate(
    visit = as.integer(gsub("y", "", visit)),
    trt = factor(trt, levels = c(0, 1), labels = c("Placebo", "Progabide")),
    log_base = log(base / 4),  # log baseline rate per 2 weeks
    log_age = log(age)
  )

head(epi, 8)

Setup: Load Packages and Data

# A tibble: 8 × 8
     id trt       age  base visit seizures log_base log_age
  <int> <fct>   <int> <int> <int>    <int>    <dbl>   <dbl>
1     1 Placebo    31    11     1        5     1.01    3.43
2     1 Placebo    31    11     2        3     1.01    3.43
3     1 Placebo    31    11     3        3     1.01    3.43
4     1 Placebo    31    11     4        3     1.01    3.43
5     2 Placebo    30    11     1        3     1.01    3.40
6     2 Placebo    30    11     2        5     1.01    3.40
7     2 Placebo    30    11     3        3     1.01    3.40
8     2 Placebo    30    11     4        3     1.01    3.40

Epilepsy Trial Data (case study)

Clinical trial of progabide for epilepsy:

59 patients randomized to treatment or placebo
Baseline: seizure count in 8-week period before treatment
Outcome: seizure counts at 4 post-randomization visits (2-week intervals)

Goal: Estimate treatment effect on seizure rate (population-average).

Dataset provenance

The epilepsy/progabide data are a Ch. 13 problem dataset (FLW Problem 13.2, p. 393), including the “exclude patient 49” sensitivity check (Problem 13.2.5). They are a legitimate Ch. 13 example but are not one of the chapter’s three worked Section 13.4 case studies (Muscatine obesity logistic, leprosy bacilli log-linear, arthritis ordinal). We use them as our worked count-data case study.

Visualize Seizure Trajectories

ggplot(epi, aes(visit, seizures, group = id, color = trt)) +
  geom_line(alpha = 0.3) +
  stat_summary(aes(group = trt), fun = mean, geom = "line",
               linewidth = 1.5) +
  scale_y_log10() +
  labs(
    title = "Seizure Counts Over Time by Treatment",
    x = "Visit (2-week period)",
    y = "Seizure Count (log scale)",
    color = "Treatment"
  ) +
  theme_minimal()

Visualize Seizure Trajectories

Part II (Day 1): Working Correlation Structures

Common Working Correlations

Structure	\(\operatorname{Corr}_{jk}(\alpha)\)	Parameters
Independence	\(I(j=k)\)	0
Exchangeable	\(\alpha\) if \(j \neq k\)	1
AR(1)	\(\alpha^{\lvert j-k\rvert}\)	1
Unstructured	\(\alpha_{jk}\)	\(n(n-1)/2\)

Picture First: Three Correlation Shapes

Exchangeable Correlation

\[ \operatorname{Corr}_i(\alpha) = \begin{pmatrix} 1 & \alpha & \alpha & \alpha \\ \alpha & 1 & \alpha & \alpha \\ \alpha & \alpha & 1 & \alpha \\ \alpha & \alpha & \alpha & 1 \end{pmatrix} \]

Use when: No natural time ordering; cluster members similar.

AR(1) Correlation

\[ \operatorname{Corr}_i(\alpha) = \begin{pmatrix} 1 & \alpha & \alpha^2 & \alpha^3 \\ \alpha & 1 & \alpha & \alpha^2 \\ \alpha^2 & \alpha & 1 & \alpha \\ \alpha^3 & \alpha^2 & \alpha & 1 \end{pmatrix} \]

Use when: Correlation decays with time lag.

Unstructured Correlation

\[ \operatorname{Corr}_i(\alpha) = \begin{pmatrix} 1 & \alpha_{12} & \alpha_{13} & \alpha_{14} \\ \alpha_{12} & 1 & \alpha_{23} & \alpha_{24} \\ \alpha_{13} & \alpha_{23} & 1 & \alpha_{34} \\ \alpha_{14} & \alpha_{24} & \alpha_{34} & 1 \end{pmatrix} \]

Use when: Small \(n_i\), enough subjects, no assumed pattern.

GEE with Different Correlations

# Independence
gee_ind <- geeglm(
  seizures ~ trt + log_base + log_age + visit,
  id = id,
  data = epi,
  family = poisson(link = "log"),
  corstr = "independence"
)

# Exchangeable
gee_exch <- geeglm(
  seizures ~ trt + log_base + log_age + visit,
  id = id,
  data = epi,
  family = poisson(link = "log"),
  corstr = "exchangeable"
)

# AR(1)
gee_ar1 <- geeglm(
  seizures ~ trt + log_base + log_age + visit,
  id = id,
  data = epi,
  family = poisson(link = "log"),
  corstr = "ar1"
)

# Unstructured
gee_unstr <- geeglm(
  seizures ~ trt + log_base + log_age + visit,
  id = id,
  data = epi,
  family = poisson(link = "log"),
  corstr = "unstructured"
)

GEE with Different Correlations

Compare Coefficient Estimates

extract_coefs <- function(fit, name) {
  coef(summary(fit)) |>
    as.data.frame() |>
    tibble::rownames_to_column("term") |>
    mutate(model = name) |>
    select(model, term, Estimate, `Std.err`)
}

bind_rows(
  extract_coefs(gee_ind, "Independence"),
  extract_coefs(gee_exch, "Exchangeable"),
  extract_coefs(gee_ar1, "AR(1)"),
  extract_coefs(gee_unstr, "Unstructured")
) |>
  filter(term == "trtProgabide") |>
  knitr::kable(digits = 4)

Compare Coefficient Estimates

model	term	Estimate	Std.err
Independence	trtProgabide	-0.0458	0.1868
Exchangeable	trtProgabide	-0.0420	0.1869
AR(1)	trtProgabide	-0.0559	0.1870
Unstructured	trtProgabide	-0.0538	0.1821

Key Observation

Treatment effect estimates are similar across working correlations.

Standard errors differ – efficiency depends on how close working correlation is to truth.

Robust SEs (default in geepack) protect against misspecification.

Check Your Understanding: Working Correlations

If you specify exchangeable correlation but the true correlation is AR(1), what happens to your coefficient estimates? What about standard errors?
When would you prefer AR(1) over exchangeable correlation?
Why is unstructured correlation not always the best choice?

Answers:

Coefficient estimates remain consistent (they converge to the correct population-average values). Standard errors may be inefficient (larger than optimal) if you use the model-based SEs. With robust (sandwich) SEs, inference remains valid even if the working correlation is misspecified.
Choose AR(1) when observations closer in time are more strongly correlated than distant observations (e.g., disease progression, daily measurements). AR(1) assumes correlation decays exponentially with time lag.
Unstructured correlation estimates \(n(n-1)/2\) parameters, which can be problematic with: (a) limited subjects (parameters may be poorly estimated), (b) unbalanced data (not all pairs observed), (c) overfitting concerns. Simpler structures (exchangeable, AR(1)) are more efficient when they approximately hold.

Estimated Working Correlations

cat("Exchangeable alpha:", gee_exch$geese$alpha, "\n")

Estimated Working Correlations

Exchangeable alpha: 0.4025429

cat("AR(1) alpha:", gee_ar1$geese$alpha, "\n")

AR(1) alpha: 0.5504609

cat("Unstructured alpha:\n")

Unstructured alpha:

print(gee_unstr$geese$alpha)

alpha.1:2 alpha.1:3 alpha.1:4 alpha.2:3 alpha.2:4 alpha.3:4 
0.4582196 0.4058512 0.2512328 0.5813033 0.3022116 0.4205118

Modeling the Association (binary data)

For binary outcomes, FLW Ch. 13 treats the within-subject association as something to MODEL, not just a nuisance:

Correlations are awkward for binary data (bounded by the means)
Better target: pairwise odds ratios \(\operatorname{OR}_{jk}\)
Alternating Logistic Regression (ALR): a second estimating equation for the association parameters \(\alpha\), solved alongside the mean equation for \(\boldsymbol\beta\)

So \(\alpha\) can be a target of inference (how does dependence change over time?), not only a working-correlation nuisance.

Beyond our worked example

We fit a Poisson (count) case study, so we use a working correlation. The OR/ALR association model is the FLW Ch. 13 extension for binary data (FLW Section 13.4, Muscatine logistic example); R support is in the geepack geese() / alr machinery and the multgee package. We flag it here as the defining “extension” of the chapter.

Part III (Day 1): Model-Based vs Robust Standard Errors

Two Types of Standard Errors

Type	Formula	Assumption
Model-based (naive)	\(\mathbf{B}^{-1}\)	Correct \(\mathbf{V}_i\)
Robust (sandwich)	\(\mathbf{B}^{-1} \mathbf{M} \mathbf{B}^{-1}\)	Mean model correct

Bread (sensitivity): \(\ \mathbf{B} = \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \mathbf{D}_i\)

Meat (observed variability): \(\ \mathbf{M} = \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} (\mathbf{Y}_i - \boldsymbol\mu_i)(\mathbf{Y}_i - \boldsymbol\mu_i)^\top \mathbf{V}_i^{-1} \mathbf{D}_i\)

Why is the sandwich robust?

\(\mathbf{M}\) uses the observed residual products \((\mathbf{Y}_i - \boldsymbol\mu_i)(\mathbf{Y}_i - \boldsymbol\mu_i)^\top\), so it does not trust the working \(\mathbf{V}_i\). If the working correlation is wrong, \(\mathbf{M}\) still estimates the true variability. The model-based SE \(\mathbf{B}^{-1}\) is correct only if \(\mathbf{V}_i\) is correct.

When to Use Which?

Scenario	Recommendation
Large \(N\), balanced, replicated covariate patterns	Robust (sandwich)
Correlation uncertain, \(N\) large	Robust (sandwich)
Correlation plausibly correct	Model-based (more efficient)
Small/modest \(N\), or severely unbalanced	Model-based \(\mathbf{B}^{-1}\) (FLW pp. 360-361)

Robust is not a blanket default

The sandwich estimator’s robustness is an asymptotic (large-\(N\)) property. With few independent subjects, or a severely unbalanced design with no replicated covariate patterns, the sandwich is biased downward and highly variable (FLW pp. 360-361). FLW’s remedy is to prefer the model-based estimator \(\mathbf{B}^{-1}\) with a plausible working covariance. Robust SEs are the default only when \(N\) is large and the design is well-replicated.

Comparing SE Types in geepack

# Model-based (naive) SEs require fitting with std.err = "jack"
# then extracting the naive variance from the geese object

# Fit model - geepack stores both variance estimates internally
gee_exch_fit <- geeglm(

  seizures ~ trt + log_base + log_age + visit,
  id = id,
  data = epi,
  family = poisson(link = "log"),
  corstr = "exchangeable"
)

# Extract robust (sandwich) SEs - this is the default in summary()
se_robust <- sqrt(diag(gee_exch_fit$geese$vbeta))

# Extract model-based (naive) SEs from vbeta.naiv
# This is (D'V^{-1}D)^{-1}, assuming working covariance is correct
se_naive <- sqrt(diag(gee_exch_fit$geese$vbeta.naiv))

data.frame(
  term = names(coef(gee_exch_fit)),
  SE_robust = round(se_robust, 4),
  SE_model_based = round(se_naive, 4),
  ratio = round(se_robust / se_naive, 2)
)

Comparing SE Types in geepack

          term SE_robust SE_model_based ratio
1  (Intercept)    0.9275         1.2339  0.75
2 trtProgabide    0.1869         0.1510  1.24
3     log_base    0.1559         0.1052  1.48
4      log_age    0.2540         0.3335  0.76
5        visit    0.0350         0.0341  1.03

Interpretation of SE Comparison

Read the ratio = robust / naive column you just saw:

Ratio > 1 (robust larger): the working correlation overstates the true correlation, so the naive SE is too small. Naive inference is anti-conservative (Type I error inflated). Trust the robust SE.

Ratio < 1 (robust smaller): the working correlation understates the true correlation, so the naive SE is too large. Naive inference is conservative; the robust SE is sharper.

Mechanism: the naive SE believes the working \(\mathbf{V}_i\); when \(\mathbf{V}_i\) overstates correlation it credits each subject with too little independent information, flipping which SE is bigger.

Part IV (Day 1): Small-Sample Corrections

The Small-Sample Problem

Robust SEs rely on an asymptotic (large-\(N\)) approximation:

\[ \hat{\boldsymbol\beta} \ \dot\sim\ N\!\left(\boldsymbol\beta,\ \mathbf{B}^{-1}\mathbf{M}\mathbf{B}^{-1}\right) \]

With few clusters (\(N < 40\)), the sandwich estimator is biased downward (FLW pp. 360-361).

This leads to:

Underestimated standard errors
Inflated Type I error rates

FLW’s own remedy (in-chapter)

When \(N\) is small and a working covariance is plausible, FLW (p. 361) recommends using the model-based estimator \(\mathbf{B}^{-1}\) rather than the sandwich. That is the chapter’s prescription; the named bias corrections on the next slide are beyond FLW Ch. 13.

Bias-Corrected Sandwich Estimators

Beyond FLW Ch. 13 (instructor extension)

These named corrections are not in FLW Ch. 13 (Kauermann & Carroll 2001; Fay & Graubard 2001; Mancl & DeRouen 2001, Biometrics). FLW’s in-chapter small-\(N\) remedy is the model-based estimator \(\mathbf{B}^{-1}\) (p. 361). The corrections below inflate the under-estimated sandwich, differing mainly in aggressiveness.

Method	Correction
Kauermann-Carroll (KC)	\((\mathbf{I} - \mathbf{H}_i)^{-1/2}\) adjustment
Fay-Graubard (FG)	\((\mathbf{I} - \mathbf{H}_i)^{-1}\) adjustment
Mancl-DeRouen (MD)	Delete-one jackknife

\(\mathbf{H}_i\) is the cluster leverage (“hat”) matrix, the cluster analogue of the scalar OLS leverage \(h_{ii}\).

Small-Sample Correction in Practice

# geepack does not implement KC/FG/MD; those live in the geesmv package.
# We illustrate the small-sample idea with a model-agnostic cluster bootstrap.
set.seed(667)
n_boot <- 200
boot_coefs <- matrix(NA, n_boot, 5)

for (b in 1:n_boot) {
  # Resample whole CLUSTERS (subjects) with replacement
  ids <- unique(epi$id)
  boot_ids <- sample(ids, length(ids), replace = TRUE)
  boot_data <- do.call(rbind, lapply(seq_along(boot_ids), function(i) {
    d <- epi[epi$id == boot_ids[i], ]
    d$id <- i   # LOAD-BEARING: GEE needs UNIQUE cluster ids after resampling,
                # or a duplicated subject is merged and the SEs are wrong
    d
  }))

  fit_b <- tryCatch(
    geeglm(
      seizures ~ trt + log_base + log_age + visit,
      id = id,
      data = boot_data,
      family = poisson(link = "log"),
      corstr = "exchangeable"
    ),
    error = function(e) NULL
  )

  if (!is.null(fit_b)) {
    boot_coefs[b, ] <- coef(fit_b)
  }
}

boot_se <- apply(boot_coefs, 2, sd, na.rm = TRUE)
names(boot_se) <- names(coef(gee_exch))

data.frame(
  term = names(boot_se),
  SE_robust = round(se_robust, 4),
  SE_bootstrap = round(boot_se, 4)
)

Small-Sample Correction in Practice

                     term SE_robust SE_bootstrap
(Intercept)   (Intercept)    0.9275       1.0868
trtProgabide trtProgabide    0.1869       0.2090
log_base         log_base    0.1559       0.1714
log_age           log_age    0.2540       0.3120
visit               visit    0.0350       0.0341

Degrees of Freedom Adjustments

For small samples, also adjust test statistics:

Approach	df
Standard Wald	\(\infty\) (normal)
Cluster-level	\(N - p\)
Kenward-Roger-type	Data-dependent

Use \(t_{N-p}\) or \(F\) reference distributions instead of \(z\) or \(\chi^2\).

Rule of Thumb

N (clusters)	Recommendation
\(N \geq 40\)	Standard robust SEs adequate
\(20 \leq N < 40\)	Use bias-corrected SEs
\(N < 20\)	Consider mixed models or exact methods

With 59 patients here, standard robust SEs are reasonable.

Check Your Understanding: Standard Errors and Small Samples

What is the difference between model-based (naive) and robust (sandwich) standard errors?
Why do robust SEs tend to be biased downward with few clusters?
You have a study with 25 subjects. What precautions should you take for GEE inference?

Answers:

Model-based SEs assume the working covariance structure is correctly specified; they use \((D^\top V^{-1} D)^{-1}\). Robust SEs use the sandwich estimator \(B^{-1}MB^{-1}\), which uses the observed residuals to estimate the true covariance. Robust SEs are valid even if the working correlation is wrong.
The sandwich estimator relies on asymptotic theory (large N). With few clusters, there is not enough information to estimate the “meat” of the sandwich (\(M\)) well, leading to underestimation of variance. This causes inflated Type I error rates.
With 25 subjects (between 20 and 40), use bias-corrected sandwich estimators (Kauermann-Carroll, Fay-Graubard, or Mancl-DeRouen). Also consider using \(t\) or \(F\) distributions with \(N-p\) degrees of freedom instead of normal/chi-squared for hypothesis tests.

Day 1 Recap; Part V (Day 2): Diagnostics

Recap of Day 1 (Parts I-IV)

GEE solves \(\sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol\mu_i) = 0\); \(\hat{\boldsymbol\beta}\) is consistent even if \(\operatorname{Corr}_i(\alpha)\) is wrong.
Working correlation affects efficiency, not consistency (exchangeable / AR(1) / unstructured).
Robust (sandwich) SE = \(\mathbf{B}^{-1}\mathbf{M}\mathbf{B}^{-1}\); model-based = \(\mathbf{B}^{-1}\). Prefer model-based when \(N\) is small or the design is unbalanced.

Diagnostic Goals for GEE

Goal	What to Check
Mean model	Residual patterns
Link function	Linearity of predictors
Variance function	Scale of residuals
Influential clusters	Leverage, deletion
Working correlation	QIC comparison

Pearson Residuals vs Fitted

Reading it

Good: residuals scattered evenly around 0 (dashed line), the red loess roughly flat. Problematic: a curved loess (mean-model / link misspecification) or a fan that widens with the fitted value (variance-function misspecification). Here, watch for a few large positive residuals from high-count subjects.

Residuals by Visit and Treatment

Reading it

Good: boxes centered on 0 (dashed line) with similar spread across visits and both arms. Problematic: a drift in the medians across visits (time trend the mean model missed) or boxes much wider in one arm (heteroscedasticity not captured).

Cluster-Level Residuals

Reading it

Each bar sums a subject’s four Pearson residuals. The red dashed lines at \(\pm 4\) are an informal flag: bars outside them are subjects the model fits poorly and are candidates for a sensitivity check. Good: most bars well inside \(\pm 4\). Problematic: one or two bars far outside (an influential cluster, e.g. subject 49).

Identifying Influential Clusters

Subject 49 has extremely high seizure counts:

# Check extreme subjects
epi |>
  group_by(id, trt) |>
  summarize(
    total_seizures = sum(seizures),
    mean_seizures = mean(seizures),
    .groups = "drop"
  ) |>
  arrange(desc(total_seizures)) |>
  head(5)

Identifying Influential Clusters

# A tibble: 5 × 4
     id trt       total_seizures mean_seizures
  <int> <fct>              <int>         <dbl>
1    49 Progabide            302          75.5
2    25 Placebo              143          35.8
3    18 Placebo              123          30.8
4     8 Placebo               95          23.8
5    35 Progabide             74          18.5

Sensitivity Analysis: Exclude Outlier

epi_no49 <- epi |> filter(id != 49)

gee_no49 <- geeglm(
  seizures ~ trt + log_base + log_age + visit,
  id = id,
  data = epi_no49,
  family = poisson(link = "log"),
  corstr = "exchangeable"
)

# Compare treatment effects
data.frame(
  model = c("Full data", "Exclude ID 49"),
  trt_coef = c(coef(gee_exch)["trtProgabide"],
               coef(gee_no49)["trtProgabide"]),
  SE = c(summary(gee_exch)$coef["trtProgabide", "Std.err"],
         summary(gee_no49)$coef["trtProgabide", "Std.err"])
) |>
  mutate(
    RR = exp(trt_coef),
    across(c(trt_coef, SE, RR), ~round(.x, 4))
  )

Sensitivity Analysis: Exclude Outlier

          model trt_coef     SE     RR
1     Full data   -0.042 0.1869 0.9589
2 Exclude ID 49   -0.262 0.1549 0.7695

Choosing a Working Correlation: FLW’s method

In-chapter approach (FLW p. 360)

FLW does NOT pick a working correlation by an information criterion. Its guidance: \(\hat{\boldsymbol\beta}\) is consistent under any working assumption, there is often negligible efficiency loss from working independence, and you can compare the empirical (sandwich) SEs across structures. If the robust SEs are stable across choices, the choice barely matters.

The fitted \(\hat\alpha\) values and the robust-vs-naive SE comparison (Part III) are the in-chapter diagnostics. QIC (next slide) is a useful add-on, but it is an instructor extension.

QIC: Model Selection for GEE

Beyond FLW Ch. 13 (instructor extension)

QIC / QICu are due to Pan (2001), Biometrics. They are not in FLW Ch. 13, which selects a working correlation by comparing empirical SEs (p. 360). QIC automates that comparison.

QIC -> compare correlation structures (same mean model)
QICu -> compare mean models (same correlation); QICu is NOT valid for selecting the correlation structure

One actionable rule: smaller is better; use QIC for correlation, QICu for covariates.

QIC: Code

# QIC() returns a named vector; index by NAME (positions are
# geepack-version-dependent). "QIC" ranks correlation structures;
# "QICu" ranks mean models (same correlation).
qic_row <- function(fit) {
  q <- geepack::QIC(fit)
  c(QIC = unname(q["QIC"]), QICu = unname(q["QICu"]))
}

data.frame(
  correlation = c("Independence", "Exchangeable", "AR(1)", "Unstructured"),
  rbind(qic_row(gee_ind), qic_row(gee_exch),
        qic_row(gee_ar1), qic_row(gee_unstr))
) |>
  arrange(QIC)

QIC: Code

   correlation       QIC      QICu
1 Independence -5879.308 -5898.909
2 Exchangeable -5879.027 -5898.895
3 Unstructured -5878.762 -5898.708
4        AR(1) -5876.994 -5897.955

Part VI (Day 2): Comparison with Likelihood Methods

GEE vs Mixed Models

Aspect	GEE	Mixed Models
Target	Population-average \(\boldsymbol\beta\)	Subject-specific \(\boldsymbol\beta\)
Correlation	Working (misspecification OK)	Model-based
Missing data	Standard GEE: MCAR; IPW-GEE: MAR	MAR with ML
Efficiency	Depends on correlation	Higher if model correct

Important: Standard GEE requires MCAR for valid inference. IPW-GEE extends validity to MAR when the dropout model is correctly specified. (IPW-GEE itself is developed in FLW Ch. 17-18, not Ch. 13; we preview it in Part VII.)

Coefficient Interpretation: link matters

For an identity link (continuous data): GEE and mixed-model \(\boldsymbol\beta\) have the same interpretation.

For a nonlinear link:

GEE \(\boldsymbol\beta\): population-average (marginal) effect
Mixed \(\boldsymbol\beta\mid \mathbf{b}\): subject-specific (conditional) effect

Attenuation is link-specific (not universal)

Logit link (binary): the marginal slope IS attenuated toward zero relative to the conditional slope. Approximately \(\beta^{\text{marg}} \approx (1 + c_2^2\,\sigma_b^2)^{-1/2}\,\beta^{\text{cond}}\) with \(c_2 = 16\sqrt{3}/(15\pi) \approx 0.588\), so the multiplier inside is \(c_2^2 \approx 0.346\) (FLW p. 477).
Log link (Poisson, our example): random-intercept SLOPES are NOT attenuated. Only the intercept shifts by \(\sigma_b^2/2\). Marginal and conditional slope coefficients are equal.

Attenuation: the logistic picture

For a binary outcome with a subject random intercept \(b_i \sim N(0, \sigma_b^2)\), averaging the individual logistic curves over \(b_i\) flattens the population-average curve:

Mixed Model for Comparison

# GLMM with random intercept
glmm_fit <- glmer(
  seizures ~ trt + log_base + log_age + visit + (1 | id),
  data = epi,
  family = poisson(link = "log")
)

# Compare coefficients
data.frame(
  term = c("(Intercept)", "trtProgabide", "log_base", "log_age", "visit"),
  GEE = round(coef(gee_exch), 4),
  GLMM = round(fixef(glmm_fit), 4)
)

Mixed Model for Comparison

                     term     GEE    GLMM
(Intercept)   (Intercept) -1.9860 -0.7316
trtProgabide trtProgabide -0.0420 -0.3232
log_base         log_base  1.2233  1.0211
log_age           log_age  0.5072  0.2608
visit               visit -0.0574 -0.0574

Marginal vs Conditional Effects

# Treatment effect comparison
trt_gee <- coef(gee_exch)["trtProgabide"]
trt_glmm <- fixef(glmm_fit)["trtProgabide"]

data.frame(
  Model = c("GEE (marginal)", "GLMM (conditional)"),
  log_RR = c(trt_gee, trt_glmm),
  RR = exp(c(trt_gee, trt_glmm)),
  Interpretation = c("Population-average", "Subject-specific")
) |>
  mutate(across(c(log_RR, RR), ~round(.x, 4)))

Expected: slopes nearly agree here

This is a log-link Poisson random-intercept model, so the marginal and conditional slope coefficients are equal in theory; the small numeric gap is sampling noise plus the random-intercept variance affecting the intercept. Slope attenuation would appear with a logit link (previous two slides), not here.

Marginal vs Conditional Effects

               Model  log_RR     RR     Interpretation
1     GEE (marginal) -0.0420 0.9589 Population-average
2 GLMM (conditional) -0.3232 0.7238   Subject-specific

When to Choose GEE vs GLMM

Choose GEE when	Choose GLMM when
Population-average inference	Subject-specific predictions
Robust to correlation misspecification	Modeling individual trajectories
MCAR missing data (or MAR with IPW-GEE)	MAR missing data
Scientific question is about average	Random effects are of interest

Note on missing data: Standard GEE is only valid under MCAR. If dropout depends on observed data (MAR), use IPW-GEE or likelihood-based methods (GLMM).

Part VII (Day 2): IPW-GEE for MAR Missing Data

Beyond FLW Ch. 13 (forward look)

Ch. 13 notes only that standard GEE needs MCAR and that IPW “is discussed in detail in Chapters 17 and 18” (FLW p. 359). IPW-GEE is developed there (and in Robins, Rotnitzky & Zhao 1995, JASA), not in Ch. 13. This Part is a preview of that later material.

The Missing Data Problem in GEE

Standard GEE assumes MCAR (Missing Completely at Random):

Probability of dropout does not depend on outcomes or covariates
Example: Random equipment failure, administrative censoring

Reality: Dropout often depends on observed data (MAR):

Patients with more seizures may drop out
Treatment side effects may cause attrition
This violates MCAR and biases standard GEE

IPW-GEE: The Solution for MAR

Inverse Probability Weighted GEE (Robins, Rotnitzky & Zhao, 1995):

Model the probability of being observed: \(\pi_{ij} = \Pr(R_{ij} = 1 \mid \text{history})\)
Weight each observation by \(w_{ij} = 1/\pi_{ij}\)
Solve the weighted estimating equations

\[ U_w(\boldsymbol\beta) = \sum_{i=1}^{N} \mathbf{D}_i^\top \mathbf{W}_i \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol\mu_i) = 0 \]

where \(\mathbf{W}_i = \operatorname{diag}(R_{ij}/\hat\pi_{ij})\).

Notation: \(R_{ij}\) here is the observation indicator

\(R_{ij} = 1\) if \(Y_{ij}\) is observed (FLW missing-data notation). This is distinct from the working correlation \(\operatorname{Corr}_i(\alpha)\) used throughout the deck. Same letter, different object.

IPW-GEE Validity Conditions

IPW-GEE is consistent if:

MAR holds: dropout depends only on observed data
Monotone dropout: once a subject leaves, they do not return (FLW’s weight model)
Dropout model correctly specified: \(\pi_{ij}\) estimated consistently
Positivity: \(\pi_{ij} > 0\) for all observations

FLW’s weight is a cumulative product

Under monotone MAR dropout FLW uses the cumulative stay-in weight \[w_{ij} = 1 \big/ \textstyle\prod_{k \le j} \Pr(R_{ik}=1 \mid R_{i,k-1}=1, \text{history}),\] and the proper IPW sandwich variance (which accounts for estimating the weights). Our example uses a single-visit \(1/\hat\pi_{ij}\) weight, a teaching simplification.

Trade-off: IPW-GEE is less efficient than standard GEE when MCAR actually holds.

IPW-GEE Example: Simulated Dropout

# Create data with MAR dropout
set.seed(667)
epi_mar <- epi |>
  group_by(id) |>
  mutate(
    # Dropout probability depends on previous seizures
    prev_seizures = lag(seizures, default = 0),
    # Higher seizures -> more likely to drop out
    dropout_prob = plogis(-2 + 0.1 * prev_seizures),
    dropout = rbinom(n(), 1, dropout_prob),
    # Cumulative dropout
    ever_dropped = cummax(dropout),
    observed = 1 - ever_dropped
  ) |>
  ungroup()

# Keep only observed data
epi_obs <- epi_mar |> filter(observed == 1)

cat("Original observations:", nrow(epi), "\n")

IPW-GEE Example: Simulated Dropout

Original observations: 236

cat("Observed after dropout:", nrow(epi_obs), "\n")

Observed after dropout: 139

cat("Dropout rate:", round(1 - nrow(epi_obs)/nrow(epi), 2) * 100, "%\n")

Dropout rate: 41 %

Fit Dropout Model

# Model probability of being observed given history
dropout_fit <- glm(
  observed ~ trt + log_base + log_age + visit + prev_seizures,
  data = epi_mar,
  family = binomial()
)

# Get predicted probabilities for observed subjects
epi_obs$pi_hat <- predict(dropout_fit, newdata = epi_obs, type = "response")
epi_obs$ipw <- 1 / epi_obs$pi_hat

# Check weight distribution
summary(epi_obs$ipw)

Fit Dropout Model

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.086   1.144   1.394   1.587   1.815   4.172

Compare Standard GEE vs IPW-GEE

# Standard GEE (ignoring dropout mechanism)
gee_standard <- geeglm(
  seizures ~ trt + log_base + log_age + visit,
  id = id,
  data = epi_obs,
  family = poisson(link = "log"),
  corstr = "exchangeable"
)

# IPW-GEE (accounting for MAR dropout)
gee_ipw <- geeglm(
  seizures ~ trt + log_base + log_age + visit,
  id = id,
  data = epi_obs,
  family = poisson(link = "log"),
  corstr = "exchangeable",
  weights = ipw
)

# Compare treatment effects (save for inline interpretation next slide)
ipw_compare <- data.frame(
  Method = c("Full data (no dropout)", "Standard GEE (MAR ignored)", "IPW-GEE (MAR corrected)"),
  trt_coef = round(c(coef(gee_exch)["trtProgabide"],
                     coef(gee_standard)["trtProgabide"],
                     coef(gee_ipw)["trtProgabide"]), 4),
  SE = round(c(summary(gee_exch)$coef["trtProgabide", "Std.err"],
               summary(gee_standard)$coef["trtProgabide", "Std.err"],
               summary(gee_ipw)$coef["trtProgabide", "Std.err"]), 4)
)
ipw_compare

Compare Standard GEE vs IPW-GEE

                      Method trt_coef     SE
1     Full data (no dropout)  -0.0420 0.1869
2 Standard GEE (MAR ignored)  -0.0799 0.2109
3    IPW-GEE (MAR corrected)  -0.1895 0.1906

Interpreting the IPW-GEE Comparison

With the full data, the treatment log-rate-ratio is -0.042. After MAR dropout:

Standard GEE (ignoring dropout) gives -0.08, a distance of 0.038 from the full-data value: dropout has biased it.
IPW-GEE gives -0.19, a distance of 0.148 from full-data, not closer to the truth.

Takeaway: ignoring informative dropout pulls the estimate away from the full-data target; weighting by \(1/\hat\pi_{ij}\) moves it back (here, not all the way to) the full-data value.

One simulated realization

This is a single set.seed(667) dropout draw, so the exact distances are illustrative. The direction (standard GEE biased, IPW-GEE corrects toward the truth) is the reproducible teaching point.

IPW-GEE: Practical Considerations

Consideration	Recommendation
Weight extremes	Stabilize or truncate weights if \(\pi_{ij}\) near 0
Dropout model	Include predictors of both dropout AND outcome
Model misspecification	Use doubly robust methods (AIPW)
Efficiency	IPW less efficient; use when MAR is plausible

Rule of thumb: If > 10% dropout and mechanism may be MAR, consider IPW-GEE.

Summary: GEE and Missing Data

Scenario	Appropriate Method
Complete data	Standard GEE
MCAR dropout	Standard GEE (valid)
MAR dropout	IPW-GEE or likelihood-based (GLMM)
MNAR dropout	Sensitivity analysis, pattern-mixture, selection models

Key point: Standard GEE is only valid under MCAR. For MAR, use IPW-GEE with a correctly specified dropout model.

Part VIII (Day 2): Applied Case Study Summary

Full GEE Analysis: Epilepsy Data

# Final model with exchangeable correlation
summary(gee_exch)

Full GEE Analysis: Epilepsy Data


Call:
geeglm(formula = seizures ~ trt + log_base + log_age + visit, 
    family = poisson(link = "log"), data = epi, id = id, corstr = "exchangeable")

 Coefficients:
             Estimate  Std.err   Wald Pr(>|W|)    
(Intercept)  -1.98597  0.92753  4.585   0.0323 *  
trtProgabide -0.04197  0.18688  0.050   0.8223    
log_base      1.22330  0.15593 61.551 4.33e-15 ***
log_age       0.50719  0.25398  3.988   0.0458 *  
visit        -0.05743  0.03498  2.696   0.1006    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)    4.727   1.163
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha   0.4025 0.07227
Number of clusters:   59  Maximum cluster size: 4

Exponentiated Coefficients (Rate Ratios)

coefs <- coef(summary(gee_exch))

# Compute Wald-based 95% CIs manually
estimates <- coefs[, "Estimate"]
se <- coefs[, "Std.err"]
ci_lower <- estimates - 1.96 * se
ci_upper <- estimates + 1.96 * se

results <- data.frame(
  term = rownames(coefs),
  Estimate = estimates,
  SE = se,
  RR = exp(estimates),
  RR_lower = exp(ci_lower),
  RR_upper = exp(ci_upper),
  p_value = coefs[, "Pr(>|W|)"]
)

results |>
  mutate(across(c(Estimate, SE, RR, RR_lower, RR_upper), ~round(.x, 3))) |>
  mutate(p_value = format.pval(p_value, digits = 3))

Exponentiated Coefficients (Rate Ratios)

          term Estimate    SE    RR RR_lower RR_upper  p_value
1  (Intercept)   -1.986 0.928 0.137    0.022    0.845   0.0323
2 trtProgabide   -0.042 0.187 0.959    0.665    1.383   0.8223
3     log_base    1.223 0.156 3.398    2.503    4.613 4.33e-15
4      log_age    0.507 0.254 1.661    1.009    2.732   0.0458
5        visit   -0.057 0.035 0.944    0.882    1.011   0.1006

Interpreting Treatment Effect

Treatment coefficient: \(\hat\beta =\) -0.042

Rate Ratio: \(\exp(\hat\beta) =\) 0.959

95% CI: (0.66, 1.38)

Interpretation: On average, progabide patients have about 4.1% lower seizure rates than placebo, but this effect is not statistically significant (p = 0.82).

Plain-Language Interpretation: GEE Output

Term	Coefficient	Rate Ratio
trtProgabide	-0.042	0.959
log_base	1.223	3.398
log_age	0.507	1.661
visit	-0.057	0.944

trtProgabide: rate ratio 0.96, about 4.1% lower (not significant)
log_base: strongest predictor; higher baseline rate -> higher rate throughout
visit: small decline per visit

Effect of Baseline Seizure Rate

log_base coefficient: \(\hat\beta =\) 1.223

For each doubling of baseline rate: \[\exp(\hat\beta \times \log 2) = \exp(0.848) = 2.33\]

Patients with higher baseline rates continue to have higher rates on treatment.

Predicted Rates

# Predictions for average patient
newdata <- expand.grid(
  trt = levels(epi$trt),
  log_base = log(median(epi_raw$base) / 4),
  log_age = log(median(epi_raw$age)),
  visit = 1:4
)

# Use the model matrix to show the linear-predictor mechanics
# (predict(gee_exch, newdata, type = "response") would also work)
X <- model.matrix(~ trt + log_base + log_age + visit, data = newdata)
newdata$pred_rate <- exp(X %*% coef(gee_exch))

ggplot(newdata, aes(visit, pred_rate, color = trt)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  labs(
    title = "Predicted Seizure Rates (Median Patient)",
    x = "Visit",
    y = "Predicted Seizures per 2 Weeks",
    color = "Treatment"
  ) +
  theme_minimal()

Predicted Rates

GEE Extensions: Overdispersion

For count data, we may have overdispersion (\(\phi > 1\)):

# Estimate dispersion
pearson_stat <- sum(residuals(gee_exch, type = "pearson")^2)
df_resid <- nrow(epi) - length(coef(gee_exch))
phi_hat <- pearson_stat / df_resid

cat("Estimated dispersion:", round(phi_hat, 2), "\n")

GEE Extensions: Overdispersion

Estimated dispersion: 4.83

cat("Evidence of overdispersion:", ifelse(phi_hat > 1.5, "Yes", "Moderate/No"))

Evidence of overdispersion: Yes

Handling Overdispersion

Options for overdispersed count data:

Scale parameter: Multiply variance by \(\hat\phi\)
Negative binomial: Different variance function
Robust SEs: Already account for variance misspecification

GEE with robust SEs handles overdispersion automatically.

Summary

Key Takeaways

Topic	Main Point
GEE estimation	Consistent \(\hat\beta\) even with wrong correlation
Working correlation	Affects efficiency, not consistency
Robust SEs	Default; protects against misspecification
Small samples	Use bias corrections for \(N < 40\)
Diagnostics	Residuals, QIC/QICu, sensitivity analysis
Missing data	Standard GEE requires MCAR; IPW-GEE valid under MAR
vs. Mixed models	Different interpretations for non-linear links; marginal effects attenuated

GEE Checklist for Practice

Choose appropriate link and variance function
Select reasonable working correlation (start exchangeable/AR1)
Always report robust SEs (default in geepack)
Check QIC to compare correlation structures
Examine residuals and influential clusters
Conduct sensitivity analyses for outliers
Consider small-sample corrections if \(N < 40\)
If dropout > 10% and MAR plausible, use IPW-GEE

Common Mistakes in GEE

Mistake	Why It Is Wrong	Correct Approach
Using standard GEE with MAR dropout	Standard GEE only valid for MCAR	Use IPW-GEE or likelihood-based methods
Ignoring small-sample issues	Sandwich SEs biased downward for N < 40	Use bias-corrected SEs (KC, FG, MD)
Using model-based SEs as default	Only valid if correlation is correct	Report robust (sandwich) SEs
Comparing QIC across different mean models	QIC compares correlation structures	Use QICu for mean model comparison
Forgetting to check residuals	May miss mean model misspecification	Always examine residual plots
Treating GEE \(\beta\) as subject-specific	GEE gives population-average effects	Use GLMM for conditional effects
Using unstructured correlation with few subjects	Too many parameters to estimate	Use simpler structures (exchangeable, AR1)

GEE Recipe Card: A Step-by-Step Workflow

GEE Analysis in 8 Steps

Step 1: Specify the outcome distribution

Binary: family = binomial(link = "logit")
Count: family = poisson(link = "log")
Continuous: family = gaussian(link = "identity")

Step 2: Start with a simple working correlation

corstr = "exchangeable" for general clustering
corstr = "ar1" for time-ordered data

Step 3: Fit the model

fit <- geeglm(y ~ x1 + x2 + time, id = subject_id,
              data = mydata, family = ..., corstr = "exchangeable")

Step 4: Check diagnostics

Residuals vs fitted
Residuals by time
Cluster-level residuals for outliers

Step 5: Compare correlation structures via QIC

QIC(fit_exch); QIC(fit_ar1); QIC(fit_unstr)

Step 6: Assess standard errors

Always report robust SEs (default in geepack)
If N < 40, consider bias-corrected SEs or bootstrap

Step 7: Handle missing data

If MCAR: standard GEE is valid
If MAR: use IPW-GEE with weights

Step 8: Report results

Exponentiate coefficients for OR/RR interpretation
Include 95% CI from robust SEs
Note working correlation and QIC

The Same GEE Fit in SAS

For SAS users, the exchangeable Poisson GEE is PROC GENMOD with a REPEATED statement (shown for reference, not run):

proc genmod data=epi;
   class id trt;
   model seizures = trt log_base log_age visit / dist=poisson link=log;
   repeated subject=id / type=exch corrw modelse;
run;

class id trt; declares the cluster id and the categorical treatment
dist=poisson link=log; is the marginal Poisson-log mean model
repeated subject=id / type=exch requests the exchangeable working correlation (use type=ar(1) or type=un for the others)
corrw prints the estimated working correlation; modelse adds the model-based SEs alongside the default empirical (sandwich) SEs

For Next Time

Chapter 14: Transition models
Modeling dependence on previous outcomes
Comparison: marginal vs transition interpretations

Reading: FLW Chapter 14

References

Fitzmaurice, Laird, Ware (2011). Applied Longitudinal Analysis. Ch. 13 (GEE, pp. 354-393); IPW-GEE developed in Ch. 17-18.
Liang, Zeger (1986). Longitudinal data analysis using generalized linear models. Biometrika.
Pan (2001). Akaike’s information criterion in generalized estimating equations. Biometrics. (QIC; beyond Ch. 13)
Kauermann, Carroll (2001); Fay, Graubard (2001); Mancl, DeRouen (2001). Small-sample sandwich corrections. JASA / Biometrics. (beyond Ch. 13)
Robins, Rotnitzky, Zhao (1995). Semiparametric regression with missing data. JASA. (IPW)

Appendix A: GEE Derivation and Fisher Scoring

The generalized estimating equation:

\[ U(\boldsymbol\beta) = \sum_{i=1}^{N} \mathbf{D}_i^\top \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol\mu_i) = 0 \]

is solved by an iterative update (Fisher scoring, using the expected information):

\[ \hat{\boldsymbol\beta}^{(t+1)} = \hat{\boldsymbol\beta}^{(t)} + \left(\sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \mathbf{D}_i\right)^{-1} \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol\mu_i^{(t)}) \]

Fisher scoring vs Newton-Raphson: Fisher scoring uses the expected information (model-based, more stable); Newton-Raphson uses the observed information (data-based). They coincide for GLMs with a canonical link in the IRLS sense, but that exact equivalence does not carry over to GEE with a non-identity working correlation. The update above is iteratively reweighted least squares with the current weights.

Appendix B: Sandwich Variance

The robust variance estimator:

\[ \widehat{\operatorname{Var}}(\hat{\boldsymbol\beta}) = \mathbf{B}^{-1} \mathbf{M} \mathbf{B}^{-1} \]

where (bread):

\[ \mathbf{B} = \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \mathbf{D}_i \]

and (meat):

\[ \mathbf{M} = \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} (\mathbf{Y}_i - \hat{\boldsymbol\mu}_i)(\mathbf{Y}_i - \hat{\boldsymbol\mu}_i)^\top \mathbf{V}_i^{-1} \mathbf{D}_i \]

Appendix C: QIC Formula (Pan 2001; beyond Ch. 13)

\[ \operatorname{QIC}(R) = -2Q(\hat{\boldsymbol\beta}_R; I) + 2\operatorname{trace}(\hat{\boldsymbol\Omega}_I^{-1} \hat{\mathbf{V}}_R) \]

For comparing mean models with the same correlation:

\[ \operatorname{QICu} = -2Q(\hat{\boldsymbol\beta}; I) + 2p \]

\(Q(\hat{\boldsymbol\beta}; I)\) is the quasi-likelihood evaluated under the independence working model
\(\hat{\boldsymbol\Omega}_I\) is the independence-model (naive) variance
Smaller QIC indicates a better model