Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis
Naim Rashid
Lecture Objectives
Understand GEE estimation for binary and count outcomes
Compare working correlation structures and their practical implications
Distinguish model-based vs robust (sandwich) standard errors
Apply small-sample corrections for limited cluster sizes
Perform GEE diagnostics and model assessment
Distinguish standard GEE (MCAR) from IPW-GEE (MAR) for missing data
Roadmap
Day 1 (Parts I-IV): estimation, working correlation, standard errors, small samples. Day 2 (Parts V-VIII): diagnostics, GEE vs mixed, IPW-GEE, applied case study.
Part
Topic
Day
I
GEE Review and Estimation
1
II
Working Correlation Structures
1
III
Model-Based vs Robust Standard Errors
1
IV
Small-Sample Corrections
1
V
GEE Diagnostics and Model Assessment
2
VI
Comparison with Likelihood Methods
2
VII
IPW-GEE for MAR Missing Data
2
VIII
Applied Case Study: Epilepsy Data
2
Notation reference
Symbol
Meaning
\(Y_{ij}\)
response for subject \(i\) at occasion \(j\) (scalar)
\(\mathbf{Y}_i\)
response vector for subject \(i\)\((n_i \times 1)\)
\(\mathbf{X}_i,\ \boldsymbol\beta\)
fixed-effect design \(n_i \times p\) and coefficients
\(\mathbf{Z}_i,\ \mathbf{b}_i\)
random-effect design and random effects, \(\mathbf{b}_i \sim N(0, G)\)
\(\mathbf{D}_i = \partial \boldsymbol\mu_i / \partial \boldsymbol\beta\) is the \(n_i \times p\) derivative of the mean (its sensitivity to \(\boldsymbol\beta\))
\(\mathbf{V}_i = \mathbf{A}_i^{1/2} \operatorname{Corr}_i(\alpha)\, \mathbf{A}_i^{1/2}\) is the working covariance (precision)
\(\mathbf{A}_i = \operatorname{diag}(\phi\,v(\mu_{ij}))\) contains the marginal variances \(\operatorname{Var}(Y_{ij})=\phi\,v(\mu_{ij})\) (\(\phi\) = dispersion, the same \(\phi\) from Ch. 11)
In words: each subject contributes a residual \((\mathbf{Y}_i - \boldsymbol\mu_i)\), weighted by how the mean responds to \(\boldsymbol\beta\) (\(\mathbf{D}_i\)) and down-weighted by its covariance (\(\mathbf{V}_i^{-1}\)). The sum is set to zero:
FLW (p. 357) fits GEE by alternating two stages until convergence:
Initialize \(\hat{\boldsymbol\beta}^{(0)}\) (e.g., from the independence fit)
Given\(\hat\alpha, \hat\phi\): update \(\hat{\boldsymbol\beta}\) by solving the estimating equation (iteratively reweighted least squares)
Given\(\hat{\boldsymbol\beta}\): update \(\hat\alpha, \hat\phi\) from the standardized residuals \(e_{ij}\)
Iterate steps 2-3 until convergence
Key insight:\(\hat{\boldsymbol\beta}\) converges to a consistent estimate even if \(\operatorname{Corr}_i(\alpha)\) is wrong (so long as the mean model is correct).
Baseline: seizure count in 8-week period before treatment
Outcome: seizure counts at 4 post-randomization visits (2-week intervals)
Goal: Estimate treatment effect on seizure rate (population-average).
Dataset provenance
The epilepsy/progabide data are a Ch. 13 problem dataset (FLW Problem 13.2, p. 393), including the “exclude patient 49” sensitivity check (Problem 13.2.5). They are a legitimate Ch. 13 example but are not one of the chapter’s three worked Section 13.4 case studies (Muscatine obesity logistic, leprosy bacilli log-linear, arthritis ordinal). We use them as our worked count-data case study.
Visualize Seizure Trajectories
ggplot(epi, aes(visit, seizures, group = id, color = trt)) +geom_line(alpha =0.3) +stat_summary(aes(group = trt), fun = mean, geom ="line",linewidth =1.5) +scale_y_log10() +labs(title ="Seizure Counts Over Time by Treatment",x ="Visit (2-week period)",y ="Seizure Count (log scale)",color ="Treatment" ) +theme_minimal()
Treatment effect estimates are similar across working correlations.
Standard errors differ – efficiency depends on how close working correlation is to truth.
Robust SEs (default in geepack) protect against misspecification.
Check Your Understanding: Working Correlations
If you specify exchangeable correlation but the true correlation is AR(1), what happens to your coefficient estimates? What about standard errors?
When would you prefer AR(1) over exchangeable correlation?
Why is unstructured correlation not always the best choice?
Answers:
Coefficient estimates remain consistent (they converge to the correct population-average values). Standard errors may be inefficient (larger than optimal) if you use the model-based SEs. With robust (sandwich) SEs, inference remains valid even if the working correlation is misspecified.
Choose AR(1) when observations closer in time are more strongly correlated than distant observations (e.g., disease progression, daily measurements). AR(1) assumes correlation decays exponentially with time lag.
Unstructured correlation estimates \(n(n-1)/2\) parameters, which can be problematic with: (a) limited subjects (parameters may be poorly estimated), (b) unbalanced data (not all pairs observed), (c) overfitting concerns. Simpler structures (exchangeable, AR(1)) are more efficient when they approximately hold.
Alternating Logistic Regression (ALR): a second estimating equation for the association parameters \(\alpha\), solved alongside the mean equation for \(\boldsymbol\beta\)
So \(\alpha\) can be a target of inference (how does dependence change over time?), not only a working-correlation nuisance.
Beyond our worked example
We fit a Poisson (count) case study, so we use a working correlation. The OR/ALR association model is the FLW Ch. 13 extension for binary data (FLW Section 13.4, Muscatine logistic example); R support is in the geepackgeese() / alr machinery and the multgee package. We flag it here as the defining “extension” of the chapter.
Part III (Day 1): Model-Based vs Robust Standard Errors
\(\mathbf{M}\) uses the observed residual products \((\mathbf{Y}_i - \boldsymbol\mu_i)(\mathbf{Y}_i - \boldsymbol\mu_i)^\top\), so it does not trust the working \(\mathbf{V}_i\). If the working correlation is wrong, \(\mathbf{M}\) still estimates the true variability. The model-based SE \(\mathbf{B}^{-1}\) is correct only if \(\mathbf{V}_i\) is correct.
When to Use Which?
Scenario
Recommendation
Large \(N\), balanced, replicated covariate patterns
Robust (sandwich)
Correlation uncertain, \(N\) large
Robust (sandwich)
Correlation plausibly correct
Model-based (more efficient)
Small/modest \(N\), or severely unbalanced
Model-based \(\mathbf{B}^{-1}\) (FLW pp. 360-361)
Robust is not a blanket default
The sandwich estimator’s robustness is an asymptotic (large-\(N\)) property. With few independent subjects, or a severely unbalanced design with no replicated covariate patterns, the sandwich is biased downward and highly variable (FLW pp. 360-361). FLW’s remedy is to prefer the model-based estimator \(\mathbf{B}^{-1}\) with a plausible working covariance. Robust SEs are the default only when \(N\) is large and the design is well-replicated.
Comparing SE Types in geepack
# Model-based (naive) SEs require fitting with std.err = "jack"# then extracting the naive variance from the geese object# Fit model - geepack stores both variance estimates internallygee_exch_fit <-geeglm( seizures ~ trt + log_base + log_age + visit,id = id,data = epi,family =poisson(link ="log"),corstr ="exchangeable")# Extract robust (sandwich) SEs - this is the default in summary()se_robust <-sqrt(diag(gee_exch_fit$geese$vbeta))# Extract model-based (naive) SEs from vbeta.naiv# This is (D'V^{-1}D)^{-1}, assuming working covariance is correctse_naive <-sqrt(diag(gee_exch_fit$geese$vbeta.naiv))data.frame(term =names(coef(gee_exch_fit)),SE_robust =round(se_robust, 4),SE_model_based =round(se_naive, 4),ratio =round(se_robust / se_naive, 2))
Read the ratio = robust / naive column you just saw:
Ratio > 1 (robust larger): the working correlation overstates the true correlation, so the naive SE is too small. Naive inference is anti-conservative (Type I error inflated). Trust the robust SE.
Ratio < 1 (robust smaller): the working correlation understates the true correlation, so the naive SE is too large. Naive inference is conservative; the robust SE is sharper.
Mechanism: the naive SE believes the working \(\mathbf{V}_i\); when \(\mathbf{V}_i\) overstates correlation it credits each subject with too little independent information, flipping which SE is bigger.
Part IV (Day 1): Small-Sample Corrections
The Small-Sample Problem
Robust SEs rely on an asymptotic (large-\(N\)) approximation:
With few clusters (\(N < 40\)), the sandwich estimator is biased downward (FLW pp. 360-361).
This leads to:
Underestimated standard errors
Inflated Type I error rates
FLW’s own remedy (in-chapter)
When \(N\) is small and a working covariance is plausible, FLW (p. 361) recommends using the model-based estimator \(\mathbf{B}^{-1}\) rather than the sandwich. That is the chapter’s prescription; the named bias corrections on the next slide are beyond FLW Ch. 13.
Bias-Corrected Sandwich Estimators
Beyond FLW Ch. 13 (instructor extension)
These named corrections are not in FLW Ch. 13 (Kauermann & Carroll 2001; Fay & Graubard 2001; Mancl & DeRouen 2001, Biometrics). FLW’s in-chapter small-\(N\) remedy is the model-based estimator \(\mathbf{B}^{-1}\) (p. 361). The corrections below inflate the under-estimated sandwich, differing mainly in aggressiveness.
Method
Correction
Kauermann-Carroll (KC)
\((\mathbf{I} - \mathbf{H}_i)^{-1/2}\) adjustment
Fay-Graubard (FG)
\((\mathbf{I} - \mathbf{H}_i)^{-1}\) adjustment
Mancl-DeRouen (MD)
Delete-one jackknife
\(\mathbf{H}_i\) is the cluster leverage (“hat”) matrix, the cluster analogue of the scalar OLS leverage \(h_{ii}\).
Small-Sample Correction in Practice
# geepack does not implement KC/FG/MD; those live in the geesmv package.# We illustrate the small-sample idea with a model-agnostic cluster bootstrap.set.seed(667)n_boot <-200boot_coefs <-matrix(NA, n_boot, 5)for (b in1:n_boot) {# Resample whole CLUSTERS (subjects) with replacement ids <-unique(epi$id) boot_ids <-sample(ids, length(ids), replace =TRUE) boot_data <-do.call(rbind, lapply(seq_along(boot_ids), function(i) { d <- epi[epi$id == boot_ids[i], ] d$id <- i # LOAD-BEARING: GEE needs UNIQUE cluster ids after resampling,# or a duplicated subject is merged and the SEs are wrong d })) fit_b <-tryCatch(geeglm( seizures ~ trt + log_base + log_age + visit,id = id,data = boot_data,family =poisson(link ="log"),corstr ="exchangeable" ),error =function(e) NULL )if (!is.null(fit_b)) { boot_coefs[b, ] <-coef(fit_b) }}boot_se <-apply(boot_coefs, 2, sd, na.rm =TRUE)names(boot_se) <-names(coef(gee_exch))data.frame(term =names(boot_se),SE_robust =round(se_robust, 4),SE_bootstrap =round(boot_se, 4))
Use \(t_{N-p}\) or \(F\) reference distributions instead of \(z\) or \(\chi^2\).
Rule of Thumb
N (clusters)
Recommendation
\(N \geq 40\)
Standard robust SEs adequate
\(20 \leq N < 40\)
Use bias-corrected SEs
\(N < 20\)
Consider mixed models or exact methods
With 59 patients here, standard robust SEs are reasonable.
Check Your Understanding: Standard Errors and Small Samples
What is the difference between model-based (naive) and robust (sandwich) standard errors?
Why do robust SEs tend to be biased downward with few clusters?
You have a study with 25 subjects. What precautions should you take for GEE inference?
Answers:
Model-based SEs assume the working covariance structure is correctly specified; they use \((D^\top V^{-1} D)^{-1}\). Robust SEs use the sandwich estimator \(B^{-1}MB^{-1}\), which uses the observed residuals to estimate the true covariance. Robust SEs are valid even if the working correlation is wrong.
The sandwich estimator relies on asymptotic theory (large N). With few clusters, there is not enough information to estimate the “meat” of the sandwich (\(M\)) well, leading to underestimation of variance. This causes inflated Type I error rates.
With 25 subjects (between 20 and 40), use bias-corrected sandwich estimators (Kauermann-Carroll, Fay-Graubard, or Mancl-DeRouen). Also consider using \(t\) or \(F\) distributions with \(N-p\) degrees of freedom instead of normal/chi-squared for hypothesis tests.
Day 1 Recap; Part V (Day 2): Diagnostics
Recap of Day 1 (Parts I-IV)
GEE solves \(\sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol\mu_i) = 0\); \(\hat{\boldsymbol\beta}\) is consistent even if \(\operatorname{Corr}_i(\alpha)\) is wrong.
Working correlation affects efficiency, not consistency (exchangeable / AR(1) / unstructured).
Robust (sandwich) SE = \(\mathbf{B}^{-1}\mathbf{M}\mathbf{B}^{-1}\); model-based = \(\mathbf{B}^{-1}\). Prefer model-based when \(N\) is small or the design is unbalanced.
Diagnostic Goals for GEE
Goal
What to Check
Mean model
Residual patterns
Link function
Linearity of predictors
Variance function
Scale of residuals
Influential clusters
Leverage, deletion
Working correlation
QIC comparison
Pearson Residuals vs Fitted
Reading it
Good: residuals scattered evenly around 0 (dashed line), the red loess roughly flat. Problematic: a curved loess (mean-model / link misspecification) or a fan that widens with the fitted value (variance-function misspecification). Here, watch for a few large positive residuals from high-count subjects.
Residuals by Visit and Treatment
Reading it
Good: boxes centered on 0 (dashed line) with similar spread across visits and both arms. Problematic: a drift in the medians across visits (time trend the mean model missed) or boxes much wider in one arm (heteroscedasticity not captured).
Cluster-Level Residuals
Reading it
Each bar sums a subject’s four Pearson residuals. The red dashed lines at \(\pm 4\) are an informal flag: bars outside them are subjects the model fits poorly and are candidates for a sensitivity check. Good: most bars well inside \(\pm 4\). Problematic: one or two bars far outside (an influential cluster, e.g. subject 49).
model trt_coef SE RR
1 Full data -0.042 0.1869 0.9589
2 Exclude ID 49 -0.262 0.1549 0.7695
Choosing a Working Correlation: FLW’s method
In-chapter approach (FLW p. 360)
FLW does NOT pick a working correlation by an information criterion. Its guidance: \(\hat{\boldsymbol\beta}\) is consistent under any working assumption, there is often negligible efficiency loss from working independence, and you can compare the empirical (sandwich) SEs across structures. If the robust SEs are stable across choices, the choice barely matters.
The fitted \(\hat\alpha\) values and the robust-vs-naive SE comparison (Part III) are the in-chapter diagnostics. QIC (next slide) is a useful add-on, but it is an instructor extension.
QIC: Model Selection for GEE
Beyond FLW Ch. 13 (instructor extension)
QIC / QICu are due to Pan (2001), Biometrics. They are not in FLW Ch. 13, which selects a working correlation by comparing empirical SEs (p. 360). QIC automates that comparison.
QIC -> compare correlation structures (same mean model)
QICu -> compare mean models (same correlation); QICu is NOT valid for selecting the correlation structure
One actionable rule: smaller is better; use QIC for correlation, QICu for covariates.
QIC: Code
# QIC() returns a named vector; index by NAME (positions are# geepack-version-dependent). "QIC" ranks correlation structures;# "QICu" ranks mean models (same correlation).qic_row <-function(fit) { q <- geepack::QIC(fit)c(QIC =unname(q["QIC"]), QICu =unname(q["QICu"]))}data.frame(correlation =c("Independence", "Exchangeable", "AR(1)", "Unstructured"),rbind(qic_row(gee_ind), qic_row(gee_exch),qic_row(gee_ar1), qic_row(gee_unstr))) |>arrange(QIC)
Part VI (Day 2): Comparison with Likelihood Methods
GEE vs Mixed Models
Aspect
GEE
Mixed Models
Target
Population-average \(\boldsymbol\beta\)
Subject-specific \(\boldsymbol\beta\)
Correlation
Working (misspecification OK)
Model-based
Missing data
Standard GEE: MCAR; IPW-GEE: MAR
MAR with ML
Efficiency
Depends on correlation
Higher if model correct
Important: Standard GEE requires MCAR for valid inference. IPW-GEE extends validity to MAR when the dropout model is correctly specified. (IPW-GEE itself is developed in FLW Ch. 17-18, not Ch. 13; we preview it in Part VII.)
Coefficient Interpretation: link matters
For an identity link (continuous data): GEE and mixed-model \(\boldsymbol\beta\) have the same interpretation.
Logit link (binary): the marginal slope IS attenuated toward zero relative to the conditional slope. Approximately \(\beta^{\text{marg}} \approx (1 + c_2^2\,\sigma_b^2)^{-1/2}\,\beta^{\text{cond}}\) with \(c_2 = 16\sqrt{3}/(15\pi) \approx 0.588\), so the multiplier inside is \(c_2^2 \approx 0.346\) (FLW p. 477).
Log link (Poisson, our example): random-intercept SLOPES are NOT attenuated. Only the intercept shifts by \(\sigma_b^2/2\). Marginal and conditional slope coefficients are equal.
Attenuation: the logistic picture
For a binary outcome with a subject random intercept \(b_i \sim N(0, \sigma_b^2)\), averaging the individual logistic curves over \(b_i\) flattens the population-average curve:
This is a log-link Poisson random-intercept model, so the marginal and conditional slope coefficients are equal in theory; the small numeric gap is sampling noise plus the random-intercept variance affecting the intercept. Slope attenuation would appear with a logit link (previous two slides), not here.
Note on missing data: Standard GEE is only valid under MCAR. If dropout depends on observed data (MAR), use IPW-GEE or likelihood-based methods (GLMM).
Part VII (Day 2): IPW-GEE for MAR Missing Data
Beyond FLW Ch. 13 (forward look)
Ch. 13 notes only that standard GEE needs MCAR and that IPW “is discussed in detail in Chapters 17 and 18” (FLW p. 359). IPW-GEE is developed there (and in Robins, Rotnitzky & Zhao 1995, JASA), not in Ch. 13. This Part is a preview of that later material.
The Missing Data Problem in GEE
Standard GEE assumes MCAR (Missing Completely at Random):
Probability of dropout does not depend on outcomes or covariates
Example: Random equipment failure, administrative censoring
Reality: Dropout often depends on observed data (MAR):
Patients with more seizures may drop out
Treatment side effects may cause attrition
This violates MCAR and biases standard GEE
IPW-GEE: The Solution for MAR
Inverse Probability Weighted GEE (Robins, Rotnitzky & Zhao, 1995):
Model the probability of being observed: \(\pi_{ij} = \Pr(R_{ij} = 1 \mid \text{history})\)
Weight each observation by \(w_{ij} = 1/\pi_{ij}\)
where \(\mathbf{W}_i = \operatorname{diag}(R_{ij}/\hat\pi_{ij})\).
Notation: \(R_{ij}\) here is the observation indicator
\(R_{ij} = 1\) if \(Y_{ij}\) is observed (FLW missing-data notation). This is distinct from the working correlation \(\operatorname{Corr}_i(\alpha)\) used throughout the deck. Same letter, different object.
IPW-GEE Validity Conditions
IPW-GEE is consistent if:
MAR holds: dropout depends only on observed data
Monotone dropout: once a subject leaves, they do not return (FLW’s weight model)
Dropout model correctly specified:\(\pi_{ij}\) estimated consistently
Positivity:\(\pi_{ij} > 0\) for all observations
FLW’s weight is a cumulative product
Under monotone MAR dropout FLW uses the cumulative stay-in weight \[w_{ij} = 1 \big/ \textstyle\prod_{k \le j} \Pr(R_{ik}=1 \mid R_{i,k-1}=1, \text{history}),\] and the proper IPW sandwich variance (which accounts for estimating the weights). Our example uses a single-visit \(1/\hat\pi_{ij}\) weight, a teaching simplification.
Trade-off: IPW-GEE is less efficient than standard GEE when MCAR actually holds.
IPW-GEE Example: Simulated Dropout
# Create data with MAR dropoutset.seed(667)epi_mar <- epi |>group_by(id) |>mutate(# Dropout probability depends on previous seizuresprev_seizures =lag(seizures, default =0),# Higher seizures -> more likely to drop outdropout_prob =plogis(-2+0.1* prev_seizures),dropout =rbinom(n(), 1, dropout_prob),# Cumulative dropoutever_dropped =cummax(dropout),observed =1- ever_dropped ) |>ungroup()# Keep only observed dataepi_obs <- epi_mar |>filter(observed ==1)cat("Original observations:", nrow(epi), "\n")
IPW-GEE Example: Simulated Dropout
Original observations: 236
cat("Observed after dropout:", nrow(epi_obs), "\n")
# Model probability of being observed given historydropout_fit <-glm( observed ~ trt + log_base + log_age + visit + prev_seizures,data = epi_mar,family =binomial())# Get predicted probabilities for observed subjectsepi_obs$pi_hat <-predict(dropout_fit, newdata = epi_obs, type ="response")epi_obs$ipw <-1/ epi_obs$pi_hat# Check weight distributionsummary(epi_obs$ipw)
Fit Dropout Model
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.086 1.144 1.394 1.587 1.815 4.172
Compare Standard GEE vs IPW-GEE
# Standard GEE (ignoring dropout mechanism)gee_standard <-geeglm( seizures ~ trt + log_base + log_age + visit,id = id,data = epi_obs,family =poisson(link ="log"),corstr ="exchangeable")# IPW-GEE (accounting for MAR dropout)gee_ipw <-geeglm( seizures ~ trt + log_base + log_age + visit,id = id,data = epi_obs,family =poisson(link ="log"),corstr ="exchangeable",weights = ipw)# Compare treatment effects (save for inline interpretation next slide)ipw_compare <-data.frame(Method =c("Full data (no dropout)", "Standard GEE (MAR ignored)", "IPW-GEE (MAR corrected)"),trt_coef =round(c(coef(gee_exch)["trtProgabide"],coef(gee_standard)["trtProgabide"],coef(gee_ipw)["trtProgabide"]), 4),SE =round(c(summary(gee_exch)$coef["trtProgabide", "Std.err"],summary(gee_standard)$coef["trtProgabide", "Std.err"],summary(gee_ipw)$coef["trtProgabide", "Std.err"]), 4))ipw_compare
Compare Standard GEE vs IPW-GEE
Method trt_coef SE
1 Full data (no dropout) -0.0420 0.1869
2 Standard GEE (MAR ignored) -0.0799 0.2109
3 IPW-GEE (MAR corrected) -0.1895 0.1906
Interpreting the IPW-GEE Comparison
With the full data, the treatment log-rate-ratio is -0.042. After MAR dropout:
Standard GEE (ignoring dropout) gives -0.08, a distance of 0.038 from the full-data value: dropout has biased it.
IPW-GEE gives -0.19, a distance of 0.148 from full-data, not closer to the truth.
Takeaway: ignoring informative dropout pulls the estimate away from the full-data target; weighting by \(1/\hat\pi_{ij}\) moves it back (here, not all the way to) the full-data value.
One simulated realization
This is a single set.seed(667) dropout draw, so the exact distances are illustrative. The direction (standard GEE biased, IPW-GEE corrects toward the truth) is the reproducible teaching point.
IPW-GEE: Practical Considerations
Consideration
Recommendation
Weight extremes
Stabilize or truncate weights if \(\pi_{ij}\) near 0
Dropout model
Include predictors of both dropout AND outcome
Model misspecification
Use doubly robust methods (AIPW)
Efficiency
IPW less efficient; use when MAR is plausible
Rule of thumb: If > 10% dropout and mechanism may be MAR, consider IPW-GEE.
Interpretation: On average, progabide patients have about 4.1% lower seizure rates than placebo, but this effect is not statistically significant (p = 0.82).
Plain-Language Interpretation: GEE Output
Term
Coefficient
Rate Ratio
trtProgabide
-0.042
0.959
log_base
1.223
3.398
log_age
0.507
1.661
visit
-0.057
0.944
trtProgabide: rate ratio 0.96, about 4.1% lower (not significant)
log_base: strongest predictor; higher baseline rate -> higher rate throughout
visit: small decline per visit
Effect of Baseline Seizure Rate
log_base coefficient:\(\hat\beta =\) 1.223
For each doubling of baseline rate: \[\exp(\hat\beta \times \log 2) = \exp(0.848) = 2.33\]
Patients with higher baseline rates continue to have higher rates on treatment.
Predicted Rates
# Predictions for average patientnewdata <-expand.grid(trt =levels(epi$trt),log_base =log(median(epi_raw$base) /4),log_age =log(median(epi_raw$age)),visit =1:4)# Use the model matrix to show the linear-predictor mechanics# (predict(gee_exch, newdata, type = "response") would also work)X <-model.matrix(~ trt + log_base + log_age + visit, data = newdata)newdata$pred_rate <-exp(X %*%coef(gee_exch))ggplot(newdata, aes(visit, pred_rate, color = trt)) +geom_line(linewidth =1.2) +geom_point(size =3) +labs(title ="Predicted Seizure Rates (Median Patient)",x ="Visit",y ="Predicted Seizures per 2 Weeks",color ="Treatment" ) +theme_minimal()
Predicted Rates
GEE Extensions: Overdispersion
For count data, we may have overdispersion (\(\phi > 1\)):
Fisher scoring vs Newton-Raphson: Fisher scoring uses the expected information (model-based, more stable); Newton-Raphson uses the observed information (data-based). They coincide for GLMs with a canonical link in the IRLS sense, but that exact equivalence does not carry over to GEE with a non-identity working correlation. The update above is iteratively reweighted least squares with the current weights.