Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Analysis
Naim Rashid
Lecture Objectives
Understand transition models as an alternative to marginal and mixed models
Define Markov models and their application to longitudinal data
Fit first-order transition models for binary outcomes
Interpret regression coefficients conditional on previous responses
Compare transition, marginal, and mixed model interpretations
Provenance: instructor extension
Transition (Markov) models are an instructor extension drawing on FLW 1st ed. (2004), Ch. 10. The 2nd edition (2011) has no dedicated transition-models chapter; the marginal/mixed/transition trichotomy is framed in FLW 2nd ed., Ch. 12 (Marginal Models). We include transition models here to complete that trichotomy.
Roadmap
| Part | Topic |
|------|-------|
| I | Introduction: Three Model Families |
| II | Markov Models and Transition Probabilities |
| III | First-Order Transition Models |
| IV | Estimation and Inference |
| V | Extensions and Higher-Order Models |
| VI | Comparison with Marginal and Mixed Models |
Part I: Introduction
Three Approaches to Longitudinal Data
| Model Family | Conditions On | \(\beta\) Interpretation |
|--------------|---------------|--------------------------|
| Marginal | Covariates only | Population-average |
| Mixed | Random effects \(b_i\) | Subject-specific |
| Transition | Previous responses \(Y_{i,j-1}\) | Conditional on past |
All three can be valid for the same data; the choice depends on the scientific question.
When to Use Transition Models
Transition models are natural when:
The previous outcome directly influences the current outcome
Interest lies in state changes over time
Data represent a stochastic process (e.g., disease progression)
Consider a longitudinal study of depression where patients are assessed monthly.
Question: Does current depression status depend on:
Covariates (treatment, age)?
Depression status at the previous visit?
A transition model addresses both simultaneously.
Check Your Understanding: Part I
Quick Self-Check
Model Selection: A researcher wants to estimate the population-average effect of a new drug on blood pressure. Should they use a marginal, mixed, or transition model?
Research Question Match: You want to know: “Given that a patient was depressed last month, what is the probability they are depressed this month?” Which model type answers this question?
When NOT to Use Transition Models: A study measures height in children annually. Would a transition model be appropriate? Why or why not?
Answers
Marginal model (GEE) - The research question is about population-average effects, not conditional on individual history or subject-specific effects.
Transition model - This question explicitly conditions on the previous state (“given that a patient was depressed last month”), which is exactly what transition models address.
No - Height is a continuous, monotonically increasing outcome where the previous value does not meaningfully “cause” the current value in a stochastic sense. A mixed model capturing individual growth trajectories would be more appropriate. Transition models are best for discrete states or outcomes where state dependence is meaningful (e.g., relapse/remission).
Part II: Markov Models
Markov Processes
A Markov chain is a stochastic process where the future state depends only on the current state, not the full history.
suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
  library(tidyr)
})

# P: 2 x 2 transition matrix with row/column names (rows = from, columns = to)
trans_df <- as.data.frame(P) |>
  tibble::rownames_to_column("From") |>
  pivot_longer(-From, names_to = "To", values_to = "Probability")

ggplot(trans_df, aes(To, From, fill = Probability)) +
  geom_tile(color = "white", linewidth = 1) +
  geom_text(aes(label = round(Probability, 2)), size = 6) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(title = "Transition Probability Matrix",
       x = "To State", y = "From State") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none")
Visualizing Transitions
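To make the Markov property concrete, the sketch below simulates a long two-state chain and recovers the transition matrix from the observed (from, to) pairs. The matrix values (P(Sick | Healthy) = 0.2, P(Sick | Sick) = 0.6) are assumed for illustration, and P_sim and simulate_chain are hypothetical names. Each draw uses only the current state, which is exactly the first-order assumption.

```r
# Assumed two-state transition matrix (rows = current state, columns = next state)
P_sim <- matrix(c(0.8, 0.2,
                  0.4, 0.6),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Healthy", "Sick"), c("Healthy", "Sick")))

# Simulate a first-order Markov chain: the next state depends only on the current one
simulate_chain <- function(P, n_steps, start = 1L) {
  states <- integer(n_steps)
  states[1] <- start
  for (t in 2:n_steps) {
    states[t] <- sample(ncol(P), size = 1, prob = P[states[t - 1], ])
  }
  states
}

set.seed(1)
chain <- simulate_chain(P_sim, n_steps = 20000)

# Empirical transition matrix: count (from, to) pairs, then normalize each row
counts <- table(from = factor(head(chain, -1), levels = 1:2),
                to   = factor(tail(chain, -1), levels = 1:2))
P_hat <- prop.table(counts, margin = 1)
round(P_hat, 3)
```

With 20,000 steps the row-normalized counts should sit close to the assumed matrix; shorter chains give noisier estimates.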
Stationary Distribution
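If transitions are time-homogeneous (the same \(\mathbf{P}\) at every step) and the chain is irreducible and aperiodic, the stationary distribution \(\pi^*\), a row vector whose entries sum to 1, satisfies: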
\[
\pi^* = \pi^* \mathbf{P}
\]
This gives the long-run proportion of time spent in each state, regardless of the starting state.
# Solve for the stationary distribution
# pi = pi P  is equivalent to  (I - P)' pi = 0  (pi as a column vector)
# Add the constraint sum(pi) = 1 and solve the stacked system
A <- rbind(t(diag(2) - P), c(1, 1))
b <- c(0, 0, 1)
pi_star <- qr.solve(A, b)
names(pi_star) <- c("Healthy", "Sick")
round(pi_star, 3)
Healthy Sick
0.667 0.333
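Two other ways to obtain \(\pi^*\) are sketched below. The matrix is an assumption (P(Sick | Healthy) = 0.2, P(Sick | Sick) = 0.6, as in the Part II self-check) that reproduces the 0.667/0.333 distribution printed above. Method 1 uses the fact that \(\pi^*\) is a left eigenvector of \(\mathbf{P}\) with eigenvalue 1. Method 2 illustrates the "long-run" interpretation: every row of \(\mathbf{P}^n\) converges to \(\pi^*\), whatever the starting state.

```r
# Assumed matrix: P(Sick | Healthy) = 0.2, P(Sick | Sick) = 0.6
P <- matrix(c(0.8, 0.2,
              0.4, 0.6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Healthy", "Sick"), c("Healthy", "Sick")))

# Method 1: pi* is the left eigenvector of P for eigenvalue 1,
# i.e. an ordinary eigenvector of t(P), rescaled to sum to 1
ev <- eigen(t(P))
v <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
pi_eigen <- setNames(v / sum(v), rownames(P))

# Method 2: for a regular chain, every row of P^n converges to pi*
Pn <- diag(2)
for (k in 1:50) Pn <- Pn %*% P

round(pi_eigen, 3)
round(Pn, 3)
```

The second eigenvalue of this matrix is 0.4, so the rows of \(\mathbf{P}^n\) converge quickly; chains with stronger persistence converge more slowly.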
Check Your Understanding: Part II
Quick Self-Check
Transition Matrix: Given a transition matrix where P(Sick|Healthy) = 0.2 and P(Sick|Sick) = 0.6, what is P(Healthy|Sick)?
Markov Property: Patient A has been healthy for 5 visits. Patient B has been healthy for 1 visit. According to the first-order Markov property, how do their probabilities of being sick next visit compare?
Stationary Distribution: If a disease has P(Disease|Healthy) = 0.1 and P(Disease|Disease) = 0.9, will the long-run prevalence be closer to 10% or 50%?
Answers
P(Healthy|Sick) = 0.4 - Each row of the transition matrix must sum to 1. If P(Sick|Sick) = 0.6, then P(Healthy|Sick) = 1 - 0.6 = 0.4.
They are the same! The first-order Markov property says the future depends ONLY on the current state, not the history. Both patients are currently Healthy, so both have the same transition probability to Sick.
Closer to 50% - Solve \(\pi = \pi P\): \(\pi_{disease} = \pi_{healthy} \times 0.1 + \pi_{disease} \times 0.9\). With \(\pi_{healthy} + \pi_{disease} = 1\), we get \(\pi_{disease} = 0.5\). The high persistence in the disease state (0.9) keeps people sick once they become sick.
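The third answer can be checked with the same linear-system approach used on the stationary-distribution slide. For a two-state chain there is also a closed form: if \(a = P(\text{Disease} \mid \text{Healthy})\) and \(b = P(\text{Healthy} \mid \text{Disease})\), then \(\pi_{disease} = a / (a + b) = 0.1 / (0.1 + 0.1) = 0.5\). P_q and pi_q are illustrative names.

```r
# Quiz matrix: P(Disease | Healthy) = 0.1, P(Disease | Disease) = 0.9
P_q <- matrix(c(0.9, 0.1,
                0.1, 0.9),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Healthy", "Disease"), c("Healthy", "Disease")))

# Stack (I - P)' pi = 0 with the constraint sum(pi) = 1 and solve
A <- rbind(t(diag(2) - P_q), c(1, 1))
pi_q <- qr.solve(A, c(0, 0, 1))
names(pi_q) <- colnames(P_q)
pi_q

# Closed form for two states: a / (a + b)
a <- P_q["Healthy", "Disease"]
b <- P_q["Disease", "Healthy"]
a / (a + b)
```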
Part III: First-Order Transition Models
The Basic Transition Model
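For binary \(Y_{ij}\), model the conditional probability of a positive response given the previous response and covariates:
\[
\operatorname{logit} P(Y_{ij} = 1 \mid Y_{i,j-1}, X_{ij}) = X_{ij}^\top \beta + \alpha\, Y_{i,j-1}
\]
Here \(\exp(\alpha)\) is the odds ratio for a positive response when the previous response was positive versus negative, holding covariates fixed. The first rows of the example data dat: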
id time trt y
1 1 0 Control 0
2 1 1 Control 0
3 1 2 Control 0
4 1 3 Control 1
5 1 4 Control 0
6 1 5 Control 0
7 2 0 Control 0
8 2 1 Control 0
9 2 2 Control 1
10 2 3 Control 1
11 2 4 Control 1
12 2 5 Control 0
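The code that created dat is not shown. To run the remaining steps, one option is to simulate data with the same layout (id, time 0 to 5, trt, binary y) from a first-order transition model. The sketch below does this in base R. The number of subjects, the coefficients b0, b_lag, b_trt, b_time, the 1:1 allocation, and the baseline probability 0.3 are all hypothetical, so results will not match the output on these slides.

```r
set.seed(2024)                          # arbitrary seed
n_subj <- 200                           # assumed number of subjects
times  <- 0:5                           # six visits, as in the printout
b0 <- -1.0; b_lag <- 1.0; b_trt <- -0.5; b_time <- 0.05  # hypothetical coefficients

trt_subj <- sample(rep(0:1, each = n_subj / 2))          # assumed 1:1 allocation

sim_subject <- function(i) {
  y <- numeric(length(times))
  y[1] <- rbinom(1, 1, 0.3)             # baseline response, assumed P = 0.3
  for (j in seq_along(times)[-1]) {
    # First-order transition model on the logit scale
    eta  <- b0 + b_lag * y[j - 1] + b_trt * trt_subj[i] + b_time * times[j]
    y[j] <- rbinom(1, 1, plogis(eta))
  }
  data.frame(id = i, time = times, trt = trt_subj[i], y = y)
}

dat <- do.call(rbind, lapply(seq_len(n_subj), sim_subject))
dat$id  <- factor(dat$id)
dat$trt <- factor(dat$trt, levels = 0:1, labels = c("Control", "Treatment"))
head(dat, 12)
```

Because the data are generated from the model being taught, the fitted y_lag coefficient should land near the assumed b_lag.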
Create Lagged Response
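# Add lagged response within each subject
# (sort by time first so lag() picks up the previous visit)
dat <- dat |>
  arrange(id, time) |>
  group_by(id) |>
  mutate(y_lag = lag(y, 1)) |>
  ungroup()

# Remove each subject's first observation (no lag available)
dat_model <- dat |>
  filter(!is.na(y_lag))

head(dat_model, 12)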
# A tibble: 12 × 5
id time trt y y_lag
<fct> <int> <fct> <dbl> <dbl>
1 1 1 Control 0 0
2 1 2 Control 0 0
3 1 3 Control 1 0
4 1 4 Control 0 1
5 1 5 Control 0 0
6 2 1 Control 0 0
7 2 2 Control 1 0
8 2 3 Control 1 1
9 2 4 Control 1 1
10 2 5 Control 0 1
11 3 1 Treatment 0 0
12 3 2 Treatment 0 0
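Before adding covariates, it can help to look at the raw transition frequencies. Cross-tabulating y_lag against y gives an empirical version of the transition matrix from Part II. The sketch uses only the ten rows for subjects 1 and 2 shown above, which is far too few for inference; with the full data you would pass dat_model$y_lag and dat_model$y. transition_table is a hypothetical helper. The sketch also shows that, in a logistic model with only an intercept and y_lag, the exponentiated y_lag coefficient equals the odds ratio from this 2 x 2 table.

```r
# Empirical (unadjusted) transition matrix from lagged pairs
transition_table <- function(y_lag, y) {
  counts <- table(from = factor(y_lag, levels = 0:1),
                  to   = factor(y,     levels = 0:1))
  prop.table(counts, margin = 1)   # each row sums to 1
}

# Subjects 1 and 2 from the printout above (rows with a lag available)
toy <- data.frame(
  y     = c(0, 0, 1, 0, 0,  0, 1, 1, 1, 0),
  y_lag = c(0, 0, 0, 1, 0,  0, 0, 1, 1, 1)
)
tt <- transition_table(toy$y_lag, toy$y)
round(tt, 3)

# Saturated logistic model: exp(coefficient) reproduces the table's odds ratio
fit_toy <- glm(y ~ y_lag, family = binomial, data = toy)
exp(coef(fit_toy)[["y_lag"]])
```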
Step 1: Create the lagged response within each subject and drop each subject's first observation (same code as above).
Step 2: Fit the Model
# Standard GLM with the lagged response as a covariate
# (the treatment factor in dat is named trt)
fit <- glm(y ~ y_lag + trt + time,
           data = dat_model,
           family = binomial(link = "logit"))
| Term | GLM SE | Robust SE | GEE SE |
|------|--------|-----------|--------|
| (Intercept) | 0.1994 | 0.2001 | 0.1996 |
| y_lag | 0.1620 | 0.1572 | 0.1568 |
| trtTreatment | 0.1515 | 0.1569 | 0.1565 |
| time | 0.0546 | 0.0522 | 0.0521 |
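The reason glm() suffices, discussed in the self-check below, can be verified numerically. The log-likelihood glm() reports equals the sum of the conditional Bernoulli log-probabilities \(\log P(Y_{ij} \mid Y_{i,j-1}, X_{ij})\) over subjects and visits \(j \ge 2\). This is a minimal sketch on hypothetical toy data. The responses are drawn independently purely to illustrate the identity, and toy and fit_toy are illustrative names.

```r
# Toy data: 50 subjects x 6 visits, binary responses (hypothetical)
set.seed(42)
toy <- data.frame(id = rep(1:50, each = 6), time = rep(0:5, times = 50))
toy$y <- rbinom(nrow(toy), 1, 0.4)

# Lag within subject (base R; rows are sorted by id, then time)
toy$y_lag <- ave(toy$y, toy$id, FUN = function(v) c(NA, head(v, -1)))
toy <- toy[!is.na(toy$y_lag), ]        # condition on the first response

fit_toy <- glm(y ~ y_lag + time, family = binomial, data = toy)

# Conditional likelihood by hand: sum over i and j >= 2 of
# log P(Y_ij | Y_i,j-1, X_ij), using the fitted probabilities
p_hat <- fitted(fit_toy)
ll_by_hand <- sum(dbinom(toy$y, size = 1, prob = p_hat, log = TRUE))

c(by_hand = ll_by_hand, glm = as.numeric(logLik(fit_toy)))
```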
Check Your Understanding: Part IV
Quick Self-Check
Likelihood Factorization: Why can we use standard glm() to fit transition models instead of specialized software?
Robust vs Model-Based SEs: Your GLM gives SE = 0.15, but robust cluster SEs give SE = 0.25. What does this suggest?
GEE with Independence: Why do we use corstr = "independence" in GEE for transition models, even though the data are clearly correlated?
Answers
The Markov property allows the likelihood, conditional on the first response \(Y_{i1}\), to factorize into a product of conditional probabilities: \(L = \prod_i \prod_{j \geq 2} P(Y_{ij} \mid Y_{i,j-1}, X_{ij})\). Each term is a standard logistic regression likelihood contribution with \(Y_{i,j-1}\) as a covariate, so the model is just a pooled logistic regression!
This suggests residual correlation beyond what the first-order lag captures. The model-based SEs assume conditional independence given the lag, but the data show additional clustering. Use the robust SEs or consider higher-order lags.
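The lag variable \(Y_{i,j-1}\) explicitly captures the serial dependence. Once we condition on the previous response, the remaining within-subject correlation should be small, so there is little left for a working correlation to model. Moreover, because \(Y_{i,j-1}\) is a past outcome used as a time-varying covariate, a non-diagonal working correlation can bias the estimates of \(\beta\). The independence working correlation avoids this, and robust SEs guard against any correlation that remains.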
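Cluster-robust SEs for the pooled transition model can be computed by hand as a sandwich. The bread is the model-based covariance \((X^\top W X)^{-1}\). The meat sums, over subjects, the outer product of each subject's total score \(U_i = \sum_j X_{ij}(Y_{ij} - \mu_{ij})\). The toy data are hypothetical and include a subject-level effect u that induces clustering beyond the first-order lag, so the two sets of SEs may differ noticeably. In practice, sandwich::vcovCL(fit, cluster = ~ id) or geepack::geeglm(..., corstr = "independence") should give similar values, though vcovCL applies a small-sample adjustment by default.

```r
set.seed(7)
n_subj <- 100; times <- 0:5
toy <- do.call(rbind, lapply(seq_len(n_subj), function(i) {
  u <- rnorm(1, sd = 1)                 # hypothetical subject effect: extra clustering
  y <- numeric(length(times)); y[1] <- rbinom(1, 1, 0.4)
  for (j in 2:length(times)) {
    y[j] <- rbinom(1, 1, plogis(-0.5 + 1.0 * y[j - 1] + u))
  }
  data.frame(id = i, time = times, y = y, y_lag = c(NA, y[-length(y)]))
}))
toy <- toy[!is.na(toy$y_lag), ]

fit_toy <- glm(y ~ y_lag + time, family = binomial, data = toy)

X     <- model.matrix(fit_toy)
mu    <- fitted(fit_toy)
bread <- vcov(fit_toy)                  # (X' W X)^{-1}: model-based covariance

# Score contributions X_ij (y_ij - mu_ij), summed within each subject
scores <- rowsum(X * (toy$y - mu), group = toy$id)
meat   <- crossprod(scores)             # sum_i U_i U_i'
V_robust <- bread %*% meat %*% bread

cbind(model_SE  = sqrt(diag(bread)),
      robust_SE = sqrt(diag(V_robust)))
```

Comparing the two columns is the diagnostic from the self-check: large discrepancies suggest correlation the lag has not absorbed.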
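# Refit the first-order model on the SAME rows as the second-order model
# (dat_order2 keeps only rows where both lags exist), so the LRT compares nested fits
fit_order1 <- glm(y ~ y_lag1 + trt + time,
                  data = dat_order2,
                  family = binomial)
anova(fit_order1, fit_order2, test = "LRT")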
Testing for Higher-Order Dependence
Analysis of Deviance Table
Model 1: y ~ y_lag1 + trt + time
Model 2: y ~ y_lag1 + y_lag2 + trt + time
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 796 874
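2 795 874 1 0.0794 0.78
The second lag adds almost nothing (deviance change 0.08 on 1 df, p = 0.78), so there is no evidence of second-order dependence and the first-order model is adequate.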
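The comparison above assumes dat_order2 and fit_order2 already exist. The sketch below shows one way to build them, on hypothetical simulated data (dat_sim, lag_within, and all coefficients are assumptions) generated from a first-order chain, so y_lag2 should add little. The key step is keeping only rows where both lags exist, so both models are fit to identical data; otherwise the deviances are not comparable.

```r
set.seed(11)
n_subj <- 200; times <- 0:5
# Hypothetical data from a FIRST-order chain
dat_sim <- do.call(rbind, lapply(seq_len(n_subj), function(i) {
  trt <- rbinom(1, 1, 0.5)
  y <- numeric(length(times)); y[1] <- rbinom(1, 1, 0.3)
  for (j in 2:length(times)) {
    y[j] <- rbinom(1, 1, plogis(-1 + 1.2 * y[j - 1] - 0.5 * trt + 0.05 * times[j]))
  }
  data.frame(id = i, time = times, trt = trt, y = y)
}))

# Lags 1 and 2 within subject (rows are sorted by id, then time)
lag_within <- function(v, k) c(rep(NA, k), head(v, -k))
dat_sim$y_lag1 <- ave(dat_sim$y, dat_sim$id, FUN = function(v) lag_within(v, 1))
dat_sim$y_lag2 <- ave(dat_sim$y, dat_sim$id, FUN = function(v) lag_within(v, 2))

# Keep only rows where BOTH lags exist, so the two fits use identical data
dat_order2 <- dat_sim[!is.na(dat_sim$y_lag2), ]

fit_order1 <- glm(y ~ y_lag1 + trt + time, data = dat_order2, family = binomial)
fit_order2 <- glm(y ~ y_lag1 + y_lag2 + trt + time, data = dat_order2, family = binomial)
anova(fit_order1, fit_order2, test = "LRT")
```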
times <- 0:5
coefs <- coef(fit_tv)

# Effect of y_lag at each time point: main effect + interaction * time
effect_at_t <- coefs["y_lag"] + coefs["y_lag:time"] * times
eff_df <- data.frame(time = times, log_or = effect_at_t)

ggplot(eff_df, aes(time, log_or)) +
  geom_line(linewidth = 1.2, color = "steelblue") +
  geom_point(size = 3, color = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Time-Varying Dependence on Previous Response",
       x = "Time", y = "Log Odds Ratio for Y_lag") +
  theme_minimal(base_size = 14)
Visualization: Time-Varying Effect
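The plotting code assumes a model fit_tv that contains a y_lag:time interaction. The exact formula used on the slide is not shown, so y ~ y_lag * time + trt is an assumption. The sketch fits it on hypothetical simulated data where dependence on the previous response weakens over time, and compares it with the constant-dependence model by a likelihood ratio test.

```r
set.seed(3)
n_subj <- 300; times <- 0:5
sim <- do.call(rbind, lapply(seq_len(n_subj), function(i) {
  trt <- rbinom(1, 1, 0.5)
  y <- numeric(length(times)); y[1] <- rbinom(1, 1, 0.3)
  for (j in 2:length(times)) {
    # Hypothetical: the log OR for the previous response declines with time
    lag_effect <- 1.5 - 0.2 * times[j]
    y[j] <- rbinom(1, 1, plogis(-1 + lag_effect * y[j - 1] - 0.5 * trt))
  }
  data.frame(id = i, time = times, trt = trt, y = y, y_lag = c(NA, y[-length(y)]))
}))
sim <- sim[!is.na(sim$y_lag), ]

fit_const <- glm(y ~ y_lag + trt + time, data = sim, family = binomial)
fit_tv    <- glm(y ~ y_lag * time + trt, data = sim, family = binomial)

# LRT: does the dependence on the previous response change with time?
anova(fit_const, fit_tv, test = "LRT")

# Log OR for y_lag at times 1-5 (the visits that have a lag)
coefs <- coef(fit_tv)
coefs["y_lag"] + coefs["y_lag:time"] * 1:5
```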
Handling the Initial Observation
Three approaches for \(Y_{i1}\):
| Approach | Description |
|----------|-------------|
| Condition on \(Y_{i1}\) | Treat as fixed (most common) |
| Model \(Y_{i1}\) separately | Specify its marginal distribution |
| Include random effects | Joint model with shared \(b_i\) |
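Conditioning on \(Y_{i1}\) is simple and usually adequate when \(n_i\) is moderate: only the first of each subject's \(n_i\) responses is left unmodeled, so little information is lost.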
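The three approaches differ in which factors of the subject-level likelihood they keep. Written out in the notation of the factorization in Part IV (a sketch; \(g\) denotes the random-effects density):
\[
L_i = P(Y_{i1} \mid X_{i1}) \prod_{j=2}^{n_i} P(Y_{ij} \mid Y_{i,j-1}, X_{ij})
\]
Conditioning on \(Y_{i1}\) drops the first factor; modeling \(Y_{i1}\) separately specifies it. With shared random effects, both factors depend on \(b_i\), which must be integrated out:
\[
L_i = \int P(Y_{i1} \mid X_{i1}, b_i) \prod_{j=2}^{n_i} P(Y_{ij} \mid Y_{i,j-1}, X_{ij}, b_i)\, g(b_i)\, db_i
\]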
Part VI: Model Comparison
Transition vs Marginal Models
| Aspect | Marginal (GEE) | Transition |
|--------|----------------|------------|
| Conditions on | Covariates only | Covariates + past \(Y\) |
| Correlation | Working \(\operatorname{Corr}_i(\alpha)\) | Explicit via \(Y_{i,j-1}\) |
| \(\beta\) meaning | Population-average | Conditional on history |
| Use case | Population effects | State dependence |
Transition vs Mixed Models
| Aspect | Mixed (GLMM) | Transition |
|--------|--------------|------------|
| Conditions on | Random effects \(b_i\) | Past responses |
| Heterogeneity | Unobserved traits | Observable history |
| Correlation source | Latent variable | Serial dependence |
| Computation | Integration over \(b_i\) required | Standard GLM |
Same Data, Different Questions
suppressPackageStartupMessages({
  library(lme4)      # glmer(), fixef()
  library(geepack)   # geeglm()
})

# Transition model: already fit as fit_trans

# Marginal model (GEE); geeglm() expects rows grouped by id
fit_marginal <- geeglm(y ~ trt + time,
                       data = dat_model,
                       id = id,
                       family = binomial,
                       corstr = "ar1")

# Mixed model (GLMM with random intercept)
fit_mixed <- glmer(y ~ trt + time + (1 | id),
                   data = dat_model,
                   family = binomial)

# Compare treatment effects
data.frame(
  Model = c("Transition", "Marginal (GEE)", "Mixed (GLMM)"),
  Estimate = c(coef(fit_trans)["trtTreatment"],
               coef(fit_marginal)["trtTreatment"],
               fixef(fit_mixed)["trtTreatment"]),
  SE = c(summary(fit_trans)$coefficients["trtTreatment", "Std. Error"],
         summary(fit_marginal)$coefficients["trtTreatment", "Std.err"],
         summary(fit_mixed)$coefficients["trtTreatment", "Std. Error"])
) |>
  mutate(across(where(is.numeric), ~ round(., 3)))
Model Estimate SE
1 Transition -0.541 0.152
2 Marginal (GEE) -0.701 0.198
3 Mixed (GLMM) -0.893 0.257
Key Insight: Different Interpretations
Transition: Treatment effect on current response, given previous response
Marginal: Treatment effect on population-average probability
Mixed: Treatment effect for a specific subject (given \(b_i\))
All are valid; choice depends on the research question.
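The GLMM estimate (-0.893) is larger in magnitude than the GEE estimate (-0.701), and that ordering is expected. In a random-intercept logistic model, population-average log odds ratios are attenuated toward zero relative to subject-specific ones. A commonly used approximation is \(\beta^{M} \approx \beta^{SS} / \sqrt{1 + c^2 \sigma_b^2}\) with \(c = 16\sqrt{3}/(15\pi)\). The sketch applies it to the GLMM estimate over a grid of assumed random-intercept variances \(\sigma_b^2\); the fitted variance is not shown here. It does not apply to the transition estimate, which conditions on the previous response rather than on \(b_i\). attenuate is a hypothetical helper.

```r
# Approximate marginal log OR implied by a subject-specific log OR
# in a random-intercept logistic model:
#   beta_M ~= beta_SS / sqrt(1 + c^2 * sigma2_b),  c = 16 * sqrt(3) / (15 * pi)
attenuate <- function(beta_ss, sigma2_b) {
  c2 <- (16 * sqrt(3) / (15 * pi))^2
  beta_ss / sqrt(1 + c2 * sigma2_b)
}

# GLMM treatment estimate from the table above; variances are assumed values
sigma2_grid <- c(0, 0.5, 1, 2, 4)
round(attenuate(-0.893, sigma2_grid), 3)
```

With no between-subject variance the two scales coincide; attenuation grows as \(\sigma_b^2\) increases.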
When Transition Models Excel
Disease progression: Natural states (remission/relapse)
Panel surveys: Employment, insurance status
Clinical trials: Response at each visit conditional on prior
Process understanding: Mechanism involves serial dependence
Visualizing Trajectories
# Sample 20 subjects
set.seed(667)
sample_ids <- sample(unique(dat$id), 20)
dat_sample <- dat |>
  filter(id %in% sample_ids)

ggplot(dat_sample, aes(time, y, group = id, color = trt)) +
  geom_line(alpha = 0.6) +
  geom_point(size = 1.5) +
  facet_wrap(~trt) +
  labs(title = "Individual Binary Trajectories",
       x = "Time", y = "Response (0/1)") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")
Check Your Understanding: Part VI
Quick Self-Check
Model Comparison: A transition model gives treatment OR = 0.6; a marginal GEE gives OR = 0.7. Which is “correct”?
Initial Observation: You have 6 time points. When you fit a first-order transition model, how many observations per subject contribute to the likelihood?
Choosing a Model: You want to study how treatment affects the probability of relapse among cancer survivors. Which model type is most appropriate?
Answers
Both are correct for different questions! The transition model OR is conditional on previous state (“among those who were healthy last visit, treatment reduces odds of being sick by 40%”). The marginal OR is population-average (“overall, treatment reduces the odds of being sick by 30%”). They answer different questions.
5 observations - The first observation has no lag and is typically conditioned on (not modeled). The remaining 5 observations contribute to the conditional likelihood.
Transition model - The question explicitly involves a “state change” (relapse). Transition models naturally capture the probability of moving from remission to relapse, and can assess whether treatment modifies this transition probability.
Common Mistakes in Transition Models
Errors to Avoid
| # | Mistake | Correct Approach |
|---|---------|------------------|
| 1 | Interpreting transition-model \(\beta\) as population-average | Transition \(\beta\) is conditional on the previous state; use GEE for population-average effects |
| 2 | Using a non-independence working correlation | Once you include \(Y_{i,j-1}\), use corstr = "independence" with robust SEs |
| 3 | Forgetting to remove the first observation | The first time point has no lag; filter to !is.na(y_lag) |
| 4 | Not testing higher-order dependence | Use an LRT to compare first- vs second-order models fit on the same rows |
| 5 | Ignoring robust SEs | Always compute cluster-robust SEs and compare them to model-based SEs |
| 6 | Applying transition models to inappropriate outcomes | Best for discrete states with meaningful persistence (not continuous growth) |
Mixed models (Ch. 8, 14): subject-specific effects, random effects \(b_i\)
Transition models (instructor extension, FLW 1st-ed Ch. 10): effects conditional on past responses
Missing-data methods (MCAR/MAR/MNAR, multiple imputation, sensitivity analysis) were covered in the prior missing-data lectures. With transition models, you now have a complete toolkit for the three ways correlation enters longitudinal data.
References
Transition models are not a FLW 2nd-ed chapter. Source: Fitzmaurice, Laird & Ware (2004, 1st ed.), Applied Longitudinal Analysis, Ch. 10; the marginal/mixed/transition trichotomy is framed in FLW (2011, 2nd ed.) Ch. 12 (Marginal Models), which has no dedicated transition chapter.