BIOS 667 - Lecture 3: Overview of Linear Models for Longitudinal Data (Ch. 3)

Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis

Naim Rashid

Lecture Objectives

By the End of This Lecture, You Will Be Able To

  • Define and use vector and matrix notation for longitudinal data
  • Specify the general linear model with mean and covariance components
  • Apply descriptive visualization methods to explore longitudinal patterns
  • Compare three covariance modeling approaches: unstructured, pattern models, and random effects
  • Explain limitations of historical approaches (summary measures, ANOVA, MANOVA)

How These Objectives Map to Your Work

Objective Where you use it
Vector/matrix notation Every later lecture; HW1 setup of \(\mathbf{Y}_i\), \(\mathbf{X}_i\)
General linear model (mean + covariance) LME and GEE on HW3-HW4; final project model specification
Descriptive visualization HW1 exploratory plots; first step of the final-project analysis
Three covariance approaches Motivates covariance-pattern (Ch. 7) and random-effects (Ch. 8) choices on HW3
Limits of historical approaches Justifies why the final project uses modern methods, not ANOVA/MANOVA

Roadmap

Lecture Overview

Section Topics
Part 1 Notation and distributional assumptions
Part 2 Descriptive methods for visualization
Part 3 Modeling the covariance structure
Part 4 Historical approaches and their limitations
Part 5 Summary and next steps

Prerequisites / recall

You will get the most out of this lecture if you are comfortable with:

  • Column vectors and matrices: stacking values into a vector, reading matrix dimensions
  • The design matrix and the regression form \(\mathbf{X}\boldsymbol\beta\) in matrix notation (from Lecture 2)
  • Transpose notation (\(\mathbf{X}^\top\)) and block-diagonal matrices
  • The multivariate normal distribution as the multivariate analogue of the normal

We build directly on the \(\mathbf{X}\boldsymbol\beta\) matrix form from L02; today we add the within-subject covariance.

Notation reference (so far)

Symbol Meaning
\(i,\ j\) subject index \(i = 1,\dots,N\) and occasion/time index \(j = 1,\dots,n_i\)
\(N,\ n_i\) number of subjects; number of occasions for subject \(i\) (\(n\) when balanced)
\(Y_{ij},\ \mathbf{Y}_i\) response for subject \(i\) at occasion \(j\) (scalar); response vector \((n_i \times 1)\)
\(t_{ij}\) measurement time for subject \(i\) at occasion \(j\)
\(X_{ij},\ \mathbf{X}_i\) covariate row vector \(1 \times p\); design matrix \(n_i \times p\)
\(\boldsymbol\beta\) fixed-effect (population-average) coefficients
\(\varepsilon_{ij},\ \boldsymbol\varepsilon_i\) residual error (scalar; vector)
\(\Sigma_i = \operatorname{Cov}(\mathbf{Y}_i)\) marginal (total) covariance of \(\mathbf{Y}_i\) (decomposes into random-effect + residual parts once random effects appear, Ch. 8)
\(\rho\) a correlation parameter (e.g. exchangeable, or AR(1) where corr at lag \(k\) is \(\rho^k\))

Bold = vector or matrix. This is a running reference card: later lectures add random-effect and GEE notation.

Part 1: Notation and Distributional Assumptions

The Response Vector \(\mathbf{Y}_i\)

\(Y_{ij}\) denotes the response for subject \(i\) at occasion \(j\).

We group the \(n_i\) repeated measures for subject \(i\) into a column vector:

\[ \mathbf{Y}_{i} = \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{in_{i}} \end{pmatrix}_{n_i \times 1} \]

Independence Structure

Level Independence?
Between subjects Yes (conditional on covariates)
Within subject No (repeated measures are correlated)

This within-subject correlation is the defining feature of longitudinal data.

The Covariate Matrix \(\mathbf{X}_{i}\)

\(X_{ij}\) is the \(1 \times p\) row vector of covariates for subject \(i\) at occasion \(j\). Stacking these rows over the \(n_i\) occasions gives the design matrix for subject \(i\):

\[ \mathbf{X}_{i} = \begin{pmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{in_{i}} \end{pmatrix}_{n_i \times p} \]

So \(\mathbf{X}_i\) is \(n_i \times p\), and the model is written \(\mathbf{X}_i\boldsymbol\beta\).

Rows are occasions, columns are covariates. Intercept is a column of 1s; Sex is constant (time-constant); Time increases down the rows (time-varying).

Covariate Types

Type Description Example
Time-constant Does not change over time Sex, race
Time-varying Changes at each occasion Age, weight

The General Linear Model

The model for the observation vector of subject \(i\) is:

\[ \mathbf{Y}_{i} = \mathbf{X}_{i}\boldsymbol{\beta} + \boldsymbol{\varepsilon}_{i} \]

The mean response is linear in \(\boldsymbol{\beta}\).

Model Components

Component Notation Description
Response vector \(\mathbf{Y}_i\) Observed outcomes
Design matrix \(\mathbf{X}_i\) Covariates
Regression parameters \(\boldsymbol{\beta}\) Fixed effects
Random errors \(\boldsymbol{\varepsilon}_i\) Zero-mean errors

Example: Response Vector

A subject measured at three time points with responses 12, 10, and 9.

# Define the response vector for subject i
Y_i <- c(12, 10, 9)
print(Y_i)

Interpretation: This shows the response vector \(\mathbf{Y}_i\) for a single subject with 3 measurements. The values (12, 10, 9) represent the outcome at times 1, 2, and 3 respectively. Notice the downward trend: this subject’s response is decreasing over time. In R, this is stored as a numeric vector of length 3.

Example: Response Vector

[1] 12 10  9

Example: Design Matrix

Covariates include an intercept, sex (time-constant), and time (time-varying).

# Column 1: intercept, Column 2: Sex (0 = male), Column 3: Time
X_i <- matrix(c(1, 0, 1,
                1, 0, 2,
                1, 0, 3), nrow = 3, byrow = TRUE)
colnames(X_i) <- c("Intercept", "Sex", "Time")
print(X_i)

Interpretation: This \(3 \times 3\) design matrix corresponds to the same subject with 3 visits. Each row is one occasion, so this is \(X_{i1}, X_{i2}, X_{i3}\) stacked into \(\mathbf{X}_i\). The intercept column is all 1s (standard for regression). The Sex column is 0 for all rows because sex is time-constant (this subject is male). The Time column increases (1, 2, 3) because time is time-varying. When we compute \(\mathbf{X}_i\boldsymbol{\beta}\), we get the predicted mean response at each visit.

Example: Design Matrix

     Intercept Sex Time
[1,]         1   0    1
[2,]         1   0    2
[3,]         1   0    3

The Full Data Model

We can stack data for all \(N\) subjects into one model:

\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

Where \(\mathbf{Y} = (\mathbf{Y}_1^{\top}, \dots, \mathbf{Y}_N^{\top})^{\top}\) and \(\boldsymbol{\varepsilon} = (\boldsymbol{\varepsilon}_1^{\top}, \dots, \boldsymbol{\varepsilon}_N^{\top})^{\top}\).

Block-Diagonal Covariance

\(\mathbf{X}\) is formed by stacking the subject-specific design matrices \(\mathbf{X}_i\).

The covariance matrix \(\boldsymbol\Sigma\) for the full data set is block-diagonal:

\[ \boldsymbol\Sigma = \mathrm{diag}(\Sigma_1, \Sigma_2, \dots, \Sigma_N) \]

Here \(\Sigma_i = \operatorname{Cov}(\mathbf{Y}_i)\) is the within-subject covariance of subject \(i\)’s response (unstructured for now). The off-diagonal blocks are zero because different subjects are independent.

Block-Diagonal Covariance: the Picture

Dense blocks on the diagonal are the within-subject covariances \(\Sigma_1, \Sigma_2, \Sigma_3\). The white off-diagonal blocks are zero: different subjects are independent.

The Two Key Components

Component Expression Meaning
Mean structure \(E(\mathbf{Y}_{i}) = \mathbf{X}_{i}\boldsymbol{\beta}\) How the average response changes
Covariance structure \(\mathrm{Cov}(\mathbf{Y}_{i}) = \Sigma_i\) How measures are correlated

\(\Sigma_i = \operatorname{Cov}(\mathbf{Y}_i)\) is the within-subject covariance of the response (unstructured for now); we decompose it once random effects appear in Ch. 8.

Distributional Assumptions

The Multivariate Normal Assumption

We assume the vector of responses for subject \(i\) is multivariate normal:

\[ \mathbf{Y}_i \sim N_{n_i}(\mathbf{X}_i\boldsymbol{\beta}, \Sigma_i) \]

Why Multivariate Normal?

  • A working assumption for convenience, not a law of nature
  • Makes it easy to derive estimators for \(\boldsymbol{\beta}\) and \(\Sigma_i\) and to build tests
  • Not strictly required: inference is generally robust to non-normality
  • What matters more: the mean and covariance structure (covered next)

Verifying the Assumption

Unlike the univariate case, it is very difficult to formally verify the multivariate normal assumption.

Formal statistical tests exist but are often unhelpful, as they can detect departures that are not substantively important.

Practical Graphical Checks

Check Purpose
Histograms of residuals Assess marginal normality at each time
Boxplots of residuals Identify outliers and skewness
Scatterplots of residual pairs Check for linear trends between times

Caveat: These checks can only provide evidence of departures. They cannot prove the assumption holds.

Why Normality Is Not Critical

The assumption of multivariate normality is not as critical for valid inferences about \(\boldsymbol{\beta}\) as the assumptions about:

  • Independence between subjects
  • Covariance structure within an individual

Unless departures are very extreme (e.g., highly skewed data), correctly modeling dependence is more important.

Common Pitfalls

Pitfall Why It Matters
Confusing mean and covariance misspecification Misspecifying the mean model is a bigger problem
Overstating the role of normality It is a working assumption, not a strict requirement

A Note on Notation

For simplicity, we often use:

  • \(E(\mathbf{Y}_i)\) for \(E(\mathbf{Y}_i | \mathbf{X}_i)\)
  • \(\mathrm{Cov}(\mathbf{Y}_i)\) for \(\mathrm{Cov}(\mathbf{Y}_i | \mathbf{X}_i)\)

We are always referring to the conditional distribution given the covariates.

Check Your Understanding: Part 1a (Dimensions and structure)

  1. For a subject with 5 repeated measurements and 3 covariates (including intercept), what are the dimensions of \(\mathbf{Y}_i\) and \(\mathbf{X}_i\)?
  2. Why is the covariance matrix \(\boldsymbol\Sigma\) for the full dataset block-diagonal?

Answers:

  1. \(\mathbf{Y}_i\) is \(5 \times 1\); \(\mathbf{X}_i\) is \(5 \times 3\).
  2. Because subjects are independent of each other (no correlation between different subjects), only the within-subject blocks have non-zero covariances. The off-diagonal blocks (between subjects) are all zeros.

Check Your Understanding: Part 1b (Normality)

  1. A colleague says “we must have normally distributed data to use these methods.” How would you respond?

Answer:

  1. Normality is a working assumption for convenience, but the methods are robust to departures from normality. What matters more is correctly specifying the mean structure and covariance structure.

Part 2: Descriptive Methods

Why Visualize First?

Before modeling, we need to understand the patterns in our data.

Goal: To see individual trajectories and overall trends.

The Spaghetti Plot

A spaghetti plot connects successive repeated measures for each individual.

Purpose: Shows individual trajectories, the spread of starting points (between-subject variability), and any outlying subjects or unbalanced/irregular data.

Caution (FLW Section 3.3): The joined-segments plot is not very informative about trends in the mean response; between- and within-subject variability is often almost completely obscured. For reading the mean trend, use the mean-response (mean-trajectory) plot instead, not the spaghetti plot.

Spaghetti Plot: Code

# Load packages
library(dplyr)
library(ggplot2)

# Create mock longitudinal data
set.seed(123)
data_long <- tibble(
  subject = rep(1:20, each = 5),
  time = rep(1:5, times = 20),
  response = rnorm(100, mean = 50 + 2*time + 0.5*subject, sd = 5)
)

# Create the plot
ggplot(data_long, aes(x = time, y = response, group = subject)) +
  geom_line(alpha = 0.6) +
  geom_point(alpha = 0.6) +
  labs(title = "Spaghetti Plot", x = "Time", y = "Response") +
  theme_minimal()

Interpretation: Each line represents one subject’s trajectory over 5 time points. The plot is good for seeing the spread of starting points (substantial between-subject variability, with some subjects high and some low), how individuals maintain their relative positions over time (a subject who starts high tends to stay high), and any outlying subjects or unbalanced data. It is not the right tool for reading the mean trend: with many overplotted lines the average drift is easily obscured. For the mean trend, read the mean-response plot on the next slide. The persistence of relative position here is the visual cue for the within-subject correlation our models must accommodate.

Spaghetti Plot: Code

The Mean Trajectory Plot

A mean trajectory plot reveals the overall trend in the data.

Purpose: Shows the overall trend of the group and a confidence interval for that trend.

Mean Trajectory Plot: Code

# Summarize data
summary_data <- data_long %>%
  group_by(time) %>%
  summarize(
    mean_response = mean(response),
    sd_response = sd(response),
    n = n(),
    se_response = sd_response / sqrt(n)
  )

# Create the plot with a ribbon for uncertainty
ggplot(summary_data, aes(x = time, y = mean_response)) +
  geom_line(color = "blue", linewidth = 1.5) +
  geom_point(color = "blue", size = 3) +
  geom_ribbon(aes(ymin = mean_response - 1.96 * se_response,
                  ymax = mean_response + 1.96 * se_response),
              alpha = 0.2, fill = "blue") +
  labs(title = "Mean Trajectory with 95% CI", x = "Time", y = "Mean Response") +
  theme_minimal()

Interpretation: The blue line shows the average response at each time point across all 20 subjects. The shaded ribbon represents a 95% confidence interval for the mean. The upward slope indicates that, on average, the response increases by about 2 units per time point. The narrow confidence interval reflects relatively precise estimation of the mean trajectory. This plot complements the spaghetti plot by summarizing the overall trend while hiding individual variation.

Mean Trajectory Plot: Code

The Correlation Heatmap

A correlation heatmap shows pairwise correlations between repeated measures.

Purpose: Shows how the correlation decays over time. Do measurements far apart in time have a lower correlation?

Note

This heatmap is an instructor-added exploratory device, not one of the FLW Section 3.3 descriptive plots (which are time plots of the raw data, mean-response plots, and smoothing via moving averages or lowess). FLW introduces correlation and correlation decay later, under Section 3.5 “Modeling the Covariance”; we show the heatmap here to bridge into that section.

Correlation Heatmap: Code

# Load packages
library(tidyr)

# Simulated illustration: genuine AR(1) serial correlation within each subject.
# Each subject's 5 errors follow an AR(1) process, so correlation decays with
# time separation. The population lag-k correlation is 0.8^k, but the finite-
# sample estimate is weaker at longer lags. rho = 0.8 matches the running AR(1)
# example in L02.
set.seed(667)
n_sub <- 200
phi <- 0.8
ar1_data <- do.call(rbind, lapply(1:n_sub, function(s) {
  err <- as.numeric(arima.sim(model = list(ar = phi), n = 5))
  tibble(subject = s, time = 1:5, response = 50 + 2 * (1:5) + err)
}))

# Pivot data to a wide format
wide_data <- ar1_data %>%
  select(subject, time, response) %>%
  pivot_wider(names_from = time, values_from = response) %>%
  select(-subject)

# Calculate correlation matrix
cor_matrix <- cor(wide_data, use = "pairwise.complete.obs")

# Tidy matrix for plotting
cor_long <- cor_matrix %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Time1") %>%
  pivot_longer(-Time1, names_to = "Time2", values_to = "Correlation")

# Create the heatmap
ggplot(cor_long, aes(x = Time1, y = Time2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", high = "blue", mid = "white",
                       midpoint = 0, limit = c(-1,1), name = "Correlation") +
  theme_minimal() +
  labs(title = "Correlation Heatmap (AR(1) data)")

Interpretation: This heatmap displays the correlation matrix for the 5 time points. The diagonal is darkest blue (correlation = 1, since each time point is perfectly correlated with itself). Because the errors were simulated from an AR(1) process, the off-diagonal correlations fade as the time separation grows: adjacent times (lag 1) are strongly correlated, while the most separated pair (times 1 and 5, lag 4) is much weaker. That visible fade away from the diagonal is the signature of serial correlation, exactly the pattern an AR(1) covariance structure is built to capture.

Correlation Heatmap: Code

Check Your Understanding: Part 2a (Reading the plots)

  1. What is the primary purpose of a spaghetti plot versus a mean trajectory plot?
  2. In the correlation heatmap, what pattern would you expect if correlation decays with time separation (like AR(1))?

Answers:

  1. Spaghetti plots show individual trajectories and between-subject variability; mean trajectory plots summarize the overall group trend and its uncertainty.
  2. You would see stronger correlations (darker colors) near the diagonal and weaker correlations (lighter colors) in the corners. The cells farthest from the diagonal would be lightest.

Check Your Understanding: Part 2b (Why visualize first)

  1. Why should you always examine these plots before fitting a model?

Answer:

  1. These plots reveal the data structure (trends, heterogeneity, potential outliers) and help you choose an appropriate model. You might discover nonlinear trends, unusual subjects, or correlation patterns that inform your covariance structure choice.

Case Study: TLC Lead-Exposure Trial

Case Study: From Mock to Real Data

The plots above used simulated data to isolate each idea. Now we run the same three descriptive views on a real FLW dataset.

Data: Treatment of Lead-Exposed Children (TLC) trial, the running example from Lecture 2 (FLW Ch. 2, Tables 2.2-2.3). We use the placebo group: \(N = 50\) children, blood-lead level measured at weeks 0, 1, 4, and 6 (\(n = 4\) occasions).

Why placebo: an untreated group lets us see the natural decay and the within-subject correlation structure without a treatment effect confounding the trend.

library(dplyr)
library(tidyr)
library(ggplot2)

# tlc.csv is WIDE: ID, Treatment Group, then lead level at weeks 0, 1, 4, 6.
# Local-first with a download fallback (FLW ships a header-free, space-delim file).
tlc_file <- "../../data/tlc.csv"
if (!file.exists(tlc_file)) tlc_file <- "data/tlc.csv"
if (file.exists(tlc_file)) {
  tlc <- read.csv(tlc_file)
} else {
  tlc <- read.table("https://content.sph.harvard.edu/fitzmaur/ala2e/tlc-data.txt",
                    header = FALSE,
                    col.names = c("ID", "Treatment.Group",
                                  "Lead.Level.Week.0", "Lead.Level.Week.1",
                                  "Lead.Level.Week.4", "Lead.Level.Week.6"))
}

# Placebo group, one column per occasion -> long format for plotting
weeks <- c(0, 1, 4, 6)
# Select the four weekly lead-level columns BY NAME (robust to column order);
# fall back to the last four columns if a local file uses non-standard names.
lead_cols <- grep("Lead.*Level|Week", names(tlc), ignore.case = TRUE, value = TRUE)
if (length(lead_cols) != 4) lead_cols <- tail(names(tlc), 4)
pbo_wide <- tlc[tlc$Treatment.Group == "P", lead_cols]
colnames(pbo_wide) <- paste0("w", weeks)
pbo_wide$id <- seq_len(nrow(pbo_wide))

pbo_long <- pbo_wide %>%
  pivot_longer(cols = starts_with("w"), names_to = "wk", values_to = "lead") %>%
  mutate(week = weeks[match(wk, paste0("w", weeks))])

nrow(pbo_wide)            # number of placebo children

Case Study: From Mock to Real Data

[1] 50
range(pbo_long$lead)     # observed blood-lead range
[1] 13.5 43.3

TLC Placebo: Spaghetti Plot

The 50 placebo trajectories spread widely at baseline (the between-subject heterogeneity), and children who start high tend to stay high: relative position persists across weeks. That persistence is the visual signature of within-subject correlation the spaghetti plot exposes but cannot quantify.

TLC Placebo: Mean Trajectory

Mean blood-lead drifts down only mildly under placebo, from about 26.3 mcg/dL at week 0 to about 23.6 mcg/dL at week 6. This is the mean-structure picture \(E(\mathbf{Y}_i) = \mathbf{X}_i\boldsymbol\beta\) made visual.

Warning

The 95% ribbon uses a naive standard error that ignores within-subject correlation, so treat it as descriptive only. Proper standard errors require modeling the marginal covariance \(\Sigma_i\) (Part 3).

TLC Placebo: Within-Subject Correlation

Off-diagonal correlations are high and positive throughout, ranging from about 0.76 to 0.87 (FLW Table 2.3). The dominant signal is strong positive correlation everywhere (between-subject heterogeneity), with a modest decay as the gap between weeks widens. These are exactly the two facts (non-constant structure, decaying correlation) the covariance models in Part 3 are built to capture.

Part 3: Modeling the Covariance

Why Model the Covariance?

A defining feature of longitudinal data is that repeated responses on the same individual are correlated.

This correlation cannot be ignored.

Consequences of Proper Covariance Modeling

Benefit Description
Valid inferences Correct standard errors and p-values
Efficiency More precise estimates
Missing data Valid estimates under certain conditions

Three Covariance Approaches

Approach Description
Unstructured Completely arbitrary pattern
Covariance pattern models Specific parsimonious structure
Random effects Individual-specific effects induce structure

Approach 1: Unstructured Covariance

This approach assumes no explicit structure for the covariance.

It estimates the unique variance for each time point and the covariance for every possible pair.

Key Feature: Provides a completely flexible and arbitrary pattern.

Drawbacks of Unstructured Covariance

Issue Description
Parameter-heavy \(\frac{n_i(n_i+1)}{2}\) parameters, where \(n_i\) = number of occasions (55 for 10 occasions!)
Unstable estimates Small samples lead to instability
Data limitations Cannot accommodate irregular timing or missing data

Approach 2: Covariance Pattern Models

This approach places a specific, parsimonious structure on the covariance matrix.

Inspired by time series analysis.

Key Idea: The correlation between repeated measures decays as the time separation increases.

Approach 3: Random Effects

An alternative, indirect way to model the covariance.

It accounts for correlation by introducing individual-specific random effects into the model.

Example: A random intercept represents all unobserved factors that make some subjects naturally higher or lower responders.

Benefits of Random Effects

Benefit Description
Flexibility and parsimony Few parameters, realistic patterns
Handles irregular data Natural handling of missing observations
Multiple effects Random intercepts and slopes for richer patterns

Summary of Covariance Approaches

Approach Pros Cons
Unstructured Most flexible Parameter-heavy, requires complete data
Covariance Pattern Parsimonious, handles irregular timing Assumes a specific pattern
Random Effects Flexible, parsimonious, handles missing data Structure can be too restrictive

Covariance Structure Visualized

Compound Symmetry (CS): Constant correlation (\(\rho\)) for any pair of times.

\[ \begin{pmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{pmatrix} \]

Covariance Structure Visualized

AR(1): Correlation decays exponentially as time separation increases (\(\rho^k\)).

\[ \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix} \]

Check Your Understanding: Part 3a (Unstructured and pattern models)

  1. Why is an unstructured covariance matrix problematic with 10 time points?
  2. What is the key difference between compound symmetry (CS) and AR(1) covariance structures?

Answers:

  1. An unstructured matrix with \(n_i = 10\) occasions requires \(\frac{n_i(n_i+1)}{2} = \frac{10 \times 11}{2} = 55\) parameters (here \(n_i\) is the number of occasions, distinct from \(n\), the number of subjects). This many parameters can lead to unstable estimates, especially with small samples, and the model may fail to converge.
  2. CS assumes the same correlation (\(\rho\)) between any two time points regardless of how far apart they are. AR(1) assumes correlation decays as time separation increases (\(\rho^k\) for lag \(k\)), which is often more realistic for longitudinal data.

Check Your Understanding: Part 3b (Random effects)

  1. How do random effects induce correlation among repeated measures?

Answer:

  1. Random effects (e.g., a random intercept \(b_i\)) represent subject-specific deviations that are shared across all measurements for that subject. Since the same \(b_i\) affects all of subject \(i\)’s responses, it creates positive correlation among them.

Part 4: Historical Approaches

Three Classical Approaches

Method Description
Summary Measures Reduce to a single value
Univariate Repeated Measures ANOVA Extension of ANOVA
MANOVA Multivariate analysis of variance

Why We Do Not Use Them Anymore

Method Advantage Disadvantage
Summary Measures Simple, easy to use Discards trajectory information
Univariate ANOVA Easy to perform Requires compound symmetry
MANOVA No covariance structure required Cannot handle missing data

Method 1: Summary Measures

The simplest approach: reduce repeated measures to a single summary measure.

Examples: Mean response or regression slope.

Limitation: Discards information about individual trajectories.

Method 2: Univariate Repeated Measures ANOVA

An extension of ANOVA.

Assumption: Compound symmetry (equal variances and constant correlation).

This is often inappropriate for longitudinal data where correlation typically decays with time separation.

Solution: Greenhouse and Geisser (1959) developed an adjustment to correct the F-test.

Method 3: MANOVA

Treats repeated measures as a single, multi-dimensional outcome.

Advantage: Does not assume a specific covariance structure; it estimates an unstructured matrix.

Limitation: Requires a complete set of measurements for every subject.

The Missing Data Problem in MANOVA

MANOVA requires complete measurements for every subject.

Subject Time 1 Time 2 Time 3 Time 4
1 12 10 9 11
2 15 14 - 13
3 11 12 10 12

Subject 2 would be dropped from the analysis.

Comparison of Covariance Assumptions

Method Assumption Parameters
Univariate ANOVA Compound symmetry 2
MANOVA Unstructured Many
Modern methods Realistic and parsimonious Few

Part 5: Summary

Common Mistakes to Avoid

  • Focusing only on the mean structure: The covariance structure is equally important. Misspecifying it leads to incorrect standard errors and invalid inference, even if your mean model is correct.

  • Using unstructured covariance by default: While flexible, unstructured covariance requires many parameters and cannot handle missing data or irregular timing. Consider parsimonious alternatives (CS, AR(1), random effects).

  • Applying MANOVA with incomplete data: MANOVA drops subjects with any missing observations. This wastes data and can introduce bias if missingness is related to the outcome.

  • Assuming compound symmetry without checking: Univariate repeated measures ANOVA assumes equal correlations across all pairs. This is often unrealistic, since correlations typically decay with time separation.

  • Confusing the mean and covariance misspecification consequences: Misspecifying the mean model biases \(\hat{\beta}\). Misspecifying the covariance structure affects standard errors but not point estimates (if the mean is correct).

The Core Components

The general linear model has two parts:

Component Expression
Mean structure \(E(\mathbf{Y}_i) = \mathbf{X}_i\boldsymbol{\beta}\)
Covariance structure \(\mathrm{Cov}(\mathbf{Y}_i) = \Sigma_i\)

The Main Challenge

The covariance structure makes longitudinal analysis unique.

We must account for the non-independence of repeated measures.

Choosing a model for \(\Sigma_i\) that is both flexible and parsimonious is the main challenge.

Further Reading

  • Fitzmaurice, Laird & Ware (2011), Chapter 3 and its “Further Reading” notes: the primary reference for this lecture.
  • Laird & Ware (1982), Random-effects models for longitudinal data, Biometrics 38:963-974: the foundational mixed-model paper we build toward in Ch. 8.
  • Greenhouse & Geisser (1959), On methods in the analysis of profile data, Psychometrika 24:95-112: the classic ANOVA sphericity correction mentioned under historical approaches.
  • Diggle, Heagerty, Liang & Zeger (2002), Analysis of Longitudinal Data (2nd ed.): complementary treatment of covariance structure and serial correlation.

What Comes Next?

Upcoming chapters will cover modern approaches that are more flexible and robust.

Topic Description
Serial correlation Accounting for time-ordered dependence
Random variation Modeling individual-specific effects
Incomplete data Handling missing observations