data.frame(rho = c(0, 0.3, 0.7)) |>
dplyr::mutate(Var_Delta = 2*(1-rho),
SE_Delta = sqrt(Var_Delta))Fitzmaurice, Laird & Ware (2011) - Applied Longitudinal Data Analysis
By the end of this lecture, you should be able to:
(Residual ACF and the semivariogram are introduced later as instructor extensions beyond FLW Ch. 7.)
| Section | Topic |
|---|---|
| 7.1-7.2 | Why model \(\Sigma\)? Implications for SEs and power |
| 7.3 | Unstructured covariance (UN) |
| 7.4 | Covariance-pattern models (CS, AR(1), Exponential) |
| 7.5 | Strategy for choosing \(\Sigma\) |
| 7.8 | Computing: SAS to R mapping |
| Symbol | Meaning |
|---|---|
| \(i,\ j\) | subject index \(i = 1,\dots,N\) and occasion/time index \(j = 1,\dots,n_i\) |
| \(N,\ n_i\) | number of subjects; number of occasions for subject \(i\) (\(n\) when balanced) |
| \(Y_{ij},\ \mathbf{Y}_i\) | response for subject \(i\) at occasion \(j\) (scalar); response vector \((n_i \times 1)\) |
| \(t_{ij}\) | measurement time for subject \(i\) at occasion \(j\) |
| \(X_{ij},\ \mathbf{X}_i\) | covariate row vector \(1 \times p\); design matrix \(n_i \times p\) |
| \(\boldsymbol\beta\) | fixed-effect (population-average) coefficients |
| \(\varepsilon_{ij},\ \boldsymbol\varepsilon_i\) | residual error (scalar; vector) |
| \(\Sigma_i = \operatorname{Cov}(\mathbf{Y}_i)\) | marginal (total) covariance of \(\mathbf{Y}_i\) (decomposes into random-effect + residual parts once random effects appear, Ch. 8) |
| \(\rho\) | a correlation parameter (e.g. exchangeable, or AR(1) where corr at lag \(k\) is \(\rho^k\)) |
Bold = vector or matrix. This is a running reference card: later lectures add random-effect and GEE notation.
We write \(\Sigma_i = \operatorname{Cov}(\mathbf{Y}_i)\) for the marginal (within-subject) covariance (FLW’s letter, Ch. 7), and often drop the subscript and write \(\Sigma\).
| Key Point | Description |
|---|---|
| Within-person correlation | Longitudinal data have correlation structure \(\Sigma\) that must be modeled |
| Correct \(\Sigma\) | Yields valid SEs and more power |
| Misspecified \(\Sigma\) | Leads to misleading inferences |
| Interdependence | Mean model and covariance model are linked (residuals depend on the mean) |
| MAR missingness | Valid likelihood inference requires sensible mean + \(\Sigma\) |
Paired “before-after” change (two time points):
\[ \operatorname{Var}(Y_{i2}-Y_{i1}) = \sigma_1^2+\sigma_2^2-2\rho_{12}\sigma_1\sigma_2 \]
If \(\sigma_1^2=\sigma_2^2=\sigma^2\):
\[ \operatorname{Var}(Y_{i2}-Y_{i1}) = 2\sigma^2(1-\rho) \]
| Comparison Type | Variance |
|---|---|
| Within-person change | \(2\sigma^2(1-\rho)\) |
| Cross-sectional (independent) | \(2\sigma^2\) |
| Ratio (within / between) | \(1-\rho\) |
Higher \(\rho\) yields smaller variance for within-person change and more power.
For within-person change \(\Delta=Y_{i2}-Y_{i1}\) with \(\sigma^2=1\):
\[\mathrm{Var}(\Delta)=2\sigma^2(1-\rho)=2(1-\rho)\]
Higher \(\rho\) yields smaller \(\mathrm{SE}(\Delta)\) and more power for detecting change.
rho Var_Delta SE_Delta
1 0.0 2.0 1.4142136
2 0.3 1.4 1.1832160
3 0.7 0.6 0.7745967
| Model | Captures |
|---|---|
| Independence | Neither serial correlation nor heteroscedasticity |
| AR(1) only | Serial correlation (off-diagonals), constant variance |
| varIdent only | Changing variance by time (diagonal), independent residuals |
| ARH(1) | Both serial correlation and time-varying variance |
set.seed(667)
library(dplyr)
library(tidyr)
library(ggplot2)
library(nlme)
# Simulate balanced longitudinal data with AR(1) correlation + heteroscedasticity
N <- 120 # subjects
m <- 5 # visits
times <- 0:4
rho_true <- 0.6 # true serial correlation
sigma_true <- 1.2 # marginal SD scale for the AR(1) part
omega <- c(1.0, 1.1, 1.5, 1.2, 0.9) # time-specific SD multipliers
# AR(1) generator with stationary Var = 1
sim_ar1_std <- function(m, rho){
z <- numeric(m)
z[1] <- rnorm(1, sd = 1)
if (m > 1) {
eps <- rnorm(m - 1, sd = sqrt(1 - rho^2))
for (t in 2:m) z[t] <- rho * z[t - 1] + eps[t - 1]
}
z
}
ids <- factor(seq_len(N))
design <- expand.grid(id = ids, time_idx = seq_len(m)) |>
arrange(id, time_idx) |>
mutate(time_f = factor(time_idx, levels = seq_len(m),
labels = paste0("t", times)),
time_num = times[time_idx])
# Mean structure: linear time trend + small group effect
beta0 <- 10
beta_time <- 0.8
group <- factor(rep(c("A","B"), each = N/2))
design <- design |>
group_by(id) |>
mutate(group = group[as.integer(id)]) |>
ungroup()
# Subject random intercept
sigma_b <- 1.0
b_i <- rnorm(N, 0, sigma_b)
b <- b_i[as.integer(design$id)]
# AR(1) errors with time-specific SD multipliers
z <- do.call(c, lapply(seq_len(N), function(i) sim_ar1_std(m, rho_true)))
err <- sigma_true * (omega[design$time_idx]) * z
sim <- design |>
mutate(mu = beta0 + beta_time * time_num + ifelse(group=="B", 0.5, 0),
y = mu + b + err)# Fit GLS under different covariance structures
fit_ind <- gls(y ~ time_num, data = sim, method = "REML")
fit_ar1 <- gls(y ~ time_num, data = sim,
correlation = corAR1(form = ~ time_idx | id),
method = "REML")
# Initialize varIdent with numeric starting values
w_time <- varIdent(form = ~ 1 | time_f)
nlme::coef(w_time) <- setNames(rep(1, nlevels(sim$time_f) - 1),
levels(sim$time_f)[-1])
fit_vhet <- gls(y ~ time_num, data = sim,
weights = w_time,
method = "REML")
fit_arh1 <- gls(y ~ time_num, data = sim,
correlation = corAR1(form = ~ time_idx | id),
weights = w_time,
method = "REML")# Compare fits.
# REML-LRT (nested) and AIC (non-nested) are the primary criteria;
# BIC is shown for reference only (FLW p. 179 advises against it for Sigma).
tab_ic <- data.frame(
model = c("Independence","AR(1) only","varIdent only","ARH(1) = AR(1)+varIdent"),
AIC = c(AIC(fit_ind), AIC(fit_ar1), AIC(fit_vhet), AIC(fit_arh1)),
BIC = c(BIC(fit_ind), BIC(fit_ar1), BIC(fit_vhet), BIC(fit_arh1)),
logLik = c(logLik(fit_ind), logLik(fit_ar1), logLik(fit_vhet), logLik(fit_arh1))
) |>
arrange(AIC)
tab_icBIC is not recommended for covariance selection (FLW p. 179)
FLW recommend AIC (not BIC) for choosing among covariance structures. BIC’s heavier penalty carries a “high risk of selecting a model that is too simple or parsimonious” for \(\Sigma\). We display the BIC column for reference only; base the decision on the REML-LRT (for nested models) and AIC (for non-nested models).
model AIC BIC logLik
1 ARH(1) = AR(1)+varIdent 1992.582 2027.730 -988.2909
2 AR(1) only 2014.874 2032.448 -1003.4369
3 varIdent only 2293.414 2324.169 -1139.7068
4 Independence 2302.095 2315.276 -1148.0477
param estimate
1 rho_hat (AR1 only) 1.544779
2 rho_hat (ARH1) 1.688604
# Estimated varIdent SD multipliers
vi_coef <- coef(fit_arh1$modelStruct$varStruct, unconstrained = FALSE)
levs <- levels(sim$time_f)
vi_all <- setNames(rep(1, length(levs)), levs)
vi_all[names(vi_coef)] <- vi_coef
sigma_eps <- sigma(fit_arh1)
sd_time <- sigma_eps * vi_all
sd_table <- data.frame(time = names(sd_time), sd_est = as.numeric(sd_time))
sd_table time sd_est
1 t0 1.488975
2 t1 1.670688
3 t2 2.003748
4 t3 1.586693
5 t4 1.396372
Dataset note
We use the TLC trial and a simulated AR(1) example as our course-standard data so the running example carries across lectures. FLW Ch. 7’s own case study (Section 7.6) is the exercise-therapy (muscle-strength) trial (Tables 7.1-7.5), where UN, AR, and EXP structures are compared by REML-logLik/AIC and the AR-vs-UN fit is tested with a 13-df REML-LRT.
library(dplyr)
library(tidyr)
library(nlme)
library(emmeans)
library(ggplot2)
# Read TLC data
path <- "../../data/tlc.csv"
tlc_wide <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
names(tlc_wide) <- trimws(names(tlc_wide))
tlc_wide <- tlc_wide %>%
rename(
id = ID,
trt = `Treatment Group`,
w0 = `Lead Level Week 0`,
w1 = `Lead Level Week 1`,
w4 = `Lead Level Week 4`,
w6 = `Lead Level Week 6`
) %>%
mutate(trt = factor(trt, levels = c("P","A"), labels = c("Placebo","Succimer")))
# Wide -> long format
long <- tlc_wide %>%
pivot_longer(c(w0,w1,w4,w6), names_to = "wk", values_to = "y") %>%
mutate(
time = recode(wk, w0="0", w1="1", w4="4", w6="6"),
time_f = factor(time, levels = c("0","1","4","6")),
time_num = as.numeric(as.character(time)),
time_idx = as.integer(time_f),
group = trt,
id = factor(id),
y = as.numeric(y)
) %>%
select(id, group, time_f, time_num, time_idx, y) %>%
arrange(id, time_idx)
head(long)# A tibble: 6 × 6
id group time_f time_num time_idx y
<fct> <fct> <fct> <dbl> <int> <dbl>
1 1 Placebo 0 0 1 30.8
2 1 Placebo 1 1 2 26.9
3 1 Placebo 4 4 3 25.8
4 1 Placebo 6 6 4 23.8
5 2 Succimer 0 0 1 26.5
6 2 Succimer 1 1 2 14.8
# Helpers for stable varIdent initialization
make_varIdent <- function(data, fac) {
f <- nlme::varIdent(form = as.formula(paste0("~ 1 | ", fac)))
levs <- levels(data[[fac]])
if (length(levs) > 1) nlme::coef(f) <- stats::setNames(rep(1, length(levs) - 1), levs[-1])
f
}
w_time <- make_varIdent(long, "time_f")
m_visits <- nlevels(long$time_f)
p_lower <- m_visits * (m_visits - 1) / 2
start_vec_un <- rep(0.2, p_lower)# Compute rho at baseline vs week 1
bl_w1 <- tlc_wide %>% select(id, trt, w0, w1) %>% filter(!is.na(w0) & !is.na(w1))
rho_hat <- with(bl_w1, cor(w0, w1))
var_within_hat <- var(bl_w1$w1 - bl_w1$w0)
var_between_hat <- var(bl_w1$w1) + var(bl_w1$w0)
ratio_hat <- var_within_hat / var_between_hat
data.frame(
rho_hat = round(rho_hat, 3),
ratio_within_to_between = round(ratio_hat, 3),
approx_1_minus_rho = round(1 - rho_hat, 3)
) rho_hat ratio_within_to_between approx_1_minus_rho
1 0.419 0.638 0.581
Beyond FLW (instructor extension)
The residual ACF and semivariogram on the next slides are not part of FLW Ch. 7. They are standard serial-correlation diagnostics from the Diggle-style literature (cf. Diggle, Heagerty, Liang and Zeger, Analysis of Longitudinal Data). The FLW-native diagnostic appears a few slides later: compare the estimated UN correlation matrix to a pattern-implied matrix and run a REML-LRT.
Purpose: Compare residual autocorrelation under different \(\Sigma\) assumptions.
| Design | Purpose |
|---|---|
| N = 400, m = 8 | Better estimation of \(\Sigma\) |
| \(\rho = 0.8\) (strong) | Independence leaves larger residual ACF |
Expected result: AR(1) fit shows flat residual ACF; Independence shows positive lag-1 ACF.
set.seed(667)
library(dplyr)
library(tidyr)
library(ggplot2)
library(nlme)
# Simulate longitudinal data with AR(1) errors
N <- 400 # more subjects
m <- 8 # more visits
rho <- 0.80 # stronger serial correlation
sigma <- 2 # marginal SD
# AR(1) generator with stationary Var = sigma^2
sim_ar1 <- function(m, rho, sigma){
e <- numeric(m)
e[1] <- rnorm(1, sd = sigma)
if (m > 1) {
w <- rnorm(m - 1, sd = sigma * sqrt(1 - rho^2))
for (t in 2:m) e[t] <- rho * e[t - 1] + w[t - 1]
}
e
}
ids <- factor(seq_len(N))
times <- seq_len(m)
design <- expand.grid(id = ids, time_idx = times) |>
arrange(id, time_idx) |>
mutate(
group = ifelse(as.integer(id) <= N/2, "A", "B"),
group = factor(group, levels = c("A","B")),
time_f = factor(time_idx, levels = times),
time_num = as.numeric(time_idx)
)
# Fixed mean structure
beta0 <- 10
beta_time <- 0.7
beta_groupB <- 1.0
beta_int_B <- 0.3
errs <- do.call(c, lapply(seq_len(N), function(i) sim_ar1(m, rho, sigma)))
sim_long <- design |>
mutate(
mu = beta0 +
beta_time * (time_num - 1) +
beta_groupB * as.numeric(group == "B") +
beta_int_B * as.numeric(group == "B") * (time_num - 1),
y = mu + errs
)# Fit GLS under Independence and AR(1)
gls_ind <- gls(y ~ group * time_f, data = sim_long, method = "REML")
gls_ar1 <- gls(y ~ group * time_f, data = sim_long,
correlation = corAR1(form = ~ time_idx | id),
method = "REML")
# Attach normalized residuals
res_ind <- sim_long |> mutate(res = residuals(gls_ind, type = "normalized"))
res_ar1 <- sim_long |> mutate(res = residuals(gls_ar1, type = "normalized"))# Pooled residual ACF by subject (lags 1-3)
lag_cor <- function(x, k){
n <- length(x); if (n <= k) return(NA_real_)
stats::cor(x[(k+1):n], x[1:(n-k)], use = "pairwise.complete.obs")
}
acf_by_id <- function(dat){
dat |>
arrange(id, time_idx) |>
group_by(id) |>
summarise(acf1 = lag_cor(res, 1),
acf2 = lag_cor(res, 2),
acf3 = lag_cor(res, 3),
.groups = "drop")
}
pool_acf <- function(dat, model_lab){
by_id <- acf_by_id(dat)
by_id |>
tidyr::pivot_longer(cols = dplyr::starts_with("acf"),
names_to = "lag", values_to = "rho") |>
mutate(lag = as.integer(sub("acf", "", lag))) |>
group_by(lag) |>
summarise(
mean_rho = mean(rho, na.rm = TRUE),
med_rho = median(rho, na.rm = TRUE),
se_rho = sd(rho, na.rm = TRUE)/sqrt(dplyr::n()),
.groups = "drop"
) |>
mutate(model = model_lab)
}
acf_ind <- pool_acf(res_ind, "Independence")
acf_ar1 <- pool_acf(res_ar1, "AR(1)")
acf_all <- bind_rows(acf_ind, acf_ar1)ggplot(acf_all, aes(x = factor(lag), y = mean_rho, fill = model)) +
geom_col(position = position_dodge(width = 0.7), width = 0.6) +
geom_errorbar(aes(ymin = mean_rho - 2*se_rho, ymax = mean_rho + 2*se_rho),
position = position_dodge(width = 0.7), width = 0.15) +
geom_hline(yintercept = 0, linewidth = 0.3) +
labs(title = "Pooled residual ACF by model (simulated AR(1) truth, rho = 0.80)",
subtitle = "Correct Sigma (AR(1)) shows flat residual ACF; Independence shows positive lag-1 correlation",
x = "Lag", y = "Residual correlation") +
theme_minimal()Beyond FLW (instructor extension)
The semivariogram is a Diggle-style tool (cf. Diggle, Heagerty, Liang and Zeger, Analysis of Longitudinal Data), not part of FLW Ch. 7. Shown here as a useful visual check on how residual correlation decays with time separation.
A variogram plots residual variance versus time separation.
| Pattern | Interpretation |
|---|---|
| Flat variogram | Compound Symmetry may be reasonable |
| Increasing variogram | Correlation decays with time (AR(1), Exponential) |
The diagnostic FLW Ch. 7 actually uses: fit the unstructured correlation matrix, then compare it to the correlation matrix implied by a pattern (e.g. AR(1) or CS). A REML likelihood-ratio test of the pattern against UN formalizes the comparison.
In FLW’s exercise-therapy example, the AR-vs-UN REML-LRT gives \(G^2 = 23.8\) on 13 df, \(p < 0.05\) (FLW p. 182), so the AR pattern does not provide an adequate fit relative to UN (AR is rejected). The exponential model (also 2 parameters) is the one FLW prefers: it has the lowest AIC and, unlike AR, is not rejected against UN at 0.05 (\(G^2 = 21.2\) on 13 df, \(p \approx 0.07\)).
# Fit the unstructured correlation model on the simulated AR(1) data
gls_un_sim <- gls(y ~ group * time_f, data = sim_long,
correlation = corSymm(form = ~ time_idx | id),
weights = varIdent(form = ~ 1 | time_f),
method = "REML")
# Estimated UN correlation matrix (the "data-driven" target)
cor_un <- corMatrix(gls_un_sim$modelStruct$corStruct)[[1]]
# AR(1)-implied correlation matrix: rho^|j-k|
rho_hat <- as.numeric(coef(gls_ar1$modelStruct$corStruct, unconstrained = FALSE))
lags <- abs(outer(seq_len(m), seq_len(m), "-"))
cor_ar1 <- rho_hat ^ lags
round(cor_un, 2) # unstructured estimate [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1.00 0.81 0.65 0.49 0.37 0.26 0.20 0.15
[2,] 0.81 1.00 0.80 0.63 0.49 0.37 0.28 0.19
[3,] 0.65 0.80 1.00 0.78 0.63 0.46 0.36 0.27
[4,] 0.49 0.63 0.78 1.00 0.80 0.61 0.48 0.36
[5,] 0.37 0.49 0.63 0.80 1.00 0.76 0.63 0.49
[6,] 0.26 0.37 0.46 0.61 0.76 1.00 0.83 0.65
[7,] 0.20 0.28 0.36 0.48 0.63 0.83 1.00 0.80
[8,] 0.15 0.19 0.27 0.36 0.49 0.65 0.80 1.00
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1.00 0.80 0.64 0.52 0.41 0.33 0.27 0.21
[2,] 0.80 1.00 0.80 0.64 0.52 0.41 0.33 0.27
[3,] 0.64 0.80 1.00 0.80 0.64 0.52 0.41 0.33
[4,] 0.52 0.64 0.80 1.00 0.80 0.64 0.52 0.41
[5,] 0.41 0.52 0.64 0.80 1.00 0.80 0.64 0.52
[6,] 0.33 0.41 0.52 0.64 0.80 1.00 0.80 0.64
[7,] 0.27 0.33 0.41 0.52 0.64 0.80 1.00 0.80
[8,] 0.21 0.27 0.33 0.41 0.52 0.64 0.80 1.00
# REML-LRT: AR(1) (reduced) nested in UN (full).
# Both share the same maximal mean model, so REML logLik is comparable.
G2 <- 2 * (as.numeric(logLik(gls_un_sim)) - as.numeric(logLik(gls_ar1)))
df <- attr(logLik(gls_un_sim), "df") - attr(logLik(gls_ar1), "df")
p <- pchisq(G2, df = df, lower.tail = FALSE)
data.frame(G2 = round(G2, 2), df = df, p_value = signif(p, 3))Same mean model (\(Y_{ij} \sim \text{group} \times \text{time}_f\)).
Compare \(\Sigma\) via gls: Independence, AR(1), UN (hetero).
Extract within-group change (week 1 - baseline) for each group with SEs using emmeans.
# Fit three covariance choices with the same mean
gls_ind <- gls(y ~ group * time_f, data = long, method = "REML")
gls_ar1 <- gls(y ~ group * time_f, data = long,
correlation = corAR1(form = ~ time_idx | id), method = "REML")
gls_un <- gls(y ~ group * time_f, data = long,
correlation = corSymm(value = start_vec_un, form = ~ time_idx | id),
weights = w_time,
method = "REML")# Helper: within-group change (week1 - baseline) via emmeans
get_change_tbl <- function(fit){
emm_t_by_g <- emmeans(fit, ~ time_f | group)
contr <- contrast(emm_t_by_g,
method = list("week1 - baseline" = c(-1, 1, 0, 0)))
as.data.frame(summary(contr))[, c("group","estimate","SE","df","t.ratio","p.value")]
}
tab_ind <- get_change_tbl(gls_ind); tab_ind$model <- "Independence"
tab_ar1 <- get_change_tbl(gls_ar1); tab_ar1$model <- "AR(1)"
tab_un <- get_change_tbl(gls_un ); tab_un$model <- "UN (hetero)"
tab_all <- bind_rows(tab_ind, tab_ar1, tab_un) %>%
select(model, group, estimate, SE, df, t.ratio, p.value)
tab_all model group estimate SE df t.ratio p.value
1 Independence Placebo -1.612 1.3251428 392.00000 -1.216473 2.245369e-01
2 Independence Succimer -13.018 1.3251428 392.00000 -9.823847 1.652083e-20
3 AR(1) Placebo -1.612 0.8005441 284.18647 -2.013631 4.499076e-02
4 AR(1) Succimer -13.018 0.8005441 284.18647 -16.261441 1.737753e-42
5 UN (hetero) Placebo -1.612 0.7919240 97.77276 -2.035549 4.450131e-02
6 UN (hetero) Succimer -13.018 0.7919240 97.77276 -16.438445 6.776871e-30
| Error | Cause |
|---|---|
Initialize.corSymm: wrong dimension |
Start vector length must equal \(m(m-1)/2\) |
| “unique values must be consecutive integers” | Ensure time_idx = as.integer(time_f) |
abs(value) non-numeric |
varIdent coefficients must be numeric |
| Observation | Interpretation |
|---|---|
| Estimates similar across \(\Sigma\) | Mean model is sensible |
| SEs shrink under AR(1) or UN | Realistic \(\Sigma\) recovers precision |
| Independence: larger SEs | Power left on the table |
Takeaway: Modeling \(\Sigma\) recovers precision for change contrasts.
| Principle | Reason |
|---|---|
| Model \(\Sigma\) | Alters SEs and power; necessary under MAR |
| Use contrast tables | Show efficiency gains clearly |
| Next: Specify \(\Sigma\) | Unstructured (7.3) and covariance-pattern (7.4) models |
Check Your Understanding: Why Model the Covariance?
If the within-person correlation \(\rho = 0.7\) and marginal variance \(\sigma^2 = 1\), what is \(\text{Var}(Y_{i2} - Y_{i1})\) for a within-person change?
Why does modeling \(\Sigma\) correctly matter for standard errors, even if we only care about the mean model?
What happens to power for detecting within-person change as \(\rho\) increases?
Answers:
\(\text{Var}(Y_{i2} - Y_{i1}) = 2\sigma^2(1-\rho) = 2(1)(1-0.7) = 0.6\). Compare this to \(2\sigma^2 = 2\) for independent observations.
Even though the point estimates \(\hat{\beta}\) may be similar, the standard errors depend on the estimated covariance structure. Misspecified \(\Sigma\) leads to incorrect SEs, invalid confidence intervals, and misleading hypothesis tests.
As \(\rho\) increases, the variance of within-person change decreases (since \(1-\rho\) gets smaller), so standard errors shrink and power increases. This is why longitudinal designs are often more powerful than cross-sectional designs.
Let \(\Sigma\) be arbitrary (symmetric, positive-definite).
| Visits (\(n\)) | Parameters |
|---|---|
| 3 | 6 |
| 5 | 15 |
| 10 | 55 |
Formula: \(n(n+1)/2\) parameters.
| Pros | Cons |
|---|---|
| Flexible | Many parameters |
| Captures unequal variances by time | Needs large \(N\) |
| Awkward for mistimed visits |
# UN with heterogeneous variances
gls_un <- gls(y ~ group * time_f, data = long,
correlation = corSymm(value = start_vec_un, form = ~ time_idx | id),
weights = w_time,
method = "REML")
# Compound Symmetry (homoscedastic)
gls_cs <- gls(y ~ group * time_f, data = long,
correlation = corCompSymm(form = ~ 1 | id),
method = "REML")
# AR(1) with equal spacing
gls_ar1 <- gls(y ~ group * time_f, data = long,
correlation = corAR1(form = ~ time_idx | id),
method = "REML")
# Exponential (continuous-time)
gls_exp <- gls(y ~ group * time_f, data = long,
correlation = corExp(form = ~ time_num | id),
method = "REML")
# Exponential with heterogeneous variances
gls_exp_het <- gls(y ~ group * time_f, data = long,
correlation = corExp(form = ~ time_num | id),
weights = w_time,
method = "REML")
# Exponential + nugget
gls_exp_nug <- gls(y ~ group * time_f, data = long,
correlation = corExp(form = ~ time_num | id, nugget = TRUE),
method = "REML")n <- 2:10
params_UN <- n*(n+1)/2
params_AR1 <- rep(2, length(n))
params_CS <- rep(2, length(n))
df <- data.frame(n, UN=params_UN, AR1=params_AR1, CS=params_CS) |>
tidyr::pivot_longer(-n, names_to="model", values_to="p")
ggplot(df, aes(n, p, color=model)) +
geom_line() + geom_point() +
labs(title="Number of covariance parameters vs visits",
x="Number of visits (n)", y="Number of parameters") +
theme_minimal()Rule-of-thumb: Be wary if \(N\) / #cov-params \(\lesssim 5\).
# A tibble: 9 × 8
n UN AR1 CS N N_per_param_UN N_per_param_AR1 N_per_param_CS
<int> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 2 3 2 2 100 33.3 50 50
2 3 6 2 2 100 16.7 50 50
3 4 10 2 2 100 10 50 50
4 5 15 2 2 100 6.67 50 50
5 6 21 2 2 100 4.76 50 50
6 7 28 2 2 100 3.57 50 50
7 8 36 2 2 100 2.78 50 50
8 9 45 2 2 100 2.22 50 50
9 10 55 2 2 100 1.82 50 50
library(reshape2)
# Correlation matrix from UN (first subject)
C_un_list <- corMatrix(gls_un$modelStruct$corStruct)
C_un <- if (is.list(C_un_list)) C_un_list[[1]] else C_un_list
dfC <- melt(C_un)
ggplot(dfC, aes(Var1, Var2, fill = value)) +
geom_tile() + coord_equal() +
scale_fill_gradient2(limits = c(-1,1)) +
labs(title = "Implied correlation (UN)", x = "Visit index (1..4)", y = "Visit index (1..4)", fill = "rho") +
theme_minimal()# varIdent returns relative SD multipliers by factor level (base level=1)
w_coef <- coef(gls_un$modelStruct$varStruct, unconstrained = FALSE)
levs <- levels(long$time_f)
w_all <- setNames(rep(1, length(levs)), levs)
w_all[names(w_coef)] <- w_coef
sigma_eps <- sigma(gls_un)
sd_time <- sigma_eps * w_all
data.frame(time = names(sd_time), sd = as.numeric(sd_time)) %>%
ggplot(aes(x = time, y = sd, group = 1)) +
geom_line() + geom_point(size = 2) +
labs(title = "Estimated residual SD by time (UN + varIdent)",
x = "Time (weeks)", y = "SD") +
theme_minimal()| Strength | Limitation |
|---|---|
| Flexible | Parameter count grows quickly |
| Captures non-constant variances | Requires ample \(N\) relative to \(n(n+1)/2\) |
| Captures irregular pairwise correlations | Not ideal for mistimed visits |
Check Your Understanding: Unstructured Covariance
How many covariance parameters does the UN model require for a study with 6 planned visits?
A colleague fits UN to a dataset with \(N = 50\) subjects and 8 visits. Should they be concerned? Why?
What does the “varIdent” component add on top of the correlation structure in gls?
Answers:
\(n(n+1)/2 = 6(7)/2 = 21\) parameters.
Yes, definitely. With 8 visits, UN requires \(8(9)/2 = 36\) covariance parameters. The rule of thumb is \(N\)/params \(\gtrsim 5\), so they would need at least \(36 \times 5 = 180\) subjects. With only 50 subjects, the ratio is about 1.4, which risks unstable estimates.
The corSymm structure models the correlation matrix (values between -1 and 1), while varIdent allows heterogeneous variances across time points. Together, they model the full covariance matrix with time-varying SDs.
| Failure Mode | Remedy |
|---|---|
| Non-positive-definite estimates | Use structured \(\Sigma\) (AR(1), Toeplitz, Exponential) |
| Non-convergence | Increase optimizer iterations/tolerance |
| Inflated SEs when \(N\) is small | Simplify mean model |
| Overfit | Consider mixed-effects (RI/RS; Chapter 8) |
| Issue | Solution |
|---|---|
time_idx not consecutive |
Ensure consecutive integers |
| UN start vector wrong length | corSymm(value = rep(0.2, p_lower)) where \(p = m(m-1)/2\) |
varIdent coefficients invalid |
Initialize with numeric values, not factors/chars |
| Non-convergence | Use glsControl(msMaxIter = 200, tolerance = 1e-8) |
| UN unstable | Fit ARH(1)/EXP first, then UN |
| Goal | Approach |
|---|---|
| Balance parsimony vs. flexibility | Time-series inspired patterns |
| Model correlation decay | CS, AR(1), Exponential |
| Handle heterogeneous variances | Add varIdent(~ 1 | time_f) |
\[\Sigma = \sigma^2\{(1-\rho)I + \rho J\}\]
Constant variance, constant correlation.
Often unrealistic (no decay with lag); useful baseline comparator.
Value Std.Error t-value p-value
(Intercept) 26.272 0.9370175 28.0378975 1.054330e-95
groupSuccimer 0.268 1.3251428 0.2022424 8.398322e-01
time_f1 -1.612 0.8428574 -1.9125418 5.653521e-02
time_f4 -2.202 0.8428574 -2.6125416 9.332721e-03
time_f6 -2.626 0.8428574 -3.1155923 1.970857e-03
groupSuccimer:time_f1 -11.406 1.1919804 -9.5689497 1.238048e-19
[1] 0.8310651
\[\text{Corr}(k) = \rho^k\]
where \(k\) = lag in visits.
| Pros | Cons |
|---|---|
| Parsimonious (2 params) | Can decay too fast |
| Requires equal spacing |
Value Std.Error t-value p-value
(Intercept) 26.272 0.9318003 28.1948833 2.436323e-96
groupSuccimer 0.268 1.3177646 0.2033747 8.389477e-01
time_f1 -1.612 0.8005441 -2.0136306 4.473123e-02
time_f4 -2.202 1.0223615 -2.1538370 3.186201e-02
time_f6 -2.626 1.1403270 -2.3028482 2.181034e-02
groupSuccimer:time_f1 -11.406 1.1321403 -10.0747234 2.208258e-21
[1] 1.485959
\[\text{Corr}(t_j, t_k) = \exp\{-\theta|t_j - t_k|\}\]
Respects actual time gaps.
| Caveat | Issue |
|---|---|
| Replicates at same time | \(\text{Corr} \to 1\) (unless nugget) |
| Long lags | \(\text{Corr} \to 0\) |
Value Std.Error t-value p-value
(Intercept) 26.272 0.9566816 27.4615919 2.338299e-93
groupSuccimer 0.268 1.3529521 0.1980854 8.430810e-01
time_f1 -1.612 0.6708782 -2.4028207 1.673318e-02
time_f4 -2.202 1.1128663 -1.9786744 4.855221e-02
time_f6 -2.626 1.2222135 -2.1485608 3.228123e-02
groupSuccimer:time_f1 -11.406 0.9487650 -12.0219438 1.482515e-28
range
1.265128
Problem: Models like corExp assume \(\text{Corr}(Y_{ij}, Y_{ik}) \to 1\) as \(|t_j - t_k| \to 0\).
This implies perfect measurement, which is rarely true.
Extra variance component \(\tau^2\) added to the diagonal:
\[\text{Var}(Y_{ij}) = \sigma^2 + \tau^2\]
Interpretation:
Correlation for measurements at the exact same time:
\[\text{Corr}(Y_{ij}, Y_{ik})|_{t_j=t_k} = \frac{\sigma^2}{\sigma^2 + \tau^2} < 1\]
In nlme: Use correlation = corExp(form = ~ time_num | id, nugget = TRUE).
Allow \(\operatorname{Var}(Y_{ij})\) to differ by time:
\[\operatorname{Var}(Y_{ij}) = \sigma^2 \times \omega_{t(j)}^2, \quad \omega_t = 1 \text{ at base level}\]
library(reshape2); library(dplyr); library(ggplot2)
get_cor_mat <- function(fit){
M <- nlme::corMatrix(fit$modelStruct$corStruct)
if (is.list(M)) M[[1]] else M
}
mats <- list(CS=gls_cs, AR1=gls_ar1, EXP=gls_exp, UN=gls_un) |>
purrr::imap(~melt(get_cor_mat(.x)) |> mutate(model=.y))
bind_rows(mats) |>
ggplot(aes(Var1, Var2, fill=value)) +
geom_tile() + coord_equal() +
scale_fill_gradient2(limits=c(-1,1)) +
facet_wrap(~model, nrow=1) +
labs(title="Implied correlation by Sigma", x="Visit index", y="Visit index", fill="rho") +
theme_minimal()mods <- list(
UN_het = gls_un,
CS = gls_cs,
AR1 = gls_ar1,
AR1het = gls_ar1_het,
EXP = gls_exp,
EXP_het = gls_exp_het,
EXP_nug = gls_exp_nug
)
# Rank by AIC (FLW's recommended non-nested criterion); BIC for reference only.
data.frame(
model = names(mods),
AIC = sapply(mods, AIC),
BIC = sapply(mods, BIC), # reference only; FLW p. 179 advises against for Sigma
logLik = sapply(mods, logLik)
) |>
dplyr::arrange(AIC) model AIC BIC logLik
UN_het UN_het 2452.076 2523.559 -1208.038
AR1het AR1het 2477.640 2529.266 -1225.820
CS CS 2480.621 2520.334 -1230.311
EXP_nug EXP_nug 2482.503 2526.187 -1230.251
AR1 AR1 2492.631 2532.343 -1236.315
EXP_het EXP_het 2513.537 2565.164 -1243.769
EXP EXP 2521.918 2561.631 -1250.959
# Independence (for boundary LRT demo)
gls_ind <- gls(y ~ group * time_f, data = long, method = "REML")
# CS, AR(1), Exponential, UN (hetero)
gls_cs <- gls(y ~ group * time_f, data = long,
correlation = corCompSymm(value = 0.2, form = ~ 1 | id), method = "REML")
gls_ar1 <- gls(y ~ group * time_f, data = long,
correlation = corAR1(value = 0.2, form = ~ time_idx | id), method = "REML")
gls_exp <- gls(y ~ group * time_f, data = long,
correlation = corExp(value = 0.2, form = ~ time_num | id), method = "REML")
gls_un <- gls(y ~ group * time_f, data = long,
correlation = corSymm(value = start_vec_un, form = ~ time_idx | id),
weights = w_time, method = "REML")mods <- list(
IND = gls_ind,
CS = gls_cs,
AR1 = gls_ar1,
AR1het = gls_ar1_het,
EXP = gls_exp,
UN_het = gls_un
)
# AIC ranks non-nested candidates; for nested pairs use the REML-LRT.
# BIC shown for reference only (FLW p. 179 advises against it for Sigma).
data.frame(
model = names(mods),
AIC = sapply(mods, AIC),
BIC = sapply(mods, BIC),
logLik = sapply(mods, logLik)
) |>
dplyr::arrange(AIC)Decide on REML-LRT + AIC, not BIC (FLW p. 179)
For nested structures (e.g. AR(1) within UN, CS within UN) use the REML-LRT; for non-nested structures use AIC. FLW recommend against BIC here: its heavier penalty risks “selecting a model that is too simple or parsimonious” for \(\Sigma\).
model AIC BIC logLik
UN_het UN_het 2452.076 2523.559 -1208.038
AR1het AR1het 2477.640 2529.266 -1225.820
CS CS 2480.621 2520.334 -1230.311
AR1 AR1 2492.631 2532.343 -1236.315
EXP EXP 2521.918 2561.631 -1250.959
IND IND 2644.255 2679.997 -1313.128
| Scenario | Recommendation |
|---|---|
| Few visits, balanced | UN-het if \(N\) is sufficient |
| Equal spacing, moderate decay | AR(1); add varIdent if variances differ |
| Unequal spacing / mistimed | Exponential (consider nugget) or hybrid (RI + Exp) |
Always check: implied correlation heatmaps, residual diagnostics, AIC.
Read CS as a random-intercept model: \(\rho = \sigma_b^2/(\sigma_b^2 + \sigma^2)\). Dropping the random intercept (CS to Independence) tests the variance component \(H_0: \sigma_b^2 = 0\) vs \(H_A: \sigma_b^2 > 0\), which is on the boundary of the parameter space, so this is the genuine boundary test.
(For CS proper, \(\rho = 0\) is interior, not on a boundary: CS allows \(\rho\) down to \(-1/(n-1)\). The boundary issue is the variance component being pinned at 0, equivalent to \(\rho = 0\) only under this CS-as-random-intercept reading.)
| Approach | Issue |
|---|---|
| Naive \(\chi^2\) | Over-penalizes, selects overly simple models |
| Mixture \(\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1\) | Correct for testing one variance at 0 |
| Conservative \(\alpha = 0.10\) | Alternative approach |
L08 develops the variance-component boundary test in full (FLW Section 8.5 / Table C.1). Do not use REML LRTs to compare different mean models.
# REML LRT: IND (reduced) vs CS read as a random intercept (full).
# Testing the random-intercept variance H0: sigma_b^2 = 0 (on the boundary).
LR <- 2 * (as.numeric(logLik(gls_cs)) - as.numeric(logLik(gls_ind)))
df <- 1 # one extra variance component
p_standard <- pchisq(LR, df = df, lower.tail = FALSE) # naive
p_mixture <- 0.5 * pchisq(LR, df = df, lower.tail = FALSE) # correct boundary mixture
data.frame(LR = LR, df = df,
p_naive_chisq = p_standard,
p_mixture_50_50 = p_mixture) LR df p_naive_chisq p_mixture_50_50
1 165.6342 1 6.64936e-38 3.32468e-38
A random intercept model induces Compound Symmetry covariance.
We can fit this using lme() from nlme.
Results for fixed effects (\(\beta\) and SEs) will be nearly identical to gls with corCompSymm.
Value Std.Error t-value p-value
(Intercept) 26.272 0.9370175 28.0378982 1.054323e-95
groupSuccimer 0.268 1.3251428 0.2022424 8.398322e-01
time_f1 -1.612 0.8428574 -1.9125418 5.653521e-02
time_f4 -2.202 0.8428574 -2.6125416 9.332722e-03
time_f6 -2.626 0.8428574 -3.1155923 1.970858e-03
groupSuccimer:time_f1 -11.406 1.1919804 -9.5689495 1.238049e-19
groupSuccimer:time_f4 -8.824 1.1919804 -7.4028065 8.217104e-13
groupSuccimer:time_f6 -3.152 1.1919804 -2.6443389 8.513830e-03
Value Std.Error DF t-value p-value
(Intercept) 26.272 0.9370175 294 28.0378980 4.595088e-85
groupSuccimer 0.268 1.3251428 98 0.2022424 8.401465e-01
time_f1 -1.612 0.8428574 294 -1.9125418 5.677827e-02
time_f4 -2.202 0.8428574 294 -2.6125416 9.449112e-03
time_f6 -2.626 0.8428574 294 -3.1155923 2.017079e-03
groupSuccimer:time_f1 -11.406 1.1919804 294 -9.5689496 4.631638e-19
groupSuccimer:time_f4 -8.824 1.1919804 294 -7.4028065 1.411814e-12
groupSuccimer:time_f6 -3.152 1.1919804 294 -2.6443389 8.624648e-03
| Principle | Rationale |
|---|---|
| Mean and covariance are interdependent | Choose \(\Sigma\) under a maximal mean |
| Use REML for \(\Sigma\) selection | Same fixed effects across candidates |
| AIC comparable only when | Fixed effects identical + same estimation method (BIC not recommended for \(\Sigma\), FLW p. 179) |
| REML LRTs | For nested \(\Sigma\) (UN vs pattern: ordinary \(\chi^2\)). Use the boundary mixture only when a variance/correlation is tested at 0. |
The Problem: Misspecified mean leaves structured patterns in residuals.
The Risk: We might model leftover structure as part of covariance.
| Step | Action |
|---|---|
| 1 | Start with flexible mean (e.g., group * time_as_factor) |
| 2 | Use rich mean to select best \(\Sigma\) using REML |
| 3 | Once \(\Sigma\) chosen, simplify mean using ML for nested comparisons |
Goal: Separate modeling mean from modeling covariance.
| Strength | Description |
|---|---|
| Parsimony | Fewer parameters than UN when \(n_{\text{time}}\) grows |
| Interpretability | Patterns map to scientific intuition |
| Availability | Standard software; easy REML-AIC comparison |
| Handles missing | Works with balanced design + intermittent missingness |
| Weakness | Issue |
|---|---|
| Equal spacing assumption | AR(1), Toeplitz less suitable for irregular times |
| Constant variance | Often wrong; prefer heterogeneous variances |
| UN parameter explosion | \(n(n+1)/2\) parameters; unstable with modest \(n_{\text{time}}\) |
| EXP assumptions | \(\text{Corr}(t,t)=1\); \(\text{Corr} \to 0\) quickly unless nugget added |
| Recommendation | Rationale |
|---|---|
| Use maximal mean | Time as factor with key between-subject covariates |
| Prefer heterogeneous variances | varIdent(~1|time_f) |
| Compare small set | UN-hetero, AR(1), Exponential |
| Choose by REML-AIC | Check nested vs UN via REML-LRT |
Check Your Understanding: Covariance Patterns
What is the key difference between AR(1) and Exponential correlation structures?
In what situation would you add a “nugget” effect to an Exponential model?
Why do we use REML (not ML) when comparing different covariance structures?
Answers:
AR(1) uses visit index (requires equally-spaced measurements): \(\text{Corr}(k) = \rho^k\) where \(k\) is the lag in visits. Exponential uses actual time differences: \(\text{Corr}(t_j, t_k) = \exp(-\theta|t_j - t_k|)\). Use Exponential when visits are unequally spaced or mistimed.
The nugget handles measurement error or micro-scale variability. Without it, \(\text{Corr}(Y_{ij}, Y_{ik}) \to 1\) as \(|t_j - t_k| \to 0\), implying perfect measurement. If you have replicate measurements at the same time point, or believe there is inherent measurement error, add nugget = TRUE.
REML (restricted maximum likelihood) provides less biased estimates of variance components by accounting for the loss of degrees of freedom used to estimate fixed effects. When comparing covariance structures, we hold fixed effects constant and use REML to get the best variance estimates. ML would be used when comparing different fixed-effects specifications.
| Step | Action |
|---|---|
| 1 | Fix mean: saturated group * time_f |
| 2 | Try \(\Sigma\) candidates: UN+hetero, AR(1), EXP |
| 3 | Compare REML-AIC |
| 4 | For a pattern nested in UN: REML-LRT, ordinary \(\chi^2\) (df = difference in covariance parameters) |
| 5 | Refit final model; proceed to inference on \(\beta\) |
| Element | Include |
|---|---|
| Design & missingness | Planned times, spacing, missingness map |
| Mean model | Maximal fixed effects used for \(\Sigma\) selection |
| \(\Sigma\) candidates & fits | REML logLik and AIC (BIC not recommended for \(\Sigma\), FLW p. 179) |
| Nested tests | REML-LRTs (UN vs pattern: ordinary \(\chi^2\), df = parameter difference). The boundary mixture applies only when testing a variance/correlation = 0 (e.g. \(\rho = 0\)). |
| Estimated \(\Sigma\) | Key parameters (\(\rho\), \(\theta\)), varIdent multipliers, heatmap |
| Diagnostics | UN vs pattern-implied correlation (FLW); residual ACF/semivariogram (instructor extension); convergence notes |
| Sensitivity | Alternative \(\Sigma\); final choice justification |
| Mistake | Consequence |
|---|---|
| Comparing \(\Sigma\) with different fixed effects | AIC not comparable |
| Using ML for \(\Sigma\) selection | Use REML |
| Ignoring boundary mixtures in LRTs | Incorrect p-values |
| Assuming equal spacing for AR(1) | Model mismatch |
| Treating varIdent coefficients as variances | They are SD multipliers |
Mistakes That Lead to Wrong Results
| Mistake | Why It Is Wrong | Correct Approach |
|---|---|---|
| Using ML for \(\Sigma\) selection | ML underestimates variance components | Use REML when comparing covariance structures |
| Comparing AIC across different fixed effects | AIC not comparable | Keep fixed effects identical; only vary \(\Sigma\) |
| Ignoring boundary mixtures in LRTs | P-values are too conservative | Use \(\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1\) when testing a variance component at 0 (e.g. \(\sigma_b^2 = 0\)) |
| Using AR(1) with unequal spacing | Model assumes lag = visit index | Use Exponential for continuous time |
| Interpreting varIdent as variances | They are SD multipliers relative to reference | Multiply by \(\sigma_\varepsilon\) to get actual SDs |
Forgetting time_idx must be consecutive integers |
corAR1/corSymm require integer indexing |
Create time_idx = as.integer(time_f) |
| Fitting UN with too few subjects | Unstable/non-PD estimates | Rule of thumb: \(N\)/params \(\gtrsim 5\) |
A complete standalone covariance-pattern fit in SAS (static reference, not run here; R is the course’s runnable language):
/* Covariance-pattern model on the TLC data; static reference, not run here. */
PROC MIXED DATA=tlc METHOD=REML;
CLASS id group time;
MODEL y = group time group*time / SOLUTION CHISQ;
REPEATED time / SUBJECT=id TYPE=UN R RCORR; /* TYPE=CS or TYPE=AR(1) for those patterns */
RUN;
Swap TYPE=UN to TYPE=CS, TYPE=AR(1), or TYPE=SP(EXP)(time) to fit the other covariance patterns; the nlme equivalents are below.
| SAS | R (nlme) |
|---|---|
TYPE=UN |
corSymm(~ time_idx | id) + varIdent(~ 1 | time_f) |
TYPE=CS |
corCompSymm(~ 1 | id) or random intercept |
TYPE=AR(1) |
corAR1(~ time_idx | id) |
TYPE=SP(EXP)(ctime) |
corExp(~ time_num | id) |
BIOS 667 · UNC Gillings - Lecture 7 (Ch.7)