Fitzmaurice, Laird & Ware (2011): Applied Longitudinal Analysis, Chapters 1-3
Author
Your Name Here
Published
June 23, 2026
Instructions
Collaboration Policy: You may discuss problems with classmates, but all submitted work must be your own.
AI Use Policy: AI tools may be used for code assistance. Disclose any AI use in your submission.
Submission: Submit a rendered HTML file to Gradescope by the due date.
Due Date: TBD - Check Canvas
Grading Rubric
Category | Weight | Description
Interpretation & Reasoning | 40% | Conceptual understanding, correct interpretations
Analysis & Setup | 40% | Code correctness, model specification
Clarity & Presentation | 20% | Writing quality, formatting, organization
Setup
Question 1: Longitudinal vs. Cross-Sectional Data (20 points)
The FEV1 dataset (fev1.txt) contains repeated measurements of lung function (FEV1) in children from the Six Cities Study. Each child was measured annually, so the ages at measurement differ both within and across children.
# Load the FEV1 data with download fallback
data_url <- "https://content.sph.harvard.edu/fitzmaur/ala2e/fev1.txt"
data_file <- if (file.exists("../../data/fev1.txt")) "../../data/fev1.txt" else "data/fev1.txt"
if (!file.exists(data_file)) {
  dir.create("data", showWarnings = FALSE)
  download.file(data_url, data_file)
}
fev <- read.table(data_file, header = TRUE)
head(fev)
Part (a) - WRITE: What type of study design is this? (4 pts)
Explain whether this is a longitudinal or cross-sectional study and justify your answer with specific features from the data.
Your answer here
Part (b) - CODE: Calculate summary statistics (4 pts)
Calculate the number of observations per child. What is the range, mean, and median number of observations?
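As a starting point, here is a minimal sketch of counting observations per subject on a small made-up data frame. The data frame `toy` and its column names `id` and `age` are hypothetical; the identifier column in `fev` may be named differently, so check `head(fev)` first.

```r
# Hypothetical toy data: three subjects with unequal numbers of visits
toy <- data.frame(
  id  = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  age = c(7, 8, 9, 6, 7, 8, 9, 10, 11)
)

# table() tabulates rows by id, giving one count per subject
n_per_subject <- as.vector(table(toy$id))

# Summaries of the per-subject counts
range(n_per_subject)
mean(n_per_subject)
median(n_per_subject)
```

The same pattern should carry over to the real data once the identifier column is substituted.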
# Your code here
Part (c) - WRITE: Notation (6 pts)
Using the notation from Chapter 1, define the following for this dataset:
What does \(Y_{ij}\) represent?
What does \(n_i\) represent, and why might it vary?
What time-varying and time-constant covariates are present?
Your answer here
Part (d) - CODE: Create a spaghetti plot (6 pts)
Create a spaghetti plot showing individual trajectories of logfev1 against age. Overlay the mean trajectory with a 95% confidence band.
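One possible approach, sketched in base R graphics on simulated data (the course may expect ggplot2 instead). The data frame `toy` and the column `y` are hypothetical, and the band is a pointwise normal-approximation interval. The toy data are balanced, so a mean at each age is well defined; with ages that vary continuously across children, some binning or smoothing of age would likely be needed first.

```r
set.seed(1)
# Hypothetical balanced toy data: 10 subjects observed at ages 6 to 10
toy <- expand.grid(id = 1:10, age = 6:10)
toy$y <- 0.1 * toy$age + rnorm(10)[toy$id] + rnorm(nrow(toy), sd = 0.05)

# Empty canvas, then one grey line per subject
plot(toy$age, toy$y, type = "n", xlab = "Age", ylab = "Response")
for (d in split(toy, toy$id)) {
  d <- d[order(d$age), ]
  lines(d$age, d$y, col = "grey60")
}

# Pointwise mean and standard error at each age
m    <- tapply(toy$y, toy$age, mean)
se   <- tapply(toy$y, toy$age, function(v) sd(v) / sqrt(length(v)))
ages <- as.numeric(names(m))

# Shaded 95% band (mean +/- 1.96 SE) with the mean trajectory on top
polygon(c(ages, rev(ages)),
        c(m - 1.96 * se, rev(m + 1.96 * se)),
        col = adjustcolor("steelblue", alpha.f = 0.3), border = NA)
lines(ages, m, col = "steelblue", lwd = 2)
```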
# Your code here
Question 2: Sources of Correlation (20 points)
The dental dataset (dental.txt) contains measurements of dental growth (the distance from the pituitary to the pterygomaxillary fissure) in children at ages 8, 10, 12, and 14.
Part (a) - WRITE: Identify sources of correlation (5 pts)
Based on Chapter 2, describe the three sources of correlation in repeated measures data and explain which are likely present in this dental growth study.
Your answer here
Part (b) - CODE: Compute and visualize the correlation matrix (5 pts)
Compute the sample correlation matrix of dental measurements across the four ages. Display it as a heatmap.
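A minimal sketch of the mechanics, assuming the measurements are arranged in wide format with one column per age. The data frame `wide` and its column names `y8` to `y14` are made up for illustration, and the values are simulated, not the dental data.

```r
set.seed(2)
# Hypothetical wide-format toy data: 20 subjects, one column per occasion
b <- rnorm(20, sd = 2)   # subject effect shared across occasions
wide <- data.frame(y8  = 22 + b + rnorm(20),
                   y10 = 23 + b + rnorm(20),
                   y12 = 24 + b + rnorm(20),
                   y14 = 25 + b + rnorm(20))

# 4 x 4 sample correlation matrix across occasions
R <- cor(wide)
round(R, 2)

# Base-graphics heatmap of the correlation matrix
image(1:4, 1:4, R, zlim = c(-1, 1), axes = FALSE,
      xlab = "", ylab = "", col = heat.colors(20))
axis(1, at = 1:4, labels = colnames(R))
axis(2, at = 1:4, labels = rownames(R))
```

If the data are stored in long format, they would need to be reshaped to one column per age before calling `cor()`.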
# Your code here
Part (c) - WRITE: Interpret the correlation structure (5 pts)
Based on the correlation matrix, what can you conclude about:
The overall strength of within-subject correlation
Whether correlation decays with increasing time separation
Which covariance structure might be appropriate (compound symmetry, AR(1), or unstructured)?
Your answer here
Part (d) - CODE: Compare correlations by gender (5 pts)
Compute separate correlation matrices for males and females. Comment on any differences.
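One way to get a correlation matrix per group is to split the measurement columns by the grouping variable, sketched here on simulated data. The names `wide`, `group`, `y8`, and `y10` are hypothetical.

```r
set.seed(3)
# Hypothetical wide toy data with a two-level grouping column
wide <- data.frame(group = rep(c("F", "M"), each = 15),
                   y8    = rnorm(30))
wide$y10 <- wide$y8 + 0.5 * rnorm(30)

# split() yields one data frame per group; cor() runs on each
cors <- lapply(split(wide[, c("y8", "y10")], wide$group), cor)
lapply(cors, round, 2)
```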
# Your code here
Question 3: Data Structures and Notation (20 points)
Part (a) - CODE: Wide vs. Long Format (5 pts)
The dental data was originally in wide format. Demonstrate the conversion from wide to long format and explain why long format is preferred for most longitudinal analyses.
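A minimal sketch of a wide-to-long conversion using base R `reshape()` on made-up values (tidyr's `pivot_longer()` is a common alternative). The data frame `wide` and its column names are hypothetical.

```r
# Hypothetical wide-format toy data: one row per subject
wide <- data.frame(id  = 1:3,
                   y8  = c(21, 20, 23),
                   y10 = c(22, 21, 25),
                   y12 = c(24, 23, 26))

# Stack the three measurement columns into one column, y, indexed by age
long <- reshape(wide,
                direction = "long",
                varying   = c("y8", "y10", "y12"),
                v.names   = "y",
                timevar   = "age",
                times     = c(8, 10, 12),
                idvar     = "id")

# Sort so each subject's rows are contiguous and in time order
long <- long[order(long$id, long$age), ]
rownames(long) <- NULL
long
```

The result has one row per subject-occasion pair, which is the layout that functions such as `lm()` and `gls()` expect.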
# Your code here
Your answer here
Part (b) - WRITE: Design matrix construction (5 pts)
For a simple linear model where dental distance depends on age and gender, write out the design matrix \(\mathbf{X}_i\) for a single subject with 4 measurements.
Your answer here
Part (c) - CODE: Create design matrix in R (5 pts)
Write R code to construct the design matrix for subject 1 in the dental data.
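The mechanics can be sketched with `model.matrix()` on a toy data frame. The names `toy`, `id`, `age`, and `sex` are hypothetical and may not match the columns in the dental file. Building the matrix on the full data and then subsetting rows keeps the dummy coding for the factor intact, which would fail if a single subject with only one factor level were passed in alone.

```r
# Hypothetical long-format toy data: 2 subjects x 4 occasions
toy <- data.frame(id  = rep(1:2, each = 4),
                  age = rep(c(8, 10, 12, 14), times = 2),
                  sex = factor(rep(c("F", "M"), each = 4)))

# model.matrix() expands the formula into intercept, age, and a sex dummy
X <- model.matrix(~ age + sex, data = toy)

# Rows belonging to subject 1
X1 <- X[toy$id == 1, , drop = FALSE]
X1
```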
# Your code here
Part (d) - WRITE: Response vector and covariance (5 pts)
Write out the response vector \(\mathbf{Y}_i\) and explain what the covariance matrix \(\Sigma_i = \text{Cov}(\mathbf{Y}_i)\) represents. What is the dimension of \(\Sigma_i\) for a subject with 4 measurements?
Your answer here
Question 4: Linear Models and OLS Assumptions (20 points)
Part (a) - CODE: Fit naive OLS model (5 pts)
Fit an ordinary least squares (OLS) regression model to the dental data, regressing distance on age and gender. Ignore the repeated measures structure.
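A minimal sketch of the naive fit on simulated long-format data. The data frame `toy`, its columns, and the simulation parameters are all made up for illustration.

```r
set.seed(4)
# Hypothetical toy data: 20 subjects x 4 occasions with a shared subject effect
toy <- expand.grid(age = c(8, 10, 12, 14), id = 1:20)
toy$sex <- factor(ifelse(toy$id <= 10, "F", "M"))
toy$y   <- 17 + 0.6 * toy$age + 2 * (toy$sex == "M") +
           rnorm(20, sd = 2)[toy$id] + rnorm(nrow(toy))

# lm() treats all 80 rows as independent, which is what makes this fit naive
fit_ols <- lm(y ~ age + sex, data = toy)
summary(fit_ols)$coefficients
```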
# Your code here
Part (b) - WRITE: OLS assumptions (5 pts)
List the four key assumptions of OLS and explain which assumption is violated when applying OLS to longitudinal data.
Your answer here
Part (c) - WRITE: Consequences of ignoring correlation (5 pts)
Explain the consequences of ignoring within-subject correlation when fitting OLS. What happens to:
Coefficient estimates?
Standard error estimates?
Hypothesis tests and confidence intervals?
Your answer here
Part (d) - CODE: Compare OLS with model accounting for correlation (5 pts)
Fit a model that accounts for the repeated measures structure using gls() from the nlme package with compound symmetry correlation. Compare the standard errors to the naive OLS model.
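A sketch of the comparison on simulated data, assuming long format with rows grouped by subject. The data frame `toy` and its columns are hypothetical; `gls()` and `corCompSymm()` come from nlme.

```r
library(nlme)

set.seed(5)
# Hypothetical toy data: 20 subjects x 4 occasions with a shared subject effect
toy <- expand.grid(age = c(8, 10, 12, 14), id = 1:20)
toy$sex <- factor(ifelse(toy$id <= 10, "F", "M"))
toy$y   <- 17 + 0.6 * toy$age + 2 * (toy$sex == "M") +
           rnorm(20, sd = 2)[toy$id] + rnorm(nrow(toy))

# Naive fit: all rows treated as independent
fit_ols <- lm(y ~ age + sex, data = toy)

# Same mean model, with a common correlation among a subject's measurements
fit_cs <- gls(y ~ age + sex, data = toy,
              correlation = corCompSymm(form = ~ 1 | id))

# Standard errors side by side
se_tab <- cbind(OLS    = summary(fit_ols)$coefficients[, "Std. Error"],
                GLS_CS = summary(fit_cs)$tTable[, "Std.Error"])
se_tab
```

Comparing the two columns row by row shows how the standard errors for within-subject and between-subject covariates respond to modeling the correlation.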
# Your code here
Question 5: Model Comparison and Visualization (20 points)
Part (a) - CODE: Create comprehensive EDA (6 pts)
Create a multi-panel figure that shows:
Individual trajectories by gender (spaghetti plot)
Mean trajectories with 95% CI by gender
Boxplots at each age by gender
# Your code here
Part (b) - WRITE: Interpret visualizations (4 pts)
Based on your visualizations, describe:
The overall pattern of dental growth
Differences between genders
Variability within and between subjects
Your answer here
Part (c) - CODE: Fit and compare models (6 pts)
Fit three models to the dental data:
Model with only age effect
Model with age and gender (additive)
Model with age × gender interaction
Compare using AIC and interpret which model is preferred.
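One way this could look, sketched on simulated data with `gls()` and compound symmetry. Whether the assignment intends `lm()` or `gls()` fits here is an assumption, and `toy` and its columns are hypothetical. The models are fit with `method = "ML"` because REML likelihoods are not comparable across models with different fixed effects.

```r
library(nlme)

set.seed(6)
# Hypothetical toy data: 20 subjects x 4 occasions with a shared subject effect
toy <- expand.grid(age = c(8, 10, 12, 14), id = 1:20)
toy$sex <- factor(ifelse(toy$id <= 10, "F", "M"))
toy$y   <- 17 + 0.6 * toy$age + 2 * (toy$sex == "M") +
           rnorm(20, sd = 2)[toy$id] + rnorm(nrow(toy))

# Common correlation structure for all three fits
cs <- corCompSymm(form = ~ 1 | id)

m1 <- gls(y ~ age,       data = toy, correlation = cs, method = "ML")
m2 <- gls(y ~ age + sex, data = toy, correlation = cs, method = "ML")
m3 <- gls(y ~ age * sex, data = toy, correlation = cs, method = "ML")

# Lower AIC indicates a better trade-off between fit and complexity
aic_tab <- AIC(m1, m2, m3)
aic_tab
```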
# Your code here
Part (d) - WRITE: Synthesize findings (4 pts)
Write a brief summary (3-4 sentences) that a clinician could understand, describing what you learned about dental growth patterns from this analysis.
Your answer here
Peer Review Section
After submission, you will be assigned peer reviews. Use this rubric: