This is the older approach that uses the glm function to maximize with respect to \(\boldsymbol{\beta}\) in the M-step, as well use a closed form estimator for \(\sigma^2_{gamma}\) given the samples from the E-step. We repeat each observation \(\boldsymbol{y_{i}}\) and corresponding predictor in the design matrix \(M\) times, filling in \(\boldsymbol{X}_k\) for each repeated row for \(\gamma_i\) in the linear predictor above. Each repeated row is weighted with \(1/M\). We can also show that the solution for \(\sigma^2_{\gamma}\) is simply the sum of the squared \(X_k\)’s divided by \(nM\).
First, let’s set up the rejection sampler:
## function for Rejection Sampling in the MCEM E-step for subject i
# yi is a 5 x 1 vector containing the monthly observations for sample i
# xi: a 5 x 2 matrix pertaining to the intercept and longitudinal month value for each subject (it is the same for each subject, 1:5)
# M is the total number of iid samples desired from the posterior
# maxit is the max number of iterations (to prevent runaway looping)
# betat is the estimate for beta at EM iteration t
# s2gammat is the estimate for s2gamma at EM iteration t
## Returns a list object with accepted samples and the acceptance rate
rejection.sample.gamma.posterior.i = function(yi, xi = cbind(rep(1,5), 1:5), M, maxit, betat, s2gammat, trace = 0) {
## envelope function setup, done once
# obtain gamma_hat to create the envelope function, optimizing Likelihood wrt gammai
# can use any optimization routine, here using IRLS holding XB fixed as an offset
# log(lambda_i) = X%*%beta_t + gamma_i --> fix X and beta_t, treat gamma_i as coefficient
gamma_hat = glm(yi ~ 1, family = poisson(),
offset = xi %*% beta)$coef
# calculate lambda values for rejection ratio denominator
lambda_star = exp(xi %*% beta + gamma_hat)
# calculate rejection ratio denominator
Le = prod(dpois(yi, lambda = lambda_star))
## specify vector to hold samples
posterior_gammai = rep(0, M)
## start rejection sampler
m = index = 1
while (m <= M & index < maxit) {
# sample from normal prior (proposal density here)
gammai = rnorm(1, 0, sqrt(s2gammat))
# calculate lambda values for rejection ratio numerator
lambdai = exp(xi %*% beta + gammai)
# calculate ratio numerator
Lq = prod(dpois(yi, lambda = lambdai))
# calculate ratio
r = Lq / Le
# sample Ui from standard uniform
U = runif(1)
# if step 3 met, save sample
if (U < r) {
posterior_gammai[m] = gammai
# update num current acceptances
m = m + 1
}
# keep track of total number of samples drawn
index = index + 1
}
## acceptance rate
if (trace > 0)
print(M / index)
## return accepted samples and accepted rate in a list
return(list(gammai = posterior_gammai, ar = M / index))
}
## wrapper function for Rejection Sampling over all observations
# data is the main data.table object
# data_aug is the augmented data.table object
# other arguments same as above
## Returns data_aug with updated rejection samples column
rejection.sample.gamma.posterior.all = function(data,
data_aug,
M,
maxit,
betat,
s2gammat,
trace = 0) {
## get n, total number of subjects
n = length(unique(data[,subject]))
## looping over n subjects
for (i in 1:n) {
if(trace > 0) print(i)
# draw M samples from the posterior for gamma_i
rejection.samples.i =
rejection.sample.gamma.posterior.i(
yi = data[subject == i, words],
M = M,
maxit = maxit,
betat = betat,
s2gammat = s2gammat,
trace = trace
)$gammai
# since months in each subject share same gamma_i, we repeat
# the draw for each month in a subject
rejection.samples.i = rejection.samples.i[rep(1:M, each = 5)]
# fill in last col of augmented matrix with drawn samples for subj i
data_aug[subject == i, rejection.samples := rejection.samples.i]
if(trace > 0) print("saved data")
}
## return data_aug
return(data_aug)
}
To simplify the implementation of the EM algorithm using only off the shelf methods for maximization in the M-step, we augment the data matrices used in this example by repeating the data pertaining to each subject \(M\) times in the original data matrix. This will also facilitate vectorized evaluations of the Q-function in the E-step. We introduce a function below to help with this (we will circle back to how this works in a bit)
augment = function(data, M){
# convert data to data.table
if(!is.data.table(data)) data = data.table(data)
# repeat each row M times
data_aug = data[rep(1:nrow(data), M),]
# append a column indicating the index of M for each draw in each obs
data_aug$M = rep(1:M, each = nrow(data))
# resort by subject, M, then month. This repeats the 5 row
# block of data for each subject M times
data_aug = data_aug[order(subject, M, month),]
# add another column to hold the M poterior draws per i later
# initialize column to 0, will update later with actual draws
data_aug$rejection.samples = rep(0, nrow(data_aug))
# create repeated design matrix corresponding to data_aug
X_aug = model.matrix(~ 1 + month, data = data_aug)
## return values
return(list(data_aug, X_aug))
}
Now that we have this helper function, lets read in the data and set up everything for the EM algorithm
## read in the data, convert to data.table
library(data.table)
alz = read.table("alzheimers.dat", header = T)
alz = data.table(alz)
## set initial parameters
# number of subjects
n = length(unique(alz$subject))
# relative convergence threshold
tol = 10^-5
# max number of iterations
maxit = 100
# initialize iteration counter, epsilon, qfunction value to avoid triggering while loop
iter = 0
eps = Inf
qfunction = -10000 # using Qfunction for convergence
## starting values for beta/s2gamma
# fixed effects, same as before
beta = c(1.804, 0.165)
# random intercept variance, same as before
s2gamma = 0.000225 + .01
## set M = 10000 draws from posterior per observation in E-step
M = 10000
## Repeat the data pertaining to y_i, X_i M times
# This allows for easy vectorization of the sum shown in the Q-function approximation over n and M
# run the function to do the repeating from above
aug = augment(data = alz, M = M)
# extract alz matrix with data pertaining to i'th obs repeated M times
alz_M_aug = aug[[1]]
# extract predictor matrix with rows repeated in same manner as alz_M_aug
X_m_aug = aug[[2]]
# Check Dimensions of original vs augmented matrix for M step
dim(alz) # originally (22*5) rows
## [1] 110 3
dim(alz_M_aug) # now (22*5*M) rows
## [1] 1100000 5
# peak at each to see the difference
head(alz, 10)
## subject month words
## 1: 1 1 9
## 2: 1 2 12
## 3: 1 3 16
## 4: 1 4 17
## 5: 1 5 18
## 6: 2 1 6
## 7: 2 2 7
## 8: 2 3 10
## 9: 2 4 15
## 10: 2 5 16
head(alz_M_aug, 10)
## subject month words M rejection.samples
## 1: 1 1 9 1 0
## 2: 1 2 12 1 0
## 3: 1 3 16 1 0
## 4: 1 4 17 1 0
## 5: 1 5 18 1 0
## 6: 1 1 9 2 0
## 7: 1 2 12 2 0
## 8: 1 3 16 2 0
## 9: 1 4 17 2 0
## 10: 1 5 18 2 0
Let’s test out the Rejection Sampler on just the first sample, given the starting values:
# run RS
test_sample = rejection.sample.gamma.posterior.i(
yi = alz[subject == 1, words],
M = M,
maxit = maxit*M,
betat = beta,
s2gammat = s2gamma
)
# acceptance rate
print(test_sample$ar)
## [1] 0.07717598
# mean
print(mean(test_sample$gammai))
## [1] 0.1370397
# variance
print(var(test_sample$gammai))
## [1] 0.006292223
#histogram
hist(test_sample$gammai)
Now that thats out of the way, lets apply the MCEM algorithm.
start = Sys.time()
while(eps > tol & iter < maxit){
## save old qfunction
qfunction0 = qfunction
## Begin E-step
# update rejection.samples column with the new draws
alz_M_aug = rejection.sample.gamma.posterior.all(
data = alz,
data_aug = alz_M_aug,
M = M,
maxit = maxit * M,
betat = beta,
s2gammat = s2gamma
)
## evaluate qfunction given drawn samples, current param estimates
#lambda
lambda_M_aug = exp(X_m_aug %*% beta + alz_M_aug[, rejection.samples])
# qfunction calculation (vectorized)
qfunction =
# sum over first part of Q
sum(dpois(alz_M_aug[, words], lambda = lambda_M_aug, log = T)) +
# sum over second part of Q, no need to use repeated gamma over months here
sum(dnorm(alz_M_aug[month == 1, rejection.samples],
mean = 0,
sd = sqrt(s2gamma),
log = T))
# take average over M
qfunction = (qfunction) / M
## End E-step
## Calculate relative change in qfunction from prior iteration
eps = abs(qfunction - qfunction0) / abs(qfunction0)
## Start M-step
# s2gamma, derived similar as the MLE for sigma^2 from normal with mean 0, averaged over M
# closed form derived from Q function approximation given earlier
s2gamma = sum(alz_M_aug[month == 1, rejection.samples] ^ 2) / n
s2gamma = s2gamma / M
# beta, poisson glm with gammai's filled in with M posterior draws/subject
# averaging over M draws, so weight = 1/M for each repeated row with draw filled in
# posterior draws for gamma_i area are filled in as offsets in the glm
fit = glm(
alz_M_aug$words ~ X_m_aug - 1,
family = poisson(),
weights = rep(1 / M, nrow(X_m_aug)),
offset = alz_M_aug$rejection.samples,
# use starting value from previous step
start = beta
)
beta = as.vector(fit$coefficients)
## update iterator
iter = iter + 1
if(iter == maxit) warning("Iteration limit reached without convergence")
## print out info to keep track
cat(sprintf("Iter: %d Qf: %.3f s2gamma: %f beta0: %.3f beta0:%.3f eps:%f\n",iter, qfunction,s2gamma, beta[1],beta[2], eps))
}
## Iter: 1 Qf: -326.658 s2gamma: 0.013371 beta0: 1.873 beta0:0.166 eps:0.967334
## Iter: 2 Qf: -318.349 s2gamma: 0.019284 beta0: 1.862 beta0:0.166 eps:0.025438
## Iter: 3 Qf: -315.607 s2gamma: 0.029715 beta0: 1.849 beta0:0.166 eps:0.008611
## Iter: 4 Qf: -307.896 s2gamma: 0.049816 beta0: 1.831 beta0:0.166 eps:0.024433
## Iter: 5 Qf: -296.279 s2gamma: 0.082097 beta0: 1.802 beta0:0.166 eps:0.037732
## Iter: 6 Qf: -287.397 s2gamma: 0.120700 beta0: 1.781 beta0:0.166 eps:0.029978
## Iter: 7 Qf: -270.975 s2gamma: 0.181967 beta0: 1.785 beta0:0.166 eps:0.057142
## Iter: 8 Qf: -268.633 s2gamma: 0.210257 beta0: 1.789 beta0:0.166 eps:0.008642
## Iter: 9 Qf: -268.621 s2gamma: 0.219047 beta0: 1.791 beta0:0.166 eps:0.000043
## Iter: 10 Qf: -268.677 s2gamma: 0.219586 beta0: 1.793 beta0:0.166 eps:0.000208
## Iter: 11 Qf: -268.640 s2gamma: 0.219978 beta0: 1.796 beta0:0.166 eps:0.000138
## Iter: 12 Qf: -268.642 s2gamma: 0.220761 beta0: 1.798 beta0:0.166 eps:0.000007
end = Sys.time()
print(end - start)
## Time difference of 5.851663 mins
This is pretty close to what we get from glmer:
library(lme4)
## Loading required package: Matrix
fit = glmer(words ~ month + (1|subject), family = poisson(), data = alz)
print(summary(fit))
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: words ~ month + (1 | subject)
## Data: alz
##
## AIC BIC logLik deviance df.resid
## 569.9 578.0 -281.9 563.9 107
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.68504 -0.26032 -0.04494 0.24645 1.24383
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 0.2181 0.467
## Number of obs: 110, groups: subject, 22
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.82051 0.12418 14.660 < 2e-16 ***
## month 0.16570 0.02019 8.205 2.3e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## month -0.541
In addition, the posterior modes of the random effects are also similar:
# glmer
print(ranef(fit)$subject)
## (Intercept)
## 1 0.30285930
## 2 0.03193711
## 3 0.48564826
## 4 0.13002683
## 5 -0.19553936
## 6 0.19078864
## 7 0.03193711
## 8 0.47439755
## 9 -0.65788352
## 10 0.13002683
## 11 0.16083703
## 12 -0.98344289
## 13 -0.26059237
## 14 -0.94262334
## 15 0.61203271
## 16 0.42815552
## 17 0.35476607
## 18 -0.28315043
## 19 0.23420353
## 20 -0.56662270
## 21 0.20545664
## 22 0.30285930
# RS
print(alz_M_aug[,mean(rejection.samples), by = subject])
## subject V1
## 1: 1 0.32209191
## 2: 2 0.04845071
## 3: 3 0.50257276
## 4: 4 0.14725844
## 5: 5 -0.18371560
## 6: 6 0.20466239
## 7: 7 0.04569174
## 8: 8 0.49175159
## 9: 9 -0.65134453
## 10: 10 0.14488029
## 11: 11 0.17755126
## 12: 12 -0.98661251
## 13: 13 -0.24951172
## 14: 14 -0.94208592
## 15: 15 0.63229054
## 16: 16 0.44652189
## 17: 17 0.37352343
## 18: 18 -0.27201795
## 19: 19 0.25029945
## 20: 20 -0.55618211
## 21: 21 0.22006466
## 22: 22 0.31991799
## subject V1
We can also take a look at the final augmented matrix to see what it looks like
print(head(alz_M_aug, 10))
## subject month words M rejection.samples
## 1: 1 1 9 1 0.4079068
## 2: 1 2 12 1 0.4079068
## 3: 1 3 16 1 0.4079068
## 4: 1 4 17 1 0.4079068
## 5: 1 5 18 1 0.4079068
## 6: 1 1 9 2 0.3106106
## 7: 1 2 12 2 0.3106106
## 8: 1 3 16 2 0.3106106
## 9: 1 4 17 2 0.3106106
## 10: 1 5 18 2 0.3106106