\(\def \A \ensuremath{\mathbf{A}}\) \(\def \b \ensuremath{\mathbf{b}}\) \(\def \tA \ensuremath{\mathbf{A$^{T}$}}\) \(\def \d \ensuremath{\mathbf{d}}\) \(\def \e \ensuremath{\mathbf{e}}\) \(\def \g \ensuremath{\mathbf{g}}\) \(\def \I \ensuremath{\mathbf{I}}\) \(\def \l \ensuremath{\mathbf{l}}\) \(\def \M \ensuremath{\mathbf{M}}\) \(\def \W \ensuremath{\mathbf{W}}\) \(\def \y \ensuremath{\mathbf{y}}\) \(\def \Y \ensuremath{\mathbf{Y}}\) \(\def \X \ensuremath{\mathbf{X}}\) \(\def \x \ensuremath{\mathbf{x}}\) \(\def \tx \ensuremath{\mathbf{x$^{T}$}}\) \(\def \z \ensuremath{\mathbf{z}}\) \(\def \betab \ensuremath{\boldsymbol{\beta}}\) \(\def \Omegab \ensuremath{\boldsymbol{\Omega}}\) \(\def \pib \ensuremath{\boldsymbol{\pi}}\) \(\def \thetab \ensuremath{\boldsymbol{\theta}}\) \(\def \epsilonb \ensuremath{\boldsymbol{\epsilon}}\)

This is the older approach that uses the glm function to maximize with respect to \(\boldsymbol{\beta}\) in the M-step, as well as a closed-form estimator for \(\sigma^2_{\gamma}\) given the samples from the E-step. We repeat each observation \(\boldsymbol{y_{i}}\) and its corresponding rows of the design matrix \(M\) times, filling in the \(k\)-th posterior draw for \(\gamma_i\) in the linear predictor above for the \(k\)-th repeated block. Each repeated row is weighted with \(1/M\). We can also show that the solution for \(\sigma^2_{\gamma}\) is simply the sum of the squared posterior draws divided by \(nM\).
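More concretely, here is a sketch of the Monte Carlo approximation to the Q-function that the code below evaluates, writing \(\gamma_{ik}\) for the \(k\)-th E-step draw for subject \(i\) (notation introduced here just for this sketch):

\[
Q\left(\boldsymbol{\beta}, \sigma^2_{\gamma}\right) \approx \frac{1}{M}\sum_{i=1}^{n}\sum_{k=1}^{M}\left[\sum_{j=1}^{5} \log f\!\left(y_{ij} \mid \lambda_{ijk}\right) + \log \phi\!\left(\gamma_{ik}; 0, \sigma^2_{\gamma}\right)\right], \qquad \log \lambda_{ijk} = \mathbf{x}_{ij}^{T}\boldsymbol{\beta} + \gamma_{ik}.
\]

Setting the derivative with respect to \(\sigma^2_{\gamma}\) to zero gives the closed-form update used in the M-step below:

\[
\hat{\sigma}^2_{\gamma} = \frac{1}{nM}\sum_{i=1}^{n}\sum_{k=1}^{M}\gamma_{ik}^{2}.
\]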

1 Implementing the E-step Rejection Sampler

First, let’s set up the rejection sampler:

## function for Rejection Sampling in the MCEM E-step for subject i
  # yi is a 5 x 1 vector containing the monthly observations for sample i
  # xi: a 5 x 2 matrix pertaining to the intercept and longitudinal month value for each subject (it is the same for each subject, 1:5)
  # M is the total number of iid samples desired from the posterior
  # maxit is the max number of iterations (to prevent runaway looping)
  # betat is  the estimate for beta at EM iteration t
  # s2gammat is the estimate for s2gamma at EM iteration t
## Returns a list object with accepted samples and the acceptance rate

rejection.sample.gamma.posterior.i = function(yi, xi = cbind(rep(1,5), 1:5), M, maxit, betat, s2gammat, trace = 0) {
  
  ## envelope function setup, done once
    # obtain gamma_hat to create the envelope function, optimizing Likelihood wrt gammai
    # can use any optimization routine, here using IRLS holding XB fixed as an offset
    # log(lambda_i) = X%*%beta_t + gamma_i --> fix X and beta_t, treat gamma_i as coefficient
    gamma_hat = glm(yi ~ 1, family = poisson(),
                    offset = xi %*% betat)$coef
    
    # calculate lambda values for rejection ratio denominator
    lambda_star = exp(xi %*% betat + gamma_hat)
    
    # calculate rejection ratio denominator
    Le = prod(dpois(yi, lambda = lambda_star))
    
  ## specify vector to hold samples
    posterior_gammai = rep(0, M)
  
  ## start rejection sampler
    m = index = 1
    while (m <= M & index < maxit) {
      
      # sample from normal prior (proposal density here)
      gammai = rnorm(1, 0, sqrt(s2gammat))
      
      # calculate lambda values for rejection ratio numerator
      lambdai = exp(xi %*% betat + gammai)
      
      # calculate ratio numerator
      Lq = prod(dpois(yi, lambda = lambdai))
      
      # calculate ratio
      r = Lq / Le
      
      # sample Ui from standard uniform
      U = runif(1)
      
      # if acceptance criterion met (U < r), save sample
      if (U < r) {
        posterior_gammai[m] = gammai
        
        # update num current acceptances
        m = m + 1
      }
      
      # keep track of total number of samples drawn
      index = index + 1
    }
    
  ## acceptance rate
    if (trace > 0)
      print(M / index)
  
  ## return accepted samples and acceptance rate in a list
    return(list(gammai = posterior_gammai, ar = M / index))
}
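As a brief aside on why the ratio Lq / Le above is a valid rejection-sampling ratio (a sketch, using the quantities computed in the function): the target is the posterior \(f(\gamma_i \mid \boldsymbol{y_i}) \propto L(\gamma_i)\,\phi(\gamma_i; 0, \sigma^2_{\gamma})\), where \(L(\gamma_i)\) is the Poisson likelihood of \(\boldsymbol{y_i}\) with \(\log \boldsymbol{\lambda}_i = \boldsymbol{X}\boldsymbol{\beta}^{(t)} + \gamma_i\). With the normal prior as the proposal density and the constant envelope factor \(L(\hat{\gamma}_i)\), where \(\hat{\gamma}_i\) maximizes \(L\), the acceptance ratio is

\[
r = \frac{L(\gamma_i)\,\phi(\gamma_i; 0, \sigma^2_{\gamma})}{L(\hat{\gamma}_i)\,\phi(\gamma_i; 0, \sigma^2_{\gamma})} = \frac{L(\gamma_i)}{L(\hat{\gamma}_i)} \le 1,
\]

which is exactly the ratio Lq / Le computed in the sampler.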


## wrapper function for Rejection Sampling over all observations
  # data is the main data.table object
  # data_aug is the augmented data.table object
  # other arguments same as above
## Returns data_aug with updated rejection samples column

rejection.sample.gamma.posterior.all = function(data,
                                                data_aug,
                                                M,
                                                maxit,
                                                betat,
                                                s2gammat,
                                                trace = 0) {
  ## get n, total number of subjects
  n = length(unique(data[,subject]))
 
  ## looping over n subjects
  for (i in 1:n) {
    
    if(trace > 0) print(i) 
    
    # draw M samples from the posterior for gamma_i
    rejection.samples.i =
      rejection.sample.gamma.posterior.i(
        yi = data[subject == i, words],
        M = M,
        maxit = maxit,
        betat = betat,
        s2gammat = s2gammat,
        trace = trace
      )$gammai
    
    # since months in each subject share same gamma_i, we repeat
    # the draw for each month in a subject
    rejection.samples.i = rejection.samples.i[rep(1:M, each = 5)]
    
    # fill in last col of augmented matrix with drawn samples for subj i
    data_aug[subject == i, rejection.samples := rejection.samples.i]
    
    if(trace > 0) print("saved data") 
    
  }
  
  ## return data_aug
  return(data_aug)
}

2 Other Helper Functions

To simplify the implementation of the EM algorithm using only off-the-shelf methods for maximization in the M-step, we augment the data matrices used in this example by repeating the data pertaining to each subject \(M\) times in the original data matrix. This will also facilitate vectorized evaluation of the Q-function in the E-step. We introduce a helper function below to do this (we will circle back to how it works in a bit).

augment = function(data, M){
  
  # convert data to data.table
  if(!is.data.table(data)) data = data.table(data)
  
  # repeat each row M times
  data_aug = data[rep(1:nrow(data),  M),]
  
  # append a column indicating the index of M for each draw in each obs
  data_aug$M = rep(1:M, each = nrow(data))
  
  # resort by subject, M, then month. This repeats the 5 row
  #  block of data for each subject M times
  data_aug = data_aug[order(subject, M, month),]
  
  # add another column to hold the M posterior draws per i later
  # initialize column to 0, will update later with actual draws
  data_aug$rejection.samples = rep(0, nrow(data_aug))
  
  # create repeated design matrix corresponding to data_aug
  X_aug = model.matrix(~ 1 + month, data = data_aug)
  
## return values
  return(list(data_aug, X_aug))
}
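As a quick illustration of how augment reshapes the data, here is a hypothetical toy data set with two subjects and two months each (the values are made up purely for illustration; the real data are read in below):

library(data.table)

# toy data: 2 subjects, 2 months each
toy = data.table(subject = rep(1:2, each = 2),
                 month = rep(1:2, times = 2),
                 words = c(5, 7, 3, 4))

# augment with M = 3: each subject's 2-row block is repeated 3 times,
# tagged by the M column, with a zero-initialized rejection.samples column
toy_aug = augment(toy, M = 3)

# augmented data.table (12 rows) and matching design matrix (12 x 2)
toy_aug[[1]]
dim(toy_aug[[2]])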

3 Setup for the MCEM algorithm

Now that we have this helper function, let's read in the data and set up everything for the EM algorithm:

## read in the data, convert to data.table
  library(data.table)
  alz = read.table("alzheimers.dat", header = T)
  alz = data.table(alz)

    
## set initial parameters
  
  # number of subjects
  n = length(unique(alz$subject))
  
  # relative convergence threshold
  tol = 10^-5
  
  # max number of iterations
  maxit = 100
  
  # initialize iteration counter, epsilon, and qfunction value so the loop starts and convergence is not triggered prematurely
  iter = 0
  eps = Inf
  qfunction = -10000 # using Qfunction for convergence

    
## starting values for beta/s2gamma
  
  # fixed effects, same as before
  beta = c(1.804, 0.165)
  
  # random intercept variance, same as before
  s2gamma =  0.000225 + .01 

  
## set M = 10000 draws from posterior per observation in E-step
  M = 10000

  
## Repeat the data pertaining to y_i, X_i M times 
  # This allows for easy vectorization of the sum shown in the Q-function approximation over n and M
  
  # run the function to do the repeating from above
  aug = augment(data = alz, M = M)
  
  # extract alz matrix with data pertaining to i'th obs repeated M times 
  alz_M_aug = aug[[1]]
  
  # extract predictor matrix with rows repeated in same manner as alz_M_aug
  X_m_aug = aug[[2]]

  # Check Dimensions of original vs augmented matrix for M step
  dim(alz)     # originally (22*5) rows
## [1] 110   3
  dim(alz_M_aug) # now (22*5*M) rows
## [1] 1100000       5
  # peek at each to see the difference
  head(alz, 10)
##     subject month words
##  1:       1     1     9
##  2:       1     2    12
##  3:       1     3    16
##  4:       1     4    17
##  5:       1     5    18
##  6:       2     1     6
##  7:       2     2     7
##  8:       2     3    10
##  9:       2     4    15
## 10:       2     5    16
  head(alz_M_aug, 10)
##     subject month words M rejection.samples
##  1:       1     1     9 1                 0
##  2:       1     2    12 1                 0
##  3:       1     3    16 1                 0
##  4:       1     4    17 1                 0
##  5:       1     5    18 1                 0
##  6:       1     1     9 2                 0
##  7:       1     2    12 2                 0
##  8:       1     3    16 2                 0
##  9:       1     4    17 2                 0
## 10:       1     5    18 2                 0

4 Testing the Rejection Sampler

Let’s test out the Rejection Sampler on just the first sample, given the starting values:

# run RS
test_sample = rejection.sample.gamma.posterior.i(
                yi = alz[subject == 1, words], 
                M = M, 
                maxit = maxit*M, 
                betat = beta, 
                s2gammat = s2gamma 
        )

# acceptance rate
print(test_sample$ar)
## [1] 0.07717598
# mean
print(mean(test_sample$gammai))
## [1] 0.1370397
# variance
print(var(test_sample$gammai))
## [1] 0.006292223
#histogram
hist(test_sample$gammai)
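As an optional sanity check that is not part of the original workflow, we can approximate the posterior mean of \(\gamma_1\) by direct numerical integration at the same starting values and compare it to the rejection-sampler mean printed above. This is a minimal sketch assuming the beta and s2gamma starting values defined earlier; post_unnorm is a helper name introduced here for illustration.

# unnormalized posterior for subject 1: Poisson likelihood times normal prior
yi = alz[subject == 1, words]
xi = cbind(1, 1:5)
post_unnorm = function(g) {
  sapply(g, function(gg) prod(dpois(yi, lambda = exp(xi %*% beta + gg))) *
                         dnorm(gg, mean = 0, sd = sqrt(s2gamma)))
}

# normalize over a range that comfortably covers the posterior mass,
# then compute the posterior mean; it should be close to mean(test_sample$gammai)
const = integrate(post_unnorm, lower = -1, upper = 1)$value
integrate(function(g) g * post_unnorm(g) / const, lower = -1, upper = 1)$value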

5 Now run the MCEM algorithm

Now that that's out of the way, let's apply the MCEM algorithm.

  start = Sys.time()
  while(eps > tol & iter < maxit){
  
  ## save old qfunction
    qfunction0 = qfunction
  
  ## Begin E-step

    # update rejection.samples column with the new draws
    alz_M_aug = rejection.sample.gamma.posterior.all(
      data = alz,
      data_aug = alz_M_aug,
      M = M,
      maxit = maxit * M,
      betat = beta,
      s2gammat = s2gamma
    )
    
  ## evaluate  qfunction given drawn samples, current param estimates
    #lambda
      lambda_M_aug = exp(X_m_aug %*% beta + alz_M_aug[, rejection.samples])
      
    # qfunction calculation (vectorized)
      qfunction = 
        # sum over first part of Q
        sum(dpois(alz_M_aug[, words], lambda = lambda_M_aug, log = T)) + 
        # sum over second part of Q: the prior term needs each draw only once, so use month == 1 rows
        sum(dnorm(alz_M_aug[month == 1, rejection.samples], 
              mean = 0,
              sd = sqrt(s2gamma),
              log = T))
      
    # take average over M
      qfunction = (qfunction) / M 
    
  ## End E-step
    
  ## Calculate relative change in qfunction from prior iteration
    eps  = abs(qfunction - qfunction0) / abs(qfunction0)
    
  ## Start M-step
    
    # s2gamma, derived similarly to the MLE for sigma^2 from a mean-zero normal, averaged over M
    # closed form derived from the Q-function approximation given earlier
    s2gamma = sum(alz_M_aug[month == 1, rejection.samples] ^ 2) / n
    s2gamma = s2gamma / M
    
    # beta, poisson glm with gammai's filled in with M posterior draws/subject
    # averaging over M draws, so weight = 1/M for each repeated row with draw filled in
    # posterior draws for gamma_i are filled in as offsets in the glm
    fit = glm(
      alz_M_aug$words ~ X_m_aug - 1,
      family = poisson(),
      weights = rep(1 / M, nrow(X_m_aug)),
      offset = alz_M_aug$rejection.samples,
      # use starting value from previous step
      start = beta
    )
    beta = as.vector(fit$coefficients)
  
  ## update iterator
    iter = iter + 1
    if(iter == maxit) warning("Iteration limit reached without convergence")
    
  ## print out info to keep track
    cat(sprintf("Iter: %d Qf: %.3f s2gamma: %f beta0: %.3f beta0:%.3f eps:%f\n",iter, qfunction,s2gamma, beta[1],beta[2], eps))
}
## Iter: 1 Qf: -326.658 s2gamma: 0.013371 beta0: 1.873 beta1: 0.166 eps: 0.967334
## Iter: 2 Qf: -318.349 s2gamma: 0.019284 beta0: 1.862 beta1: 0.166 eps: 0.025438
## Iter: 3 Qf: -315.607 s2gamma: 0.029715 beta0: 1.849 beta1: 0.166 eps: 0.008611
## Iter: 4 Qf: -307.896 s2gamma: 0.049816 beta0: 1.831 beta1: 0.166 eps: 0.024433
## Iter: 5 Qf: -296.279 s2gamma: 0.082097 beta0: 1.802 beta1: 0.166 eps: 0.037732
## Iter: 6 Qf: -287.397 s2gamma: 0.120700 beta0: 1.781 beta1: 0.166 eps: 0.029978
## Iter: 7 Qf: -270.975 s2gamma: 0.181967 beta0: 1.785 beta1: 0.166 eps: 0.057142
## Iter: 8 Qf: -268.633 s2gamma: 0.210257 beta0: 1.789 beta1: 0.166 eps: 0.008642
## Iter: 9 Qf: -268.621 s2gamma: 0.219047 beta0: 1.791 beta1: 0.166 eps: 0.000043
## Iter: 10 Qf: -268.677 s2gamma: 0.219586 beta0: 1.793 beta1: 0.166 eps: 0.000208
## Iter: 11 Qf: -268.640 s2gamma: 0.219978 beta0: 1.796 beta1: 0.166 eps: 0.000138
## Iter: 12 Qf: -268.642 s2gamma: 0.220761 beta0: 1.798 beta1: 0.166 eps: 0.000007
end = Sys.time()
print(end - start)
## Time difference of 5.851663 mins

This is pretty close to what we get from glmer:

library(lme4)
## Loading required package: Matrix
fit = glmer(words ~ month + (1|subject), family = poisson(), data = alz)
print(summary(fit))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: words ~ month + (1 | subject)
##    Data: alz
## 
##      AIC      BIC   logLik deviance df.resid 
##    569.9    578.0   -281.9    563.9      107 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.68504 -0.26032 -0.04494  0.24645  1.24383 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.2181   0.467   
## Number of obs: 110, groups:  subject, 22
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.82051    0.12418  14.660  < 2e-16 ***
## month        0.16570    0.02019   8.205  2.3e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## month -0.541

In addition, the estimated random effects are also similar; below, the conditional modes from glmer are compared with the posterior means of the rejection-sampler draws for each subject:

# glmer 
print(ranef(fit)$subject)
##    (Intercept)
## 1   0.30285930
## 2   0.03193711
## 3   0.48564826
## 4   0.13002683
## 5  -0.19553936
## 6   0.19078864
## 7   0.03193711
## 8   0.47439755
## 9  -0.65788352
## 10  0.13002683
## 11  0.16083703
## 12 -0.98344289
## 13 -0.26059237
## 14 -0.94262334
## 15  0.61203271
## 16  0.42815552
## 17  0.35476607
## 18 -0.28315043
## 19  0.23420353
## 20 -0.56662270
## 21  0.20545664
## 22  0.30285930
# RS
print(alz_M_aug[,mean(rejection.samples), by = subject])
##     subject          V1
##  1:       1  0.32209191
##  2:       2  0.04845071
##  3:       3  0.50257276
##  4:       4  0.14725844
##  5:       5 -0.18371560
##  6:       6  0.20466239
##  7:       7  0.04569174
##  8:       8  0.49175159
##  9:       9 -0.65134453
## 10:      10  0.14488029
## 11:      11  0.17755126
## 12:      12 -0.98661251
## 13:      13 -0.24951172
## 14:      14 -0.94208592
## 15:      15  0.63229054
## 16:      16  0.44652189
## 17:      17  0.37352343
## 18:      18 -0.27201795
## 19:      19  0.25029945
## 20:      20 -0.55618211
## 21:      21  0.22006466
## 22:      22  0.31991799
##     subject          V1

We can also take a look at the final augmented matrix to see what it looks like:

print(head(alz_M_aug, 10))
##     subject month words M rejection.samples
##  1:       1     1     9 1         0.4079068
##  2:       1     2    12 1         0.4079068
##  3:       1     3    16 1         0.4079068
##  4:       1     4    17 1         0.4079068
##  5:       1     5    18 1         0.4079068
##  6:       1     1     9 2         0.3106106
##  7:       1     2    12 2         0.3106106
##  8:       1     3    16 2         0.3106106
##  9:       1     4    17 2         0.3106106
## 10:       1     5    18 2         0.3106106