\(\def \A \ensuremath{\mathbf{A}}\) \(\def \b \ensuremath{\mathbf{b}}\) \(\def \tA \ensuremath{\mathbf{A$^{T}$}}\) \(\def \d \ensuremath{\mathbf{d}}\) \(\def \e \ensuremath{\mathbf{e}}\) \(\def \g \ensuremath{\mathbf{g}}\) \(\def \I \ensuremath{\mathbf{I}}\) \(\def \l \ensuremath{\mathbf{l}}\) \(\def \M \ensuremath{\mathbf{M}}\) \(\def \W \ensuremath{\mathbf{W}}\) \(\def \y \ensuremath{\mathbf{y}}\) \(\def \Y \ensuremath{\mathbf{Y}}\) \(\def \X \ensuremath{\mathbf{X}}\) \(\def \x \ensuremath{\mathbf{x}}\) \(\def \tx \ensuremath{\mathbf{x$^{T}$}}\) \(\def \z \ensuremath{\mathbf{z}}\) \(\def \betab \ensuremath{\boldsymbol{\beta}}\) \(\def \Omegab \ensuremath{\boldsymbol{\Omega}}\) \(\def \pib \ensuremath{\boldsymbol{\pi}}\) \(\def \thetab \ensuremath{\boldsymbol{\theta}}\) \(\def \epsilonb \ensuremath{\boldsymbol{\epsilon}}\)

This is the older approach that uses the glm function to maximize with respect to \(\boldsymbol{\beta}\) in the M-step, as well as a closed-form estimator for \(\sigma^2_{\gamma}\) given the samples from the E-step. We repeat each observation \(\boldsymbol{y_{i}}\) and its corresponding rows of the design matrix \(M\) times, filling in the \(k\)-th posterior draw for \(\gamma_i\) in the linear predictor above for the \(k\)-th repeated block. Each repeated row is weighted with \(1/M\). We can also show that the solution for \(\sigma^2_{\gamma}\) is simply the sum of the squared posterior draws divided by \(nM\).
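More concretely, here is a sketch of the Monte Carlo approximation to the Q-function that the code below evaluates, writing \(\gamma_{ik}\) for the \(k\)-th E-step draw for subject \(i\) (notation introduced here just for this sketch):

\[
Q\left(\boldsymbol{\beta}, \sigma^2_{\gamma}\right) \approx \frac{1}{M}\sum_{i=1}^{n}\sum_{k=1}^{M}\left[\sum_{j=1}^{5} \log f\!\left(y_{ij} \mid \lambda_{ijk}\right) + \log \phi\!\left(\gamma_{ik}; 0, \sigma^2_{\gamma}\right)\right], \qquad \log \lambda_{ijk} = \mathbf{x}_{ij}^{T}\boldsymbol{\beta} + \gamma_{ik}.
\]

Setting the derivative with respect to \(\sigma^2_{\gamma}\) to zero gives the closed-form update used in the M-step below:

\[
\hat{\sigma}^2_{\gamma} = \frac{1}{nM}\sum_{i=1}^{n}\sum_{k=1}^{M}\gamma_{ik}^{2}.
\]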

1 Implementing the E-step Rejection Sampler

First, let’s set up the rejection sampler:

## function for Rejection Sampling in the MCEM E-step for subject i
  # yi is a 5 x 1 vector containing the monthly observations for sample i
  # xi: a 5 x 2 matrix pertaining to the intercept and longitudinal month value for each subject (it is the same for each subject, 1:5)
  # M is the total number of iid samples desired from the posterior
  # maxit is the max number of iterations (to prevent runaway looping)
  # betat is  the estimate for beta at EM iteration t
  # s2gammat is the estimate for s2gamma at EM iteration t
## Returns a list object with accepted samples and the acceptance rate

rejection.sample.gamma.posterior.i = function(yi, xi = cbind(rep(1,5), 1:5), M, maxit, betat, s2gammat, trace = 0) {
  
  ## envelope function setup, done once
    # obtain gamma_hat to create the envelope function, optimizing Likelihood wrt gammai
    # can use any optimization routine, here using IRLS holding XB fixed as an offset
    # log(lambda_i) = X%*%beta_t + gamma_i --> fix X and beta_t, treat gamma_i as coefficient
    gamma_hat = glm(yi ~ 1, family = poisson(),
                    offset = xi %*% betat)$coef
    
    # calculate lambda values for rejection ratio denominator
    lambda_star = exp(xi %*% betat + gamma_hat)
    
    # calculate rejection ratio denominator
    Le = prod(dpois(yi, lambda = lambda_star))
    
  ## specify vector to hold samples
    posterior_gammai = rep(0, M)
  
  ## start rejection sampler
    m = index = 1
    while (m <= M & index < maxit) {
      
      # sample from normal prior (proposal density here)
      gammai = rnorm(1, 0, sqrt(s2gammat))
      
      # calculate lambda values for rejection ratio numerator
      lambdai = exp(xi %*% betat + gammai)
      
      # calculate ratio numerator
      Lq = prod(dpois(yi, lambda = lambdai))
      
      # calculate ratio
      r = Lq / Le
      
      # sample Ui from standard uniform
      U = runif(1)
      
      # if acceptance criterion met (U < r), save sample
      if (U < r) {
        posterior_gammai[m] = gammai
        
        # update num current acceptances
        m = m + 1
      }
      
      # keep track of total number of samples drawn
      index = index + 1
    }
    
  ## acceptance rate
    if (trace > 0)
      print(M / index)
  
  ## return accepted samples and acceptance rate in a list
    return(list(gammai = posterior_gammai, ar = M / index))
}
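As a brief aside on why the ratio Lq / Le above is a valid rejection-sampling ratio (a sketch, using the quantities computed in the function): the target is the posterior \(f(\gamma_i \mid \boldsymbol{y_i}) \propto L(\gamma_i)\,\phi(\gamma_i; 0, \sigma^2_{\gamma})\), where \(L(\gamma_i)\) is the Poisson likelihood of \(\boldsymbol{y_i}\) with \(\log \boldsymbol{\lambda}_i = \boldsymbol{X}\boldsymbol{\beta}^{(t)} + \gamma_i\). With the normal prior as the proposal density and the constant envelope factor \(L(\hat{\gamma}_i)\), where \(\hat{\gamma}_i\) maximizes \(L\), the acceptance ratio is

\[
r = \frac{L(\gamma_i)\,\phi(\gamma_i; 0, \sigma^2_{\gamma})}{L(\hat{\gamma}_i)\,\phi(\gamma_i; 0, \sigma^2_{\gamma})} = \frac{L(\gamma_i)}{L(\hat{\gamma}_i)} \le 1,
\]

which is exactly the ratio Lq / Le computed in the sampler.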


## wrapper function for Rejection Sampling over all observations
  # data is the main data.table object
  # data_aug is the augmented data.table object
  # other arguments same as above
## Returns data_aug with updated rejection samples column

rejection.sample.gamma.posterior.all = function(data,
                                                data_aug,
                                                M,
                                                maxit,
                                                betat,
                                                s2gammat,
                                                trace = 0) {
  ## get n, total number of subjects
  n = length(unique(data[,subject]))
 
  ## looping over n subjects
  for (i in 1:n) {
    
    if(trace > 0) print(i) 
    
    # draw M samples from the posterior for gamma_i
    rejection.samples.i =
      rejection.sample.gamma.posterior.i(
        yi = data[subject == i, words],
        M = M,
        maxit = maxit,
        betat = betat,
        s2gammat = s2gammat,
        trace = trace
      )$gammai
    
    # since months in each subject share same gamma_i, we repeat
    # the draw for each month in a subject
    rejection.samples.i = rejection.samples.i[rep(1:M, each = 5)]
    
    # fill in last col of augmented matrix with drawn samples for subj i
    data_aug[subject == i, rejection.samples := rejection.samples.i]
    
    if(trace > 0) print("saved data") 
    
  }
  
  ## return data_aug
  return(data_aug)
}

2 Other Helper Functions

To simplify the implementation of the EM algorithm using only off-the-shelf methods for maximization in the M-step, we augment the data matrices used in this example by repeating the data pertaining to each subject \(M\) times in the original data matrix. This will also facilitate vectorized evaluation of the Q-function in the E-step. We introduce a helper function below to do this (we will circle back to how it works in a bit).

augment = function(data, M){
  
  # convert data to data.table
  if(!is.data.table(data)) data = data.table(data)
  
  # repeat each row M times
  data_aug = data[rep(1:nrow(data),  M),]
  
  # append a column indicating the index of M for each draw in each obs
  data_aug$M = rep(1:M, each = nrow(data))
  
  # resort by subject, M, then month. This repeats the 5 row
  #  block of data for each subject M times
  data_aug = data_aug[order(subject, M, month),]
  
  # add another column to hold the M posterior draws per i later
  # initialize column to 0, will update later with actual draws
  data_aug$rejection.samples = rep(0, nrow(data_aug))
  
  # create repeated design matrix corresponding to data_aug
  X_aug = model.matrix(~ 1 + month, data = data_aug)
  
## return values
  return(list(data_aug, X_aug))
}
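As a quick illustration of how augment reshapes the data, here is a hypothetical toy data set with two subjects and two months each (the values are made up purely for illustration; the real data are read in below):

library(data.table)

# toy data: 2 subjects, 2 months each
toy = data.table(subject = rep(1:2, each = 2),
                 month = rep(1:2, times = 2),
                 words = c(5, 7, 3, 4))

# augment with M = 3: each subject's 2-row block is repeated 3 times,
# tagged by the M column, with a zero-initialized rejection.samples column
toy_aug = augment(toy, M = 3)

# augmented data.table (12 rows) and matching design matrix (12 x 2)
toy_aug[[1]]
dim(toy_aug[[2]])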

3 Setup for the MCEM algorithm

Now that we have this helper function, let's read in the data and set up everything for the EM algorithm:

## read in the data, convert to data.table
  library(data.table)
  alz = read.table("alzheimers.dat", header = T)
  alz = data.table(alz)

    
## set initial parameters
  
  # number of subjects
  n = length(unique(alz$subject))
  
  # relative convergence threshold
  tol = 10^-5
  
  # max number of iterations
  maxit = 100
  
  # initialize iteration counter, epsilon, and qfunction value so the loop starts and convergence is not triggered prematurely
  iter = 0
  eps = Inf
  qfunction = -10000 # using Qfunction for convergence

    
## starting values for beta/s2gamma
  
  # fixed effects, same as before
  beta = c(1.804, 0.165)
  
  # random intercept variance, same as before
  s2gamma =  0.000225 + .01 

  
## set M = 10000 draws from posterior per observation in E-step
  M = 10000

  
## Repeat the data pertaining to y_i, X_i M times 
  # This allows for easy vectorization of the sum shown in the Q-function approximation over n and M
  
  # run the function to do the repeating from above
  aug = augment(data = alz, M = M)
  
  # extract alz matrix with data pertaining to i'th obs repeated M times 
  alz_M_aug = aug[[1]]
  
  # extract predictor matrix with rows repeated in same manner as alz_M_aug
  X_m_aug = aug[[2]]

  # Check Dimensions of original vs augmented matrix for M step
  dim(alz)     # originally (22*5) rows
## [1] 110   3
  dim(alz_M_aug) # now (22*5*M) rows
## [1] 1100000       5
  # peek at each to see the difference
  head(alz, 10)
##     subject month words
##  1:       1     1     9
##  2:       1     2    12
##  3:       1     3    16
##  4:       1     4    17
##  5:       1     5    18
##  6:       2     1     6
##  7:       2     2     7
##  8:       2     3    10
##  9:       2     4    15
## 10:       2     5    16
  head(alz_M_aug, 10)
##     subject month words M rejection.samples
##  1:       1     1     9 1                 0
##  2:       1     2    12 1                 0
##  3:       1     3    16 1                 0
##  4:       1     4    17 1                 0
##  5:       1     5    18 1                 0
##  6:       1     1     9 2                 0
##  7:       1     2    12 2                 0
##  8:       1     3    16 2                 0
##  9:       1     4    17 2                 0
## 10:       1     5    18 2                 0

4 Testing the Rejection Sampler

Let’s test out the Rejection Sampler on just the first sample, given the starting values:

# run RS
test_sample = rejection.sample.gamma.posterior.i(
                yi = alz[subject == 1, words], 
                M = M, 
                maxit = maxit*M, 
                betat = beta, 
                s2gammat = s2gamma 
        )

# acceptance rate
print(test_sample$ar)
## [1] 0.07717598
# mean
print(mean(test_sample$gammai))
## [1] 0.1370397
# variance
print(var(test_sample$gammai))
## [1] 0.006292223
#histogram
hist(test_sample$gammai)
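As an optional sanity check that is not part of the original workflow, we can approximate the posterior mean of \(\gamma_1\) by direct numerical integration at the same starting values and compare it to the rejection-sampler mean printed above. This is a minimal sketch assuming the beta and s2gamma starting values defined earlier; post_unnorm is a helper name introduced here for illustration.

# unnormalized posterior for subject 1: Poisson likelihood times normal prior
yi = alz[subject == 1, words]
xi = cbind(1, 1:5)
post_unnorm = function(g) {
  sapply(g, function(gg) prod(dpois(yi, lambda = exp(xi %*% beta + gg))) *
                         dnorm(gg, mean = 0, sd = sqrt(s2gamma)))
}

# normalize over a range that comfortably covers the posterior mass,
# then compute the posterior mean; it should be close to mean(test_sample$gammai)
const = integrate(post_unnorm, lower = -1, upper = 1)$value
integrate(function(g) g * post_unnorm(g) / const, lower = -1, upper = 1)$value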

5 Now run the MCEM algorithm

Now that that's out of the way, let's apply the MCEM algorithm.

  start = Sys.time()
  while(eps > tol & iter < maxit){
  
  ## save old qfunction
    qfunction0 = qfunction
  
  ## Begin E-step

    # update rejection.samples column with the new draws
    alz_M_aug = rejection.sample.gamma.posterior.all(
      data = alz,
      data_aug = alz_M_aug,
      M = M,
      maxit = maxit * M,
      betat = beta,
      s2gammat = s2gamma
    )
    
  ## evaluate  qfunction given drawn samples, current param estimates
    #lambda
      lambda_M_aug = exp(X_m_aug %*% beta + alz_M_aug[, rejection.samples])
      
    # qfunction calculation (vectorized)
      qfunction = 
        # sum over first part of Q
        sum(dpois(alz_M_aug[, words], lambda = lambda_M_aug, log = T)) + 
        # sum over second part of Q: the prior term needs each draw only once, so use month == 1 rows
        sum(dnorm(alz_M_aug[month == 1, rejection.samples], 
              mean = 0,
              sd = sqrt(s2gamma),
              log = T))
      
    # take average over M
      qfunction = (qfunction) / M 
    
  ## End E-step
    
  ## Calculate relative change in qfunction from prior iteration
    eps  = abs(qfunction - qfunction0) / abs(qfunction0)
    
  ## Start M-step
    
    # s2gamma, derived similarly to the MLE for sigma^2 from a mean-zero normal, averaged over M
    # closed form derived from the Q-function approximation given earlier
    s2gamma = sum(alz_M_aug[month == 1, rejection.samples] ^ 2) / n
    s2gamma = s2gamma / M
    
    # beta, poisson glm with gammai's filled in with M posterior draws/subject
    # averaging over M draws, so weight = 1/M for each repeated row with draw filled in
    # posterior draws for gamma_i are filled in as offsets in the glm
    fit = glm(
      alz_M_aug$words ~ X_m_aug - 1,
      family = poisson(),
      weights = rep(1 / M, nrow(X_m_aug)),
      offset = alz_M_aug$rejection.samples,
      # use starting value from previous step
      start = beta
    )
    beta = as.vector(fit$coefficients)
  
  ## update iterator
    iter = iter + 1
    if(iter == maxit) warning("Iteration limit reached without convergence")
    
  ## print out info to keep track
    cat(sprintf("Iter: %d Qf: %.3f s2gamma: %f beta0: %.3f beta0:%.3f eps:%f\n",iter, qfunction,s2gamma, beta[1],beta[2], eps))
}
## Iter: 1 Qf: -326.658 s2gamma: 0.013371 beta0: 1.873 beta1: 0.166 eps: 0.967334
## Iter: 2 Qf: -318.349 s2gamma: 0.019284 beta0: 1.862 beta1: 0.166 eps: 0.025438
## Iter: 3 Qf: -315.607 s2gamma: 0.029715 beta0: 1.849 beta1: 0.166 eps: 0.008611
## Iter: 4 Qf: -307.896 s2gamma: 0.049816 beta0: 1.831 beta1: 0.166 eps: 0.024433
## Iter: 5 Qf: -296.279 s2gamma: 0.082097 beta0: 1.802 beta1: 0.166 eps: 0.037732
## Iter: 6 Qf: -287.397 s2gamma: 0.120700 beta0: 1.781 beta1: 0.166 eps: 0.029978
## Iter: 7 Qf: -270.975 s2gamma: 0.181967 beta0: 1.785 beta1: 0.166 eps: 0.057142
## Iter: 8 Qf: -268.633 s2gamma: 0.210257 beta0: 1.789 beta1: 0.166 eps: 0.008642
## Iter: 9 Qf: -268.621 s2gamma: 0.219047 beta0: 1.791 beta1: 0.166 eps: 0.000043
## Iter: 10 Qf: -268.677 s2gamma: 0.219586 beta0: 1.793 beta1: 0.166 eps: 0.000208
## Iter: 11 Qf: -268.640 s2gamma: 0.219978 beta0: 1.796 beta1: 0.166 eps: 0.000138
## Iter: 12 Qf: -268.642 s2gamma: 0.220761 beta0: 1.798 beta1: 0.166 eps: 0.000007
end = Sys.time()
print(end - start)
## Time difference of 5.851663 mins

This is pretty close to what we get from glmer:

library(lme4)
## Loading required package: Matrix
fit = glmer(words ~ month + (1|subject), family = poisson(), data = alz)
print(summary(fit))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: words ~ month + (1 | subject)
##    Data: alz
## 
##      AIC      BIC   logLik deviance df.resid 
##    569.9    578.0   -281.9    563.9      107 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.68504 -0.26032 -0.04494  0.24645  1.24383 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subject (Intercept) 0.2181   0.467   
## Number of obs: 110, groups:  subject, 22
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.82051    0.12418  14.660  < 2e-16 ***
## month        0.16570    0.02019   8.205  2.3e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## month -0.541

In addition, the estimated random effects are also similar; below, the conditional modes from glmer are compared with the posterior means of the rejection-sampler draws for each subject:

# glmer 
print(ranef(fit)$subject)
##    (Intercept)
## 1   0.30285930
## 2   0.03193711
## 3   0.48564826
## 4   0.13002683
## 5  -0.19553936
## 6   0.19078864
## 7   0.03193711
## 8   0.47439755
## 9  -0.65788352
## 10  0.13002683
## 11  0.16083703
## 12 -0.98344289
## 13 -0.26059237
## 14 -0.94262334
## 15  0.61203271
## 16  0.42815552
## 17  0.35476607
## 18 -0.28315043
## 19  0.23420353
## 20 -0.56662270
## 21  0.20545664
## 22  0.30285930
# RS
print(alz_M_aug[,mean(rejection.samples), by = subject])
##     subject          V1
##  1:       1  0.32209191
##  2:       2  0.04845071
##  3:       3  0.50257276
##  4:       4  0.14725844
##  5:       5 -0.18371560
##  6:       6  0.20466239
##  7:       7  0.04569174
##  8:       8  0.49175159
##  9:       9 -0.65134453
## 10:      10  0.14488029
## 11:      11  0.17755126
## 12:      12 -0.98661251
## 13:      13 -0.24951172
## 14:      14 -0.94208592
## 15:      15  0.63229054
## 16:      16  0.44652189
## 17:      17  0.37352343
## 18:      18 -0.27201795
## 19:      19  0.25029945
## 20:      20 -0.55618211
## 21:      21  0.22006466
## 22:      22  0.31991799
##     subject          V1

We can also take a look at the final augmented matrix to see what it looks like:

print(head(alz_M_aug, 10))
##     subject month words M rejection.samples
##  1:       1     1     9 1         0.4079068
##  2:       1     2    12 1         0.4079068
##  3:       1     3    16 1         0.4079068
##  4:       1     4    17 1         0.4079068
##  5:       1     5    18 1         0.4079068
##  6:       1     1     9 2         0.3106106
##  7:       1     2    12 2         0.3106106
##  8:       1     3    16 2         0.3106106
##  9:       1     4    17 2         0.3106106
## 10:       1     5    18 2         0.3106106