GLMM estimation with MCMC, and a numerical example


Probit-normal model for binary data

  • data: \(Y=(Y_1,\ldots,Y_K)\): K-dim 0-1 vector
  • \(\Phi^{-1}[\Pr(Y_i=1|u)] = X_i\beta+Z_iu\)
    • \(\Phi\) is CDF of \(N(0,1)\) dist.
  • \(u\): q-dim random effects following \(N(0,R)\)
  • latent variable model representation
    • \(W=X\beta+Zu+\epsilon\), \(\epsilon\sim N(0,I)\)
    • \(Y_i=I(W_i>0)\) since \(\Pr(Y_i=1)=\Pr(X_i\beta+Z_iu>-\epsilon_i)=\Phi(X_i\beta+Z_iu)\)
    • Marginally \(W\sim N(X\beta,ZRZ^T+I)\)
      • so marginally \(\Pr(Y_i=1) = \Phi(X_i\beta/\sqrt{V_{ii}})\), where \(V=ZRZ^T+I\).
      • The random effects \(u\) do change the marginal mean of the outcomes (the fixed effects are attenuated by the factor \(1/\sqrt{V_{ii}}\))!
  • EM Estimation for fixed effects parameters \(\beta\)
    • complete data \(W\)
    • \(\max_{\beta} E[\log\Pr(W)|Y]\)
    • we can check that \(\hat{\beta} = (X^TV^{-1}X)^{-1}X^TV^{-1}E(W|Y)\)
  • EM estimation for \(R\)
    • complete data \((W,u)\)
    • \(\max_{R}E[\log\Pr(u)|Y]\)
    • we can check \(\hat{R} = E(uu^T|Y)\)
    • note \(E(uu^T|Y)=E[E(uu^T|W)|Y]\)
    • and \((u|W)\) follows a conditional normal dist (mean a linear function of \(W\) and a fixed conditional variance)
      • \(E(u|W) = RZ^TV^{-1}(W-X\beta)\)
      • \(Var(u|W) = R - RZ^TV^{-1}ZR\)
  • Both estimation steps reduce to computing \(E(W|Y)\) and \(Var(W|Y)\)
    • estimate mean and variance of constrained MVN dist
    • easy for one-dim normal dist
    • Monte Carlo estimation with Gibbs sampling
      • \((W_i|W_{-i},Y)\) equiv to \((W_i|W_{-i},Y_i)\)
      • note \((W_i|W_{-i})\) follows a normal dist
        • \(E(W_i|W_{-i})=X_i\beta + V_{-i,i}^TV_{-i,-i}^{-1}(W_{-i}-X_{-i}\beta)\)
        • \(Var(W_i|W_{-i})=V_{ii} - V_{-i,i}^TV_{-i,-i}^{-1}V_{-i,i}\)
      • We need to simulate from the conditional normal \((W_i|W_{-i})\) truncated by \(Y_i\): to \(W_i>0\) if \(Y_i=1\), or to \(W_i\le 0\) if \(Y_i=0\)
        • this is simulating from a one-dim truncated normal dist.
      • sequentially simulate \(W_i|W_{-i},Y\) for \(i=1,\ldots,K\) (a minimal sketch is given below).
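  • A minimal R sketch of one such EM iteration is given below, combining the Gibbs draws of \(W|Y\) with the closed-form updates of \(\beta\) and \(R\); the function probit.em1 and all of its arguments are illustrative, not code used later in this note.
## sketch of one MCEM iteration for the probit-normal model (illustrative only):
## Gibbs sampling of W|Y, then the closed-form updates for beta and R
probit.em1 <- function(Y, X, Z, beta, R, B=500){
  K  = length(Y)
  V  = Z %*% R %*% t(Z) + diag(K)          ## marginal Var(W)
  mu = drop(X %*% beta)
  W  = ifelse(Y==1, 0.5, -0.5)             ## starting values respecting Y
  Ws = matrix(0, B, K)
  for(b in 1:B){
    for(i in 1:K){
      ## conditional normal W_i | W_{-i}
      vi = V[-i,i]; wi = solve(V[-i,-i], vi)
      m = mu[i] + sum(wi*(W[-i]-mu[-i]))
      s = sqrt(V[i,i] - sum(vi*wi))
      ## truncate to W_i>0 if Y_i=1, W_i<=0 if Y_i=0 (inverse-CDF draw)
      lo = ifelse(Y[i]==1, pnorm(0,m,s), 0)
      hi = ifelse(Y[i]==1, 1, pnorm(0,m,s))
      W[i] = qnorm(runif(1, lo, hi), m, s)
    }
    Ws[b,] = W
  }
  EW = colMeans(Ws)                        ## E(W|Y); in practice discard a burn-in
  Vinv = solve(V)
  ## EM update of beta: GLS regression of E(W|Y) on X
  beta.new = solve(t(X)%*%Vinv%*%X, t(X)%*%Vinv%*%EW)
  ## EM update of R: E(uu'|Y) = E[Var(u|W)|Y] + E[E(u|W)E(u|W)'|Y]
  A = R %*% t(Z) %*% Vinv                  ## E(u|W) = A (W - X beta)
  Vcond = R - A %*% Z %*% R                ## Var(u|W), free of W
  Eu = t(A %*% (t(Ws) - mu))               ## B x q matrix of E(u|W_b)
  R.new = Vcond + crossprod(Eu)/B
  list(beta=drop(beta.new), R=R.new)
}
## iterate, e.g.: for(it in 1:20){ fit = probit.em1(Y,X,Z,beta,R); beta = fit$beta; R = fit$R }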

General GLMM

  • \(\Pr(Y|u)\) independent GLMs
  • \(u\sim N(0,R)\)
  • EM algorithm estimation
    • E-step: \(Q = E[\log\Pr(Y|u)|Y] + E[\log\Pr(u)|Y]\)
    • Monte Carlo approx using \(u\sim \Pr(u|Y)\)
    • Markov Chain simulation from \(\Pr(u|Y)\propto \Pr(Y|u)\Pr(u)\)
    • Metropolis sampling
      • given current value \(u\)
      • new value \(u'\) from proposal density \(h(u)\)
      • acceptance prob: \(\min(1, \Pr(u'|Y)h(u)/[\Pr(u|Y)h(u')])\)
        • \(=\min(1,\Pr(u')h(u)/[\Pr(u)h(u')]\Pr(Y|u')/\Pr(Y|u))\)
        • take \(h(u)=\Pr(u)\), \(=\min(1,\Pr(Y|u')/\Pr(Y|u))\)
        • to keep the acceptance probability reasonable in higher dimensions, can take \(u'_k=u_k, k\neq i\) and update \(u'_i\) only, proposing from \(\Pr(u_i|u_{-i})\) (a conditional normal dist); see the sketch after this list
    • \(Q\approx \frac{1}{B}\sum_{b=1}^{B} [\log\Pr(Y|u_b) + \log\Pr(u_b)]\), where \(u_b\sim \Pr(u|Y)\), \(b=1,\ldots,B\)
      • maximization is then straightforward: a GLM fit for the fixed effects and a covariance matrix estimate for \(R\)
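  • A minimal R sketch of this single-component Metropolis scheme for a logistic GLMM (the function mh.prior.sweep and its arguments are illustrative; logistic link assumed):
## one Metropolis sweep over the components of u, proposing u_i' from the
## conditional prior Pr(u_i|u_{-i}); the acceptance ratio then reduces to the
## likelihood ratio Pr(Y|u')/Pr(Y|u)
mh.prior.sweep <- function(u, Y, X, Z, beta, R){
  eta0 = drop(X%*%beta + Z%*%u)
  llk0 = sum(dbinom(Y, 1, 1/(1+exp(-eta0)), log=TRUE))
  for(i in seq_along(u)){
    ## conditional prior u_i | u_{-i} under u ~ N(0,R)
    if(length(u)==1){ m = 0; s = sqrt(R[1,1]) } else {
      ri = R[-i,i]; w = solve(R[-i,-i,drop=FALSE], ri)
      m = sum(w*u[-i]); s = sqrt(R[i,i] - sum(ri*w))
    }
    up = u; up[i] = rnorm(1, m, s)
    eta1 = drop(X%*%beta + Z%*%up)
    llk1 = sum(dbinom(Y, 1, 1/(1+exp(-eta1)), log=TRUE))
    if(log(runif(1)) < llk1 - llk0){ u = up; llk0 = llk1 }   ## accept w.p. min(1, LR)
  }
  u
}
## repeated sweeps give draws u_b ~ Pr(u|Y) for the Monte Carlo Q above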

A numerical example

  • First we simulate longitudinal data with 4 observations for each of 1000 individuals, and fit them with glmer() from the lme4 R package.
    • three continuous covariates (varying over time)
    • one ordinal covariate (constant over time)
    • consider a random intercept model (mean zero and variance 100)
n = 1e3; p = 3; K = 4; sig = 10
set.seed(123)
## time varying covariates
Xl = vector('list', K)
for(k in 1:K) Xl[[k]] = matrix(rnorm(n*p), n,p)
## constant covariate
Z = rbinom(n, 2,0.2)
## random effects
U = rnorm(n)*sig
## fixed effects
etaX = sapply(Xl, rowSums)
## random errors
eps = matrix(rnorm(n*K), n,K)
## logit model
eta = etaX + U + eps
prb = 1/(1+exp(-eta))
D = 1*(matrix(runif(n*K),n,K)<prb)
Xs = Xl[[1]]
for(k in 2:K) Xs = rbind(Xs, Xl[[k]])
## GLMM model
library(lme4)
sid = rep(1:n, K)
## model fit with GLMM via glmer() (defaults to the Laplace approximation)
a1 = glmer(c(D) ~ Xs + Z[sid] + (1|sid), family=binomial)
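  • The glmer() fit a1 (Laplace approximation) can be inspected with the standard lme4 accessors; its estimates serve as starting values for the MCMC routines below.
## inspect the Laplace-approximation fit
summary(a1)           ## fixed effects and random-intercept variance
fixef(a1)             ## fixed effects estimates
getME(a1, 'theta')    ## random-intercept SD (the simulation used sig = 10)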
  • We will develop three numerical routines
    • Simulate from conditional dist of random effects given data
      • Will use the Metropolis-Hastings (MH) Markov Chain sampling
    • Compute proposal dist in MH sampling
      • Will use a normal dist matching the mode and Hessian (curvature) of the conditional dist of the random effects
      • Feasible since the unknown normalizing constant affects neither the mode nor the Hessian.
      • Note each subject has its own proposal dist.
        • Need to solve a ridge-penalized logistic regression (normal prior on the random intercept) for each subject
    • Parameter estimation
      • Fixed effects parameters: a one-step Newton-Raphson (Fisher scoring) update per iteration
        • need the conditional mean and variance of each outcome, averaged across the MC samples of the random effects.
      • Variance of the random effects dist: MC estimate of \(E(u_i^2|Y)\), averaged across subjects.

Numerical algorithms

  • MH sampling algorithm
## MH sampling of random effects | data
## logit Pr(D_i | eta_i, U) = eta_i + U;  U ~ N(0, Vu)
## proposal dist: N(Uc,Vc)
U.mh <- function(Di,eta, Vu, Uc,Vc, B=100){
  ub = rep(0, B)
  ## initial draw from the proposal N(Uc, Vc)
  ub[1] = rnorm(1)*sqrt(Vc)+Uc
  prb = 1/(1+exp(-eta-ub[1]))
  ## log target (prior x likelihood) minus log proposal density, used in the MH ratio
  llk0 = dnorm(ub[1],sd=sqrt(Vu), log=TRUE) + sum(log(Di*prb+(1-Di)*(1-prb))) - dnorm(ub[1],Uc,sqrt(Vc), log=TRUE)
  for(k in 2:B){
    ub[k] = ub[k-1]              ## keep current value unless the proposal is accepted
    uk = rnorm(1)*sqrt(Vc)+Uc    ## independence proposal from N(Uc, Vc)
    prb = 1/(1+exp(-eta-uk))
    llk1 = dnorm(uk,sd=sqrt(Vu), log=TRUE) + sum(log(Di*prb+(1-Di)*(1-prb))) - dnorm(uk,Uc,sqrt(Vc), log=TRUE)
    alpha = exp( llk1 - llk0  )  ## MH acceptance ratio
    if(runif(1) < alpha){        ## accept with probability min(1, alpha)
      ub[k] = uk
      llk0 = llk1
    }
  }
  return(ub)
}
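  • For instance, with hypothetical inputs, the chain for one subject with four binary outcomes could be drawn and summarized as:
## illustrative call: hypothetical outcomes/linear predictors, Vu = 4, proposal N(0, 1)
ub = U.mh(Di=c(1,0,1,1), eta=c(0.2,-0.5,1.0,0.3), Vu=4, Uc=0, Vc=1, B=2000)
mean(ub[-(1:500)])   ## MC estimate of E(random effect | data), after discarding a burn-in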
  • Compute proposal dist
    • we use numerical optimization and numerical second derivatives (via the numDeriv R package).
library(numDeriv)
UV.est <- function(Di,eta,Vu,Uc){
  ## negative log conditional density of the random intercept (up to a constant)
  llk0 = function(xpar){
    Uc = xpar
    prb = 1/(1+exp(-eta-Uc))
    res = dnorm(Uc,sd=sqrt(Vu), log=TRUE) + sum(log(Di*prb+(1-Di)*(1-prb)))
    -res
  }
  ## mode of the conditional density
  tmp = try(optim(Uc, llk0, method='Brent', lower=Uc-10,upper=Uc+10) )
  if(inherits(tmp,'try-error')) tmp = optim(Uc, llk0)
  Uc = tmp$par
  ## proposal variance: inverse of the numerical Hessian at the mode
  Vc = 1/hessian(llk0, Uc)
  c(Uc,Vc)
}
UV.mh <- function(Vu,beta,Uc, D,X,subj){
  ## proposal mean/variance (mode and curvature) for each subject
  sid = unique(subj);  n = length(sid)
  Uc = Vc = rep(0,n)
  for(i in 1:n){
    ij = which(subj==sid[i]);  ni = length(ij)
    Xi = X[ij,,drop=FALSE]
    eta = Xi%*%beta
    zi = UV.est(D[ij],eta,Vu,Uc[i])
    Uc[i] = zi[1]; Vc[i] = zi[2]
  }
  return(list(Uc=Uc,Vc=Vc) )
}
  • Newton-Raphson update
    • Compute the first/second derivatives of the complete-data log likelihood, averaged over the MH samples of the random effects
## score and fisher information
SF.mh <- function(Vu,beta,Uc,Vc, D,X,subj){
  ## score vector, Fisher information, and random-effects variance update
  sid = unique(subj);  n = length(sid)
  p = dim(X)[2]
  S = rep(0, p)
  FI = matrix(0, p,p)
  sig2 = 0
  for(i in 1:n){
    ij = which(subj==sid[i]);  ni = length(ij)
    Xi = X[ij,,drop=FALSE]
    eta = Xi%*%beta
    ## MH draws of (random effect | data), discarding a 1000-draw burn-in
    zi = U.mh(D[ij],eta,Vu,Uc[i],Vc[i], B=5e3)[-(1:1e3)]
    theta  = sapply(eta, function(b0)  mean(1/(1+exp(-b0-zi))) )          ## E[p_ij | data]
    theta2 = sapply(eta, function(b0) mean(exp(b0+zi)/(1+exp(b0+zi))^2) ) ## E[p_ij(1-p_ij) | data]
    FI = FI + t(Xi)%*%(theta2*Xi)
    S = S+colSums((D[ij]-theta)*Xi)
    sig2 = sig2 + mean(zi^2)          ## E(u_i^2 | data)
  }
  return(list(S=S, FI=FI, sig2=sig2/n) )
}
  • Iterative estimation
    • Start from the glmer() fitted values
    • Note MH sampling is more efficient if we update the proposal dist after each iteration (matching mode/curvature).
    • Numerically you can try keeping the same proposal dist throughout
      • just comment out the two lines of code that update Uc/Vc.
      • convergence becomes very slow.
## extract variance and fixed effects parameters; + mode/variance of (random effects|data)
Vu = (getME(a1,'theta'))^2; beta = fixef(a1); Um = ranef(a1,condVar=TRUE)
D = c(D); X = cbind(1,Xs,Z[sid]); subj = sid
Uc = unlist(Um[[1]]); Vc = c( attr(Um[[1]], 'postVar') )
for(b in 1:100){
  ## NR updates with MH sampling
  obj = SF.mh(Vu,beta,Uc,Vc, D,X,subj)
  Vu = obj$sig2
  tmp = solve(obj$FI,obj$S)
  beta = beta + tmp
  ## Proposal dist update
  tmp1 = UV.mh(Vu,beta,Uc, D,X,subj)
  Uc = tmp1$Uc; Vc = tmp1$Vc
  cat(b, ':', tmp, ';', obj$S/n, '\n\t', sqrt(Vu), beta, '\n')
}
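  • After the loop, the estimates can be checked against the values used in the simulation above (intercept 0, coefficient 1 for each of the three time-varying covariates, 0 for Z, and random-intercept SD sig = 10):
## compare with the simulation settings
rbind(estimate = beta, simulated = c(0, 1, 1, 1, 0))
c(estimate = sqrt(Vu), simulated = sig)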