GLMM estimation with MCMC, and a numerical example
Refs:
- Charles McCulloch (1997). Maximum Likelihood Algorithms for Generalized Linear Mixed Models. JASA, 92(437), 162-170. http://www.jstor.org/stable/2291460
- Charles McCulloch (1994). Maximum Likelihood Variance Components Estimation for Binary Data. JASA, 89(425), 330-335. http://www.jstor.org/stable/2291229
Probit-normal model for binary data
- data: \(Y=(Y_1,\ldots,Y_K)\): K-dim 0-1 vector
- \(\Phi^{-1}[\Pr(Y_i=1|u)] = X_i\beta+Z_iu\)
- \(\Phi\) is CDF of \(N(0,1)\) dist.
- \(u\): q-dim random effects following \(N(0,R)\)
- latent variable model representation
- \(W=X\beta+Zu+\epsilon\), \(\epsilon\sim N(0,I)\)
- \(Y_i=I(W_i>0)\) since \(\Pr(Y_i=1|u)=\Pr(X_i\beta+Z_iu>-\epsilon_i)=\Phi(X_i\beta+Z_iu)\)
- Marginally \(W\sim N(X\beta,ZRZ^T+I)\)
- so \(\Pr(Y_i=1) = \Phi(X_i\beta/\sqrt{V_{ii}})\), where \(V=ZRZ^T+I\).
- The random effects \(u\) do change the marginal mean of the outcomes: the fixed effects are attenuated by the factor \(1/\sqrt{V_{ii}}\), as illustrated in the sketch below.
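- A small R sketch (illustrative, not from the papers) simulating this probit-normal model with a scalar random intercept and comparing the empirical marginal mean of \(Y\) with \(\Phi(X_i\beta/\sqrt{V_{ii}})\):
## Illustrative sketch (not from the papers): simulate the probit-normal model
## with a scalar random intercept (q = 1, R = r2) and check the marginal mean.
set.seed(1)
K = 5; r2 = 2                          # cluster size; random-effect variance R
X = cbind(1, rnorm(K)); beta = c(-0.5, 1)
Z = matrix(1, K, 1)                    # random-intercept design
nsim = 1e5
Y = matrix(0, nsim, K)
for(s in 1:nsim){
    u = rnorm(1, sd=sqrt(r2))          # u ~ N(0, R)
    W = X %*% beta + Z %*% u + rnorm(K)   # latent W = X beta + Z u + eps
    Y[s,] = 1*(W > 0)                  # Y_i = I(W_i > 0)
}
V = Z %*% r2 %*% t(Z) + diag(K)        # marginal Var(W) = Z R Z^T + I
## empirical Pr(Y_i=1) vs. the attenuated probit mean Phi(X_i beta / sqrt(V_ii))
cbind(colMeans(Y), pnorm((X %*% beta) / sqrt(diag(V))))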
- EM Estimation for fixed effects parameters \(\beta\)
- complete data \(W\)
- \(\max_{\beta} E[\log\Pr(W)|Y]\)
- we can check that \(\hat{\beta} = (X^TV^{-1}X)^{-1}X^TV^{-1}E(W|Y)\)
- EM estimation for \(R\)
- complete data \((W,u)\)
- \(\max_{R}E[\log\Pr(u)|Y]\)
- we can check \(\hat{R} = E(uu^T|Y)\)
- note \(E(uu^T|Y)=E[E(uu^T|W)|Y]\)
- and \((u|W)\) follows a conditional normal dist (the mean is a linear function of \(W\) and the conditional variance is fixed)
- \(E(u|W) = RZ^TV^{-1}(W-X\beta)\)
- \(Var(u|W) = R - RZ^TV^{-1}ZR\)
- Both estimations reduce to computing \(E(W|Y)\) and \(Var(W|Y)\) (a sketch combining the Gibbs sampler with both EM updates follows this list)
- estimate mean and variance of constrained MVN dist
- easy for one-dim normal dist
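- e.g. for one-dimensional \(W\sim N(\mu,\sigma^2)\) constrained to \(W>0\) (i.e. \(Y=1\)), the closed form is \(E(W|W>0)=\mu+\sigma\,\phi(\mu/\sigma)/\Phi(\mu/\sigma)\), with \(\phi\) the \(N(0,1)\) density; \(Var(W|W>0)\) has a similar closed form.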
- Monte Carlo estimation with Gibbs sampling
- \((W_i|W_{-i},Y)\) equiv to \((W_i|W_{-i},Y_i)\)
- note \((W_i|W_{-i})\) follows a normal dist
- \(E(W_i|W_{-i})=X_i\beta + V_{-i,i}^TV_{-i,-i}^{-1}(W_{-i}-X_{-i}\beta)\)
- \(Var(W_i|W_{-i})=V_{ii} - V_{-i,i}^TV_{-i,-i}^{-1}V_{-i,i}\)
- We need to simulate from \((W_i|W_{-i})\) constrained to \(W_i>0\) if \(Y_i=1\) (or \(W_i\le 0\) if \(Y_i=0\))
- this is simulating from a one-dim truncated normal dist.
- sequentially simulate \(W_i|W_{-i},Y\) for \(i=1,\ldots,K\).
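- A minimal R sketch (illustrative only, not the papers' code) of one Gibbs sweep over \(W|Y\), followed by the two EM updates computed from the resulting draws; it assumes \(X\), \(Z\), \(R\), \(\beta\), the observed 0-1 vector \(Y\), and \(V=ZRZ^T+I\) are already in scope:
## Sketch (not the papers' code): one Gibbs sweep over W | Y, then the two EM
## updates computed from the draws. Assumes X, Z, R, beta, the 0/1 vector Y,
## and V = Z R Z^T + I are in scope.
gibbs.sweep <- function(W, Y, X, beta, V){
    K = length(W); mu = X %*% beta
    for(i in 1:K){
        a  = solve(V[-i,-i], V[-i,i])              # V_{-i,-i}^{-1} V_{-i,i}
        mi = mu[i] + sum(a * (W[-i] - mu[-i]))     # E(W_i | W_{-i})
        si = sqrt(V[i,i] - sum(V[i,-i] * a))       # sd(W_i | W_{-i})
        ## truncate to W_i > 0 if Y_i = 1, or W_i <= 0 if Y_i = 0 (inverse-CDF draw)
        p0 = pnorm(0, mi, si)
        W[i] = qnorm(if(Y[i]==1) runif(1, p0, 1) else runif(1, 0, p0), mi, si)
    }
    W
}
## collect B draws of W | Y after a burn-in
B = 2000; W = rep(0, length(Y)); Wb = matrix(0, B, length(Y))
for(b in 1:(B+500)){
    W = gibbs.sweep(W, Y, X, beta, V)
    if(b > 500) Wb[b-500,] = W
}
EW = colMeans(Wb)                                  # Monte Carlo E(W | Y)
## beta update: GLS regression of E(W|Y) on X with weight V^{-1}
beta.new = solve(t(X) %*% solve(V, X), t(X) %*% solve(V, EW))
## R update: E(uu^T|Y) = Var(u|W) + average over (W|Y) draws of E(u|W) E(u|W)^T
A = R %*% t(Z) %*% solve(V)                        # E(u|W) = A (W - X beta)
R.new = R - A %*% Z %*% R                          # Var(u|W), free of W
for(b in 1:B){
    m = A %*% (Wb[b,] - X %*% beta)
    R.new = R.new + (m %*% t(m)) / B
}
- iterating the Gibbs sampling and the two updates to convergence gives the MCEM estimates.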
General GLMM
- \(\Pr(Y|u)\) independent GLMs
- \(u\sim N(0,R)\)
- EM algorithm estimation
- E-step: \(Q = E[\log\Pr(Y|u)|Y] + E[\log\Pr(u)|Y]\)
- Monte Carlo approx using \(u\sim \Pr(u|Y)\)
- Markov Chain simulation from \(\Pr(u|Y)\propto \Pr(Y|u)\Pr(u)\)
- Metropolis sampling
- given current value \(u\)
- new value \(u'\) from proposal density \(h(u)\)
- acceptance prob: \(\min(1, \Pr(u'|Y)h(u)/[\Pr(u|Y)h(u')])\)
- \(=\min\!\left(1, \frac{\Pr(u')h(u)}{\Pr(u)h(u')}\cdot\frac{\Pr(Y|u')}{\Pr(Y|u)}\right)\)
- taking \(h(u)=\Pr(u)\), the acceptance prob becomes \(\min(1,\Pr(Y|u')/\Pr(Y|u))\)
- to keep the acceptance probability from being too small, one can take \(u'_k=u_k, k\neq i\) and update \(u'_i\) only, proposing from \(\Pr(u_i|u_{-i})\) (a conditional normal dist); see the sketch after this list
- \(Q\approx B^{-1}\sum_b [\log\Pr(Y|u_b) + \log\Pr(u_b)]\), where \(u_1,\ldots,u_B\sim \Pr(u|Y)\)
- maximization straightforward: a GLM fit (for the fixed effects) and a covariance matrix estimate (for \(R\))
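- A small R sketch of this component-wise Metropolis step (illustrative; loglikY is an assumed user-supplied function returning \(\log\Pr(Y|u)\) for the chosen GLM, and R here is \(Var(u)\)):
## Sketch of one component-wise Metropolis scan with the prior as proposal.
## loglikY(u) is assumed to return log Pr(Y|u) for the GLMM at hand; R = Var(u).
metro.step <- function(u, R, loglikY){
    q = length(u)
    for(i in 1:q){
        ## propose u_i' from Pr(u_i | u_{-i}), a conditional normal under u ~ N(0, R)
        if(q == 1){ mi = 0; si = sqrt(R[1,1]) } else {
            a  = solve(R[-i,-i, drop=FALSE], R[-i,i])
            mi = sum(a * u[-i])
            si = sqrt(R[i,i] - sum(R[i,-i] * a))
        }
        up = u; up[i] = rnorm(1, mi, si)
        ## with the prior as proposal, the acceptance prob reduces to Pr(Y|u')/Pr(Y|u)
        if(log(runif(1)) < loglikY(up) - loglikY(u)) u = up
    }
    u
}
- scanning over \(i=1,\ldots,q\) repeatedly and recording the visited values of \(u\) yields the draws \(u_b\) used in the Monte Carlo E-step.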
A numerical example
- First we simulate a longitudinal dataset with 4 observations for each of 1000 individuals, and fit it with glmer() from the lme4 R package.
- three continuous covariates (varying over time)
- one ordinal covariate (constant over time)
- consider a random intercept model (mean zero and variance 100)
n = 1e3; p = 3; K = 4; sig = 10
set.seed(123)
## time varying covariates
Xl = vector('list', K)
for(k in 1:K) Xl[[k]] = matrix(rnorm(n*p), n,p)
## constant covariate
Z = rbinom(n, 2,0.2)
## random effects
U = rnorm(n)*sig
## fixed effects
etaX = sapply(Xl, rowSums)
## random errors
eps = matrix(rnorm(n*K), n,K)
## logit model
eta = etaX + U + eps
prb = 1/(1+exp(-eta))
D = 1*(matrix(runif(n*K),n,K)<prb)
Xs = Xl[[1]]
for(k in 2:K) Xs = rbind(Xs, Xl[[k]])
## GLMM model
library(lme4)
sid = rep(1:n, K)
## model fit with glmer() (defaults to the Laplace approximation)
a1 = glmer(c(D) ~ Xs + Z[sid] + (1|sid), family=binomial)
- We will develop three numerical routines
- Simulate from conditional dist of random effects given data
- Will use Metropolis-Hastings (MH) Markov chain sampling
- Compute proposal dist in MH sampling
- Will use a normal dist matching the mode and Hessian (curvature) of the conditional dist of the random effects
- Feasible since the unknown normalizing constant affects neither the mode nor the Hessian.
- Note each individual has its own proposal dist.
- Need to solve a ridge-penalized (one-parameter) logistic regression for each individual
- Parameter estimation
- Fixed effects parameters: one-step Newton-Raphson (Fisher scoring) update
- need to average the conditional mean/variance of the outcomes over the MC samples of random effects.
- Variance of random effects dist: sample variance estimate.
Numerical algorithms
- MH sampling algorithm
## MH sampling of random effects | data
## logit Pr(D_i=1|eta_i,U) = eta_i + U;  U ~ N(0,Vu)
## proposal dist: N(Uc,Vc)
U.mh <- function(Di,eta, Vu, Uc,Vc, B=100){
    ub = rep(0, B)
    ## initialize the chain with a draw from the proposal
    ub[1] = rnorm(1)*sqrt(Vc)+Uc
    prb = 1/(1+exp(-eta-ub[1]))
    ## log[target/proposal] at the current value
    llk0 = dnorm(ub[1],sd=sqrt(Vu), log=TRUE) + sum(log(Di*prb+(1-Di)*(1-prb))) -
        dnorm(ub[1],Uc,sqrt(Vc), log=TRUE)
    for(k in 2:B){
        ub[k] = ub[k-1]
        uk = rnorm(1)*sqrt(Vc)+Uc
        prb = 1/(1+exp(-eta-uk))
        llk1 = dnorm(uk,sd=sqrt(Vu), log=TRUE) + sum(log(Di*prb+(1-Di)*(1-prb))) -
            dnorm(uk,Uc,sqrt(Vc), log=TRUE)
        alpha = exp( llk1 - llk0 )
        if(alpha>=1){
            ub[k] = uk; llk0 = llk1
        } else{
            aa = runif(1)
            if(aa<alpha){ ub[k] = uk; llk0 = llk1 }
        }
    }
    return(ub)
}
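- For example, an illustrative call with made-up values (four binary outcomes, linear predictors eta, random-intercept variance Vu, and proposal \(N(Uc,Vc)\)):
## Illustrative call with made-up values (not part of the fitting routine below).
ub = U.mh(Di=c(1,0,1,1), eta=c(0.5,-1,2,0.3), Vu=100, Uc=0, Vc=1, B=1000)
mean(ub[-(1:200)]); var(ub[-(1:200)])   # MC mean/variance of (U | data) after burn-in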
- Compute proposal dist
- we use numerical optimization and numerical second derivatives (via the numDeriv R package).
library(numDeriv)
## proposal dist for one individual: normal matching the mode and curvature of
## the conditional dist of the random intercept given that individual's data
UV.est <- function(Di,eta,Vu,Uc){
    ## negative log of Pr(U) * Pr(Di|U) (normalizing constant not needed)
    llk0 = function(xpar){
        Uc = xpar
        prb = 1/(1+exp(-eta-Uc))
        res = dnorm(Uc,sd=sqrt(Vu), log=TRUE) + sum(log(Di*prb+(1-Di)*(1-prb)))
        -res
    }
    tmp = try(optim(Uc, llk0, method='Brent', lower=Uc-10,upper=Uc+10) )
    if(inherits(tmp, 'try-error')) tmp = optim(Uc, llk0)
    Uc = tmp$par
    Vc = 1/hessian(llk0, Uc)
    c(Uc,Vc)
}
## loop over individuals: proposal mean/variance for each random intercept
UV.mh <- function(Vu,beta,Uc, D,X,subj){
    sid = unique(subj); n = length(sid)
    Vc = rep(0,n)   # keep the input Uc as warm starts for the mode search
    for(i in 1:n){
        ij = which(subj==sid[i])
        Xi = X[ij,,drop=FALSE]
        eta = Xi%*%beta
        zi = UV.est(D[ij],eta,Vu,Uc[i])
        Uc[i] = zi[1]; Vc[i] = zi[2]
    }
    return(list(Uc=Uc,Vc=Vc) )
}
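- For example, an illustrative call with the same made-up values as above returns the proposal mean and variance for one individual:
## Illustrative call: proposal mode and curvature-based variance for one individual.
UV.est(Di=c(1,0,1,1), eta=c(0.5,-1,2,0.3), Vu=100, Uc=0)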
- Newton-Raphson update
- Compute first/second derivatives of the complete data log likelihood
## score and fisher information for the fixed effects, plus E(U^2|data)
SF.mh <- function(Vu,beta,Uc,Vc, D,X,subj){
    sid = unique(subj); n = length(sid)
    p = dim(X)[2]
    S = rep(0, p)
    FI = matrix(0, p,p)
    sig2 = 0
    for(i in 1:n){
        ij = which(subj==sid[i])
        Xi = X[ij,,drop=FALSE]
        eta = Xi%*%beta
        ## MH draws of the random intercept given this individual's data (drop burn-in)
        zi = U.mh(D[ij],eta,Vu,Uc[i],Vc[i], B=5e3)[-(1:1e3)]
        ## MC averages of the conditional mean and variance of the outcomes
        theta = sapply(eta, function(b0) mean(1/(1+exp(-b0-zi))) )
        theta2 = sapply(eta, function(b0) mean(exp(b0+zi)/(1+exp(b0+zi))^2) )
        FI = FI + t(Xi)%*%(theta2*Xi)
        S = S + colSums((D[ij]-theta)*Xi)
        sig2 = sig2 + mean(zi^2)
    }
    return(list(S=S, FI=FI, sig2=sig2/n) )
}
- Iterative estimation
- Start from the glmer() fitted values
- Note MH sampling is more efficient if we update the proposal dist after each iteration (matching mode/curvature).
- Numerically you can try keeping the same proposal dist throughout
- just comment out the two lines of code updating Uc/Vc.
- convergence becomes very slow.
library(lme4)
sid = rep(1:n, K)
a1 = glmer(c(D) ~ Xs + Z[sid] + (1|sid), family=binomial)
## extract variance and fixed effects parameters; + mode/variance of (random effects|data)
Vu = (getME(a1,'theta'))^2; beta = fixef(a1)
Um = ranef(a1, condVar=TRUE)
D = c(D); X = cbind(1,Xs,Z[sid]); subj = sid
Uc = unlist(Um[[1]]); Vc = c( attr(Um[[1]], 'postVar') )
for(b in 1:100){
    ## NR updates with MH sampling
    obj = SF.mh(Vu,beta,Uc,Vc, D,X,subj)
    Vu = obj$sig2
    tmp = solve(obj$FI,obj$S)
    beta = beta + tmp
    ## Proposal dist update
    tmp1 = UV.mh(Vu,beta,Uc, D,X,subj)
    Uc = tmp1$Uc; Vc = tmp1$Vc
    cat(b, ':', tmp, ';', obj$S/n, '\n\t', sqrt(Vu), beta, '\n')
}