GEE inference of repeated measurement data
- Some GEE references
- Kung-Yee Liang, Scott Zeger (1986). Longitudinal Data Analysis Using Generalized Linear Models.
Biometrika, 73(1), 13-22. http://www.jstor.org/stable/2336267
- Scott Zeger, Kung-Yee Liang (1986). Longitudinal Data Analysis for Discrete and Continuous Outcomes.
Biometrics, 42(1), 121-130. http://www.jstor.org/stable/2531248
- Scott Zeger; Kung-Yee Liang; Paul Albert (1988). Models for Longitudinal Data: A Generalized Estimating Equation Approach.
Biometrics, 44(4), 1049-1060. http://www.jstor.org/stable/2531734
- Paul Burton, Lyle Gurrin, Peter Sly (1998). Extending the simple linear regression model to account for correlated
responses: An introduction to generalized estimating equations and multi-level mixed modelling.
Statistics in Medicine, 17(11), 1261-1291. http://dx.doi.org/10.1002/(SICI)1097-0258(19980615)17:11<1261::AID-SIM846>3.0.CO;2-Z
- Kung-Yee Liang, Scott Zeger and Hahjat Qaqish (1992). Multivariate regression analyses for categorial data.
JRSSB, 54(1), 3-40. http://www.jstor.org/stable/2345947
- Vincent Carey, Scott Zeger , and Peter Diggle (1993). Modelling multivariate binary data with alternating logistic regressions.
Biometrika 80: 517-526. doi:10.1093/biomet/80.3.517
- Lue Ping Zhao; Ross Prentice (1990). Correlated Binary Regression Using a Quadratic Exponential Model.
Biometrika, 77(3), 642-648. http://www.jstor.org/stable/2337004
- Ross Prentice; Lue Ping Zhao (1991). Estimating Equations for Parameters in Means and Covariances of Multivariate Discrete and Continuous Responses.
Biometrics, 47(3), 825-839. http://www.jstor.org/stable/2532642
- Lue Ping Zhao; Ross Prentice; Steven Self (1992). Multivariate Mean Parameter Estimation by Using a Partly Exponential Model.
JRSSB, 54(3), 805-811. http://www.jstor.org/stable/2345860
- Heagerty, P.J. and Zeger, S.L. (1996) Marginal regression models for clustered ordinal measurements.
JASA, 91, 1024-1036. http://www.jstor.org/stable/2291722
GEE for matched pair analysis
## data, P. 202
R> Nc = matrix(c(501,146,157,132),2,2)
R> N = sum(Nc)
R> y1 = rep(c(1,0,1,0), Nc)
R> y0 = rep(c(1,1,0,0), Nc)
R> ( tb1 = table(y1,y0) )
y0
y1 0 1
0 132 146
1 157 501
R> y = c(y1,y0)
R> x = rep(1:0, each=N)
R> id = rep(1:N, 2)
R> ## QEF: dependence modeling with MLE
R> p = tb1[1:4]/N
R> eor = (p[2]+p[4])/(p[1]+p[3])/((p[3]+p[4])/(p[1]+p[2]))
R> cv1 = (diag(p[-4])-outer(p[-4], p[-4]))/N
R> r1 = c(-1/(p[2]+p[4])-1/(p[1]+p[3]) + 1/(p[3]+p[4]) + 1/(p[1]+p[2]),
+ 1/(p[3]+p[4]) + 1/(p[1]+p[2]), -1/(p[2]+p[4]) - 1/(p[1]+p[3]))
R> sqrt(r1%*%cv1%*%r1)
[,1]
[1,] 0.08807813
R> as.character( .Last.value )
[1] "0.0880781313014677"
R> log(eor)/sqrt(r1%*%cv1%*%r1)
[,1]
[1,] 0.6319867
R> 2*pnorm(-abs(.Last.value))
[,1]
[1,] 0.5273955
R> (p[4]-(p[2]+p[4])*(p[3]+p[4]))/sqrt((p[2]+p[4])*(p[1]+p[3]))/sqrt((p[3]+p[4])*(p[1]+p[2]))
[1] 0.2336404
R> cor(y1,y0)
[1] 0.2336404
R> p[1]*p[4]/p[2]/p[3]
[1] 2.885089
R> exp(glm(y1 ~ y0, family=binomial)$coef[2])
y0
2.885089
R> ## GEE : explicit fitting of marginal prob
R> library(geepack)
R> ij = order(id)
R> summary( geeglm(y[ij] ~ x[ij], family=binomial, id=id[ij]) )
Call:
geeglm(formula = y[ij] ~ x[ij], family = binomial, id = id[ij])
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 0.80592 0.07074 129.784 <2e-16
x[ij] 0.05566 0.08817 0.399 0.528
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 1 0.02567
Correlation: Structure = independenceNumber of clusters: 936 Maximum cluster size: 2
R> as.character( .Last.value$coef[2,2] )
[1] "0.088172225982748"
R> library(gee)
R> summary(gee(y[ij] ~ x[ij], id=id[ij], family=binomial, corstr="unstructured"))
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) x[ij]
0.80592 0.05566
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Logit
Variance to Mean Relation: Binomial
Correlation Structure: Unstructured
Call:
gee(formula = y[ij] ~ x[ij], id = id[ij], family = binomial,
corstr = "unstructured")
Summary of Residuals:
Min 1Q Median 3Q Max
-0.7030 -0.6912 0.2970 0.3088 0.3088
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 0.80592 0.07079 11.3847 0.07075 11.391
x[ij] 0.05566 0.08813 0.6316 0.08808 0.632
Estimated Scale Parameter: 1.001
Number of Iterations: 1
Working Correlation
[,1] [,2]
[1,] 1.0000 0.2336
[2,] 0.2336 1.0000
R> as.character( .Last.value$coef[2,4] )
[1] "0.0880781313014677"
R> summary(gee(y[ij] ~ x[ij], id=id[ij], family=binomial, corstr="independence"))
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) x[ij]
0.80592 0.05566
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Logit
Variance to Mean Relation: Binomial
Correlation Structure: Independent
Call:
gee(formula = y[ij] ~ x[ij], id = id[ij], family = binomial,
corstr = "independence")
Summary of Residuals:
Min 1Q Median 3Q Max
-0.7030 -0.6912 0.2970 0.3088 0.3088
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 0.80592 0.07079 11.385 0.07075 11.391
x[ij] 0.05566 0.10067 0.553 0.08808 0.632
Estimated Scale Parameter: 1.001
Number of Iterations: 1
Working Correlation
[,1] [,2]
[1,] 1 0
[2,] 0 1
R> as.character( .Last.value$coef[2,4] )
[1] "0.0880781313014683"
R> obj = glm(y ~ x, family=binomial)
R> summary(obj)
Call:
glm(formula = y ~ x, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.56 -1.53 0.84 0.86 0.86
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8059 0.0708 11.39 <2e-16
x 0.0557 0.1006 0.55 0.58
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2296.2 on 1871 degrees of freedom
Residual deviance: 2295.8 on 1870 degrees of freedom
AIC: 2300
Number of Fisher Scoring iterations: 4
R> ( tb = table(x,y) )
y
x 0 1
0 289 647
1 278 658
R> ( beta1 = log(tb[4]*tb[1]/tb[2]/tb[3]) )
[1] 0.05566
R> sqrt(sum(1/tb))
[1] 0.1006
R> ( alpha1 = log(tb[3]/tb[1]) )
[1] 0.806
R> sqrt(sum(1/tb[1]+1/tb[3]))
[1] 0.07075
R> ## GEE1 : independence working cov
R> prd = obj$fit
R> res = y-prd
R> cv0 = prd*(1-prd)
R> v1i = diag( 1/c(cv0[1], cv0[N+1]) )
R> Gi = matrix(c(cv0[1],cv0[1],cv0[1+N],0), 2,2)
R> U1 = N*Gi%*%v1i%*%t(Gi)
R> cv1 = matrix(c(sum(res[1:N]^2),sum(res[1:N]*res[N+1:N]), sum(res[1:N]*res[N+1:N]),
+ sum(res[N+1:N]^2)), 2,2)
R> Uv = Gi%*%v1i%*%cv1%*%v1i%*%t(Gi)
R> ## robust cov
R> ( fim = solve(U1)%*%Uv%*%solve(U1) )
[,1] [,2]
[1,] 0.005006 -0.003823
[2,] -0.003823 0.007758
R> as.character( sqrt(diag(fim)) )
[1] "0.0707516972697433" "0.0880781313014295"
R> ## model-based cov
R> solve(U1)
[,1] [,2]
[1,] 0.005006 -0.005006
[2,] -0.005006 0.010123
R> sqrt(diag(solve(U1)))
[1] 0.07075 0.10061
R> ## GEE1 : constant corr in working cov
R> prd = obj$fit
R> res = y-prd
R> cv0 = prd*(1-prd)
R> rho = cor(res[1:N], res[N+1:N])
R> res1 = res/sqrt(cv0)
R> mean(res1[1:N]*res1[N+1:N])
[1] 0.2336
R> cor(y1,y0)
[1] 0.2336
R> v1 = matrix(c(cv0[1], rep(sqrt(cv0[1]*cv0[1+N])*rho,2), cv0[N+1]), 2,2)
R> v1i = solve(v1)
R> Gi = matrix(c(cv0[1],cv0[1],cv0[1+N],0), 2,2)
R> U1 = N*Gi%*%v1i%*%t(Gi)
R> cv1 = matrix(c(sum(res[1:N]^2), rep(sum(res[1:N]*res[N+1:N]),2), sum(res[N+1:N]^2)), 2,2)
R> Uv = Gi%*%v1i%*%cv1%*%v1i%*%t(Gi)
R> ## robust cov
R> ( fim = solve(U1)%*%Uv%*%solve(U1) )
[,1] [,2]
[1,] 0.005006 -0.003823
[2,] -0.003823 0.007758
R> as.character( sqrt(diag(fim)) )
[1] "0.0707516972697433" "0.0880781313014295"
R> ## model-based cov
R> solve(U1)
[,1] [,2]
[1,] 0.005006 -0.003823
[2,] -0.003823 0.007758
R> sqrt(diag(solve(U1)))
[1] 0.07075 0.08808
R> as.character(.Last.value)
[1] "0.0707516972697545" "0.0880781313014486"
R> ## GEE1 : constant OR in working cov
R> prd = obj$fit
R> res = y-prd
R> cv0 = prd*(1-prd)
R> lor1 = glm(y1 ~ y0, family=binomial)$coef[2]
R> eor1 = exp(lor1)
R> c0 = 1 - (prd[1]+prd[1+N])*(1-eor1)
R> u12 = (c0-sqrt(c0^2-4*(eor1-1)*eor1*prd[1]*prd[1+N]))/2/(eor1-1)
R> v12 = u12 - prd[1]*prd[1+N]
R> ## rho :: v12/sqrt(cv0[1]*cv0[N+1])
R> v1 = matrix(c(cv0[1], rep(v12,2), cv0[N+1]), 2,2)
R> v1i = solve(v1)
R> Gi = matrix(c(cv0[1],cv0[1],cv0[1+N],0), 2,2)
R> U1 = N*Gi%*%v1i%*%t(Gi)
R> cv1 = matrix(c(sum(res[1:N]^2), rep(sum(res[1:N]*res[N+1:N]),2), sum(res[N+1:N]^2)), 2,2)
R> Uv = Gi%*%v1i%*%cv1%*%v1i%*%t(Gi)
R> ## robust cov
R> ( fim = solve(U1)%*%Uv%*%solve(U1) )
[,1] [,2]
[1,] 0.005006 -0.003823
[2,] -0.003823 0.007758
R> as.character( sqrt(diag(fim)) )
[1] "0.0707516972697433" "0.0880781313014295"
R> ## model-based cov
R> solve(U1)
[,1] [,2]
[1,] 0.005006 -0.003823
[2,] -0.003823 0.007758
R> sqrt(diag(solve(U1)))
[1] 0.07075 0.08808
R> as.character(.Last.value)
[1] "0.0707516972697545" "0.0880781313014661"
R> ## GEE1 : constant OR in working cov : FS/ALR
R> y1 = rep(c(1,0,1,0), Nc)
R> y0 = rep(c(1,1,0,0), Nc)
R> eor = 1.5
R> ## fisher scoring
R> rcf = 0:1
R> prd = 1/(1+exp(- rcf[1] - rcf[2]*x))
R> cv0 = prd*(1-prd)
R> res = y-prd
R> for(b in 1:5){
+ c0 = 1 - (prd[1]+prd[1+N])*(1-eor)
+ u12 = (c0-sqrt(c0^2-4*(eor-1)*eor*prd[1]*prd[1+N]))/2/(eor-1)
+ v12 = u12 - prd[1]*prd[1+N]
+ v1 = matrix(c(cv0[1], rep(v12,2), cv0[N+1]), 2,2)
+ v1i = solve(v1)
+ Gi = matrix(c(cv0[1],cv0[1],cv0[1+N],0), 2,2)
+ U1 = Gi%*%v1i%*%t(Gi)
+ U0 = Gi%*%v1i%*%c(mean(res[1:N]), mean(res[N+1:N]))
+ dif = solve(U1, U0)
+ rcf = rcf + dif
+ prd = 1/(1+exp(- rcf[1] - rcf[2]*x))
+ res = y-prd
+ cv0 = prd*(1-prd)
+ ## ALR : OR
+ ofs = (prd[1]-u12)/(1-prd[1]-prd[N+1]+u12)
+ lor = glm(y1 ~ y0, offset=rep(ofs,N), family=binomial)$coef[2]
+ eor = exp(lor)
+ cat(b, ":", dif, ",", eor, rcf, "\n")
+ }
1 : 0.765 -0.9077 , 2.885 0.765 0.09229
2 : 0.04065 -0.03632 , 2.885 0.8056 0.05597
3 : 0.0003095 -0.0003056 , 2.885 0.806 0.05566
4 : 1.831e-08 -1.831e-08 , 2.885 0.806 0.05566
5 : 5.976e-17 -3.299e-16 , 2.885 0.806 0.05566
R> ## rob.cov
R> U1 = N*Gi%*%v1i%*%t(Gi)
R> cv1 = matrix(c(sum(res[1:N]^2), rep(sum(res[1:N]*res[N+1:N]),2), sum(res[N+1:N]^2)), 2,2)
R> Uv = Gi%*%v1i%*%cv1%*%v1i%*%t(Gi)
R> ## robust cov
R> ( fim = solve(U1)%*%Uv%*%solve(U1) )
[,1] [,2]
[1,] 0.005864 -0.004810
[2,] -0.004810 0.008892
R> as.character( sqrt(diag(fim)) )
[1] "0.0765740564338148" "0.094298554346957"
R> ## model-based cov
R> solve(U1)
[,1] [,2]
[1,] 0.005006 -0.003823
[2,] -0.003823 0.007758
R> sqrt(diag(solve(U1)))
[1] 0.07075 0.08808
R> as.character(.Last.value)
[1] "0.0707516972697545" "0.0880781313014661"