GEE inference of repeated measurement data


Some GEE references
  1. Kung-Yee Liang, Scott Zeger (1986). Longitudinal Data Analysis Using Generalized Linear Models. Biometrika, 73(1), 13-22. http://www.jstor.org/stable/2336267
  2. Scott Zeger, Kung-Yee Liang (1986). Longitudinal Data Analysis for Discrete and Continuous Outcomes. Biometrics, 42(1), 121-130. http://www.jstor.org/stable/2531248
  3. Scott Zeger; Kung-Yee Liang; Paul Albert (1988). Models for Longitudinal Data: A Generalized Estimating Equation Approach. Biometrics, 44(4), 1049-1060. http://www.jstor.org/stable/2531734
  4. Paul Burton, Lyle Gurrin, Peter Sly (1998). Extending the simple linear regression model to account for correlated responses: An introduction to generalized estimating equations and multi-level mixed modelling. Statistics in Medicine, 17(11), 1261-1291. http://dx.doi.org/10.1002/(SICI)1097-0258(19980615)17:11<1261::AID-SIM846>3.0.CO;2-Z
  5. Kung-Yee Liang, Scott Zeger and Hahjat Qaqish (1992). Multivariate regression analyses for categorial data. JRSSB, 54(1), 3-40. http://www.jstor.org/stable/2345947
  6. Vincent Carey, Scott Zeger , and Peter Diggle (1993). Modelling multivariate binary data with alternating logistic regressions. Biometrika 80: 517-526. doi:10.1093/biomet/80.3.517
  7. Lue Ping Zhao; Ross Prentice (1990). Correlated Binary Regression Using a Quadratic Exponential Model. Biometrika, 77(3), 642-648. http://www.jstor.org/stable/2337004
  8. Ross Prentice; Lue Ping Zhao (1991). Estimating Equations for Parameters in Means and Covariances of Multivariate Discrete and Continuous Responses. Biometrics, 47(3), 825-839. http://www.jstor.org/stable/2532642
  9. Lue Ping Zhao; Ross Prentice; Steven Self (1992). Multivariate Mean Parameter Estimation by Using a Partly Exponential Model. JRSSB, 54(3), 805-811. http://www.jstor.org/stable/2345860
  10. Heagerty, P.J. and Zeger, S.L. (1996) Marginal regression models for clustered ordinal measurements. JASA, 91, 1024-1036. http://www.jstor.org/stable/2291722

GEE for matched pair analysis
##  data, P. 202
R> Nc = matrix(c(501,146,157,132),2,2)
R> N = sum(Nc)
R> y1 = rep(c(1,0,1,0), Nc)
R> y0 = rep(c(1,1,0,0), Nc)
R> ( tb1 = table(y1,y0) )
   y0
y1    0   1
  0 132 146
  1 157 501
R> y = c(y1,y0)
R> x = rep(1:0, each=N)
R> id = rep(1:N, 2)


R> ## QEF: dependence modeling with MLE
R> p = tb1[1:4]/N
R> eor = (p[2]+p[4])/(p[1]+p[3])/((p[3]+p[4])/(p[1]+p[2]))
R> cv1 = (diag(p[-4])-outer(p[-4], p[-4]))/N
R> r1 = c(-1/(p[2]+p[4])-1/(p[1]+p[3]) + 1/(p[3]+p[4]) + 1/(p[1]+p[2]),
+   1/(p[3]+p[4]) + 1/(p[1]+p[2]), -1/(p[2]+p[4]) - 1/(p[1]+p[3]))
R> sqrt(r1%*%cv1%*%r1)
           [,1]
[1,] 0.08807813
R> as.character( .Last.value )
[1] "0.0880781313014677"
R> log(eor)/sqrt(r1%*%cv1%*%r1)
          [,1]
[1,] 0.6319867
R> 2*pnorm(-abs(.Last.value))
          [,1]
[1,] 0.5273955
R> (p[4]-(p[2]+p[4])*(p[3]+p[4]))/sqrt((p[2]+p[4])*(p[1]+p[3]))/sqrt((p[3]+p[4])*(p[1]+p[2]))
[1] 0.2336404
R> cor(y1,y0)
[1] 0.2336404
R> p[1]*p[4]/p[2]/p[3]
[1] 2.885089
R> exp(glm(y1 ~ y0, family=binomial)$coef[2])
      y0
2.885089



R> ## GEE : explicit fitting of marginal prob
R> library(geepack)
R> ij = order(id)
R> summary( geeglm(y[ij] ~ x[ij], family=binomial, id=id[ij]) )
Call:
geeglm(formula = y[ij] ~ x[ij], family = binomial, id = id[ij])

 Coefficients:
            Estimate Std.err    Wald Pr(>|W|)
(Intercept)  0.80592 0.07074 129.784   <2e-16
x[ij]        0.05566 0.08817   0.399    0.528

Estimated Scale Parameters:
            Estimate Std.err
(Intercept)        1 0.02567

Correlation: Structure = independenceNumber of clusters:   936   Maximum cluster size: 2
R> as.character( .Last.value$coef[2,2] )
[1] "0.088172225982748"
R> library(gee)
R> summary(gee(y[ij] ~ x[ij], id=id[ij], family=binomial, corstr="unstructured"))
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept)       x[ij]
    0.80592     0.05566

 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998)

Model:
 Link:                      Logit
 Variance to Mean Relation: Binomial
 Correlation Structure:     Unstructured

Call:
gee(formula = y[ij] ~ x[ij], id = id[ij], family = binomial,
    corstr = "unstructured")

Summary of Residuals:
    Min      1Q  Median      3Q     Max
-0.7030 -0.6912  0.2970  0.3088  0.3088


Coefficients:
            Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept)  0.80592    0.07079 11.3847     0.07075   11.391
x[ij]        0.05566    0.08813  0.6316     0.08808    0.632

Estimated Scale Parameter:  1.001
Number of Iterations:  1

Working Correlation
       [,1]   [,2]
[1,] 1.0000 0.2336
[2,] 0.2336 1.0000
R> as.character( .Last.value$coef[2,4] )
[1] "0.0880781313014677"
R> summary(gee(y[ij] ~ x[ij], id=id[ij], family=binomial, corstr="independence"))
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept)       x[ij]
    0.80592     0.05566

 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998)

Model:
 Link:                      Logit
 Variance to Mean Relation: Binomial
 Correlation Structure:     Independent

Call:
gee(formula = y[ij] ~ x[ij], id = id[ij], family = binomial,
    corstr = "independence")

Summary of Residuals:
    Min      1Q  Median      3Q     Max
-0.7030 -0.6912  0.2970  0.3088  0.3088


Coefficients:
            Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept)  0.80592    0.07079  11.385     0.07075   11.391
x[ij]        0.05566    0.10067   0.553     0.08808    0.632

Estimated Scale Parameter:  1.001
Number of Iterations:  1

Working Correlation
     [,1] [,2]
[1,]    1    0
[2,]    0    1
R> as.character( .Last.value$coef[2,4] )
[1] "0.0880781313014683"


R> obj = glm(y ~ x, family=binomial)
R> summary(obj)

Call:
glm(formula = y ~ x, family = binomial)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
 -1.56   -1.53    0.84    0.86    0.86

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.8059     0.0708   11.39   <2e-16
x             0.0557     0.1006    0.55     0.58

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2296.2  on 1871  degrees of freedom
Residual deviance: 2295.8  on 1870  degrees of freedom
AIC: 2300

Number of Fisher Scoring iterations: 4

R> ( tb = table(x,y) )
   y
x     0   1
  0 289 647
  1 278 658
R> ( beta1 = log(tb[4]*tb[1]/tb[2]/tb[3]) )
[1] 0.05566
R> sqrt(sum(1/tb))
[1] 0.1006
R> ( alpha1 = log(tb[3]/tb[1]) )
[1] 0.806
R> sqrt(sum(1/tb[1]+1/tb[3]))
[1] 0.07075



R> ## GEE1 : independence working cov
R> prd = obj$fit
R> res = y-prd
R> cv0 = prd*(1-prd)
R> v1i = diag( 1/c(cv0[1], cv0[N+1]) )
R> Gi = matrix(c(cv0[1],cv0[1],cv0[1+N],0), 2,2)
R> U1 = N*Gi%*%v1i%*%t(Gi)
R> cv1 = matrix(c(sum(res[1:N]^2),sum(res[1:N]*res[N+1:N]), sum(res[1:N]*res[N+1:N]),
+   sum(res[N+1:N]^2)), 2,2)
R> Uv = Gi%*%v1i%*%cv1%*%v1i%*%t(Gi)
R> ## robust cov
R> ( fim = solve(U1)%*%Uv%*%solve(U1) )
          [,1]      [,2]
[1,]  0.005006 -0.003823
[2,] -0.003823  0.007758
R> as.character( sqrt(diag(fim)) )
[1] "0.0707516972697433" "0.0880781313014295"
R> ## model-based cov
R> solve(U1)
          [,1]      [,2]
[1,]  0.005006 -0.005006
[2,] -0.005006  0.010123
R> sqrt(diag(solve(U1)))
[1] 0.07075 0.10061


R> ## GEE1 : constant corr in working cov
R> prd = obj$fit
R> res = y-prd
R> cv0 = prd*(1-prd)
R> rho = cor(res[1:N], res[N+1:N])
R> res1 = res/sqrt(cv0)
R> mean(res1[1:N]*res1[N+1:N])
[1] 0.2336
R> cor(y1,y0)
[1] 0.2336
R> v1 = matrix(c(cv0[1], rep(sqrt(cv0[1]*cv0[1+N])*rho,2), cv0[N+1]), 2,2)
R> v1i = solve(v1)
R> Gi = matrix(c(cv0[1],cv0[1],cv0[1+N],0), 2,2)
R> U1 = N*Gi%*%v1i%*%t(Gi)
R> cv1 = matrix(c(sum(res[1:N]^2), rep(sum(res[1:N]*res[N+1:N]),2), sum(res[N+1:N]^2)), 2,2)
R> Uv = Gi%*%v1i%*%cv1%*%v1i%*%t(Gi)
R> ## robust cov
R> ( fim = solve(U1)%*%Uv%*%solve(U1) )
          [,1]      [,2]
[1,]  0.005006 -0.003823
[2,] -0.003823  0.007758
R> as.character( sqrt(diag(fim)) )
[1] "0.0707516972697433" "0.0880781313014295"
R> ## model-based cov
R> solve(U1)
          [,1]      [,2]
[1,]  0.005006 -0.003823
[2,] -0.003823  0.007758
R> sqrt(diag(solve(U1)))
[1] 0.07075 0.08808
R> as.character(.Last.value)
[1] "0.0707516972697545" "0.0880781313014486"


R> ## GEE1 : constant OR in working cov
R> prd = obj$fit
R> res = y-prd
R> cv0 = prd*(1-prd)
R> lor1 = glm(y1 ~ y0, family=binomial)$coef[2]
R> eor1 = exp(lor1)
R> c0 = 1 - (prd[1]+prd[1+N])*(1-eor1)
R> u12 = (c0-sqrt(c0^2-4*(eor1-1)*eor1*prd[1]*prd[1+N]))/2/(eor1-1)
R> v12 = u12 - prd[1]*prd[1+N]
R> ## rho :: v12/sqrt(cv0[1]*cv0[N+1])
R> v1 = matrix(c(cv0[1], rep(v12,2), cv0[N+1]), 2,2)
R> v1i = solve(v1)
R> Gi = matrix(c(cv0[1],cv0[1],cv0[1+N],0), 2,2)
R> U1 = N*Gi%*%v1i%*%t(Gi)
R> cv1 = matrix(c(sum(res[1:N]^2), rep(sum(res[1:N]*res[N+1:N]),2), sum(res[N+1:N]^2)), 2,2)
R> Uv = Gi%*%v1i%*%cv1%*%v1i%*%t(Gi)
R> ## robust cov
R> ( fim = solve(U1)%*%Uv%*%solve(U1) )
          [,1]      [,2]
[1,]  0.005006 -0.003823
[2,] -0.003823  0.007758
R> as.character( sqrt(diag(fim)) )
[1] "0.0707516972697433" "0.0880781313014295"
R> ## model-based cov
R> solve(U1)
          [,1]      [,2]
[1,]  0.005006 -0.003823
[2,] -0.003823  0.007758
R> sqrt(diag(solve(U1)))
[1] 0.07075 0.08808
R> as.character(.Last.value)
[1] "0.0707516972697545" "0.0880781313014661"


R> ## GEE1 : constant OR in working cov : FS/ALR
R> y1 = rep(c(1,0,1,0), Nc)
R> y0 = rep(c(1,1,0,0), Nc)
R> eor = 1.5
R> ## fisher scoring
R> rcf = 0:1
R> prd = 1/(1+exp(- rcf[1] - rcf[2]*x))
R> cv0 = prd*(1-prd)
R> res = y-prd
R> for(b in 1:5){
+   c0 = 1 - (prd[1]+prd[1+N])*(1-eor)
+   u12 = (c0-sqrt(c0^2-4*(eor-1)*eor*prd[1]*prd[1+N]))/2/(eor-1)
+   v12 = u12 - prd[1]*prd[1+N]
+   v1 = matrix(c(cv0[1], rep(v12,2), cv0[N+1]), 2,2)
+   v1i = solve(v1)
+   Gi = matrix(c(cv0[1],cv0[1],cv0[1+N],0), 2,2)
+   U1 = Gi%*%v1i%*%t(Gi)
+   U0 = Gi%*%v1i%*%c(mean(res[1:N]), mean(res[N+1:N]))
+   dif = solve(U1, U0)
+   rcf = rcf + dif
+   prd = 1/(1+exp(- rcf[1] - rcf[2]*x))
+   res = y-prd
+   cv0 = prd*(1-prd)
+   ## ALR : OR
+   ofs = (prd[1]-u12)/(1-prd[1]-prd[N+1]+u12)
+   lor = glm(y1 ~ y0, offset=rep(ofs,N), family=binomial)$coef[2]
+   eor = exp(lor)
+   cat(b, ":", dif, ",", eor, rcf, "\n")
+ }
1 : 0.765 -0.9077 , 2.885 0.765 0.09229
2 : 0.04065 -0.03632 , 2.885 0.8056 0.05597
3 : 0.0003095 -0.0003056 , 2.885 0.806 0.05566
4 : 1.831e-08 -1.831e-08 , 2.885 0.806 0.05566
5 : 5.976e-17 -3.299e-16 , 2.885 0.806 0.05566
R> ## rob.cov
R> U1 = N*Gi%*%v1i%*%t(Gi)
R> cv1 = matrix(c(sum(res[1:N]^2), rep(sum(res[1:N]*res[N+1:N]),2), sum(res[N+1:N]^2)), 2,2)
R> Uv = Gi%*%v1i%*%cv1%*%v1i%*%t(Gi)
R> ## robust cov
R> ( fim = solve(U1)%*%Uv%*%solve(U1) )
          [,1]      [,2]
[1,]  0.005864 -0.004810
[2,] -0.004810  0.008892
R> as.character( sqrt(diag(fim)) )
[1] "0.0765740564338148" "0.094298554346957"
R> ## model-based cov
R> solve(U1)
          [,1]      [,2]
[1,]  0.005006 -0.003823
[2,] -0.003823  0.007758
R> sqrt(diag(solve(U1)))
[1] 0.07075 0.08808
R> as.character(.Last.value)
[1] "0.0707516972697545" "0.0880781313014661"