VARIANCE ESTIMATION                                            SPH 5421 notes.028

     Variance-covariance matrices for maximum-likelihood estimates of parameters are
useful for constructing confidence regions and performing tests.  SAS and Splus
linear and nonlinear regression routines all provide ways to estimate the asymptotic
covariance matrix for covariates.  For maximum likelihood problems in general, the
covariance matrix is the inverse of the 'Fisher Information' matrix.

     Fisher information is the expectation of the negative of the matrix of second
partial derivatives of the log likelihood.  Let l = log(L; B), where L is the
likelihood of an observed data set and B is a vector of parameters. Then

                Fisher Information = E{dl/dbi * dl/dbj}.

     The expectation is taken across all the possible realizations of the data.
There is a theorem which says

[1]             -E{d2l/dbidbj} = E{dl/dbi * dl/dbj}.

(See, e.g., P. Bickel & K. Doksum, Mathematical Statistics: Basic Ideas and
Selected Topics, Holden-Day, Inc., San Francisco, 1977, p 139.)

     Thus Fisher information can be expressed in terms of the expectation of a
matrix of second derivatives.

     The problem with Fisher information is that the expectation is very difficult
to calculate.  What is done in fact is to evaluate the matrix of second derivatives
at the MLE itself.  This works well if you have exact expressions for the second
derivative matrix.  Numerical approximation of the second derivative matrix is possible,
(as described below) but in general it is subject to numerical instability because
of small denominators.

     An approach using a numerical estimate of the second derivative is implemented
in the program below.  In this program, which was discussed originally in notes.019, maximum
likelihood was used to approximate coefficients b0, and b1 for a
logistic risk function.  The covariance matrix was estimated using

     -inv(d2l/dbidbj)

evaluated analytically at the maximum likelihood estimates themselves.

     Numerical approximation of the second derivatives was added to the latter part
of this program.  Here is the underlying idea.  Let l(b0, b1)
represent the log likelihood, where b0 and b1 are the MLEs.
Then the various second derivatives of the log likelihood may be approximated as follows:

     d2l(b0, b1)/db02  = {l(b0 + h, b1) - 2l(b0, b1) + l(b0 - h, b1)} / (h^2)

     d2l(b0, b1)/db12  = {l(b0, b1 + h) - 2l(b0, b1) + l(b0, b1 - h)} / (h^2)

     d2l(b0, b1)/db1db2  = {l(b0 + h, b1 + h) - l(b0 + h, b1 - h) - l(b0 - h, b1 + h) + l(b0 - h, b1 - h)} / (4*h^2)

  These estimates are computed in the following section of the program:

  h = 1e-6 ;
  beta1ph = beta ;
  beta1ph[1] = beta[1] + h ;
  beta1mh = beta ;
  beta1mh[1] = beta[1] - h ;
  beta2ph = beta ;
  beta2ph[2] = beta[2] + h ;
  beta2mh = beta ;
  beta2mh[2] = beta[2] - h ;

  beta12phph = beta ;
  beta12phph[1] = beta[1] + h ;
  beta12phph[2] = beta[2] + h ;

  beta12phmh = beta ;
  beta12phmh[1] = beta[1] + h ;
  beta12phmh[2] = beta[2] - h ;

  beta12mhph = beta ;
  beta12mhph[1] = beta[1] - h ;
  beta12mhph[2] = beta[2] + h ;

  beta12mhmh = beta ;
  beta12mhmh[1] = beta[1] - h ;
  beta12mhmh[2] = beta[2] - h ;

  run loglikeonly(beta, p, x, y, l20h0h) ;

  run loglikeonly(beta1ph, p, x, y, l2ph0h) ;
  run loglikeonly(beta1mh, p, x, y, l2mh0h) ;
  run loglikeonly(beta2ph, p, x, y, l20hph) ;
  run loglikeonly(beta2mh, p, x, y, l20hmh) ;

  run loglikeonly(beta12phph, p, x, y, l2phph) ;
  run loglikeonly(beta12phmh, p, x, y, l2phmh) ;
  run loglikeonly(beta12mhph, p, x, y, l2mhph) ;
  run loglikeonly(beta12mhmh, p, x, y, l2mhmh) ;

  d2l11 = (l2ph0h - 2 * l20h0h + l2mh0h) / h**2 ;
  d2l22 = (l20hph - 2 * l20h0h + l20hmh) / h**2 ;
  d2l12 = (l2phph - l2phmh - l2mhph + l2mhmh) / (4*h*h) ;

  d2lnumer = j(2, 2, 0) ;

  d2lnumer[1, 1] = -d2l11 ;
  d2lnumer[1, 2] = -d2l12 ;
  d2lnumer[2, 1] = -d2l12 ;
  d2lnumer[2, 2] = -d2l22 ;

  covarnumer = inv(d2lnumer) ;

  print "Current value of beta: " beta ;

  print "Numerical Estimate of -Fisher Information: " d2lnumer ;

  print "Numerical Estimate of Covariance Matrix: " covarnumer ;

  beta1 = beta[1] ;
  beta2 = beta[2] ;

  serr1 = sqrt(covarnumer[1, 1]) ;
  serr2 = sqrt(covarnumer[2, 2]) ;

  print "Numerical: beta1, serr1 = " beta1 serr1 ;
  print "Numerical: beta2, serr2 = " beta2 serr2 ;

----------------------------------------------------------------------------------

Note that in this case, h was set equal to 10^(-6).  This is about as small as it is
safe to use with the computer's 64-bit arithmetic, because in the expressions above,
h is squared, meaning that the denominators are of the order of 10^(-12).


/*    Program to compute maximum likelihood estimates ....           */;
/*    Based on Newton-Raphson methods - uses IML                     */;

footnote "program: /home/walleye/john-c/5421/imlml.fisher.sas &sysdate &systime" ;

options linesize = 80 ;

data sim ;

     one = 1 ;
     a = 2 ;
     b = 1 ;
     n = 300 ;
     seed = 20000214 ;

/*   Simulated data from logistic distribution                      */;

     do i = 1 to n ;

        u = 2 * i / n ;
        p = 1 / (1 + exp(-a - b * u)) ;
        r = ranuni(seed) ;
        yi = 0 ;
        if r lt p then yi = 1 ;
        output ;

     end ;

run ;


/*   Logistic analysis of simulated data ......                     */;

proc logistic descending ;
     model yi = u ;
title1 'Proc logistic results ... ' ;
run ;

/*   Begin proc iml ..............................                    */;

title1 'PROC IML Results ...' ;

proc iml ;

  use sim ;                        /* Input the simulated dataset     */;

  read all var {one u} into x ;    /* Read the design matrix into x   */;
  read all var {yi}    into y ;    /* Read the outcome data into y    */;

  p = 2 ;                          /* Set the number of parameters    */;

  beta = {0.00, 0.00} ;            /* Initialize the param. vector    */;


/* -------------------------------------------------------------------*/;
/*   Module which computes loglikelihood and derivatives ...          */;

  start loglike(beta, p, x, y, l, dl, d2l) ;

     n = 300 ;

     l = 0 ;                       /*  Initialize log likelihood ...  */;
     dl  = j(p, 1, 0) ;            /*  Initialize 1st derivatives...  */;
     d2l = j(p, p, 0) ;            /*  Initialize 2nd derivatives...  */;

     do i = 1 to n ;

        xi = x[i,] ;
        yi = y[i] ;
        xibeta = xi * beta ;

        w = exp(-xibeta) ;

/*      The log likelihood for the i-th observation ...               */;

        iloglike = log((1 - yi) * w + yi) - log(1 + w) ;

        l = l + iloglike ;

        do j = 1 to p ;

           xij = xi[j] ;

/*      The jth 1st derivative of the log likelihood for the ith obs      */;

           jdlogl = -(1 - yi) * w * xij / ((1 - yi) * w + yi)
                    + w * xij / (1 + w) ;

           dtemp = dl[j] + jdlogl ;
           dl[j] = dtemp ;

           do k = 1 to p ;

              xik = xi[k] ;

/*      The jkth 2nd derivative of the log likelihood for the ith obs    */;

              jkd2logl = ((1 - yi) * w * xij * xik * ((1 - yi) * w + yi)
                         -(1 - yi) * w * xij * ((1 - yi) * w * xik))/
                          ((1 - yi) * w + yi)**2
                       + (-w * xij * xik * (1 + w) + w * w * xij * xik)/
                          (1 + w)**2 ;

              d2temp = d2l[j, k] ;
              d2l[j, k] = d2temp + jkd2logl ;

           end ;

        end ;

     end ;

 finish loglike ;

/* -------------------------------------------------------------------*/;

  start loglikeonly(beta, p, x, y, l) ;

     n = 300 ;

     l = 0 ;                       /*  Initialize log likelihood ...  */;

     do i = 1 to n ;

        xi = x[i,] ;
        yi = y[i] ;
        xibeta = xi * beta ;

        w = exp(-xibeta) ;

/*      The log likelihood for the i-th observation ...               */;

        iloglike = log((1 - yi) * w + yi) - log(1 + w) ;

        l = l + iloglike ;

     end ;

  finish loglikeonly ;

/* -------------------------------------------------------------------*/;

 eps = 1e-8 ;
 diff = 1 ;

/*  The following do loop stops when the increments in beta are       */;
/*  sufficiently small, or when number of iterations reaches 20       */;

 do iter = 1 to 20 while(diff > eps) ;

    run loglike(beta, p, x, y, l, dl, d2l) ;

    invd2l = inv(d2l) ;
    beta = beta - invd2l * dl ;       /*  The key Newton step ...  */;

    diff = max(abs(invd2l * dl)) ;

    print iter  l  dl  d2l  diff  beta ;

 end ;

 b1 = beta[1] ;
 b2 = beta[2] ;

 serr1 = sqrt(-invd2l[1, 1]) ;
 serr2 = sqrt(-invd2l[2, 2]) ;
 covar = -invd2l ;
 llratio = - 2 * l ;

 print ' -2 * loglikelihood = ' llratio ;

 print 'b1 coeff, std err : ' b1 serr1 ;
 print 'b2 coeff, std err : ' b2 serr2 ;

 print 'covariance matrix : ' covar ;

  h = 1e-6 ;
  beta1ph = beta ;
  beta1ph[1] = beta[1] + h ;
  beta1mh = beta ;
  beta1mh[1] = beta[1] - h ;
  beta2ph = beta ;
  beta2ph[2] = beta[2] + h ;
  beta2mh = beta ;
  beta2mh[2] = beta[2] - h ;

  beta12phph = beta ;
  beta12phph[1] = beta[1] + h ;
  beta12phph[2] = beta[2] + h ;

  beta12phmh = beta ;
  beta12phmh[1] = beta[1] + h ;
  beta12phmh[2] = beta[2] - h ;

  beta12mhph = beta ;
  beta12mhph[1] = beta[1] - h ;
  beta12mhph[2] = beta[2] + h ;

  beta12mhmh = beta ;
  beta12mhmh[1] = beta[1] - h ;
  beta12mhmh[2] = beta[2] - h ;

  run loglikeonly(beta, p, x, y, l20h0h) ;

  run loglikeonly(beta1ph, p, x, y, l2ph0h) ;
  run loglikeonly(beta1mh, p, x, y, l2mh0h) ;
  run loglikeonly(beta2ph, p, x, y, l20hph) ;
  run loglikeonly(beta2mh, p, x, y, l20hmh) ;

  run loglikeonly(beta12phph, p, x, y, l2phph) ;
  run loglikeonly(beta12phmh, p, x, y, l2phmh) ;
  run loglikeonly(beta12mhph, p, x, y, l2mhph) ;
  run loglikeonly(beta12mhmh, p, x, y, l2mhmh) ;

  d2l11 = (l2ph0h - 2 * l20h0h + l2mh0h) / h**2 ;
  d2l22 = (l20hph - 2 * l20h0h + l20hmh) / h**2 ;
  d2l12 = (l2phph - l2phmh - l2mhph + l2mhmh) / (4*h*h) ;

  d2lnumer = j(2, 2, 0) ;

  d2lnumer[1, 1] = -d2l11 ;
  d2lnumer[1, 2] = -d2l12 ;
  d2lnumer[2, 1] = -d2l12 ;
  d2lnumer[2, 2] = -d2l22 ;

  covarnumer = inv(d2lnumer) ;

  print "Current value of beta: " beta ;

  print "Numerical Estimate of -Fisher Information: " d2lnumer ;

  print "Numerical Estimate of Covariance Matrix: " covarnumer ;

  beta1 = beta[1] ;
  beta2 = beta[2] ;

  serr1 = sqrt(covarnumer[1, 1]) ;
  serr2 = sqrt(covarnumer[2, 2]) ;

  print "Numerical: beta1, serr1 = " beta1 serr1 ;
  print "Numerical: beta2, serr2 = " beta2 serr2 ;

quit ;

--------------------------------------------------------------------------------------------------
                           Proc logistic results ...                           1
                                              17:34 Wednesday, November 30, 2011

                             The LOGISTIC Procedure

                               Model Information

                 Data Set                      WORK.SIM        
                 Response Variable             yi              
                 Number of Response Levels     2               
                 Number of Observations        300             
                 Model                         binary logit    
                 Optimization Technique        Fisher's scoring


                                Response Profile
 
                       Ordered                      Total
                         Value           yi     Frequency

                             1            1           278
                             2            0            22

                          Probability modeled is yi=1.


                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.          


                             Model Fit Statistics
 
                                                  Intercept
                                   Intercept         and   
                    Criterion        Only        Covariates

                    AIC              159.306        153.781
                    SC               163.010        161.188
                    -2 Log L         157.306        149.781


                    Testing Global Null Hypothesis: BETA=0
 
            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio         7.5255        1         0.0061
            Score                    7.2657        1         0.0070
            Wald                     6.7583        1         0.0093


 
 
 
 
 
 
 
 
           program: /home/walleye/john-c/5421/imlml.fisher.sas 30NOV11 17:34
                           Proc logistic results ...                           2
                                              17:34 Wednesday, November 30, 2011

                             The LOGISTIC Procedure

                   Analysis of Maximum Likelihood Estimates
 
                                     Standard          Wald
      Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

      Intercept     1      1.5917      0.3766       17.8686        <.0001
      u             1      1.1107      0.4272        6.7583        0.0093


                              Odds Ratio Estimates
                                        
                                Point          95% Wald
                   Effect    Estimate      Confidence Limits

                   u            3.036       1.314       7.015


         Association of Predicted Probabilities and Observed Responses

               Percent Concordant     66.7    Somers' D    0.344
               Percent Discordant     32.3    Gamma        0.348
               Percent Tied            1.0    Tau-a        0.047
               Pairs                  6116    c            0.672


                          Estimated Covariance Matrix
 
                     Variable       Intercept             u

                     Intercept        0.14179       -0.1292
                     u                -0.1292       0.18254
 
 
           program: /home/walleye/john-c/5421/imlml.fisher.sas 30NOV11 17:34
                              PROC IML Results ...                             3
                                              17:34 Wednesday, November 30, 2011

          ITER         L        DL       D2L                DIFF      BETA

             1 -207.9442       128       -75    -75.25   1.42466   1.42466
                         135.45333    -75.25 -100.5006           0.2810698


          ITER         L        DL       D2L                DIFF      BETA

             2 -86.11339 24.433947  -39.1126  -36.7227  0.382878 1.6898849
                         27.881783  -36.7227 -47.38327           0.6639477


          ITER         L        DL       D2L                DIFF      BETA

             3 -76.10077 5.4183855  -24.6185 -20.35354 0.3405272 1.6284455
                         7.0927236 -20.35354 -24.50093           1.0044749


          ITER         L        DL       D2L                DIFF      BETA

             4 -74.93779 0.8082239 -20.61113 -15.14088 0.1008562 1.5935698
                         1.1867474 -15.14088 -17.00239           1.1053311


          ITER         L        DL       D2L                DIFF      BETA

             5 -74.89053   0.04008 -19.90169 -14.11319 0.0054776 1.5916993
                         0.0585386 -14.11319 -15.50634           1.1108087


          ITER         L        DL       D2L                DIFF      BETA

             6  -74.8904 0.0001116  -19.8646 -14.05981 0.0000151 1.5916942
                         0.0001618 -14.05981  -15.4291           1.1108238


          ITER         L        DL       D2L                DIFF      BETA

             7  -74.8904  8.55E-10 -19.86449 -14.05967 1.157E-10 1.5916942
                         1.2388E-9 -14.05967 -15.42889           1.1108238


                                                 LLRATIO

                         -2 * loglikelihood =  149.78081


                                                B1     SERR1

                    b1 coeff, std err :  1.5916942  0.376554


 
 
 
 
           program: /home/walleye/john-c/5421/imlml.fisher.sas 30NOV11 17:34
                              PROC IML Results ...                             4
                                              17:34 Wednesday, November 30, 2011

                                                B2     SERR2

                    b2 coeff, std err :  1.1108238 0.4272664


                                             COVAR

                    covariance matrix :  0.1417929  -0.12921
                                          -0.12921 0.1825565


                                                    BETA

                       Current value of beta:  1.5916942
                                               1.1108238


                                                     D2LNUMER

        Numerical Estimate of -Fisher Information:  19.753088 14.061641
                                                    14.061641 15.319301


                                                   COVARNUMER

         Numerical Estimate of Covariance Matrix:   0.1460737 -0.134082
                                                    -0.134082 0.1883511


                                                BETA1     SERR1

                 Numerical: beta1, serr1 =  1.5916942 0.3821959


                                                BETA2     SERR2

                 Numerical: beta2, serr2 =  1.1108238 0.4339943
 
--------------------------------------------------------------------------------------------------

Here is a comparison of the resulting covariance matrices and resulting standard errors of
parameter estimates:

   Estimates  of the Covariance Matrix

   From PROC LOGISTIC                          From -d2l / db2:            From Numerical Approx
                                                                                of -d2l/db2
   --------------------------------------     ----------------------       ---------------------
   Variable       Intercept             u     Intercept       u            Intercept       u

   Intercept        0.14179       -0.1292     0.14179        -0.1291       0.1461        -0.1341
   u                -0.1292       0.18254     -0.1292         0.1856      -0.1341         0.1856
                  -----------------------     ----------------------     -----------------------

   Std Err Intcpt:  0.3766                    0.3766                       0.3822
   Std Err b1 (u):  0.4272                    0.4273                       0.4340


   Conclusion: In this case numerical approximation of 2nd derivatives worked reasonably
well, but it does not agree exactly with the MLE computation based on the analytic
second derivative matrix.

   This method has the advantage that it can be used even in cases where the second
derivative is not used in the estimating program: e.g., when the EM algorithm is used.
All that is needed is the ability to compute the log likelihood of the observed data.


           program: /home/walleye/john-c/5421/imlml.fisher.sas 30NOV11 17:34

==================================================================================
/home/walleye/john-c/5421/notes.028   Last update: November 30, 2011.