~john-c/5421/zeroinflatedpoisson.notes

Below are two programs and their output which analyze parameters for
zero-inflated Poisson data.  Here the assumption is that the data are
from a mixture of two populations: one which yields only 0's as measurements,
and the other where the measurements are Poisson-distributed counts with
mean lambda.

It is assumed that the mixing parameter is alpha, that is, that a
proportion alpha of the measurements come from Population 1 and a
proportion (1 - alpha) come from Population 2.

Note that people who have measurement 0 can come from either of
the two populations: 0 is a possible value for a Poisson distribution.

Thus the likelihood for an observation Y is either:

     alpha + (1 - alpha) * exp(-lambda)          if Y = 0, or

     (1 - alpha) * lambda^Y * exp(-lambda) / Y!  if Y > 0.

The first program is just a Gauss-Newton maximum likelihood program.

The second program is based on the EM algorithm.  This program does
not operate on the original data, but on the summary counts.  The
EM algorithm is usually applied when some data are missing.  A crude
version of the EM idea is to (1) impute the expected values of the missing 
data from an initial set of parameters, to yield 'pseudo-complete' data, 
and (2) compute the next iteration of the parameters by maximizing the
function resulting from Step (1).  These are the 'E' and 'M' steps
respectively.

In this EM example, the missing data are the numbers of people from Population 1
and Population 2 who have measured values of 0.  These numbers are denoted
by xa and xb respectively in the program.  Note that the data give the total
number of people with the measured value of 0, but they do not tell you
how many came from each population.  That is, xa + xb = totalzeroes is
known, but xa and xb themselves are not known.  In the E-step, each of
xa and xb is estimated; in the M-step, parameters alpha and lambda are
chosen to maximize the pseudo-complete likelihood.

Both of these program give the same estimates for the parameters
alpha and lambda.  They also give the same estimates as can be obtained
from PROC GENMOD using an option on the MODEL statement called: dist = zip.

Note that the Gauss-Newton program converged in 11 iterations, while 27
iterations were required for the EM algorithm.  Typically mixture problems
present computational difficulties which often prevent convergence of
Gauss-Newton procedures.  The EM algorithm typically converges but often
very slowly.

----------------------------------------------------------------------------
options linesize = 100 ;
footnote "~john-c/5421/zeroinflatedpoisson.sas &sysdate &systime" ;

*                                                                    ;
*  This program estimates parameters for zero-inflated Poisson using ;
*  a Gauss-Newton algorithm ...                                      ;
*                                                                    ;

%let numobs = 300 ;
%let truealpha = .4 ;
%let truelambda = 2.0 ;
%let seed = 20111118 ;

data zeroinfpoisson ;

     alpha = &truealpha ;
     n = &numobs;
     lambda = &truelambda ;

     do i = 1 to n ;

        r = ranuni(&seed) ;
        if r < alpha then do ;

           count = 0 ;
           output ;

        end ;

        if r >= alpha then do ;

           count = ranpoi(&seed, lambda) ;

           output ;

         end ;

    end ;

run ;

proc freq data = zeroinfpoisson ;
     tables count ;
run ;

proc iml ;

     title1 'PROC IML MLE for zero-inflated Poisson ...' ;
     use zeroinfpoisson ;
     read all var {count} into t ;

     n = nrow(t) ;

     p = 2 ;
     eps = 1e-8 ;
     diff = 99999 ;
     oldl = -1e20 ;
     evals = 0 ;
     beta = {.5, 2.0} ;
     e = j(n, 1, 1) ;


start loglike(n, beta, p, t, l, d1l) ;

   l = 0 ;
   d1l = j(n, p, 0) ;

   do i = 1 to n ;

      ti = t[i] ;
      alpha = beta[1] ;
      lambda = beta[2] ;
      emlambda = exp(-lambda) ;

      if ti = 0 then like = alpha + (1 - alpha) * exp(-lambda) ;
      if ti > 0 then like = (1 - alpha) * (lambda**ti) * exp(-lambda) ;
      if ti = 0 then loglike = log(alpha + (1 - alpha) * exp(-lambda)) ;
      if ti > 0 then loglike = log(1 - alpha) + ti * log(lambda) - lambda ;

      l = l + loglike ;

      if ti = 0 then d1ldalpha = (1 - exp(-lambda)) / (alpha + (1 - alpha) * exp(-lambda)) ;
      if ti > 0 then d1ldalpha = -1 / (1 - alpha) ;

      if ti = 0 then d1ldlambda = -(1 - alpha) * exp(-lambda) / like ;

      if ti > 0 then d1ldlambda = ti / lambda - 1 ;

      d1l[i, 1] = d1ldalpha ;
      d1l[i, 2] = d1ldlambda ;

   end ;

 finish loglike ;

 run loglike(n, beta, p, t, l, d1l) ;
 print "Start values:" beta l ;

 maxderiv = 99999 ;

 do iter = 1 to 50 while (diff > eps & maxderiv > eps) ;

    run loglike(n, beta, p, t, l, d1l) ;
    delta = inv(d1l` * d1l) * (d1l` * e) ;

    oldbeta = beta ;
    beta = beta + delta ;

    diff = max(abs(delta)) ;
    maxderiv = max(abs(d1l` * e)) ;

    print iter oldbeta beta l diff maxderiv ;

 end ;

 if (diff > eps) then do ;

    print "No convergence after " iter " iterations ..." ;

 end ;


 if (diff < eps)  then do ;

    covar = inv(d1l` * d1l) ;
    serrbeta1 = sqrt(covar[1, 1]) ;
    serrbeta2 = sqrt(covar[2, 2]) ;

    print "---------------------------------------------" ;
    print "Number of observations: " &numobs ;
    print "Convergence achieved ..." ;
    print "Final Parameter Estimates: " ;
    beta1hat = beta[1] ; quotient1 = beta1hat / serrbeta1 ;
    beta2hat = beta[2] ; quotient2 = beta2hat / serrbeta2 ;

    print "beta1hat = " beta1hat "  serr(beta1hat) = " serrbeta1 "  beta1hat/serrbeta1 = " quotient1 ;
    print "beta2hat = " beta2hat "  serr(beta2hat) = " serrbeta2 "  beta2hat/serrbeta2 = " quotient2 ;
    print "True values of probzero and lambda = " &truealpha &truelambda ;
    m2logl = -2 * l ;
    print "-2 * log(likelihood) = " m2logl ;
    print "---------------------------------------------" ;

  end ;

quit ;

proc freq data = zeroinfpoisson ;
     tables count ;
run ;

proc genmod data = zeroinfpoisson ;
     model count = / dist = zip ;
     zeromodel / link = logit ;
title 'PROC GENMOD for zero-inflated Poisson ...' ;
run ;


                             PROC IML MLE for zero-inflated Poisson ...                            2
                                                                     18:44 Monday, November 21, 2011

                                                    beta         l

                                 Start values:       0.5 -237.5108
                                                       2          


                         iter   oldbeta      beta         l      diff  maxderiv

                            1       0.5 0.4677629 -237.5108 0.0773609  36.28987
                                      2 2.0773609                              


                         iter   oldbeta      beta         l      diff  maxderiv

                            2 0.4677629 0.4661239 -236.6476 0.0050742 1.1578301
                              2.0773609 2.0722867                              


                         iter   oldbeta      beta         l      diff  maxderiv

                            3 0.4661239 0.4661958 -236.6464  0.000997 0.0455482
                              2.0722867 2.0732837                              


                         iter   oldbeta      beta         l      diff  maxderiv

                            4 0.4661958 0.4661806 -236.6463 0.0001956 0.0088474
                              2.0732837 2.0730881                              


                         iter   oldbeta      beta         l      diff  maxderiv

                            5 0.4661806 0.4661836 -236.6463 0.0000385 0.0017428
                              2.0730881 2.0731266                              


                         iter   oldbeta      beta         l      diff  maxderiv

                            6 0.4661836  0.466183 -236.6463 7.5617E-6 0.0003425
                              2.0731266  2.073119                              


                         iter   oldbeta      beta         l      diff  maxderiv

                            7  0.466183 0.4661831 -236.6463 1.4867E-6 0.0000673
                               2.073119 2.0731205                              


                         iter   oldbeta      beta         l      diff  maxderiv

                            8 0.4661831 0.4661831 -236.6463 2.9229E-7 0.0000132
                              2.0731205 2.0731202                              
 
 
                         ~john-c/5421/zeroinflatedpoisson.sas 21NOV11 18:44
                             PROC IML MLE for zero-inflated Poisson ...                            3
                                                                     18:44 Monday, November 21, 2011

                         iter   oldbeta      beta         l      diff  maxderiv

                            9 0.4661831 0.4661831 -236.6463 5.7465E-8 2.6031E-6
                              2.0731202 2.0731203                              


                         iter   oldbeta      beta         l      diff  maxderiv

                           10 0.4661831 0.4661831 -236.6463 1.1298E-8 5.1177E-7
                              2.0731203 2.0731203                              


                         iter   oldbeta      beta         l      diff  maxderiv

                           11 0.4661831 0.4661831 -236.6463 2.2212E-9 1.0062E-7
                              2.0731203 2.0731203                              


                           ---------------------------------------------


                                 Number of observations:        300


                                      Convergence achieved ...


                                    Final Parameter Estimates: 


                    beta1hat                     serrbeta1                         quotient1

       beta1hat =  0.4661831   serr(beta1hat) =  0.0348685   beta1hat/serrbeta1 =  13.369734


                    beta2hat                     serrbeta2                         quotient2

       beta2hat =  2.0731203   serr(beta2hat) =   0.148579   beta2hat/serrbeta2 =  13.952982


                     True values of probzero and lambda =        0.4         2


                                                            m2logl

                                 -2 * log(likelihood) =  473.29268


                           ---------------------------------------------
 
 
                         ~john-c/5421/zeroinflatedpoisson.sas 21NOV11 18:44
                             PROC IML MLE for zero-inflated Poisson ...                            4
                                                                     18:44 Monday, November 21, 2011

                                         The FREQ Procedure

                                                       Cumulative    Cumulative
                     count    Frequency     Percent     Frequency      Percent
                     ----------------------------------------------------------
                         0         160       53.33           160        53.33  
                         1          37       12.33           197        65.67  
                         2          46       15.33           243        81.00  
                         3          36       12.00           279        93.00  
                         4          11        3.67           290        96.67  
                         5           9        3.00           299        99.67  
                         6           1        0.33           300       100.00  
 
 
                         ~john-c/5421/zeroinflatedpoisson.sas 21NOV11 18:44
                             PROC GENMOD for zero-inflated Poisson ...                             5
                                                                     18:44 Monday, November 21, 2011

                                        The GENMOD Procedure

                                         Model Information

                            Data Set                WORK.ZEROINFPOISSON
                            Distribution          Zero Inflated Poisson
                            Link Function                           Log
                            Dependent Variable                    count


                              Number of Observations Read         300
                              Number of Observations Used         300


                               Criteria For Assessing Goodness Of Fit
 
                  Criterion                     DF           Value        Value/DF

                  Deviance                                835.3194                
                  Scaled Deviance                         835.3194                
                  Pearson Chi-Square           298        282.4116          0.9477
                  Scaled Pearson X2            298        282.4116          0.9477
                  Log Likelihood                         -236.6463                
                  Full Log Likelihood                    -417.6597                
                  AIC (smaller is better)                 839.3194                
                  AICC (smaller is better)                839.3598                
                  BIC (smaller is better)                 846.7270                


            Algorithm converged.                                                       


                        Analysis Of Maximum Likelihood Parameter Estimates
 
                                   Standard     Wald 95% Confidence          Wald
    Parameter    DF    Estimate       Error           Limits           Chi-Square    Pr > ChiSq

    Intercept     1      0.7291      0.0655      0.6006      0.8575        123.82        <.0001
    Scale         0      1.0000      0.0000      1.0000      1.0000                            

NOTE: The scale parameter was held fixed.


                 Analysis Of Maximum Likelihood Zero Inflation Parameter Estimates
 
                                   Standard     Wald 95% Confidence          Wald
    Parameter    DF    Estimate       Error           Limits           Chi-Square    Pr > ChiSq

    Intercept     1     -0.1355      0.1389     -0.4077      0.1367          0.95        0.3293
 
 
                         ~john-c/5421/zeroinflatedpoisson.sas 21NOV11 18:44

-------------------------------------------------------------------------------------------------


options linesize = 150;
footnote "~john-c/5421/sasprog.sas &sysdate &systime" ;

* This is an EM-algorithm approach to estimating parameters   ;
* for zero-inflated Poisson data.                             ;


proc iml ;

     counts = {160, 37, 46, 36, 11, 9, 1} ;
     values = {0, 1, 2, 3, 4, 5, 6} ;

     p = .5 ;
     lambda = 0.8  ;

     print "Initial estimates of p, lambda: " p lambda ;

     y0 = counts[1] ;

     do i = 1 to 50 ;
        xa = counts[1] * p / (p + (1 - p) * exp(-lambda)) ;
        xb = counts[1] - xa ;

        print "Step " i " E-step: XA, XB: " xa xb ;

        numvalues = max(values) ;
        sumky = 0 ;
        sumy = xb ;

        loglikeobs = counts[1] * log(p + (1 - p)*exp(-lambda)) ;
        loglikecomplete = xa * log(p) + xb * (log(1 - p) - lambda) ;

        do j = 2 to numvalues + 1 ;
           sumky = sumky + values[j] * counts[j] ;
           sumy = sumy + counts[j] ;
           loglikeobs = loglikeobs + counts[j] * (log(1 - p) + (j - 1) * log(lambda) - lambda) ;
           loglikecomplete = loglikecomplete + counts[j] * (log(1 - p) + (j - 1) * log(lambda) - lambda) ;

        end ;

        p = xa / (xa + sumy) ;

        lambda = sumky / sumy ;

        print "Step " i " M-step: p, lambda, loglikeobs loglikecomplete: " p lambda loglikeobs loglikecomplete ;

     end ;

quit ;
                                                                    The SAS System                                 17:47 Monday, November 21, 2011   1


                                                                                          P    LAMBDA

                                                 Initial estimates of p, lambda:        0.5       0.8


                                                              I                          XA        XB

                                                Step          1  E-step: XA, XB:  110.39592 49.604083


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          1  M-step: p, lambda, loglikeobs loglikecomplete:  0.3679864 1.7510171  -334.6517       -433.7111


                                                              I                          XA        XB

                                                Step          2  E-step: XA, XB:  123.25214  36.74786


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          2  M-step: p, lambda, loglikeobs loglikecomplete:  0.4108405  1.878382  -241.5981       -327.8193


                                                              I                          XA        XB

                                                Step          3  E-step: XA, XB:  131.23641 28.763591


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          3  M-step: p, lambda, loglikeobs loglikecomplete:  0.4374547  1.967249  -238.3656       -313.7333


                                                              I                          XA        XB

                                                Step          4  E-step: XA, XB:  135.61291 24.387095


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          4  M-step: p, lambda, loglikeobs loglikecomplete:   0.452043 2.0196233  -237.1374       -305.4387


                                                              I                          XA        XB

                                                Step          5  E-step: XA, XB:  137.82853 22.171474


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          5  M-step: p, lambda, loglikeobs loglikecomplete:  0.4594284 2.0472158  -236.7693       -301.1473
 
 
                                                        ~john-c/5421/sasprog.sas 21NOV11 17:47
                                                                    The SAS System                                 17:47 Monday, November 21, 2011   2

                                                              I                          XA        XB

                                                Step          6  E-step: XA, XB:  138.90166  21.09834


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          6  M-step: p, lambda, loglikeobs loglikecomplete:  0.4630055  2.060853  -236.6749       -299.0614


                                                              I                          XA        XB

                                                Step          7  E-step: XA, XB:  139.40983  20.59017


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          7  M-step: p, lambda, loglikeobs loglikecomplete:  0.4646994 2.0673744  -236.6527       -298.0745


                                                              I                          XA        XB

                                                Step          8  E-step: XA, XB:  139.64784  20.35216


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          8  M-step: p, lambda, loglikeobs loglikecomplete:  0.4654928  2.070443  -236.6477       -297.6127


                                                              I                          XA        XB

                                                Step          9  E-step: XA, XB:  139.75874 20.241263


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step          9  M-step: p, lambda, loglikeobs loglikecomplete:  0.4658625 2.0718758  -236.6466       -297.3977


                                                              I                          XA        XB

                                                Step         10  E-step: XA, XB:  139.81028 20.189718


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         10  M-step: p, lambda, loglikeobs loglikecomplete:  0.4660343 2.0725425  -236.6464       -297.2978


                                                              I                          XA        XB

                                                Step         11  E-step: XA, XB:  139.83421 20.165788
 
 
                                                        ~john-c/5421/sasprog.sas 21NOV11 17:47
                                                                    The SAS System                                 17:47 Monday, November 21, 2011   3

                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         11  M-step: p, lambda, loglikeobs loglikecomplete:   0.466114 2.0728522  -236.6464       -297.2514


                                                              I                          XA        XB

                                                Step         12  E-step: XA, XB:  139.84532 20.154684


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         12  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661511 2.0729959  -236.6463       -297.2299


                                                              I                          XA        XB

                                                Step         13  E-step: XA, XB:  139.85047 20.149532


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         13  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661682 2.0730626  -236.6463       -297.2199


                                                              I                          XA        XB

                                                Step         14  E-step: XA, XB:  139.85286 20.147143


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         14  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661762 2.0730935  -236.6463       -297.2153


                                                              I                          XA        XB

                                                Step         15  E-step: XA, XB:  139.85397 20.146034


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         15  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661799 2.0731078  -236.6463       -297.2131


                                                              I                          XA        XB

                                                Step         16  E-step: XA, XB:  139.85448  20.14552


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         16  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661816 2.0731145  -236.6463       -297.2121
 
 
                                                        ~john-c/5421/sasprog.sas 21NOV11 17:47
                                                                    The SAS System                                 17:47 Monday, November 21, 2011   4

                                                              I                          XA        XB

                                                Step         17  E-step: XA, XB:  139.85472 20.145282


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         17  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661824 2.0731176  -236.6463       -297.2117


                                                              I                          XA        XB

                                                Step         18  E-step: XA, XB:  139.85483 20.145171


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         18  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661828  2.073119  -236.6463       -297.2115


                                                              I                          XA        XB

                                                Step         19  E-step: XA, XB:  139.85488  20.14512


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         19  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661829 2.0731197  -236.6463       -297.2114


                                                              I                          XA        XB

                                                Step         20  E-step: XA, XB:   139.8549 20.145096


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         20  M-step: p, lambda, loglikeobs loglikecomplete:   0.466183   2.07312  -236.6463       -297.2113


                                                              I                          XA        XB

                                                Step         21  E-step: XA, XB:  139.85491 20.145085


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         21  M-step: p, lambda, loglikeobs loglikecomplete:   0.466183 2.0731201  -236.6463       -297.2113


                                                              I                          XA        XB

                                                Step         22  E-step: XA, XB:  139.85492  20.14508
 
 
                                                        ~john-c/5421/sasprog.sas 21NOV11 17:47
                                                                    The SAS System                                 17:47 Monday, November 21, 2011   5

                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         22  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661831 2.0731202  -236.6463       -297.2113


                                                              I                          XA        XB

                                                Step         23  E-step: XA, XB:  139.85492 20.145078


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         23  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661831 2.0731202  -236.6463       -297.2113


                                                              I                          XA        XB

                                                Step         24  E-step: XA, XB:  139.85492 20.145077


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         24  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661831 2.0731202  -236.6463       -297.2113


                                                              I                          XA        XB

                                                Step         25  E-step: XA, XB:  139.85492 20.145076


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         25  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661831 2.0731202  -236.6463       -297.2113


                                                              I                          XA        XB

                                                Step         26  E-step: XA, XB:  139.85492 20.145076


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         26  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661831 2.0731203  -236.6463       -297.2113


                                                              I                          XA        XB

                                                Step         27  E-step: XA, XB:  139.85492 20.145076


                                 I                                                          P    LAMBDA LOGLIKEOBS LOGLIKECOMPLETE

                   Step         27  M-step: p, lambda, loglikeobs loglikecomplete:  0.4661831 2.0731203  -236.6463       -297.2113
 
 
                                                        ~john-c/5421/sasprog.sas 21NOV11 17:47


zeroinflatedpoisson.notes  Most recent update: November 21, 2011.