APPLICATIONS OF SAS IML: SPH 5421 notes.020 FINDING MAXIMUM LIKELIHOOD ESTIMATES, CONTINUED ... The program below improves on the program in notes.019 in the following ways: 1. It estimates the second derivative matrix numerically (note: this may be less accurate than using an exact expression for the second derivatives). 2. If the likelihood does not increase from one step to the next, it initiates a step-halving routine. 3. If convergence did not occur, it prints a warning message. 4. It computes and prints estimated standard errors of coefficient estimates and other statistics. ================================================================================= /* Program to compute maximum likelihood estimates .... */; /* Based on Newton-Raphson methods - uses IML, and approximates */; /* the second derivative matrix numerically. */; /* Also incorporates step-halving. */; footnote "program: /home/walleye/john-c/5421/numml.sas &sysdate &systime" ; options linesize = 80 ; data sim ; one = 1 ; a = 2 ; b = 1 ; c = .5 ; n = 300 ; seed = 20000214 ; /* Simulated data from logistic distribution */; do i = 1 to n ; u = 2 * i / n ; u2 = u*u ; prob = 1 / (1 + exp(-a - b * u - c * u*u)) ; r = ranuni(seed) ; yi = 0 ; if r lt prob then yi = 1 ; output ; end ; run ; /* Logistic analysis of simulated data ...... */; proc logistic descending ; model yi = u u2 ; title1 'Proc logistic results ... ' ; run ; /* Begin proc iml .............................. */; proc iml ; title1 'PROC IML Results ... Maximum likelihood estimation.' ; title2 'Logistic model.' ; use sim ; /* Input the simulated dataset */; read all var {one u u2} into x ; /* Read the design matrix into x */; read all var {yi} into y ; /* Read the outcome data into y */; p = 3 ; /* Set the number of parameters */; /* -------------------------------------------------------------------*/; /* Module which computes loglikelihood and derivatives ... */; start loglike(beta, p, x, y, l, dl, d2l, evals) ; evals = evals + 1 ; n = 300 ; h = 1e-6 ; l = 0 ; /* Initialize log likelihood ... */; dl = j(p, 1, 0) ; /* Initialize 1st derivatives... */; d2l = j(p, p, 0) ; /* Initialize 2nd derivatives... */; do i = 1 to n ; xi = x[i,] ; yi = y[i] ; xibeta = xi * beta ; w = exp(-xibeta) ; /* The log likelihood for the i-th observation ... */; iloglike = log((1 - yi) * w + yi) - log(1 + w) ; l = l + iloglike ; /* Add up the individual loglikelihoods */ do j = 1 to p ; xij = xi[j] ; /* The jth 1st derivative of the log likelihood for the ith obs */; jdlogl = -(1 - yi) * w * xij / ((1 - yi) * w + yi) + w * xij / (1 + w) ; dl[j] = dl[j] + jdlogl ; do k = 1 to p ; xik = xi[k] ; /* The jkth 2nd derivative of the log likelihood for the ith obs */; betakp = beta; betakm = beta ; betakp[k] = beta[k] + h ; betakm[k] = beta[k] - h ; wjkp = exp(-xi * betakp) ; wjkm = exp(-xi * betakm) ; term1 = -(1 - yi) * wjkp * xij / ((1 - yi) * wjkp + yi) + wjkp * xij / (1 + wjkp) ; term2 = -(1 - yi) * wjkm * xij / ((1 - yi) * wjkm + yi) + wjkm * xij / (1 + wjkm) ; /* */ /* Here the numerical approximation of the 2nd derivs occurs */ /* */ jkd2logl = (term1 - term2) / (2 * h) ; d2l[j, k] = d2l[j, k] + jkd2logl ; end ; end ; end ; finish loglike ; /* -------------------------------------------------------------------*/; eps = 1e-8 ; /* Tolerance limit for increments */ diff = eps + 1 ; /* Preset diff > eps */ oldl = -1e20 ; /* Preset old likelihood very small */ p = 3 ; /* Set the number of parameters */ beta = {0.00, 0.00, 0.00} ; /* Initialize the param. vector */ evals = 0 ; /* Preset number of function evals */ /* The following do loop stops when the increments in beta are */; /* sufficiently small, or when number of iterations reaches 20, */; /* provided the loglikelihood increases. */ do iter = 1 to 20 while (diff > eps) ; run loglike(beta, p, x, y, l, dl, d2l, evals) ; invd2l = inv(d2l) ; factor = 1 ; tbeta = beta - invd2l * dl ; /* The key Newton step ... */ tl = l ; tdl = dl ; td2l = d2l ; /* Go into step-halving if the likelihood does not increase ... */ if (l < oldl) then do ; factor = .5 * factor ; tl = oldl - .1 * abs(oldl) ; do halves = 1 to 10 while (tl < oldl) ; tbeta = beta - factor * invd2l * dl ; /* Step-halving ...*/ run loglike(tbeta, p, x, y, tl, tdl, td2l, evals) ; factor = .5 * factor ; print "Step-halving: " iter halves tbeta tl tdl td2l ; end ; if (tl < oldl) then do ; print, "No convergence after " halves "step-halves ..." ; /* A stop statement could be inserted here. */ end ; end ; beta = tbeta ; diff = max(abs(factor * invd2l * dl)) ; l = tl ; oldl = tl ; dl = tdl ; d2l = td2l ; print, iter beta l dl d2l diff ; end ; if (diff > eps) then do ; print, "No convergence after " iter " iterations ..." ; /* A stop statement could be inserted here also. */ end ; serr = j(p, 1, 0) ; do i = 1 to p ; serr[i] = sqrt(-invd2l[i, i]) ; /* Standard errs = sqrt(diag) ...*/ end ; minvd2l = -invd2l ; m2l = -2 * l ; print, "Number of function evaluations: " evals ; print, "-2 * loglikelihood: " m2l ; print, "Estimated Coefficients, standard errors: ", beta serr ; print, "Estimated covariance matrix of coeffs: ", minvd2l ; ================================================================================= Proc logistic results ... 1 15:35 Sunday, February 20, 2000 The LOGISTIC Procedure Data Set: WORK.SIM Response Variable: YI Response Levels: 2 Number of Observations: 300 Link Function: Logit Response Profile Ordered Value YI Count 1 1 282 2 0 18 Model Fitting Information and Testing Global Null Hypothesis BETA=0 Intercept Intercept and Criterion Only Covariates Chi-Square for Covariates AIC 138.181 128.866 . SC 141.884 139.977 . -2 LOG L 136.181 122.866 13.315 with 2 DF (p=0.0013) Score . . 13.812 with 2 DF (p=0.0010) Analysis of Maximum Likelihood Estimates Parameter Standard Wald Pr > Standardized Odds Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio INTERCPT 1 1.4494 0.5449 7.0761 0.0078 . . U 1 1.5957 1.7217 0.8590 0.3540 0.508763 4.932 U2 1 0.0901 1.0376 0.0075 0.9308 0.059512 1.094 Association of Predicted Probabilities and Observed Responses Concordant = 74.3% Somers' D = 0.494 Discordant = 25.0% Gamma = 0.497 Tied = 0.7% Tau-a = 0.056 (5076 pairs) c = 0.747 program: /home/walleye/john-c/5421/numml.sas 20FEB00 15:35 PROC IML Results ... Maximum likelihood estimation. 2 15:35 Sunday, February 20, 2000 ITER BETA L DL D2L DIFF 1 1.2739054 -207.9442 132 -75 -75.25 -100.5006 1.2739054 0.7852635 140.77333 -75.25 -100.5006 -151.0017 -0.225213 192.10187 -100.5006 -151.0017 -242.0044 ITER BETA L DL D2L DIFF 2 1.4297536 -77.00134 26.596487 -37.7458 -34.88606 -45.15941 0.8388612 1.6241247 30.65339 -34.88606 -45.15941 -66.89398 -0.414557 43.000281 -45.15941 -66.89398 -106.4329 ITER BETA L DL D2L DIFF 3 1.3946428 -64.04277 6.8221913 -22.29749 -17.30946 -20.72237 0.340524 1.9646487 8.8837897 -17.30946 -20.72237 -29.49548 -0.332 13.103234 -20.72237 -29.49548 -45.86976 ITER BETA L DL D2L DIFF 4 1.425969 -61.76744 1.4434564 -17.50049 -11.2069 -11.88306 0.284514 1.742851 2.1711034 -11.2069 -11.88306 -15.66063 -0.047486 3.4806518 -11.88306 -15.66063 -23.13381 ITER BETA L DL D2L DIFF 5 1.4473961 -61.45224 0.2284664 -16.35777 -9.501674 -9.183819 0.1349817 1.6078693 0.3841724 -9.501674 -9.183819 -11.23903 0.0788801 0.6616089 -9.183819 -11.23903 -15.68377 ITER BETA L DL D2L DIFF 6 1.4493858 -61.433 0.0167499 -16.15417 -9.152353 -8.5867 0.0121209 1.5957484 0.0279566 -9.152353 -8.5867 -10.2298 0.0900068 0.0484398 -8.5867 -10.2298 -13.96175 ITER BETA L DL D2L DIFF 7 1.449399 -61.43289 0.0001118 -16.13821 -9.125007 -8.539712 0.0000797 1.5956686 0.0001848 -9.125007 -8.539712 -10.15037 0.0900803 0.0003188 -8.539712 -10.15037 -13.82647 ITER BETA L DL D2L DIFF 8 1.449399 -61.43289 4.8886E-9 -16.13811 -9.124826 -8.5394 3.4694E-9 1.5956686 8.0761E-9 -9.124826 -8.5394 -10.14984 0.0900803 1.3924E-8 -8.5394 -10.14984 -13.82558 program: /home/walleye/john-c/5421/numml.sas 20FEB00 15:35 PROC IML Results ... Maximum likelihood estimation. 3 Logistic model. 15:35 Sunday, February 20, 2000 EVALS Number of function evaluations: 8 M2L -2 * loglikelihood: 122.86577 Estimated Coefficients, standard errors: BETA SERR 1.449399 0.5448694 1.5956686 1.7217044 0.0900803 1.0375623 Estimated covariance matrix of coeffs: MINVD2L 0.2968827 -0.779218 0.3886808 -0.779218 2.9642661 -1.694886 0.3886808 -1.694886 1.0765356 program: /home/walleye/john-c/5421/numml.sas 20FEB00 15:35 ================================================================================= The program above, and in fact any program which computes maximum likelihood estimates, can produce wrong answers. It can converge to a local maximum which is not the true maximum, or to a saddle point (place where the derivatives are zero but is not a maximum or minimum). It can simply fail to converge at all, or can converge to absurd values which cannot be realized biologically. Convergence can depend very strongly on initial esimates, and some exploratory work may need to be done in advance (for example, using linear approximations) to provide good initial estimates. A possible weakness in the above program is the numerical estimation of the second derivative matrix. The accuracy of numerical estimates of derivatives depends critically on the size of the parameter h . In this case, h was set equal to 1e-6. If it is set much smaller than that, there will be problems with computing accuracy because there is a limit to how many digits the computing software stores for each real number. In SAS, all computations are done in double precision, and h could probably be set to 1e-8 without causing a problem. Recall that an approximate 1st derivative for a function f(x) is f'(x) = [f(x + h) - f(x - h)] / (2*h). Similarly, an approximate 2nd derivative is f''(x) = [f(x + 2*h) + f(x - 2*h) - 2 * f(x)] / (4 * h*h). Here the denomator is 4*h*h; thus if h = 1e-6, the denominator is at or near the limit of accuracy of computation even in double precision. Thus although numerical approximations to second derivatives can be done, they will be even more subject to computational error than those for first derivatives. Another minor weakness in the program occurs at the beginning of the IML section, with the statement n = 300 ; This statement regarding the sample size of the dataset is explicitly coded into the likelihood function module. It would be better to have the sample size determined by the dataset itself. This can be done through the use of a macro variable, to be discussed later. ----------------------------------------------------------------------- PROBLEM 19.1 Write a maximum likelihood estimation program for data (T, X), where T is a survival time, and X is a vector of covariates, and the conditional distribution of Y given X is exponential: prob(T < t | X) = 1 - exp(-t * (X * beta)), for t > 0. More explicitly, assume that for the i-th person, the observed survival time is ti, and Xi = (1, agei)`. That is, the i-th person's survival time depends on that person's age (at the beginning of the study). In this case, beta = (B0, B1)`. The exponential CDF of T conditional on X is thus: prob(T < t | X) = 1 - exp(-t * (B0 + B1 * age)). The PDF is the derivative of the CDF with respect to t. You will need to work with the PDF in doing this problem. The object of the analysis is to estimate B0 and B1. You should also perform a test of the hypothesis that B1 = 0 as described in the Appendix to notes.019. Use the following initial estimates: B0 = .2, B1 = .2. The program may work better if you use the exact expressions for both the first and second derivatives rather than the numerical approximation. The test dataset is the following: OBS AGE T 1 0.0931 5.62994 2 0.6825 6.57191 3 0.8927 2.54954 4 1.5869 2.54606 5 1.6003 1.15370 6 1.6646 1.24921 7 2.0518 0.05831 8 2.0894 0.59835 9 2.2963 0.70921 10 2.3751 0.09628 11 2.4290 2.66394 12 2.8505 2.04750 13 3.9073 0.96704 14 3.9378 0.48032 15 4.5191 0.34689 16 5.5660 0.21520 17 5.6108 0.52901 18 6.1112 0.48802 19 6.3737 0.00484 20 6.6923 0.45718 21 6.8968 0.20422 22 6.9071 0.93855 23 7.2477 0.04297 24 7.5362 0.36811 25 7.7595 0.24867 26 8.1425 0.00611 27 8.7957 0.26880 28 9.5102 0.00134 29 9.7572 0.00631 30 9.8959 0.09441 31 10.2688 0.00195 32 10.3519 0.11509 33 10.6757 0.46864 34 11.6380 0.05539 35 12.1047 0.32790 36 12.3934 0.06539 37 13.1758 0.27457 38 14.0182 0.03294 39 14.4035 0.01287 40 14.6230 0.08795 41 15.1447 0.07662 42 15.5620 0.00006 43 15.6788 0.05757 44 16.3669 0.03507 45 16.4008 0.06219 46 16.5179 0.28215 47 16.8670 0.08094 48 17.9720 0.00258 49 18.2080 0.10972 50 18.6951 0.02063 51 19.3668 0.05690 52 19.4576 0.08034 53 20.0762 0.14633 54 20.3164 0.08896 55 20.6885 0.18534 56 20.8577 0.01693 57 20.8636 0.42841 58 21.2643 0.07275 59 21.7970 0.13027 60 21.8349 0.22166 61 22.4181 0.11848 62 22.9595 0.05473 63 23.0442 0.01807 64 23.3808 0.51152 65 24.7859 0.09666 66 25.0556 0.02394 67 25.1192 0.00634 68 25.1921 0.07799 69 25.5029 0.13645 70 26.8159 0.01525 71 27.2633 0.22272 72 28.2087 0.21042 73 28.2235 0.03343 74 28.5499 0.11511 75 28.8990 0.05993 76 29.4803 0.10167 77 29.8006 0.01807 78 29.8947 0.14480 79 30.2257 0.23459 80 31.4659 0.03437 81 31.8060 0.03392 82 32.6798 0.01267 83 32.9508 0.06466 84 33.0339 0.01766 85 34.4671 0.00846 86 34.4701 0.06582 87 35.0643 0.00761 88 35.2887 0.39833 89 35.5018 0.00137 90 35.8315 0.10726 91 35.9420 0.05192 92 35.9431 0.07865 93 36.1816 0.10730 94 36.4298 0.07252 95 36.4939 0.03098 96 36.6815 0.05515 97 36.9500 0.03848 98 37.5995 0.04957 99 38.4104 0.08323 100 39.6751 0.01132 Problem 19.2: Zero-inflated Poisson In the following datafile, the observations were generated as follows: The value of obsn is 0 with probability r + (1 - r) * exp(-lambda), where 0 < r < 1 For k > 0, the value of obsn is k with probability (1 - r) * (lambda)^k * exp(-lambda) / k! (where k! means k factorial). Here r and lambda are both fixed constants. Find the maximum likelihood estimates of r and lambda and estimate their standard errors. Test the hypothesis that r = 0. Data: The SAS System 1 17:42 Wednesday, October 27, 2010 Obs obsn 1 5 2 4 3 0 4 4 5 5 6 5 7 3 8 3 9 3 10 0 11 0 12 2 13 2 14 4 15 6 16 2 17 2 18 4 19 3 20 2 21 2 22 4 23 4 24 3 25 2 26 0 27 0 28 0 29 4 30 6 31 0 32 0 33 5 34 4 35 1 36 2 37 5 38 3 39 4 40 0 41 2 42 1 43 6 44 1 45 2 46 1 47 0 48 0 49 7 50 0 51 4 52 3 53 6 54 2 55 0 56 3 57 5 58 0 59 3 60 1 61 5 62 4 63 2 64 3 65 1 66 4 67 0 68 3 69 4 70 5 71 6 72 4 73 0 74 4 75 7 76 1 77 4 78 2 79 4 80 0 81 5 82 5 83 5 84 3 85 2 86 0 87 5 88 5 89 0 90 3 91 4 92 2 93 4 94 2 95 3 96 3 97 0 98 1 99 0 100 0 ~john-c/5421/zeroinfl.poisson.sas 27OCT10 17:42 /home/walleye/john-c/5421/notes.020 Last update: November 14, 2011