APPLICATIONS OF SAS IML: SPH 5421 notes.020
FINDING MAXIMUM LIKELIHOOD ESTIMATES, CONTINUED ...
The program below improves on the program in notes.019 in
the following ways:
1. It estimates the second derivative matrix numerically (note: this
may be less accurate than using an exact expression for the second
derivatives).
2. If the likelihood does not increase from one step to the next,
it initiates a step-halving routine.
3. If convergence does not occur, it prints a warning message.
4. It computes and prints estimated standard errors of
coefficient estimates and other statistics.
=================================================================================
/* Program to compute maximum likelihood estimates .... */;
/* Based on Newton-Raphson methods - uses IML, and approximates */;
/* the second derivative matrix numerically. */;
/* Also incorporates step-halving. */;
footnote "program: /home/walleye/john-c/5421/numml.sas &sysdate &systime" ;
options linesize = 80 ;
data sim ;
one = 1 ;
a = 2 ;
b = 1 ;
c = .5 ;
n = 300 ;
seed = 20000214 ;
/* Simulated data from logistic distribution */;
do i = 1 to n ;
u = 2 * i / n ;
u2 = u*u ;
prob = 1 / (1 + exp(-a - b * u - c * u*u)) ;
r = ranuni(seed) ;
yi = 0 ;
if r lt prob then yi = 1 ;
output ;
end ;
run ;
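For readers working outside SAS, the data step above can be sketched in Python
(illustrative only; Python's generator will not reproduce the SAS ranuni stream,
so the simulated y's will differ even with the same seed value):

```python
import math
import random

# Sketch of the simulation in the data step above:
# P(y = 1 | u) = 1 / (1 + exp(-(a + b*u + c*u^2))), with a = 2, b = 1, c = 0.5.
random.seed(20000214)  # same seed value as the notes, but NOT the same stream as ranuni
a, b, c, n = 2.0, 1.0, 0.5, 300

data = []
for i in range(1, n + 1):
    u = 2 * i / n
    prob = 1 / (1 + math.exp(-(a + b * u + c * u * u)))
    y = 1 if random.random() < prob else 0
    data.append((u, y))
```

Because the intercept a = 2 is large, the success probability is at least about
0.88 everywhere on the u grid, which is why the simulated dataset is dominated
by y = 1 outcomes.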
/* Logistic analysis of simulated data ...... */;
proc logistic descending ;
model yi = u u2 ;
title1 'Proc logistic results ... ' ;
run ;
/* Begin proc iml .............................. */;
proc iml ;
title1 'PROC IML Results ... Maximum likelihood estimation.' ;
title2 'Logistic model.' ;
use sim ; /* Input the simulated dataset */;
read all var {one u u2} into x ; /* Read the design matrix into x */;
read all var {yi} into y ; /* Read the outcome data into y */;
p = 3 ; /* Set the number of parameters */;
/* -------------------------------------------------------------------*/;
/* Module which computes loglikelihood and derivatives ... */;
start loglike(beta, p, x, y, l, dl, d2l, evals) ;
evals = evals + 1 ;
n = 300 ;
h = 1e-6 ;
l = 0 ; /* Initialize log likelihood ... */;
dl = j(p, 1, 0) ; /* Initialize 1st derivatives... */;
d2l = j(p, p, 0) ; /* Initialize 2nd derivatives... */;
do i = 1 to n ;
xi = x[i,] ;
yi = y[i] ;
xibeta = xi * beta ;
w = exp(-xibeta) ;
/* The log likelihood for the i-th observation ... */;
iloglike = log((1 - yi) * w + yi) - log(1 + w) ;
l = l + iloglike ; /* Add up the individual loglikelihoods */
do j = 1 to p ;
xij = xi[j] ;
/* The jth 1st derivative of the log likelihood for the ith obs */;
jdlogl = -(1 - yi) * w * xij / ((1 - yi) * w + yi)
+ w * xij / (1 + w) ;
dl[j] = dl[j] + jdlogl ;
do k = 1 to p ;
xik = xi[k] ;
/* The jkth 2nd derivative of the log likelihood for the ith obs */;
betakp = beta; betakm = beta ;
betakp[k] = beta[k] + h ;
betakm[k] = beta[k] - h ;
wjkp = exp(-xi * betakp) ;
wjkm = exp(-xi * betakm) ;
term1 = -(1 - yi) * wjkp * xij / ((1 - yi) * wjkp + yi)
+ wjkp * xij / (1 + wjkp) ;
term2 = -(1 - yi) * wjkm * xij / ((1 - yi) * wjkm + yi)
+ wjkm * xij / (1 + wjkm) ;
/* */
/* Here the numerical approximation of the 2nd derivs occurs */
/* */
jkd2logl = (term1 - term2) / (2 * h) ;
d2l[j, k] = d2l[j, k] + jkd2logl ;
end ;
end ;
end ;
finish loglike ;
/* -------------------------------------------------------------------*/;
eps = 1e-8 ; /* Tolerance limit for increments */
diff = eps + 1 ; /* Preset diff > eps */
oldl = -1e20 ; /* Preset old likelihood very small */
p = 3 ; /* Set the number of parameters */
beta = {0.00, 0.00, 0.00} ; /* Initialize the param. vector */
evals = 0 ; /* Preset number of function evals */
/* The following do loop stops when the increments in beta are */;
/* sufficiently small, or when number of iterations reaches 20, */;
/* provided the loglikelihood increases. */
do iter = 1 to 20 while (diff > eps) ;
run loglike(beta, p, x, y, l, dl, d2l, evals) ;
invd2l = inv(d2l) ;
factor = 1 ;
tbeta = beta - invd2l * dl ; /* The key Newton step ... */
tl = l ; tdl = dl ; td2l = d2l ;
/* Go into step-halving if the likelihood does not increase ... */
if (l < oldl) then do ;
factor = .5 * factor ;
tl = oldl - .1 * abs(oldl) ;
do halves = 1 to 10 while (tl < oldl) ;
tbeta = beta - factor * invd2l * dl ; /* Step-halving ...*/
run loglike(tbeta, p, x, y, tl, tdl, td2l, evals) ;
factor = .5 * factor ;
print "Step-halving: " iter halves tbeta tl tdl td2l ;
end ;
if (tl < oldl) then do ;
print, "No convergence after " halves "step-halves ..." ;
/* A stop statement could be inserted here. */
end ;
end ;
beta = tbeta ;
diff = max(abs(factor * invd2l * dl)) ;
l = tl ; oldl = tl ; dl = tdl ; d2l = td2l ;
print, iter beta l dl d2l diff ;
end ;
if (diff > eps) then do ;
print, "No convergence after " iter " iterations ..." ;
/* A stop statement could be inserted here also. */
end ;
serr = j(p, 1, 0) ;
do i = 1 to p ;
serr[i] = sqrt(-invd2l[i, i]) ; /* Standard errs = sqrt(diag) ...*/
end ;
minvd2l = -invd2l ;
m2l = -2 * l ;
print, "Number of function evaluations: " evals ;
print, "-2 * loglikelihood: " m2l ;
print, "Estimated Coefficients, standard errors: ", beta serr ;
print, "Estimated covariance matrix of coeffs: ", minvd2l ;
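The Newton-Raphson iteration with step-halving used above can be written
generically. The sketch below (Python, with illustrative names) mirrors the IML
loop: take the full Newton step, and halve it until the loglikelihood increases:

```python
import numpy as np

def newton_max(loglik, grad, hess, beta0, eps=1e-8, max_iter=20, max_halves=10):
    """Maximize loglik by Newton-Raphson with step-halving (mirrors the IML loop)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        l = loglik(beta)
        step = np.linalg.solve(hess(beta), grad(beta))  # = inv(d2l) * dl
        factor = 1.0
        new_beta = beta - step                          # the key Newton step
        # Step-halving: shrink the step until the loglikelihood increases
        for _h in range(max_halves):
            if loglik(new_beta) >= l:
                break
            factor *= 0.5
            new_beta = beta - factor * step
        diff = np.max(np.abs(new_beta - beta))
        beta = new_beta
        if diff < eps:
            break
    return beta

# Check on a concave quadratic: l(b) = -(b - 3)^2 has its maximum at b = 3,
# and Newton-Raphson reaches it in one step.
bhat = newton_max(lambda b: -(b[0] - 3) ** 2,
                  lambda b: np.array([-2 * (b[0] - 3)]),
                  lambda b: np.array([[-2.0]]),
                  [0.0])
```

Note that solving the linear system H * step = g is numerically preferable to
forming inv(d2l) explicitly, although the two are algebraically equivalent.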
=================================================================================
Proc logistic results ... 1
15:35 Sunday, February 20, 2000
The LOGISTIC Procedure
Data Set: WORK.SIM
Response Variable: YI
Response Levels: 2
Number of Observations: 300
Link Function: Logit
Response Profile
Ordered
Value YI Count
1 1 282
2 0 18
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Intercept
Intercept and
Criterion Only Covariates Chi-Square for Covariates
AIC 138.181 128.866 .
SC 141.884 139.977 .
-2 LOG L 136.181 122.866 13.315 with 2 DF (p=0.0013)
Score . . 13.812 with 2 DF (p=0.0010)
Analysis of Maximum Likelihood Estimates
Parameter Standard Wald Pr > Standardized Odds
Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio
INTERCPT 1 1.4494 0.5449 7.0761 0.0078 . .
U 1 1.5957 1.7217 0.8590 0.3540 0.508763 4.932
U2 1 0.0901 1.0376 0.0075 0.9308 0.059512 1.094
Association of Predicted Probabilities and Observed Responses
Concordant = 74.3% Somers' D = 0.494
Discordant = 25.0% Gamma = 0.497
Tied = 0.7% Tau-a = 0.056
(5076 pairs) c = 0.747
program: /home/walleye/john-c/5421/numml.sas 20FEB00 15:35
PROC IML Results ... Maximum likelihood estimation. 2
15:35 Sunday, February 20, 2000
ITER BETA L DL D2L DIFF
1 1.2739054 -207.9442 132 -75 -75.25 -100.5006 1.2739054
0.7852635 140.77333 -75.25 -100.5006 -151.0017
-0.225213 192.10187 -100.5006 -151.0017 -242.0044
ITER BETA L DL D2L DIFF
2 1.4297536 -77.00134 26.596487 -37.7458 -34.88606 -45.15941 0.8388612
1.6241247 30.65339 -34.88606 -45.15941 -66.89398
-0.414557 43.000281 -45.15941 -66.89398 -106.4329
ITER BETA L DL D2L DIFF
3 1.3946428 -64.04277 6.8221913 -22.29749 -17.30946 -20.72237 0.340524
1.9646487 8.8837897 -17.30946 -20.72237 -29.49548
-0.332 13.103234 -20.72237 -29.49548 -45.86976
ITER BETA L DL D2L DIFF
4 1.425969 -61.76744 1.4434564 -17.50049 -11.2069 -11.88306 0.284514
1.742851 2.1711034 -11.2069 -11.88306 -15.66063
-0.047486 3.4806518 -11.88306 -15.66063 -23.13381
ITER BETA L DL D2L DIFF
5 1.4473961 -61.45224 0.2284664 -16.35777 -9.501674 -9.183819 0.1349817
1.6078693 0.3841724 -9.501674 -9.183819 -11.23903
0.0788801 0.6616089 -9.183819 -11.23903 -15.68377
ITER BETA L DL D2L DIFF
6 1.4493858 -61.433 0.0167499 -16.15417 -9.152353 -8.5867 0.0121209
1.5957484 0.0279566 -9.152353 -8.5867 -10.2298
0.0900068 0.0484398 -8.5867 -10.2298 -13.96175
ITER BETA L DL D2L DIFF
7 1.449399 -61.43289 0.0001118 -16.13821 -9.125007 -8.539712 0.0000797
1.5956686 0.0001848 -9.125007 -8.539712 -10.15037
0.0900803 0.0003188 -8.539712 -10.15037 -13.82647
ITER BETA L DL D2L DIFF
8 1.449399 -61.43289 4.8886E-9 -16.13811 -9.124826 -8.5394 3.4694E-9
1.5956686 8.0761E-9 -9.124826 -8.5394 -10.14984
0.0900803 1.3924E-8 -8.5394 -10.14984 -13.82558
EVALS
Number of function evaluations: 8
M2L
-2 * loglikelihood: 122.86577
Estimated Coefficients, standard errors:
BETA SERR
1.449399 0.5448694
1.5956686 1.7217044
0.0900803 1.0375623
Estimated covariance matrix of coeffs:
MINVD2L
0.2968827 -0.779218 0.3886808
-0.779218 2.9642661 -1.694886
0.3886808 -1.694886 1.0765356
=================================================================================
The program above, and in fact any program which computes maximum likelihood
estimates, can produce wrong answers. It can converge to a local maximum which is
not the global maximum, or to a saddle point (a point where the derivatives are zero
but which is neither a maximum nor a minimum). It can simply fail to converge at
all, or can converge to absurd values which cannot be realized biologically.
Convergence can depend very strongly on initial estimates, and some exploratory
work may need to be done in advance (for example, using linear approximations) to
provide good initial estimates.
A possible weakness in the above program is the numerical estimation of the
second derivative matrix. The accuracy of numerical estimates of derivatives
depends critically on the size of the parameter h . In this case, h was set equal
to 1e-6. If it is set much smaller than that, there will be problems with computing
accuracy because there is a limit to how many digits the computing software stores
for each real number. In SAS, all computations are done in double precision, and
for first-difference approximations h could probably be set as small as 1e-8
without causing a problem.
Recall that an approximate 1st derivative for a function f(x) is
f'(x) = [f(x + h) - f(x - h)] / (2*h).
Similarly, an approximate 2nd derivative is
f''(x) = [f(x + 2*h) + f(x - 2*h) - 2 * f(x)] / (4 * h*h).
Here the denominator is 4*h*h; thus if h = 1e-6, the denominator is at or near
the limit of accuracy of computation even in double precision. Thus although
numerical approximations to second derivatives can be done, they will be even more
subject to computational error than those for first derivatives.
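The loss of accuracy can be seen directly. The sketch below (Python) applies the
second-difference formula above to f(x) = exp(x) at x = 0, where the true second
derivative is exactly 1. With h = 1e-4 the error is tiny; with h = 1e-8 the
denominator 4*h*h sits at the roundoff floor of double precision, cancellation
destroys the numerator, and the estimate degrades badly:

```python
import math

def d2_central(f, x, h):
    """Second-difference approximation from the notes:
    f''(x) ~ [f(x + 2h) + f(x - 2h) - 2 f(x)] / (4 h^2)."""
    return (f(x + 2 * h) + f(x - 2 * h) - 2 * f(x)) / (4 * h * h)

# f(x) = exp(x): the true second derivative at 0 is exactly 1.
err_moderate = abs(d2_central(math.exp, 0.0, 1e-4) - 1.0)  # truncation + mild roundoff
err_tiny_h   = abs(d2_central(math.exp, 0.0, 1e-8) - 1.0)  # cancellation dominates
```

With h = 1e-8, f(x + 2h) and f(x - 2h) agree with f(x) in nearly all stored
digits, so their difference retains almost no information about the curvature.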
Another minor weakness in the program occurs at the beginning of the IML
section, with the statement
n = 300 ;
This statement explicitly codes the sample size of the dataset into the
likelihood function module. It would be better to have the sample size determined
by the dataset itself, for example with n = nrow(y) inside the module; another
approach uses a macro variable, to be discussed later.
-----------------------------------------------------------------------
PROBLEM 19.1
Write a maximum likelihood estimation program for data (T, X), where T is a
survival time, X is a vector of covariates, and the conditional distribution of
T given X is exponential:
prob(T < t | X) = 1 - exp(-t * (X * beta)), for t > 0.
More explicitly, assume that for the i-th person, the observed survival time is
ti, and Xi = (1, agei)`. That is, the i-th person's survival time depends on that
person's age (at the beginning of the study). In this case, beta = (B0, B1)`. The
exponential CDF of T conditional on X is thus:
prob(T < t | X) = 1 - exp(-t * (B0 + B1 * age)).
The PDF is the derivative of the CDF with respect to t. You will need to
work with the PDF in doing this problem.
The object of the analysis is to estimate B0 and B1. You should also perform
a test of the hypothesis that B1 = 0 as described in the Appendix to notes.019.
Use the following initial estimates: B0 = .2, B1 = .2.
The program may work better if you use the exact expressions for both the
first and second derivatives rather than the numerical approximation.
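As a starting point (not a complete solution), the log-likelihood and its exact
derivatives for this model can be sketched as follows. With rate
lam_i = B0 + B1*age_i, the log PDF is log(lam_i) - lam_i * t_i. The Python code
below (illustrative names) evaluates the loglikelihood, gradient, and Hessian,
using the first three rows of the test dataset and the suggested initial
estimates:

```python
import numpy as np

def exp_loglik(beta, age, t):
    """Log-likelihood: PDF f(t | age) = lam * exp(-lam * t), lam = B0 + B1*age."""
    lam = beta[0] + beta[1] * age
    return np.sum(np.log(lam) - lam * t)

def exp_grad(beta, age, t):
    """Exact first derivatives: d(log f)/d(lam) = 1/lam - t, chained with x = (1, age)."""
    lam = beta[0] + beta[1] * age
    common = 1.0 / lam - t
    return np.array([np.sum(common), np.sum(age * common)])

def exp_hess(beta, age, t):
    """Exact second derivatives: d2(log f)/d(lam)^2 = -1/lam^2."""
    lam = beta[0] + beta[1] * age
    x = np.vstack([np.ones_like(age), age])  # design rows (1, age_i)
    return -(x / lam ** 2) @ x.T

# First three rows of the test dataset, with the suggested initial estimates.
age  = np.array([0.0931, 0.6825, 0.8927])
t    = np.array([5.62994, 6.57191, 2.54954])
beta = np.array([0.2, 0.2])
```

A useful habit when coding exact derivatives is to check them once against a
central-difference approximation before running the full iteration.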
The test dataset is the following:
OBS AGE T
1 0.0931 5.62994
2 0.6825 6.57191
3 0.8927 2.54954
4 1.5869 2.54606
5 1.6003 1.15370
6 1.6646 1.24921
7 2.0518 0.05831
8 2.0894 0.59835
9 2.2963 0.70921
10 2.3751 0.09628
11 2.4290 2.66394
12 2.8505 2.04750
13 3.9073 0.96704
14 3.9378 0.48032
15 4.5191 0.34689
16 5.5660 0.21520
17 5.6108 0.52901
18 6.1112 0.48802
19 6.3737 0.00484
20 6.6923 0.45718
21 6.8968 0.20422
22 6.9071 0.93855
23 7.2477 0.04297
24 7.5362 0.36811
25 7.7595 0.24867
26 8.1425 0.00611
27 8.7957 0.26880
28 9.5102 0.00134
29 9.7572 0.00631
30 9.8959 0.09441
31 10.2688 0.00195
32 10.3519 0.11509
33 10.6757 0.46864
34 11.6380 0.05539
35 12.1047 0.32790
36 12.3934 0.06539
37 13.1758 0.27457
38 14.0182 0.03294
39 14.4035 0.01287
40 14.6230 0.08795
41 15.1447 0.07662
42 15.5620 0.00006
43 15.6788 0.05757
44 16.3669 0.03507
45 16.4008 0.06219
46 16.5179 0.28215
47 16.8670 0.08094
48 17.9720 0.00258
49 18.2080 0.10972
50 18.6951 0.02063
51 19.3668 0.05690
52 19.4576 0.08034
53 20.0762 0.14633
54 20.3164 0.08896
55 20.6885 0.18534
56 20.8577 0.01693
57 20.8636 0.42841
58 21.2643 0.07275
59 21.7970 0.13027
60 21.8349 0.22166
61 22.4181 0.11848
62 22.9595 0.05473
63 23.0442 0.01807
64 23.3808 0.51152
65 24.7859 0.09666
66 25.0556 0.02394
67 25.1192 0.00634
68 25.1921 0.07799
69 25.5029 0.13645
70 26.8159 0.01525
71 27.2633 0.22272
72 28.2087 0.21042
73 28.2235 0.03343
74 28.5499 0.11511
75 28.8990 0.05993
76 29.4803 0.10167
77 29.8006 0.01807
78 29.8947 0.14480
79 30.2257 0.23459
80 31.4659 0.03437
81 31.8060 0.03392
82 32.6798 0.01267
83 32.9508 0.06466
84 33.0339 0.01766
85 34.4671 0.00846
86 34.4701 0.06582
87 35.0643 0.00761
88 35.2887 0.39833
89 35.5018 0.00137
90 35.8315 0.10726
91 35.9420 0.05192
92 35.9431 0.07865
93 36.1816 0.10730
94 36.4298 0.07252
95 36.4939 0.03098
96 36.6815 0.05515
97 36.9500 0.03848
98 37.5995 0.04957
99 38.4104 0.08323
100 39.6751 0.01132
Problem 19.2: Zero-inflated Poisson
In the following datafile, the observations were generated as follows:
The value of obsn is 0 with probability r + (1 - r) * exp(-lambda), where 0 < r < 1.
For k > 0, the value of obsn is k with probability
(1 - r) * (lambda)^k * exp(-lambda) / k!
(where k! means k factorial).
Here r and lambda are both fixed constants. Find the maximum likelihood estimates of
r and lambda and estimate their standard errors. Test the hypothesis that r = 0.
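As a starting point, the zero-inflated Poisson log-likelihood can be written down
directly from the two probability expressions above (Python sketch; parameter
names follow the notes):

```python
import math

def zip_loglik(r, lam, counts):
    """Zero-inflated Poisson log-likelihood.
    P(obsn = 0) = r + (1 - r) * exp(-lam)
    P(obsn = k) = (1 - r) * lam**k * exp(-lam) / k!   for k > 0."""
    ll = 0.0
    for k in counts:
        if k == 0:
            ll += math.log(r + (1 - r) * math.exp(-lam))
        else:
            ll += math.log(1 - r) + k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll
```

This function can be maximized over (r, lambda) with the same Newton-Raphson and
step-halving pattern used in the program above. Note that the hypothesis r = 0
lies on the boundary of the parameter space, which can affect the usual
large-sample approximations for the test.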
Data:
The SAS System 1
17:42 Wednesday, October 27, 2010
Obs obsn
1 5
2 4
3 0
4 4
5 5
6 5
7 3
8 3
9 3
10 0
11 0
12 2
13 2
14 4
15 6
16 2
17 2
18 4
19 3
20 2
21 2
22 4
23 4
24 3
25 2
26 0
27 0
28 0
29 4
30 6
31 0
32 0
33 5
34 4
35 1
36 2
37 5
38 3
39 4
40 0
41 2
42 1
43 6
44 1
45 2
46 1
47 0
48 0
49 7
50 0
51 4
52 3
53 6
54 2
55 0
56 3
57 5
58 0
59 3
60 1
61 5
62 4
63 2
64 3
65 1
66 4
67 0
68 3
69 4
70 5
71 6
72 4
73 0
74 4
75 7
76 1
77 4
78 2
79 4
80 0
81 5
82 5
83 5
84 3
85 2
86 0
87 5
88 5
89 0
90 3
91 4
92 2
93 4
94 2
95 3
96 3
97 0
98 1
99 0
100 0
~john-c/5421/zeroinfl.poisson.sas 27OCT10 17:42
/home/walleye/john-c/5421/notes.020 Last update: November 14, 2011