NUMERICAL INTEGRATION SPH 5421 notes.030 An important reason for doing numerical integration in statistics is the fact that computation of Bayesian posterior distributions requires integration of the likelihood function over the prior distribution. (This will be explained in more detail later.) In general, the integrals involved cannot be evaluated analytically. Statistical packages such as SAS, SPSSX, and BMDP do not provide methods for numerical integration, and this may explain why Bayesian methods are not as widely used in biostatistics as they could be and possibly should be. It is interesting to note, however, that the new SAS procedure PROC NLMIXED does perform evaluation of the likelihood function by integrating out random effects. The main method used by NLMIXED is adaptive Gaussian quadrature (cf. Pinheiro JC, Bates DM, "Approximations to the log-likelihood function in the nonlinear mixed-effects model", J. Comp. & Graphical Stats., 4, 12-35, 1995). Much of the material in this section is based on the methods discussed in Chapter 4 of the book Numerical Recipes: The Art of Scientific Computing, by Press, Flannery et al.. Only one-dimensional integrals will be discussed here. The Riemann integral of a continuous function f(x) over an interval [a, b] by definition is a limit of sums of the form [1] SUMi(f(xi) * di), where x1 < x2 < ... < xi < ... xn are points in the interval [a, b], and di is the distance between the i-th and (i+1)-th points, with the limit being taken as the maximum of the di goes to zero. This can be thought of basically as a process of adding up the areas of a collection of rectangles of width di and height equal to the value of the function at the left-hand endpoint xi of the subinterval [xi, xi+1]. It is simple to write a program to compute sums like that in [1]. Usually the points {xi} are taken to be equally spaced with a common distance between successive points of d. However for most functions, [1] is not very efficient. The interval [a, b] will need to be subdivided into a great many points to give a good approximation to the true integral. It is possible to make dramatic improvements in the efficiency. The first improvement is the 'trapezoid rule', in which the rectangles involved in the sum [1] are replaced by trapezoids. The area of a trapezoid is equal to the average of the heights at the two ends multiplied by the width. In terms of the summands, this is .5 * (f(xi) + f(xi+1)) * di. The following PROC IML program is based on the trapezoid rule. The function being integrated here is cos(x). An iterative procedure is used, with a criterion for convergence being agreement of the estimated integral between successive iterations to within eps = 1e-6. The function is integrated over the interval [a, b] = [0, 1]. The true integral is sin(1) = .841471. The program convergences to an answer which is correct to within .000001 in 9 iterations, requiring a total of 513 function evaluations. ================================================================================== options linesize = 80 mprint ; footnote "~john-c/5421/integ.sas &sysdate &systime" ; /* Proc iml program to perform numerical integration ... */ /* Uses just the trapezoid rule. */ proc iml ; start func(x, fvalue, evals) ; fvalue = cos(x) ; evals = evals + 1 ; finish func ; eps = 1e-6 ; jmax = 15 ; reldiff = eps + 1 ; a = 0 ; b = 1 ; evals = 0 ; run func(a, fa, evals) ; run func(b, fb, evals) ; newint = .5 * (b - a) * (fa + fb) ; oldint = newint ; reldiff = 1 + eps ; it = 1 ; do j = 1 to jmax while (reldiff > eps) ; tnm = it ; del = (b - a) / tnm ; sumfun = 0 ; x = a + .5 * del ; do i = 1 to it ; run func(x, fx, evals) ; sumfun = sumfun + fx ; x = x + del ; end ; newint = .5 * (newint + (b - a) * sumfun / tnm) ; reldiff = abs((newint - oldint) / oldint) ; print j it oldint newint reldiff ; oldint = newint ; it = it * 2 ; end ; if (j >= jmax & reldiff > eps) then do ; print "No convergence after" jmax " iterations." ; stop ; end ; print "Number of function evaluations: " evals ; print "Estimated integral = " newint ; quit ; ================================================================================== The SAS System 1 13:12 Saturday, April 15, 2000 J IT OLDINT NEWINT RELDIFF 1 1 0.7701512 0.8238669 0.069747 J IT OLDINT NEWINT RELDIFF 2 2 0.8238669 0.8370838 0.0160425 J IT OLDINT NEWINT RELDIFF 3 4 0.8370838 0.840375 0.0039318 J IT OLDINT NEWINT RELDIFF 4 8 0.840375 0.8411971 0.0009782 J IT OLDINT NEWINT RELDIFF 5 16 0.8411971 0.8414025 0.0002442 J IT OLDINT NEWINT RELDIFF 6 32 0.8414025 0.8414539 0.000061 J IT OLDINT NEWINT RELDIFF 7 64 0.8414539 0.8414667 0.0000153 J IT OLDINT NEWINT RELDIFF 8 128 0.8414667 0.8414699 3.8147E-6 J IT OLDINT NEWINT RELDIFF 9 256 0.8414699 0.8414707 9.5368E-7 EVALS Number of function evaluations: 513 NEWINT Estimated integral = 0.8414707 SIN1 Sin(1) = 0.841471 ~john-c/5421/integ.sas 15APR00 13:12 ================================================================================== With very little effort it is possible to modify the above program to greatly improve the efficiency. This is done by using Simpson's rule, which is a slight modification to the trapezoid algorithm. This is shown in the following program. Note that the printout indicates convergence within 5 iterations and only 33 function evaluations. ================================================================================== options linesize = 80 mprint ; footnote "~john-c/5421/simp.sas &sysdate &systime" ; /* Proc iml program to perform numerical integration ... */ /* Uses Simpson's rule. */ proc iml ; start func(x, fvalue, evals) ; fvalue = cos(x) ; evals = evals + 1 ; finish func ; eps = 1e-6 ; jmax = 15 ; reldiff = eps + 1 ; a = 0 ; b = 1 ; evals = 0 ; run func(a, fa, evals) ; run func(b, fb, evals) ; newint = .5 * (b - a) * (fa + fb) ; newsimp = newint ; oldsimp = newsimp ; oldint = newint ; reldiff = 1 + eps ; it = 1 ; qsimp = 0 ; do j = 1 to jmax while (qsimp = 0 ) ; tnm = it ; del = (b - a) / tnm ; sumfun = 0 ; x = a + .5 * del ; do i = 1 to it ; run func(x, fx, evals) ; sumfun = sumfun + fx ; x = x + del ; end ; newint = .5 * (newint + (b - a) * sumfun / tnm) ; reldiff = abs((newint - oldint) / oldint) ; newsimp = (4 * newint - oldint) / 3 ; print j it oldint newint reldiff oldsimp newsimp ; if (abs(newsimp - oldsimp) < eps * abs(oldsimp)) then qsimp = 1 ; oldsimp = newsimp ; oldint = newint ; it = it * 2 ; end ; if (j >= jmax & qsimp = 0) then do ; print "No convergence after" jmax " iterations." ; stop ; end ; print "Number of function evaluations: " evals ; print "Estimated integral = " newsimp ; quit ; ================================================================================== The SAS System 1 13:26 Saturday, April 15, 2000 J IT OLDINT NEWINT RELDIFF OLDSIMP NEWSIMP 1 1 0.7701512 0.8238669 0.069747 0.7701512 0.8417721 J IT OLDINT NEWINT RELDIFF OLDSIMP NEWSIMP 2 2 0.8238669 0.8370838 0.0160425 0.8417721 0.8414894 J IT OLDINT NEWINT RELDIFF OLDSIMP NEWSIMP 3 4 0.8370838 0.840375 0.0039318 0.8414894 0.8414721 J IT OLDINT NEWINT RELDIFF OLDSIMP NEWSIMP 4 8 0.840375 0.8411971 0.0009782 0.8414721 0.8414711 J IT OLDINT NEWINT RELDIFF OLDSIMP NEWSIMP 5 16 0.8411971 0.8414025 0.0002442 0.8414711 0.841471 EVALS Number of function evaluations: 33 NEWSIMP Estimated integral = 0.841471 ~john-c/5421/simp.sas 15APR00 13:26 ================================================================================== PROBLEM 32 Assume X has an N(0, 1) distribution, and let Y = abs(X). Use (1) the trapezoidal rule algorithm and (2) the Simpson's rule algorithm to find the expected value of Y. You can assume that the normal distribution has negligible probability mass for X larger than 8 or less than -8. Comment on whether your computed expectation is correct by simulating 1000 observations and carrying out an appropriate test. ================================================================================== /home/walleye/john-c/5421/notes.030 Last update: December 7, 2002.