NUMERICAL INTEGRATION                                          SPH 5421 notes.030

     An important reason for doing numerical integration in statistics is the
fact that computation of Bayesian posterior distributions requires integration
of the likelihood function over the prior distribution.  (This will be explained
in more detail later.)  In general, the integrals involved cannot be evaluated
analytically.  Statistical packages such as SAS, SPSSX, and BMDP do not provide
methods for numerical integration, and this may explain why Bayesian methods
are not as widely used in biostatistics as they could be and possibly should be.

     It is interesting to note, however, that the new SAS procedure PROC NLMIXED
does perform evaluation of the likelihood function by integrating out random
effects.  The main method used by NLMIXED is adaptive Gaussian quadrature
(cf. Pinheiro JC, Bates DM, "Approximations to the log-likelihood function in
the nonlinear mixed-effects model", J. Comp. & Graphical Stats., 4, 12-35, 1995).

     Much of the material in this section is based on the methods discussed in
Chapter 4 of the book Numerical Recipes: The Art of Scientific Computing, by
Press, Flannery et al..

     Only one-dimensional integrals will be discussed here.  The Riemann
integral of a continuous function  f(x)  over an interval [a, b] by
definition is a limit of sums of the form

[1]                  SUMi(f(xi) * di),

where x1 < x2 < ... < xi < ... xn are points in the interval [a, b], and
di is the distance between the i-th and (i+1)-th points, with the limit
being taken as the maximum of the di goes to zero.  This can be thought
of basically as a process of adding up the areas of a collection of rectangles
of width di and height equal to the value of the function at the left-hand
endpoint xi of the subinterval [xi, xi+1].

     It is simple to write a program to compute sums like that in [1].  Usually
the points {xi} are taken to be equally spaced with a common distance between
successive points of  d.  However for most functions, [1] is not very efficient.
The interval [a, b] will need to be subdivided into a great many points to
give a good approximation to the true integral. 

     It is possible to make dramatic improvements in the efficiency.  The first
improvement is the 'trapezoid rule', in which the rectangles involved in the
sum [1] are replaced by trapezoids.  The area of a trapezoid is equal to the
average of the heights at the two ends multiplied by the width.  In terms of
the summands, this is

              .5 * (f(xi) + f(xi+1)) * di.

     The following PROC IML program is based on the trapezoid rule.  The function
being integrated here is  cos(x).  An iterative procedure is used, with
a criterion for convergence being agreement of the estimated integral between
successive iterations to within eps = 1e-6.  The function is integrated over the
interval [a, b] = [0, 1].  The true integral is sin(1) = .841471.  The program
convergences to an answer which is correct to within .000001 in 9 iterations,
requiring a total of 513 function evaluations.

==================================================================================

options linesize = 80 mprint ;
footnote "~john-c/5421/integ.sas &sysdate &systime" ;

/*  Proc iml program to perform numerical integration ...             */
/*  Uses just the trapezoid rule.                                     */

proc iml ;

start func(x, fvalue, evals) ;

      fvalue = cos(x) ;
      evals = evals + 1 ;

finish func ;

  eps = 1e-6 ;
  jmax = 15 ;
  reldiff = eps + 1 ;

  a = 0 ; b = 1 ;
  evals = 0 ;
  run func(a, fa, evals) ;
  run func(b, fb, evals) ;
  newint = .5 * (b - a) * (fa + fb) ;
  oldint = newint ;
  reldiff = 1 + eps ;
  it = 1 ;

  do j = 1 to jmax while (reldiff > eps) ;

     tnm = it ;
     del = (b - a) / tnm ;

     sumfun = 0 ;
     x = a + .5 * del ;

     do i = 1 to it ;

        run func(x, fx, evals) ;
        sumfun = sumfun + fx ;
        x = x + del ;

     end ;

     newint = .5 * (newint + (b - a) * sumfun / tnm) ;

     reldiff = abs((newint - oldint) / oldint) ;
     print j it oldint newint reldiff ;
     oldint = newint ;
     it = it * 2 ;

  end ;

  if (j >= jmax & reldiff > eps) then do ;

     print "No convergence after" jmax " iterations." ;
     stop ;

  end ;

  print "Number of function evaluations: " evals ;
  print "Estimated integral =            " newint ;


quit ;
==================================================================================


                                 The SAS System                                1
                                                  13:12 Saturday, April 15, 2000

                       J        IT    OLDINT    NEWINT   RELDIFF
                       1         1 0.7701512 0.8238669  0.069747


                       J        IT    OLDINT    NEWINT   RELDIFF
                       2         2 0.8238669 0.8370838 0.0160425


                       J        IT    OLDINT    NEWINT   RELDIFF
                       3         4 0.8370838  0.840375 0.0039318


                       J        IT    OLDINT    NEWINT   RELDIFF
                       4         8  0.840375 0.8411971 0.0009782


                       J        IT    OLDINT    NEWINT   RELDIFF
                       5        16 0.8411971 0.8414025 0.0002442


                       J        IT    OLDINT    NEWINT   RELDIFF
                       6        32 0.8414025 0.8414539  0.000061


                       J        IT    OLDINT    NEWINT   RELDIFF
                       7        64 0.8414539 0.8414667 0.0000153


                       J        IT    OLDINT    NEWINT   RELDIFF
                       8       128 0.8414667 0.8414699 3.8147E-6


                       J        IT    OLDINT    NEWINT   RELDIFF
                       9       256 0.8414699 0.8414707 9.5368E-7


                                                        EVALS
                   Number of function evaluations:        513


                                                       NEWINT
                   Estimated integral =             0.8414707


                                                         SIN1
                   Sin(1)             =              0.841471

 
                      ~john-c/5421/integ.sas 15APR00 13:12
==================================================================================

     With very little effort it is possible to modify the above program to
greatly improve the efficiency.  This is done by using Simpson's rule, which
is a slight modification to the trapezoid algorithm.  This is shown in the
following program.  Note that the printout indicates convergence within
5 iterations and only 33 function evaluations.

==================================================================================

options linesize = 80 mprint ;
footnote "~john-c/5421/simp.sas &sysdate &systime" ;

/*  Proc iml program to perform numerical integration ...             */
/*  Uses Simpson's rule.                                              */

proc iml ;

start func(x, fvalue, evals) ;

      fvalue = cos(x) ;
      evals = evals + 1 ;

finish func ;

  eps = 1e-6 ;
  jmax = 15 ;
  reldiff = eps + 1 ;

  a = 0 ; b = 1 ;
  evals = 0 ;

  run func(a, fa, evals) ;
  run func(b, fb, evals) ;
  newint = .5 * (b - a) * (fa + fb) ;
  newsimp = newint ;
  oldsimp = newsimp ;

  oldint = newint ;
  reldiff = 1 + eps ;
  it = 1 ;
  qsimp = 0 ;

  do j = 1 to jmax while (qsimp = 0 ) ;

     tnm = it ;
     del = (b - a) / tnm ;

     sumfun = 0 ;
     x = a + .5 * del ;

     do i = 1 to it ;

        run func(x, fx, evals) ;
        sumfun = sumfun + fx ;
        x = x + del ;

     end ;

     newint = .5 * (newint + (b - a) * sumfun / tnm) ;

     reldiff = abs((newint - oldint) / oldint) ;
     newsimp = (4 * newint - oldint) / 3 ;
     print j it oldint newint reldiff oldsimp newsimp ;
     if (abs(newsimp - oldsimp) < eps * abs(oldsimp)) then qsimp = 1 ;
     oldsimp = newsimp ;
     oldint = newint ;

     it = it * 2 ;

  end ;

  if (j >= jmax & qsimp = 0) then do ;

     print "No convergence after" jmax " iterations." ;
     stop ;

  end ;

  print "Number of function evaluations: " evals ;
  print "Estimated integral =            " newsimp ;


quit ;

==================================================================================

                                 The SAS System                                1
                                                  13:26 Saturday, April 15, 2000

             J        IT    OLDINT    NEWINT   RELDIFF   OLDSIMP   NEWSIMP
             1         1 0.7701512 0.8238669  0.069747 0.7701512 0.8417721


             J        IT    OLDINT    NEWINT   RELDIFF   OLDSIMP   NEWSIMP
             2         2 0.8238669 0.8370838 0.0160425 0.8417721 0.8414894


             J        IT    OLDINT    NEWINT   RELDIFF   OLDSIMP   NEWSIMP
             3         4 0.8370838  0.840375 0.0039318 0.8414894 0.8414721


             J        IT    OLDINT    NEWINT   RELDIFF   OLDSIMP   NEWSIMP
             4         8  0.840375 0.8411971 0.0009782 0.8414721 0.8414711


             J        IT    OLDINT    NEWINT   RELDIFF   OLDSIMP   NEWSIMP
             5        16 0.8411971 0.8414025 0.0002442 0.8414711  0.841471


                                                        EVALS
                   Number of function evaluations:         33


                                                      NEWSIMP
                   Estimated integral =              0.841471

 
                      ~john-c/5421/simp.sas 15APR00 13:26
==================================================================================

PROBLEM 32

     Assume X has an N(0, 1) distribution, and let Y = abs(X).  Use (1) the
trapezoidal rule algorithm and (2) the Simpson's rule algorithm to find the
expected value of Y.  You can assume that the normal distribution has negligible
probability mass for X larger than 8 or less than -8.

     Comment on whether your computed expectation is correct by simulating
1000 observations and carrying out an appropriate test.
==================================================================================
/home/walleye/john-c/5421/notes.030   Last update: December 7, 2002.