BAYES COMPUTATIONS AND NUMERICAL INTEGRATION

SPH 5421 notes.032

The Bayesian approach to inference and estimation for a given dataset depends essentially on two ingredients:

(1) a prior distribution for the unknown parameters, and

(2) the likelihood function of those parameters, based on the observed data.

Here is a brief summary of the Bayesian argument. You have some information about a parameter which represents a characteristic of a population. Such a parameter might be the death rate per year among men who are 70 years of age. From your own observations and public information, you know the death rate is not zero and not 100 percent. In fact you are almost completely certain that the rate is lower than 20%. You are less certain about lower rates, like 2% or 5% or 8%.

Your uncertainty could be represented as a probability density function. It might have a peak at, say, 3%, and would decrease to zero in both directions. In fact, it might be reasonable to represent your uncertainty and prior information about the death rate as a beta distribution.
The following program graphs several beta distributions that might be plausible priors for various individuals' guesses about the death rate in 70-year-old men:

==================================================================================
FILENAME GRAPH 'gsas.grf' ;
OPTIONS  LINESIZE = 80 ;
GOPTIONS RESET = GLOBAL ROTATE = LANDSCAPE FTEXT = SWISSB
         DEVICE = PSCOLOR GACCESS = SASGASTD GSFNAME = GRAPH
         GSFMODE = REPLACE GUNIT = PCT BORDER CBACK = WHITE
         HTITLE = 2 HTEXT = 1 ;
*===================================================================== ;
options linesize = 80 ;
footnote "~john-c/5421/gbeta.sas &sysdate &systime" ;

data beta ;
     array a(36) a1-a36 ;
     array b(36) b1-b36 ;
     a1 = 2.0 ;  b1 = 10.0 ;
     a2 = 2.0 ;  b2 = 15.0 ;
     a3 = 2.0 ;  b3 = 20.0 ;
     a4 = 1.5 ;  b4 = 10.0 ;
     h = 1e-7 ;
     do j = 1 to 4 ;
        do i = 1 to 100 ;
           t = (i - .5) / 100 ;
           y = (probbeta(t + h, a(j), b(j)) - probbeta(t - h, a(j), b(j)))
               / (2 * h) ;
           output ;
        end ;
     end ;
run ;
/*==================================================================*/
proc format ;
     value glab 1 = 'a = 2, b = 10'
                2 = 'a = 2, b = 15'
                3 = 'a = 2, b = 20'
                4 = 'a = 1.5, b = 10' ;
/*==================================================================*/
symbol1 interpol = join w = 2 c = black v = none l = 1 ;
symbol2 interpol = join w = 2 c = black v = none l = 2 ;
symbol3 interpol = join w = 2 c = black v = none l = 3 ;
symbol4 interpol = join w = 2 c = black v = none l = 4 ;

proc gplot data = beta ;
     plot y * t = j ;
     title1 h = 2 f = swissb "Overlay Graphs of Beta Distributions" ;
     format j glab. ;
run ;

endsas ;
==================================================================================

If you run the program and graph the output, you will see 4 curves which start at (0, 0) and end at (1, 0), and have peaks somewhere between 0.05 and 0.15. (The Y values are centered-difference approximations to the derivative of the beta CDF, PROBBETA, which is the beta density.) Perhaps one of these curves is close to your own prior belief about the death rate in 70-year-old men.
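For readers without SAS, here is a rough Python sketch of the same computation; the function and variable names are mine, not part of the course materials. Instead of differencing the beta CDF as the SAS program does with PROBBETA, it evaluates the beta density directly from its formula, using log-gamma for the normalizing constant:

```python
import math

def beta_pdf(t, a, b):
    """Beta(a, b) density at t, via log-gamma for the normalizing constant."""
    log_k = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_k + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

# The same four (a, b) pairs as the SAS program, evaluated at the same
# midpoints t = (i - 0.5)/100, i = 1, ..., 100.
params = [(2.0, 10.0), (2.0, 15.0), (2.0, 20.0), (1.5, 10.0)]
ts = [(i - 0.5) / 100 for i in range(1, 101)]
curves = {(a, b): [beta_pdf(t, a, b) for t in ts] for (a, b) in params}

# Sanity check: each density should integrate to about 1 over [0, 1].
for (a, b), ys in curves.items():
    print(f"a = {a}, b = {b}: midpoint-rule integral ~ {sum(ys) * 0.01:.4f}")
```

A midpoint-rule sum of each curve should come out close to 1, which is a quick check that the densities are being computed correctly before you plot them.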
A curve with a very sharp, narrow peak represents a high degree of certainty about what the true death rate is. Conversely, a curve with a low, broad peak represents great uncertainty. In the graph from the program, the beta distribution with a = 2 and b = 20 has the highest and sharpest peak, while the curve with a = 2 and b = 10 has the lowest, broadest peak.

Let r be the unknown death rate, and let g(r) be a probability density function which represents your prior belief about r. Now assume that you observe 40 men aged 70 for one year, and 6 of these men die. This will have an effect on your prior belief. If you thought the death rate was less than 1%, you are likely to revise your estimate upward, closer to the observed value of 15%. If you thought the death rate was 50%, you are likely to revise your estimate downward. This small sample, however, does not completely wipe out the effect of your prior belief. In fact, the narrower and sharper the curve that represented your prior belief, the less effect the observed data will have on it.

There is a formal way, based on Bayes' Theorem, to calculate how the data change your estimate of the true value of r. The posterior distribution of r is defined by the formula

[1]     h(r | x) = likelihood(x | r) * g(r) / f(x),

where g(r) is the prior distribution, x is the observed data, and f(x) is the unconditional probability distribution function of the observed data. {Formula [1] is really the basis for Bayesian statistical inference and estimation.}

The denominator term on the right, f(x), is in general difficult to calculate. In fact its computation requires integration:

[2]     f(x) = integral{ likelihood(x | r) * g(r) dr }.

The integral is taken over the entire range of possible values for r. In the case of the death rate in men of age 70, r ranges from 0 to 1 (expressing the rate as a proportion). In some cases you can derive an exact answer for the integral expression.
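As a concrete illustration of formula [2], the following Python sketch (the names are mine, not part of the notes) approximates f(x) by a midpoint-rule sum, using the binomial likelihood for 6 deaths among 40 men and a beta(2, 20) prior:

```python
import math

def log_binom(n, k):
    # log of the binomial coefficient for n objects taken k at a time
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def likelihood(r, n=40, x=6):
    # binomial likelihood of observing x deaths among n men, given rate r
    return math.exp(log_binom(n, x) + x * math.log(r) + (n - x) * math.log(1 - r))

def prior(r, a=2.0, b=20.0):
    # beta(a, b) prior density g(r)
    log_k = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_k + (a - 1) * math.log(r) + (b - 1) * math.log(1 - r))

# Midpoint rule on [0, 1]:  f(x) = integral of likelihood(x | r) * g(r) dr
m = 10000
h = 1.0 / m
fx = h * sum(likelihood((i - 0.5) * h) * prior((i - 0.5) * h)
             for i in range(1, m + 1))
print(f"f(x) ~ {fx:.6g}")
```

Nothing in this sum depends on the prior being conjugate, which is the point: for a prior with no closed-form answer, the same numerical integral still works.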
Let us take the example above. Assume that your prior distribution g(r) is the beta distribution with parameters (2, 20). This means that the probability density function g(r) is

        g(r) = K * r^1 * (1 - r)^19,

where K = G(2 + 20) / (G(2) * G(20)), and G is the gamma function. The general expression for the pdf of the beta distribution with parameters a and b is

        g(r) = K * r^(a - 1) * (1 - r)^(b - 1),   0 <= r <= 1,

where K = G(a + b) / (G(a) * G(b)).

Recall that you observed 40 men for one year, and 6 of these men died. If the 'true' rate of death in such men is r, then the likelihood of the observed data is binomial:

        likelihood(6 of 40 die | r) = B(40: 6) * r^6 * (1 - r)^34,

where B(40: 6) is the binomial coefficient for 40 objects taken 6 at a time.

Now consider the expression inside the integral in equation [2]. It is

        B(40: 6) * r^6 * (1 - r)^34 * K * r^1 * (1 - r)^19
              = B(40: 6) * K * r^7 * (1 - r)^53.

Note that this same expression occurs in the numerator of [1]. What this means is that the posterior distribution h(r | x) in this case is also a beta distribution, with parameters (a, b) = (8, 54), which are essentially a combination of the prior parameters (2, 20) and the observed data. Thus in this case the posterior distribution can be computed without having to perform numerical integration.

In general you will not be that lucky. Most posterior distributions are integrals which do not have simple analytic expressions, and evaluating them will require numerical integration.

Bayesian methods give you something that you don't get with frequentist statistics. The posterior probability distribution can be used to answer questions like, "What is the probability that the true rate of death in 70-year-old men is between 2% and 5%?" Frequentist statistics can yield confidence intervals which, at first glance, seem to answer this kind of question, but in fact they do not. The endpoints of confidence intervals are themselves statistics.
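The conjugate update above is easy to check in code. This Python sketch (an illustration of my own, not from the notes) applies the beta-binomial update rule and then uses the resulting posterior to answer an interval question of the kind just mentioned, again by a midpoint-rule sum:

```python
import math

def beta_pdf(r, a, b):
    # beta(a, b) density, via log-gamma for the normalizing constant
    log_k = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_k + (a - 1) * math.log(r) + (b - 1) * math.log(1 - r))

# Conjugate update: a beta(a0, b0) prior combined with binomial data
# (x events in n trials) gives a beta(a0 + x, b0 + n - x) posterior.
a0, b0 = 2, 20          # prior parameters
n, x = 40, 6            # observed data: 6 of 40 men died
a1, b1 = a0 + x, b0 + (n - x)
post_mean = a1 / (a1 + b1)   # mean of a beta(a, b) distribution is a / (a + b)

# P(0.02 < r < 0.05) under the posterior, by the midpoint rule.
m = 4000
h = (0.05 - 0.02) / m
p = h * sum(beta_pdf(0.02 + (i - 0.5) * h, a1, b1) for i in range(1, m + 1))
print(f"posterior = beta({a1}, {b1}); mean = {post_mean:.4f}; "
      f"P(0.02 < r < 0.05) ~ {p:.4f}")
```

The interval probability here is exactly the kind of statement a frequentist confidence interval cannot make about a specific experiment.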
In computing a 95% confidence interval, you can be assured that in 95% of the experiments you might perform, the true value of the parameter will lie within the computed confidence interval. But for any SPECIFIC experiment, the true value either lies within the computed confidence interval or it does not: the probability that it does is thus either 1 or 0.

There is a price that must be paid for Bayesian posterior probability statements. That price is having to assume some sort of prior distribution. This goes against the feeling of many scientists that inference should be based only on observed data and objective criteria. This is a question of the basic philosophy of science which has fueled a long and heated debate among statisticians for decades, and it will not be settled here.

==================================================================================

PROBLEM 34

As noted above, you can find the posterior distribution explicitly under the assumed beta(2, 20) prior and the observed data (6 of 40 participants died).

1. Graph this posterior distribution. Compare it to the prior distribution.

2. Compute the integral [2] using a numerical integration program. Does the value agree with what you get analytically?

==================================================================================

PROBLEM 35

You are conducting a surveillance program to see what the probability is that a child in the third grade will contract chicken pox. At the beginning of the program you have no idea what the probability p might be, and you therefore decide that a reasonable prior distribution for p is the uniform distribution, U[0, 1]. As it happens, U[0, 1] is a beta distribution with parameters a = 1 and b = 1. You collect data for 5 different years, with a different third-grade class in each year.
The data are as follows:

                     Number in      Number Who Contracted
                       Class             Chicken Pox
          ------     ---------     ---------------------
          Year 1         25                  3
          Year 2         28                  5
          Year 3         30                 10
          Year 4         22                  8
          Year 5         25                  4

Each year you look at the accumulated data from that year and each previous year. Using this, you compute a new posterior distribution at the end of each year.

1. Graph these 5 posterior distributions and compute their mean values and their standard deviations.

2. Use the last of these posterior distributions to compute the probability that the true rate r is between 15% and 25%.

==================================================================================

/home/walleye/john-c/5421/notes.032    Last update: April 21, 2000
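As a starting point for the sequential updating in Problem 35 (the variable names here are mine), note that with a conjugate beta prior the end-of-year posterior is obtained by simply adding each year's counts to the current parameters:

```python
# Sequential conjugate updating: starting from the uniform prior beta(1, 1),
# each year's binomial data (x cases in a class of size n) turns the current
# beta(a, b) distribution into beta(a + x, b + n - x).
data = [(25, 3), (28, 5), (30, 10), (22, 8), (25, 4)]   # (class size, cases)

a, b = 1, 1   # beta(1, 1) = U[0, 1] prior
for year, (n, x) in enumerate(data, start=1):
    a, b = a + x, b + (n - x)
    mean = a / (a + b)   # posterior mean after this year's data
    print(f"Year {year}: posterior beta({a}, {b}), mean = {mean:.4f}")
```

The five posterior densities are ordinary beta distributions, so they can be graphed and integrated numerically like any other beta distribution; the graphing, standard deviations, and interval probability are left as the exercise asks.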