BAYES COMPUTATIONS AND NUMERICAL INTEGRATION

SPH 5421 notes.032

The Bayesian approach to inference and estimation for a given dataset depends essentially on two ingredients:

(1) a prior distribution for the unknown parameters, and

(2) the likelihood function of those parameters, based on the observed data.

Here is a brief summary of the Bayesian argument. You have some information about a parameter which represents a characteristic of a population. Such a parameter might be the death rate per year among men who are 70 years of age. From your own observations and public information, you know the death rate is not zero and not 100 percent. In fact you are almost completely certain that the rate is lower than 20%. You are less certain about lower rates, like 2% or 5% or 8%.

Your uncertainty could be represented as a probability density function. It might have a peak at, say, 3%, and would decrease to zero in both directions. In fact, it might be reasonable to represent your uncertainty and prior information about the death rate as a beta distribution.
The following program graphs several beta distributions that might be plausible priors for various individuals' guesses about the death rate in 70-year-old men:

==================================================================================
FILENAME GRAPH 'gsas.grf' ;
OPTIONS  LINESIZE = 80 ;
GOPTIONS RESET = GLOBAL ROTATE = LANDSCAPE FTEXT = SWISSB
         DEVICE = PSCOLOR GACCESS = SASGASTD GSFNAME = GRAPH
         GSFMODE = REPLACE GUNIT = PCT BORDER CBACK = WHITE
         HTITLE = 2 HTEXT = 1 ;
*===================================================================== ;
options linesize = 80 ;
footnote "~john-c/5421/gbeta.sas &sysdate &systime" ;

data beta ;
     array a(36) a1-a36 ;
     array b(36) b1-b36 ;
     a1 = 2.0 ;  b1 = 10.0 ;
     a2 = 2.0 ;  b2 = 15.0 ;
     a3 = 2.0 ;  b3 = 20.0 ;
     a4 = 1.5 ;  b4 = 10.0 ;
     h = 1e-7 ;
     do j = 1 to 4 ;
        do i = 1 to 100 ;
           t = (i - .5) / 100 ;
           y = (probbeta(t + h, a(j), b(j)) - probbeta(t - h, a(j), b(j)))
               / (2 * h) ;
           output ;
        end ;
     end ;
run ;
/*==================================================================*/
proc format ;
     value glab 1 = 'a = 2, b = 10'
                2 = 'a = 2, b = 15'
                3 = 'a = 2, b = 20'
                4 = 'a = 1.5, b = 10' ;
/*==================================================================*/
symbol1 interpol = join w = 2 c = black v = none l = 1 ;
symbol2 interpol = join w = 2 c = black v = none l = 2 ;
symbol3 interpol = join w = 2 c = black v = none l = 3 ;
symbol4 interpol = join w = 2 c = black v = none l = 4 ;

proc gplot data = beta ;
     plot y * t = j ;
     title1 h = 2 f = swissb "Overlay Graphs of Beta Distributions" ;
     format j glab. ;
run ;

endsas ;
==================================================================================

If you run the program and graph the output, you will see 4 curves which start at (0, 0) and end at (1, 0), and have peaks somewhere between 0.05 and 0.15. (The Y values are centered-difference approximations to the derivative of the beta CDF, PROBBETA, which is the beta density.) Perhaps one of these curves is close to your own prior belief about the death rate in 70-year-old men.
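For readers without SAS, here is a rough Python sketch of the same computation; the function and variable names are mine, not part of the course materials. Instead of differencing the beta CDF as the SAS program does with PROBBETA, it evaluates the beta density directly from its formula, using log-gamma for the normalizing constant:

```python
import math

def beta_pdf(t, a, b):
    """Beta(a, b) density at t, via log-gamma for the normalizing constant."""
    log_k = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_k + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

# The same four (a, b) pairs as the SAS program, evaluated at the same
# midpoints t = (i - 0.5)/100, i = 1, ..., 100.
params = [(2.0, 10.0), (2.0, 15.0), (2.0, 20.0), (1.5, 10.0)]
ts = [(i - 0.5) / 100 for i in range(1, 101)]
curves = {(a, b): [beta_pdf(t, a, b) for t in ts] for (a, b) in params}

# Sanity check: each density should integrate to about 1 over [0, 1].
for (a, b), ys in curves.items():
    print(f"a = {a}, b = {b}: midpoint-rule integral ~ {sum(ys) * 0.01:.4f}")
```

A midpoint-rule sum of each curve should come out close to 1, which is a quick check that the densities are being computed correctly before you plot them.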
A curve with a very sharp, narrow peak represents a high degree of certainty about what the true death rate is. Conversely, a curve with a low, broad peak represents great uncertainty. In the graph from the program, the beta distribution with a = 2 and b = 20 has the highest and sharpest peak, while the curve with a = 2 and b = 10 has the lowest, broadest peak.

Let r be the unknown death rate, and let g(r) be a probability density function which represents your prior belief about r. Now assume that you observe 40 men aged 70 for one year, and 6 of these men die. This will have an effect on your prior belief. If you thought the death rate was less than 1%, you are likely to revise your estimate upward, closer to the observed value of 15%. If you thought the death rate was 50%, you are likely to revise your estimate downward. This small sample, however, does not completely wipe out the effect of your prior belief. In fact, the narrower and sharper the curve that represented your prior belief, the less effect the observed data will have on it.

There is a formal way, based on Bayes' Theorem, to calculate how the data change your estimate of the true value of r. The posterior distribution of r is defined by the formula

[1]     h(r | x) = likelihood(x | r) * g(r) / f(x),

where g(r) is the prior distribution, x is the observed data, and f(x) is the unconditional probability distribution function of the observed data. {Formula [1] is really the basis for Bayesian statistical inference and estimation.}

The denominator term on the right, f(x), is in general difficult to calculate. In fact its computation requires integration:

[2]     f(x) = integral{ likelihood(x | r) * g(r) dr }.

The integral is taken over the entire range of possible values for r. In the case of the death rate in men of age 70, r ranges from 0 to 1 (expressing the rate as a proportion). In some cases you can derive an exact answer for the integral expression.
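As a concrete illustration of formula [2], the following Python sketch (the names are mine, not part of the notes) approximates f(x) by a midpoint-rule sum, using the binomial likelihood for 6 deaths among 40 men and a beta(2, 20) prior:

```python
import math

def log_binom(n, k):
    # log of the binomial coefficient for n objects taken k at a time
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def likelihood(r, n=40, x=6):
    # binomial likelihood of observing x deaths among n men, given rate r
    return math.exp(log_binom(n, x) + x * math.log(r) + (n - x) * math.log(1 - r))

def prior(r, a=2.0, b=20.0):
    # beta(a, b) prior density g(r)
    log_k = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_k + (a - 1) * math.log(r) + (b - 1) * math.log(1 - r))

# Midpoint rule on [0, 1]:  f(x) = integral of likelihood(x | r) * g(r) dr
m = 10000
h = 1.0 / m
fx = h * sum(likelihood((i - 0.5) * h) * prior((i - 0.5) * h)
             for i in range(1, m + 1))
print(f"f(x) ~ {fx:.6g}")
```

Nothing in this sum depends on the prior being conjugate, which is the point: for a prior with no closed-form answer, the same numerical integral still works.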
Let us take the example above. Assume that your prior distribution g(r) is the beta distribution with parameters (2, 20). This means that the probability density function g(r) is

        g(r) = K * r^1 * (1 - r)^19,

where K = G(2 + 20) / (G(2) * G(20)), and G is the gamma function. The general expression for the pdf of the beta distribution with parameters a and b is

        g(r) = K * r^(a - 1) * (1 - r)^(b - 1),   0 <= r <= 1,

where K = G(a + b) / (G(a) * G(b)).

Recall that you observed 40 men for one year, and 6 of these men died. If the 'true' rate of death in such men is r, then the likelihood of the observed data is binomial:

        likelihood(6 of 40 die | r) = B(40: 6) * r^6 * (1 - r)^34,

where B(40: 6) is the binomial coefficient for 40 objects taken 6 at a time.

Now consider the expression inside the integral in equation [2]. It is

        B(40: 6) * r^6 * (1 - r)^34 * K * r^1 * (1 - r)^19
              = B(40: 6) * K * r^7 * (1 - r)^53.

Note that this same expression occurs in the numerator of [1]. What this means is that the posterior distribution h(r | x) in this case is also a beta distribution, with parameters (a, b) = (8, 54), which are essentially a combination of the prior parameters (2, 20) and the observed data. Thus in this case the posterior distribution can be computed without having to perform numerical integration.

In general you will not be that lucky. Most posterior distributions are integrals which do not have simple analytic expressions, and evaluating them will require numerical integration.

Bayesian methods give you something that you don't get with frequentist statistics. The posterior probability distribution can be used to answer questions like, "What is the probability that the true rate of death in 70-year-old men is between 2% and 5%?" Frequentist statistics can yield confidence intervals which, at first glance, seem to answer this kind of question, but in fact they do not. The endpoints of confidence intervals are themselves statistics.
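The conjugate update above is easy to check in code. This Python sketch (an illustration of my own, not from the notes) applies the beta-binomial update rule and then uses the resulting posterior to answer an interval question of the kind just mentioned, again by a midpoint-rule sum:

```python
import math

def beta_pdf(r, a, b):
    # beta(a, b) density, via log-gamma for the normalizing constant
    log_k = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_k + (a - 1) * math.log(r) + (b - 1) * math.log(1 - r))

# Conjugate update: a beta(a0, b0) prior combined with binomial data
# (x events in n trials) gives a beta(a0 + x, b0 + n - x) posterior.
a0, b0 = 2, 20          # prior parameters
n, x = 40, 6            # observed data: 6 of 40 men died
a1, b1 = a0 + x, b0 + (n - x)
post_mean = a1 / (a1 + b1)   # mean of a beta(a, b) distribution is a / (a + b)

# P(0.02 < r < 0.05) under the posterior, by the midpoint rule.
m = 4000
h = (0.05 - 0.02) / m
p = h * sum(beta_pdf(0.02 + (i - 0.5) * h, a1, b1) for i in range(1, m + 1))
print(f"posterior = beta({a1}, {b1}); mean = {post_mean:.4f}; "
      f"P(0.02 < r < 0.05) ~ {p:.4f}")
```

The interval probability here is exactly the kind of statement a frequentist confidence interval cannot make about a specific experiment.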
In computing a 95% confidence interval, you can be assured that in 95% of the experiments you might perform, the true value of the parameter will lie within the computed confidence interval. But for any SPECIFIC experiment, the true value either lies within the computed confidence interval or it does not: the probability that it does is thus either 1 or 0.

There is a price that must be paid for Bayesian posterior probability statements. That price is having to assume some sort of prior distribution. This goes against the feeling of many scientists that inference should be based only on observed data and objective criteria. This is a question of the basic philosophy of science which has fueled a long and heated debate among statisticians for decades, and it will not be settled here.

==================================================================================

PROBLEM 34

As noted above, you can find the posterior distribution explicitly under the assumed beta(2, 20) prior and the observed data (6 of 40 participants died).

1. Graph this posterior distribution. Compare it to the prior distribution.

2. Compute the integral [2] using a numerical integration program. Does the value agree with what you get analytically?

==================================================================================

PROBLEM 35

You are conducting a surveillance program to see what the probability is that a child in the third grade will contract chicken pox. At the beginning of the program you have no idea what the probability p might be, and you therefore decide that a reasonable prior distribution for p is the uniform distribution, U[0, 1]. As it happens, U[0, 1] is a beta distribution with parameters a = 1 and b = 1. You collect data for 5 different years, with a different third-grade class in each year.
The data are as follows:

                     Number in      Number Who Contracted
                       Class             Chicken Pox
          ------     ---------     ---------------------
          Year 1         25                  3
          Year 2         28                  5
          Year 3         30                 10
          Year 4         22                  8
          Year 5         25                  4

Each year you look at the accumulated data from that year and each previous year. Using this, you compute a new posterior distribution at the end of each year.

1. Graph these 5 posterior distributions and compute their mean values and their standard deviations.

2. Use the last of these posterior distributions to compute the probability that the true rate r is between 15% and 25%.

==================================================================================

/home/walleye/john-c/5421/notes.032    Last update: April 21, 2000
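As a starting point for the sequential updating in Problem 35 (the variable names here are mine), note that with a conjugate beta prior the end-of-year posterior is obtained by simply adding each year's counts to the current parameters:

```python
# Sequential conjugate updating: starting from the uniform prior beta(1, 1),
# each year's binomial data (x cases in a class of size n) turns the current
# beta(a, b) distribution into beta(a + x, b + n - x).
data = [(25, 3), (28, 5), (30, 10), (22, 8), (25, 4)]   # (class size, cases)

a, b = 1, 1   # beta(1, 1) = U[0, 1] prior
for year, (n, x) in enumerate(data, start=1):
    a, b = a + x, b + (n - x)
    mean = a / (a + b)   # posterior mean after this year's data
    print(f"Year {year}: posterior beta({a}, {b}), mean = {mean:.4f}")
```

The five posterior densities are ordinary beta distributions, so they can be graphed and integrated numerically like any other beta distribution; the graphing, standard deviations, and interval probability are left as the exercise asks.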