SAS Distribution Functions

It frequently happens that you have computed a test statistic, and you
want to find the corresponding p-value. SAS can compute the cumulative
probability of an observation from several different distributions of
random variables. For example, the function 'probnorm(z)' computes the
probability that an observation from the standard normal distribution
is less than or equal to the observed value 'z'.

If, for example, z = 1.96, then the value returned by probnorm(1.96) is
approximately 0.975. If you want the probability of observing a
z-statistic greater than z = 1.96, it would equal:

     1 - probnorm(1.96) = 1 - .975 = 0.025.

This is a ONE-SIDED p-value. If you want a TWO-SIDED p-value, then for
the standard normal distribution, you would double the one-sided value.
In other words,

     2*(1 - probnorm(1.96)) = 2*(1 - .975) = 2 * 0.025 = 0.05.

Below is a program which shows how you can compute p-values for normal,
t, chi-square and F distributions, using SAS cumulative distribution
functions. SAS can also compute values for the binomial, Poisson, gamma,
beta, hypergeometric, and negative binomial distributions.
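
Before the full program, the one-sided and two-sided arithmetic for
z = 1.96 can be checked with a short data step. This is only a minimal
sketch, not part of the original program: the 'data _null_' step, the
variable names 'onesided' and 'twosided', and the use of abs(z) (so that
the same lines also work for a negative z) are assumptions made for
illustration.

```sas
data _null_ ;                                   /* _null_ : no data set is created (assumption) */
   z = 1.96 ;
   onesided = 1 - probnorm(z) ;                 /* P(Z > z), upper-tail p-value */
   twosided = 2 * (1 - probnorm(abs(z))) ;      /* double the one-sided tail area */
   put 'z = ' z ' one-sided p = ' onesided 6.4 ' two-sided p = ' twosided 6.4 ;
run ;
```

Because this step has no 'file' statement, the line written by 'put'
appears in the SAS log. The two values should agree with the 0.025 and
0.05 worked out by hand above.
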
The four functions shown in the program below are probably the most
useful:

========================================================================

options linesize = 80 ;
footnote "~john-c/probfuns.sas &sysdate &systime " ;

data probfuns ;
   file 'probfuns.out' ;

   /* Each p-value below is an upper-tail probability: 1 - CDF. */

   /* Standard normal */
   zstat = 2.32 ;
   npvalue = 1 - probnorm(zstat) ;

   /* t with tdf degrees of freedom */
   tstat = 2.32 ;
   tdf = 10 ;
   tpvalue = 1 - probt(tstat, tdf) ;

   /* Chi-square with cdf degrees of freedom */
   chistat = 2.32**2 ;
   cdf = 1 ;
   cpvalue = 1 - probchi(chistat, cdf) ;

   /* F with (ndf, ddf) degrees of freedom */
   fstat = 31.83 ;
   ndf = 2 ;
   ddf = 681 ;
   fpvalue = 1 - probf(fstat, ndf, ddf) ;

   put ' ------------------------------------------------------------------' ;
   put ' ' ;
   put 'Date: ' "&sysdate" ' Time: ' "&systime" ;
   put ' ' ;
   put 'Examples of computations from SAS distribution functions:' ;
   put ' ' ;
   put ' z-value = ' zstat ' p-value = ' npvalue ;
   put ' ' ;
   put ' t-value = ' tstat ' df = ' tdf ' p-value = ' tpvalue ;
   put ' ' ;
   put ' X2-value = ' chistat ' df = ' cdf ' p-value = ' cpvalue ;
   put ' ' ;
   put ' F-value = ' fstat ' ndf = ' ndf ' ddf = ' ddf ' p-value = ' fpvalue ;
run ;

------------------------------------------------------------------

The following is an output file from the preceding program,
'probfuns.out' :

 ------------------------------------------------------------------

Date: 24FEB04 Time: 20:15

Examples of computations from SAS distribution functions:

 z-value = 2.32 p-value = 0.0101704387

 t-value = 2.32 df = 10 p-value = 0.0213863809

 X2-value = 5.3824 df = 1 p-value = 0.0203408773

 F-value = 31.83 ndf = 2 ddf = 681 p-value = 6.095124E-14

------------------------------------------------------------------------

Note that this program also makes extensive use of the SAS "put"
statement. This is an extremely versatile way to produce readable output
from a SAS program. The statement has many options which enable you to
place text in specified columns and to format decimal numbers, dates,
and other kinds of variables.
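
As a rough illustration of those options, the sketch below uses column
pointers ('@n') to line up text and formats ('8.6', 'date9.') to control
how a decimal number and a date are written. It is not part of the
original program: the 'data _null_' step, the variable names 'pvalue'
and 'rundate', and the column positions are all assumptions chosen for
the example.

```sas
data _null_ ;                                   /* no data set is created (assumption) */
   pvalue  = 1 - probchi(5.3824, 1) ;           /* same chi-square example as above */
   rundate = today() ;                          /* current date as a SAS date value */

   put @1 'Statistic'  @20 'Value' ;            /* @n moves the pointer to column n */
   put @1 'Chi-square' @20 pvalue 8.6 ;         /* width 8, 6 decimal places */
   put @1 'Run date'   @20 rundate date9. ;     /* date written like 24FEB2004 */
run ;
```

Each 'put' statement writes one line, with the second item starting in
column 20 on every line so that the values form a column.
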
Note that in the data step in which the "put" statement is used, the
first line after the 'data' statement is

     file 'probfuns.out' ;

This specifies an output file in which the results of the 'put'
statements appear. If it is omitted, the output from the 'put'
statements appears in the log file.

------------------------------------------------------------------------

Problem 1. Given the following statistics, compute the indicated
p-values:

   a. Assume X is an observation from a normal distribution with
      standard deviation 2; X = 3; test the hypothesis that the mean of
      the distribution is 1.0 (both one-sided and two-sided tests).

   b. Assume W has an F-distribution with degrees of freedom (2, 20).
      What is the p-value corresponding to W = 7 ?

   c. Assume t has a t-distribution with 100 degrees of freedom. What is
      the two-sided p-value corresponding to t = -1.96 ? Compare this to
      the z-distribution p-value for z = -1.96.

   d. Assume Y has a chi-square distribution with 5 degrees of freedom.
      What is the p-value corresponding to Y = 10 ?

   e. Suppose N has a binomial distribution binom(100, .5). What is the
      probability that N < 40 ? [Note: the appropriate function is
      probbnml(p, n, m), where p is the probability of success on each
      trial, n is the number of trials, and m is the number of
      "successes". It returns the probability of m or fewer successes.]

Problem 2. The theoretically expected frequencies of 3 haplotypes for a
certain gene are the following:

     AA : prob = .16
     AB : prob = .48
     BB : prob = .36

A random sample of 1000 people was taken. The observed counts of
haplotypes were the following:

     AA : n = 200
     AB : n = 470
     BB : n = 330

Use the Pearson chi-square statistic to test the hypothesis that the
true proportions in the population match the expectations given above.
[Hint: Use probchi. What should the degrees of freedom be?] Describe
your conclusion.

Problem 3. Assume you test a drug on 100 people, randomly chosen from
the population of people who have an ear infection. The drug
manufacturer says that 70% of people will be cured within two days of
taking the drug. If this claim is true, what is the probability that the
number cured is between 60 and 80 (including both 60 and 80)? Use the
SAS function probbnml for this.

------------------------------------------------------------------------

n54703.006 Last update: February 23, 2006.