SAS Distribution Functions

     It frequently happens that you have computed a test
statistic, and you want to find the corresponding p-value.

     SAS can compute the cumulative probability of an observation
from several different distributions of random variables.
For example, the function 'probnorm(z)' computes the probability
that an observation from the standard normal distribution is
less than or equal to the observed value 'z'.  If, for example,
z = 1.96, then the value returned by probnorm(1.96) is approximately
0.975.

     If you want the probability of observing a z-statistic greater than
z = 1.96, it would equal:

     1 - probnorm(1.96) = 1 - .975 = 0.025.

     This is a ONE-SIDED p-value.

     If you want a TWO-SIDED p-value, then for the standard normal
distribution, you would double the one-sided value.  In other words,

     2*(1 - probnorm(1.96)) = 2*(1 - .975) = 2 * 0.025 = 0.05.
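
     The same arithmetic can be sketched in a short data step.  This is
only an illustration: the value z = -1.96 and the variable names are
hypothetical.  The abs() call is an added precaution, so that doubling
still gives a valid two-sided p-value when the statistic is negative
(without it, the doubled value would exceed 1).

```sas
data _null_ ;
     z = -1.96 ;                             /* hypothetical observed z-statistic */
     lower = probnorm(z) ;                   /* P(Z <= z)                         */
     upper = 1 - probnorm(z) ;               /* P(Z >  z)                         */
     twosided = 2*(1 - probnorm(abs(z))) ;   /* P(|Z| > |z|), by symmetry         */
     put 'z = ' z '  lower = ' lower '  upper = ' upper '  two-sided = ' twosided ;
run ;
```

     Because there is no 'file' statement, the 'put' line is written to
the SAS log.  The two-sided value should be close to 0.05.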

     Below is a program which shows how you can compute upper-tail
p-values for the normal, t, chi-square and F distributions, using SAS
cumulative distribution functions.  SAS can also compute values for the
binomial, Poisson, gamma, beta, hypergeometric, and negative
binomial distributions.  The four functions shown in the program
below are probably the most useful:

========================================================================

options linesize = 80 ;
footnote "~john-c/probfuns.sas &sysdate &systime " ;

data probfuns ;
     file 'probfuns.out' ;
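     /* Standard normal: upper-tail (one-sided) p-value */
     zstat = 2.32 ;
     npvalue = 1 - probnorm(zstat) ;

     /* Student's t with tdf degrees of freedom: upper-tail p-value */
     tstat = 2.32 ;
     tdf = 10 ;
     tpvalue = 1 - probt(tstat, tdf) ;

     /* Chi-square with cdf degrees of freedom.  Since chistat = zstat**2 */
     /* and cdf = 1, cpvalue equals the two-sided normal p-value.         */
     chistat = 2.32**2 ;
     cdf = 1 ;
     cpvalue = 1 - probchi(chistat, cdf) ;

     /* F with ndf numerator and ddf denominator degrees of freedom */
     fstat = 31.83 ;
     ndf = 2 ;
     ddf = 681 ;
     fpvalue = 1 - probf(fstat, ndf, ddf) ;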

     put ' ------------------------------------------------------------------' ;
     put ' ' ;
     put 'Date: '  "&sysdate"   '  Time: ' "&systime" ;
     put ' ' ;
     put 'Examples of computations from SAS distribution functions:' ;
     put ' ' ;

     put ' z-value = '  zstat  '  p-value = '  npvalue ;
     put ' ' ;

     put ' t-value = '  tstat   '  df = ' tdf  '  p-value = '  tpvalue ;
     put ' ' ;

     put ' X2-value = '  chistat  '  df = ' cdf  '  p-value = ' cpvalue ;
     put ' ' ;

     put ' F-value = ' fstat '  ndf = ' ndf '  ddf = ' ddf '  p-value = '  fpvalue ;

run ;


 ------------------------------------------------------------------

 The following is an output file from the preceding program, 'probfuns.out' :


 ------------------------------------------------------------------
 
Date: 24FEB04  Time: 20:15
 
Examples of computations from SAS distribution functions:
 
 z-value = 2.32   p-value = 0.0101704387
 
 t-value = 2.32   df = 10   p-value = 0.0213863809
 
 X2-value = 5.3824   df = 1   p-value = 0.0203408773
 
 F-value = 31.83   ndf = 2   ddf = 681   p-value = 6.095124E-14

------------------------------------------------------------------------
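
     The other distributions mentioned earlier follow the same pattern:
each function returns a cumulative probability, and an upper-tail
probability is 1 minus that value.  The data step below is a minimal
sketch for three of them.  All of the numerical arguments and the
variable names are hypothetical, chosen only for illustration.

```sas
data _null_ ;
     /* hypothetical arguments, for illustration only */
     pbin  = probbnml(0.3, 20, 4) ;   /* P(N <= 4), N binomial with p = 0.3, n = 20 */
     ppois = poisson(2.5, 4) ;        /* P(N <= 4), N Poisson with mean 2.5         */
     pgam  = probgam(3, 2) ;          /* P(G <= 3), G gamma with shape parameter 2  */

     put ' binomial cdf = ' pbin ;
     put ' Poisson cdf  = ' ppois ;
     put ' gamma cdf    = ' pgam ;
run ;
```

     For the discrete distributions, note that 1 minus the returned
value is the probability of MORE than m events.  The probability of m or
more events is 1 minus the function evaluated at m - 1.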

     Note that the 'probfuns' program above also makes extensive use
of the SAS "put" statement.  This is an extremely versatile way to
produce readable output from a SAS program.  The statement has many
options which enable you to place text in specified columns and to
format decimal numbers, dates, and other kinds of variables.

     Note that in the data step in which the "put" statement is
used, the first line after the 'data' statement is

     file 'probfuns.out' ;

     This specifies the output file to which the results of the
'put' statements are written.  If it is omitted, the output from
the 'put' statements is written to the SAS log instead.
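
     As a minimal sketch of the column and format options of 'put',
the step below reuses the z-statistic from the program above.  The
column positions (@1, @25) and the formats (5.2, 8.4) are arbitrary
choices made for illustration.

```sas
data _null_ ;
     zstat = 2.32 ;
     npvalue = 1 - probnorm(zstat) ;

     /* No 'file' statement, so this line is written to the SAS log.    */
     /* @n moves to column n; w.d prints the value in a field of width  */
     /* w with d decimal places.                                        */
     put @1 'z-value = ' zstat 5.2  @25 'p-value = ' npvalue 8.4 ;
run ;
```

     With the 8.4 format the p-value is rounded to four decimal places
(0.0102).  Replacing the missing 'file' statement with 'file print ;'
would send the same line to the standard SAS output listing instead of
the log.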

------------------------------------------------------------------------

Problem 1.

     Given the following statistics, compute the indicated
p-values:

     a.  Assume X is an observation from a normal distribution with standard
         deviation 2; X = 3; test the hypothesis that the mean
         of the distribution is 1.0 (both one-sided and two-sided tests).

     b.  Assume W has an F-distribution with degrees of freedom (2, 20).
         What is the p-value corresponding to W = 7 ?

     c.  Assume t has a t-distribution with 100 degrees of freedom.
         What is the two-sided p-value corresponding to t = -1.96 ?
         Compare this to the z-distribution p-value for z = -1.96.

     d.  Assume Y has a chi-square distribution with 5 degrees of
         freedom.  What is the p-value corresponding to Y = 10 ?

     e.  Suppose N has a binomial distribution binom(100, .5).
         What is the probability that N < 40?
         [Note: the appropriate function is probbnml(p, n, m),
          where p is the probability of success on each trial, n is
          the number of trials, and m is the number of "successes".
          It returns the probability of m or fewer successes.]


Problem 2.

     The theoretically expected frequencies of 3 genotypes for a
certain gene are the following:

         AA :   prob = .16

         AB :   prob = .48

         BB :   prob = .36

     A random sample of 1000 people was taken.  The observed counts
of genotypes were the following:

         AA:    n = 200

         AB:    n = 470

         BB:    n = 330

     Use the Pearson chi-square statistic to test the hypothesis
that the true proportions in the population match the expectations
given above.  [Hint: Use probchi.  What should the degrees of freedom
be?]  Describe your conclusion.

Problem 3.

     Assume you test a drug on 100 people, randomly chosen from
the population of people who have an ear infection.  The drug
manufacturer says that 70% of people will be cured within two days
of taking the drug.

     Assuming the manufacturer's claim is correct, what is the
probability that the number cured is between 60 and 80 (including
both 60 and 80)?  Use the SAS function probbnml for this.
------------------------------------------------------------------------


n54703.006  Last update: February 23, 2006.