RANDOM NUMBER GENERATORS IN SAS AND R SPH 7460 notes.001
Statisticians make extensive use of pseudo-random number generators (PRNGs)
in simulation studies of statistical tests, estimation procedures, and design
of experiments. The reason for this is that summary and sufficient statistics
are often nonlinear combinations of data from a large number of observations,
and in general there are no closed-form analytic expressions of the resulting
parameter estimates or probabilities. Even when available, analytic expressions
often apply only asymptotically, and the numbers of observations required for the
analytic expressions to be acceptably close to the truth often are not known.
SAS and R both have a number of pseudo-random number generators which
perform very reliably to produce sequences of numbers whose distributions are
very close to what is claimed.
Below is a program which generates 100 random numbers from seven SAS
random number generators: binomial, poisson, exponential, gamma, normal,
uniform, and triangular. SAS 8 and earlier versions do not have random number generators for
negative binomial, hypergeometric, Weibull, chi-square, beta, F, or other
commonly used distributions.
The function RAND in SAS 9.1.3+ can generate pseudorandom numbers from any one of the
following distributions: bernoulli, beta, binomial, cauchy, chisquare, erlang,
exponential, F, gamma, geometric, hypergeometric, lognormal, negbinomial,
normal, poisson, t, triangle, uniform, and weibull.
For example: to generate a pseudorandom number x with a Weibull
distribution:
x = rand('weibull', a, b) ;
where
f(x) = (a/b^a) * x^(a - 1) * e((-x/b)^a),
and a is the 'shape' parameter, and b is the 'scale' parameter.
* ============================================================================== ;
* ~john-c/randgens.sas
* SAS random number generators. ;
OPTIONS LINESIZE = 120 ERRORS = 1 MPRINT ;
%let n = 100 ;
footnote "&sysdate &systime ~john-c/5421/randgens.sas" ;
data rands ;
seed = today() ;
do i = 1 to &n ;
rbinnom = ranbin(seed, 100, .3) ;
rcauchy = rancau(seed) ;
rpoiss = ranpoi(seed, 30) ;
rexpon = ranexp(seed) ;
rgamma = rangam(seed, 1) ;
rnormal = rannor(seed) ;
runif = ranuni(seed) ;
rtri = rantri(seed, .5) ;
output ;
end ;
run ;
proc print data = rands ;
title "Output from SAS random number generators ... &n observations." ;
* ================================================================== ;
Note that this program makes use of a macro variable (&n). This
will be explained later.
Here is a list of the R functions that will generate a random sample from common
distributions: runif, rnorm, rpois, rmvnorm, rnbinom, rbinom, rbeta, rchisq, rexp, rgamma,
rlogis, rstab, rt, rgeom, rhyper, rwilcox, rweibull.
Both SAS and R have advantages in describing data. For graphics, it
is often easier to use R than SAS. It is a good idea to know how to use
SAS and R together. The SAS program below generates 1000 numbers (x) from
a chi-square distribution with 3 degrees of freedom. It also computes
an approximate derivative of the cumulative distribution function for a chi-square
variate with 3 DF. It writes these data out to a SAS system file. It also
runs PROC UNIVARIATE to illustrate the distribution of the random variable x,
and a PROC PLOT to show the approximate PDF for a chi-square with 3 DF.
The second program is an R program which (1) reads the SAS csv output
created by the SAS program, and (2) creates and histogram for the simulated
chi-square data, and (3) overlays a plot of the approximate PDF on the
histogram. In this case the plot is a postscript file called chi3.ps. You can
view and print this file by using ghostview on an X-terminal.
* ============================================================================== ;
* This SAS program simulates 1000 points from a chi-square distribution with ;
* 3 degrees of freedom. It also computes an approximate pdf for a chi-square ;
* with 3 DF. It writes out the simulated data to a SAS system file called ;
* "examp". The actual UNIX name of this file is "examp.ssd01". It is in a ;
* SAS library called 'sasplus'. ;
* ;
* The program also uses PROC UNIVARIATE to illustrate the distribution of the ;
* simulated data, and PROC PLOT to show the approximate pdf. ;
* ;
options linesize = 80 ;
footnote "~john-c/5421/csv.output.sas &sysdate &systime" ;
data rands ;
n = 1000 ;
seed = today() ;
low = 0 ;
high = +12 ;
h = (high - low) / n ;
df = 3 ;
do i = 1 to n ;
tph = i * h + low ;
t = tph - h ;
xpdf = (probchi(tph, df) - probchi(t, df)) / h ;
z1 = rannor(seed) ;
z2 = rannor(seed) ;
z3 = rannor(seed) ;
x = z1 * z1 + z2 * z2 + z3 * z3 ;
output ;
end ;
run ;
ods csv file = 'sas.output.csv' ;
proc print data = rands ;
run ;
ods trace on ;
ods csv close ;
data sasplus.examp ;
set rands ;
keep i t x xpdf ;
run ;
proc univariate plot normal ;
var x xpdf ;
proc plot data = rands ;
plot xpdf * t ;
* ============================================================================== ;
# R program example ... histogram of a distribution.
#
# To run this program:
#
# 1. First run the sas program, sasplusexamp.sas
# The output is on a file called sas.output.csv
#
# 2. Get into R by typing, at the Unix command prompt,
# R
#
# 3. Run the program below. Note particularly the step:
# sasoutput <- read.csv(file="sas.output.csv", head=TRUE, sep = ",")
#
# 4. The output is a postscript file, chi3.ps, which you can
# examine using ghostview.
#
postscript("chi3.ps", horizontal = F, onefile = T)
par(mfrow = c(2,1), oma = c(2, 2, 2, 2))
sasoutput <- read.csv(file="sas.output.csv", head=TRUE, sep = ",")
x <- sasoutput$x
y <- sasoutput$y
hist(x, nclass = 20, xlim = c(0, 12),
cex = .7, main = "SPH 5421 Histogram example: Chi-Square (DF = 3)")
par(new = T, xaxs = "d")
plot(t, y, axes = F, type = "l")
# rm(x, t, xpdf, nobs, y)
dev.off()
* ============================================================================== ;
PROBLEM SET 1
1. Use the seven SAS random number generators to generate 1000 numbers
from each of the 7 distributions in the first SAS example.
2. Find and give examples of the use of the Splus random number
generators for the same 7 distributions as in the SAS program
example.
3. Use SAS/GRAPH or SPlus to graph histograms for each of the above.
4. Superimpose the pdfs for the random variables on each histogram.
5. Use the random number generator for the gamma distribution
to generate random numbers from a chi-square distribution
with 3 degrees of freedom. How can you test whether your
method produces observations from the chi-square distribution
with df = 3 ?
----------------------------------------------------------------------
It is possible to use a uniform random number generator to generate
random numbers from the standard normal N(0, 1) distribution. The
method depends on the 'Box-Muller Transformation'.
Basically, if X1 and X2 have independent U[0, 1] distributions,
then
Y = (sqrt(-2 * log(X1))) * COS(2 * pi * X2)
has an N(0, 1) distribution. Here 'log' is the natural logarithm,
and pi is the usual pi = 3.141592....
The Box-Muller transformation was first described in the following short note:
Box GEP, Muller ME. A note on the generation of random normal deviates.
Annals of Math. Stat. 29: 610-611, 1958.
Some languages, such as BASIC, have only a uniform random number
generator. Below is a BASIC program which uses the Box-Muller
transformation to generate a random N(0, 1) variate.
======================================================================
10 CLS : KEY OFF : SCREEN 9
20 '
30 ' This program implements the Box-Muller method of generating
40 ' normal random deviates.
50 '
60 DIM Z(640)
70 RANDOMIZE : PRINT
80 TWOPI = 4 * ATN(1)
90 '
100 PRINT "This program generates N(0, 1) random deviates, using the"
110 PRINT "Box-Muller approach, and graphs a histogram ... "
120 PRINT
130 INPUT; "Enter N : ", N : PRINT : PRINT
140 '
150 CLS
160 '
170 C = 1
180 FOR I = 1 TO N
190 '
200 X1 = RND
210 X2 = RND
220 IF X1 = 0 THEN 200
230 W = -2 * LOG(X1)
240 Y = (SQR(W)) * COS(TWOPI * X2)
250 '
260 ' Note: y has a standard normal distribution, N(0, 1).
270 '
280 YSTAR = Y + 3.2
290 J = INT(YSTAR * 100)
300 IF YSTAR < 0 THEN J = 1
310 IF YSTAR > 6.4 THEN J = 640
320 Z(J) = Z(J) + 1
330 PSET(J, 349 - Z(J)), 15
340 IF Z(J) > 348 THEN 410
350 '
360 NEXT I
370 '
380 C = C + 1
390 IF C > 15 THEN C = 1
400 GOTO 180
410 GOTO 410
======================================================================
PROJECT ASSIGNMENT 2
1. Write a SAS version of the BASIC program. Don't worry about doing
the graphics part (lines 280-340).
2. Generate 1000 numbers using your program and create a histogram.
Use PROC UNIVARIATE to check that the distribution is approximately
normal.
3. Read the Box-Muller paper and explain in your own words why the
transformation works.
----------------------------------------------------------------------
Suppose you had access to only a uniform pseudorandom number generator.
How might you generate random numbers from chi-square, t, and f distributions?
If you find a way to do this, how might you check that it is
working correctly?
----------------------------------------------------------------------
PROJECT ASSIGNMENT 3
1. Find a way in SAS to generate random numbers for chi-square,
t-, and F-distributions.
2. Use a Pearson chi-square approach to check that your proposed
methods actually work.
3. The 'birthday problem' is a reference to the fact that if about N = 30
or more people are chosen at random, it is likely that at least two will have
the same birthday (that is, same day and month). Simulate the birthday
problem with N = 30.
Carry out 1000 simulations, and count in how many of them at least 2 of
the N people have the same birthday.
~john-c/5421/notes.001 Revised Sep. 2, 2013.