RANDOM NUMBER GENERATORS IN SAS AND R SPH 7460 notes.001 Statisticians make extensive use of pseudo-random number generators (PRNGs) in simulation studies of statistical tests, estimation procedures, and design of experiments. The reason for this is that summary and sufficient statistics are often nonlinear combinations of data from a large number of observations, and in general there are no closed-form analytic expressions of the resulting parameter estimates or probabilities. Even when available, analytic expressions often apply only asymptotically, and the numbers of observations required for the analytic expressions to be acceptably close to the truth often are not known. SAS and R both have a number of pseudo-random number generators which perform very reliably to produce sequences of numbers whose distributions are very close to what is claimed. Below is a program which generates 100 random numbers from seven SAS random number generators: binomial, poisson, exponential, gamma, normal, uniform, and triangular. SAS 8 and earlier versions do not have random number generators for negative binomial, hypergeometric, Weibull, chi-square, beta, F, or other commonly used distributions. The function RAND in SAS 9.1.3+ can generate pseudorandom numbers from any one of the following distributions: bernoulli, beta, binomial, cauchy, chisquare, erlang, exponential, F, gamma, geometric, hypergeometric, lognormal, negbinomial, normal, poisson, t, triangle, uniform, and weibull. For example: to generate a pseudorandom number x with a Weibull distribution: x = rand('weibull', a, b) ; where f(x) = (a/b^a) * x^(a - 1) * e((-x/b)^a), and a is the 'shape' parameter, and b is the 'scale' parameter. * ============================================================================== ; * ~john-c/randgens.sas * SAS random number generators. ; OPTIONS LINESIZE = 120 ERRORS = 1 MPRINT ; %let n = 100 ; footnote "&sysdate &systime ~john-c/5421/randgens.sas" ; data rands ; seed = today() ; do i = 1 to &n ; rbinnom = ranbin(seed, 100, .3) ; rcauchy = rancau(seed) ; rpoiss = ranpoi(seed, 30) ; rexpon = ranexp(seed) ; rgamma = rangam(seed, 1) ; rnormal = rannor(seed) ; runif = ranuni(seed) ; rtri = rantri(seed, .5) ; output ; end ; run ; proc print data = rands ; title "Output from SAS random number generators ... &n observations." ; * ================================================================== ; Note that this program makes use of a macro variable (&n). This will be explained later. Here is a list of the R functions that will generate a random sample from common distributions: runif, rnorm, rpois, rmvnorm, rnbinom, rbinom, rbeta, rchisq, rexp, rgamma, rlogis, rstab, rt, rgeom, rhyper, rwilcox, rweibull. Both SAS and R have advantages in describing data. For graphics, it is often easier to use R than SAS. It is a good idea to know how to use SAS and R together. The SAS program below generates 1000 numbers (x) from a chi-square distribution with 3 degrees of freedom. It also computes an approximate derivative of the cumulative distribution function for a chi-square variate with 3 DF. It writes these data out to a SAS system file. It also runs PROC UNIVARIATE to illustrate the distribution of the random variable x, and a PROC PLOT to show the approximate PDF for a chi-square with 3 DF. The second program is an R program which (1) reads the SAS csv output created by the SAS program, and (2) creates and histogram for the simulated chi-square data, and (3) overlays a plot of the approximate PDF on the histogram. In this case the plot is a postscript file called chi3.ps. You can view and print this file by using ghostview on an X-terminal. * ============================================================================== ; * This SAS program simulates 1000 points from a chi-square distribution with ; * 3 degrees of freedom. It also computes an approximate pdf for a chi-square ; * with 3 DF. It writes out the simulated data to a SAS system file called ; * "examp". The actual UNIX name of this file is "examp.ssd01". It is in a ; * SAS library called 'sasplus'. ; * ; * The program also uses PROC UNIVARIATE to illustrate the distribution of the ; * simulated data, and PROC PLOT to show the approximate pdf. ; * ; options linesize = 80 ; footnote "~john-c/5421/csv.output.sas &sysdate &systime" ; data rands ; n = 1000 ; seed = today() ; low = 0 ; high = +12 ; h = (high - low) / n ; df = 3 ; do i = 1 to n ; tph = i * h + low ; t = tph - h ; xpdf = (probchi(tph, df) - probchi(t, df)) / h ; z1 = rannor(seed) ; z2 = rannor(seed) ; z3 = rannor(seed) ; x = z1 * z1 + z2 * z2 + z3 * z3 ; output ; end ; run ; ods csv file = 'sas.output.csv' ; proc print data = rands ; run ; ods trace on ; ods csv close ; data sasplus.examp ; set rands ; keep i t x xpdf ; run ; proc univariate plot normal ; var x xpdf ; proc plot data = rands ; plot xpdf * t ; * ============================================================================== ; # R program example ... histogram of a distribution. # # To run this program: # # 1. First run the sas program, sasplusexamp.sas # The output is on a file called sas.output.csv # # 2. Get into R by typing, at the Unix command prompt, # R # # 3. Run the program below. Note particularly the step: # sasoutput <- read.csv(file="sas.output.csv", head=TRUE, sep = ",") # # 4. The output is a postscript file, chi3.ps, which you can # examine using ghostview. # postscript("chi3.ps", horizontal = F, onefile = T) par(mfrow = c(2,1), oma = c(2, 2, 2, 2)) sasoutput <- read.csv(file="sas.output.csv", head=TRUE, sep = ",") x <- sasoutput$x y <- sasoutput$y hist(x, nclass = 20, xlim = c(0, 12), cex = .7, main = "SPH 5421 Histogram example: Chi-Square (DF = 3)") par(new = T, xaxs = "d") plot(t, y, axes = F, type = "l") # rm(x, t, xpdf, nobs, y) dev.off() * ============================================================================== ; PROBLEM SET 1 1. Use the seven SAS random number generators to generate 1000 numbers from each of the 7 distributions in the first SAS example. 2. Find and give examples of the use of the Splus random number generators for the same 7 distributions as in the SAS program example. 3. Use SAS/GRAPH or SPlus to graph histograms for each of the above. 4. Superimpose the pdfs for the random variables on each histogram. 5. Use the random number generator for the gamma distribution to generate random numbers from a chi-square distribution with 3 degrees of freedom. How can you test whether your method produces observations from the chi-square distribution with df = 3 ? ---------------------------------------------------------------------- It is possible to use a uniform random number generator to generate random numbers from the standard normal N(0, 1) distribution. The method depends on the 'Box-Muller Transformation'. Basically, if X1 and X2 have independent U[0, 1] distributions, then Y = (sqrt(-2 * log(X1))) * COS(2 * pi * X2) has an N(0, 1) distribution. Here 'log' is the natural logarithm, and pi is the usual pi = 3.141592.... The Box-Muller transformation was first described in the following short note: Box GEP, Muller ME. A note on the generation of random normal deviates. Annals of Math. Stat. 29: 610-611, 1958. Some languages, such as BASIC, have only a uniform random number generator. Below is a BASIC program which uses the Box-Muller transformation to generate a random N(0, 1) variate. ====================================================================== 10 CLS : KEY OFF : SCREEN 9 20 ' 30 ' This program implements the Box-Muller method of generating 40 ' normal random deviates. 50 ' 60 DIM Z(640) 70 RANDOMIZE : PRINT 80 TWOPI = 4 * ATN(1) 90 ' 100 PRINT "This program generates N(0, 1) random deviates, using the" 110 PRINT "Box-Muller approach, and graphs a histogram ... " 120 PRINT 130 INPUT; "Enter N : ", N : PRINT : PRINT 140 ' 150 CLS 160 ' 170 C = 1 180 FOR I = 1 TO N 190 ' 200 X1 = RND 210 X2 = RND 220 IF X1 = 0 THEN 200 230 W = -2 * LOG(X1) 240 Y = (SQR(W)) * COS(TWOPI * X2) 250 ' 260 ' Note: y has a standard normal distribution, N(0, 1). 270 ' 280 YSTAR = Y + 3.2 290 J = INT(YSTAR * 100) 300 IF YSTAR < 0 THEN J = 1 310 IF YSTAR > 6.4 THEN J = 640 320 Z(J) = Z(J) + 1 330 PSET(J, 349 - Z(J)), 15 340 IF Z(J) > 348 THEN 410 350 ' 360 NEXT I 370 ' 380 C = C + 1 390 IF C > 15 THEN C = 1 400 GOTO 180 410 GOTO 410 ====================================================================== PROJECT ASSIGNMENT 2 1. Write a SAS version of the BASIC program. Don't worry about doing the graphics part (lines 280-340). 2. Generate 1000 numbers using your program and create a histogram. Use PROC UNIVARIATE to check that the distribution is approximately normal. 3. Read the Box-Muller paper and explain in your own words why the transformation works. ---------------------------------------------------------------------- Suppose you had access to only a uniform pseudorandom number generator. How might you generate random numbers from chi-square, t, and f distributions? If you find a way to do this, how might you check that it is working correctly? ---------------------------------------------------------------------- PROJECT ASSIGNMENT 3 1. Find a way in SAS to generate random numbers for chi-square, t-, and F-distributions. 2. Use a Pearson chi-square approach to check that your proposed methods actually work. 3. The 'birthday problem' is a reference to the fact that if about N = 30 or more people are chosen at random, it is likely that at least two will have the same birthday (that is, same day and month). Simulate the birthday problem with N = 30. Carry out 1000 simulations, and count in how many of them at least 2 of the N people have the same birthday. ~john-c/5421/notes.001 Revised Sep. 2, 2013.