PSEUDO-RANDOM NUMBER GENERATORS SPH 7460 notes.003
Random number generators like RANNOR in SAS do not generate random
numbers. In fact they generate completely 100% predictable sequences
of numbers. It is more accurate to call them pseudo-random number
generators (PRNGs).
Here is how most PRNGs work. Choose a large integer, M. Choose a
multiplier, A. Choose an integer increment, C. Choose an initial integer, X0.
Generate the next integer, X1, from the formula
X1 = (A * X0 + C) mod M.
Similarly X2 is computed as
X2 = (A * X1 + C) mod M.
In general,
Xn = (A * Xn-1 + C) mod M.
The key thing is to make good choices for M, A, and C. This is
explained very well in a classic book on numerical methods, 'Seminumerical
Methods - The Art of Computer Programming, Vol 2' by Donald Knuth
(Addison-Wesley, Reading, Mass., 1969). One usually chooses M according
to the 'word size' of the computer; for a computer based on 32-bit
architecture, M = 231 - 1. Even the very best PRNG will eventually
start to repeat; that is, the sequence will cycle. Ideally the length of
the cycle is maximal: namely, M. That is why M is chosen as large as
possible.
Given a sequence of PRNs X1, X2, X3, ..., Xn, you can get
pseudo-random numbers from the uniform U[0, 1] distribution by computing
Rn = Xn / M.
Note that Rn is between 0 and 1.
The special starting number X0 cannot be regarded as random. In SAS,
X0 is the seed, and in general you choose the seed. It should be chosen
to be a large number but less than M . Once you pick the seed, a sequence
of random numbers is launched. In order that different simulation studies
have independent outcomes, separate seeds should be used. You can call
SAS PRNGs without really specifying a seed. If you say seed = -1 (or any
other negative number) then SAS chooses the seed to equal the time of day
(in seconds, I think). An advantage of doing this is that it almost
guarantees that different runs generate independent sequences. A
disadvantage is that you cannot guarantee getting the same sequence again
if you need to check on some computation. In general it is better to
know what the seed is.
Below is an example of a SAS program which implements a pseudo-
random number generator with M = 47, C = 11, and X0 = 32. There are
two printouts: one in which A = 5 and the other in which A = 14. Note
that the maximal cycle length here is M = 47. The first printout has
the maximal cycle length, but the second one does not.
-----------------------------------------------------------------------
options linesize = 80 pagesize = 65 ;
footnote "~john-c/5421/prng.sas &sysdate &systime" ;
* A pseudo-random number generator example ... ;
%let M = 47 ;
%let A = 5 ;
%let C = 11 ;
%let X0 = 32 ;
data prng ;
i = 0 ;
M = &M ; A = &A ; C = &C ; X0 = &X0 ;
X1 = X0 ;
output ;
do i = 1 to M ;
X1 = mod(A * X0 + C, M) ;
R = X1 / M ;
output ;
X0 = X1 ;
end ;
run ;
proc print data = prng ;
var M A C i X0 X1 R ;
title1 'Output from a pseudo-random number generator ... ' ;
title2 '-------------------------------------------------' ;
title3 "M = &M " ;
title4 "A = &A " ;
title5 "C = &C " ;
title6 "X0 = &X0 " ;
title7 " " ;
title8 "X1 = (A*X0 + C) mod M " ;
title9 "R = X1 / M " ;
title10 '-------------------------------------------------' ;
title11 ' ' ;
run ;
endsas ;
-----------------------------------------------------------------------
Output from a pseudo-random number generator ... 1
-------------------------------------------------
M = 47
A = 5
C = 11
X0 = 32
X1 = (A*X0 + C) mod M
R = X1 / M
-------------------------------------------------
18:40 Monday, September 17, 2007
Obs M A C i X0 X1 R
1 47 5 11 0 32 32 .
2 47 5 11 1 32 30 0.63830
3 47 5 11 2 30 20 0.42553
4 47 5 11 3 20 17 0.36170
5 47 5 11 4 17 2 0.04255
6 47 5 11 5 2 21 0.44681
7 47 5 11 6 21 22 0.46809
8 47 5 11 7 22 27 0.57447
9 47 5 11 8 27 5 0.10638
10 47 5 11 9 5 36 0.76596
11 47 5 11 10 36 3 0.06383
12 47 5 11 11 3 26 0.55319
13 47 5 11 12 26 0 0.00000
14 47 5 11 13 0 11 0.23404
15 47 5 11 14 11 19 0.40426
16 47 5 11 15 19 12 0.25532
17 47 5 11 16 12 24 0.51064
18 47 5 11 17 24 37 0.78723
19 47 5 11 18 37 8 0.17021
20 47 5 11 19 8 4 0.08511
21 47 5 11 20 4 31 0.65957
22 47 5 11 21 31 25 0.53191
23 47 5 11 22 25 42 0.89362
24 47 5 11 23 42 33 0.70213
25 47 5 11 24 33 35 0.74468
26 47 5 11 25 35 45 0.95745
27 47 5 11 26 45 1 0.02128
28 47 5 11 27 1 16 0.34043
29 47 5 11 28 16 44 0.93617
30 47 5 11 29 44 43 0.91489
31 47 5 11 30 43 38 0.80851
32 47 5 11 31 38 13 0.27660
33 47 5 11 32 13 29 0.61702
34 47 5 11 33 29 15 0.31915
35 47 5 11 34 15 39 0.82979
36 47 5 11 35 39 18 0.38298
37 47 5 11 36 18 7 0.14894
38 47 5 11 37 7 46 0.97872
39 47 5 11 38 46 6 0.12766
40 47 5 11 39 6 41 0.87234
41 47 5 11 40 41 28 0.59574
42 47 5 11 41 28 10 0.21277
43 47 5 11 42 10 14 0.29787
44 47 5 11 43 14 34 0.72340
45 47 5 11 44 34 40 0.85106
46 47 5 11 45 40 23 0.48936
47 47 5 11 46 23 32 0.68085
48 47 5 11 47 32 30 0.63830
~john-c/5421/prng.sas 17SEP07 18:40
------------------------------------------------------------------------
Output from a pseudo-random number generator ... 1
-------------------------------------------------
M = 47
A = 14
C = 11
X0 = 32
X1 = (A*X0 + C) mod M
R = X1 / M
-------------------------------------------------
18:39 Monday, September 17, 2007
Obs M A C i X0 X1 R
1 47 14 11 0 32 32 .
2 47 14 11 1 32 36 0.76596
3 47 14 11 2 36 45 0.95745
4 47 14 11 3 45 30 0.63830
5 47 14 11 4 30 8 0.17021
6 47 14 11 5 8 29 0.61702
7 47 14 11 6 29 41 0.87234
8 47 14 11 7 41 21 0.44681
9 47 14 11 8 21 23 0.48936
10 47 14 11 9 23 4 0.08511
11 47 14 11 10 4 20 0.42553
12 47 14 11 11 20 9 0.19149
13 47 14 11 12 9 43 0.91489
14 47 14 11 13 43 2 0.04255
15 47 14 11 14 2 39 0.82979
16 47 14 11 15 39 40 0.85106
17 47 14 11 16 40 7 0.14894
18 47 14 11 17 7 15 0.31915
19 47 14 11 18 15 33 0.70213
20 47 14 11 19 33 3 0.06383
21 47 14 11 20 3 6 0.12766
22 47 14 11 21 6 1 0.02128
23 47 14 11 22 1 25 0.53191
24 47 14 11 23 25 32 0.68085
25 47 14 11 24 32 36 0.76596
26 47 14 11 25 36 45 0.95745
27 47 14 11 26 45 30 0.63830
28 47 14 11 27 30 8 0.17021
29 47 14 11 28 8 29 0.61702
30 47 14 11 29 29 41 0.87234
31 47 14 11 30 41 21 0.44681
32 47 14 11 31 21 23 0.48936
33 47 14 11 32 23 4 0.08511
34 47 14 11 33 4 20 0.42553
35 47 14 11 34 20 9 0.19149
36 47 14 11 35 9 43 0.91489
37 47 14 11 36 43 2 0.04255
38 47 14 11 37 2 39 0.82979
39 47 14 11 38 39 40 0.85106
40 47 14 11 39 40 7 0.14894
41 47 14 11 40 7 15 0.31915
42 47 14 11 41 15 33 0.70213
43 47 14 11 42 33 3 0.06383
44 47 14 11 43 3 6 0.12766
45 47 14 11 44 6 1 0.02128
46 47 14 11 45 1 25 0.53191
47 47 14 11 46 25 32 0.68085
48 47 14 11 47 32 36 0.76596
~john-c/5421/prng.sas 17SEP07 18:39
------------------------------------------------------------------------
BERNOULLI RANDOM VARIABLES
How can you use a random number generator to generate the simplest of
all random variables, a Bernoulli random variable? By definition, a
Bernoulli random variable X with parameter p takes on only two values:
X = 1 with probability p and X = 0 with probability 1 - p.
To generate independent observations for such X, start with a uniform
random number generator (RANUNI in SAS or runif in Splus), and use it
as follows (SAS code):
X = 0 ;
r = ranuni(seed) ;
if r le p then X = 1 ;
GENERATING LOGISTIC-MODEL OBSERVATIONS
Assume risk of disease is modelled as a logistic function of a risk
factor such as diastolic blood pressure. For men aged 40-59 in the
Framingham study, the risk of heart disease within a six-year period, ,
given an initial level of diastolic blood pressure (DBP), was estimated
to be:
risk = 1 / (1 + exp(6.082 - .0243 * DBP)).
[Reference: A. Agresti, Categorical Data Analysis, John Wiley & Sons,
1990; pages 93-95]
If you want to use this to simulate the Bernoulli event of having heart
disease within a 6-year period for a man in the age range of 40-59, with
a DBP of 100 mm Hg, do the following:
p = 1 / (1 + exp(6.082 - .0243 * 100)) ;
r = ranuni(seed) ;
heartdis = 0 ;
if r lt p then heartdis = 1 ;
==================================================================================
Problem 1:
Suppose you are given the following margins in a 2 x 2 table,
-----------------------
| | |
| | | n1
| | |
-----------------------
| | |
| | | n2
| | |
-----------------------
m1 m2
Use SAS to write a program that fills in the cells of the table
randomly, conditional on the given margins. Apply your program to generate
ten random tables with n1 = 30, n2 = 40, m1 = 35, m2 = 35.
~john-c/5421/notes.003 Last update: Sep 17, 2007