PSEUDO-RANDOM NUMBER GENERATORS SPH 7460 notes.003 Random number generators like RANNOR in SAS do not generate random numbers. In fact they generate completely 100% predictable sequences of numbers. It is more accurate to call them pseudo-random number generators (PRNGs). Here is how most PRNGs work. Choose a large integer, M. Choose a multiplier, A. Choose an integer increment, C. Choose an initial integer, X0. Generate the next integer, X1, from the formula X1 = (A * X0 + C) mod M. Similarly X2 is computed as X2 = (A * X1 + C) mod M. In general, Xn = (A * Xn-1 + C) mod M. The key thing is to make good choices for M, A, and C. This is explained very well in a classic book on numerical methods, 'Seminumerical Methods - The Art of Computer Programming, Vol 2' by Donald Knuth (Addison-Wesley, Reading, Mass., 1969). One usually chooses M according to the 'word size' of the computer; for a computer based on 32-bit architecture, M = 231 - 1. Even the very best PRNG will eventually start to repeat; that is, the sequence will cycle. Ideally the length of the cycle is maximal: namely, M. That is why M is chosen as large as possible. Given a sequence of PRNs X1, X2, X3, ..., Xn, you can get pseudo-random numbers from the uniform U[0, 1] distribution by computing Rn = Xn / M. Note that Rn is between 0 and 1. The special starting number X0 cannot be regarded as random. In SAS, X0 is the seed, and in general you choose the seed. It should be chosen to be a large number but less than M . Once you pick the seed, a sequence of random numbers is launched. In order that different simulation studies have independent outcomes, separate seeds should be used. You can call SAS PRNGs without really specifying a seed. If you say seed = -1 (or any other negative number) then SAS chooses the seed to equal the time of day (in seconds, I think). An advantage of doing this is that it almost guarantees that different runs generate independent sequences. A disadvantage is that you cannot guarantee getting the same sequence again if you need to check on some computation. In general it is better to know what the seed is. Below is an example of a SAS program which implements a pseudo- random number generator with M = 47, C = 11, and X0 = 32. There are two printouts: one in which A = 5 and the other in which A = 14. Note that the maximal cycle length here is M = 47. The first printout has the maximal cycle length, but the second one does not. ----------------------------------------------------------------------- options linesize = 80 pagesize = 65 ; footnote "~john-c/5421/prng.sas &sysdate &systime" ; * A pseudo-random number generator example ... ; %let M = 47 ; %let A = 5 ; %let C = 11 ; %let X0 = 32 ; data prng ; i = 0 ; M = &M ; A = &A ; C = &C ; X0 = &X0 ; X1 = X0 ; output ; do i = 1 to M ; X1 = mod(A * X0 + C, M) ; R = X1 / M ; output ; X0 = X1 ; end ; run ; proc print data = prng ; var M A C i X0 X1 R ; title1 'Output from a pseudo-random number generator ... ' ; title2 '-------------------------------------------------' ; title3 "M = &M " ; title4 "A = &A " ; title5 "C = &C " ; title6 "X0 = &X0 " ; title7 " " ; title8 "X1 = (A*X0 + C) mod M " ; title9 "R = X1 / M " ; title10 '-------------------------------------------------' ; title11 ' ' ; run ; endsas ; ----------------------------------------------------------------------- Output from a pseudo-random number generator ... 1 ------------------------------------------------- M = 47 A = 5 C = 11 X0 = 32 X1 = (A*X0 + C) mod M R = X1 / M ------------------------------------------------- 18:40 Monday, September 17, 2007 Obs M A C i X0 X1 R 1 47 5 11 0 32 32 . 2 47 5 11 1 32 30 0.63830 3 47 5 11 2 30 20 0.42553 4 47 5 11 3 20 17 0.36170 5 47 5 11 4 17 2 0.04255 6 47 5 11 5 2 21 0.44681 7 47 5 11 6 21 22 0.46809 8 47 5 11 7 22 27 0.57447 9 47 5 11 8 27 5 0.10638 10 47 5 11 9 5 36 0.76596 11 47 5 11 10 36 3 0.06383 12 47 5 11 11 3 26 0.55319 13 47 5 11 12 26 0 0.00000 14 47 5 11 13 0 11 0.23404 15 47 5 11 14 11 19 0.40426 16 47 5 11 15 19 12 0.25532 17 47 5 11 16 12 24 0.51064 18 47 5 11 17 24 37 0.78723 19 47 5 11 18 37 8 0.17021 20 47 5 11 19 8 4 0.08511 21 47 5 11 20 4 31 0.65957 22 47 5 11 21 31 25 0.53191 23 47 5 11 22 25 42 0.89362 24 47 5 11 23 42 33 0.70213 25 47 5 11 24 33 35 0.74468 26 47 5 11 25 35 45 0.95745 27 47 5 11 26 45 1 0.02128 28 47 5 11 27 1 16 0.34043 29 47 5 11 28 16 44 0.93617 30 47 5 11 29 44 43 0.91489 31 47 5 11 30 43 38 0.80851 32 47 5 11 31 38 13 0.27660 33 47 5 11 32 13 29 0.61702 34 47 5 11 33 29 15 0.31915 35 47 5 11 34 15 39 0.82979 36 47 5 11 35 39 18 0.38298 37 47 5 11 36 18 7 0.14894 38 47 5 11 37 7 46 0.97872 39 47 5 11 38 46 6 0.12766 40 47 5 11 39 6 41 0.87234 41 47 5 11 40 41 28 0.59574 42 47 5 11 41 28 10 0.21277 43 47 5 11 42 10 14 0.29787 44 47 5 11 43 14 34 0.72340 45 47 5 11 44 34 40 0.85106 46 47 5 11 45 40 23 0.48936 47 47 5 11 46 23 32 0.68085 48 47 5 11 47 32 30 0.63830 ~john-c/5421/prng.sas 17SEP07 18:40 ------------------------------------------------------------------------ Output from a pseudo-random number generator ... 1 ------------------------------------------------- M = 47 A = 14 C = 11 X0 = 32 X1 = (A*X0 + C) mod M R = X1 / M ------------------------------------------------- 18:39 Monday, September 17, 2007 Obs M A C i X0 X1 R 1 47 14 11 0 32 32 . 2 47 14 11 1 32 36 0.76596 3 47 14 11 2 36 45 0.95745 4 47 14 11 3 45 30 0.63830 5 47 14 11 4 30 8 0.17021 6 47 14 11 5 8 29 0.61702 7 47 14 11 6 29 41 0.87234 8 47 14 11 7 41 21 0.44681 9 47 14 11 8 21 23 0.48936 10 47 14 11 9 23 4 0.08511 11 47 14 11 10 4 20 0.42553 12 47 14 11 11 20 9 0.19149 13 47 14 11 12 9 43 0.91489 14 47 14 11 13 43 2 0.04255 15 47 14 11 14 2 39 0.82979 16 47 14 11 15 39 40 0.85106 17 47 14 11 16 40 7 0.14894 18 47 14 11 17 7 15 0.31915 19 47 14 11 18 15 33 0.70213 20 47 14 11 19 33 3 0.06383 21 47 14 11 20 3 6 0.12766 22 47 14 11 21 6 1 0.02128 23 47 14 11 22 1 25 0.53191 24 47 14 11 23 25 32 0.68085 25 47 14 11 24 32 36 0.76596 26 47 14 11 25 36 45 0.95745 27 47 14 11 26 45 30 0.63830 28 47 14 11 27 30 8 0.17021 29 47 14 11 28 8 29 0.61702 30 47 14 11 29 29 41 0.87234 31 47 14 11 30 41 21 0.44681 32 47 14 11 31 21 23 0.48936 33 47 14 11 32 23 4 0.08511 34 47 14 11 33 4 20 0.42553 35 47 14 11 34 20 9 0.19149 36 47 14 11 35 9 43 0.91489 37 47 14 11 36 43 2 0.04255 38 47 14 11 37 2 39 0.82979 39 47 14 11 38 39 40 0.85106 40 47 14 11 39 40 7 0.14894 41 47 14 11 40 7 15 0.31915 42 47 14 11 41 15 33 0.70213 43 47 14 11 42 33 3 0.06383 44 47 14 11 43 3 6 0.12766 45 47 14 11 44 6 1 0.02128 46 47 14 11 45 1 25 0.53191 47 47 14 11 46 25 32 0.68085 48 47 14 11 47 32 36 0.76596 ~john-c/5421/prng.sas 17SEP07 18:39 ------------------------------------------------------------------------ BERNOULLI RANDOM VARIABLES How can you use a random number generator to generate the simplest of all random variables, a Bernoulli random variable? By definition, a Bernoulli random variable X with parameter p takes on only two values: X = 1 with probability p and X = 0 with probability 1 - p. To generate independent observations for such X, start with a uniform random number generator (RANUNI in SAS or runif in Splus), and use it as follows (SAS code): X = 0 ; r = ranuni(seed) ; if r le p then X = 1 ; GENERATING LOGISTIC-MODEL OBSERVATIONS Assume risk of disease is modelled as a logistic function of a risk factor such as diastolic blood pressure. For men aged 40-59 in the Framingham study, the risk of heart disease within a six-year period, , given an initial level of diastolic blood pressure (DBP), was estimated to be: risk = 1 / (1 + exp(6.082 - .0243 * DBP)). [Reference: A. Agresti, Categorical Data Analysis, John Wiley & Sons, 1990; pages 93-95] If you want to use this to simulate the Bernoulli event of having heart disease within a 6-year period for a man in the age range of 40-59, with a DBP of 100 mm Hg, do the following: p = 1 / (1 + exp(6.082 - .0243 * 100)) ; r = ranuni(seed) ; heartdis = 0 ; if r lt p then heartdis = 1 ; ================================================================================== Problem 1: Suppose you are given the following margins in a 2 x 2 table, ----------------------- | | | | | | n1 | | | ----------------------- | | | | | | n2 | | | ----------------------- m1 m2 Use SAS to write a program that fills in the cells of the table randomly, conditional on the given margins. Apply your program to generate ten random tables with n1 = 30, n2 = 40, m1 = 35, m2 = 35. ~john-c/5421/notes.003 Last update: Sep 17, 2007