SPH 5421 notes.006: Some program examples ...
1. Illustration of the Box-Muller algorithm for generating normal
random variables from uniform random variables: See Box & Muller,
Annals of Statistics, 1958. An Splus program is used to show the
distribution of a sample generated by the Box-Muller program.
2. Example of using a sort-and-carry routine. The sorting algorithm employed in this
program is called a 'bubble sort'. It is easy to write but it is not
an efficient sort. For arrays of size 10 or less it performs
adequately. More efficient sorting algorithms are described in the
book 'Numerical Recipes', Flannery et al., and Knuth's book Sorting and
Searching.
3. Example of a program to simulate 2 x 2 tables and compute chi-square
statistics.
=======================================================================;
* SAS program boxmuller.sas which writes 1000 observations to the ;
* SAS system file 'boxmull'. ;
* ;
libname sasplus '' ;
footnote "~john-c/5421/boxmuller.sas &sysdate &systime" ;
%let n = 10000 ;
data rands ;
twopi = 2 * 3.141592654 ;
seed = today() ;
low = -4 ; high = 4 ;
h = (high - low) / &n ;
do i = 1 to &n ;
x = i * h + low ;
r = ranuni(seed) ;
s = ranuni(seed) ;
z = sqrt(-2 * log(r)) * cos(twopi * s) ;
pdfz = (1 / sqrt(twopi)) * exp(- x * x / 2) ;
output ;
end ;
run ;
data sasplus.boxmull ;
set rands ;
run ;
proc univariate plot normal ;
var z ;
title "SAS program to illustrate the Box-Muller algorithm ... n = &n";
run ;
=======================================================================;
# Splus program hist.s, which (1) reads the SAS file boxmull, and (2) plots
# two histograms and an overlaid probability density function.
# The postscript graphics output file is hist.ps, which can be viewed and
# printed using ghostview: ghostview hist.ps.
#
postscript("hist.ps", horizontal = F, onefile = T)
par(mfrow = c(2,1), oma = c(2, 2, 2, 2))
rm(sasdat)
sasdat <- sas.get(library = "/home/walleye/john-c/5421", mem = "boxmull")
r <- sasdat[, "r"]
s <- sasdat[, "s"]
z <- sasdat[, "z"]
x <- sasdat[, "x"]
nobs <- max(sasdat[, "i"])
pdfz <- sasdat[, "pdfz"]
y <- nobs * pdfz
hist(r, nclass = 20, xlim = c(0, 1),
cex = .7, main = "SPH 5421 Histogram example: r (U[0, 1])")
hist(z, nclass = 20, xlim = c(-5, 5),
cex = .7, main = "SPH 5421 Histogram example: z (N(0, 1))")
par(new = T, xaxs = "d")
plot(x, y, axes = F, type = "l")
# rm(r, s, x, z, y)
dev.off()
=======================================================================;
PRINTOUT FROM boxmuller.sas:
SAS program to illustrate the Box-Muller algorithm ... n = 10000 1
18:28 Monday, September 22, 2003
Univariate Procedure
Variable=Z
Moments Quantiles(Def=5) Extremes
N 10000 Sum Wgts 10000 100% Max 3.924911 99% 2.312681 Lowest Obs Highest Obs
Mean 0.004308 Sum 43.08068 75% Q3 0.686031 95% 1.636393 -3.7367( 9062) 3.409322( 1858)
Std Dev 1.005764 Variance 1.01156 50% Med 0.010654 90% 1.27482 -3.6254( 1192) 3.474899( 6128)
Skewness -0.00433 Kurtosis -0.05557 25% Q1 -0.67618 10% -1.31243 -3.50457( 3876) 3.543545( 6560)
USS 10114.78 CSS 10114.59 0% Min -3.7367 5% -1.66405 -3.49485( 5332) 3.610974( 280)
CV 23346.05 Std Mean 0.010058 1% -2.29995 -3.1509( 8977) 3.924911( 5299)
T:Mean=0 0.428338 Pr>|T| 0.6684 Range 7.661615
Num ^= 0 10000 Num > 0 5043 Q3-Q1 1.362207
M(Sign) 43 Pr>=|M| 0.3953 Mode -3.7367
Sgn Rank 182589 Pr>=|S| 0.5271
D:Normal 0.008707 Pr>D 0.0660
Histogram # Boxplot Normal Probability Plot
3.75+* 3 0 3.75+ *
.* 13 0 | *
.** 46 0 | *
2.25+***** 166 | 2.25+ ******
.************ 436 | | ******
.************************* 944 | | ******
0.75+***************************************** 1567 +-----+ 0.75+ ******
.************************************************ 1868 *--+--* | ******
.************************************************ 1872 | | | ******
-0.75+************************************** 1466 +-----+ -0.75+ ******
.************************ 922 | | ******
.************ 462 | | ******
-2.25+***** 182 | -2.25+******
.** 44 0 |*
.* 6 0 |*
-3.75+* 3 0 -3.75+*
----+----+----+----+----+----+----+----+----+--- +----+----+----+----+----+----+----+----+----+----+
* may represent up to 39 counts -2 -1 0 +1 +2
~john-c/5421/boxmuller.sas 22SEP03 18:28
=======================================================================;
SORTING SMALL ARRAYS
SAS has the feature that, within an observation in a data set, it is
possible to define arrays. An array is a collection of
variables with some sort of index (usually an integer). Here is
an example:
data examp ;
array DBP(10) ;
In this example, the array is called 'DBP' and could be used to
denote diastolic blood pressure of an individual at 10 different
time points.
For example, DBP(4) would represent the person's diastolic BP
at time point 4.
It is frequently desirable to sort arrays by the values of the
variables at the various 'time points'. However SAS does not have
a function for sorting arrays. SAS does provide a way of sorting
entire files (PROC SORT), but there is no built-in function to sort
arrays.
have a function for sorting arrays.
Here is a simple program that can be used to sort arrays. This is
called a 'bubble sort'.
Assume that DBP(1), DBP(2), ..., DBP(10) are diastolic blood
pressures. The bubble sort algorithm is as follows for sorting
these values in ascending order:
====================================================================
do i = 2 to 10 ;
do j = 1 to i - 1 ;
temp = DBP(i) ;
if DBP(j) > DBP(i) then do ;
DBP(i) = DBP(j) ;
DBP(j) = temp ;
end ;
end ;
end ;
Here is how this program works with the following data:
data dbps ;
array dbp(10) ;
DBP(1) = 102; DBP(2) = 86; DBP(3) = 98; DBP(4) = 76; DBP(5) = 66 ;
DBP(6) = 100; DBP(7) = 88; DBP(98 = 90; DBP(9) = 94; DBP(10) = 64 ;
n = 10 ;
do i = 2 to n ;
do j = 1 to i - 1 ;
temp = DBP(i) ;
if DBP(j) > DBP(i) then do ;
DBP(i) = DBP(j) ;
DBP(j) = temp ;
end ;
output ;
end ;
end ;
run ;
proc print data = dbps ;
var DBP1 DBP2 DBP3 DBP4 DBP5 DBP6 DBP7 DBP8 DBP9 DBP10 ;
title 'Step-by-step execution of a bubble sort:' ;
====================================================================
Here is the printout from this program:
Step-by-step execution of a bubble sort: 1
17:13 Sunday, September 12, 2010
Obs dbp1 dbp2 dbp3 dbp4 dbp5 dbp6 dbp7 dbp8 dbp9 dbp10
1 86 102 98 76 66 100 88 90 94 64
2 86 102 98 76 66 100 88 90 94 64
3 86 98 102 76 66 100 88 90 94 64
4 76 98 102 86 66 100 88 90 94 64
5 76 86 102 98 66 100 88 90 94 64
6 76 86 98 102 66 100 88 90 94 64
7 66 86 98 102 76 100 88 90 94 64
8 66 76 98 102 86 100 88 90 94 64
9 66 76 86 102 98 100 88 90 94 64
10 66 76 86 98 102 100 88 90 94 64
11 66 76 86 98 102 100 88 90 94 64
12 66 76 86 98 102 100 88 90 94 64
13 66 76 86 98 102 100 88 90 94 64
14 66 76 86 98 102 100 88 90 94 64
15 66 76 86 98 100 102 88 90 94 64
16 66 76 86 98 100 102 88 90 94 64
17 66 76 86 98 100 102 88 90 94 64
18 66 76 86 98 100 102 88 90 94 64
19 66 76 86 88 100 102 98 90 94 64
20 66 76 86 88 98 102 100 90 94 64
21 66 76 86 88 98 100 102 90 94 64
22 66 76 86 88 98 100 102 90 94 64
23 66 76 86 88 98 100 102 90 94 64
24 66 76 86 88 98 100 102 90 94 64
25 66 76 86 88 98 100 102 90 94 64
26 66 76 86 88 90 100 102 98 94 64
27 66 76 86 88 90 98 102 100 94 64
28 66 76 86 88 90 98 100 102 94 64
29 66 76 86 88 90 98 100 102 94 64
30 66 76 86 88 90 98 100 102 94 64
31 66 76 86 88 90 98 100 102 94 64
32 66 76 86 88 90 98 100 102 94 64
33 66 76 86 88 90 98 100 102 94 64
34 66 76 86 88 90 94 100 102 98 64
35 66 76 86 88 90 94 98 102 100 64
36 66 76 86 88 90 94 98 100 102 64
37 64 76 86 88 90 94 98 100 102 66
38 64 66 86 88 90 94 98 100 102 76
39 64 66 76 88 90 94 98 100 102 86
40 64 66 76 86 90 94 98 100 102 88
41 64 66 76 86 88 94 98 100 102 90
42 64 66 76 86 88 90 98 100 102 94
43 64 66 76 86 88 90 94 100 102 98
44 64 66 76 86 88 90 94 98 102 100
45 64 66 76 86 88 90 94 98 100 102
~john-c/5421/bubble.sort.sas 12SEP10 17:13
=============================================================================
As you can see, 45 steps were required for the program to sort this array of 10
elements. The bubble sort is not very efficient, and it is not recommended for
large arrays. For small arrays (in general 20 elements or less), its slow speed
is usually not a problem. For large arrays, there are much faster sorting algorithms:
quicksort, heapsort, and various merge sorts are asymptotically quite efficient.
However programs to execute the faster sorts are considerably more complicated than
that for the bubble sort.
========================================================================================
It is often desirable to do what is called 'sort and carry'. This means that you have
two arrays with the same number of elements. You want to sort one of them, and permute
the other one in parallel with the sorting of the first one. Here is a modification of
the bubble sort program which does this:
data dbps ;
array dbp(10) ;
array idx(10) ;
DBP(1) = 102; DBP(2) = 86; DBP(3) = 98; DBP(4) = 76; DBP(5) = 66 ;
DBP(6) = 100; DBP(7) = 88; DBP(98 = 90; DBP(9) = 94; DBP(10) = 64 ;
idx(1) = 1 ; idx(2) = 2 ; idx(3) = 3 ; idx(4) = 4 ; idx(5) = 5 ;
idx(6) = 6 ; idx(7) = 7 ; idx(8) = 8 ; idx(9) = 9 ; idx(10) = 10 ;
n = 10 ;
do i = 2 to n ;
do j = 1 to i - 1 ;
tempdbp = DBP(i) ;
tempidx = idx(i) ;
if DBP(j) > DBP(i) then do ;
DBP(i) = DBP(j) ;
DBP(j) = tempdbp ;
idx(i) = idx(j) ;
idx(j) = tempidx ;
end ;
output ;
end ;
end ;
run ;
=========================================================================
The above program will have the effect of permuting the array idx in parallel
with the sorting of the array dbp.
=======================================================================;
* SPH 5421 SAS program ~john-c/5421/sim.sas ;
* Example of simulating 2 x 2 tables and computing statistics ... ;
* ;
* SPH 5421 * ;
* Example of simulating 2 x 2 tables and computing statistics ... * ;
* * ;
options linesize = 80 ;
data sim ;
seed = 20000122 ;
pa = .5 ; pb = .7 ;
na = 100 ; nb = 100 ;
n = 3 ;
do i = 1 to n ;
a = ranbin(seed, na, pa) ;
c = na - a ;
b = ranbin(seed, nb, pb) ;
d = nb - b ;
m0 = a + c ;
m1 = b + d ;
n0 = a + b ;
n1 = c + d ;
total = a + b + c + d ;
chisquar = total * (a * d - b * c)**2 / (m0 * m1 * n0 * n1) ;
prob = 1 - probchi(chisquar, 1) ;
fisherleft = probhypr(a + b + c + d, a + b, a + c, a) ;
tableprob = probhypr(a + b + c + d, a + b, a + c, a)
- probhypr(a + b + c + d, a + b, a + c, a - 1) ;
output ;
end ;
run ;
proc print ;
var a b c d chisquar prob fisherleft tableprob ;
title 'Simulated 2 x 2 tables: non-PROC computation of probabilities' ;
data tabs ;
set sim ;
x = 0 ; y = 0 ; m = a ; output ;
x = 0 ; y = 1 ; m = b ; output ;
x = 1 ; y = 0 ; m = c ; output ;
x = 1 ; y = 1 ; m = d ; output ;
run ;
proc freq data = tabs ;
by i ;
weight m ;
tables x * y / chisqr ;
title1 'PROC FREQ computation of chi-squares ...' ;
*----------------------------------------------------------------------;
*
* Below is the output from sim.sas
*
=======================================================================;
Simulated 2 x 2 tables: non-PROC computation of probabilities 1
18:10 Monday, October 1, 2007
Obs a b c d chisquar prob fisherleft tableprob
1 59 68 41 32 1.74738 0.18621 0.11995 0.049125
2 56 69 44 31 3.60533 0.05759 0.03966 0.019396
3 62 73 38 27 2.75783 0.09678 0.06540 0.030589
PROC FREQ computation of chi-squares ... 2
18:10 Monday, October 1, 2007
------------------------------------- i=1 --------------------------------------
The FREQ Procedure
Table of x by y
x y
Frequency|
Percent |
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
0 | 59 | 68 | 127
| 29.50 | 34.00 | 63.50
| 46.46 | 53.54 |
| 59.00 | 68.00 |
---------+--------+--------+
1 | 41 | 32 | 73
| 20.50 | 16.00 | 36.50
| 56.16 | 43.84 |
| 41.00 | 32.00 |
---------+--------+--------+
Total 100 100 200
50.00 50.00 100.00
Statistics for Table of x by y
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 1.7474 0.1862
Likelihood Ratio Chi-Square 1 1.7507 0.1858
Continuity Adj. Chi-Square 1 1.3806 0.2400
Mantel-Haenszel Chi-Square 1 1.7386 0.1873
Phi Coefficient -0.0935
Contingency Coefficient 0.0931
Cramer's V -0.0935
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 59
Left-sided Pr <= F 0.1200
Right-sided Pr >= F 0.9292
Table Probability (P) 0.0491
Two-sided Pr <= P 0.2399
Sample Size = 200
PROC FREQ computation of chi-squares ... 3
18:10 Monday, October 1, 2007
------------------------------------- i=2 --------------------------------------
The FREQ Procedure
Table of x by y
x y
Frequency|
Percent |
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
0 | 56 | 69 | 125
| 28.00 | 34.50 | 62.50
| 44.80 | 55.20 |
| 56.00 | 69.00 |
---------+--------+--------+
1 | 44 | 31 | 75
| 22.00 | 15.50 | 37.50
| 58.67 | 41.33 |
| 44.00 | 31.00 |
---------+--------+--------+
Total 100 100 200
50.00 50.00 100.00
Statistics for Table of x by y
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 3.6053 0.0576
Likelihood Ratio Chi-Square 1 3.6192 0.0571
Continuity Adj. Chi-Square 1 3.0720 0.0797
Mantel-Haenszel Chi-Square 1 3.5873 0.0582
Phi Coefficient -0.1343
Contingency Coefficient 0.1331
Cramer's V -0.1343
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 56
Left-sided Pr <= F 0.0397
Right-sided Pr >= F 0.9797
Table Probability (P) 0.0194
Two-sided Pr <= P 0.0793
Sample Size = 200
PROC FREQ computation of chi-squares ... 4
18:10 Monday, October 1, 2007
------------------------------------- i=3 --------------------------------------
The FREQ Procedure
Table of x by y
x y
Frequency|
Percent |
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
0 | 62 | 73 | 135
| 31.00 | 36.50 | 67.50
| 45.93 | 54.07 |
| 62.00 | 73.00 |
---------+--------+--------+
1 | 38 | 27 | 65
| 19.00 | 13.50 | 32.50
| 58.46 | 41.54 |
| 38.00 | 27.00 |
---------+--------+--------+
Total 100 100 200
50.00 50.00 100.00
Statistics for Table of x by y
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 2.7578 0.0968
Likelihood Ratio Chi-Square 1 2.7678 0.0962
Continuity Adj. Chi-Square 1 2.2792 0.1311
Mantel-Haenszel Chi-Square 1 2.7440 0.0976
Phi Coefficient -0.1174
Contingency Coefficient 0.1166
Cramer's V -0.1174
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 62
Left-sided Pr <= F 0.0654
Right-sided Pr >= F 0.9652
Table Probability (P) 0.0306
Two-sided Pr <= P 0.1308
Sample Size = 200
======================================================================
~john-c/5421/notes.006 Last update: September 13, 2010.