SPH 5421 notes.006: Some program examples ...
1. Illustration of the Box-Muller algorithm for generating normal
random variables from uniform random variables: See Box & Muller,
Annals of Statistics, 1958. An Splus program is used to show the
distribution of a sample generated by the Box-Muller program.
2. Example of using a sort-and-carry routine. The sorting algorithm employed in this
program is called a 'bubble sort'. It is easy to write but it is not
an efficient sort. For arrays of size 10 or less it performs
adequately. More efficient sorting algorithms are described in the
book 'Numerical Recipes', Flannery et al., and Knuth's book Sorting and
Searching.
3. Example of a program to simulate 2 x 2 tables and compute chi-square
statistics.
=======================================================================;
* SAS program boxmuller.sas which writes 1000 observations to the ;
* SAS system file 'boxmull'. ;
* ;
libname sasplus '' ;
footnote "~john-c/5421/boxmuller.sas &sysdate &systime" ;
%let n = 10000 ;
data rands ;
twopi = 2 * 3.141592654 ;
seed = today() ;
low = -4 ; high = 4 ;
h = (high - low) / &n ;
do i = 1 to &n ;
x = i * h + low ;
r = ranuni(seed) ;
s = ranuni(seed) ;
z = sqrt(-2 * log(r)) * cos(twopi * s) ;
pdfz = (1 / sqrt(twopi)) * exp(- x * x / 2) ;
output ;
end ;
run ;
data sasplus.boxmull ;
set rands ;
run ;
proc univariate plot normal ;
var z ;
title "SAS program to illustrate the Box-Muller algorithm ... n = &n";
run ;
=======================================================================;
# Splus program hist.s, which (1) reads the SAS file boxmull, and (2) plots
# two histograms and an overlaid probability density function.
# The postscript graphics output file is hist.ps, which can be viewed and
# printed using ghostview: ghostview hist.ps.
#
postscript("hist.ps", horizontal = F, onefile = T)
par(mfrow = c(2,1), oma = c(2, 2, 2, 2))
rm(sasdat)
sasdat <- sas.get(library = "/home/walleye/john-c/5421", mem = "boxmull")
r <- sasdat[, "r"]
s <- sasdat[, "s"]
z <- sasdat[, "z"]
x <- sasdat[, "x"]
nobs <- max(sasdat[, "i"])
pdfz <- sasdat[, "pdfz"]
y <- nobs * pdfz
hist(r, nclass = 20, xlim = c(0, 1),
cex = .7, main = "SPH 5421 Histogram example: r (U[0, 1])")
hist(z, nclass = 20, xlim = c(-5, 5),
cex = .7, main = "SPH 5421 Histogram example: z (N(0, 1))")
par(new = T, xaxs = "d")
plot(x, y, axes = F, type = "l")
# rm(r, s, x, z, y)
dev.off()
=======================================================================;
PRINTOUT FROM boxmuller.sas:
SAS program to illustrate the Box-Muller algorithm ... n = 10000 1
18:28 Monday, September 22, 2003
Univariate Procedure
Variable=Z
Moments Quantiles(Def=5) Extremes
N 10000 Sum Wgts 10000 100% Max 3.924911 99% 2.312681 Lowest Obs Highest Obs
Mean 0.004308 Sum 43.08068 75% Q3 0.686031 95% 1.636393 -3.7367( 9062) 3.409322( 1858)
Std Dev 1.005764 Variance 1.01156 50% Med 0.010654 90% 1.27482 -3.6254( 1192) 3.474899( 6128)
Skewness -0.00433 Kurtosis -0.05557 25% Q1 -0.67618 10% -1.31243 -3.50457( 3876) 3.543545( 6560)
USS 10114.78 CSS 10114.59 0% Min -3.7367 5% -1.66405 -3.49485( 5332) 3.610974( 280)
CV 23346.05 Std Mean 0.010058 1% -2.29995 -3.1509( 8977) 3.924911( 5299)
T:Mean=0 0.428338 Pr>|T| 0.6684 Range 7.661615
Num ^= 0 10000 Num > 0 5043 Q3-Q1 1.362207
M(Sign) 43 Pr>=|M| 0.3953 Mode -3.7367
Sgn Rank 182589 Pr>=|S| 0.5271
D:Normal 0.008707 Pr>D 0.0660
Histogram # Boxplot Normal Probability Plot
3.75+* 3 0 3.75+ *
.* 13 0 | *
.** 46 0 | *
2.25+***** 166 | 2.25+ ******
.************ 436 | | ******
.************************* 944 | | ******
0.75+***************************************** 1567 +-----+ 0.75+ ******
.************************************************ 1868 *--+--* | ******
.************************************************ 1872 | | | ******
-0.75+************************************** 1466 +-----+ -0.75+ ******
.************************ 922 | | ******
.************ 462 | | ******
-2.25+***** 182 | -2.25+******
.** 44 0 |*
.* 6 0 |*
-3.75+* 3 0 -3.75+*
----+----+----+----+----+----+----+----+----+--- +----+----+----+----+----+----+----+----+----+----+
* may represent up to 39 counts -2 -1 0 +1 +2
~john-c/5421/boxmuller.sas 22SEP03 18:28
=======================================================================;
* How to sort-and-carry arrays in SAS. ;
* ;
footnote "prog: /home/walleye/john-c/5421/arraysort.sas &sysdate &systime" ;
options linesize = 120 ;
data arrays ;
array x(*) x1 x2 x3 x4 x5 ;
array y(*) y1 y2 y3 y4 y5 ;
array z(*) z1 z2 z3 z4 z5 ;
seed = 20000123 ;
n = 5 ;
nexamps = 10 ;
do k = 1 to nexamps ;
do i = 1 to n ;
x(i) = ranuni(seed) ;
y(i) = x(i) ;
z(i) = i ;
end ;
do i = 2 to n ;
do j = 1 to i - 1 ;
if y(j) gt y(i) then do ;
ytemp = y(i) ;
y(i) = y(j) ;
y(j) = ytemp ;
ztemp = z(i) ;
z(i) = z(j) ;
z(j) = ztemp ;
end ;
end ;
end ;
output ;
end ;
run ;
proc print ;
var k x1 x2 x3 x4 x5
y1 y2 y3 y4 y5
z1 z2 z3 z4 z5 ;
title 'Original x(i), sorted y(i), scrambled z(i)' ;
=======================================================================;
* SPH 5421 SAS program ~john-c/5421/sim.sas ;
* Example of simulating 2 x 2 tables and computing statistics ... ;
* ;
* SPH 5421 * ;
* Example of simulating 2 x 2 tables and computing statistics ... * ;
* * ;
options linesize = 80 ;
data sim ;
seed = 20000122 ;
pa = .5 ; pb = .7 ;
na = 100 ; nb = 100 ;
n = 3 ;
do i = 1 to n ;
a = ranbin(seed, na, pa) ;
c = na - a ;
b = ranbin(seed, nb, pb) ;
d = nb - b ;
m0 = a + c ;
m1 = b + d ;
n0 = a + b ;
n1 = c + d ;
total = a + b + c + d ;
chisquar = total * (a * d - b * c)**2 / (m0 * m1 * n0 * n1) ;
prob = 1 - probchi(chisquar, 1) ;
fisherleft = probhypr(a + b + c + d, a + b, a + c, a) ;
tableprob = probhypr(a + b + c + d, a + b, a + c, a)
- probhypr(a + b + c + d, a + b, a + c, a - 1) ;
output ;
end ;
run ;
proc print ;
var a b c d chisquar prob fisherleft tableprob ;
title 'Simulated 2 x 2 tables: non-PROC computation of probabilities' ;
data tabs ;
set sim ;
x = 0 ; y = 0 ; m = a ; output ;
x = 0 ; y = 1 ; m = b ; output ;
x = 1 ; y = 0 ; m = c ; output ;
x = 1 ; y = 1 ; m = d ; output ;
run ;
proc freq data = tabs ;
by i ;
weight m ;
tables x * y / chisqr ;
title1 'PROC FREQ computation of chi-squares ...' ;
*----------------------------------------------------------------------;
*
* Below is the output from sim.sas
*
=======================================================================;
Simulated 2 x 2 tables: non-PROC computation of probabilities 1
18:10 Monday, October 1, 2007
Obs a b c d chisquar prob fisherleft tableprob
1 59 68 41 32 1.74738 0.18621 0.11995 0.049125
2 56 69 44 31 3.60533 0.05759 0.03966 0.019396
3 62 73 38 27 2.75783 0.09678 0.06540 0.030589
PROC FREQ computation of chi-squares ... 2
18:10 Monday, October 1, 2007
------------------------------------- i=1 --------------------------------------
The FREQ Procedure
Table of x by y
x y
Frequency|
Percent |
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
0 | 59 | 68 | 127
| 29.50 | 34.00 | 63.50
| 46.46 | 53.54 |
| 59.00 | 68.00 |
---------+--------+--------+
1 | 41 | 32 | 73
| 20.50 | 16.00 | 36.50
| 56.16 | 43.84 |
| 41.00 | 32.00 |
---------+--------+--------+
Total 100 100 200
50.00 50.00 100.00
Statistics for Table of x by y
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 1.7474 0.1862
Likelihood Ratio Chi-Square 1 1.7507 0.1858
Continuity Adj. Chi-Square 1 1.3806 0.2400
Mantel-Haenszel Chi-Square 1 1.7386 0.1873
Phi Coefficient -0.0935
Contingency Coefficient 0.0931
Cramer's V -0.0935
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 59
Left-sided Pr <= F 0.1200
Right-sided Pr >= F 0.9292
Table Probability (P) 0.0491
Two-sided Pr <= P 0.2399
Sample Size = 200
PROC FREQ computation of chi-squares ... 3
18:10 Monday, October 1, 2007
------------------------------------- i=2 --------------------------------------
The FREQ Procedure
Table of x by y
x y
Frequency|
Percent |
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
0 | 56 | 69 | 125
| 28.00 | 34.50 | 62.50
| 44.80 | 55.20 |
| 56.00 | 69.00 |
---------+--------+--------+
1 | 44 | 31 | 75
| 22.00 | 15.50 | 37.50
| 58.67 | 41.33 |
| 44.00 | 31.00 |
---------+--------+--------+
Total 100 100 200
50.00 50.00 100.00
Statistics for Table of x by y
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 3.6053 0.0576
Likelihood Ratio Chi-Square 1 3.6192 0.0571
Continuity Adj. Chi-Square 1 3.0720 0.0797
Mantel-Haenszel Chi-Square 1 3.5873 0.0582
Phi Coefficient -0.1343
Contingency Coefficient 0.1331
Cramer's V -0.1343
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 56
Left-sided Pr <= F 0.0397
Right-sided Pr >= F 0.9797
Table Probability (P) 0.0194
Two-sided Pr <= P 0.0793
Sample Size = 200
PROC FREQ computation of chi-squares ... 4
18:10 Monday, October 1, 2007
------------------------------------- i=3 --------------------------------------
The FREQ Procedure
Table of x by y
x y
Frequency|
Percent |
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
0 | 62 | 73 | 135
| 31.00 | 36.50 | 67.50
| 45.93 | 54.07 |
| 62.00 | 73.00 |
---------+--------+--------+
1 | 38 | 27 | 65
| 19.00 | 13.50 | 32.50
| 58.46 | 41.54 |
| 38.00 | 27.00 |
---------+--------+--------+
Total 100 100 200
50.00 50.00 100.00
Statistics for Table of x by y
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 2.7578 0.0968
Likelihood Ratio Chi-Square 1 2.7678 0.0962
Continuity Adj. Chi-Square 1 2.2792 0.1311
Mantel-Haenszel Chi-Square 1 2.7440 0.0976
Phi Coefficient -0.1174
Contingency Coefficient 0.1166
Cramer's V -0.1174
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 62
Left-sided Pr <= F 0.0654
Right-sided Pr >= F 0.9652
Table Probability (P) 0.0306
Two-sided Pr <= P 0.1308
Sample Size = 200
======================================================================
~john-c/5421/notes.006 Last update: October 1, 2007.