SPH 5421 notes.006: Some program examples ...

1.  Illustration of the Box-Muller algorithm for generating normal
    random variables from uniform random variables: See Box & Muller,
    Annals of Statistics, 1958.  An Splus program is used to show the
    distribution of a sample generated by the Box-Muller program.

2.  Example of using a sort-and-carry routine.  The sorting algorithm employed in this
    program is called a 'bubble sort'.  It is easy to write but it is not
    an efficient sort.  For arrays of size 10 or less it performs
    adequately.  More efficient sorting algorithms are described in the
    book 'Numerical Recipes', Flannery et al., and Knuth's book Sorting and
    Searching.

3.  Example of a program to simulate 2 x 2 tables and compute chi-square
    statistics.

=======================================================================;


* SAS program boxmuller.sas which writes 1000 observations to the      ;
* SAS system file 'boxmull'.                                           ;
*                                                                      ;

libname sasplus '' ;
footnote "~john-c/5421/boxmuller.sas &sysdate &systime" ;

%let n = 10000 ;

data rands ;

  twopi = 2 * 3.141592654 ;

  seed = today() ;

  low = -4 ; high = 4 ;
  h = (high - low) / &n ;

  do i = 1 to &n ;

     x = i * h + low ;
     r = ranuni(seed) ;
     s = ranuni(seed) ;

     z = sqrt(-2 * log(r)) * cos(twopi * s) ;

     pdfz = (1 / sqrt(twopi)) * exp(- x * x / 2) ;

     output ;

   end ;

run ;

data sasplus.boxmull ;
     set rands ;
run ;

proc univariate plot normal ;
     var z ;
title "SAS program to illustrate the Box-Muller algorithm ... n = &n";
run ;
=======================================================================;

# Splus program hist.s, which (1) reads the SAS file boxmull, and (2) plots
# two histograms and an overlaid probability density function.
# The postscript graphics output file is hist.ps, which can be viewed and
# printed using ghostview:   ghostview hist.ps.
#

postscript("hist.ps", horizontal = F, onefile = T)

par(mfrow = c(2,1), oma = c(2, 2, 2, 2))

rm(sasdat)
sasdat <- sas.get(library = "/home/walleye/john-c/5421", mem = "boxmull")

r <- sasdat[, "r"]
s <- sasdat[, "s"]
z <- sasdat[, "z"]
x <- sasdat[, "x"]
nobs <- max(sasdat[, "i"])
pdfz <- sasdat[, "pdfz"]
y <- nobs * pdfz

hist(r, nclass = 20, xlim = c(0, 1),
     cex = .7, main = "SPH 5421 Histogram example: r (U[0, 1])")

hist(z, nclass = 20, xlim = c(-5, 5),
     cex = .7, main = "SPH 5421 Histogram example: z (N(0, 1))")

par(new = T, xaxs = "d")

plot(x, y, axes = F, type = "l")

# rm(r, s, x, z, y)

dev.off()

=======================================================================;

PRINTOUT FROM boxmuller.sas:

                                  SAS program to illustrate the Box-Muller algorithm ... n = 10000                                 1
                                                                                                    18:28 Monday, September 22, 2003

                                                        Univariate Procedure

Variable=Z

                 Moments                                  Quantiles(Def=5)                                 Extremes

 N             10000  Sum Wgts      10000      100% Max  3.924911       99%  2.312681         Lowest    Obs     Highest    Obs
 Mean       0.004308  Sum        43.08068       75% Q3   0.686031       95%  1.636393        -3.7367(    9062) 3.409322(    1858)
 Std Dev    1.005764  Variance    1.01156       50% Med  0.010654       90%   1.27482        -3.6254(    1192) 3.474899(    6128)
 Skewness   -0.00433  Kurtosis   -0.05557       25% Q1   -0.67618       10%  -1.31243       -3.50457(    3876) 3.543545(    6560)
 USS        10114.78  CSS        10114.59        0% Min   -3.7367        5%  -1.66405       -3.49485(    5332) 3.610974(     280)
 CV         23346.05  Std Mean   0.010058                                1%  -2.29995        -3.1509(    8977) 3.924911(    5299)
 T:Mean=0   0.428338  Pr>|T|       0.6684      Range     7.661615                    
 Num ^= 0      10000  Num > 0        5043      Q3-Q1     1.362207                    
 M(Sign)          43  Pr>=|M|      0.3953      Mode       -3.7367                    
 Sgn Rank     182589  Pr>=|S|      0.5271
 D:Normal   0.008707  Pr>D         0.0660


                          Histogram                          #  Boxplot                        Normal Probability Plot              
   3.75+*                                                    3     0        3.75+                                                  *
       .*                                                   13     0            |                                                  *
       .**                                                  46     0            |                                                  *
   2.25+*****                                              166     |        2.25+                                             ******
       .************                                       436     |            |                                        ******     
       .*************************                          944     |            |                                   ******          
   0.75+*****************************************         1567  +-----+     0.75+                              ******               
       .************************************************  1868  *--+--*         |                         ******                    
       .************************************************  1872  |     |         |                    ******                         
  -0.75+**************************************            1466  +-----+    -0.75+               ******                              
       .************************                           922     |            |          ******                                   
       .************                                       462     |            |     ******                                        
  -2.25+*****                                              182     |       -2.25+******                                             
       .**                                                  44     0            |*                                                  
       .*                                                    6     0            |*                                                  
  -3.75+*                                                    3     0       -3.75+*                                                  
        ----+----+----+----+----+----+----+----+----+---                         +----+----+----+----+----+----+----+----+----+----+
        * may represent up to 39 counts                                              -2        -1         0        +1        +2     

 
                                              ~john-c/5421/boxmuller.sas 22SEP03 18:28

=======================================================================;

*  How to sort-and-carry arrays in SAS.                           ;
*                                                                 ;

footnote "prog: /home/walleye/john-c/5421/arraysort.sas &sysdate &systime" ;

options linesize = 120 ;

data arrays ;
  array x(*) x1 x2 x3 x4 x5 ;
  array y(*) y1 y2 y3 y4 y5 ;
  array z(*) z1 z2 z3 z4 z5 ;

  seed = 20000123 ;

  n = 5 ;
  nexamps = 10 ;

  do k = 1 to nexamps ;

     do i = 1 to n ;

        x(i) = ranuni(seed) ;
        y(i) = x(i) ;
        z(i) = i ;

     end ;

     do i = 2 to n ;

        do j = 1 to i - 1 ;

           if y(j) gt y(i) then do ;

              ytemp = y(i) ;
              y(i) = y(j) ;
              y(j) = ytemp ;

              ztemp = z(i) ;
              z(i) = z(j) ;
              z(j) = ztemp ;

            end ;

         end ;

      end ;

      output ;

   end ;

run ;

proc print ;
     var k x1 x2 x3 x4 x5
           y1 y2 y3 y4 y5
           z1 z2 z3 z4 z5 ;
title 'Original x(i), sorted y(i), scrambled z(i)' ;

=======================================================================;
* SPH 5421 SAS program ~john-c/5421/sim.sas                            ;
* Example of simulating 2 x 2 tables and computing statistics ...      ;
*                                                                      ;
* SPH 5421                                                        * ;
* Example of simulating 2 x 2 tables and computing statistics ... * ;
*                                                                 * ;

options linesize = 80 ;

data sim ;

     seed = 20000122 ;
     pa = .5 ; pb = .7 ;
     na = 100 ; nb = 100 ;

     n = 3 ;

     do i = 1 to n ;

        a = ranbin(seed, na, pa) ;
        c = na - a ;
        b = ranbin(seed, nb, pb) ;
        d = nb - b ;

        m0 = a + c ;
        m1 = b + d ;
        n0 = a + b ;
        n1 = c + d ;
        total = a + b + c + d ;

        chisquar = total * (a * d - b * c)**2 / (m0 * m1 * n0 * n1) ;

        prob = 1 - probchi(chisquar, 1) ;
        fisherleft = probhypr(a + b + c + d, a + b, a + c, a) ;
        tableprob =  probhypr(a + b + c + d, a + b, a + c, a)
                   - probhypr(a + b + c + d, a + b, a + c, a - 1) ;

     output ;

     end ;

 run ;

 proc print ;
      var a b c d chisquar prob fisherleft tableprob ;
 title 'Simulated 2 x 2 tables: non-PROC computation of probabilities' ;

 data tabs ;
      set sim ;
      x = 0 ; y = 0 ; m = a ; output ;
      x = 0 ; y = 1 ; m = b ; output ;
      x = 1 ; y = 0 ; m = c ; output ;
      x = 1 ; y = 1 ; m = d ; output ;

 run ;

 proc freq data = tabs ;
      by i ;
      weight m ;
      tables x * y / chisqr ;
 title1 'PROC FREQ computation of chi-squares ...' ;

*----------------------------------------------------------------------;
*
*    Below is the output from sim.sas
*
=======================================================================;

         Simulated 2 x 2 tables: non-PROC computation of probabilities         1
                                                   18:10 Monday, October 1, 2007

 Obs     a     b     c     d    chisquar      prob     fisherleft    tableprob

  1     59    68    41    32     1.74738    0.18621      0.11995      0.049125
  2     56    69    44    31     3.60533    0.05759      0.03966      0.019396
  3     62    73    38    27     2.75783    0.09678      0.06540      0.030589
                    PROC FREQ computation of chi-squares ...                   2
                                                   18:10 Monday, October 1, 2007

------------------------------------- i=1 --------------------------------------

                               The FREQ Procedure

                                Table of x by y

                      x         y

                      Frequency|
                      Percent  |
                      Row Pct  |
                      Col Pct  |       0|       1|  Total
                      ---------+--------+--------+
                             0 |     59 |     68 |    127
                               |  29.50 |  34.00 |  63.50
                               |  46.46 |  53.54 |
                               |  59.00 |  68.00 |
                      ---------+--------+--------+
                             1 |     41 |     32 |     73
                               |  20.50 |  16.00 |  36.50
                               |  56.16 |  43.84 |
                               |  41.00 |  32.00 |
                      ---------+--------+--------+
                      Total         100      100      200
                                  50.00    50.00   100.00


                         Statistics for Table of x by y

             Statistic                     DF       Value      Prob
             ------------------------------------------------------
             Chi-Square                     1      1.7474    0.1862
             Likelihood Ratio Chi-Square    1      1.7507    0.1858
             Continuity Adj. Chi-Square     1      1.3806    0.2400
             Mantel-Haenszel Chi-Square     1      1.7386    0.1873
             Phi Coefficient                      -0.0935          
             Contingency Coefficient               0.0931          
             Cramer's V                           -0.0935          


                              Fisher's Exact Test
                       ----------------------------------
                       Cell (1,1) Frequency (F)        59
                       Left-sided Pr <= F          0.1200
                       Right-sided Pr >= F         0.9292
                                                         
                       Table Probability (P)       0.0491
                       Two-sided Pr <= P           0.2399

                               Sample Size = 200
                    PROC FREQ computation of chi-squares ...                   3
                                                   18:10 Monday, October 1, 2007

------------------------------------- i=2 --------------------------------------

                               The FREQ Procedure

                                Table of x by y

                      x         y

                      Frequency|
                      Percent  |
                      Row Pct  |
                      Col Pct  |       0|       1|  Total
                      ---------+--------+--------+
                             0 |     56 |     69 |    125
                               |  28.00 |  34.50 |  62.50
                               |  44.80 |  55.20 |
                               |  56.00 |  69.00 |
                      ---------+--------+--------+
                             1 |     44 |     31 |     75
                               |  22.00 |  15.50 |  37.50
                               |  58.67 |  41.33 |
                               |  44.00 |  31.00 |
                      ---------+--------+--------+
                      Total         100      100      200
                                  50.00    50.00   100.00


                         Statistics for Table of x by y

             Statistic                     DF       Value      Prob
             ------------------------------------------------------
             Chi-Square                     1      3.6053    0.0576
             Likelihood Ratio Chi-Square    1      3.6192    0.0571
             Continuity Adj. Chi-Square     1      3.0720    0.0797
             Mantel-Haenszel Chi-Square     1      3.5873    0.0582
             Phi Coefficient                      -0.1343          
             Contingency Coefficient               0.1331          
             Cramer's V                           -0.1343          


                              Fisher's Exact Test
                       ----------------------------------
                       Cell (1,1) Frequency (F)        56
                       Left-sided Pr <= F          0.0397
                       Right-sided Pr >= F         0.9797
                                                         
                       Table Probability (P)       0.0194
                       Two-sided Pr <= P           0.0793

                               Sample Size = 200
                    PROC FREQ computation of chi-squares ...                   4
                                                   18:10 Monday, October 1, 2007

------------------------------------- i=3 --------------------------------------

                               The FREQ Procedure

                                Table of x by y

                      x         y

                      Frequency|
                      Percent  |
                      Row Pct  |
                      Col Pct  |       0|       1|  Total
                      ---------+--------+--------+
                             0 |     62 |     73 |    135
                               |  31.00 |  36.50 |  67.50
                               |  45.93 |  54.07 |
                               |  62.00 |  73.00 |
                      ---------+--------+--------+
                             1 |     38 |     27 |     65
                               |  19.00 |  13.50 |  32.50
                               |  58.46 |  41.54 |
                               |  38.00 |  27.00 |
                      ---------+--------+--------+
                      Total         100      100      200
                                  50.00    50.00   100.00


                         Statistics for Table of x by y

             Statistic                     DF       Value      Prob
             ------------------------------------------------------
             Chi-Square                     1      2.7578    0.0968
             Likelihood Ratio Chi-Square    1      2.7678    0.0962
             Continuity Adj. Chi-Square     1      2.2792    0.1311
             Mantel-Haenszel Chi-Square     1      2.7440    0.0976
             Phi Coefficient                      -0.1174          
             Contingency Coefficient               0.1166          
             Cramer's V                           -0.1174          


                              Fisher's Exact Test
                       ----------------------------------
                       Cell (1,1) Frequency (F)        62
                       Left-sided Pr <= F          0.0654
                       Right-sided Pr >= F         0.9652
                                                         
                       Table Probability (P)       0.0306
                       Two-sided Pr <= P           0.1308

                               Sample Size = 200
======================================================================


~john-c/5421/notes.006         Last update: October 1, 2007.