SPH 5421 notes.006: Some program examples ...

1.  Illustration of the Box-Muller algorithm for generating normal
    random variables from uniform random variables: See Box & Muller,
    Annals of Statistics, 1958.  An Splus program is used to show the
    distribution of a sample generated by the Box-Muller program.

2.  Example of using a sort-and-carry routine.  The sorting algorithm employed in this
    program is called a 'bubble sort'.  It is easy to write but it is not
    an efficient sort.  For arrays of size 10 or less it performs
    adequately.  More efficient sorting algorithms are described in the
    book 'Numerical Recipes', Flannery et al., and Knuth's book Sorting and
    Searching.

3.  Example of a program to simulate 2 x 2 tables and compute chi-square
    statistics.

=======================================================================;


* SAS program boxmuller.sas which writes 1000 observations to the      ;
* SAS system file 'boxmull'.                                           ;
*                                                                      ;

libname sasplus '' ;
footnote "~john-c/5421/boxmuller.sas &sysdate &systime" ;

%let n = 10000 ;

data rands ;

  twopi = 2 * 3.141592654 ;

  seed = today() ;

  low = -4 ; high = 4 ;
  h = (high - low) / &n ;

  do i = 1 to &n ;

     x = i * h + low ;
     r = ranuni(seed) ;
     s = ranuni(seed) ;

     z = sqrt(-2 * log(r)) * cos(twopi * s) ;

     pdfz = (1 / sqrt(twopi)) * exp(- x * x / 2) ;

     output ;

   end ;

run ;

data sasplus.boxmull ;
     set rands ;
run ;

proc univariate plot normal ;
     var z ;
title "SAS program to illustrate the Box-Muller algorithm ... n = &n";
run ;
=======================================================================;

# Splus program hist.s, which (1) reads the SAS file boxmull, and (2) plots
# two histograms and an overlaid probability density function.
# The postscript graphics output file is hist.ps, which can be viewed and
# printed using ghostview:   ghostview hist.ps.
#

postscript("hist.ps", horizontal = F, onefile = T)

par(mfrow = c(2,1), oma = c(2, 2, 2, 2))

rm(sasdat)
sasdat <- sas.get(library = "/home/walleye/john-c/5421", mem = "boxmull")

r <- sasdat[, "r"]
s <- sasdat[, "s"]
z <- sasdat[, "z"]
x <- sasdat[, "x"]
nobs <- max(sasdat[, "i"])
pdfz <- sasdat[, "pdfz"]
y <- nobs * pdfz

hist(r, nclass = 20, xlim = c(0, 1),
     cex = .7, main = "SPH 5421 Histogram example: r (U[0, 1])")

hist(z, nclass = 20, xlim = c(-5, 5),
     cex = .7, main = "SPH 5421 Histogram example: z (N(0, 1))")

par(new = T, xaxs = "d")

plot(x, y, axes = F, type = "l")

# rm(r, s, x, z, y)

dev.off()

=======================================================================;

PRINTOUT FROM boxmuller.sas:

                                  SAS program to illustrate the Box-Muller algorithm ... n = 10000                                 1
                                                                                                    18:28 Monday, September 22, 2003

                                                        Univariate Procedure

Variable=Z

                 Moments                                  Quantiles(Def=5)                                 Extremes

 N             10000  Sum Wgts      10000      100% Max  3.924911       99%  2.312681         Lowest    Obs     Highest    Obs
 Mean       0.004308  Sum        43.08068       75% Q3   0.686031       95%  1.636393        -3.7367(    9062) 3.409322(    1858)
 Std Dev    1.005764  Variance    1.01156       50% Med  0.010654       90%   1.27482        -3.6254(    1192) 3.474899(    6128)
 Skewness   -0.00433  Kurtosis   -0.05557       25% Q1   -0.67618       10%  -1.31243       -3.50457(    3876) 3.543545(    6560)
 USS        10114.78  CSS        10114.59        0% Min   -3.7367        5%  -1.66405       -3.49485(    5332) 3.610974(     280)
 CV         23346.05  Std Mean   0.010058                                1%  -2.29995        -3.1509(    8977) 3.924911(    5299)
 T:Mean=0   0.428338  Pr>|T|       0.6684      Range     7.661615                    
 Num ^= 0      10000  Num > 0        5043      Q3-Q1     1.362207                    
 M(Sign)          43  Pr>=|M|      0.3953      Mode       -3.7367                    
 Sgn Rank     182589  Pr>=|S|      0.5271
 D:Normal   0.008707  Pr>D         0.0660


                          Histogram                          #  Boxplot                        Normal Probability Plot              
   3.75+*                                                    3     0        3.75+                                                  *
       .*                                                   13     0            |                                                  *
       .**                                                  46     0            |                                                  *
   2.25+*****                                              166     |        2.25+                                             ******
       .************                                       436     |            |                                        ******     
       .*************************                          944     |            |                                   ******          
   0.75+*****************************************         1567  +-----+     0.75+                              ******               
       .************************************************  1868  *--+--*         |                         ******                    
       .************************************************  1872  |     |         |                    ******                         
  -0.75+**************************************            1466  +-----+    -0.75+               ******                              
       .************************                           922     |            |          ******                                   
       .************                                       462     |            |     ******                                        
  -2.25+*****                                              182     |       -2.25+******                                             
       .**                                                  44     0            |*                                                  
       .*                                                    6     0            |*                                                  
  -3.75+*                                                    3     0       -3.75+*                                                  
        ----+----+----+----+----+----+----+----+----+---                         +----+----+----+----+----+----+----+----+----+----+
        * may represent up to 39 counts                                              -2        -1         0        +1        +2     

 
                                              ~john-c/5421/boxmuller.sas 22SEP03 18:28

=======================================================================;

SORTING SMALL ARRAYS

SAS has the feature that, within an observation in a data set, it is
possible to define arrays.  An array is a collection of
variables with some sort of index (usually an integer).  Here is
an example:

data examp ;
     array DBP(10) ;


In this example, the array is called 'DBP' and could be used to
denote diastolic blood pressure of an individual at 10 different
time points.

For example, DBP(4) would represent the person's diastolic BP
at time point 4.

It is frequently desirable to sort arrays by the values of the
variables at the various 'time points'.  However SAS does not have
a function for sorting arrays.  SAS does provide a way of sorting
entire files (PROC SORT), but there is no built-in function to sort
arrays.
have a function for sorting arrays.

Here is a simple program that can be used to sort arrays.  This is
called a 'bubble sort'.

Assume that DBP(1), DBP(2), ..., DBP(10) are diastolic blood
pressures.  The bubble sort algorithm is as follows for sorting
these values in ascending order:

====================================================================

      do i = 2 to 10 ;

         do j = 1 to i - 1 ;

            temp = DBP(i) ;
            if DBP(j) > DBP(i) then do ;
               DBP(i) = DBP(j) ;
               DBP(j) = temp ;
            end ;

         end ;

      end ;

Here is how this program works with the following data:

data dbps ;
     array dbp(10) ;

DBP(1) = 102; DBP(2) = 86; DBP(3) = 98; DBP(4) = 76; DBP(5) = 66 ;
DBP(6) = 100; DBP(7) = 88; DBP(98 = 90; DBP(9) = 94; DBP(10) = 64 ;

     n = 10 ;

     do i = 2 to n ;

        do j = 1 to i - 1 ;

           temp = DBP(i) ;
           if DBP(j) > DBP(i) then do ;
              DBP(i) = DBP(j) ;
              DBP(j) = temp ;
           end ;

           output ;

        end ;

     end ;

run ;

proc print data = dbps ;
     var DBP1 DBP2 DBP3 DBP4 DBP5 DBP6 DBP7 DBP8 DBP9 DBP10 ;
title 'Step-by-step execution of a bubble sort:' ;

====================================================================
Here is the printout from this program:


                    Step-by-step execution of a bubble sort:                   1
                                                17:13 Sunday, September 12, 2010

   Obs   dbp1   dbp2   dbp3   dbp4   dbp5   dbp6   dbp7   dbp8   dbp9   dbp10

     1    86     102     98     76     66    100     88     90     94     64 
     2    86     102     98     76     66    100     88     90     94     64 
     3    86      98    102     76     66    100     88     90     94     64 
     4    76      98    102     86     66    100     88     90     94     64 
     5    76      86    102     98     66    100     88     90     94     64 
     6    76      86     98    102     66    100     88     90     94     64 
     7    66      86     98    102     76    100     88     90     94     64 
     8    66      76     98    102     86    100     88     90     94     64 
     9    66      76     86    102     98    100     88     90     94     64 
    10    66      76     86     98    102    100     88     90     94     64 
    11    66      76     86     98    102    100     88     90     94     64 
    12    66      76     86     98    102    100     88     90     94     64 
    13    66      76     86     98    102    100     88     90     94     64 
    14    66      76     86     98    102    100     88     90     94     64 
    15    66      76     86     98    100    102     88     90     94     64 
    16    66      76     86     98    100    102     88     90     94     64 
    17    66      76     86     98    100    102     88     90     94     64 
    18    66      76     86     98    100    102     88     90     94     64 
    19    66      76     86     88    100    102     98     90     94     64 
    20    66      76     86     88     98    102    100     90     94     64 
    21    66      76     86     88     98    100    102     90     94     64 
    22    66      76     86     88     98    100    102     90     94     64 
    23    66      76     86     88     98    100    102     90     94     64 
    24    66      76     86     88     98    100    102     90     94     64 
    25    66      76     86     88     98    100    102     90     94     64 
    26    66      76     86     88     90    100    102     98     94     64 
    27    66      76     86     88     90     98    102    100     94     64 
    28    66      76     86     88     90     98    100    102     94     64 
    29    66      76     86     88     90     98    100    102     94     64 
    30    66      76     86     88     90     98    100    102     94     64 
    31    66      76     86     88     90     98    100    102     94     64 
    32    66      76     86     88     90     98    100    102     94     64 
    33    66      76     86     88     90     98    100    102     94     64 
    34    66      76     86     88     90     94    100    102     98     64 
    35    66      76     86     88     90     94     98    102    100     64 
    36    66      76     86     88     90     94     98    100    102     64 
    37    64      76     86     88     90     94     98    100    102     66 
    38    64      66     86     88     90     94     98    100    102     76 
    39    64      66     76     88     90     94     98    100    102     86 
    40    64      66     76     86     90     94     98    100    102     88 
    41    64      66     76     86     88     94     98    100    102     90 
    42    64      66     76     86     88     90     98    100    102     94 
    43    64      66     76     86     88     90     94    100    102     98 
    44    64      66     76     86     88     90     94     98    102    100 
    45    64      66     76     86     88     90     94     98    100    102 
 
 
                   ~john-c/5421/bubble.sort.sas 12SEP10 17:13

=============================================================================

As you can see, 45 steps were required for the program to sort this array of 10
elements.  The bubble sort is not very efficient, and it is not recommended for
large arrays.  For small arrays (in general 20 elements or less), its slow speed
is usually not a problem.  For large arrays, there are much faster sorting algorithms:
quicksort, heapsort, and various merge sorts are asymptotically quite efficient.
However programs to execute the faster sorts are considerably more complicated than
that for the bubble sort.

========================================================================================

It is often desirable to do what is called 'sort and carry'.  This means that you have
two arrays with the same number of elements.  You want to sort one of them, and permute
the other one in parallel with the sorting of the first one.  Here is a modification of
the bubble sort program which does this:


data dbps ;
     array dbp(10) ;
     array idx(10) ;

DBP(1) = 102; DBP(2) = 86; DBP(3) = 98; DBP(4) = 76; DBP(5) = 66 ;
DBP(6) = 100; DBP(7) = 88; DBP(98 = 90; DBP(9) = 94; DBP(10) = 64 ;

idx(1) = 1 ; idx(2) = 2 ; idx(3) = 3 ; idx(4) = 4 ; idx(5) = 5 ;
idx(6) = 6 ; idx(7) = 7 ; idx(8) = 8 ; idx(9) = 9 ; idx(10) = 10 ;

     n = 10 ;

     do i = 2 to n ;

        do j = 1 to i - 1 ;

           tempdbp = DBP(i) ;
           tempidx = idx(i) ;
           if DBP(j) > DBP(i) then do ;
              DBP(i) = DBP(j) ;
              DBP(j) = tempdbp ;
              idx(i) = idx(j) ;
              idx(j) = tempidx ;

           end ;

           output ;

        end ;

     end ;

run ;

=========================================================================

The above program will have the effect of permuting the array idx in parallel
with the sorting of the array dbp.


=======================================================================;
* SPH 5421 SAS program ~john-c/5421/sim.sas                            ;
* Example of simulating 2 x 2 tables and computing statistics ...      ;
*                                                                      ;
* SPH 5421                                                        * ;
* Example of simulating 2 x 2 tables and computing statistics ... * ;
*                                                                 * ;

options linesize = 80 ;

data sim ;

     seed = 20000122 ;
     pa = .5 ; pb = .7 ;
     na = 100 ; nb = 100 ;

     n = 3 ;

     do i = 1 to n ;

        a = ranbin(seed, na, pa) ;
        c = na - a ;
        b = ranbin(seed, nb, pb) ;
        d = nb - b ;

        m0 = a + c ;
        m1 = b + d ;
        n0 = a + b ;
        n1 = c + d ;
        total = a + b + c + d ;

        chisquar = total * (a * d - b * c)**2 / (m0 * m1 * n0 * n1) ;

        prob = 1 - probchi(chisquar, 1) ;
        fisherleft = probhypr(a + b + c + d, a + b, a + c, a) ;
        tableprob =  probhypr(a + b + c + d, a + b, a + c, a)
                   - probhypr(a + b + c + d, a + b, a + c, a - 1) ;

     output ;

     end ;

 run ;

 proc print ;
      var a b c d chisquar prob fisherleft tableprob ;
 title 'Simulated 2 x 2 tables: non-PROC computation of probabilities' ;

 data tabs ;
      set sim ;
      x = 0 ; y = 0 ; m = a ; output ;
      x = 0 ; y = 1 ; m = b ; output ;
      x = 1 ; y = 0 ; m = c ; output ;
      x = 1 ; y = 1 ; m = d ; output ;

 run ;

 proc freq data = tabs ;
      by i ;
      weight m ;
      tables x * y / chisqr ;
 title1 'PROC FREQ computation of chi-squares ...' ;

*----------------------------------------------------------------------;
*
*    Below is the output from sim.sas
*
=======================================================================;

         Simulated 2 x 2 tables: non-PROC computation of probabilities         1
                                                   18:10 Monday, October 1, 2007

 Obs     a     b     c     d    chisquar      prob     fisherleft    tableprob

  1     59    68    41    32     1.74738    0.18621      0.11995      0.049125
  2     56    69    44    31     3.60533    0.05759      0.03966      0.019396
  3     62    73    38    27     2.75783    0.09678      0.06540      0.030589
                    PROC FREQ computation of chi-squares ...                   2
                                                   18:10 Monday, October 1, 2007

------------------------------------- i=1 --------------------------------------

                               The FREQ Procedure

                                Table of x by y

                      x         y

                      Frequency|
                      Percent  |
                      Row Pct  |
                      Col Pct  |       0|       1|  Total
                      ---------+--------+--------+
                             0 |     59 |     68 |    127
                               |  29.50 |  34.00 |  63.50
                               |  46.46 |  53.54 |
                               |  59.00 |  68.00 |
                      ---------+--------+--------+
                             1 |     41 |     32 |     73
                               |  20.50 |  16.00 |  36.50
                               |  56.16 |  43.84 |
                               |  41.00 |  32.00 |
                      ---------+--------+--------+
                      Total         100      100      200
                                  50.00    50.00   100.00


                         Statistics for Table of x by y

             Statistic                     DF       Value      Prob
             ------------------------------------------------------
             Chi-Square                     1      1.7474    0.1862
             Likelihood Ratio Chi-Square    1      1.7507    0.1858
             Continuity Adj. Chi-Square     1      1.3806    0.2400
             Mantel-Haenszel Chi-Square     1      1.7386    0.1873
             Phi Coefficient                      -0.0935          
             Contingency Coefficient               0.0931          
             Cramer's V                           -0.0935          


                              Fisher's Exact Test
                       ----------------------------------
                       Cell (1,1) Frequency (F)        59
                       Left-sided Pr <= F          0.1200
                       Right-sided Pr >= F         0.9292
                                                         
                       Table Probability (P)       0.0491
                       Two-sided Pr <= P           0.2399

                               Sample Size = 200
                    PROC FREQ computation of chi-squares ...                   3
                                                   18:10 Monday, October 1, 2007

------------------------------------- i=2 --------------------------------------

                               The FREQ Procedure

                                Table of x by y

                      x         y

                      Frequency|
                      Percent  |
                      Row Pct  |
                      Col Pct  |       0|       1|  Total
                      ---------+--------+--------+
                             0 |     56 |     69 |    125
                               |  28.00 |  34.50 |  62.50
                               |  44.80 |  55.20 |
                               |  56.00 |  69.00 |
                      ---------+--------+--------+
                             1 |     44 |     31 |     75
                               |  22.00 |  15.50 |  37.50
                               |  58.67 |  41.33 |
                               |  44.00 |  31.00 |
                      ---------+--------+--------+
                      Total         100      100      200
                                  50.00    50.00   100.00


                         Statistics for Table of x by y

             Statistic                     DF       Value      Prob
             ------------------------------------------------------
             Chi-Square                     1      3.6053    0.0576
             Likelihood Ratio Chi-Square    1      3.6192    0.0571
             Continuity Adj. Chi-Square     1      3.0720    0.0797
             Mantel-Haenszel Chi-Square     1      3.5873    0.0582
             Phi Coefficient                      -0.1343          
             Contingency Coefficient               0.1331          
             Cramer's V                           -0.1343          


                              Fisher's Exact Test
                       ----------------------------------
                       Cell (1,1) Frequency (F)        56
                       Left-sided Pr <= F          0.0397
                       Right-sided Pr >= F         0.9797
                                                         
                       Table Probability (P)       0.0194
                       Two-sided Pr <= P           0.0793

                               Sample Size = 200
                    PROC FREQ computation of chi-squares ...                   4
                                                   18:10 Monday, October 1, 2007

------------------------------------- i=3 --------------------------------------

                               The FREQ Procedure

                                Table of x by y

                      x         y

                      Frequency|
                      Percent  |
                      Row Pct  |
                      Col Pct  |       0|       1|  Total
                      ---------+--------+--------+
                             0 |     62 |     73 |    135
                               |  31.00 |  36.50 |  67.50
                               |  45.93 |  54.07 |
                               |  62.00 |  73.00 |
                      ---------+--------+--------+
                             1 |     38 |     27 |     65
                               |  19.00 |  13.50 |  32.50
                               |  58.46 |  41.54 |
                               |  38.00 |  27.00 |
                      ---------+--------+--------+
                      Total         100      100      200
                                  50.00    50.00   100.00


                         Statistics for Table of x by y

             Statistic                     DF       Value      Prob
             ------------------------------------------------------
             Chi-Square                     1      2.7578    0.0968
             Likelihood Ratio Chi-Square    1      2.7678    0.0962
             Continuity Adj. Chi-Square     1      2.2792    0.1311
             Mantel-Haenszel Chi-Square     1      2.7440    0.0976
             Phi Coefficient                      -0.1174          
             Contingency Coefficient               0.1166          
             Cramer's V                           -0.1174          


                              Fisher's Exact Test
                       ----------------------------------
                       Cell (1,1) Frequency (F)        62
                       Left-sided Pr <= F          0.0654
                       Right-sided Pr >= F         0.9652
                                                         
                       Table Probability (P)       0.0306
                       Two-sided Pr <= P           0.1308

                               Sample Size = 200
======================================================================


~john-c/5421/notes.006         Last update: September 13, 2010.