SAS PROC IML                                                   SPH 5421 notes.016

    'IML' stands for Interactive Matrix Language.  It can be used interactively,
but in my experience, it is more likely in applications that one would
use it in programs in 'batch' mode.

    IML provides a way to write and manipulate vectors and matrices.
In particular, the following matrix operations are possible:

    1) Addition of vectors and matrics
    2) Multiplication of matrices by scalars, vectors and other matrices
    3) Transposition
    4) Inversion of matrices
    5  Finding determinants
    6) Finding eigenvectors and eigenvalues

    PROC IML also provides ways of writing design matrices for analysis of
variance and regression problems; see p. 289 in the IML Manual.

    The SAS IML manual is indispensable for learning the language and as a
reference: SAS/IML Software, Version 6, First Edition, 1990. SAS Institute, Inc,
Cary, NC.

    The following is a SAS program showing how simple linear regression
may be done in (ordinary) SAS, by the use of PROC REG, and then by
the use of PROC IML.  The IML program illustrates many of the basics of
matrix manipulation.

==============================================================================

* PROC IML Example :  Simple linear regression.                    ;

options linesize = 80 ;

footnote "/home/walleye/john-c/5421/sregiml.sas &sysdate &systime" ;

data sreg ;
     retain sumxi sumyi sumxisq sumyisq sumxiyi count 0 ;

     infile cards  EOF = slope ;
     input obs xi yi ;

     one = 1 ;
     count = count + 1 ;
     sumxi = sumxi + xi ;
     sumyi = sumyi + yi ;
     sumxisq = sumxisq + xi * xi ;
     sumyisq = sumyisq + yi * yi ;
     sumxiyi = sumxiyi + xi * yi ;

     return ;

slope:

     xbar = sumxi / count ;
     ybar = sumyi / count ;
     b1 = (sumxiyi - sumxi * sumyi / count) / (sumxisq - sumxi * sumxi / count) ;
     b0   = ybar - b1 * xbar ;

     cards ;
     1   2   4
     2   2   3
     3   3   5
     4   3   6
     5   4   2
     6   4   7
     7   5   8
     8   5   8
     9   6   9
    10   6  11
    11   7  12
    12   7  15
    ;

run ;

proc print data = sreg ;
title1 'Dataset sreg and simple linear regression:' ;

proc reg data = sreg ;
     model yi = xi ;
title1 'Proc reg results for yi vs. xi' ;

proc iml ;

use sreg ;

read all var {one xi} where (xi ^= . & yi ^= .) into x ;
read all var {yi}     where (xi ^= . & yi ^= .) into y ;

xtx = x` * x ;
xty = x` * y ;
n = nrow(y) ;
p = 1 ;

determ = det(x` * x) ;

xtxinv = inv(x` * x) ;
yt = y` ;
xt = x` ;
b = xtxinv * xty ;
bt = b` ;

s2 = (yt*y - bt*xt*y) / (n - p - 1) ;

ybar = sum(y) / n ;
ssreg = bt*xt*y - n*ybar**2 ;
ssres = yt*y - bt*xt*y ;
sstot = yt*y - n*ybar**2 ;
rsquare = ssreg / sstot ;
fstat = (ssreg/p) / (ssres/(n - p - 1)) ;
covb = s2 * xtxinv ;

serrb0 = sqrt(covb[1, 1]) ;
serrb1 = sqrt(covb[2, 2]) ;
b0 = b[1] ;
b1 = b[2] ;

tstat0 = b0 / serrb0 ;
tstat1 = b1 / serrb1 ;
df = n - p - 1 ;
pvalue0 = 2 * (1 - probt(abs(tstat0), df)) ;
pvalue1 = 2 * (1 - probt(abs(tstat1), df)) ;

title1 'PROC IML results for simple linear regression' ;

print "x y xtx xty determ(xtx) xtxinv b : " ; print x y  xtx  xty  determ  xtxinv  b ;
print, "s2,  ssreg, ssres, sstot, fstat, rsquare = " ; print s2 ssreg ssres sstot fstat rsquare ;
print, " b0, serr(b0), t-stat0, p-value0 : " ; print b0 serrb0 tstat0 pvalue0 ;
print, " b1, serr(b1), t-stat1, p-value1 : " ; print b1 serrb1 tstat1 pvalue1 ;
endsas ;


========================================================================

                   Dataset sreg and simple linear regression:                  1
                                                18:20 Wednesday, October 9, 2013

                 s    s    s
                 u    u    u
         s   s   m    m    m    c
         u   u   x    y    x    o                 x    y
     O   m   m   i    i    i    u   o         o   b    b
     b   x   y   s    s    y    n   b  x   y  n   a    a      b         b
     s   i   i   q    q    i    t   s  i   i  e   r    r      1         0

     1   2   4    4   16    8   1   1  2   4  1   .    .    .         .     
     2   4   7    8   25   14   2   2  2   3  1   .    .    .         .     
     3   7  12   17   50   29   3   3  3   5  1   .    .    .         .     
     4  10  18   26   86   47   4   4  3   6  1   .    .    .         .     
     5  14  20   42   90   55   5   5  4   2  1   .    .    .         .     
     6  18  27   58  139   83   6   6  4   7  1   .    .    .         .     
     7  23  35   83  203  123   7   7  5   8  1   .    .    .         .     
     8  28  43  108  267  163   8   8  5   8  1   .    .    .         .     
     9  34  52  144  348  217   9   9  6   9  1   .    .    .         .     
    10  40  63  180  469  283  10  10  6  11  1   .    .    .         .     
    11  47  75  229  613  367  11  11  7  12  1   .    .    .         .     
    12  54  90  278  838  472  12  12  7  15  1   .    .    .         .     
    13  54  90  278  838  472  12   .  .   .  .  4.5  7.5  1.91429  -1.11429
 
 
 
 
 
              /home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20
                         Proc reg results for yi vs. xi                        2
                                                18:20 Wednesday, October 9, 2013

                               The REG Procedure
                                 Model: MODEL1
                            Dependent Variable: yi 

             Number of Observations Read                         13
             Number of Observations Used                         12
             Number of Observations with Missing Values           1


                              Analysis of Variance
 
                                     Sum of           Mean
 Source                   DF        Squares         Square    F Value    Pr > F

 Model                     1      128.25714      128.25714      36.92    0.0001
 Error                    10       34.74286        3.47429                     
 Corrected Total          11      163.00000                                    


              Root MSE              1.86394    R-Square     0.7869
              Dependent Mean        7.50000    Adj R-Sq     0.7655
              Coeff Var            24.85258                       


                              Parameter Estimates
 
                           Parameter       Standard
      Variable     DF       Estimate          Error    t Value    Pr > |t|

      Intercept     1       -1.11429        1.51646      -0.73      0.4793
      xi            1        1.91429        0.31506       6.08      0.0001
 
 
 
              /home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20


                  PROC IML results for simple linear regression                 3
                                                18:20 Wednesday, October 9, 2013

                      x y xtx xty determ(xtx) xtxinv b : 


             x                   y       xtx                 xty    determ

             1         2         4        12        54        90       420
             1         2         3        54       278       472          
             1         3         5                                        
             1         3         6                                        
             1         4         2                                        
             1         4         7                                        
             1         5         8                                        
             1         5         8                                        
             1         6         9                                        
             1         6        11                                        
             1         7        12                                        
             1         7        15                                        
                            xtxinv                   b

                         0.6619048 -0.128571 -1.114286
                         -0.128571 0.0285714 1.9142857

                  s2,  ssreg, ssres, sstot, fstat, rsquare = 


                 s2     ssreg     ssres     sstot     fstat   rsquare

          3.4742857 128.25714 34.742857       163 36.916118 0.7868536

                       b0, serr(b0), t-stat0, p-value0 : 


                           b0    serrb0    tstat0   pvalue0

                    -1.114286 1.5164585 -0.734795 0.4793413

                       b1, serr(b1), t-stat1, p-value1 : 


                           b1    serrb1    tstat1   pvalue1

                    1.9142857  0.315064 6.0758636 0.0001194
 
 
 
 
 
 
 
 
 
 
 
 
 
 
              /home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20

==============================================================================

     Note that there are some differences in syntax between IML and the usual
SAS syntax; note for example the logical comparators in the 'where'
statements, in which  "xi ^= ." in IML means the same as "xi ne ." in
ordinary SAS.  In fact, "xi ne ." will not work in IML.  See page 43 of the IML
Manual for comparative operators such as ^= .

     Note also that IML in many cases automatically gives computed matrices and
vectors the right dimensions.  In the above example,  x` * x  is dimensioned
as a  2 x 2  matrix without the programmer having to specify such.

     There are more extensive regression examples in the SAS IML manual:
In particular, see Chapter 3 and Chapter 8, Example 3.  The really essential lines
in the above example are the following:

==============================================================================
xtx = x` * x ;
xty = x` * y ;

xtxinv = inv(x` * x) ;

b = xtxinv * xty ;
==============================================================================

     This section of code would be identical for higher-dimensional linear
regression: that is, for models of the form

     Yi = B0 + B1 * X1i + B2 * X2i + ... + Bp * Xpi + e,

where e has a normal distribution with mean 0 and variance sigma2.

     Note that the character indicating a matrix transpose is:  ` .
Especially note that this is not the same as:  ' .

     The Manual examples make it clear that you can derive essentially all
the regression statistics that appear in, for example, PROC REG output,
using IML.  It is a good idea to study these examples carefully.

     Recommended chapters/sections in the SAS IML Manual:

     Chapter 1, Sect 1.2
     Chapter 2
     Chapter 3
     Chapter 4
     Chapter 5
     Chapter 6
     Chapter 8 Examples 1, 2, 3, 5, 8

------------------------------------------------------------------------------

PROBLEM 14

     1.  Use PROC IML to write a program that does one-way analysis of
         variance.  The input file would be of the following form:

         Factor value         Y-value
         ------------         -------
               1                 12
               1                 13
               1                 11
               2                 16
               2                 15
               2                 14
               2                 16
               3                  9
               3                 10

         In general, the factor values will range over a relatively small
         set of integers, with possibly differing numbers of observations
         for different factor values.  The Y-values in general are not
         necessarily integers.

         The key to this problem is using the data to construct the right
         design matrix.

         Output from this program should include the usual statistics that
         would appear in an ANOVA table for one-way analysis of variance.
         In particular, it should include the numbers in the blank boxes
         below (although they need not be in this format):


---------------------------------------------------------------------------------

                                    Sums of        Mean
     Source                D.F.     Squares       Square         F       p-value
     -----------------    ------   ---------     --------     -------  ----------
                          ______   _________    _________     _______    _______
                         |      | |         |  |         |   |       |  |       |
     Between groups      |      | |         |  |         |   |       |  |       |
                         |______| |_________|  |_________|   |_______|  |_______|
                          ______   _________    _________
                         |      | |         |  |         |
     Error (within gps)  |      | |         |  |         |
                         |______| |_________|  |_________|
                          ______   _________
                         |      | |         |
     Total (corrected)   |      | |         |
                         |______| |_________|
     ----------------------------------------------------------------------------
                          _________
                         |         |
     R-square:           |         |
                         |_________|

------------------------------------------------------------------------------

     2.  Use PROC IML to write a program that does two-way analysis of
         variance where it may be assumed that each factor takes on
         only two levels.  The analysis should include a test for
         interaction.

==================================================================================

SIMPLE LINEAR REGRESSION IN R

     Matrix operations can be implemented in R in much the same way as proc
iml.  The following R program shows two different ways of computing
regression coefficients:

==================================================================================

# Simple linear regression in R ...

ones <- rep(1, 10)              #   vector of 1's
x <- c(1:10)                    #   x = (1, 2, 3, ..., 10)
x0x1 <- cbind(ones, x)          #   design matrix X, 10 x 2
tx0x1 <- t(x0x1)                #   transpose

xpx <- tx0x1 %*% x0x1           #   compute X`X
invxpx <- solve(xpx)            #   invert  X`X

y <- c(1.78345, -0.54128, -0.44503, -0.66441, -1.72841, -3.00969,
+     -2.28972, -4.39501, -3.14249, -4.44228)

xty <- tx0x1 %*% y              #   compute X`Y

xtxixty <- invxpx %*% xty       #   compute  inv(X`X)*X`Y = coeff estimates

r <- lsfit(x, y)                #   R linear regression program

==================================================================================


/home/walleye/john-c/5421/notes.016    Last update: October 9, 2013