SAS PROC IML                                                   SPH 5421 notes.016

    'IML' stands for Interactive Matrix Language.  It can be used interactively,
but in my experience, it is more likely in applications that one would
use it in programs in 'batch' mode.

    IML provides a way to write and manipulate vectors and matrices.
In particular, the following matrix operations are possible:

    1) Addition of vectors and matrics
    2) Multiplication of matrices by scalars, vectors and other matrices
    3) Transposition
    4) Inversion of matrices
    5  Finding determinants
    6) Finding eigenvectors and eigenvalues

    PROC IML also provides ways of writing design matrices for analysis of
variance and regression problems; see p. 289 in the IML Manual.

    The SAS IML manual is indispensable for learning the language and as a
reference: SAS/IML Software, Version 6, First Edition, 1990. SAS Institute, Inc,
Cary, NC.

    The following is a SAS program showing how simple linear regression
may be done in (ordinary) SAS, by the use of PROC REG, and then by
the use of PROC IML.  The IML program illustrates many of the basics of
matrix manipulation.

==============================================================================

* PROC IML Example :  Simple linear regression.                    ;

options linesize = 80 ;

footnote "/home/walleye/john-c/5421/sregiml.sas &sysdate &systime" ;

data sreg ;
     retain sumxi sumyi sumxisq sumyisq sumxiyi count 0 ;

     infile cards  EOF = slope ;
     input obs xi yi ;

     one = 1 ;
     count = count + 1 ;
     sumxi = sumxi + xi ;
     sumyi = sumyi + yi ;
     sumxisq = sumxisq + xi * xi ;
     sumyisq = sumyisq + yi * yi ;
     sumxiyi = sumxiyi + xi * yi ;

     return ;

slope:

     xbar = sumxi / count ;
     ybar = sumyi / count ;
     b1 = (sumxiyi - sumxi * sumyi / count) / (sumxisq - sumxi * sumxi / count) ;
     b0   = ybar - b1 * xbar ;

     cards ;
     1   2   4
     2   2   3
     3   3   5
     4   3   6
     5   4   2
     6   4   7
     7   5   8
     8   5   8
     9   6   9
    10   6  11
    11   7  12
    12   7  15
    ;

run ;

proc print data = sreg ;
title1 'Dataset sreg and simple linear regression:' ;

proc reg data = sreg ;
     model yi = xi ;
title1 'Proc reg results for yi vs. xi' ;

proc iml ;

use sreg ;

read all var {one xi} where (xi ^= . & yi ^= .) into x ;
read all var {yi}     where (xi ^= . & yi ^= .) into y ;

xtx = x` * x ;
xty = x` * y ;
determ = det(x` * x) ;

xtxinv = inv(x` * x) ;

b = xtxinv * xty ;

title1 'PROC IML results for simple linear regression' ;

print,  x  y  xtx  xty  determ  xtxinv  b ;

endsas ;

========================================================================

                   Dataset sreg and simple linear regression:                  1
                                               17:53 Wednesday, October 16, 2002

                  S   S     S
                  U   U     U
         S   S    M   M     M   C
         U   U    X   Y     X   O                 X    Y
     O   M   M    I   I     I   U   O         O   B    B
     B   X   Y    S   S     Y   N   B  X   Y  N   A    A      B         B
     S   I   I    Q   Q     I   T   S  I   I  E   R    R      1         0

     1   2   4    4   16    8   1   1  2   4  1   .    .    .         .     
     2   4   7    8   25   14   2   2  2   3  1   .    .    .         .     
     3   7  12   17   50   29   3   3  3   5  1   .    .    .         .     
     4  10  18   26   86   47   4   4  3   6  1   .    .    .         .     
     5  14  20   42   90   55   5   5  4   2  1   .    .    .         .     
     6  18  27   58  139   83   6   6  4   7  1   .    .    .         .     
     7  23  35   83  203  123   7   7  5   8  1   .    .    .         .     
     8  28  43  108  267  163   8   8  5   8  1   .    .    .         .     
     9  34  52  144  348  217   9   9  6   9  1   .    .    .         .     
    10  40  63  180  469  283  10  10  6  11  1   .    .    .         .     
    11  47  75  229  613  367  11  11  7  12  1   .    .    .         .     
    12  54  90  278  838  472  12  12  7  15  1   .    .    .         .     
    13  54  90  278  838  472  12   .  .   .  .  4.5  7.5  1.91429  -1.11429
 
 

              /home/walleye/john-c/5421/sregiml.sas 16OCT02 17:53
========================================================================
                          Proc reg results for yi vs. xi                        2
                                               17:53 Wednesday, October 16, 2002

Model: MODEL1  
Dependent Variable: YI                                                 

                              Analysis of Variance

                                 Sum of         Mean
        Source          DF      Squares       Square      F Value       Prob>F

        Model            1    128.25714    128.25714       36.916       0.0001
        Error           10     34.74286      3.47429
        C Total         11    163.00000

            Root MSE       1.86394     R-square       0.7869
            Dep Mean       7.50000     Adj R-sq       0.7655
            C.V.          24.85258

                              Parameter Estimates

                       Parameter      Standard    T for H0:               
      Variable  DF      Estimate         Error   Parameter=0    Prob > |T|

      INTERCEP   1     -1.114286    1.51645846        -0.735        0.4793
      XI         1      1.914286    0.31506397         6.076        0.0001

 
 
 
              /home/walleye/john-c/5421/sregiml.sas 16OCT02 17:53
========================================================================
                  PROC IML results for simple linear regression                 3
                                               17:53 Wednesday, October 16, 2002

             X                   Y       XTX                 XTY    DETERM
             1         2         4        12        54        90       420
             1         2         3        54       278       472
             1         3         5
             1         3         6
             1         4         2
             1         4         7
             1         5         8
             1         5         8
             1         6         9
             1         6        11
             1         7        12
             1         7        15


                            XTXINV                   B
                         0.6619048 -0.128571 -1.114286
                         -0.128571 0.0285714 1.9142857

 
 
 
              /home/walleye/john-c/5421/sregiml.sas 16OCT02 17:53
==============================================================================

     Note that there are some differences in syntax between IML and the usual
SAS syntax; note for example the logical comparators in the 'where'
statements, in which  "xi ^= ." in IML means the same as "xi ne ." in
ordinary SAS.  In fact, "xi ne ." will not work in IML.  See page 43 of the IML
Manual for comparative operators such as ^= .

     Note also that IML in many cases automatically gives computed matrices and
vectors the right dimensions.  In the above example,  x` * x  is dimensioned
as a  2 x 2  matrix without the programmer having to specify such.

     There are more extensive regression examples in the SAS IML manual:
In particular, see Chapter 3 and Chapter 8, Example 3.  The really essential lines
in the above example are the following:

==============================================================================
xtx = x` * x ;
xty = x` * y ;

xtxinv = inv(x` * x) ;

b = xtxinv * xty ;
==============================================================================

     This section of code would be identical for higher-dimensional linear
regression: that is, for models of the form

     Yi = B0 + B1 * X1i + B2 * X2i + ... + Bp * Xpi + e,

where e has a normal distribution with mean 0 and variance sigma2.

     Note that the character indicating a matrix transpose is:  ` .
Especially note that this is not the same as:  ' .

     The Manual examples make it clear that you can derive essentially all
the regression statistics that appear in, for example, PROC REG output,
using IML.  It is a good idea to study these examples carefully.

     Recommended chapters/sections in the SAS IML Manual:

     Chapter 1, Sect 1.2
     Chapter 2
     Chapter 3
     Chapter 4
     Chapter 5
     Chapter 6
     Chapter 8 Examples 1, 2, 3, 5, 8

------------------------------------------------------------------------------

PROBLEM 14

     1.  Use PROC IML to write a program that does one-way analysis of
         variance.  The input file would be of the following form:

         Factor value         Y-value
         ------------         -------
               1                 12
               1                 13
               1                 11
               2                 16
               2                 15
               2                 14
               2                 16
               3                  9
               3                 10

         In general, the factor values will range over a relatively small
         set of integers, with possibly differing numbers of observations
         for different factor values.  The Y-values in general are not
         necessarily integers.

         The key to this problem is using the data to construct the right
         design matrix.

         Output from this program should include the usual statistics that
         would appear in an ANOVA table for one-way analysis of variance.
         In particular, it should include the numbers in the blank boxes
         below (although they need not be in this format):


---------------------------------------------------------------------------------

                                    Sums of        Mean
     Source                D.F.     Squares       Square         F       p-value
     -----------------    ------   ---------     --------     -------  ----------
                          ______   _________    _________     _______    _______
                         |      | |         |  |         |   |       |  |       |
     Between groups      |      | |         |  |         |   |       |  |       |
                         |______| |_________|  |_________|   |_______|  |_______|
                          ______   _________    _________
                         |      | |         |  |         |
     Error (within gps)  |      | |         |  |         |
                         |______| |_________|  |_________|
                          ______   _________
                         |      | |         |
     Total (corrected)   |      | |         |
                         |______| |_________|
     ----------------------------------------------------------------------------
                          _________
                         |         |
     R-square:           |         |
                         |_________|

------------------------------------------------------------------------------

     2.  Use PROC IML to write a program that does two-way analysis of
         variance where it may be assumed that each factor takes on
         only two levels.  The analysis should include a test for
         interaction.

==================================================================================

SIMPLE LINEAR REGRESSION IN SPLUS

     Matrix operations can be implemented in Splus in much the same way as proc
iml.  The following Splus program shows two different ways of computing
regression coefficients:

==================================================================================

# Simple linear regression in Splus ...

ones <- rep(1, 10)              #   vector of 1's
x <- c(1:10)                    #   x = (1, 2, 3, ..., 10)
x0x1 <- cbind(ones, x)          #   design matrix X, 10 x 2
tx0x1 <- t(x0x1)                #   transpose

xpx <- tx0x1 %*% x0x1           #   compute X`X
invxpx <- solve(xpx)            #   invert  X`X

y <- c(1.78345, -0.54128, -0.44503, -0.66441, -1.72841, -3.00969,
+     -2.28972, -4.39501, -3.14249, -4.44228)

xty <- tx0x1 %*% y              #   compute X`Y

xtxixty <- invxpx %*% xty       #   compute  inv(X`X)*X`Y = coeff estimates

r <- lsfit(x, y)                #   Splus linear regression program

==================================================================================


/home/walleye/john-c/5421/notes.016    Last update: October 16, 2002