SAS PROC IML SPH 5421 notes.016
'IML' stands for Interactive Matrix Language. It can be used interactively,
but in my experience, it is more likely in applications that one would
use it in programs in 'batch' mode.
IML provides a way to write and manipulate vectors and matrices.
In particular, the following matrix operations are possible:
1) Addition of vectors and matrics
2) Multiplication of matrices by scalars, vectors and other matrices
3) Transposition
4) Inversion of matrices
5 Finding determinants
6) Finding eigenvectors and eigenvalues
PROC IML also provides ways of writing design matrices for analysis of
variance and regression problems; see p. 289 in the IML Manual.
The SAS IML manual is indispensable for learning the language and as a
reference: SAS/IML Software, Version 6, First Edition, 1990. SAS Institute, Inc,
Cary, NC.
The following is a SAS program showing how simple linear regression
may be done in (ordinary) SAS, by the use of PROC REG, and then by
the use of PROC IML. The IML program illustrates many of the basics of
matrix manipulation.
==============================================================================
* PROC IML Example : Simple linear regression. ;
options linesize = 80 ;
footnote "/home/walleye/john-c/5421/sregiml.sas &sysdate &systime" ;
data sreg ;
retain sumxi sumyi sumxisq sumyisq sumxiyi count 0 ;
infile cards EOF = slope ;
input obs xi yi ;
one = 1 ;
count = count + 1 ;
sumxi = sumxi + xi ;
sumyi = sumyi + yi ;
sumxisq = sumxisq + xi * xi ;
sumyisq = sumyisq + yi * yi ;
sumxiyi = sumxiyi + xi * yi ;
return ;
slope:
xbar = sumxi / count ;
ybar = sumyi / count ;
b1 = (sumxiyi - sumxi * sumyi / count) / (sumxisq - sumxi * sumxi / count) ;
b0 = ybar - b1 * xbar ;
cards ;
1 2 4
2 2 3
3 3 5
4 3 6
5 4 2
6 4 7
7 5 8
8 5 8
9 6 9
10 6 11
11 7 12
12 7 15
;
run ;
proc print data = sreg ;
title1 'Dataset sreg and simple linear regression:' ;
proc reg data = sreg ;
model yi = xi ;
title1 'Proc reg results for yi vs. xi' ;
proc iml ;
use sreg ;
read all var {one xi} where (xi ^= . & yi ^= .) into x ;
read all var {yi} where (xi ^= . & yi ^= .) into y ;
xtx = x` * x ;
xty = x` * y ;
determ = det(x` * x) ;
xtxinv = inv(x` * x) ;
b = xtxinv * xty ;
title1 'PROC IML results for simple linear regression' ;
print, x y xtx xty determ xtxinv b ;
endsas ;
========================================================================
Dataset sreg and simple linear regression: 1
17:53 Wednesday, October 16, 2002
S S S
U U U
S S M M M C
U U X Y X O X Y
O M M I I I U O O B B
B X Y S S Y N B X Y N A A B B
S I I Q Q I T S I I E R R 1 0
1 2 4 4 16 8 1 1 2 4 1 . . . .
2 4 7 8 25 14 2 2 2 3 1 . . . .
3 7 12 17 50 29 3 3 3 5 1 . . . .
4 10 18 26 86 47 4 4 3 6 1 . . . .
5 14 20 42 90 55 5 5 4 2 1 . . . .
6 18 27 58 139 83 6 6 4 7 1 . . . .
7 23 35 83 203 123 7 7 5 8 1 . . . .
8 28 43 108 267 163 8 8 5 8 1 . . . .
9 34 52 144 348 217 9 9 6 9 1 . . . .
10 40 63 180 469 283 10 10 6 11 1 . . . .
11 47 75 229 613 367 11 11 7 12 1 . . . .
12 54 90 278 838 472 12 12 7 15 1 . . . .
13 54 90 278 838 472 12 . . . . 4.5 7.5 1.91429 -1.11429
/home/walleye/john-c/5421/sregiml.sas 16OCT02 17:53
========================================================================
Proc reg results for yi vs. xi 2
17:53 Wednesday, October 16, 2002
Model: MODEL1
Dependent Variable: YI
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 1 128.25714 128.25714 36.916 0.0001
Error 10 34.74286 3.47429
C Total 11 163.00000
Root MSE 1.86394 R-square 0.7869
Dep Mean 7.50000 Adj R-sq 0.7655
C.V. 24.85258
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 -1.114286 1.51645846 -0.735 0.4793
XI 1 1.914286 0.31506397 6.076 0.0001
/home/walleye/john-c/5421/sregiml.sas 16OCT02 17:53
========================================================================
PROC IML results for simple linear regression 3
17:53 Wednesday, October 16, 2002
X Y XTX XTY DETERM
1 2 4 12 54 90 420
1 2 3 54 278 472
1 3 5
1 3 6
1 4 2
1 4 7
1 5 8
1 5 8
1 6 9
1 6 11
1 7 12
1 7 15
XTXINV B
0.6619048 -0.128571 -1.114286
-0.128571 0.0285714 1.9142857
/home/walleye/john-c/5421/sregiml.sas 16OCT02 17:53
==============================================================================
Note that there are some differences in syntax between IML and the usual
SAS syntax; note for example the logical comparators in the 'where'
statements, in which "xi ^= ." in IML means the same as "xi ne ." in
ordinary SAS. In fact, "xi ne ." will not work in IML. See page 43 of the IML
Manual for comparative operators such as ^= .
Note also that IML in many cases automatically gives computed matrices and
vectors the right dimensions. In the above example, x` * x is dimensioned
as a 2 x 2 matrix without the programmer having to specify such.
There are more extensive regression examples in the SAS IML manual:
In particular, see Chapter 3 and Chapter 8, Example 3. The really essential lines
in the above example are the following:
==============================================================================
xtx = x` * x ;
xty = x` * y ;
xtxinv = inv(x` * x) ;
b = xtxinv * xty ;
==============================================================================
This section of code would be identical for higher-dimensional linear
regression: that is, for models of the form
Yi = B0 + B1 * X1i + B2 * X2i + ... + Bp * Xpi + e,
where e has a normal distribution with mean 0 and variance sigma2.
Note that the character indicating a matrix transpose is: ` .
Especially note that this is not the same as: ' .
The Manual examples make it clear that you can derive essentially all
the regression statistics that appear in, for example, PROC REG output,
using IML. It is a good idea to study these examples carefully.
Recommended chapters/sections in the SAS IML Manual:
Chapter 1, Sect 1.2
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 8 Examples 1, 2, 3, 5, 8
------------------------------------------------------------------------------
PROBLEM 14
1. Use PROC IML to write a program that does one-way analysis of
variance. The input file would be of the following form:
Factor value Y-value
------------ -------
1 12
1 13
1 11
2 16
2 15
2 14
2 16
3 9
3 10
In general, the factor values will range over a relatively small
set of integers, with possibly differing numbers of observations
for different factor values. The Y-values in general are not
necessarily integers.
The key to this problem is using the data to construct the right
design matrix.
Output from this program should include the usual statistics that
would appear in an ANOVA table for one-way analysis of variance.
In particular, it should include the numbers in the blank boxes
below (although they need not be in this format):
---------------------------------------------------------------------------------
Sums of Mean
Source D.F. Squares Square F p-value
----------------- ------ --------- -------- ------- ----------
______ _________ _________ _______ _______
| | | | | | | | | |
Between groups | | | | | | | | | |
|______| |_________| |_________| |_______| |_______|
______ _________ _________
| | | | | |
Error (within gps) | | | | | |
|______| |_________| |_________|
______ _________
| | | |
Total (corrected) | | | |
|______| |_________|
----------------------------------------------------------------------------
_________
| |
R-square: | |
|_________|
------------------------------------------------------------------------------
2. Use PROC IML to write a program that does two-way analysis of
variance where it may be assumed that each factor takes on
only two levels. The analysis should include a test for
interaction.
==================================================================================
SIMPLE LINEAR REGRESSION IN SPLUS
Matrix operations can be implemented in Splus in much the same way as proc
iml. The following Splus program shows two different ways of computing
regression coefficients:
==================================================================================
# Simple linear regression in Splus ...
ones <- rep(1, 10) # vector of 1's
x <- c(1:10) # x = (1, 2, 3, ..., 10)
x0x1 <- cbind(ones, x) # design matrix X, 10 x 2
tx0x1 <- t(x0x1) # transpose
xpx <- tx0x1 %*% x0x1 # compute X`X
invxpx <- solve(xpx) # invert X`X
y <- c(1.78345, -0.54128, -0.44503, -0.66441, -1.72841, -3.00969,
+ -2.28972, -4.39501, -3.14249, -4.44228)
xty <- tx0x1 %*% y # compute X`Y
xtxixty <- invxpx %*% xty # compute inv(X`X)*X`Y = coeff estimates
r <- lsfit(x, y) # Splus linear regression program
==================================================================================
/home/walleye/john-c/5421/notes.016 Last update: October 16, 2002