SAS PROC IML SPH 5421 notes.016
'IML' stands for Interactive Matrix Language. It can be used interactively,
but in my experience, it is more likely in applications that one would
use it in programs in 'batch' mode.
IML provides a way to write and manipulate vectors and matrices.
In particular, the following matrix operations are possible:
1) Addition of vectors and matrics
2) Multiplication of matrices by scalars, vectors and other matrices
3) Transposition
4) Inversion of matrices
5 Finding determinants
6) Finding eigenvectors and eigenvalues
PROC IML also provides ways of writing design matrices for analysis of
variance and regression problems; see p. 289 in the IML Manual.
The SAS IML manual is indispensable for learning the language and as a
reference: SAS/IML Software, Version 6, First Edition, 1990. SAS Institute, Inc,
Cary, NC.
The following is a SAS program showing how simple linear regression
may be done in (ordinary) SAS, by the use of PROC REG, and then by
the use of PROC IML. The IML program illustrates many of the basics of
matrix manipulation.
==============================================================================
* PROC IML Example : Simple linear regression. ;
options linesize = 80 ;
footnote "/home/walleye/john-c/5421/sregiml.sas &sysdate &systime" ;
data sreg ;
retain sumxi sumyi sumxisq sumyisq sumxiyi count 0 ;
infile cards EOF = slope ;
input obs xi yi ;
one = 1 ;
count = count + 1 ;
sumxi = sumxi + xi ;
sumyi = sumyi + yi ;
sumxisq = sumxisq + xi * xi ;
sumyisq = sumyisq + yi * yi ;
sumxiyi = sumxiyi + xi * yi ;
return ;
slope:
xbar = sumxi / count ;
ybar = sumyi / count ;
b1 = (sumxiyi - sumxi * sumyi / count) / (sumxisq - sumxi * sumxi / count) ;
b0 = ybar - b1 * xbar ;
cards ;
1 2 4
2 2 3
3 3 5
4 3 6
5 4 2
6 4 7
7 5 8
8 5 8
9 6 9
10 6 11
11 7 12
12 7 15
;
run ;
proc print data = sreg ;
title1 'Dataset sreg and simple linear regression:' ;
proc reg data = sreg ;
model yi = xi ;
title1 'Proc reg results for yi vs. xi' ;
proc iml ;
use sreg ;
read all var {one xi} where (xi ^= . & yi ^= .) into x ;
read all var {yi} where (xi ^= . & yi ^= .) into y ;
xtx = x` * x ;
xty = x` * y ;
n = nrow(y) ;
p = 1 ;
determ = det(x` * x) ;
xtxinv = inv(x` * x) ;
yt = y` ;
xt = x` ;
b = xtxinv * xty ;
bt = b` ;
s2 = (yt*y - bt*xt*y) / (n - p - 1) ;
ybar = sum(y) / n ;
ssreg = bt*xt*y - n*ybar**2 ;
ssres = yt*y - bt*xt*y ;
sstot = yt*y - n*ybar**2 ;
rsquare = ssreg / sstot ;
fstat = (ssreg/p) / (ssres/(n - p - 1)) ;
covb = s2 * xtxinv ;
serrb0 = sqrt(covb[1, 1]) ;
serrb1 = sqrt(covb[2, 2]) ;
b0 = b[1] ;
b1 = b[2] ;
tstat0 = b0 / serrb0 ;
tstat1 = b1 / serrb1 ;
df = n - p - 1 ;
pvalue0 = 2 * (1 - probt(abs(tstat0), df)) ;
pvalue1 = 2 * (1 - probt(abs(tstat1), df)) ;
title1 'PROC IML results for simple linear regression' ;
print "x y xtx xty determ(xtx) xtxinv b : " ; print x y xtx xty determ xtxinv b ;
print, "s2, ssreg, ssres, sstot, fstat, rsquare = " ; print s2 ssreg ssres sstot fstat rsquare ;
print, " b0, serr(b0), t-stat0, p-value0 : " ; print b0 serrb0 tstat0 pvalue0 ;
print, " b1, serr(b1), t-stat1, p-value1 : " ; print b1 serrb1 tstat1 pvalue1 ;
endsas ;
========================================================================
Dataset sreg and simple linear regression: 1
18:20 Wednesday, October 9, 2013
s s s
u u u
s s m m m c
u u x y x o x y
O m m i i i u o o b b
b x y s s y n b x y n a a b b
s i i q q i t s i i e r r 1 0
1 2 4 4 16 8 1 1 2 4 1 . . . .
2 4 7 8 25 14 2 2 2 3 1 . . . .
3 7 12 17 50 29 3 3 3 5 1 . . . .
4 10 18 26 86 47 4 4 3 6 1 . . . .
5 14 20 42 90 55 5 5 4 2 1 . . . .
6 18 27 58 139 83 6 6 4 7 1 . . . .
7 23 35 83 203 123 7 7 5 8 1 . . . .
8 28 43 108 267 163 8 8 5 8 1 . . . .
9 34 52 144 348 217 9 9 6 9 1 . . . .
10 40 63 180 469 283 10 10 6 11 1 . . . .
11 47 75 229 613 367 11 11 7 12 1 . . . .
12 54 90 278 838 472 12 12 7 15 1 . . . .
13 54 90 278 838 472 12 . . . . 4.5 7.5 1.91429 -1.11429
/home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20
Proc reg results for yi vs. xi 2
18:20 Wednesday, October 9, 2013
The REG Procedure
Model: MODEL1
Dependent Variable: yi
Number of Observations Read 13
Number of Observations Used 12
Number of Observations with Missing Values 1
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 128.25714 128.25714 36.92 0.0001
Error 10 34.74286 3.47429
Corrected Total 11 163.00000
Root MSE 1.86394 R-Square 0.7869
Dependent Mean 7.50000 Adj R-Sq 0.7655
Coeff Var 24.85258
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 -1.11429 1.51646 -0.73 0.4793
xi 1 1.91429 0.31506 6.08 0.0001
/home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20
PROC IML results for simple linear regression 3
18:20 Wednesday, October 9, 2013
x y xtx xty determ(xtx) xtxinv b :
x y xtx xty determ
1 2 4 12 54 90 420
1 2 3 54 278 472
1 3 5
1 3 6
1 4 2
1 4 7
1 5 8
1 5 8
1 6 9
1 6 11
1 7 12
1 7 15
xtxinv b
0.6619048 -0.128571 -1.114286
-0.128571 0.0285714 1.9142857
s2, ssreg, ssres, sstot, fstat, rsquare =
s2 ssreg ssres sstot fstat rsquare
3.4742857 128.25714 34.742857 163 36.916118 0.7868536
b0, serr(b0), t-stat0, p-value0 :
b0 serrb0 tstat0 pvalue0
-1.114286 1.5164585 -0.734795 0.4793413
b1, serr(b1), t-stat1, p-value1 :
b1 serrb1 tstat1 pvalue1
1.9142857 0.315064 6.0758636 0.0001194
/home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20
==============================================================================
Note that there are some differences in syntax between IML and the usual
SAS syntax; note for example the logical comparators in the 'where'
statements, in which "xi ^= ." in IML means the same as "xi ne ." in
ordinary SAS. In fact, "xi ne ." will not work in IML. See page 43 of the IML
Manual for comparative operators such as ^= .
Note also that IML in many cases automatically gives computed matrices and
vectors the right dimensions. In the above example, x` * x is dimensioned
as a 2 x 2 matrix without the programmer having to specify such.
There are more extensive regression examples in the SAS IML manual:
In particular, see Chapter 3 and Chapter 8, Example 3. The really essential lines
in the above example are the following:
==============================================================================
xtx = x` * x ;
xty = x` * y ;
xtxinv = inv(x` * x) ;
b = xtxinv * xty ;
==============================================================================
This section of code would be identical for higher-dimensional linear
regression: that is, for models of the form
Yi = B0 + B1 * X1i + B2 * X2i + ... + Bp * Xpi + e,
where e has a normal distribution with mean 0 and variance sigma2.
Note that the character indicating a matrix transpose is: ` .
Especially note that this is not the same as: ' .
The Manual examples make it clear that you can derive essentially all
the regression statistics that appear in, for example, PROC REG output,
using IML. It is a good idea to study these examples carefully.
Recommended chapters/sections in the SAS IML Manual:
Chapter 1, Sect 1.2
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 8 Examples 1, 2, 3, 5, 8
------------------------------------------------------------------------------
PROBLEM 14
1. Use PROC IML to write a program that does one-way analysis of
variance. The input file would be of the following form:
Factor value Y-value
------------ -------
1 12
1 13
1 11
2 16
2 15
2 14
2 16
3 9
3 10
In general, the factor values will range over a relatively small
set of integers, with possibly differing numbers of observations
for different factor values. The Y-values in general are not
necessarily integers.
The key to this problem is using the data to construct the right
design matrix.
Output from this program should include the usual statistics that
would appear in an ANOVA table for one-way analysis of variance.
In particular, it should include the numbers in the blank boxes
below (although they need not be in this format):
---------------------------------------------------------------------------------
Sums of Mean
Source D.F. Squares Square F p-value
----------------- ------ --------- -------- ------- ----------
______ _________ _________ _______ _______
| | | | | | | | | |
Between groups | | | | | | | | | |
|______| |_________| |_________| |_______| |_______|
______ _________ _________
| | | | | |
Error (within gps) | | | | | |
|______| |_________| |_________|
______ _________
| | | |
Total (corrected) | | | |
|______| |_________|
----------------------------------------------------------------------------
_________
| |
R-square: | |
|_________|
------------------------------------------------------------------------------
2. Use PROC IML to write a program that does two-way analysis of
variance where it may be assumed that each factor takes on
only two levels. The analysis should include a test for
interaction.
==================================================================================
SIMPLE LINEAR REGRESSION IN R
Matrix operations can be implemented in R in much the same way as proc
iml. The following R program shows two different ways of computing
regression coefficients:
==================================================================================
# Simple linear regression in R ...
ones <- rep(1, 10) # vector of 1's
x <- c(1:10) # x = (1, 2, 3, ..., 10)
x0x1 <- cbind(ones, x) # design matrix X, 10 x 2
tx0x1 <- t(x0x1) # transpose
xpx <- tx0x1 %*% x0x1 # compute X`X
invxpx <- solve(xpx) # invert X`X
y <- c(1.78345, -0.54128, -0.44503, -0.66441, -1.72841, -3.00969,
+ -2.28972, -4.39501, -3.14249, -4.44228)
xty <- tx0x1 %*% y # compute X`Y
xtxixty <- invxpx %*% xty # compute inv(X`X)*X`Y = coeff estimates
r <- lsfit(x, y) # R linear regression program
==================================================================================
/home/walleye/john-c/5421/notes.016 Last update: October 9, 2013