SAS PROC IML SPH 5421 notes.016 'IML' stands for Interactive Matrix Language. It can be used interactively, but in my experience, it is more likely in applications that one would use it in programs in 'batch' mode. IML provides a way to write and manipulate vectors and matrices. In particular, the following matrix operations are possible: 1) Addition of vectors and matrics 2) Multiplication of matrices by scalars, vectors and other matrices 3) Transposition 4) Inversion of matrices 5 Finding determinants 6) Finding eigenvectors and eigenvalues PROC IML also provides ways of writing design matrices for analysis of variance and regression problems; see p. 289 in the IML Manual. The SAS IML manual is indispensable for learning the language and as a reference: SAS/IML Software, Version 6, First Edition, 1990. SAS Institute, Inc, Cary, NC. The following is a SAS program showing how simple linear regression may be done in (ordinary) SAS, by the use of PROC REG, and then by the use of PROC IML. The IML program illustrates many of the basics of matrix manipulation. ============================================================================== * PROC IML Example : Simple linear regression. ; options linesize = 80 ; footnote "/home/walleye/john-c/5421/sregiml.sas &sysdate &systime" ; data sreg ; retain sumxi sumyi sumxisq sumyisq sumxiyi count 0 ; infile cards EOF = slope ; input obs xi yi ; one = 1 ; count = count + 1 ; sumxi = sumxi + xi ; sumyi = sumyi + yi ; sumxisq = sumxisq + xi * xi ; sumyisq = sumyisq + yi * yi ; sumxiyi = sumxiyi + xi * yi ; return ; slope: xbar = sumxi / count ; ybar = sumyi / count ; b1 = (sumxiyi - sumxi * sumyi / count) / (sumxisq - sumxi * sumxi / count) ; b0 = ybar - b1 * xbar ; cards ; 1 2 4 2 2 3 3 3 5 4 3 6 5 4 2 6 4 7 7 5 8 8 5 8 9 6 9 10 6 11 11 7 12 12 7 15 ; run ; proc print data = sreg ; title1 'Dataset sreg and simple linear regression:' ; proc reg data = sreg ; model yi = xi ; title1 'Proc reg results for yi vs. xi' ; proc iml ; use sreg ; read all var {one xi} where (xi ^= . & yi ^= .) into x ; read all var {yi} where (xi ^= . & yi ^= .) into y ; xtx = x` * x ; xty = x` * y ; n = nrow(y) ; p = 1 ; determ = det(x` * x) ; xtxinv = inv(x` * x) ; yt = y` ; xt = x` ; b = xtxinv * xty ; bt = b` ; s2 = (yt*y - bt*xt*y) / (n - p - 1) ; ybar = sum(y) / n ; ssreg = bt*xt*y - n*ybar**2 ; ssres = yt*y - bt*xt*y ; sstot = yt*y - n*ybar**2 ; rsquare = ssreg / sstot ; fstat = (ssreg/p) / (ssres/(n - p - 1)) ; covb = s2 * xtxinv ; serrb0 = sqrt(covb[1, 1]) ; serrb1 = sqrt(covb[2, 2]) ; b0 = b[1] ; b1 = b[2] ; tstat0 = b0 / serrb0 ; tstat1 = b1 / serrb1 ; df = n - p - 1 ; pvalue0 = 2 * (1 - probt(abs(tstat0), df)) ; pvalue1 = 2 * (1 - probt(abs(tstat1), df)) ; title1 'PROC IML results for simple linear regression' ; print "x y xtx xty determ(xtx) xtxinv b : " ; print x y xtx xty determ xtxinv b ; print, "s2, ssreg, ssres, sstot, fstat, rsquare = " ; print s2 ssreg ssres sstot fstat rsquare ; print, " b0, serr(b0), t-stat0, p-value0 : " ; print b0 serrb0 tstat0 pvalue0 ; print, " b1, serr(b1), t-stat1, p-value1 : " ; print b1 serrb1 tstat1 pvalue1 ; endsas ; ======================================================================== Dataset sreg and simple linear regression: 1 18:20 Wednesday, October 9, 2013 s s s u u u s s m m m c u u x y x o x y O m m i i i u o o b b b x y s s y n b x y n a a b b s i i q q i t s i i e r r 1 0 1 2 4 4 16 8 1 1 2 4 1 . . . . 2 4 7 8 25 14 2 2 2 3 1 . . . . 3 7 12 17 50 29 3 3 3 5 1 . . . . 4 10 18 26 86 47 4 4 3 6 1 . . . . 5 14 20 42 90 55 5 5 4 2 1 . . . . 6 18 27 58 139 83 6 6 4 7 1 . . . . 7 23 35 83 203 123 7 7 5 8 1 . . . . 8 28 43 108 267 163 8 8 5 8 1 . . . . 9 34 52 144 348 217 9 9 6 9 1 . . . . 10 40 63 180 469 283 10 10 6 11 1 . . . . 11 47 75 229 613 367 11 11 7 12 1 . . . . 12 54 90 278 838 472 12 12 7 15 1 . . . . 13 54 90 278 838 472 12 . . . . 4.5 7.5 1.91429 -1.11429 /home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20 Proc reg results for yi vs. xi 2 18:20 Wednesday, October 9, 2013 The REG Procedure Model: MODEL1 Dependent Variable: yi Number of Observations Read 13 Number of Observations Used 12 Number of Observations with Missing Values 1 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 1 128.25714 128.25714 36.92 0.0001 Error 10 34.74286 3.47429 Corrected Total 11 163.00000 Root MSE 1.86394 R-Square 0.7869 Dependent Mean 7.50000 Adj R-Sq 0.7655 Coeff Var 24.85258 Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > |t| Intercept 1 -1.11429 1.51646 -0.73 0.4793 xi 1 1.91429 0.31506 6.08 0.0001 /home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20 PROC IML results for simple linear regression 3 18:20 Wednesday, October 9, 2013 x y xtx xty determ(xtx) xtxinv b : x y xtx xty determ 1 2 4 12 54 90 420 1 2 3 54 278 472 1 3 5 1 3 6 1 4 2 1 4 7 1 5 8 1 5 8 1 6 9 1 6 11 1 7 12 1 7 15 xtxinv b 0.6619048 -0.128571 -1.114286 -0.128571 0.0285714 1.9142857 s2, ssreg, ssres, sstot, fstat, rsquare = s2 ssreg ssres sstot fstat rsquare 3.4742857 128.25714 34.742857 163 36.916118 0.7868536 b0, serr(b0), t-stat0, p-value0 : b0 serrb0 tstat0 pvalue0 -1.114286 1.5164585 -0.734795 0.4793413 b1, serr(b1), t-stat1, p-value1 : b1 serrb1 tstat1 pvalue1 1.9142857 0.315064 6.0758636 0.0001194 /home/walleye/john-c/5421/sregiml.sas 09OCT13 18:20 ============================================================================== Note that there are some differences in syntax between IML and the usual SAS syntax; note for example the logical comparators in the 'where' statements, in which "xi ^= ." in IML means the same as "xi ne ." in ordinary SAS. In fact, "xi ne ." will not work in IML. See page 43 of the IML Manual for comparative operators such as ^= . Note also that IML in many cases automatically gives computed matrices and vectors the right dimensions. In the above example, x` * x is dimensioned as a 2 x 2 matrix without the programmer having to specify such. There are more extensive regression examples in the SAS IML manual: In particular, see Chapter 3 and Chapter 8, Example 3. The really essential lines in the above example are the following: ============================================================================== xtx = x` * x ; xty = x` * y ; xtxinv = inv(x` * x) ; b = xtxinv * xty ; ============================================================================== This section of code would be identical for higher-dimensional linear regression: that is, for models of the form Yi = B0 + B1 * X1i + B2 * X2i + ... + Bp * Xpi + e, where e has a normal distribution with mean 0 and variance sigma2. Note that the character indicating a matrix transpose is: ` . Especially note that this is not the same as: ' . The Manual examples make it clear that you can derive essentially all the regression statistics that appear in, for example, PROC REG output, using IML. It is a good idea to study these examples carefully. Recommended chapters/sections in the SAS IML Manual: Chapter 1, Sect 1.2 Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6 Chapter 8 Examples 1, 2, 3, 5, 8 ------------------------------------------------------------------------------ PROBLEM 14 1. Use PROC IML to write a program that does one-way analysis of variance. The input file would be of the following form: Factor value Y-value ------------ ------- 1 12 1 13 1 11 2 16 2 15 2 14 2 16 3 9 3 10 In general, the factor values will range over a relatively small set of integers, with possibly differing numbers of observations for different factor values. The Y-values in general are not necessarily integers. The key to this problem is using the data to construct the right design matrix. Output from this program should include the usual statistics that would appear in an ANOVA table for one-way analysis of variance. In particular, it should include the numbers in the blank boxes below (although they need not be in this format): --------------------------------------------------------------------------------- Sums of Mean Source D.F. Squares Square F p-value ----------------- ------ --------- -------- ------- ---------- ______ _________ _________ _______ _______ | | | | | | | | | | Between groups | | | | | | | | | | |______| |_________| |_________| |_______| |_______| ______ _________ _________ | | | | | | Error (within gps) | | | | | | |______| |_________| |_________| ______ _________ | | | | Total (corrected) | | | | |______| |_________| ---------------------------------------------------------------------------- _________ | | R-square: | | |_________| ------------------------------------------------------------------------------ 2. Use PROC IML to write a program that does two-way analysis of variance where it may be assumed that each factor takes on only two levels. The analysis should include a test for interaction. ================================================================================== SIMPLE LINEAR REGRESSION IN R Matrix operations can be implemented in R in much the same way as proc iml. The following R program shows two different ways of computing regression coefficients: ================================================================================== # Simple linear regression in R ... ones <- rep(1, 10) # vector of 1's x <- c(1:10) # x = (1, 2, 3, ..., 10) x0x1 <- cbind(ones, x) # design matrix X, 10 x 2 tx0x1 <- t(x0x1) # transpose xpx <- tx0x1 %*% x0x1 # compute X`X invxpx <- solve(xpx) # invert X`X y <- c(1.78345, -0.54128, -0.44503, -0.66441, -1.72841, -3.00969, + -2.28972, -4.39501, -3.14249, -4.44228) xty <- tx0x1 %*% y # compute X`Y xtxixty <- invxpx %*% xty # compute inv(X`X)*X`Y = coeff estimates r <- lsfit(x, y) # R linear regression program ================================================================================== /home/walleye/john-c/5421/notes.016 Last update: October 9, 2013