APPLICATIONS OF SAS IML: SPH 5421 notes.021a
EIGENVECTORS AND EIGENVALUES
CREATING UNCORRELATED VARIABLES ...
----------------------------------------------------------------------------------
Assume R and S are random variables with covariance matrix:
        | var(R)      cov(R, S) |
    A = |                       |
        | cov(R, S)   var(S)    |
In general, cov(R, S) is not zero, i.e., you expect that these two variables are correlated.
If R and S are observed covariates (potential predictors) of an outcome variable Y, the fact
that they are correlated in general complicates the inference regarding their effects,
based on regression of Y on R and S.
It is, however, possible to use the theory and computational methods for eigenvectors to
replace R and S by new variables V and W which are linear combinations of R and S, which
are uncorrelated, and which give exactly the same fit when both are included in the
regression (the same regression sum of squares and R-square). The following
program illustrates this. It analyzes data from the file 'lhs.data' (see notes.010).
The variables of interest are S2FEVPOS, WEIGHT0, and CIGSA0. Both WEIGHT0 and CIGSA0
are covariates that are predictive of the outcome, S2FEVPOS. PROC CORR, with the COV
option, computes the covariance matrix of WEIGHT0 and CIGSA0. This is:
        | 205.4393000    14.4896962 |
    A = |                           |
        |  14.4896962   193.0993948 |
Then PROC IML is invoked to find the matrix of eigenvectors of this covariance matrix:
        | 0.8341993   -0.551463  |
    P = |                        |
        | 0.551463     0.8341993 |
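As a quick check on P, the following PROC IML sketch recomputes the eigenvectors from the covariance matrix A and confirms that P` * A * P is diagonal. It is a minimal illustration and not part of the original program: the names a, lambda, d, and ident are chosen here for convenience, and in general the signs of the eigenvector columns returned by EIGEN are arbitrary, so they need not match the signs shown above.

```sas
proc iml ;
   /* Covariance matrix of WEIGHT0 and CIGSA0, copied from the PROC CORR output */
   a = {205.4393000  14.4896962,
         14.4896962 193.0993948} ;

   /* lambda = column vector of eigenvalues, p = matrix whose columns are eigenvectors */
   call eigen(lambda, p, a) ;

   /* For a symmetric matrix, p` * a * p should be diagonal,
      with the eigenvalues on the diagonal (up to rounding error) */
   d = p` * a * p ;

   /* The eigenvectors are orthonormal, so p` * p should be the 2 x 2 identity */
   ident = p` * p ;

   print lambda p, d, ident ;
quit ;
```

The off-diagonal entries of d are the covariances between the two new variables, which is why a diagonal d matters here.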
Next, two new variables V and W are defined by the matrix multiplication:
    | V |        | WEIGHT0 |
    |   | = P` * |         |
    | W |        | CIGSA0  |
(Note that P` is the transpose of P.)
This means that V and W are each a linear combination of WEIGHT0 and CIGSA0:
    V =  0.8341993 * WEIGHT0 + 0.551463  * CIGSA0, and
    W = -0.551463  * WEIGHT0 + 0.8341993 * CIGSA0.
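As a small worked example, the data step below applies these two formulas to a single subject with WEIGHT0 = 72.6 and CIGSA0 = 20 (observation 1 in the PROC PRINT listing further down). This is only a sketch: the data set name check is hypothetical, and the coefficient 0.5514631 is taken from the PROC IML output rather than the rounded value 0.551463 used above.

```sas
data check ;
   /* Values for observation 1 in the PROC PRINT listing */
   weight0 = 72.6 ;
   cigsa0  = 20 ;

   /* Coefficients are the columns of P from the PROC IML output */
   v =  0.8341993 * weight0 + 0.5514631 * cigsa0 ;
   w = -0.5514631 * weight0 + 0.8341993 * cigsa0 ;
run ;

proc print data = check ;
   var weight0 cigsa0 v w ;
run ;
```

The printed v and w should agree, after rounding, with the values 71.592 and -23.3522 shown for observation 1 in the listing.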
Next, PROC CORR is invoked again. It shows that the correlation of V and W is zero
(printed as -0.00000), so cov(V, W) = 0, as expected.
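The reason a zero is expected can be written out briefly. In the derivation below, X denotes the column vector (WEIGHT0, CIGSA0)`, and lambda_1 and lambda_2 denote the two eigenvalues of A. This notation is introduced here and does not appear elsewhere in the notes. The derivation uses only two facts: A P = P diag(lambda_1, lambda_2), and P` P = I, because the eigenvectors of a symmetric matrix can be chosen orthonormal.

```latex
\begin{aligned}
\operatorname{Cov}\begin{pmatrix} V \\ W \end{pmatrix}
  &= \operatorname{Cov}\left(P^{\top} X\right)
   = P^{\top} A \, P \\
  &= P^{\top} P \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
   = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
\end{aligned}
```

So var(V) and var(W) are the two eigenvalues (215.01799 and 183.52071 in the PROC IML output), and the off-diagonal term cov(V, W) is zero.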
Then a series of regressions is run using PROC REG, with the following models
and results:
                                            Regression Sum of Squares
                                            -------------------------
  Model 1:  S2FEVPOS = WEIGHT0                      47.50824
  Model 2:  S2FEVPOS = CIGSA0                        3.79839
  Model 3:  S2FEVPOS = WEIGHT0 + CIGSA0             49.61468
  Model 4:  S2FEVPOS = V                            44.07377
  Model 5:  S2FEVPOS = W                             5.54091
  Model 6:  S2FEVPOS = V + W                        49.61468
(Note that an intercept term is assumed in each regression.)
The important thing to notice is that the regression sums of squares for
the models with WEIGHT0 and CIGSA0 separately do not add up to the regression
sum of squares for the WEIGHT0 + CIGSA0 model. However, the regression sums of
squares for V and W separately DO add up to the RegSS for the V + W model.
This is because V and W are uncorrelated in the sample, while WEIGHT0 and CIGSA0 are not.
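The arithmetic behind this comparison can be checked with a few lines of PROC IML. The sketch below only adds up numbers copied from the PROC REG and PROC CORR output in these notes. It does not refit any model, and all variable names in it are chosen here for illustration.

```sas
proc iml ;
   /* All numbers are copied from the PROC REG and PROC CORR output in these notes */
   ss_total    = 174.65241 ;            /* corrected total SS for S2FEVPOS  */
   ss_original = {47.50824 3.79839} ;   /* Models 1 and 2: WEIGHT0, CIGSA0  */
   ss_rotated  = {44.07377 5.54091} ;   /* Models 4 and 5: V, W             */
   ss_full     = 49.61468 ;             /* Models 3 and 6                   */

   sum_original = sum(ss_original) ;    /* 51.30663, larger than ss_full    */
   sum_rotated  = sum(ss_rotated) ;     /* 49.61468, equal to ss_full       */

   /* R-square of the full model, compared with the sum of the squared
      correlations of Y with V and with W */
   rsq_full      = ss_full / ss_total ;
   r_yv          = 0.50235 ;
   r_yw          = -0.17812 ;
   rsq_from_corr = r_yv##2 + r_yw##2 ;

   print sum_original sum_rotated ss_full, rsq_full rsq_from_corr ;
quit ;
```

Both rsq_full and rsq_from_corr should come out near 0.284, the R-Square that PROC REG reports for the two-variable models. With uncorrelated predictors the squared correlations with the outcome simply add, which is the same statement as the additivity of the regression sums of squares.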
========================================================================
FILENAME GRAPH 'gsas.grf' ;
OPTIONS LINESIZE = 120 ;
GOPTIONS
RESET = GLOBAL ROTATE = LANDSCAPE FTEXT = SWISSB
DEVICE = PS300 GACCESS = SASGASTD GSFNAME = GRAPH
GSFMODE = REPLACE GUNIT = PCT BORDER
CBACK = WHITE HTITLE = 2 HTEXT = 1 ;
*===================================================================== ;
footnote "program: ~john-c/5421/lhsreg.sas &sysdate &systime" ;
DATA lhs ;
infile '/home/gnome/john-c/5421/lhs.data' ;
INPUT CASENUM AGE GENDER BASECIGS GROUP RANDDATE DEADDATE DEADCODE
BODYMASS F31MSTAT
VPCQUIT1 VPCQUIT2 VPCQUIT3 VPCQUIT4 VPCQUIT5
CIGSA0 CIGSA1 CIGSA2 CIGSA3 CIGSA4 CIGSA5
S1MFEV S2FEVPRE A1FEVPRE A2FEVPRE A3FEVPRE A4FEVPRE A5FEVPRE
S2FEVPOS A1FEVPOS A2FEVPOS A3FEVPOS A4FEVPOS A5FEVPOS
WEIGHT0 WEIGHT1 WEIGHT2 WEIGHT3 WEIGHT4 WEIGHT5 ;
RUN ;
*======================================================================;
proc rank data = lhs groups = 5 out = lhs ;
var bodymass ;
ranks bodymassquint ;
run ;
*===================================================================== ;
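* PROC RANK numbers the groups 0 to 4, so shift them to 1 to 5 ;
data lhs ; set lhs ;
bodymassquint = bodymassquint + 1 ;
genderbmi = gender * bodymass ;
run ;
* Keep only records with all three analysis variables present ;
data lhsnonmiss ; set lhs ;
where s2fevpos ne . and weight0 ne . and cigsa0 ne . ;
run ;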
*===================================================================== ;
proc means data = lhsnonmiss n mean var stddev ;
var s2fevpos weight0 cigsa0 ;
run ;
proc corr data = lhsnonmiss cov ;
var weight0 cigsa0 ;
run ;
*===================================================================== ;
proc reg data = lhsnonmiss ;
model s2fevpos = weight0 ;
run ;
proc reg data = lhsnonmiss ;
model s2fevpos = cigsa0 ;
run ;
proc reg data = lhsnonmiss ;
model s2fevpos = weight0 cigsa0 ;
run ;
*===================================================================== ;
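proc iml ;
file 'lhsreg.out' ;
/* Read the two covariates into X (n x 2) and the outcome into Y (n x 1) */
use lhsnonmiss ;
read all var {weight0 cigsa0} into X ;
read all var {s2fevpos} into Y ;
/* Covariance matrix of weight0 and cigsa0, copied from the PROC CORR output */
covar = {205.4393 14.4896962, 14.4896962 193.0993948} ;
/* e = eigenvalues of covar, p = matrix whose columns are the eigenvectors */
call eigen(e, p, covar) ;
print "cov(weight0, cigsa0), E, P:" covar e p ;
/* Each row of u holds (v, w) = (weight0, cigsa0) * p for one subject */
u = x * p ;
yu = y || x || u ;
varnames = {'y' 'weight0' 'cigsa0' 'v' 'w'} ;
/* Write y, the original covariates, and v, w to the data set ortho */
create ortho from yu [colname = varnames] ;
append from yu ;
quit ;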
*===================================================================== ;
data ortho ;
retain count 0 ;
set ortho ;
count = count + 1 ;
run ;
proc print data = ortho ; where count le 50 ;
var y weight0 cigsa0 v w ;
run ;
*===================================================================== ;
proc corr data = ortho ;
var y weight0 cigsa0 v w ;
run ;
*===================================================================== ;
proc reg data = ortho ;
model y = v ;
run ;
proc reg data = ortho ;
model y = w ;
run ;
proc reg data = ortho ;
model y = v w ;
run ;
endsas ;
================================================================================================================
16:17 Friday, November 27, 2009 1
The MEANS Procedure
Variable N Mean Variance Std Dev
---------------------------------------------------------------
S2FEVPOS 500 2.5926800 0.3500048 0.5916121
WEIGHT0 500 74.5816000 205.4393000 14.3331539
CIGSA0 500 33.7020000 193.0993948 13.8960208
---------------------------------------------------------------
program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
16:17 Friday, November 27, 2009 2
The CORR Procedure
2 Variables: WEIGHT0 CIGSA0
Covariance Matrix, DF = 499
WEIGHT0 CIGSA0
WEIGHT0 205.4393000 14.4896962
CIGSA0 14.4896962 193.0993948
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
WEIGHT0 500 74.58160 14.33315 37291 40.00000 130.00000
CIGSA0 500 33.70200 13.89602 16851 9.00000 100.00000
Pearson Correlation Coefficients, N = 500
Prob > |r| under H0: Rho=0
               WEIGHT0      CIGSA0
WEIGHT0        1.00000     0.07275
                            0.1042
CIGSA0         0.07275     1.00000
                0.1042
================================================================================================================
The REG Procedure
Model: MODEL1
Dependent Variable: S2FEVPOS
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 47.50824 47.50824 186.08 <.0001
Error 498 127.14417 0.25531
Corrected Total 499 174.65241
Root MSE 0.50528 R-Square 0.2720
Dependent Mean 2.59268 Adj R-Sq 0.2706
Coeff Var 19.48878
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.98713 0.11985 8.24 <.0001
WEIGHT0 1 0.02153 0.00158 13.64 <.0001
================================================================================================================
The REG Procedure
Model: MODEL1
Dependent Variable: S2FEVPOS
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 3.79839 3.79839 11.07 0.0009
Error 498 170.85402 0.34308
Corrected Total 499 174.65241
Root MSE 0.58573 R-Square 0.0217
Dependent Mean 2.59268 Adj R-Sq 0.0198
Coeff Var 22.59171
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 2.38108 0.06878 34.62 <.0001
CIGSA0 1 0.00628 0.00189 3.33 0.0009
================================================================================================================
The REG Procedure
Model: MODEL1
Dependent Variable: S2FEVPOS
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 49.61468 24.80734 98.60 <.0001
Error 497 125.03773 0.25158
Corrected Total 499 174.65241
Root MSE 0.50158 R-Square 0.2841
Dependent Mean 2.59268 Adj R-Sq 0.2812
Coeff Var 19.34610
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.85379 0.12758 6.69 <.0001
WEIGHT0 1 0.02120 0.00157 13.49 <.0001
CIGSA0 1 0.00469 0.00162 2.89 0.0040
================================================================================================================
                                           COVAR             E                     P
cov(weight0, cigsa0), E, P:     205.4393   14.489696   215.01799   0.8341993   -0.551463
                               14.489696   193.09939   183.52071   0.5514631   0.8341993
================================================================================================================
Obs y weight0 cigsa0 v w
1 2.37 72.6 20 71.592 -23.3522
2 2.98 77.6 60 97.822 7.2584
3 3.28 97.5 40 103.393 -20.3997
4 2.76 64.9 40 76.198 -2.4220
5 2.76 77.1 35 83.618 -13.3208
6 2.27 63.5 40 75.030 -1.6499
7 2.80 88.4 30 90.287 -23.7234
8 2.67 79.4 40 88.294 -10.4182
9 1.78 54.4 40 67.439 3.3684
10 2.94 104.3 35 106.308 -28.3206
11 2.97 102.3 30 101.882 -31.3887
12 2.22 61.4 30 67.764 -8.8339
13 2.90 77.3 30 81.027 -17.6021
14 3.31 77.3 40 86.542 -9.2601
15 2.36 95.5 40 101.725 -19.2968
16 2.88 90.9 30 92.373 -25.1020
17 2.92 78.2 35 84.536 -13.9274
18 3.68 70.5 30 75.355 -13.8522
19 3.00 68.0 40 78.784 -4.1315
20 2.58 81.8 30 84.781 -20.0837
21 2.52 93.2 20 88.777 -34.7124
22 2.28 77.3 45 89.299 -5.0891
23 2.60 75.0 20 73.594 -24.6757
24 1.49 77.2 20 75.429 -25.8890
25 1.64 97.9 30 98.212 -28.9623
26 2.26 85.0 30 87.451 -21.8484
27 2.22 81.8 20 79.267 -28.4257
28 2.31 90.9 30 92.373 -25.1020
29 2.16 66.8 23 68.408 -17.6512
30 2.99 90.9 30 92.373 -25.1020
31 3.24 61.4 40 73.278 -0.4919
32 2.93 78.6 60 98.656 6.7070
33 2.47 59.0 40 71.276 0.8316
34 3.32 70.5 25 72.598 -18.0232
35 2.71 61.4 45 76.036 3.6791
36 1.78 56.4 40 69.107 2.2655
37 2.65 69.5 40 80.035 -4.9587
38 2.97 55.9 20 57.661 -14.1428
39 3.33 94.5 40 100.890 -18.7453
40 3.34 105.0 15 95.863 -45.3906
41 2.91 67.3 30 72.686 -12.0875
42 1.66 59.1 20 60.330 -15.9075
43 2.88 79.5 20 77.348 -27.1573
44 2.25 65.9 30 71.518 -11.3154
45 2.78 89.5 30 91.205 -24.3300
46 2.80 68.2 20 67.922 -20.9258
47 2.82 64.5 60 86.894 14.4826
48 2.96 90.0 40 97.136 -16.2637
49 3.03 82.3 20 79.684 -28.7014
50 2.98 82.7 40 91.047 -12.2380
================================================================================================================
The CORR Procedure
5 Variables: y weight0 cigsa0 v w
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
y 500 2.59268 0.59161 1296 1.20000 4.33000
weight0 500 74.58160 14.33315 37291 40.00000 130.00000
cigsa0 500 33.70200 13.89602 16851 9.00000 100.00000
v 500 80.80133 14.66349 40401 44.39723 141.53369
w 500 -13.01482 13.54698 -6507 -51.54690 38.31025
Pearson Correlation Coefficients, N = 500
Prob > |r| under H0: Rho=0
                   y     weight0      cigsa0           v           w
y            1.00000     0.52155     0.14747     0.50235    -0.17812
                          <.0001      0.0009      <.0001      <.0001
weight0      0.52155     1.00000     0.07275     0.85343    -0.52122
              <.0001                  0.1042      <.0001      <.0001
cigsa0       0.14747     0.07275     1.00000     0.58192     0.81325
              0.0009      0.1042                  <.0001      <.0001
v            0.50235     0.85343     0.58192     1.00000    -0.00000
              <.0001      <.0001      <.0001                  1.0000
w           -0.17812    -0.52122     0.81325    -0.00000     1.00000
              <.0001      <.0001      <.0001      1.0000
================================================================================================================
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 44.07377 44.07377 168.09 <.0001
Error 498 130.57863 0.26221
Corrected Total 499 174.65241
Root MSE 0.51206 R-Square 0.2524
Dependent Mean 2.59268 Adj R-Sq 0.2509
Coeff Var 19.75024
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.95503 0.12837 7.44 <.0001
v 1 0.02027 0.00156 12.96 <.0001
================================================================================================================
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 5.54091 5.54091 16.32 <.0001
Error 498 169.11150 0.33958
Corrected Total 499 174.65241
Root MSE 0.58274 R-Square 0.0317
Dependent Mean 2.59268 Adj R-Sq 0.0298
Coeff Var 22.47620
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 2.49144 0.03616 68.91 <.0001
w 1 -0.00778 0.00193 -4.04 <.0001
================================================================================================================
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 49.61468 24.80734 98.60 <.0001
Error 497 125.03773 0.25158
Corrected Total 499 174.65241
Root MSE 0.50158 R-Square 0.2841
Dependent Mean 2.59268 Adj R-Sq 0.2812
Coeff Var 19.34610
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.85379 0.12758 6.69 <.0001
v 1 0.02027 0.00153 13.24 <.0001
w 1 -0.00778 0.00166 -4.69 <.0001
----------------------------------------------------------------------------------
/home/walleye/john-c/5421/notes.021a Last update: November 11, 2011.