APPLICATIONS OF SAS IML:                                   SPH 5421 notes.021a

EIGENVECTORS AND EIGENVALUES

CREATING UNCORRELATED VARIABLES ...
----------------------------------------------------------------------------------

Assume R and S are random variables with covariance matrix:

           |   var(R)      cov(R, S) |
      A  = |                         |
           | cov(R, S)      var(S)   |

In general, cov(R, S) is not zero; that is, the two variables are usually
correlated.

If R and S are observed covariates (potential predictors) of an outcome
variable Y, their correlation complicates inference about their separate
effects in a regression of Y on R and S.

It is, however, possible to use the theory and computational methods for
eigenvectors to replace R and S with new variables V and W that are linear
combinations of R and S, are uncorrelated with each other, and give exactly
equivalent regression results.

The following program illustrates this.  Data from 'lhs.data' (see notes.010)
are analyzed.  The variables of interest are S2FEVPOS, WEIGHT0, and CIGSA0.
Both WEIGHT0 and CIGSA0 are covariates that are predictive of the outcome,
S2FEVPOS.

PROC CORR computes the covariance matrix of WEIGHT0 and CIGSA0:

           | 205.4393000    14.4896962 |
      A  = |                           |
           |  14.4896962   193.0993948 |

Then PROC IML is invoked to find the matrix of eigenvectors of this covariance
matrix:

           | 0.8341993   -0.551463 |
      P  = |                       |
           | 0.551463    0.8341993 |

Next, two new variables V and W are defined by the matrix multiplication:

      | V |          | WEIGHT0 |
      |   |  = P` *  |         |
      | W |          | CIGSA0  |

(Note that P` is the transpose of P.)  That is, V and W are linear
combinations of WEIGHT0 and CIGSA0:

      V =  0.8341993 * WEIGHT0 + 0.551463  * CIGSA0,   and
      W = -0.551463  * WEIGHT0 + 0.8341993 * CIGSA0.

Next, PROC CORR is invoked again, indicating cov(V, W) = 0, as expected: the
covariance matrix of (V, W) is P` * A * P, which is diagonal because the
columns of P are orthonormal eigenvectors of A.

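The diagonalization step can be checked on its own, without the data set.
What follows is a minimal sketch, not part of the original program: it
assumes only the covariance matrix printed by PROC CORR, and the names A, e,
P, D, x1 and u1 are illustrative.  If it behaves as intended, the
off-diagonal entries of D should be zero up to rounding, the diagonal of D
should match the eigenvalues in e, and u1 should hold the (V, W) pair for
a subject with WEIGHT0 = 72.6 and CIGSA0 = 20.

```sas
proc iml ;
   /* Sample covariance matrix of WEIGHT0 and CIGSA0, copied from PROC CORR */
   A = {205.4393    14.4896962,
        14.4896962 193.0993948} ;

   /* e = column of eigenvalues, P = matrix whose columns are eigenvectors */
   call eigen(e, P, A) ;

   /* Covariance matrix of (V, W) implied by the transformation: P` * A * P */
   D = P` * A * P ;

   /* Illustrative single observation: WEIGHT0 = 72.6, CIGSA0 = 20 */
   x1 = {72.6 20} ;
   u1 = x1 * P ;          /* row vector (V, W), same as P` applied to x1` */

   print e, D, u1 ;
quit ;
```

The sign of each eigenvector is arbitrary, so a different eigenvalue routine
could return V or W with the opposite sign; the zero covariance and the
regression sums of squares are unaffected by such a sign change.
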
Then a series of regressions is run using PROC REG, with the following models
and results:

                                             Regression Sum of Squares
                                             -------------------------
    Model 1:  S2FEVPOS = WEIGHT0                     47.50824
    Model 2:  S2FEVPOS = CIGSA0                       3.79839
    Model 3:  S2FEVPOS = WEIGHT0 + CIGSA0            49.61468

    Model 4:  S2FEVPOS = V                           44.07377
    Model 5:  S2FEVPOS = W                            5.54091
    Model 6:  S2FEVPOS = V + W                       49.61468

    (Note that an intercept term is assumed in each regression.)

The important point is that the regression sums of squares for the models
with WEIGHT0 and CIGSA0 separately do not add up to the regression sum of
squares for the WEIGHT0 + CIGSA0 model (47.50824 + 3.79839 = 51.30663, not
49.61468).  However, the regression sums of squares for V and W separately DO
add up to the RegSS for the V + W model (44.07377 + 5.54091 = 49.61468).
This is because V and W are uncorrelated.  Models 3 and 6 have the same
regression sum of squares because V and W span the same space as WEIGHT0 and
CIGSA0.

========================================================================

FILENAME GRAPH 'gsas.grf' ;
OPTIONS  LINESIZE = 120 ;

GOPTIONS
         RESET = GLOBAL
         ROTATE = LANDSCAPE
         FTEXT = SWISSB
         DEVICE = PS300
         GACCESS = SASGASTD
         GSFNAME = GRAPH
         GSFMODE = REPLACE
         GUNIT = PCT BORDER
         CBACK = WHITE
         HTITLE = 2
         HTEXT = 1 ;

*===================================================================== ;

footnote "program: ~john-c/5421/lhsreg.sas &sysdate &systime" ;

DATA lhs ;
     infile '/home/gnome/john-c/5421/lhs.data' ;

     INPUT CASENUM  AGE      GENDER   BASECIGS GROUP    RANDDATE DEADDATE
           DEADCODE BODYMASS F31MSTAT
           VPCQUIT1 VPCQUIT2 VPCQUIT3 VPCQUIT4 VPCQUIT5
           CIGSA0   CIGSA1   CIGSA2   CIGSA3   CIGSA4   CIGSA5
           S1MFEV
           S2FEVPRE A1FEVPRE A2FEVPRE A3FEVPRE A4FEVPRE A5FEVPRE
           S2FEVPOS A1FEVPOS A2FEVPOS A3FEVPOS A4FEVPOS A5FEVPOS
           WEIGHT0  WEIGHT1  WEIGHT2  WEIGHT3  WEIGHT4  WEIGHT5 ;
RUN ;

*======================================================================;

proc rank data = lhs groups = 5 out = lhs ;
     var bodymass ;
     ranks bodymassquint ;
run ;

*===================================================================== ;

data lhs ;
     set lhs ;

     bodymassquint = bodymassquint + 1 ;
     genderbmi = gender * bodymass ;

/* Keep only subjects with all three analysis variables present. */
data lhsnonmiss ;
     set lhs ;
     where s2fevpos ne . and weight0 ne . and cigsa0 ne . ;
run ;

*===================================================================== ;

proc means data = lhsnonmiss n mean var stddev ;
     var s2fevpos weight0 cigsa0 ;
run ;

/* The COV option prints the covariance matrix used in PROC IML below. */
proc corr data = lhsnonmiss cov ;
     var weight0 cigsa0 ;
run ;

*===================================================================== ;

proc reg data = lhsnonmiss ;
     model s2fevpos = weight0 ;
run ;

proc reg data = lhsnonmiss ;
     model s2fevpos = cigsa0 ;
run ;

proc reg data = lhsnonmiss ;
     model s2fevpos = weight0 cigsa0 ;
run ;

*===================================================================== ;

proc iml ;

     file 'lhsreg.out' ;

     use lhsnonmiss ;
     read all var {weight0 cigsa0} into X ;
     read all var {s2fevpos} into Y ;

     /* Covariance matrix typed in from the PROC CORR output above. */
     covar = {205.4393 14.4896962, 14.4896962 193.0993948} ;

     /* e = eigenvalues; the columns of p are the eigenvectors. */
     call eigen(e, p, covar) ;

     print "cov(weight0, cigsa0), E, P:" covar e p ;

     /* Each row of u is (v, w); x * p is the row form of P` applied */
     /* to the column vector (weight0, cigsa0).                      */
     u = x * p ;

     yu = y || x || u ;

     varnames = {'y' 'weight0' 'cigsa0' 'v' 'w'} ;
     create ortho from yu [colname = varnames] ;
     append from yu ;

quit ;

*===================================================================== ;

data ortho ;
     retain count 0 ;
     set ortho ;
     count = count + 1 ;
run ;

proc print data = ortho ;
     where count le 50 ;
     var y weight0 cigsa0 v w ;

*===================================================================== ;

proc corr data = ortho ;
     var y weight0 cigsa0 v w ;
run ;

*===================================================================== ;

proc reg data = ortho ;
     model y = v ;
run ;

proc reg data = ortho ;
     model y = w ;
run ;

proc reg data = ortho ;
     model y = v w ;
run ;

endsas ;

================================================================================================================
                                                                    16:17 Friday, November 27, 2009   1

The MEANS Procedure

    Variable      N            Mean        Variance         Std Dev
    ---------------------------------------------------------------
    S2FEVPOS    500       2.5926800       0.3500048       0.5916121
    WEIGHT0     500      74.5816000     205.4393000      14.3331539
    CIGSA0      500      33.7020000     193.0993948      13.8960208
    ---------------------------------------------------------------

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009   2

The CORR Procedure

    2  Variables:    WEIGHT0  CIGSA0

              Covariance Matrix, DF = 499

                      WEIGHT0            CIGSA0

    WEIGHT0       205.4393000        14.4896962
    CIGSA0         14.4896962       193.0993948

                                  Simple Statistics

    Variable      N        Mean     Std Dev         Sum     Minimum     Maximum

    WEIGHT0     500    74.58160    14.33315       37291    40.00000   130.00000
    CIGSA0      500    33.70200    13.89602       16851     9.00000   100.00000

          Pearson Correlation Coefficients, N = 500
                 Prob > |r| under H0: Rho=0

                      WEIGHT0        CIGSA0

    WEIGHT0           1.00000       0.07275
                                     0.1042

    CIGSA0            0.07275       1.00000
                       0.1042

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009   3

The REG Procedure
Model: MODEL1
Dependent Variable: S2FEVPOS

                              Analysis of Variance

                                    Sum of          Mean
    Source              DF         Squares        Square    F Value    Pr > F

    Model                1        47.50824      47.50824     186.08    <.0001
    Error              498       127.14417       0.25531
    Corrected Total    499       174.65241

    Root MSE            0.50528    R-Square    0.2720
    Dependent Mean      2.59268    Adj R-Sq    0.2706
    Coeff Var          19.48878

                              Parameter Estimates

                        Parameter      Standard
    Variable     DF      Estimate         Error    t Value    Pr > |t|

    Intercept     1       0.98713       0.11985       8.24      <.0001
    WEIGHT0       1       0.02153       0.00158      13.64      <.0001

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009   4

The REG Procedure
Model: MODEL1
Dependent Variable: S2FEVPOS

                              Analysis of Variance

                                    Sum of          Mean
    Source              DF         Squares        Square    F Value    Pr > F

    Model                1         3.79839       3.79839      11.07    0.0009
    Error              498       170.85402       0.34308
    Corrected Total    499       174.65241

    Root MSE            0.58573    R-Square    0.0217
    Dependent Mean      2.59268    Adj R-Sq    0.0198
    Coeff Var          22.59171

                              Parameter Estimates

                        Parameter      Standard
    Variable     DF      Estimate         Error    t Value    Pr > |t|

    Intercept     1       2.38108       0.06878      34.62      <.0001
    CIGSA0        1       0.00628       0.00189       3.33      0.0009

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009   5

The REG Procedure
Model: MODEL1
Dependent Variable: S2FEVPOS

                              Analysis of Variance

                                    Sum of          Mean
    Source              DF         Squares        Square    F Value    Pr > F

    Model                2        49.61468      24.80734      98.60    <.0001
    Error              497       125.03773       0.25158
    Corrected Total    499       174.65241

    Root MSE            0.50158    R-Square    0.2841
    Dependent Mean      2.59268    Adj R-Sq    0.2812
    Coeff Var          19.34610

                              Parameter Estimates

                        Parameter      Standard
    Variable     DF      Estimate         Error    t Value    Pr > |t|

    Intercept     1       0.85379       0.12758       6.69      <.0001
    WEIGHT0       1       0.02120       0.00157      13.49      <.0001
    CIGSA0        1       0.00469       0.00162       2.89      0.0040

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009   6

                                        COVAR                  E            P

    cov(weight0, cigsa0), E, P:    205.4393  14.489696    215.01799    0.8341993  -0.551463
                                  14.489696  193.09939    183.52071    0.5514631  0.8341993

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009   7

    Obs      y      weight0    cigsa0       v            w

      1     2.37      72.6       20       71.592     -23.3522
      2     2.98      77.6       60       97.822       7.2584
      3     3.28      97.5       40      103.393     -20.3997
      4     2.76      64.9       40       76.198      -2.4220
      5     2.76      77.1       35       83.618     -13.3208
      6     2.27      63.5       40       75.030      -1.6499
      7     2.80      88.4       30       90.287     -23.7234
      8     2.67      79.4       40       88.294     -10.4182
      9     1.78      54.4       40       67.439       3.3684
     10     2.94     104.3       35      106.308     -28.3206
     11     2.97     102.3       30      101.882     -31.3887
     12     2.22      61.4       30       67.764      -8.8339
     13     2.90      77.3       30       81.027     -17.6021
     14     3.31      77.3       40       86.542      -9.2601
     15     2.36      95.5       40      101.725     -19.2968
     16     2.88      90.9       30       92.373     -25.1020
     17     2.92      78.2       35       84.536     -13.9274
     18     3.68      70.5       30       75.355     -13.8522
     19     3.00      68.0       40       78.784      -4.1315
     20     2.58      81.8       30       84.781     -20.0837
     21     2.52      93.2       20       88.777     -34.7124
     22     2.28      77.3       45       89.299      -5.0891
     23     2.60      75.0       20       73.594     -24.6757
     24     1.49      77.2       20       75.429     -25.8890
     25     1.64      97.9       30       98.212     -28.9623
     26     2.26      85.0       30       87.451     -21.8484
     27     2.22      81.8       20       79.267     -28.4257
     28     2.31      90.9       30       92.373     -25.1020
     29     2.16      66.8       23       68.408     -17.6512
     30     2.99      90.9       30       92.373     -25.1020
     31     3.24      61.4       40       73.278      -0.4919
     32     2.93      78.6       60       98.656       6.7070
     33     2.47      59.0       40       71.276       0.8316
     34     3.32      70.5       25       72.598     -18.0232
     35     2.71      61.4       45       76.036       3.6791
     36     1.78      56.4       40       69.107       2.2655
     37     2.65      69.5       40       80.035      -4.9587
     38     2.97      55.9       20       57.661     -14.1428
     39     3.33      94.5       40      100.890     -18.7453
     40     3.34     105.0       15       95.863     -45.3906
     41     2.91      67.3       30       72.686     -12.0875
     42     1.66      59.1       20       60.330     -15.9075
     43     2.88      79.5       20       77.348     -27.1573
     44     2.25      65.9       30       71.518     -11.3154
     45     2.78      89.5       30       91.205     -24.3300
     46     2.80      68.2       20       67.922     -20.9258
     47     2.82      64.5       60       86.894      14.4826
     48     2.96      90.0       40       97.136     -16.2637
     49     3.03      82.3       20       79.684     -28.7014
     50     2.98      82.7       40       91.047     -12.2380

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009   8

The CORR Procedure

    5  Variables:    y  weight0  cigsa0  v  w

                                  Simple Statistics

    Variable      N        Mean     Std Dev         Sum     Minimum     Maximum

    y           500     2.59268     0.59161        1296     1.20000     4.33000
    weight0     500    74.58160    14.33315       37291    40.00000   130.00000
    cigsa0      500    33.70200    13.89602       16851     9.00000   100.00000
    v           500    80.80133    14.66349       40401    44.39723   141.53369
    w           500   -13.01482    13.54698       -6507   -51.54690    38.31025

                    Pearson Correlation Coefficients, N = 500
                           Prob > |r| under H0: Rho=0

                      y       weight0        cigsa0             v             w

    y           1.00000       0.52155       0.14747       0.50235      -0.17812
                               <.0001        0.0009        <.0001        <.0001

    weight0     0.52155       1.00000       0.07275       0.85343      -0.52122
                 <.0001                      0.1042        <.0001        <.0001

    cigsa0      0.14747       0.07275       1.00000       0.58192       0.81325
                 0.0009        0.1042                      <.0001        <.0001

    v           0.50235       0.85343       0.58192       1.00000      -0.00000
                 <.0001        <.0001        <.0001                      1.0000

    w          -0.17812      -0.52122       0.81325      -0.00000       1.00000
                 <.0001        <.0001        <.0001        1.0000

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009   9

The REG Procedure
Model: MODEL1
Dependent Variable: y

                              Analysis of Variance

                                    Sum of          Mean
    Source              DF         Squares        Square    F Value    Pr > F

    Model                1        44.07377      44.07377     168.09    <.0001
    Error              498       130.57863       0.26221
    Corrected Total    499       174.65241

    Root MSE            0.51206    R-Square    0.2524
    Dependent Mean      2.59268    Adj R-Sq    0.2509
    Coeff Var          19.75024

                              Parameter Estimates

                        Parameter      Standard
    Variable     DF      Estimate         Error    t Value    Pr > |t|

    Intercept     1       0.95503       0.12837       7.44      <.0001
    v             1       0.02027       0.00156      12.96      <.0001

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009  10

The REG Procedure
Model: MODEL1
Dependent Variable: y

                              Analysis of Variance

                                    Sum of          Mean
    Source              DF         Squares        Square    F Value    Pr > F

    Model                1         5.54091       5.54091      16.32    <.0001
    Error              498       169.11150       0.33958
    Corrected Total    499       174.65241

    Root MSE            0.58274    R-Square    0.0317
    Dependent Mean      2.59268    Adj R-Sq    0.0298
    Coeff Var          22.47620

                              Parameter Estimates

                        Parameter      Standard
    Variable     DF      Estimate         Error    t Value    Pr > |t|

    Intercept     1       2.49144       0.03616      68.91      <.0001
    w             1      -0.00778       0.00193      -4.04      <.0001

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================
                                                                    16:17 Friday, November 27, 2009  11

The REG Procedure
Model: MODEL1
Dependent Variable: y

                              Analysis of Variance

                                    Sum of          Mean
    Source              DF         Squares        Square    F Value    Pr > F

    Model                2        49.61468      24.80734      98.60    <.0001
    Error              497       125.03773       0.25158
    Corrected Total    499       174.65241

    Root MSE            0.50158    R-Square    0.2841
    Dependent Mean      2.59268    Adj R-Sq    0.2812
    Coeff Var          19.34610

                              Parameter Estimates

                        Parameter      Standard
    Variable     DF      Estimate         Error    t Value    Pr > |t|

    Intercept     1       0.85379       0.12758       6.69      <.0001
    v             1       0.02027       0.00153      13.24      <.0001
    w             1      -0.00778       0.00166      -4.69      <.0001

                    program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
----------------------------------------------------------------------------------
/home/walleye/john-c/5421/notes.021a  Last update: November 11, 2011.