APPLICATIONS OF SAS IML:                                 SPH 5421 notes.021a

EIGENVECTORS AND EIGENVALUES

CREATING UNCORRELATED VARIABLES ...
----------------------------------------------------------------------------------

Assume R and S are random variables with covariance matrix:


            | var(R)     cov(R, S) |
       A =  |                      |
            | cov(R, S)     var(S) |

In general, cov(R, S) is not zero; that is, the two variables are usually correlated.
If R and S are observed covariates (potential predictors) of an outcome variable Y, their
correlation complicates inference about their separate effects in a regression of Y on
R and S.

It is, however, possible to use the theory and computational methods for eigenvectors to
replace R and S by new variables V and W that are linear combinations of R and S, are
uncorrelated with each other, and give exactly the same fit (the same regression sum of
squares) when both are used as predictors.  The following program illustrates this.
Here, data from the file 'lhs.data' (see notes.010) are analyzed.  The variables of
interest are S2FEVPOS, WEIGHT0, and CIGSA0.  Both WEIGHT0 and CIGSA0 are covariates that
are predictive of the outcome, S2FEVPOS.  PROC CORR computes the sample covariance matrix
of WEIGHT0 and CIGSA0.  This is:

            | 205.439300   14.4896962 |
       A =  |                         |
            | 14.4896962  193.0993948 |
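
As a quick check on these numbers, the correlation implied by this covariance matrix
can be recomputed by hand as cov / sqrt(var1 * var2).  The sketch below is in Python
(standard library only) rather than SAS, and the function name is illustrative; it is
not part of the course program.

```python
import math

def correlation_from_cov(a):
    """Correlation implied by a 2x2 covariance matrix [[var1, cov], [cov, var2]]."""
    return a[0][1] / math.sqrt(a[0][0] * a[1][1])

# Covariance matrix of WEIGHT0 and CIGSA0 as reported by PROC CORR
A = [[205.4393, 14.4896962],
     [14.4896962, 193.0993948]]

r = correlation_from_cov(A)
print(round(r, 5))
```

The result should agree with the Pearson correlation of 0.07275 that PROC CORR reports
for WEIGHT0 and CIGSA0.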


Then PROC IML is invoked to find the eigenvalues of this covariance matrix (215.01799
and 183.52071) and the matrix P whose columns are the corresponding eigenvectors:

            | 0.8341993     -0.551463  |
       P =  |                          |
            |  0.551463      0.8341993 |
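
For a symmetric 2 x 2 matrix the eigenvalues and eigenvectors have a closed form, so the
IML result can be cross-checked without any matrix library.  The following is a minimal
Python sketch under that assumption; the function name is hypothetical, and the sign of
each eigenvector (which is arbitrary) is chosen here so that it lines up with the P shown
above.

```python
import math

def eigen_sym_2x2(a):
    """Eigenvalues (largest first) and orthonormal eigenvectors of a
    symmetric 2x2 matrix [[p, q], [q, r]] with q != 0.
    Column j of the returned matrix belongs to eigenvalue j."""
    p, q, r = a[0][0], a[0][1], a[1][1]
    mid = (p + r) / 2.0
    half_gap = math.sqrt(((p - r) / 2.0) ** 2 + q ** 2)
    lam1, lam2 = mid + half_gap, mid - half_gap
    # (lam1 - r, q) solves (A - lam1*I) x = 0; scale it to unit length.
    norm = math.hypot(lam1 - r, q)
    c, s = (lam1 - r) / norm, q / norm
    # The second eigenvector is the first one rotated by 90 degrees.
    vectors = [[c, -s],
               [s, c]]
    return [lam1, lam2], vectors

A = [[205.4393, 14.4896962],
     [14.4896962, 193.0993948]]

eigenvalues, P = eigen_sym_2x2(A)
print([round(lam, 5) for lam in eigenvalues])
for row in P:
    print([round(x, 7) for x in row])
```

Up to rounding, the eigenvalues and the entries of P should match the E and P columns
printed by PROC IML in the output further down.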

Next, two new variables V and W are defined by the matrix multiplication:

      | V |         | WEIGHT0 |
      |   |  = P` * |         |
      | W |         |  CIGSA0 |

(Note that P` is the transpose of P.)

That is, V and W are linear combinations of WEIGHT0 and CIGSA0:

        V =  0.8341993 * WEIGHT0 + 0.551463  * CIGSA0, and

        W = -0.551463  * WEIGHT0 + 0.8341993 * CIGSA0.
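
To see the transformation at work, the two formulas can be applied to the first three
(WEIGHT0, CIGSA0) pairs from the PROC PRINT listing in the output below.  This is a
small Python sketch, not part of the SAS program; the helper name to_vw is made up for
the illustration.

```python
# Eigenvector matrix P as printed by PROC IML (columns are eigenvectors)
P = [[0.8341993, -0.551463],
     [0.551463, 0.8341993]]

def to_vw(weight0, cigsa0):
    """Return (V, W) = P` * (WEIGHT0, CIGSA0) for one observation."""
    v = P[0][0] * weight0 + P[1][0] * cigsa0
    w = P[0][1] * weight0 + P[1][1] * cigsa0
    return v, w

# First three (WEIGHT0, CIGSA0) pairs from the PROC PRINT listing
first_obs = [(72.6, 20), (77.6, 60), (97.5, 40)]
for weight0, cigsa0 in first_obs:
    v, w = to_vw(weight0, cigsa0)
    print(f"{weight0:6.1f} {cigsa0:4d} {v:9.3f} {w:9.4f}")
```

The printed V and W values should agree with the v and w columns of observations 1 to 3
in the listing (for example, 71.592 and -23.3522 for the first observation).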

Next, PROC CORR is invoked again.  It reports a correlation of -0.00000 between V and W,
so cov(V, W) = 0 (up to rounding), as expected.
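
The zero covariance is not an accident of these data.  Because the columns of P are
orthonormal eigenvectors of A, the covariance matrix of (V, W) is P` * A * P, which is
diagonal with the eigenvalues of A on the diagonal.  A minimal Python sketch of that
check follows; it uses plain lists, and the helper names are illustrative.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

# Covariance matrix of (WEIGHT0, CIGSA0) and eigenvector matrix from PROC IML
A = [[205.4393, 14.4896962],
     [14.4896962, 193.0993948]]
P = [[0.8341993, -0.551463],
     [0.551463, 0.8341993]]

# Covariance matrix of (V, W): P` * A * P
cov_vw = matmul(matmul(transpose(P), A), P)
for row in cov_vw:
    print([round(value, 3) for value in row])
```

The off-diagonal entries come out as zero to within the rounding of P, and the diagonal
entries are the variances of V and W.  Their square roots should match the standard
deviations that PROC CORR reports for v and w (14.66349 and 13.54698).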

Then a series of regressions is run using PROC REG, with the following models
and results:

                                     Regression Sum of Squares
                                     -------------------------
Model 1:  S2FEVPOS = WEIGHT0                  47.50824

Model 2:  S2FEVPOS = CIGSA0                    3.79839

Model 3:  S2FEVPOS = WEIGHT0  + CIGSA0        49.61468

Model 4:  S2FEVPOS = V                        44.07377

Model 5:  S2FEVPOS = W                         5.54091

Model 6:  S2FEVPOS = V + W                    49.61468

(Note that an intercept term is included in each regression.)
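
Models 3 and 6 give the same regression sum of squares because (V, W) is an invertible
linear transformation of (WEIGHT0, CIGSA0), so the two models produce the same fitted
values.  Since (V, W) is P` applied to (WEIGHT0, CIGSA0), the slopes on WEIGHT0 and
CIGSA0 should equal P times the slopes on V and W.  The Python sketch below checks this
with the rounded estimates from the PROC REG output; the variable names are illustrative.

```python
# Eigenvector matrix P as printed by PROC IML (columns are eigenvectors)
P = [[0.8341993, -0.551463],
     [0.551463, 0.8341993]]

# Model 6 slope estimates for V and W, taken from the PROC REG output
b_v, b_w = 0.02027, -0.00778

# Slopes implied for WEIGHT0 and CIGSA0: P * (b_v, b_w)
b_weight0 = P[0][0] * b_v + P[0][1] * b_w
b_cigsa0 = P[1][0] * b_v + P[1][1] * b_w
print(round(b_weight0, 5), round(b_cigsa0, 5))
```

To the precision printed, these agree with the Model 3 estimates of 0.02120 for WEIGHT0
and 0.00469 for CIGSA0.  The intercept, 0.85379, is the same in both models.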

The important thing to notice here is that the regression sums of squares for
the models with WEIGHT0 and CIGSA0 separately do not add up to the regression
sum of squares for the WEIGHT0 + CIGSA0 model (47.50824 + 3.79839 = 51.30663, not
49.61468).  However, the regression sums of squares for V and W separately DO add
up to the RegSS for the V + W model (44.07377 + 5.54091 = 49.61468).  This is
because V and W are uncorrelated in the sample.
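
The additivity can be traced back to the correlations.  In a one-predictor regression
with an intercept, RegSS equals the squared correlation between outcome and predictor
times the corrected total sum of squares.  When the two predictors are uncorrelated in
the sample, the RegSS of the joint model is the sum of those two pieces.  The Python
sketch below uses the rounded correlations and total sum of squares from the output, so
it reproduces the table only approximately; the function name is illustrative.

```python
TOTAL_SS = 174.65241  # corrected total sum of squares for S2FEVPOS (PROC REG)

def reg_ss_single(r):
    """RegSS of a one-predictor regression with intercept: r^2 * total SS."""
    return r * r * TOTAL_SS

# Correlations of the outcome with each predictor, from the second PROC CORR
ss_weight0 = reg_ss_single(0.52155)
ss_cigsa0 = reg_ss_single(0.14747)
ss_v = reg_ss_single(0.50235)
ss_w = reg_ss_single(-0.17812)

JOINT_SS = 49.61468  # RegSS of Model 3 and of Model 6

print("WEIGHT0 + CIGSA0:", round(ss_weight0 + ss_cigsa0, 2), "vs", JOINT_SS)
print("V + W:           ", round(ss_v + ss_w, 2), "vs", JOINT_SS)
```

The separate pieces for WEIGHT0 and CIGSA0 overshoot the joint RegSS because part of
what each explains is shared, while the pieces for V and W add up to it within rounding.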

========================================================================

FILENAME GRAPH 'gsas.grf' ;
OPTIONS  LINESIZE = 120 ;
GOPTIONS
         RESET = GLOBAL  ROTATE = LANDSCAPE   FTEXT = SWISSB
         DEVICE = PS300  GACCESS = SASGASTD  GSFNAME = GRAPH
         GSFMODE = REPLACE  GUNIT = PCT BORDER
         CBACK = WHITE  HTITLE = 2 HTEXT = 1 ;
*===================================================================== ;        
footnote "program: ~john-c/5421/lhsreg.sas &sysdate &systime" ;
 DATA lhs ;
      infile '/home/gnome/john-c/5421/lhs.data' ;

      INPUT CASENUM  AGE GENDER BASECIGS GROUP RANDDATE DEADDATE DEADCODE
            BODYMASS F31MSTAT
            VPCQUIT1 VPCQUIT2 VPCQUIT3  VPCQUIT4 VPCQUIT5
            CIGSA0   CIGSA1   CIGSA2    CIGSA3   CIGSA4   CIGSA5
            S1MFEV   S2FEVPRE  A1FEVPRE  A2FEVPRE A3FEVPRE A4FEVPRE A5FEVPRE
                     S2FEVPOS  A1FEVPOS  A2FEVPOS A3FEVPOS A4FEVPOS A5FEVPOS
                     WEIGHT0   WEIGHT1   WEIGHT2  WEIGHT3  WEIGHT4  WEIGHT5 ;

 RUN ;
*======================================================================;
proc rank data = lhs  groups = 5  out = lhs ;
     var bodymass ;
     ranks bodymassquint ;
run ;

*===================================================================== ;        

data lhs ; set lhs ;
     bodymassquint = bodymassquint + 1 ;
     genderbmi = gender * bodymass ;
run ;

data lhsnonmiss ; set lhs ;
     where s2fevpos ne . and weight0 ne . and cigsa0 ne . ;
run ;

*===================================================================== ;        

proc means data = lhsnonmiss n mean var stddev ;
     var s2fevpos weight0 cigsa0 ;
run ;

proc corr data = lhsnonmiss cov ;
     var weight0 cigsa0 ;
run ;

*===================================================================== ;        

proc reg data = lhsnonmiss ;
     model s2fevpos = weight0 ;
run ;

proc reg data = lhsnonmiss ;
     model s2fevpos = cigsa0 ;
run ;

proc reg data = lhsnonmiss ;
     model s2fevpos = weight0 cigsa0 ;
run ;

*===================================================================== ;        

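proc iml ;

     file 'lhsreg.out' ;

     /* Read the two covariates into X (n x 2) and the outcome into Y (n x 1) */
     use lhsnonmiss ;
     read all var {weight0 cigsa0} into X ;
     read all var {s2fevpos} into Y ;

     /* Sample covariance matrix of WEIGHT0 and CIGSA0, copied from PROC CORR */
     covar = {205.4393 14.4896962, 14.4896962 193.0993948} ;

     /* E = eigenvalues, P = matrix whose columns are the eigenvectors */
     call eigen(e, p, covar) ;

     print "cov(weight0, cigsa0), E, P:" covar e p ;

     /* Each row of U is (V, W) = (WEIGHT0, CIGSA0) * P, i.e. P` applied to that row */
     u = x * p ;
     yu = y || x || u ;
     varnames = {'y' 'weight0' 'cigsa0' 'v' 'w'} ;
     create ortho from yu [colname = varnames] ;
     append from yu ;

quit ;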

*===================================================================== ;        

data ortho ;
     retain count 0 ;
     set ortho ;
     count = count + 1 ;
run ;

proc print data = ortho ; where count le 50 ;
     var y weight0 cigsa0 v w ;
run ;

*===================================================================== ;        

proc corr data = ortho ;
     var y weight0 cigsa0 v w ;
run ;

*===================================================================== ;        

proc reg data = ortho ;
     model y = v ;
run ;

proc reg data = ortho ;
     model y = w ;
run ;

proc reg data = ortho ;
     model y = v w ;
run ;

endsas ;

================================================================================================================


                                                                                     16:17 Friday, November 27, 2009   1

                                                  The MEANS Procedure

                            Variable      N            Mean        Variance         Std Dev
                            ---------------------------------------------------------------
                            S2FEVPOS    500       2.5926800       0.3500048       0.5916121
                            WEIGHT0     500      74.5816000     205.4393000      14.3331539
                            CIGSA0      500      33.7020000     193.0993948      13.8960208
                            ---------------------------------------------------------------
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================

                                                                                      16:17 Friday, November 27, 2009   2

                                                   The CORR Procedure

                                            2  Variables:    WEIGHT0  CIGSA0   


                                              Covariance Matrix, DF = 499
 
                                                        WEIGHT0            CIGSA0

                                      WEIGHT0       205.4393000        14.4896962
                                      CIGSA0         14.4896962       193.0993948


                                                   Simple Statistics
 
               Variable           N          Mean       Std Dev           Sum       Minimum       Maximum

               WEIGHT0          500      74.58160      14.33315         37291      40.00000     130.00000
               CIGSA0           500      33.70200      13.89602         16851       9.00000     100.00000


                                       Pearson Correlation Coefficients, N = 500 
                                               Prob > |r| under H0: Rho=0
 
                                                        WEIGHT0        CIGSA0

                                          WEIGHT0       1.00000       0.07275
                                                                       0.1042

                                          CIGSA0        0.07275       1.00000
                                                         0.1042              
 
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================

                                                                                      16:17 Friday, November 27, 2009   3

                                                   The REG Procedure
                                                     Model: MODEL1
                                             Dependent Variable: S2FEVPOS 

                                                  Analysis of Variance
 
                                                         Sum of           Mean
                     Source                   DF        Squares         Square    F Value    Pr > F

                     Model                     1       47.50824       47.50824     186.08    <.0001
                     Error                   498      127.14417        0.25531                     
                     Corrected Total         499      174.65241                                    


                                  Root MSE              0.50528    R-Square     0.2720
                                  Dependent Mean        2.59268    Adj R-Sq     0.2706
                                  Coeff Var            19.48878                       


                                                  Parameter Estimates
 
                                               Parameter       Standard
                          Variable     DF       Estimate          Error    t Value    Pr > |t|

                          Intercept     1        0.98713        0.11985       8.24      <.0001
                          WEIGHT0       1        0.02153        0.00158      13.64      <.0001
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================

                                                                                      16:17 Friday, November 27, 2009   4

                                                   The REG Procedure
                                                     Model: MODEL1
                                             Dependent Variable: S2FEVPOS 

                                                  Analysis of Variance
 
                                                         Sum of           Mean
                     Source                   DF        Squares         Square    F Value    Pr > F

                     Model                     1        3.79839        3.79839      11.07    0.0009
                     Error                   498      170.85402        0.34308                     
                     Corrected Total         499      174.65241                                    


                                  Root MSE              0.58573    R-Square     0.0217
                                  Dependent Mean        2.59268    Adj R-Sq     0.0198
                                  Coeff Var            22.59171                       


                                                  Parameter Estimates
 
                                               Parameter       Standard
                          Variable     DF       Estimate          Error    t Value    Pr > |t|

                          Intercept     1        2.38108        0.06878      34.62      <.0001
                          CIGSA0        1        0.00628        0.00189       3.33      0.0009
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================

                                                                                      16:17 Friday, November 27, 2009   5

                                                   The REG Procedure
                                                     Model: MODEL1
                                             Dependent Variable: S2FEVPOS 

                                                  Analysis of Variance
 
                                                         Sum of           Mean
                     Source                   DF        Squares         Square    F Value    Pr > F

                     Model                     2       49.61468       24.80734      98.60    <.0001
                     Error                   497      125.03773        0.25158                     
                     Corrected Total         499      174.65241                                    


                                  Root MSE              0.50158    R-Square     0.2841
                                  Dependent Mean        2.59268    Adj R-Sq     0.2812
                                  Coeff Var            19.34610                       


                                                  Parameter Estimates
 
                                               Parameter       Standard
                          Variable     DF       Estimate          Error    t Value    Pr > |t|

                          Intercept     1        0.85379        0.12758       6.69      <.0001
                          WEIGHT0       1        0.02120        0.00157      13.49      <.0001
                          CIGSA0        1        0.00469        0.00162       2.89      0.0040
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================

                                                                                      16:17 Friday, November 27, 2009   6

                                                     COVAR                   E         P

                     cov(weight0, cigsa0), E, P:  205.4393 14.489696 215.01799 0.8341993 -0.551463
                                                 14.489696 193.09939 183.52071 0.5514631 0.8341993
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================

                                                                                      16:17 Friday, November 27, 2009   7

                                Obs      y     weight0    cigsa0       v              w

                                  1    2.37      72.6       20       71.592    -23.3522
                                  2    2.98      77.6       60       97.822      7.2584
                                  3    3.28      97.5       40      103.393    -20.3997
                                  4    2.76      64.9       40       76.198     -2.4220
                                  5    2.76      77.1       35       83.618    -13.3208
                                  6    2.27      63.5       40       75.030     -1.6499
                                  7    2.80      88.4       30       90.287    -23.7234
                                  8    2.67      79.4       40       88.294    -10.4182
                                  9    1.78      54.4       40       67.439      3.3684
                                 10    2.94     104.3       35      106.308    -28.3206
                                 11    2.97     102.3       30      101.882    -31.3887
                                 12    2.22      61.4       30       67.764     -8.8339
                                 13    2.90      77.3       30       81.027    -17.6021
                                 14    3.31      77.3       40       86.542     -9.2601
                                 15    2.36      95.5       40      101.725    -19.2968
                                 16    2.88      90.9       30       92.373    -25.1020
                                 17    2.92      78.2       35       84.536    -13.9274
                                 18    3.68      70.5       30       75.355    -13.8522
                                 19    3.00      68.0       40       78.784     -4.1315
                                 20    2.58      81.8       30       84.781    -20.0837
                                 21    2.52      93.2       20       88.777    -34.7124
                                 22    2.28      77.3       45       89.299     -5.0891
                                 23    2.60      75.0       20       73.594    -24.6757
                                 24    1.49      77.2       20       75.429    -25.8890
                                 25    1.64      97.9       30       98.212    -28.9623
                                 26    2.26      85.0       30       87.451    -21.8484
                                 27    2.22      81.8       20       79.267    -28.4257
                                 28    2.31      90.9       30       92.373    -25.1020
                                 29    2.16      66.8       23       68.408    -17.6512
                                 30    2.99      90.9       30       92.373    -25.1020
                                 31    3.24      61.4       40       73.278     -0.4919
                                 32    2.93      78.6       60       98.656      6.7070
                                 33    2.47      59.0       40       71.276      0.8316
                                 34    3.32      70.5       25       72.598    -18.0232
                                 35    2.71      61.4       45       76.036      3.6791
                                 36    1.78      56.4       40       69.107      2.2655
                                 37    2.65      69.5       40       80.035     -4.9587
                                 38    2.97      55.9       20       57.661    -14.1428
                                 39    3.33      94.5       40      100.890    -18.7453
                                 40    3.34     105.0       15       95.863    -45.3906
                                 41    2.91      67.3       30       72.686    -12.0875
                                 42    1.66      59.1       20       60.330    -15.9075
                                 43    2.88      79.5       20       77.348    -27.1573
                                 44    2.25      65.9       30       71.518    -11.3154
                                 45    2.78      89.5       30       91.205    -24.3300
                                 46    2.80      68.2       20       67.922    -20.9258
                                 47    2.82      64.5       60       86.894     14.4826
                                 48    2.96      90.0       40       97.136    -16.2637
                                 49    3.03      82.3       20       79.684    -28.7014
                                 50    2.98      82.7       40       91.047    -12.2380
 
 
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================

                                                                                      16:17 Friday, November 27, 2009   8

                                                   The CORR Procedure

                              5  Variables:    y        weight0  cigsa0   v        w        


                                                   Simple Statistics
 
               Variable           N          Mean       Std Dev           Sum       Minimum       Maximum

               y                500       2.59268       0.59161          1296       1.20000       4.33000
               weight0          500      74.58160      14.33315         37291      40.00000     130.00000
               cigsa0           500      33.70200      13.89602         16851       9.00000     100.00000
               v                500      80.80133      14.66349         40401      44.39723     141.53369
               w                500     -13.01482      13.54698         -6507     -51.54690      38.31025


                                      Pearson Correlation Coefficients, N = 500 
                                              Prob > |r| under H0: Rho=0
 
                                         y       weight0        cigsa0             v             w

                     y             1.00000       0.52155       0.14747       0.50235      -0.17812
                                                  <.0001        0.0009        <.0001        <.0001

                     weight0       0.52155       1.00000       0.07275       0.85343      -0.52122
                                    <.0001                      0.1042        <.0001        <.0001

                     cigsa0        0.14747       0.07275       1.00000       0.58192       0.81325
                                    0.0009        0.1042                      <.0001        <.0001

                     v             0.50235       0.85343       0.58192       1.00000      -0.00000
                                    <.0001        <.0001        <.0001                      1.0000

                     w            -0.17812      -0.52122       0.81325      -0.00000       1.00000
                                    <.0001        <.0001        <.0001        1.0000              
 
 
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
================================================================================================================

                                                                                      16:17 Friday, November 27, 2009   9

                                                   The REG Procedure
                                                     Model: MODEL1
                                                 Dependent Variable: y 

                                                  Analysis of Variance
 
                                                         Sum of           Mean
                     Source                   DF        Squares         Square    F Value    Pr > F

                     Model                     1       44.07377       44.07377     168.09    <.0001
                     Error                   498      130.57863        0.26221                     
                     Corrected Total         499      174.65241                                    


                                  Root MSE              0.51206    R-Square     0.2524
                                  Dependent Mean        2.59268    Adj R-Sq     0.2509
                                  Coeff Var            19.75024                       


                                                  Parameter Estimates
 
                                               Parameter       Standard
                          Variable     DF       Estimate          Error    t Value    Pr > |t|

                          Intercept     1        0.95503        0.12837       7.44      <.0001
                          v             1        0.02027        0.00156      12.96      <.0001
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17

================================================================================================================
                                                                                      16:17 Friday, November 27, 2009  10

                                                   The REG Procedure
                                                     Model: MODEL1
                                                 Dependent Variable: y 

                                                  Analysis of Variance
 
                                                         Sum of           Mean
                     Source                   DF        Squares         Square    F Value    Pr > F

                     Model                     1        5.54091        5.54091      16.32    <.0001
                     Error                   498      169.11150        0.33958                     
                     Corrected Total         499      174.65241                                    


                                  Root MSE              0.58274    R-Square     0.0317
                                  Dependent Mean        2.59268    Adj R-Sq     0.0298
                                  Coeff Var            22.47620                       


                                                  Parameter Estimates
 
                                               Parameter       Standard
                          Variable     DF       Estimate          Error    t Value    Pr > |t|

                          Intercept     1        2.49144        0.03616      68.91      <.0001
                          w             1       -0.00778        0.00193      -4.04      <.0001
 
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17

================================================================================================================
                                                                                      16:17 Friday, November 27, 2009  11

                                                   The REG Procedure
                                                     Model: MODEL1
                                                 Dependent Variable: y 

                                                  Analysis of Variance
 
                                                         Sum of           Mean
                     Source                   DF        Squares         Square    F Value    Pr > F

                     Model                     2       49.61468       24.80734      98.60    <.0001
                     Error                   497      125.03773        0.25158                     
                     Corrected Total         499      174.65241                                    


                                  Root MSE              0.50158    R-Square     0.2841
                                  Dependent Mean        2.59268    Adj R-Sq     0.2812
                                  Coeff Var            19.34610                       


                                                  Parameter Estimates
 
                                               Parameter       Standard
                          Variable     DF       Estimate          Error    t Value    Pr > |t|

                          Intercept     1        0.85379        0.12758       6.69      <.0001
                          v             1        0.02027        0.00153      13.24      <.0001
                          w             1       -0.00778        0.00166      -4.69      <.0001
 
 
                                     program: ~john-c/5421/lhsreg.sas 27NOV09 16:17
----------------------------------------------------------------------------------


/home/walleye/john-c/5421/notes.021a   Last update: November 11, 2011.