/* PubH 7450 Example 8.2 (continued): automatic variable selection for the
   proportional hazards model (PHM) using the larynx cancer data */

options ls=80 center pagesize=60 number label;
goptions device = PS;   /* graphs are saved in a file named sasgraph.ps */

data larynx;
  infile '/home/merganser/weip/public_html/course/7450/data/larynx.txt' firstobs=19;
  input stage time age year status;
  /* indicator variables for stage; stage 1 is the reference level */
  if stage=2 then Z1=1; else Z1=0;
  if stage=3 then Z2=1; else Z2=0;
  if stage=4 then Z3=1; else Z3=0;
run;

/* SELECTION= BACKWARD | FORWARD | STEPWISE | SCORE; SCORE requests best
   subset selection.
   SLE= / SLS= (aliases of SLENTRY= / SLSTAY=): significance levels for a
   variable to enter / to stay in the model.
   INCLUDE=n forces the first n variables listed in the MODEL statement
   into every model. */
proc phreg data=larynx;
  /* stage-by-age interactions and a quadratic age term */
  Z5=Z1*Age; Z6=Z2*Age; Z7=Z3*Age; Age2=Age*Age;
  /* status=0 marks a censored survival time */
  model time*status(0) = Z1 Z2 Z3 age Z5 Z6 Z7 age2
        / selection=stepwise sle=0.1 sls=0.15;
  title 'Stepwise variable selection';
run;

/* Best subset selection based on the score statistic;
   BEST=k requests the best k models of each size, if possible. */
proc phreg data=larynx;
  Z5=Z1*Age; Z6=Z2*Age; Z7=Z3*Age; Age2=Age*Age;
  model time*status(0) = Z1 Z2 Z3 age Z5 Z6 Z7 age2
        / selection=score best=2;
  title 'Best subset variable selection';
run;

**************************************

                       Stepwise variable selection                          1

Step 1. Variable Z7 is entered. The model contains the following
explanatory variables:

        Z7

                          Model Fit Statistics

                                    Without           With
                   Criterion     Covariates     Covariates

                   -2 LOG L         394.426        381.367
                   AIC              394.426        383.367
                   SBC              394.426        385.279

                 Testing Global Null Hypothesis: BETA=0

          Test                 Chi-Square       DF     Pr > ChiSq

          Likelihood Ratio        13.0593        1         0.0003
          Score                   20.1619        1         <.0001
          Wald                    16.9073        1         <.0001

Step 2. Variable Z6 is entered.
The model contains the following explanatory variables:

        Z6  Z7

                          Model Fit Statistics

                                    Without           With
                   Criterion     Covariates     Covariates

                   -2 LOG L         394.426        378.143
                   AIC              394.426        382.143
                   SBC              394.426        385.967

                 Testing Global Null Hypothesis: BETA=0

          Test                 Chi-Square       DF     Pr > ChiSq

          Likelihood Ratio        16.2824        2         0.0003
          Score                   23.1767        2         <.0001
          Wald                    19.5696        2         <.0001

NOTE: No (additional) variables met the 0.1 level for entry into the model.

                  Analysis of Maximum Likelihood Estimates

              Parameter    Standard                               Hazard
Variable  DF   Estimate       Error   Chi-Square   Pr > ChiSq      Ratio

Z6         1    0.00896     0.00487       3.3849       0.0658      1.009
Z7         1    0.02426     0.00550      19.4565       <.0001      1.025

                     Summary of Stepwise Selection

          Variable             Number       Score        Wald
Step   Entered   Removed          In   Chi-Square   Chi-Square   Pr > ChiSq

   1   Z7                          1      20.1619            .       <.0001
   2   Z6                          2       3.4790            .       0.0622

*****************************************

                     Best subset variable selection                         3

             Regression Models Selected by Score Criterion

Number of        Score
Variables   Chi-Square   Variables Included in Model

        1      20.1619   Z7
        1      19.5608   Z3
        2      23.1767   Z6 Z7
        2      23.0251   Z2 Z7
        3      25.7138   Z1 Z5 Z7
        3      25.1081   Z1 Z3 Z5
        4      28.5391   Z1 Z5 Z6 Z7
        4      28.3954   Z1 Z2 Z5 Z7
        5      29.0403   Z1 age Z5 Z7 Age2
        5      28.5672   Z1 Z3 Z5 Z6 Z7
        6      32.4620   Z1 age Z5 Z6 Z7 Age2
        6      32.2929   Z1 Z2 age Z5 Z7 Age2
        7      32.4895   Z1 Z2 age Z5 Z6 Z7 Age2
        7      32.4621   Z1 Z3 age Z5 Z6 Z7 Age2
        8      32.4911   Z1 Z2 Z3 age Z5 Z6 Z7 Age2
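The fit statistics and estimates in the stepwise listing can be reproduced by hand, which makes clear what each printed number measures. The worked arithmetic below is a sketch under stated assumptions: AIC = -2 log L + 2p and SBC = -2 log L + p log n, with p the number of covariates in the model. The value n = 50 in the SBC penalty is not printed in the listing; it is inferred from the SBC values themselves (log 50 is about 3.912). Each coefficient's Wald chi-square is (estimate / standard error)^2 and its hazard ratio is exp(estimate). With no covariates (p = 0), AIC and SBC both reduce to -2 log L = 394.426, which is why the "Without Covariates" column repeats that value.

```latex
\begin{aligned}
\text{Step 1 } (p=1):\quad & \mathrm{AIC} = 381.367 + 2(1) = 383.367, \qquad
  \mathrm{SBC} = 381.367 + 1\cdot\log 50 \approx 385.279,\\
 & \chi^2_{\mathrm{LR}} = 394.426 - 381.367 = 13.059 \quad (1~\text{df}),\\[4pt]
\text{Step 2 } (p=2):\quad & \mathrm{AIC} = 378.143 + 2(2) = 382.143, \qquad
  \mathrm{SBC} = 378.143 + 2\log 50 \approx 385.967,\\
 & \chi^2_{\mathrm{LR}} = 394.426 - 378.143 = 16.283 \quad (2~\text{df}),\\[4pt]
\text{Z7}:\quad & \left(\frac{0.02426}{0.00550}\right)^{2} \approx 19.456, \qquad
  e^{0.02426} \approx 1.025,\\
\text{Z6}:\quad & \left(\frac{0.00896}{0.00487}\right)^{2} \approx 3.385, \qquad
  e^{0.00896} \approx 1.009.
\end{aligned}
```

The last digits differ slightly from the listing (16.283 vs. 16.2824, 19.456 vs. 19.4565) because the inputs are the rounded printed values. The same numbers trace the stepwise decisions: Z6 entered at Step 2 with a score p-value of 0.0622, below SLE = 0.1; the Wald p-values of Z6 (0.0658) and Z7 (<.0001) in the final model are both below SLS = 0.15, consistent with neither being removed before the search stopped.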