SURVIVAL ANALYSIS, III: PROC PHREG. n54703.016 PROC LOGISTIC makes it possible to model risk of an event as a function of covariates, and accommodates both categorical and continuous covariates. The coefficient estimates that are produced by PROC LOGISTIC can be interpreted as log(Odds Ratios). A limitation with PROC LOGISTIC is that events that may happen over a certain fixed followup period are classified simply as occurring or not occurring. The actual time that they occur is not used. Presumably if two people have an event - say, a heart attack - at some point during a one-year period, it is likely that the person who had a hard attack two days after followup started had more risk factors at the outset than did the person who had a heart attack after 11 months of followup. Thus there is information in the time of event which contributes to estimates of the effects of risk factors. PROC LIFETEST can take time-of-event into account, but it does not completely address the problem of studying the effects of multivariate models. You can stratify on one variable and study the effect of one other variable given that stratification, but the effects of other variables are not taken into account, and PROC LIFETEST provides no way of studying interactions of two or more variables. PROC PHREG essentially accomplishes both things: the outcomes are time- to-event, and risk factors can be incorporated into a regression model. The estimated coefficients of the risk factors can be interpreted as the log of the relative hazard associated with that risk factor. The PH in PHREG stands for 'proportional hazards'. This is a reference to a family of survival models that was introduced by David R. Cox in 1972 [Cox, D.R. (1972) Regression models and life-tables (with discussion), J. Royal Statist. Soc. Series B, 34, 187-220]. The REG in PHREG stands for regression. Proportional hazards regression is also frequently called 'Cox regression', and the model described below is called the 'Cox model'. The model is usually stated in terms of the hazard function, h(t; X). Here X denotes a collection of covariates (or risk factors). The model is written as h(t; X) = h0(t) * exp(b1*X1 + b2*X2 + ... + bp*Xp). This is a *semiparametric* model. Part of the model, called the base hazard function h0(t), is unspecified. This part of the model is regarded as being common to everyone in the followup study, regardless of their covariate values. The other part of the model, exp(b1*X1 + b2*X2 + ... + bp*Xp), represents the way in which the covariates *modify* the underlying base hazard. In fact, suppose that one of the covariates, say X1, was increased by one unit, and all the other covariates are left the same. The ratio of the new hazard to the old one is easy seen to be exp(b1) i.e., one may regard b1 as the log of the relative hazard corresponding to a one-unit increase in the covariate X1. A positive coefficient for a covariate in the proportional-hazards model indicates that the relative hazard is greater than 1, and that any increase in that covariate corresponds to an increase in hazard. Conversely, a negative coefficient for a given covariate indicates that the relative hazard for a 1-unit increase in that covariate is less than 1, and that increases in that covariate correspond to *decreases* in the hazard. Given a dataset comprised of times to events (or censoring times) and values of covariates for a group of individuals, PROC PHREG will produce parameter estimates, standard errors, likelihood statistics, estimates of relative hazards, and confidence limits. Below is an example analysis of the heroin addiction/methadone dataset from Chapter 12 of Der & Everitt: ================================================================================= FILENAME GRAPH 'gsas.grf' ; LIBNAME loc '' ; OPTIONS LINESIZE = 80 MPRINT ; GOPTIONS RESET = GLOBAL ROTATE = PORTRAIT FTEXT = SWISSB DEVICE = PS300 GACCESS = SASGASTD GSFNAME = GRAPH GSFMODE = REPLACE GUNIT = PCT BORDER CBACK = WHITE HTITLE = 2 HTEXT = 1 ; *===================================================================== ; /* Chapter 12 */ footnote "~john-c/5421/chap12.sas &sysdate &systime" ; data heroin; infile cards expandtabs; input id clinic status time prison dose @@; dose60 = . ; if dose ge 0 and dose lt 60 then dose60 = 1 ; if dose ge 60 then dose60 = 2 ; cards; 1 1 1 428 0 50 132 2 0 633 0 70 2 1 1 275 1 55 133 2 1 661 0 40 [lines omitted ...] 131 2 0 367 0 70 266 1 1 47 0 45 ; run ; proc phreg data=heroin; model time * status(0) = prison dose / rl; strata clinic; title1 'Data from Chapter 12 of Der & Everitt' ; title2 'Time to termination of methadone treatment for heroin addiction'; title3 'versus prison (0 or 1) and methadone dose.'; run; proc stdize data = heroin out = heroin2; var dose; run; data stdd ; set heroin ; meandose = 59.5588235 ; stddose = 14.7380326 ; dosestd = (dose - meandose)/stddose ; run ; proc means data = heroin2 ; var dose ; title1 'Mean dose from proc stdize' ; run ; proc means data = stdd ; var dosestd ; title1 'Mean dose from dataset stdd' ; run ; proc phreg data = heroin2; model time * status(0) = prison dose / rl; strata clinic; baseline out = phout logsurv = ls loglogs = lls / method = ch; title1 'Data from Chapter 12 of Der & Everitt' ; title2 'Time to termination of methadone treatment for heroin addiction'; title3 'versus prison (0 or 1) and methadone dose STANDARDIZED' ; run; data phout ; set phout ; nlogsurv = -ls ; run ; symbol1 i = steplj v = square w = 2 h = 1 c = black l = 1 ; symbol2 i = steplj v = circle w = 2 h = 1 c = gray l = 2 ; proc gplot data=phout; plot nlogsurv * time = clinic; title1 h = 2 'Data from Chapter 12 of Der & Everitt' ; title2 h = 2 'Plot of -Log(Survival) vs. Clinic' ; title3 '' ; run; =========================================================================================== Data from Chapter 12 of Der & Everitt 1 Time to termination of methadone treatment for heroin addiction versus prison (0 or 1) and methadone dose. 15:51 Sunday, April 18, 2004 The PHREG Procedure Model Information Data Set WORK.HEROIN Dependent Variable time Censoring Variable status Censoring Value(s) 0 Ties Handling BRESLOW Summary of the Number of Event and Censored Values Percent Stratum clinic Total Event Censored Censored 1 1 163 122 41 25.15 2 2 75 28 47 62.67 ------------------------------------------------------------------- Total 238 150 88 36.97 Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics Without With Criterion Covariates Covariates -2 LOG L 1229.367 1204.859 AIC 1229.367 1208.859 SBC 1229.367 1214.881 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 24.5079 2 <.0001 Score 25.7467 2 <.0001 Wald 25.7764 2 <.0001 ~john-c/5421/chap12.sas 18APR04 15:51 *-----------------------------------------------------------------------------------------; Data from Chapter 12 of Der & Everitt 2 Time to termination of methadone treatment for heroin addiction versus prison (0 or 1) and methadone dose. 15:51 Sunday, April 18, 2004 The PHREG Procedure Analysis of Maximum Likelihood Estimates Parameter Standard Variable DF Estimate Error Chi-Square Pr > ChiSq prison 1 0.35771 0.16816 4.5247 0.0334 dose 1 -0.02597 0.00553 22.0845 <.0001 Analysis of Maximum Likelihood Estimates Hazard 95% Hazard Ratio Variable Ratio Confidence Limits prison 1.430 1.029 1.988 dose 0.974 0.964 0.985 ~john-c/5421/chap12.sas 18APR04 15:51 *-----------------------------------------------------------------------------------------; Mean dose from proc stdize 3 15:51 Sunday, April 18, 2004 The MEANS Procedure Analysis Variable : dose N Mean Std Dev Minimum Maximum ------------------------------------------------------------------- 238 -3.01346E-16 1.0000000 -4.0411651 2.7440010 ------------------------------------------------------------------- ~john-c/5421/chap12.sas 18APR04 15:51 *-----------------------------------------------------------------------------------------; Mean dose from dataset stdd 4 15:51 Sunday, April 18, 2004 The MEANS Procedure Analysis Variable : dosestd N Mean Std Dev Minimum Maximum ------------------------------------------------------------------- 238 1.995637E-9 1.0000000 -4.0411651 2.7440010 ------------------------------------------------------------------- ~john-c/5421/chap12.sas 18APR04 15:51 *-----------------------------------------------------------------------------------------; Data from Chapter 12 of Der & Everitt 5 Time to termination of methadone treatment for heroin addiction versus prison (0 or 1) and methadone dose STANDARDIZED 15:51 Sunday, April 18, 2004 The PHREG Procedure Model Information Data Set WORK.HEROIN2 Dependent Variable time Censoring Variable status Censoring Value(s) 0 Ties Handling BRESLOW Summary of the Number of Event and Censored Values Percent Stratum clinic Total Event Censored Censored 1 1 163 122 41 25.15 2 2 75 28 47 62.67 ------------------------------------------------------------------- Total 238 150 88 36.97 Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics Without With Criterion Covariates Covariates -2 LOG L 1229.367 1204.859 AIC 1229.367 1208.859 SBC 1229.367 1214.881 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 24.5079 2 <.0001 Score 25.7467 2 <.0001 Wald 25.7764 2 <.0001 ~john-c/5421/chap12.sas 18APR04 15:51 *-----------------------------------------------------------------------------------------; Data from Chapter 12 of Der & Everitt 6 Time to termination of methadone treatment for heroin addiction versus prison (0 or 1) and methadone dose STANDARDIZED 15:51 Sunday, April 18, 2004 The PHREG Procedure Analysis of Maximum Likelihood Estimates Parameter Standard Variable DF Estimate Error Chi-Square Pr > ChiSq prison 1 0.35771 0.16816 4.5247 0.0334 dose 1 -0.38282 0.08146 22.0845 <.0001 Analysis of Maximum Likelihood Estimates Hazard 95% Hazard Ratio Variable Ratio Confidence Limits prison 1.430 1.029 1.988 dose 0.682 0.581 0.800 ~john-c/5421/chap12.sas 18APR04 15:51 ================================================================================= There are several points in this analysis which require discussion: 1. PROC PHREG SYNTAX: The MODEL statement in PROC PHREG is of the form model time * status(0) = covar1 covar2 ... covarp / rl ; Here 'time' is a variable (named by the programmer, not a SAS reserved word) which indicates followup time. 'status' is another user-named variable which indicates censoring status. For example, people who have the event of interest might have status = 1, while those who are censored might have status = 0. The value in parenthesis after 'status' indicates the censoring value. covar1 covar2 ... covarp are the covariates of interest. The option 'rl' stands for 'risk limits' - actually, confidence limits for the relative hazard associated with a given covariate. There is also a STRATA statement available in PROC PHREG. This has the effect that in the computations inside PROC PHREG, people are compared to other people within their own stratum. The syntax is: proc phreg data = datafile ; model time * death(0) = age dbp chol cigs / rl ; strata clinic ; Stratification in this manner is appropriate for studies in which it is believed in advance that the stratifying variable is strongly related to risk, or the stratifying variable is related to the study design. For example, in a clinical trial, randomization of participants is often carried out with separate randomization schedules for each clinic. Since the randomization is stratified (to achieve approximate balance within clinic), the analysis logically should be stratified by the same factor. Note that the printout in the example for the first PHREG analysis includes 1) the numbers of events and censored observations (non-events) for each stratum 2) 'Model Fit Statistics', i.e., -2 Log L, AIC, etc. 3) Tests of global null hypothesis (no effect of all the covariates) 4) Parameter estimates, standard errors, Chi-square statistics, and p-values 5) Estimated hazard ratios and 95% confidence limits 2. PROC STDIZE: PROC STDIZE is a procedure available in SAS Ver 8 (not in Ver 6) which replaces a given variable by its standardized version. For example, if X is the variable, then the standardized version is defined as X* = (X - mean(X)) / StdDev(x). The resulting variable X* has mean 0 and standard deviation 1. This is relevant to PROC PHREG and other regression routines in the following way. Frequently you want to know what the relative hazard is for a "realistic" increment in a given covariate. It is frequently the case that a 1-unit increment is not very useful. For example, the relative hazard of CHD death associated with a person's cholesterol going from 238 to 239 mg/dl is not very informative and will in any case be very close to 1. The population standard deviation of cholesterol is on the order of 25 mg/dl. It is more interesting to know the relative hazard associated with a 1-standard-deviation increase in cholesterol. Of course you would find that the relative hazard for a 100 mg/dl increase in cholesterol is even bigger, but in general an increase of 100 mg/dl is not common and not easy to achieve by treating a person with cholesterol lowering drugs. 25 mg/dl increments are large enough to be interesting, but not unrealistically large. In the example shown, the standardized version of the variable 'dose' is given on a dataset output by PROC STDIZE: out = heroin2. The calculations in PROC STDIZE can be accomplished separately by another data step as shown in the program. 3. ANALYSIS ON STANDARDIZED VARIABLE The second run of PROC PHREG uses the dataset heroin2 and the *standardized* dose variable (from PROC STDIZE). Note that the coefficient of dose is different from that in the first run of PHREG (with the original non- standardized dose), but the test statistics (-2 Log L, AIC, chi-square, etc) are the same. The hazard ratios from the two analyses can be compared: Original unstandardized dose: Hazard ratio = exp(-0.02597) = 0.974. Standardized dose : Hazard ratio = exp(-0.38282) = 0.682. (Note that the standard deviation of the original variable 'dose' was 14.73, and .38282 / .02597 = 14.74.) 4. GRAPHICS OUTPUT FROM PROC PHREG: The second run of PROC PHREG shown above includes an output statement: baseline out = phout logsurv = ls loglogs = lls / method = ch ; The word 'baseline' must be included. 'logsurv = ls' says that the output data set must include the log(survival) function, and the name given to it (by the user) here is 'ls'. Similarly, loglogs is log(-log(survival)), and the name given here to that variable is 'lls'. The log(survival) function plot is useful for telling whether the survival distribution is exponential: if it is, the plot will look like a straight line. The log(-log(survival)) plot similarly will look like a straight line if the survival distribution is 'Weibull' (a generalization of the exponential distribution). The graphics output file is here named 'phout', and this file serves as input to the the PROC GPLOT procedure. Note the two SYMBOL statements preceding PROC GPLOT: these specify that the functions to be graphed are step functions with a square printed for each event when clinic = 1, and a circle printed when clinic = 2. The graph may be seen at: http://www.biostat.umn.edu/~john-c/5421/chap12ls.grf =========================================================================================== Appended is an interesting analysis of MRFIT data on deaths-from-any-cause within the first 7 years of followup, versus age and BMI. Here I have coded BMI into 10 deciles, each of size about 1286 men. PROC PHREG does not accept CLASS variables unfortunately, so I had to create indicator variables for each decile. I used PROC RANK to define the deciles variable. The syntax for PROC RANK is as follows: proc rank group = 7 out = outrank ; var x y z ; ranks x06 y06 z06 ; In this case, septiles (7 equal-sized groups) are defined for variables x, y, and z. I have specified that the resulting septile variables are named x06, y06, and z06. Each of these variables has an integer value from 0 to 6, corresponding to the values of the original variable - that is, the values of x in the lowest septile correspond to x06 = 0, those in the second septile correspond to x06 = 1, etc.. [Why SAS did not code the output from PROC RANK as 1 to 7 rather than 0 to 6 is beyond me.] Example program: ================================================================================= [lines omitted ...] PROC RANK DATA = SEL GROUP = 10 OUT = OUTRANK ; VAR BMIBASE ; RANKS BMI09 ; PROC MEANS DATA = OUTRANK N MEAN STD ; CLASS BMI09 ; VAR DEATH7 BMIBASE ; TITLE1 'MRFIT Data: deaths and BMI means vs BMI deciles ...' ; FORMAT BMI09 DECILE. ; run ; DATA OUTRANK ; SET OUTRANK ; BMI1 = 0 ; BMI2 = 0 ; BMI3 = 0 ; BMI4 = 0 ; BMI5 = 0 ; BMI6 = 0 ; BMI7 = 0 ; BMI8 = 0 ; BMI9 = 0 ; BMI10 = 0 ; IF BMI09 EQ 0 THEN BMI1 = 1 ; IF BMI09 EQ 1 THEN BMI2 = 1 ; IF BMI09 EQ 2 THEN BMI3 = 1 ; IF BMI09 EQ 3 THEN BMI4 = 1 ; IF BMI09 EQ 4 THEN BMI5 = 1 ; IF BMI09 EQ 5 THEN BMI6 = 1 ; IF BMI09 EQ 6 THEN BMI7 = 1 ; IF BMI09 EQ 7 THEN BMI8 = 1 ; IF BMI09 EQ 8 THEN BMI9 = 1 ; IF BMI09 EQ 9 THEN BMI10 = 1 ; RUN ; PROC PHREG DATA = OUTRANK ; MODEL FOLLOW7Y * DEATH7(0) = AGE BASECHOL BMI1 BMI2 BMI3 BMI4 BMI5 BMI6 BMI7 BMI8 BMI9 / rl ; TITLE1 'MRFIT Data: Death-any-cause versus age, baseline cholesterol, and deciles of BMI'; run ; endsas ; =========================================================================================== MRFIT Data: deaths and BMI means vs BMI deciles ... 2 18:35 Sunday, April 18, 2004 BMI09 N Obs Variable N Mean Std Dev ------------------------------------------------------------- Decile 1 1286 DEATH7 1286 0.0396579 0.1952300 BMIBASE 1286 22.2325776 1.3006656 Decile 2 1287 DEATH7 1287 0.0505051 0.2190698 BMIBASE 1287 24.3086946 0.3581602 Decile 3 1290 DEATH7 1290 0.0372093 0.1893477 BMIBASE 1290 25.3626956 0.2666302 Decile 4 1286 DEATH7 1286 0.0373250 0.1896308 BMIBASE 1286 26.2082527 0.2262083 Decile 5 1284 DEATH7 1284 0.0311526 0.1737978 BMIBASE 1284 26.9993730 0.2326106 Decile 6 1287 DEATH7 1287 0.0349650 0.1837627 BMIBASE 1287 27.8309493 0.2498441 Decile 7 1287 DEATH7 1287 0.0411810 0.1987860 BMIBASE 1287 28.7412654 0.2882927 Decile 8 1286 DEATH7 1286 0.0466563 0.2109837 BMIBASE 1286 29.8439421 0.3525592 Decile 9 1287 DEATH7 1287 0.0466200 0.2109056 BMIBASE 1287 31.3412200 0.5283700 Decile 10 1286 DEATH7 1286 0.0451011 0.2076066 BMIBASE 1286 34.4814611 1.7660754 ------------------------------------------------------------- billings.sas (joanneb) 18APR04 18:35 ------------------------------------------------------------------------------------------- MRFIT Data: Death-any-cause versus age, baseline cholesterol, and deciles of BMI 3 18:35 Sunday, April 18, 2004 The PHREG Procedure Data Set: WORK.OUTRANK Dependent Variable: FOLLOW7Y Censoring Variable: DEATH7 Censoring Value(s): 0 Ties Handling: BRESLOW Summary of the Number of Event and Censored Values Percent Total Event Censored Censored 12866 528 12338 95.90 Testing Global Null Hypothesis: BETA=0 Without With Criterion Covariates Covariates Model Chi-Square -2 LOG L 8876.003 8779.573 96.430 with 11 DF (p=0.0001) Score . . 94.345 with 11 DF (p=0.0001) Wald . . 92.007 with 11 DF (p=0.0001) Analysis of Maximum Likelihood Estimates Conditional Risk Ratio and 95% Confidence Limits Parameter Standard Wald Pr > Risk Variable DF Estimate Error Chi-Square Chi-Square Ratio Lower Upper AGE 1 0.070027 0.00771 82.46664 0.0001 1.073 1.056 1.089 BASECHOL 1 0.000565 0.00125 0.20515 0.6506 1.001 0.998 1.003 BMI1 1 -0.148122 0.19230 0.59332 0.4411 0.862 0.592 1.257 BMI2 1 -0.011859 0.18082 0.00430 0.9477 0.988 0.693 1.409 BMI3 1 -0.241878 0.19518 1.53568 0.2153 0.785 0.536 1.151 BMI4 1 -0.228584 0.19530 1.36993 0.2418 0.796 0.543 1.167 BMI5 1 -0.415583 0.20569 4.08217 0.0433 0.660 0.441 0.988 BMI6 1 -0.299602 0.19872 2.27297 0.1316 0.741 0.502 1.094 BMI7 1 -0.107250 0.19011 0.31827 0.5726 0.898 0.619 1.304 BMI8 1 0.039266 0.18417 0.04546 0.8312 1.040 0.725 1.492 BMI9 1 0.007799 0.18420 0.00179 0.9662 1.008 0.702 1.446 billings.sas (joanneb) 18APR04 18:35 ================================================================================= It is interesting to note the relationship here between body mass index and relative hazard (or "Risk Ratio"). Note that BMI10 is omitted from the analysis since otherwise the set of covariates would be redundant. The "Risk Ratios" for BMI1, BMI2, ... BMI9 are relative to that of BMI10, which by default would have a "Risk Ratio" of 1. The body mass index for the average MRFIT man was 27.7. The "ideal" body mass index is considered to be between 20 and 25. Values in the range 26 to 29 are considered "FAT". Which BMI category was at lowest risk? Which was at highest risk? What were the corresponding average BMIs within those categories ? ----------------------------------------------------------------------------------------- PROBLEM 1: Use PROC PHREG to analyze the data from the Minnesota Heart Survey (see notes n54703.015), with death as the outcome and age, smoke, and sbpav2 as predictors. Test for an interaction between age and sbpav2. Describe your results. ========================================================================================= n54703.016 Last update: April 11, 2005.