SURVIVAL ANALYSIS, III: PROC PHREG. n54703.016
PROC LOGISTIC makes it possible to model risk of an event as a function of
covariates, and accommodates both categorical and continuous covariates. The
coefficient estimates that are produced by PROC LOGISTIC can be interpreted as
log(Odds Ratios).
A limitation with PROC LOGISTIC is that events that may happen over a certain
fixed followup period are classified simply as occurring or not occurring. The actual
time that they occur is not used. Presumably if two people have an event - say, a
heart attack - at some point during a one-year period, it is likely that the person
who had a hard attack two days after followup started had more risk factors at the
outset than did the person who had a heart attack after 11 months of followup. Thus
there is information in the time of event which contributes to estimates of the
effects of risk factors.
PROC LIFETEST can take time-of-event into account, but it does not completely
address the problem of studying the effects of multivariate models. You can stratify
on one variable and study the effect of one other variable given that stratification,
but the effects of other variables are not taken into account, and PROC LIFETEST
provides no way of studying interactions of two or more variables.
PROC PHREG essentially accomplishes both things: the outcomes are time- to-event,
and risk factors can be incorporated into a regression model. The estimated
coefficients of the risk factors can be interpreted as the log of the relative hazard
associated with that risk factor.
The PH in PHREG stands for 'proportional hazards'. This is a reference to a
family of survival models that was introduced by David R. Cox in 1972 [Cox, D.R.
(1972) Regression models and life-tables (with discussion), J. Royal Statist. Soc.
Series B, 34, 187-220]. The REG in PHREG stands for regression. Proportional hazards
regression is also frequently called 'Cox regression', and the model described below
is called the 'Cox model'.
The model is usually stated in terms of the hazard function, h(t; X). Here X
denotes a collection of covariates (or risk factors). The model is written as
h(t; X) = h0(t) * exp(b1*X1 + b2*X2 + ... + bp*Xp).
This is a *semiparametric* model. Part of the model, called the base hazard
function h0(t), is unspecified. This part of the model is regarded as being common to
everyone in the followup study, regardless of their covariate values. The other part
of the model,
exp(b1*X1 + b2*X2 + ... + bp*Xp),
represents the way in which the covariates *modify* the underlying base hazard. In
fact, suppose that one of the covariates, say X1, was increased by one unit, and all
the other covariates are left the same. The ratio of the new hazard to the old one is
easy seen to be
exp(b1)
i.e., one may regard b1 as the log of the relative hazard corresponding to a one-unit
increase in the covariate X1.
A positive coefficient for a covariate in the proportional-hazards model
indicates that the relative hazard is greater than 1, and that any increase in that
covariate corresponds to an increase in hazard. Conversely, a negative coefficient
for a given covariate indicates that the relative hazard for a 1-unit increase in that
covariate is less than 1, and that increases in that covariate correspond to
*decreases* in the hazard.
Given a dataset comprised of times to events (or censoring times) and values of
covariates for a group of individuals, PROC PHREG will produce parameter estimates,
standard errors, likelihood statistics, estimates of relative hazards, and confidence
limits.
Below is an example analysis of the heroin addiction/methadone dataset from
Chapter 12 of Der & Everitt:
=================================================================================
FILENAME GRAPH 'gsas.grf' ;
LIBNAME loc '' ;
OPTIONS LINESIZE = 80 MPRINT ;
GOPTIONS
RESET = GLOBAL
ROTATE = PORTRAIT
FTEXT = SWISSB
DEVICE = PS300
GACCESS = SASGASTD
GSFNAME = GRAPH
GSFMODE = REPLACE
GUNIT = PCT BORDER
CBACK = WHITE
HTITLE = 2 HTEXT = 1 ;
*===================================================================== ;
/* Chapter 12 */
footnote "~john-c/5421/chap12.sas &sysdate &systime" ;
data heroin;
infile cards expandtabs;
input id clinic status time prison dose @@;
dose60 = . ;
if dose ge 0 and dose lt 60 then dose60 = 1 ;
if dose ge 60 then dose60 = 2 ;
cards;
1 1 1 428 0 50 132 2 0 633 0 70
2 1 1 275 1 55 133 2 1 661 0 40
[lines omitted ...]
131 2 0 367 0 70 266 1 1 47 0 45
;
run ;
proc phreg data=heroin;
model time * status(0) = prison dose / rl;
strata clinic;
title1 'Data from Chapter 12 of Der & Everitt' ;
title2
'Time to termination of methadone treatment for heroin addiction';
title3 'versus prison (0 or 1) and methadone dose.';
run;
proc stdize data = heroin out = heroin2;
var dose;
run;
data stdd ; set heroin ;
meandose = 59.5588235 ;
stddose = 14.7380326 ;
dosestd = (dose - meandose)/stddose ;
run ;
proc means data = heroin2 ;
var dose ;
title1 'Mean dose from proc stdize' ;
run ;
proc means data = stdd ;
var dosestd ;
title1 'Mean dose from dataset stdd' ;
run ;
proc phreg data = heroin2;
model time * status(0) = prison dose / rl;
strata clinic;
baseline out = phout logsurv = ls loglogs = lls / method = ch;
title1 'Data from Chapter 12 of Der & Everitt' ;
title2
'Time to termination of methadone treatment for heroin addiction';
title3 'versus prison (0 or 1) and methadone dose STANDARDIZED' ;
run;
data phout ; set phout ;
nlogsurv = -ls ;
run ;
symbol1 i = steplj v = square w = 2 h = 1 c = black l = 1 ;
symbol2 i = steplj v = circle w = 2 h = 1 c = gray l = 2 ;
proc gplot data=phout;
plot nlogsurv * time = clinic;
title1 h = 2 'Data from Chapter 12 of Der & Everitt' ;
title2 h = 2 'Plot of -Log(Survival) vs. Clinic' ;
title3 '' ;
run;
===========================================================================================
Data from Chapter 12 of Der & Everitt 1
Time to termination of methadone treatment for heroin addiction
versus prison (0 or 1) and methadone dose.
15:51 Sunday, April 18, 2004
The PHREG Procedure
Model Information
Data Set WORK.HEROIN
Dependent Variable time
Censoring Variable status
Censoring Value(s) 0
Ties Handling BRESLOW
Summary of the Number of Event and Censored Values
Percent
Stratum clinic Total Event Censored Censored
1 1 163 122 41 25.15
2 2 75 28 47 62.67
-------------------------------------------------------------------
Total 238 150 88 36.97
Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Without With
Criterion Covariates Covariates
-2 LOG L 1229.367 1204.859
AIC 1229.367 1208.859
SBC 1229.367 1214.881
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 24.5079 2 <.0001
Score 25.7467 2 <.0001
Wald 25.7764 2 <.0001
~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
Data from Chapter 12 of Der & Everitt 2
Time to termination of methadone treatment for heroin addiction
versus prison (0 or 1) and methadone dose.
15:51 Sunday, April 18, 2004
The PHREG Procedure
Analysis of Maximum Likelihood Estimates
Parameter Standard
Variable DF Estimate Error Chi-Square Pr > ChiSq
prison 1 0.35771 0.16816 4.5247 0.0334
dose 1 -0.02597 0.00553 22.0845 <.0001
Analysis of Maximum Likelihood Estimates
Hazard 95% Hazard Ratio
Variable Ratio Confidence Limits
prison 1.430 1.029 1.988
dose 0.974 0.964 0.985
~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
Mean dose from proc stdize 3
15:51 Sunday, April 18, 2004
The MEANS Procedure
Analysis Variable : dose
N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------
238 -3.01346E-16 1.0000000 -4.0411651 2.7440010
-------------------------------------------------------------------
~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
Mean dose from dataset stdd 4
15:51 Sunday, April 18, 2004
The MEANS Procedure
Analysis Variable : dosestd
N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------
238 1.995637E-9 1.0000000 -4.0411651 2.7440010
-------------------------------------------------------------------
~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
Data from Chapter 12 of Der & Everitt 5
Time to termination of methadone treatment for heroin addiction
versus prison (0 or 1) and methadone dose STANDARDIZED
15:51 Sunday, April 18, 2004
The PHREG Procedure
Model Information
Data Set WORK.HEROIN2
Dependent Variable time
Censoring Variable status
Censoring Value(s) 0
Ties Handling BRESLOW
Summary of the Number of Event and Censored Values
Percent
Stratum clinic Total Event Censored Censored
1 1 163 122 41 25.15
2 2 75 28 47 62.67
-------------------------------------------------------------------
Total 238 150 88 36.97
Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Without With
Criterion Covariates Covariates
-2 LOG L 1229.367 1204.859
AIC 1229.367 1208.859
SBC 1229.367 1214.881
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 24.5079 2 <.0001
Score 25.7467 2 <.0001
Wald 25.7764 2 <.0001
~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
Data from Chapter 12 of Der & Everitt 6
Time to termination of methadone treatment for heroin addiction
versus prison (0 or 1) and methadone dose STANDARDIZED
15:51 Sunday, April 18, 2004
The PHREG Procedure
Analysis of Maximum Likelihood Estimates
Parameter Standard
Variable DF Estimate Error Chi-Square Pr > ChiSq
prison 1 0.35771 0.16816 4.5247 0.0334
dose 1 -0.38282 0.08146 22.0845 <.0001
Analysis of Maximum Likelihood Estimates
Hazard 95% Hazard Ratio
Variable Ratio Confidence Limits
prison 1.430 1.029 1.988
dose 0.682 0.581 0.800
~john-c/5421/chap12.sas 18APR04 15:51
=================================================================================
There are several points in this analysis which require discussion:
1. PROC PHREG SYNTAX:
The MODEL statement in PROC PHREG is of the form
model time * status(0) = covar1 covar2 ... covarp / rl ;
Here 'time' is a variable (named by the programmer, not a SAS reserved
word) which indicates followup time. 'status' is another user-named
variable which indicates censoring status. For example, people who have
the event of interest might have status = 1, while those who are censored
might have status = 0. The value in parenthesis after 'status' indicates
the censoring value.
covar1 covar2 ... covarp are the covariates of interest.
The option 'rl' stands for 'risk limits' - actually, confidence limits
for the relative hazard associated with a given covariate.
There is also a STRATA statement available in PROC PHREG. This has the
effect that in the computations inside PROC PHREG, people are compared
to other people within their own stratum. The syntax is:
proc phreg data = datafile ;
model time * death(0) = age dbp chol cigs / rl ;
strata clinic ;
Stratification in this manner is appropriate for studies in which
it is believed in advance that the stratifying variable is strongly
related to risk, or the stratifying variable is related to the study
design. For example, in a clinical trial, randomization of participants
is often carried out with separate randomization schedules for each
clinic. Since the randomization is stratified (to achieve approximate
balance within clinic), the analysis logically should be stratified by
the same factor.
Note that the printout in the example for the first PHREG analysis
includes
1) the numbers of events and censored observations (non-events)
for each stratum
2) 'Model Fit Statistics', i.e., -2 Log L, AIC, etc.
3) Tests of global null hypothesis (no effect of all the covariates)
4) Parameter estimates, standard errors, Chi-square statistics, and
p-values
5) Estimated hazard ratios and 95% confidence limits
2. PROC STDIZE:
PROC STDIZE is a procedure available in SAS Ver 8 (not in Ver 6) which
replaces a given variable by its standardized version. For example, if X
is the variable, then the standardized version is defined as
X* = (X - mean(X)) / StdDev(x).
The resulting variable X* has mean 0 and standard deviation 1.
This is relevant to PROC PHREG and other regression routines in the following
way. Frequently you want to know what the relative hazard is for a
"realistic" increment in a given covariate. It is frequently the case
that a 1-unit increment is not very useful. For example, the relative
hazard of CHD death associated with a person's cholesterol going from
238 to 239 mg/dl is not very informative and will in any case be very
close to 1. The population standard deviation of cholesterol is on the
order of 25 mg/dl. It is more interesting to know the relative hazard
associated with a 1-standard-deviation increase in cholesterol. Of course
you would find that the relative hazard for a 100 mg/dl increase in
cholesterol is even bigger, but in general an increase of 100 mg/dl is
not common and not easy to achieve by treating a person with cholesterol
lowering drugs. 25 mg/dl increments are large enough to be interesting,
but not unrealistically large.
In the example shown, the standardized version of the variable 'dose'
is given on a dataset output by PROC STDIZE: out = heroin2.
The calculations in PROC STDIZE can be accomplished separately by
another data step as shown in the program.
3. ANALYSIS ON STANDARDIZED VARIABLE
The second run of PROC PHREG uses the dataset heroin2 and the *standardized*
dose variable (from PROC STDIZE). Note that the coefficient of dose
is different from that in the first run of PHREG (with the original non-
standardized dose), but the test statistics (-2 Log L, AIC, chi-square, etc)
are the same. The hazard ratios from the two analyses can be compared:
Original unstandardized dose: Hazard ratio = exp(-0.02597) = 0.974.
Standardized dose : Hazard ratio = exp(-0.38282) = 0.682.
(Note that the standard deviation of the original variable 'dose' was
14.73, and .38282 / .02597 = 14.74.)
4. GRAPHICS OUTPUT FROM PROC PHREG:
The second run of PROC PHREG shown above includes an output statement:
baseline out = phout logsurv = ls loglogs = lls / method = ch ;
The word 'baseline' must be included. 'logsurv = ls' says that the
output data set must include the log(survival) function, and the name
given to it (by the user) here is 'ls'. Similarly, loglogs is
log(-log(survival)), and the name given here to that variable is 'lls'.
The log(survival) function plot is useful for telling whether the
survival distribution is exponential: if it is, the plot will look like a
straight line.
The log(-log(survival)) plot similarly will look like a straight line
if the survival distribution is 'Weibull' (a generalization of the
exponential distribution).
The graphics output file is here named 'phout', and this file serves as
input to the the PROC GPLOT procedure. Note the two SYMBOL statements
preceding PROC GPLOT: these specify that the functions to be graphed are
step functions with a square printed for each event when clinic = 1, and
a circle printed when clinic = 2. The graph may be seen at:
http://www.biostat.umn.edu/~john-c/5421/chap12ls.grf
===========================================================================================
Appended is an interesting analysis of MRFIT data on deaths-from-any-cause within
the first 7 years of followup, versus age and BMI. Here I have coded BMI into 10
deciles, each of size about 1286 men. PROC PHREG does not accept CLASS variables
unfortunately, so I had to create indicator variables for each decile. I used
PROC RANK to define the deciles variable. The syntax for PROC RANK is as follows:
proc rank group = 7 out = outrank ;
var x y z ;
ranks x06 y06 z06 ;
In this case, septiles (7 equal-sized groups) are defined for variables x, y, and z.
I have specified that the resulting septile variables are named x06, y06, and z06.
Each of these variables has an integer value from 0 to 6, corresponding to the values
of the original variable - that is, the values of x in the lowest septile correspond
to x06 = 0, those in the second septile correspond to x06 = 1, etc.. [Why SAS did not
code the output from PROC RANK as 1 to 7 rather than 0 to 6 is beyond me.]
Example program:
=================================================================================
[lines omitted ...]
PROC RANK DATA = SEL GROUP = 10 OUT = OUTRANK ;
VAR BMIBASE ;
RANKS BMI09 ;
PROC MEANS DATA = OUTRANK N MEAN STD ;
CLASS BMI09 ;
VAR DEATH7 BMIBASE ;
TITLE1 'MRFIT Data: deaths and BMI means vs BMI deciles ...' ;
FORMAT BMI09 DECILE. ;
run ;
DATA OUTRANK ;
SET OUTRANK ;
BMI1 = 0 ; BMI2 = 0 ; BMI3 = 0 ; BMI4 = 0 ; BMI5 = 0 ;
BMI6 = 0 ; BMI7 = 0 ; BMI8 = 0 ; BMI9 = 0 ; BMI10 = 0 ;
IF BMI09 EQ 0 THEN BMI1 = 1 ;
IF BMI09 EQ 1 THEN BMI2 = 1 ;
IF BMI09 EQ 2 THEN BMI3 = 1 ;
IF BMI09 EQ 3 THEN BMI4 = 1 ;
IF BMI09 EQ 4 THEN BMI5 = 1 ;
IF BMI09 EQ 5 THEN BMI6 = 1 ;
IF BMI09 EQ 6 THEN BMI7 = 1 ;
IF BMI09 EQ 7 THEN BMI8 = 1 ;
IF BMI09 EQ 8 THEN BMI9 = 1 ;
IF BMI09 EQ 9 THEN BMI10 = 1 ;
RUN ;
PROC PHREG DATA = OUTRANK ;
MODEL FOLLOW7Y * DEATH7(0) = AGE BASECHOL
BMI1 BMI2 BMI3 BMI4 BMI5
BMI6 BMI7 BMI8 BMI9 / rl ;
TITLE1 'MRFIT Data: Death-any-cause versus age, baseline cholesterol, and deciles of BMI';
run ;
endsas ;
===========================================================================================
MRFIT Data: deaths and BMI means vs BMI deciles ... 2
18:35 Sunday, April 18, 2004
BMI09 N Obs Variable N Mean Std Dev
-------------------------------------------------------------
Decile 1 1286 DEATH7 1286 0.0396579 0.1952300
BMIBASE 1286 22.2325776 1.3006656
Decile 2 1287 DEATH7 1287 0.0505051 0.2190698
BMIBASE 1287 24.3086946 0.3581602
Decile 3 1290 DEATH7 1290 0.0372093 0.1893477
BMIBASE 1290 25.3626956 0.2666302
Decile 4 1286 DEATH7 1286 0.0373250 0.1896308
BMIBASE 1286 26.2082527 0.2262083
Decile 5 1284 DEATH7 1284 0.0311526 0.1737978
BMIBASE 1284 26.9993730 0.2326106
Decile 6 1287 DEATH7 1287 0.0349650 0.1837627
BMIBASE 1287 27.8309493 0.2498441
Decile 7 1287 DEATH7 1287 0.0411810 0.1987860
BMIBASE 1287 28.7412654 0.2882927
Decile 8 1286 DEATH7 1286 0.0466563 0.2109837
BMIBASE 1286 29.8439421 0.3525592
Decile 9 1287 DEATH7 1287 0.0466200 0.2109056
BMIBASE 1287 31.3412200 0.5283700
Decile 10 1286 DEATH7 1286 0.0451011 0.2076066
BMIBASE 1286 34.4814611 1.7660754
-------------------------------------------------------------
billings.sas (joanneb) 18APR04 18:35
-------------------------------------------------------------------------------------------
MRFIT Data: Death-any-cause versus age, baseline cholesterol, and deciles of BMI 3
18:35 Sunday, April 18, 2004
The PHREG Procedure
Data Set: WORK.OUTRANK
Dependent Variable: FOLLOW7Y
Censoring Variable: DEATH7
Censoring Value(s): 0
Ties Handling: BRESLOW
Summary of the Number of
Event and Censored Values
Percent
Total Event Censored Censored
12866 528 12338 95.90
Testing Global Null Hypothesis: BETA=0
Without With
Criterion Covariates Covariates Model Chi-Square
-2 LOG L 8876.003 8779.573 96.430 with 11 DF (p=0.0001)
Score . . 94.345 with 11 DF (p=0.0001)
Wald . . 92.007 with 11 DF (p=0.0001)
Analysis of Maximum Likelihood Estimates
Conditional Risk Ratio and
95% Confidence Limits
Parameter Standard Wald Pr > Risk
Variable DF Estimate Error Chi-Square Chi-Square Ratio Lower Upper
AGE 1 0.070027 0.00771 82.46664 0.0001 1.073 1.056 1.089
BASECHOL 1 0.000565 0.00125 0.20515 0.6506 1.001 0.998 1.003
BMI1 1 -0.148122 0.19230 0.59332 0.4411 0.862 0.592 1.257
BMI2 1 -0.011859 0.18082 0.00430 0.9477 0.988 0.693 1.409
BMI3 1 -0.241878 0.19518 1.53568 0.2153 0.785 0.536 1.151
BMI4 1 -0.228584 0.19530 1.36993 0.2418 0.796 0.543 1.167
BMI5 1 -0.415583 0.20569 4.08217 0.0433 0.660 0.441 0.988
BMI6 1 -0.299602 0.19872 2.27297 0.1316 0.741 0.502 1.094
BMI7 1 -0.107250 0.19011 0.31827 0.5726 0.898 0.619 1.304
BMI8 1 0.039266 0.18417 0.04546 0.8312 1.040 0.725 1.492
BMI9 1 0.007799 0.18420 0.00179 0.9662 1.008 0.702 1.446
billings.sas (joanneb) 18APR04 18:35
=================================================================================
It is interesting to note the relationship here between body mass index and
relative hazard (or "Risk Ratio"). Note that BMI10 is omitted from the analysis
since otherwise the set of covariates would be redundant. The "Risk Ratios" for
BMI1, BMI2, ... BMI9 are relative to that of BMI10, which by default would have a
"Risk Ratio" of 1.
The body mass index for the average MRFIT man was 27.7. The "ideal" body mass
index is considered to be between 20 and 25. Values in the range 26 to 29 are considered
"FAT". Which BMI category was at lowest risk? Which was at highest risk? What were
the corresponding average BMIs within those categories ?
-----------------------------------------------------------------------------------------
PROBLEM 1: Use PROC PHREG to analyze the data from the Minnesota Heart Survey
(see notes n54703.015), with death as the outcome and age, smoke, and
sbpav2 as predictors. Test for an interaction between age and sbpav2. Describe
your results.
=========================================================================================
n54703.016 Last update: April 11, 2005.