SURVIVAL ANALYSIS, III: PROC PHREG.                               n54703.016

     PROC LOGISTIC makes it possible to model risk of an event as a function of
covariates, and accommodates both categorical and continuous covariates. The
coefficient estimates that are produced by PROC LOGISTIC can be interpreted as
log(Odds Ratios).

     A limitation with PROC LOGISTIC is that events that may happen over a certain
fixed followup period are classified simply as occurring or not occurring. The actual
time that they occur is not used.  Presumably if two people have an event - say, a
heart attack - at some point during a one-year period, it is likely that the person
who had a hard attack two days after followup started had more risk factors at the
outset than did the person who had a heart attack after 11 months of followup.  Thus
there is information in the time of event which contributes to estimates of the
effects of risk factors.

     PROC LIFETEST can take time-of-event into account, but it does not completely
address the problem of studying the effects of multivariate models.  You can stratify
on one variable and study the effect of one other variable given that stratification,
but the effects of other variables are not taken into account, and PROC LIFETEST
provides no way of studying interactions of two or more variables.

     PROC PHREG essentially accomplishes both things: the outcomes are time- to-event,
and risk factors can be incorporated into a regression model.  The estimated
coefficients of the risk factors can be interpreted as the log of the relative hazard
associated with that risk factor.

     The PH in PHREG stands for 'proportional hazards'.  This is a reference to a
family of survival models that was introduced by David R. Cox in 1972 [Cox, D.R.
(1972) Regression models and life-tables (with discussion), J. Royal Statist. Soc.
Series B, 34, 187-220].  The REG in PHREG stands for regression.  Proportional hazards
regression is also frequently called 'Cox regression', and the model described below
is called the 'Cox model'.

     The model is usually stated in terms of the hazard function, h(t; X). Here X
denotes a collection of covariates (or risk factors).  The model is written as

              h(t; X) = h0(t) * exp(b1*X1 + b2*X2 + ... + bp*Xp).

     This is a *semiparametric* model.  Part of the model, called the base hazard
function h0(t), is unspecified.  This part of the model is regarded as being common to
everyone in the followup study, regardless of their covariate values.  The other part
of the model,

              exp(b1*X1 + b2*X2 + ... + bp*Xp),

represents the way in which the covariates *modify* the underlying base hazard.  In
fact, suppose that one of the covariates, say X1, was increased by one unit, and all
the other covariates are left the same.  The ratio of the new hazard to the old one is
easy seen to be

              exp(b1)

i.e., one may regard b1 as the log of the relative hazard corresponding to a one-unit
increase in the covariate X1.

     A positive coefficient for a covariate in the proportional-hazards model
indicates that the relative hazard is greater than 1, and that any increase in that
covariate corresponds to an increase in hazard.  Conversely, a negative coefficient
for a given covariate indicates that the relative hazard for a 1-unit increase in that
covariate is less than 1, and that increases in that covariate correspond to
*decreases* in the hazard.

     Given a dataset comprised of times to events (or censoring times) and values of
covariates for a group of individuals, PROC PHREG will produce parameter estimates,
standard errors, likelihood statistics, estimates of relative hazards, and confidence
limits.

     Below is an example analysis of the heroin addiction/methadone dataset from
Chapter 12 of Der & Everitt:

=================================================================================

FILENAME GRAPH 'gsas.grf' ;
LIBNAME  loc '' ;

OPTIONS  LINESIZE = 80 MPRINT ;

GOPTIONS
         RESET = GLOBAL
         ROTATE = PORTRAIT
         FTEXT = SWISSB
         DEVICE = PS300
         GACCESS = SASGASTD
         GSFNAME = GRAPH
         GSFMODE = REPLACE
         GUNIT = PCT BORDER
         CBACK = WHITE
         HTITLE = 2 HTEXT = 1 ;

*===================================================================== ;        

/* Chapter 12 */

footnote "~john-c/5421/chap12.sas &sysdate &systime" ;

data heroin;
  infile cards expandtabs;
  input id clinic status time prison dose @@;

dose60 = . ;
if dose ge 0 and dose lt 60 then dose60 = 1 ;
if dose ge 60 then dose60 = 2 ;

cards;
  1 1 1 428	0  50 132 2 0  633	0	70
  2 1 1 275	1  55 133 2 1  661	0	40

  [lines omitted ...]

131 2 0 367	0  70 266 1 1   47	0	45
;
run ;
 
proc phreg data=heroin;
     model time * status(0) = prison dose / rl;
     strata clinic;
title1 'Data from Chapter 12 of Der & Everitt' ;
title2
 'Time to termination of methadone treatment for heroin addiction';
title3 'versus prison (0 or 1) and methadone dose.';
run;

proc stdize data = heroin out = heroin2;
     var dose;
run;

data stdd ; set heroin ;
     meandose = 59.5588235 ;
     stddose = 14.7380326 ;
     dosestd = (dose - meandose)/stddose ;
run ;

proc means data = heroin2 ;
     var dose ;
title1 'Mean dose from proc stdize' ;
run ;

proc means data = stdd ;
     var dosestd ;
title1 'Mean dose from dataset stdd' ;
run ;

proc phreg data = heroin2;
     model time * status(0) = prison dose / rl;
     strata clinic;
     baseline out = phout logsurv = ls loglogs = lls / method = ch;
title1 'Data from Chapter 12 of Der & Everitt' ;
title2
 'Time to termination of methadone treatment for heroin addiction';
title3 'versus prison (0 or 1) and methadone dose STANDARDIZED' ;
run;

data phout ; set phout ;
     nlogsurv = -ls ;
run ;

symbol1 i = steplj  v = square w = 2 h = 1 c = black l = 1 ;
symbol2 i = steplj  v = circle w = 2 h = 1 c = gray  l = 2 ;

proc gplot data=phout;
     plot nlogsurv * time = clinic;
title1 h = 2 'Data from Chapter 12 of Der & Everitt' ;
title2 h = 2 'Plot of -Log(Survival) vs. Clinic' ;
title3 '' ;
run;

===========================================================================================
                     Data from Chapter 12 of Der & Everitt                     1
        Time to termination of methadone treatment for heroin addiction
                   versus prison (0 or 1) and methadone dose.
                                                    15:51 Sunday, April 18, 2004

                              The PHREG Procedure

                               Model Information

                      Data Set                 WORK.HEROIN
                      Dependent Variable       time       
                      Censoring Variable       status     
                      Censoring Value(s)       0          
                      Ties Handling            BRESLOW    


               Summary of the Number of Event and Censored Values
 
                                                                  Percent
      Stratum    clinic         Total       Event    Censored    Censored

            1    1                163         122          41       25.15
            2    2                 75          28          47       62.67
      -------------------------------------------------------------------
        Total                     238         150          88       36.97


                               Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.          


                             Model Fit Statistics
 
                                     Without           With
                    Criterion     Covariates     Covariates

                    -2 LOG L        1229.367       1204.859
                    AIC             1229.367       1208.859
                    SBC             1229.367       1214.881


                    Testing Global Null Hypothesis: BETA=0
 
            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio        24.5079        2         <.0001
            Score                   25.7467        2         <.0001
            Wald                    25.7764        2         <.0001

 
                     ~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
                      Data from Chapter 12 of Der & Everitt                     2
        Time to termination of methadone treatment for heroin addiction
                   versus prison (0 or 1) and methadone dose.
                                                    15:51 Sunday, April 18, 2004

                              The PHREG Procedure

                    Analysis of Maximum Likelihood Estimates
 
                        Parameter      Standard
     Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq

     prison       1       0.35771       0.16816        4.5247        0.0334
     dose         1      -0.02597       0.00553       22.0845        <.0001

                    Analysis of Maximum Likelihood Estimates
 
                                Hazard      95% Hazard Ratio
                  Variable       Ratio      Confidence Limits

                  prison         1.430       1.029       1.988
                  dose           0.974       0.964       0.985
 
 
                     ~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
                            Mean dose from proc stdize                          3
                                                    15:51 Sunday, April 18, 2004

                              The MEANS Procedure

                           Analysis Variable : dose 
 
        N            Mean         Std Dev         Minimum         Maximum
      -------------------------------------------------------------------
      238    -3.01346E-16       1.0000000      -4.0411651       2.7440010
      -------------------------------------------------------------------
 
                     ~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
                           Mean dose from dataset stdd                          4
                                                    15:51 Sunday, April 18, 2004

                              The MEANS Procedure

                          Analysis Variable : dosestd 
 
        N            Mean         Std Dev         Minimum         Maximum
      -------------------------------------------------------------------
      238     1.995637E-9       1.0000000      -4.0411651       2.7440010
      -------------------------------------------------------------------
 
 
 
                     ~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
                      Data from Chapter 12 of Der & Everitt                     5
        Time to termination of methadone treatment for heroin addiction
             versus prison (0 or 1) and methadone dose STANDARDIZED
                                                    15:51 Sunday, April 18, 2004

                              The PHREG Procedure

                               Model Information

                     Data Set                 WORK.HEROIN2
                     Dependent Variable       time        
                     Censoring Variable       status      
                     Censoring Value(s)       0           
                     Ties Handling            BRESLOW     


               Summary of the Number of Event and Censored Values
 
                                                                  Percent
      Stratum    clinic         Total       Event    Censored    Censored

            1    1                163         122          41       25.15
            2    2                 75          28          47       62.67
      -------------------------------------------------------------------
        Total                     238         150          88       36.97


                               Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.          


                             Model Fit Statistics
 
                                     Without           With
                    Criterion     Covariates     Covariates

                    -2 LOG L        1229.367       1204.859
                    AIC             1229.367       1208.859
                    SBC             1229.367       1214.881


                    Testing Global Null Hypothesis: BETA=0
 
            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio        24.5079        2         <.0001
            Score                   25.7467        2         <.0001
            Wald                    25.7764        2         <.0001

 
                     ~john-c/5421/chap12.sas 18APR04 15:51
*-----------------------------------------------------------------------------------------;
                      Data from Chapter 12 of Der & Everitt                     6
        Time to termination of methadone treatment for heroin addiction
             versus prison (0 or 1) and methadone dose STANDARDIZED
                                                    15:51 Sunday, April 18, 2004

                              The PHREG Procedure

                    Analysis of Maximum Likelihood Estimates
 
                        Parameter      Standard
     Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq

     prison       1       0.35771       0.16816        4.5247        0.0334
     dose         1      -0.38282       0.08146       22.0845        <.0001

                    Analysis of Maximum Likelihood Estimates
 
                                Hazard      95% Hazard Ratio
                  Variable       Ratio      Confidence Limits

                  prison         1.430       1.029       1.988
                  dose           0.682       0.581       0.800
 
 
                     ~john-c/5421/chap12.sas 18APR04 15:51
=================================================================================

     There are several points in this analysis which require discussion:

1.  PROC PHREG SYNTAX:

     The MODEL statement in PROC PHREG is of the form

         model time * status(0) = covar1 covar2 ... covarp / rl ;

     Here 'time' is a variable (named by the programmer, not a SAS reserved
     word) which indicates followup time.  'status' is another user-named
     variable which indicates censoring status.  For example, people who have
     the event of interest might have status = 1, while those who are censored
     might have status = 0.  The value in parenthesis after 'status' indicates
     the censoring value.

     covar1 covar2 ... covarp are the covariates of interest.

     The option 'rl' stands for 'risk limits' - actually, confidence limits
     for the relative hazard associated with a given covariate.

     There is also a STRATA statement available in PROC PHREG.  This has the
     effect that in the computations inside PROC PHREG, people are compared
     to other people within their own stratum.  The syntax is:

        proc phreg data = datafile ;
             model time * death(0) = age dbp chol cigs / rl ;
             strata clinic ;

     Stratification in this manner is appropriate for studies in which
     it is believed in advance that the stratifying variable is strongly
     related to risk, or the stratifying variable is related to the study
     design.  For example, in a clinical trial, randomization of participants
     is often carried out with separate randomization schedules for each
     clinic.  Since the randomization is stratified (to achieve approximate
     balance within clinic), the analysis logically should be stratified by
     the same factor.

     Note that the printout in the example for the first PHREG analysis
     includes 

         1)  the numbers of events and censored observations (non-events)
             for each stratum

         2)  'Model Fit Statistics', i.e., -2 Log L, AIC, etc.

         3)  Tests of global null hypothesis (no effect of all the covariates)

         4)  Parameter estimates, standard errors, Chi-square statistics, and
             p-values

         5)  Estimated hazard ratios and 95% confidence limits

2.  PROC STDIZE:

     PROC STDIZE is a procedure available in SAS Ver 8 (not in Ver 6) which
     replaces a given variable by its standardized version.  For example, if X
     is the variable, then the standardized version is defined as

             X* = (X - mean(X)) / StdDev(x).

     The resulting variable X* has mean 0 and standard deviation 1.

     This is relevant to PROC PHREG and other regression routines in the following
     way.  Frequently you want to know what the relative hazard is for a
     "realistic" increment in a given covariate.  It is frequently the case
     that a 1-unit increment is not very useful.  For example, the relative
     hazard of CHD death associated with a person's cholesterol going from
     238 to 239 mg/dl is not very informative and will in any case be very
     close to 1.  The population standard deviation of cholesterol is on the
     order of 25 mg/dl.  It is more interesting to know the relative hazard
     associated with a 1-standard-deviation increase in cholesterol.  Of course
     you would find that the relative hazard for a 100 mg/dl increase in
     cholesterol is even bigger, but in general an increase of 100 mg/dl is
     not common and not easy to achieve by treating a person with cholesterol
     lowering drugs.  25 mg/dl increments are large enough to be interesting,
     but not unrealistically large.

     In the example shown, the standardized version of the variable 'dose'
     is given on a dataset output by PROC STDIZE: out = heroin2.

     The calculations in PROC STDIZE can be accomplished separately by
     another data step as shown in the program.


3.  ANALYSIS ON STANDARDIZED VARIABLE

    The second run of PROC PHREG uses the dataset heroin2 and the *standardized*
    dose variable (from PROC STDIZE).  Note that the coefficient of dose
    is different from that in the first run of PHREG (with the original non-
    standardized dose), but the test statistics (-2 Log L, AIC, chi-square, etc)
    are the same.  The hazard ratios from the two analyses can be compared:

        Original unstandardized dose:  Hazard ratio = exp(-0.02597) = 0.974.

        Standardized dose           :  Hazard ratio = exp(-0.38282) = 0.682.

     (Note that the standard deviation of the original variable 'dose' was
      14.73, and .38282 / .02597 = 14.74.)

4.  GRAPHICS OUTPUT FROM PROC PHREG:

    The second run of PROC PHREG shown above includes an output statement:

        baseline out = phout logsurv = ls loglogs = lls / method = ch ;

    The word 'baseline' must be included.  'logsurv = ls' says that the
    output data set must include the log(survival) function, and the name
    given to it (by the user) here is 'ls'.  Similarly, loglogs is
    log(-log(survival)), and the name given here to that variable is 'lls'.
    The log(survival) function plot is useful for telling whether the
    survival distribution is exponential: if it is, the plot will look like a 
    straight line.

    The log(-log(survival)) plot similarly will look like a straight line
    if the survival distribution is 'Weibull' (a generalization of the
    exponential distribution).

    The graphics output file is here named 'phout', and this file serves as
    input to the the PROC GPLOT procedure.  Note the two SYMBOL statements
    preceding PROC GPLOT: these specify that the functions to be graphed are
    step functions with a square printed for each event when clinic = 1, and
    a circle printed when clinic = 2.  The graph may be seen at:

             http://www.biostat.umn.edu/~john-c/5421/chap12ls.grf


===========================================================================================

     Appended is an interesting analysis of MRFIT data on deaths-from-any-cause within
the first 7 years of followup, versus age and BMI.  Here I have coded BMI into 10
deciles, each of size about 1286 men.  PROC PHREG does not accept CLASS variables
unfortunately, so I had to create indicator variables for each decile.  I used
PROC RANK to define the deciles variable.  The syntax for PROC RANK is as follows:

     proc rank group = 7 out = outrank ;
          var  x y z ;
          ranks x06 y06 z06 ;

     In this case, septiles (7 equal-sized groups) are defined for variables x, y, and z.
I have specified that the resulting septile variables are named x06, y06, and z06.
Each of these variables has an integer value from 0 to 6, corresponding to the values
of the original variable - that is, the values of x in the lowest septile correspond
to x06 = 0, those in the second septile correspond to x06 = 1, etc..  [Why SAS did not
code the output from PROC RANK as 1 to 7 rather than 0 to 6 is beyond me.]

     Example program:
=================================================================================
[lines omitted ...]

PROC RANK DATA = SEL GROUP = 10 OUT = OUTRANK ;
     VAR BMIBASE ;
     RANKS BMI09 ;

PROC MEANS DATA = OUTRANK N MEAN STD ;
     CLASS BMI09 ;
     VAR DEATH7 BMIBASE ;
TITLE1 'MRFIT Data: deaths and BMI means vs BMI deciles ...' ;
FORMAT BMI09 DECILE. ;
run ;

DATA OUTRANK ;
     SET OUTRANK ;
     BMI1 = 0 ; BMI2 = 0 ; BMI3 = 0 ; BMI4 = 0 ; BMI5 = 0 ;
     BMI6 = 0 ; BMI7 = 0 ; BMI8 = 0 ; BMI9 = 0 ; BMI10 = 0 ;
     IF BMI09 EQ 0 THEN BMI1 = 1 ;
     IF BMI09 EQ 1 THEN BMI2 = 1 ;
     IF BMI09 EQ 2 THEN BMI3 = 1 ;
     IF BMI09 EQ 3 THEN BMI4 = 1 ;
     IF BMI09 EQ 4 THEN BMI5 = 1 ;
     IF BMI09 EQ 5 THEN BMI6 = 1 ;
     IF BMI09 EQ 6 THEN BMI7 = 1 ;
     IF BMI09 EQ 7 THEN BMI8 = 1 ;
     IF BMI09 EQ 8 THEN BMI9 = 1 ;
     IF BMI09 EQ 9 THEN BMI10 = 1 ;
RUN ;

 PROC PHREG DATA = OUTRANK ;
      MODEL FOLLOW7Y * DEATH7(0) = AGE BASECHOL
                                   BMI1 BMI2 BMI3 BMI4 BMI5
                                   BMI6 BMI7 BMI8 BMI9 / rl ;
 TITLE1 'MRFIT Data: Death-any-cause versus age, baseline cholesterol, and deciles of BMI';
 run ;

 endsas ;
===========================================================================================

                         MRFIT Data: deaths and BMI means vs BMI deciles ...                        2
                                                                        18:35 Sunday, April 18, 2004

                       BMI09  N Obs  Variable      N          Mean       Std Dev
                   -------------------------------------------------------------
                   Decile 1    1286  DEATH7     1286     0.0396579     0.1952300
                                     BMIBASE    1286    22.2325776     1.3006656

                   Decile 2    1287  DEATH7     1287     0.0505051     0.2190698
                                     BMIBASE    1287    24.3086946     0.3581602

                   Decile 3    1290  DEATH7     1290     0.0372093     0.1893477
                                     BMIBASE    1290    25.3626956     0.2666302

                   Decile 4    1286  DEATH7     1286     0.0373250     0.1896308
                                     BMIBASE    1286    26.2082527     0.2262083

                   Decile 5    1284  DEATH7     1284     0.0311526     0.1737978
                                     BMIBASE    1284    26.9993730     0.2326106

                   Decile 6    1287  DEATH7     1287     0.0349650     0.1837627
                                     BMIBASE    1287    27.8309493     0.2498441

                   Decile 7    1287  DEATH7     1287     0.0411810     0.1987860
                                     BMIBASE    1287    28.7412654     0.2882927

                   Decile 8    1286  DEATH7     1286     0.0466563     0.2109837
                                     BMIBASE    1286    29.8439421     0.3525592

                   Decile 9    1287  DEATH7     1287     0.0466200     0.2109056
                                     BMIBASE    1287    31.3412200     0.5283700

                   Decile 10   1286  DEATH7     1286     0.0451011     0.2076066
                                     BMIBASE    1286    34.4814611     1.7660754
                   -------------------------------------------------------------
 
                                billings.sas (joanneb) 18APR04 18:35
-------------------------------------------------------------------------------------------
           MRFIT Data: Death-any-cause versus age, baseline cholesterol, and deciles of BMI         3
                                                                        18:35 Sunday, April 18, 2004

                                        The PHREG Procedure

               Data Set: WORK.OUTRANK
               Dependent Variable: FOLLOW7Y
               Censoring Variable: DEATH7  
               Censoring Value(s): 0  
               Ties Handling: BRESLOW 


                                      Summary of the Number of
                                      Event and Censored Values
 
                                                                 Percent
                               Total       Event    Censored    Censored

                               12866         528       12338       95.90


                               Testing Global Null Hypothesis: BETA=0
 
                             Without        With   
              Criterion    Covariates    Covariates    Model Chi-Square

              -2 LOG L       8876.003      8779.573      96.430 with 11 DF (p=0.0001) 
              Score              .             .         94.345 with 11 DF (p=0.0001) 
              Wald               .             .         92.007 with 11 DF (p=0.0001) 


                              Analysis of Maximum Likelihood Estimates
 
                                                                        Conditional Risk Ratio and
                                                                           95% Confidence Limits
 
                   Parameter     Standard      Wald         Pr >          Risk
Variable   DF       Estimate       Error    Chi-Square   Chi-Square      Ratio      Lower      Upper

AGE         1       0.070027      0.00771     82.46664       0.0001      1.073      1.056      1.089
BASECHOL    1       0.000565      0.00125      0.20515       0.6506      1.001      0.998      1.003
BMI1        1      -0.148122      0.19230      0.59332       0.4411      0.862      0.592      1.257
BMI2        1      -0.011859      0.18082      0.00430       0.9477      0.988      0.693      1.409
BMI3        1      -0.241878      0.19518      1.53568       0.2153      0.785      0.536      1.151
BMI4        1      -0.228584      0.19530      1.36993       0.2418      0.796      0.543      1.167
BMI5        1      -0.415583      0.20569      4.08217       0.0433      0.660      0.441      0.988
BMI6        1      -0.299602      0.19872      2.27297       0.1316      0.741      0.502      1.094
BMI7        1      -0.107250      0.19011      0.31827       0.5726      0.898      0.619      1.304
BMI8        1       0.039266      0.18417      0.04546       0.8312      1.040      0.725      1.492
BMI9        1       0.007799      0.18420      0.00179       0.9662      1.008      0.702      1.446

 
                                billings.sas (joanneb) 18APR04 18:35
=================================================================================

     It is interesting to note the relationship here between body mass index and
relative hazard (or "Risk Ratio").  Note that BMI10 is omitted from the analysis
since otherwise the set of covariates would be redundant.  The "Risk Ratios" for
BMI1, BMI2, ... BMI9 are relative to that of BMI10, which by default would have a
"Risk Ratio" of 1.

     The body mass index for the average MRFIT man was 27.7.  The "ideal" body mass
index is considered to be between 20 and 25.  Values in the range 26 to 29 are considered
"FAT".  Which BMI category was at lowest risk?  Which was at highest risk?  What were
the corresponding average BMIs within those categories ?

-----------------------------------------------------------------------------------------

 PROBLEM 1:  Use PROC PHREG to analyze the data from the Minnesota Heart Survey
 (see notes n54703.015), with death as the outcome and age, smoke, and		
 sbpav2 as predictors.  Test for an interaction between age and sbpav2.  Describe
 your results.

=========================================================================================
n54703.016  Last update: April 11, 2005.