SURVIVAL ANALYSIS, I: PROC LIFETEST                             n54703.014

     In some studies, the outcome simply indicates whether or not an event
occurred: for example, death after a period of treatment for a disease
condition.  But frequently more information is available than just whether
the event occurred: the actual *time* of the event is also of interest.
In general it is better to survive a long time than to survive for a short time,
and it is useful to know, e.g., that drug A tends to keep you alive longer than drug B.

     A typical survival study may start with a cohort of people who are followed
for 5 years.  Some people may be randomized to treatment with a medication, while
others are assigned to use a placebo.  Some people die before the 5 years of
followup are completed.  Most survive the entire time.  Deaths are counted as
the events of interest, and the *time of event* is the primary outcome.  People
who survive all 5 years, or who are last seen alive before then, are described
as *censored*.  The assumption is that censored people are still at risk of
dying after their last contact with the study, and that in fact they will
eventually die, but the investigators will not be able to determine the time of
death.  All the investigators know is the vital status at the end of 5 years, or
at the time the person was last seen in the study.

     A simple datafile for such a study might have the following structure:

----------------------------------------------------------------------------------

    Subject       Group     Time last seen       Censoring Status
   ---------     -------   ----------------     ------------------
        1           A             5.0                    1
        2           B             3.1                    0
        3           B             4.8                    0
        4           B             5.0                    1
        5           A             3.0                    1


----------------------------------------------------------------------------------

     People whose censoring status is 1 are alive at the last time they are seen
in the study.  Thus in the example above, Subjects 1 and 4 survived all 5 years
of followup.  Subject 5 survived three years, and that was the time of last
contact.  Subject 5 may have died some time between year 3 and year 5 or at some
time after year 5.  All the investigators know is that Subject 5 was alive at
year 3.  Subjects 2 and 3 have censoring status = 0, meaning that they died after
(respectively) 3.1 and 4.8 years of followup.
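
     As a minimal sketch, the five records above could be entered directly into
a SAS data set with a DATALINES block.  The data set name TOY and the variable
names SUBJECT, GROUP, LASTSEEN and CENSOR are made up for this illustration.

----------------------------------------------------------------------------------

DATA TOY;
  INPUT SUBJECT GROUP $ LASTSEEN CENSOR;
  * CENSOR = 1 means alive when last seen (censored);
  * CENSOR = 0 means died at time LASTSEEN;
DATALINES;
1 A 5.0 1
2 B 3.1 0
3 B 4.8 0
4 B 5.0 1
5 A 3.0 1
;
RUN;

PROC PRINT DATA = TOY;
RUN;

----------------------------------------------------------------------------------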

     A real example of such a datafile is the Minnesota Heart Survey file
(see notes n54703.013).  Recall the input statement for variables on this file:

----------------------------------------------------------------------------------

DATA HEART;
INFILE 'mwheart.data';

INPUT
@1   ID        12.
@14  AGE        2.
@17  SEX        1.
@19  ENTRYDAT   mmddyy8.
@28  DTHDATE    mmddyy8.
@37  CAUSE      5.1
@43  CHOL       3.
@47  HDL        2.
@50  BMI        5.2
@56  SMOKE      1.
@58  CIGSDAY    2.
@61  THIOC      3.
@65  EVERBPRX   1.
@67  NOWBPRX    1.
@69  DBPAV2     3.
@72  SBPAV2     3.
@76  EDUYRS     2.
;
 
----------------------------------------------------------------------------------

     There are two dates on the file: ENTRYDAT and DTHDATE.  These are both
recorded in SAS as the number of days since January 1, 1960.  For people who
did not die during the course of the study, DTHDATE is missing.  For people
who died during the study, the *difference* between DTHDATE and ENTRYDAT is
the time to death (in days).  As noted in the description, the last date of
followup in this study was July 1, 1992.  For people who did not die, the
followup time is the difference between July 1, 1992 and ENTRYDAT.

     This gives you sufficient information to construct the variables which
are essential for survival analysis, as follows:



----------------------------------------------------------------------------------

     LASTFOLL = MDY(07, 01, 92) ;
     FOLLDAYS = DTHDATE - ENTRYDAT ;
     IF DTHDATE = . THEN FOLLDAYS = LASTFOLL - ENTRYDAT ;
     FOLLYRS = FOLLDAYS / 365.25 ;

     CENSOR = 1 ;
     IF DTHDATE NE . THEN CENSOR = 0 ;

     DEAD = 1 - CENSOR ;

----------------------------------------------------------------------------------

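     Note here that MDY(07, 01, 92) is a call to a SAS function which returns
the SAS date value for July 1, 1992, i.e., the number of days from January 1,
1960 to July 1, 1992.  The two-digit year 92 is read as 1992 under the usual
settings of the YEARCUTOFF= system option; writing MDY(07, 01, 1992) avoids
any ambiguity.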
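
     As a quick check, a DATA _NULL_ step along the following lines writes the
value to the SAS log, both as a raw day count and formatted as a date.  This is
only a sketch; the four-digit year is used so that the result does not depend on
how SAS interprets two-digit years (the YEARCUTOFF= system option).

----------------------------------------------------------------------------------

DATA _NULL_;
  LASTFOLL = MDY(7, 1, 1992);
  * From 1/1/1960 to 1/1/1992 is 32 years, 8 of them leap years: 11688 days;
  * From 1/1/1992 to 7/1/1992 is another 182 days, so LASTFOLL is 11870;
  PUT LASTFOLL= ;
  PUT LASTFOLL= MMDDYY10.;
RUN;

----------------------------------------------------------------------------------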

     Note that followup time, in the form of either FOLLDAYS or FOLLYRS, is
defined for each person in the study, regardless of whether they survived or
died.

     Note that the variable CENSOR is defined to be 1 if the person did not
die, and is 0 if the person did die.
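
     Before running a survival analysis it is worth checking the derived
variables.  The sketch below assumes the statements above have been added to
the HEART data step.  It tabulates DEAD by SEX and summarizes the followup
times; a negative minimum or a nonzero NMISS for FOLLDAYS would point to records
with a problem in the dates.

----------------------------------------------------------------------------------

PROC FREQ DATA = HEART;
  TABLES SEX * DEAD / NOROW NOCOL NOPERCENT;
RUN;

PROC MEANS DATA = HEART N NMISS MIN MEAN MAX;
  VAR FOLLDAYS FOLLYRS;
RUN;

----------------------------------------------------------------------------------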

     Below are the program and the printout for a simple PROC LIFETEST analysis
of survival, by gender:

=================================================================================

*   MWHEART.SAS;
*   Reads MWHEART.DATA file (485 cases, 17 vars) randomly selected from;
*   4086 cases of the Mid-West Heart study;

OPTIONS LINESIZE = 80 CENTER PAGESIZE = 58  NUMBER LABEL;
TITLE 'Selected cases from the Mid-West Heart study conducted 1980-82 by';
TITLE2 'the Division of Epidemiology, SPH U. Minnesota';
TITLE3 'Mortality follow-up through 7/1/92 is based on National Death Index';

PROC FORMAT;
 VALUE sexfmt  1='M'  2='F';
 VALUE ynfmt   1='Y'  2='N';
 VALUE smkfmt  1='current'  2='exsmoker'  3='nonsmoker';
 
DATA HEART;
INFILE 'mwheart.data';

INPUT
@1   ID        12.
@14  AGE        2.
@17  SEX        1.
@19  ENTRYDAT   mmddyy8.
@28  DTHDATE    mmddyy8.
@37  CAUSE      5.1
@43  CHOL       3.
@47  HDL        2.
@50  BMI        5.2
@56  SMOKE      1.
@58  CIGSDAY    2.
@61  THIOC      3.
@65  EVERBPRX   1.
@67  NOWBPRX    1.
@69  DBPAV2     3.
@72  SBPAV2     3.
@76  EDUYRS     2.
;
 
* For survival analysis need: censor = censoring indicator;
* Also need followup time: FOLLDAYS (in days) or FOLLYRS (in years);
* Compute days to death and convert to years;

*---------------------------------------------------------------------;

 LASTFOLL = MDY(07, 01, 92) ;
 FOLLDAYS = DTHDATE - ENTRYDAT ;
 IF DTHDATE = . THEN FOLLDAYS = LASTFOLL - ENTRYDAT ;
 FOLLYRS = FOLLDAYS / 365.25 ;

 CENSOR = 1 ;
 IF DTHDATE NE . THEN CENSOR = 0 ;

 DEAD = 1 - CENSOR ;

LABEL
 ID       = 'Identifying sequential number'
 AGE      = 'age at entry'
 SEX      = 'Gender: 1=M, 2=F'
 ENTRYDAT = 'Date of entry interview'
 DTHDATE  = 'Date of death'
 CAUSE    = 'ICD-9 code cause of death XXX.X'
 CHOL     = 'Serum total cholesterol (mg/dl)'
 HDL      = 'HDL cholesterol (mg/dl)'
 BMI      = 'Body Mass Index (function of height and weight)'
 SMOKE    = 'smoking status'
 CIGSDAY  = 'Cigarettes smoked per day'
 THIOC    = 'Thiocyanate level (indicator of smoking)'
 EVERBPRX = 'Ever use BP med: 1=Y, 2=N'
 NOWBPRX  = 'Now use BP med: 1=Y, 2=N'
 DBPAV2   = 'Diastolic Blood Pressure (mmHg) - ave. of two readings.'
 SBPAV2   = 'Systolic blood pressure (mmHg) - average of two readings'
 EDUYRS   = 'Years of education'
 CENSOR   = 'Survival censoring status: 1 = censored, 0 = not censored'
 DEAD     = 'Vital status: 0 = alive at end of study, 1 = dead'
 LASTFOLL = 'Date of last followup: July 1, 1992'
 FOLLDAYS = 'Followup time: days until death/last followup (if surv.)'
 FOLLYRS  = 'Followup time: years until death/last followup (if surv.)';

FORMAT SEX sexfmt. EVERBPRX NOWBPRX ynfmt. SMOKE smkfmt.;
		
********************************** END DATA STEP ****************************;

proc lifetest data = heart outsurv = surcurve ;
     time follyrs * censor(1) ;
     strata sex ;
title1 'PROC LIFETEST analysis of MWHEART data, by Sex' ;

data surcurve ;
     set surcurve ;
     if _censor_ ne 0 then delete ;

proc print data = surcurve ;
title1 'Print of output file from PROC LIFETEST.' ;

proc plot data = surcurve ;
     plot survival * follyrs = sex ;
title1 'PROC LIFETEST Analysis of MWHEART data, by Gender' ;
title2 'Survival Proportions versus Sex' ;
endsas ;
----------------------------------------------------------------------------------


                 PROC LIFETEST analysis of MWHEART data, by Sex                1
                                                    20:23 Tuesday, April 6, 2004

                             The LIFETEST Procedure

                              Stratum 1: SEX = F 
 
                        Product-Limit Survival Estimates
 
                                          Survival
                                          Standard     Number      Number 
      FOLLYRS     Survival    Failure      Error       Failed       Left  

       0.0000       1.0000           0           0        0         263   
       0.5996       0.9962     0.00380     0.00380        1         262   
       0.8652       0.9924     0.00760     0.00536        2         261   
       1.7577       0.9886      0.0114     0.00655        3         260   
       3.8494       0.9848      0.0152     0.00755        4         259   
       5.7385       0.9810      0.0190     0.00842        5         258   
       8.8022       0.9772      0.0228     0.00921        6         257   
       9.0623       0.9734      0.0266     0.00993        7         256   
      10.0260*           .           .           .        7         255   
      10.0671*           .           .           .        7         254   
      10.0753*           .           .           .        7         253   

      *******          OBSERVATIONS DELETED        ********

      12.2847*           .           .           .        7           4   
      12.3121*           .           .           .        7           3   
      12.3203*           .           .           .        7           2   
      12.3231*           .           .           .        7           1   
      12.3258*           .           .           .        7           0   

           NOTE: The marked survival times are censored observations.


                  PROC LIFETEST analysis of MWHEART data, by Sex                7
                                                    20:23 Tuesday, April 6, 2004

                             The LIFETEST Procedure

                  Summary Statistics for Time Variable FOLLYRS

                               Quartile Estimates
 
                               Point     95% Confidence Interval
                  Percent    Estimate      [Lower      Upper)

                       75       .           .           .    
                       50       .           .           .    
                       25       .           .           .    


                               Mean    Standard Error

                             8.9377            0.0618

NOTE: The mean survival time and its standard error were underestimated because 
 the largest observation was censored and the estimation was restricted to the 
                              largest event time.


                  PROC LIFETEST analysis of MWHEART data, by Sex                8
                                                    20:23 Tuesday, April 6, 2004

                             The LIFETEST Procedure

                              Stratum 2: SEX = M 
 
                        Product-Limit Survival Estimates
 
                                          Survival
                                          Standard     Number      Number 
      FOLLYRS     Survival    Failure      Error       Failed       Left  

       0.0000       1.0000           0           0        0         221   
       1.9411       0.9955     0.00452     0.00451        1         220   
       2.8720       0.9910     0.00905     0.00637        2         219   
       3.1814            .           .           .        3         218   
       3.1814       0.9819      0.0181     0.00897        4         217   
       3.5483       0.9774      0.0226      0.0100        5         216   
       5.1417       0.9729      0.0271      0.0109        6         215   
       5.5332       0.9683      0.0317      0.0118        7         214   
       6.0726       0.9638      0.0362      0.0126        8         213   
       6.0780       0.9593      0.0407      0.0133        9         212   
       6.5106       0.9548      0.0452      0.0140       10         211   
       6.8364       0.9502      0.0498      0.0146       11         210   
       7.5784       0.9457      0.0543      0.0152       12         209   
       8.1834       0.9412      0.0588      0.0158       13         208   
       9.1198            .           .           .       14         207   
       9.1198       0.9321      0.0679      0.0169       15         206   
       9.2758*           .           .           .       15         205   
      10.0397*           .           .           .       15         204   
      10.0671*           .           .           .       15         203   

      *******          OBSERVATIONS DELETED        ********

      10.6585       0.9265      0.0735      0.0177       16         166   
      10.6667*           .           .           .       16         165   
      12.3203*           .           .           .       16           2   
      12.3258*           .           .           .       16           1   
      12.3258*           .           .           .       16           0   

           NOTE: The marked survival times are censored observations.
                  PROC LIFETEST analysis of MWHEART data, by Sex               13
                                                    20:23 Tuesday, April 6, 2004

                             The LIFETEST Procedure

                  Summary Statistics for Time Variable FOLLYRS

                               Quartile Estimates
 
                               Point     95% Confidence Interval
                  Percent    Estimate      [Lower      Upper)

                       75       .           .           .    
                       50       .           .           .    
                       25       .           .           .    


                               Mean    Standard Error

                            10.3192            0.0963

NOTE: The mean survival time and its standard error were underestimated because 
 the largest observation was censored and the estimation was restricted to the 
                              largest event time.


            Summary of the Number of Censored and Uncensored Values
 
                                                                Percent
        Stratum    SEX            Total  Failed    Censored    Censored

              1    F                263       7         256       97.34
              2    M                221      16         205       92.76
        ---------------------------------------------------------------
          Total                     484      23         461       95.25

 NOTE: There were 1 observations with missing values, negative time values or 
                         frequency values less than 1.
                  PROC LIFETEST analysis of MWHEART data, by Sex               14
                                                    20:23 Tuesday, April 6, 2004

                             The LIFETEST Procedure

         Testing Homogeneity of Survival Curves for FOLLYRS over Strata


                                Rank Statistics
 
                        SEX         Log-Rank    Wilcoxon

                        F            -5.5527     -2550.0
                        M             5.5527      2550.0


                 Covariance Matrix for the Log-Rank Statistics
 
                        SEX             F             M

                        F         5.69866      -5.69866
                        M        -5.69866       5.69866


                 Covariance Matrix for the Wilcoxon Statistics
 
                        SEX             F             M

                        F         1252658      -1252658
                        M        -1252658       1252658


                          Test of Equality over Strata
 
                                                      Pr >   
                   Test      Chi-Square      DF    Chi-Square

                   Log-Rank      5.4105       1      0.0200  
                   Wilcoxon      5.1910       1      0.0227  
                   -2Log(LR)     5.4513       1      0.0196  


                     Print of output file from PROC LIFETEST.                  15
                                                    20:23 Tuesday, April 6, 2004

 Obs    SEX    FOLLYRS    _CENSOR_    SURVIVAL    SDF_LCL    SDF_UCL    STRATUM

   1     F      0.0000        0        1.00000    1.00000    1.00000       1   
   2     F      0.5996        0        0.99620    0.98876    1.00000       1   
   3     F      0.8652        0        0.99240    0.98190    1.00000       1   
   4     F      1.7577        0        0.98859    0.97576    1.00000       1   
   5     F      3.8494        0        0.98479    0.97000    0.99958       1   
   6     F      5.7385        0        0.98099    0.96448    0.99749       1   
   7     F      8.8022        0        0.97719    0.95914    0.99523       1   
   8     F      9.0623        0        0.97338    0.95393    0.99284       1   
   9     M      0.0000        0        1.00000    1.00000    1.00000       2   
  10     M      1.9411        0        0.99548    0.98663    1.00000       2   
  11     M      2.8720        0        0.99095    0.97847    1.00000       2   
  12     M      3.1814        0        0.98190    0.96432    0.99948       2   
  13     M      3.5483        0        0.97738    0.95777    0.99698       2   
  14     M      5.1417        0        0.97285    0.95142    0.99428       2   
  15     M      5.5332        0        0.96833    0.94524    0.99142       2   
  16     M      6.0726        0        0.96380    0.93917    0.98843       2   
  17     M      6.0780        0        0.95928    0.93322    0.98533       2   
  18     M      6.5106        0        0.95475    0.92735    0.98215       2   
  19     M      6.8364        0        0.95023    0.92155    0.97890       2   
  20     M      7.5784        0        0.94570    0.91583    0.97558       2   
  21     M      8.1834        0        0.94118    0.91016    0.97220       2   
  22     M      9.1198        0        0.93213    0.89896    0.96529       2   
  23     M     10.6585        0        0.92655    0.89182    0.96127       2   


                PROC LIFETEST Analysis of MWHEART data, by Gender              16
                        Survival Proportions versus Sex
                                                    20:23 Tuesday, April 6, 2004

               Plot of SURVIVAL*FOLLYRS.  Symbol is value of SEX.

       |
       |
S 1.00 +  F
u      |
r      |     F       M
v      |
i      |       F
v 0.99 +                  M
a      |            F
l      |
       |                       F
D      |                   M
i 0.98 +                                  F
s      |                      M                           F
t      |
r      |                                                    F
i      |                              M
b 0.97 +
u      |                                M
t      |
i      |                                   M
o      |
n 0.96 +                                   M
       |
F      |
u      |                                      M
n      |
c 0.95 +                                        M
t      |
i      |                                            M
o      |
n      |                                               M
  0.94 +
E      |
s      |
t      |
i      |                                                    M
m 0.93 +
a      |
t      |                                                             M
e      |
       |
  0.92 +
       |
       ---+----------+----------+----------+----------+----------+----------+--
          0          2          4          6          8         10         12

               Followup time: years until death/last followup (if surv.)

NOTE: 1 obs hidden.
=================================================================================

     Clearly PROC LIFETEST generates a LOT of output.  In fact, by default it
prints one line for each subject in the study.  I have omitted most of the
lines for censored observations within each gender.  Only 7 women in the study
died during followup, and 16 men (i.e., in this subset of 485 people, of whom
484 were included in the analysis).  The printout for women begins as follows:

----------------------------------------------------------------------------------
                                          Survival
                                          Standard     Number      Number 
      FOLLYRS     Survival    Failure      Error       Failed       Left  

       0.0000       1.0000           0           0        0         263   
       0.5996       0.9962     0.00380     0.00380        1         262   
       0.8652       0.9924     0.00760     0.00536        2         261   
       1.7577       0.9886      0.0114     0.00655        3         260   
----------------------------------------------------------------------------------

     Here 'FOLLYRS' is the followup time.  The first death among women occurred
0.5996 years after entry into the study.  'Survival' is the estimated fraction
surviving at that time, 'Failure' is the cumulative estimated fraction who
died, and 'Survival Standard Error' is the estimated standard error of 'Survival'
(this may be used to compute confidence limits ...).  The 'Number Failed' is the
number who have died at any point of followup, and 'Number Left' is the number
who have not failed *and* who have not been censored.  Note that Number Left
decreases with each observation, whether a censored observation or a 'Failure'.
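
     The first rows of this table can be reproduced by hand.  No women are
censored before the early deaths, so the product-limit estimate after each death
is just the running product of the fractions surviving that death, and the
standard error comes from Greenwood's formula.  The DATA step below is a sketch
of that arithmetic for the first two deaths.  The variable names are made up
for the illustration, and the 1.96 multiplier assumes ordinary 95% limits of
the form estimate +/- 1.96 standard errors, truncated at 1.

----------------------------------------------------------------------------------

DATA _NULL_;
  N  = 263;                        * women at risk at time zero;
  S1 = (N - 1) / N;                * after the death at 0.5996 years;
  S2 = S1 * (N - 2) / (N - 1);     * after the death at 0.8652 years;

  * Greenwood formula with one death at each event time;
  SE1 = S1 * SQRT(1 / (N * (N - 1)));
  SE2 = S2 * SQRT(1 / (N * (N - 1)) + 1 / ((N - 1) * (N - 2)));

  * Assumed form of the 95 percent limits for the first death;
  LCL1 = S1 - 1.96 * SE1;
  UCL1 = MIN(1, S1 + 1.96 * SE1);

  PUT S1= 7.5 SE1= 7.5 LCL1= 7.5 UCL1= 7.5;
  PUT S2= 7.5 SE2= 7.5;
RUN;

----------------------------------------------------------------------------------

     The values written to the log should agree with the first two rows of the
table (Survival 0.9962 and 0.9924, standard errors 0.00380 and 0.00536), and
LCL1 and UCL1 can be compared with SDF_LCL and SDF_UCL in the PROC PRINT
output.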

     The printout summarizes the numbers of censored and uncensored observations,
by the stratifying variable (gender):

----------------------------------------------------------------------------------
            Summary of the Number of Censored and Uncensored Values
 
                                                                Percent
        Stratum    SEX            Total  Failed    Censored    Censored

              1    F                263       7         256       97.34
              2    M                221      16         205       92.76
        ---------------------------------------------------------------
          Total                     484      23         461       95.25
----------------------------------------------------------------------------------
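
     If only a summary like this one and the tests are wanted, the long table
of product-limit estimates can be suppressed.  A sketch, using the NOTABLE
option of PROC LIFETEST (the title text is made up for the illustration):

----------------------------------------------------------------------------------

proc lifetest data = heart notable ;
     time follyrs * censor(1) ;
     strata sex ;
title1 'PROC LIFETEST analysis of MWHEART data, by Sex (summary only)' ;
run ;

----------------------------------------------------------------------------------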

     PROC LIFETEST also tests whether the estimated survival curves for the
strata specified by the stratifying variable are significantly different;
the results are summarized in the table called 'Test of Equality over Strata'.
In this case, all three of the tests (Log-Rank, Wilcoxon, and -2Log(LR)) have
p-values of about .02, indicating that the survival distributions for men and
women differ significantly at the .05 level.
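
     With two strata, the Log-Rank and Wilcoxon chi-squares can be recovered
from the 'Rank Statistics' and covariance tables in the printout: each is the
squared rank statistic for one stratum divided by its variance.  The DATA step
below is a sketch of that check.  The numbers are copied from the printout,
the variable names are made up for the illustration, and PROBCHI is the SAS
chi-square distribution function.

----------------------------------------------------------------------------------

DATA _NULL_;
  * Rank statistic for one stratum, squared, over its variance;
  LR_CHISQ  = 5.5527**2 / 5.69866;
  WIL_CHISQ = 2550.0**2 / 1252658;

  * Upper tail probabilities on 1 degree of freedom;
  LR_P  = 1 - PROBCHI(LR_CHISQ, 1);
  WIL_P = 1 - PROBCHI(WIL_CHISQ, 1);

  PUT LR_CHISQ= 8.4 LR_P= 8.4;
  PUT WIL_CHISQ= 8.4 WIL_P= 8.4;
RUN;

----------------------------------------------------------------------------------

     Up to rounding, the results should match the Log-Rank and Wilcoxon lines
of the 'Test of Equality over Strata' table (chi-squares 5.4105 and 5.1910).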

     In this example, an output data set from PROC LIFETEST was specified, in
the line:

     proc lifetest data = heart outsurv = surcurve ;

     This output data set has a number of variables on it that are computed by
PROC LIFETEST.  Of particular interest is the SURVIVAL variable.  The output
data set was read by another data step and the censored observations were
deleted.  The PROC PRINT output shows the variables and data on the resulting
file (which has only 23 observations).

     Finally, the SURCURVE data set was used to plot the variable SURVIVAL versus
followup time, by SEX.  This was done by the use of PROC PLOT.  In this plot
the estimated survival curve for men falls below the curve for women after
about year 3, which is consistent with the tests described above.
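
     PROC PLOT draws one symbol per retained observation, although a
product-limit curve is really a step function.  On SAS releases that include
PROC SGPLOT (an assumption; it was not used for these notes), a sketch such as
the following would draw the same SURCURVE data as step curves.  The title and
the axis range are made up for the illustration.

----------------------------------------------------------------------------------

proc sgplot data = surcurve ;
     step x = follyrs y = survival / group = sex ;
     yaxis min = 0.9 max = 1 ;
title1 'Product-limit survival estimates, by Sex' ;
run ;

----------------------------------------------------------------------------------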

=================================================================================
n54703.014  Last update: April 6, 2004.