SURVIVAL ANALYSIS, I:  PROC LIFETEST                              n54703.014

In some studies, the outcome indicates simply whether an event occurred or
did not: for example, death after a period of treatment for a disease
condition. But frequently there is more information available than just
whether the event occurred. The actual *time* of the event is of interest
also. In general it is better to survive a long time than to survive for a
short time, and it is useful to know, e.g., that drug A tends to keep you
alive longer than drug B.

A typical survival study may start with a cohort of people who are followed
for 5 years. Some people may be randomized to treatment with a medication,
while others are assigned to use a placebo. Some people die before the 5
years of followup are completed. Most survive the entire time. Deaths are
counted as the events of interest, and the *time of event* is the primary
outcome.

People who survive all 5 years are described as *censored*. The assumption
is that people who survive the five years are still at risk of dying after
the study is complete, and that in fact they will eventually die, but the
investigators will not be able to determine the time of death. All the
investigators know is the vital status at the end of 5 years, or at the time
the person was last seen in the study.

A simple datafile for such a study might have the following structure:

----------------------------------------------------------------------------------
 Subject    Group    Time last seen    Censoring Status
---------  -------  ----------------  ------------------
    1         A           5.0                  1
    2         B           3.1                  0
    3         B           4.8                  0
    4         B           5.0                  1
    5         A           3.0                  1
----------------------------------------------------------------------------------

People whose censoring status is 1 are alive at the last time they are seen
in the study. Thus in the example above, Subjects 1 and 4 survived all 5
years of followup. Subject 5 survived three years, and that was the time of
last contact. Subject 5 may have died some time between year 3 and year 5 or
at some time after year 5. All the investigators know is that Subject 5 was
alive at year 3. Subjects 2 and 3 have censoring status = 0, meaning that
they died after (respectively) 3.1 and 4.8 years of followup.

A real example of such a datafile is the Minnesota Heart Survey file (see
notes n54703.013). Recall the input statement for variables on this file:

----------------------------------------------------------------------------------
DATA HEART;
     INFILE 'mwheart.data';
     INPUT @1  ID        12.
           @14 AGE       2.
           @17 SEX       1.
           @19 ENTRYDAT  mmddyy8.
           @28 DTHDATE   mmddyy8.
           @37 CAUSE     5.1
           @43 CHOL      3.
           @47 HDL       2.
           @50 BMI       5.2
           @56 SMOKE     1.
           @58 CIGSDAY   2.
           @61 THIOC     3.
           @65 EVERBPRX  1.
           @67 NOWBPRX   1.
           @69 DBPAV2    3.
           @72 SBPAV2    3.
           @76 EDUYRS    2. ;
----------------------------------------------------------------------------------

There are two dates on the file: ENTRYDAT and DTHDATE. These are both
recorded in SAS as the number of days since January 1, 1960. For people who
did not die during the course of the study, DTHDATE is missing. For people
who died during the study, the *difference* between DTHDATE and ENTRYDAT is
the time to death (in days).

As noted in the description, the last date of followup in this study was
July 1, 1992. For people who did not die, the followup time is the
difference between July 1, 1992 and ENTRYDAT.

This gives you sufficient information to construct the variables which are
essential for survival analysis, as follows:

----------------------------------------------------------------------------------
     LASTFOLL = MDY(07, 01, 92) ;

     FOLLDAYS = DTHDATE - ENTRYDAT ;
     IF DTHDATE = . THEN FOLLDAYS = LASTFOLL - ENTRYDAT ;
     FOLLYRS  = FOLLDAYS / 365.25 ;

     CENSOR = 1 ;
     IF DTHDATE NE . THEN CENSOR = 0 ;
     DEAD = 1 - CENSOR ;
----------------------------------------------------------------------------------

Note here that MDY is a SAS function that takes a month, a day, and a year
and returns the corresponding SAS date value; MDY(07, 01, 92) is the number
of days from January 1, 1960 to July 1, 1992 (the two-digit year 92 is read
as 1992 under SAS's default YEARCUTOFF setting).
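The followup logic above can be tried out on a couple of made-up records
before it is applied to the real file. The following is only a sketch: the
data set names TOY and TOY2 and the two records are hypothetical and are not
taken from MWHEART.DATA, and the year is written with four digits so the
result does not depend on the YEARCUTOFF setting.

```sas
* Hypothetical toy records, not taken from MWHEART.DATA;
data toy;
  id = 1; entrydat = '15MAR1981'd; dthdate = .;            output;
  id = 2; entrydat = '01JUN1980'd; dthdate = '30SEP1987'd; output;
run;

data toy2;
  set toy;
  lastfoll = mdy(7, 1, 1992);   * four-digit year, so YEARCUTOFF does not matter;
  if dthdate = . then do;       * no death date: censored at end of followup;
    folldays = lastfoll - entrydat;
    censor   = 1;
  end;
  else do;                      * death observed during followup;
    folldays = dthdate - entrydat;
    censor   = 0;
  end;
  follyrs = folldays / 365.25;
  format entrydat dthdate lastfoll date9.;
run;

proc print data = toy2;
run;
```

In the printed result, record 1 (no death date) should come out with
CENSOR = 1 and a followup time running to July 1, 1992, while record 2 should
have CENSOR = 0 and a followup time ending at its death date. A negative
FOLLDAYS on real data would point to a date entered incorrectly.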
Note that followup time, in the form of either FOLLDAYS or FOLLYRS, is
defined for each person in the study, regardless of whether they survived or
died. Note that the variable CENSOR is defined to be 1 if the person did not
die, and is 0 if the person did die.

Below are the program and the printout for a simple PROC LIFETEST analysis
of survival, by gender:

=================================================================================
* MWHEART.SAS;
* Reads MWHEART.DATA file (485 cases, 17 vars) randomly selected from;
* 4086 cases of the Mid-West Heart study;

OPTIONS LINESIZE = 80 CENTER PAGESIZE = 58 NUMBER LABEL;

TITLE  'Selected cases from the Mid-West Heart study conducted 1980-82 by';
TITLE2 'the Division of Epidemiology, SPH U. Minnesota';
TITLE3 'Mortality follow-up through 7/1/92 is based on National Death Index';

PROC FORMAT;
     VALUE sexfmt 1='M' 2='F';
     VALUE ynfmt  1='Y' 2='N';
     VALUE smkfmt 1='current' 2='exsmoker' 3='nonsmoker';

DATA HEART;
     INFILE 'mwheart.data';
     INPUT @1  ID        12.
           @14 AGE       2.
           @17 SEX       1.
           @19 ENTRYDAT  mmddyy8.
           @28 DTHDATE   mmddyy8.
           @37 CAUSE     5.1
           @43 CHOL      3.
           @47 HDL       2.
           @50 BMI       5.2
           @56 SMOKE     1.
           @58 CIGSDAY   2.
           @61 THIOC     3.
           @65 EVERBPRX  1.
           @67 NOWBPRX   1.
           @69 DBPAV2    3.
           @72 SBPAV2    3.
           @76 EDUYRS    2. ;

* For survival analysis need: censor = censoring indicator;
* Also need followup time: FOLLDAYS (in days) or FOLLYRS (in years);
* Compute days to death and convert to years;
*---------------------------------------------------------------------;

     LASTFOLL = MDY(07, 01, 92) ;

     FOLLDAYS = DTHDATE - ENTRYDAT ;
     IF DTHDATE = . THEN FOLLDAYS = LASTFOLL - ENTRYDAT ;
     FOLLYRS  = FOLLDAYS / 365.25 ;

     CENSOR = 1 ;
     IF DTHDATE NE . THEN CENSOR = 0 ;
     DEAD = 1 - CENSOR ;

LABEL ID       = 'Identifying sequential number'
      AGE      = 'age at entry'
      SEX      = 'Gender: 1=M, 2=F'
      ENTRYDAT = 'Date of entry interview'
      DTHDATE  = 'Date of death'
      CAUSE    = 'ICD-9 code cause of death XXX.X'
      CHOL     = 'Serum total cholesterol (mg/dl)'
      HDL      = 'HDL cholesterol (mg/dl)'
      BMI      = 'Body Mass Index (function of height and weight)'
      SMOKE    = 'smoking status'
      CIGSDAY  = 'Cigarettes smoked per day'
      THIOC    = 'Thiocyanate level (indicator of smoking)'
      EVERBPRX = 'Ever use BP med: 1=Y, 2=N'
      NOWBPRX  = 'Now use BP med: 1=Y, 2=N'
      DBPAV2   = 'Diastolic Blood Pressure (mmHg) - ave. of two readings.'
      SBPAV2   = 'Systolic blood pressure (mmHg) - average of two readings'
      EDUYRS   = 'Years of education'
      CENSOR   = 'Survival censoring status: 1 = censored, 0 = not censored'
      DEAD     = 'Vital status: 0 = alive at end of study, 1 = dead'
      LASTFOLL = 'Date of last followup: July 1, 1992'
      FOLLDAYS = 'Followup time: days until death/last followup (if surv.)'
      FOLLYRS  = 'Followup time: years until death/last followup (if surv.)';

FORMAT SEX sexfmt. EVERBPRX NOWBPRX ynfmt. SMOKE smkfmt.;

********************************** END DATA STEP ****************************;

proc lifetest data = heart outsurv = surcurve ;
     time follyrs * censor(1) ;
     strata sex ;
title1 'PROC LIFETEST analysis of MWHEART data, by Sex' ;

data surcurve ;
     set surcurve ;
     if _censor_ ne 0 then delete ;

proc print data = surcurve ;
title1 'Print of output file from PROC LIFETEST.' ;

proc plot data = surcurve ;
     plot survival * follyrs = sex ;
title1 'PROC LIFETEST Analysis of MWHEART data, by Gender' ;
title2 'Survival Proportions versus Sex' ;

endsas ;
----------------------------------------------------------------------------------

            PROC LIFETEST analysis of MWHEART data, by Sex                  1
                                               20:23 Tuesday, April 6, 2004

                           The LIFETEST Procedure

                             Stratum 1: SEX = F

                      Product-Limit Survival Estimates

                                          Survival
                                          Standard     Number     Number
      FOLLYRS     Survival     Failure      Error      Failed      Left

       0.0000       1.0000           0          0         0         263
       0.5996       0.9962     0.00380    0.00380         1         262
       0.8652       0.9924     0.00760    0.00536         2         261
       1.7577       0.9886      0.0114    0.00655         3         260
       3.8494       0.9848      0.0152    0.00755         4         259
       5.7385       0.9810      0.0190    0.00842         5         258
       8.8022       0.9772      0.0228    0.00921         6         257
       9.0623       0.9734      0.0266    0.00993         7         256
      10.0260*           .           .          .         7         255
      10.0671*           .           .          .         7         254
      10.0753*           .           .          .         7         253

                   ******* OBSERVATIONS DELETED ********

      12.2847*           .           .          .         7           4
      12.3121*           .           .          .         7           3
      12.3203*           .           .          .         7           2
      12.3231*           .           .          .         7           1
      12.3258*           .           .          .         7           0

NOTE: The marked survival times are censored observations.


            PROC LIFETEST analysis of MWHEART data, by Sex                  7
                                               20:23 Tuesday, April 6, 2004

                           The LIFETEST Procedure

                Summary Statistics for Time Variable FOLLYRS

                             Quartile Estimates

                            Point       95% Confidence Interval
              Percent     Estimate        [Lower       Upper)

                   75        .              .            .
                   50        .              .            .
                   25        .              .            .

                           Mean    Standard Error

                         8.9377            0.0618

NOTE: The mean survival time and its standard error were underestimated
      because the largest observation was censored and the estimation was
      restricted to the largest event time.


            PROC LIFETEST analysis of MWHEART data, by Sex                  8
                                               20:23 Tuesday, April 6, 2004

                           The LIFETEST Procedure

                             Stratum 2: SEX = M

                      Product-Limit Survival Estimates

                                          Survival
                                          Standard     Number     Number
      FOLLYRS     Survival     Failure      Error      Failed      Left

       0.0000       1.0000           0          0         0         221
       1.9411       0.9955     0.00452    0.00451         1         220
       2.8720       0.9910     0.00905    0.00637         2         219
       3.1814            .           .          .         3         218
       3.1814       0.9819      0.0181    0.00897         4         217
       3.5483       0.9774      0.0226     0.0100         5         216
       5.1417       0.9729      0.0271     0.0109         6         215
       5.5332       0.9683      0.0317     0.0118         7         214
       6.0726       0.9638      0.0362     0.0126         8         213
       6.0780       0.9593      0.0407     0.0133         9         212
       6.5106       0.9548      0.0452     0.0140        10         211
       6.8364       0.9502      0.0498     0.0146        11         210
       7.5784       0.9457      0.0543     0.0152        12         209
       8.1834       0.9412      0.0588     0.0158        13         208
       9.1198            .           .          .        14         207
       9.1198       0.9321      0.0679     0.0169        15         206
       9.2758*           .           .          .        15         205
      10.0397*           .           .          .        15         204
      10.0671*           .           .          .        15         203

                   ******* OBSERVATIONS DELETED ********

      10.6585       0.9265      0.0735     0.0177        16         166
      10.6667*           .           .          .        16         165
      12.3203*           .           .          .        16           2
      12.3258*           .           .          .        16           1
      12.3258*           .           .          .        16           0

NOTE: The marked survival times are censored observations.


            PROC LIFETEST analysis of MWHEART data, by Sex                 13
                                               20:23 Tuesday, April 6, 2004

                           The LIFETEST Procedure

                Summary Statistics for Time Variable FOLLYRS

                             Quartile Estimates

                            Point       95% Confidence Interval
              Percent     Estimate        [Lower       Upper)

                   75        .              .            .
                   50        .              .            .
                   25        .              .            .

                           Mean    Standard Error

                        10.3192            0.0963

NOTE: The mean survival time and its standard error were underestimated
      because the largest observation was censored and the estimation was
      restricted to the largest event time.

          Summary of the Number of Censored and Uncensored Values

                                                           Percent
     Stratum    SEX         Total    Failed    Censored    Censored

           1    F             263         7         256       97.34
           2    M             221        16         205       92.76
     ---------------------------------------------------------------
       Total                  484        23         461       95.25

NOTE: There were 1 observations with missing values, negative time values or
      frequency values less than 1.


            PROC LIFETEST analysis of MWHEART data, by Sex                 14
                                               20:23 Tuesday, April 6, 2004

                           The LIFETEST Procedure

       Testing Homogeneity of Survival Curves for FOLLYRS over Strata

                              Rank Statistics

                     SEX       Log-Rank     Wilcoxon

                     F          -5.5527      -2550.0
                     M           5.5527       2550.0

               Covariance Matrix for the Log-Rank Statistics

                     SEX              F              M

                     F          5.69866       -5.69866
                     M         -5.69866        5.69866

               Covariance Matrix for the Wilcoxon Statistics

                     SEX              F              M

                     F          1252658       -1252658
                     M         -1252658        1252658

                        Test of Equality over Strata

                                                     Pr >
                  Test      Chi-Square      DF    Chi-Square

                  Log-Rank      5.4105       1      0.0200
                  Wilcoxon      5.1910       1      0.0227
                  -2Log(LR)     5.4513       1      0.0196


               Print of output file from PROC LIFETEST.                    15
                                               20:23 Tuesday, April 6, 2004

   Obs   SEX   FOLLYRS   _CENSOR_   SURVIVAL   SDF_LCL   SDF_UCL   STRATUM

     1    F     0.0000       0       1.00000   1.00000   1.00000      1
     2    F     0.5996       0       0.99620   0.98876   1.00000      1
     3    F     0.8652       0       0.99240   0.98190   1.00000      1
     4    F     1.7577       0       0.98859   0.97576   1.00000      1
     5    F     3.8494       0       0.98479   0.97000   0.99958      1
     6    F     5.7385       0       0.98099   0.96448   0.99749      1
     7    F     8.8022       0       0.97719   0.95914   0.99523      1
     8    F     9.0623       0       0.97338   0.95393   0.99284      1
     9    M     0.0000       0       1.00000   1.00000   1.00000      2
    10    M     1.9411       0       0.99548   0.98663   1.00000      2
    11    M     2.8720       0       0.99095   0.97847   1.00000      2
    12    M     3.1814       0       0.98190   0.96432   0.99948      2
    13    M     3.5483       0       0.97738   0.95777   0.99698      2
    14    M     5.1417       0       0.97285   0.95142   0.99428      2
    15    M     5.5332       0       0.96833   0.94524   0.99142      2
    16    M     6.0726       0       0.96380   0.93917   0.98843      2
    17    M     6.0780       0       0.95928   0.93322   0.98533      2
    18    M     6.5106       0       0.95475   0.92735   0.98215      2
    19    M     6.8364       0       0.95023   0.92155   0.97890      2
    20    M     7.5784       0       0.94570   0.91583   0.97558      2
    21    M     8.1834       0       0.94118   0.91016   0.97220      2
    22    M     9.1198       0       0.93213   0.89896   0.96529      2
    23    M    10.6585       0       0.92655   0.89182   0.96127      2


           PROC LIFETEST Analysis of MWHEART data, by Gender               16
                      Survival Proportions versus Sex
                                               20:23 Tuesday, April 6, 2004

             Plot of SURVIVAL*FOLLYRS.  Symbol is value of SEX.

[Line-printer plot not reproduced. Vertical axis: Survival Distribution
 Function Estimate, 0.92 to 1.00. Horizontal axis: Followup time: years until
 death/last followup (if surv.), 0 to 12. Points are plotted as F or M.]

NOTE: 1 obs hidden.
=================================================================================

Clearly PROC LIFETEST generates a LOT of output. In fact, by default it will
print out one line for each subject in the study. I have omitted most of the
censored observations within each gender (the places marked OBSERVATIONS
DELETED). Only 7 women in the study died during followup, and 16 men (i.e.,
in this subset of 485 people). The printout for women looks like the
following:

----------------------------------------------------------------------------------
                                          Survival
                                          Standard     Number     Number
      FOLLYRS     Survival     Failure      Error      Failed      Left

       0.0000       1.0000           0          0         0         263
       0.5996       0.9962     0.00380    0.00380         1         262
       0.8652       0.9924     0.00760    0.00536         2         261
       1.7577       0.9886      0.0114    0.00655         3         260
----------------------------------------------------------------------------------

Here 'FOLLYRS' is the followup time. The first death among women occurred
0.5996 years after entry into the study. 'Survival' is the estimated fraction
surviving at that time, 'Failure' is the cumulative estimated fraction who
died, and 'Survival Standard Error' is the estimated standard error of
'Survival' (this may be used to compute confidence limits ...). The 'Number
Failed' is the number who have died at any point of followup, and 'Number
Left' is the number who have not failed *and* who have not been censored.
Note that Number Left decreases with each observation, whether a censored
observation or a 'Failure'.
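As a check on how the 'Survival' column is built up, the product-limit
(Kaplan-Meier) estimate can be written out and applied to the first few rows
of the printout. The symbols t_i, d_i and n_i are notation introduced here
for illustration; they do not appear in the SAS output.

```latex
% t_i = i-th distinct death time, d_i = number of deaths at t_i,
% n_i = number still at risk just before t_i
%       (the 'Number Left' value on the line preceding t_i)
\begin{align*}
\hat{S}(t) &= \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right) \\[4pt]
% Women: 263 at risk at time 0, one death at each of the first two death times
\hat{S}_F(0.5996) &= 1 - \frac{1}{263} = \frac{262}{263} \approx 0.9962 \\
\hat{S}_F(0.8652) &= \frac{262}{263}\left( 1 - \frac{1}{262} \right)
                   = \frac{261}{263} \approx 0.9924 \\[4pt]
% Men: two tied deaths at 3.1814, with 219 at risk just before that time
\hat{S}_M(3.1814) &= \frac{219}{221}\left( 1 - \frac{2}{219} \right)
                   = \frac{217}{221} \approx 0.9819
\end{align*}
```

These values agree with the printout. The tied deaths among men at 3.1814
years also explain why the first of the two lines at that time shows missing
values: the estimate is printed only once, on the last line for that time.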
The printout summarizes the numbers of censored and uncensored observations,
by the stratifying variable (gender):

----------------------------------------------------------------------------------
          Summary of the Number of Censored and Uncensored Values

                                                           Percent
     Stratum    SEX         Total    Failed    Censored    Censored

           1    F             263         7         256       97.34
           2    M             221        16         205       92.76
     ---------------------------------------------------------------
       Total                  484        23         461       95.25
----------------------------------------------------------------------------------

The printout also reports tests of whether the estimated survival curves for
the strata specified by the stratifying variable are significantly different;
these are summarized in the table called 'Test of Equality over Strata'. In
this case, all three of the tests (Log-Rank, Wilcoxon, and -2Log(LR)) have
p-values of about .02, indicating that the survival distributions for men and
women are significantly different.

In this example, an output data set from PROC LIFETEST was specified, in the
line:

     proc lifetest data = heart outsurv = surcurve ;

This output data set has a number of variables on it that are computed by
PROC LIFETEST. Of particular interest is the SURVIVAL variable. The output
data set was read by another data step and the censored observations were
deleted. The PROC PRINT output shows the variables and data on the resulting
file (which has only 23 observations).

Finally, the OUTSURV= data set (SURCURVE) was used to plot the variable
SURVIVAL versus followup time, by SEX. This was done by the use of PROC PLOT.
The plot agrees with the tests described above: from about year 3 onward the
estimated survival for men lies below that for women.

=================================================================================
n54703.014  Last update: April 6, 2004.
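
On later SAS releases the same analysis can be run with much less printed
output and with a graph drawn by the procedure itself. The following is only
a sketch under stated assumptions: it assumes the HEART data set built by the
program above, and a release in which ODS Graphics and the PLOTS= option of
PROC LIFETEST are available (SAS 9.2 or later, to my understanding). It is
not the method used to produce the printout in these notes.

```sas
* Sketch only: assumes the HEART data set from the program above;
ods graphics on;

proc lifetest data = heart notable plots = survival ;
     * NOTABLE suppresses the line-per-subject product-limit listing;
     * PLOTS = SURVIVAL requests the estimated survival curves by stratum;
     time follyrs * censor(1) ;
     strata sex ;
run;

ods graphics off;
```

The summary of censored and uncensored values and the 'Test of Equality over
Strata' table are still printed, so the comparison of men and women can be
read off as before.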