SURVIVAL ANALYSIS, IV:  PROC PHREG USED FOR MATCHED STUDIES          n54703.017

     This is actually not a survival analysis topic; rather, it is about
matched case-control studies.  It generalizes the method in which PROC
LOGISTIC was used to analyze pair-matched studies.  It uses the survival
analysis procedure PROC PHREG, but not in the way in which it was originally
designed to be used.  In a way, PROC PHREG is tricked into carrying out a
correct analysis of matched studies.

     Assume that cases and matched controls occur together on one file.
There is an indicator variable called 'case': it has value 1 if the person
is a case and value 0 if the person is a control.  There is another variable
called 'set', denoting the case-control set to which the person belongs.
That is, all the people who have, say, 'set = 15' belong to case-control
set number 15.  In general you expect at least one case in each set and a
variable number of controls.

     Assume there are various risk factors on the file - e.g., X1 = smoking
(1 = yes, 0 = no), X2 = blood pressure, etc.

     Here is the syntax for using PROC PHREG to analyze such data:

     proc phreg data = datafile ;
          model time * case(0) = X1 X2 / ties = discrete ;
          strata set ;
     run ;

     Note that the 'strata' variable must be the case-control set
indicator.  The 'time' variable must be defined in a surprising way, so that
within each set 'time' has a lower value for the cases than for the
controls.  This is the part where PROC PHREG is being tricked into analyzing
a case-control study.  One must regard 'control' observations as 'censored',
and 'case' observations as uncensored, that is, as 'events' with an
"event-time" which is earlier than the censoring time of the controls.  With
this coding, the risk set at the case's event-time is the whole matched set,
which is what the matched analysis requires.

     Thus in the DATA step preceding the PROC PHREG, one would define

          time = 1 - case ;

provided, as above, case = 1 indicates that the person has the condition
(disease) of interest, and case = 0 indicates that the person is a control.
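     The coding just described can be sketched on a small made-up file.  This
is only an illustration: the dataset name 'toy' and all data values are
hypothetical, with three matched sets of one case and two controls each and
the single risk factor X1.  It shows the DATA step definition of 'time' and
the PROC PHREG call in the form given above.

```sas
/* Hypothetical toy data: 3 matched sets, each with 1 case and 2 controls. */
/* Variable names follow the text; the values are made up.                 */
data toy ;
     input set case X1 ;
     time = 1 - case ;      /* case: time = 0 ;  control: time = 1 */
     datalines ;
1 1 1
1 0 0
1 0 1
2 1 1
2 0 0
2 0 0
3 1 0
3 0 1
3 0 0
;
run ;

/* Check the coding: every case should have time = 0, every control 1. */
proc print data = toy ;
     var set case time X1 ;
run ;

/* Matched analysis: controls (case = 0) are treated as censored. */
proc phreg data = toy ;
     model time * case(0) = X1 / ties = discrete ;
     strata set ;
run ;
```

     The PROC PRINT step is there only to confirm the 'time' coding before
the model is fitted.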
     Thus:     if the person is a CASE,     time = 0
               if the person is a CONTROL,  time = 1.

     Finally, note that the 'ties = discrete' option on the MODEL statement
is needed: it requests the discrete logistic model, whose partial likelihood
is the conditional logistic likelihood for matched sets.  For pair-matched
data this means, essentially, that only discordant pairs count in the
analysis.  (When every set contains exactly one case, no event-times are
tied within a stratum, and the other TIES= methods give the same
likelihood.)

     Below is an example of how this actually works with data from a
case-control study of endometrial cancer [Mack TM, Pike MC, Henderson BE et
al. (1976) Estrogens and endometrial cancer in a retirement community.  NEJM
294: 1262-1267].  In this study, cases and controls were matched
approximately on age.  They were not matched on other variables of interest.
There were 63 cases and 4 * 63 = 252 controls (total observations, 315).

     The following is documentation of the datafile:

=================================================================================

                 DOCUMENTATION FOR ENDOMETRIAL CANCER DATA

This file contains 315 records with data on cases and controls from the
Leisure World study of endometrial cancer as related to treatment with
estrogens for menopausal symptoms and other risk factors.  See the article
by Mack et al. in NEJM 294:1262-1267, 1976 for a full description.

The variables are as follows:

Number  Name    Description               Codes/Range
-------------------------------------------------------------------------
  1     SET     Matched set indicator     1-63
  2     CASE    Case-control indicator    0 = Control, 1 = Case
  3     AGE     Age in years              55-83
  4     GALL    Gallbladder disease       0 = No, 1 = Yes
  5     HYP     Hypertension              0 = No, 1 = Yes
  6     OB      Obesity                   0 = No, 1 = Yes; 9 = Unknown
  7     EST     Estrogen usage            0 = No, 1 = Yes
  8     DOSE    Dose of conjugated        0 = 0
                estrogen                  1 = 0.3
                                          2 = 0.301-0.624
                                          3 = 0.625
                                          4 = 0.626-1.249
                                          5 = 1.25
                                          6 = 1.26-2.50
                                          9 = Unknown
  9     DUR     Duration of estrogen      0-95
                use (months)              96 = 96+
                                          99 = Unknown
 10     NON     Non-estrogen drug         0 = No, 1 = Yes

=================================================================================

     This documentation and the datafile itself may be downloaded from the
Computer Programs and Datafiles web page.
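     The documentation above lists 9 (for OB and DOSE) and 99 (for DUR) as
codes for unknown values.  If such a variable is named on a MODEL statement
as it stands, SAS treats the 9 or 99 as an ordinary number.  The following is
a minimal sketch of one way to recode these values to SAS missing values.
The dataset name 'toycodes' and the data values are hypothetical, and the
variable names are those of the documentation.  PROC PHREG leaves out
observations with a missing value for any variable in the model, so recoding
changes which records are analyzed; whether to recode is an analysis
decision.

```sas
/* Sketch only: toy records using the coding in the documentation       */
/* (OB: 9 = unknown, DOSE: 9 = unknown, DUR: 99 = unknown).             */
/* The dataset name and the data values are hypothetical.               */
data toycodes ;
     input set case ob dose dur ;
     if ob   = 9  then ob   = . ;    /* unknown obesity  -> missing */
     if dose = 9  then dose = . ;    /* unknown dose     -> missing */
     if dur  = 99 then dur  = . ;    /* unknown duration -> missing */
     datalines ;
1 1 1 4 96
1 0 9 0 0
2 1 0 6 96
2 0 0 9 99
;
run ;

/* N and NMISS show how many values were set to missing; MIN and MAX */
/* confirm that no 9 or 99 codes remain.                             */
proc means data = toycodes n nmiss min max ;
     var ob dose dur ;
run ;
```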
     Here is a program with 3 PROC PHREG analyses of this dataset, and the
accompanying printout:

options linesize = 80 ;
footnote "~john-c/5421/endometrial.sas &sysdate &systime" ;

data endometr ;
     infile '/home/gnome/john-c/5421/endometrial.data' ;
     input set case age gallbd hyperten obesity estrogen edose edur nonestro ;

     /* 'time' for PROC PHREG: 0 for cases (events), 1 for controls */
     time = 1 - case ;

     /* obesity is used as coded (0, 1, 9 = unknown); it is not recoded */

     agegroup = . ;
     if age ge 55 and age le 64 then agegroup = 1 ;
     if age ge 65 and age le 74 then agegroup = 2 ;
     if age ge 75               then agegroup = 3 ;
run ;

/* List the first 5 matched sets as a check on the input */
proc print data = endometr ;
     where set le 5 ;
title 'First 5 case-control sets on the Endometrial Cancer file ...' ;
run ;

/* Analysis 1: estrogen use only */
proc phreg data = endometr ;
     model time * case(0) = estrogen / ties = discrete ;
     strata set ;
title1 'Case-control status vs. estrogen only' ;
run ;

/* Analysis 2: other risk factors, without estrogen */
proc phreg data = endometr ;
     model time * case(0) = gallbd hyperten obesity / ties = discrete ;
     strata set ;
title1 'Case-control status vs. gallbladder dis, hypertension, obesity' ;
run ;

/* Analysis 3: other risk factors plus estrogen */
proc phreg data = endometr ;
     model time * case(0) = gallbd hyperten obesity estrogen
                            / ties = discrete ;
     strata set ;
title1 '1:4 matched case-control study of endometrial cancer' ;
title2 'Case-control status vs. gallbladder dis, hypertension, obesity' ;
title3 'plus estrogen use ...' ;
run ;

================================================================================

          First 5 case-control sets on the Endometrial Cancer file ...
                                                 21:23 Tuesday, April 20, 2004

 OBS  SET  CASE  AGE  GALLBD  HYPERTEN  OBESITY  ESTROGEN  EDOSE  EDUR  NONESTRO  TIME  AGEGROUP

   1    1     1   74       0         0        1         1      4    96         1     0         2
   2    1     0   75       0         0        9         0      0     0         0     1         3
   3    1     0   74       0         0        9         0      0     0         0     1         2
   4    1     0   74       0         0        9         0      0     0         0     1         2
   5    1     0   75       0         0        1         1      1    48         1     1         3
   6    2     1   67       0         0        0         1      6    96         1     0         2
   7    2     0   67       0         0        0         1      6     5         0     1         2
   8    2     0   67       0         1        1         0      0     0         1     1         2
   9    2     0   67       0         0        0         1      3    53         0     1         2
  10    2     0   68       0         0        0         1      3    45         1     1         2
  11    3     1   76       0         1        1         1      1     9         1     0         3
  12    3     0   76       0         1        1         1      2    96         1     1         3
  13    3     0   76       0         1        0         1      1     3         1     1         3
  14    3     0   76       0         1        1         1      3    15         1     1         3
  15    3     0   77       0         0        0         1      1    36         1     1         3
  16    4     1   71       0         0        9         1      9    96         0     0         2
  17    4     0   70       1         0        0         1      2     7         1     1         2
  18    4     0   70       0         0        0         1      0     0         1     1         2
  19    4     0   71       0         1        1         1      3     7         1     1         2
  20    4     0   70       0         0        1         1      2    27         1     1         2
  21    5     1   69       1         0        1         1      3    36         1     0         2
  22    5     0   69       0         1        0         1      1    96         1     1         2
  23    5     0   69       0         0        1         1      3     1         1     1         2
  24    5     0   69       0         0        0         1      0     0         1     1         2
  25    5     0   68       0         0        9         0      0     0         0     1         2

~john-c/5421/endometrial.sas  20APR04  21:23
--------------------------------------------------------------------------------

                    Case-control status vs. estrogen only
                                                 21:23 Tuesday, April 20, 2004

                              The PHREG Procedure

          Data Set: WORK.ENDOMETR
          Dependent Variable: TIME
          Censoring Variable: CASE
          Censoring Value(s): 0
          Ties Handling: BRESLOW

              Summary of the Number of Event and Censored Values

                                                             Percent
          Stratum    SET      Total     Event   Censored    Censored

                1      1          5         1          4       80.00
                2      2          5         1          4       80.00
                3      3          5         1          4       80.00
                        [observations deleted ...]
               61     61          5         1          4       80.00
               62     62          5         1          4       80.00
               63     63          5         1          4       80.00
          -------------------------------------------------------------------
            Total               315        63        252       80.00

                    Testing Global Null Hypothesis: BETA=0

                        Without         With
          Criterion   Covariates   Covariates   Model Chi-Square

          -2 LOG L       202.789      167.443   35.346 with 1 DF (p=0.0001)
          Score             .            .      31.156 with 1 DF (p=0.0001)
          Wald              .            .      24.284 with 1 DF (p=0.0001)

                   Analysis of Maximum Likelihood Estimates

                        Parameter   Standard         Wald         Pr >    Risk
          Variable  DF   Estimate      Error   Chi-Square   Chi-Square   Ratio

          ESTROGEN   1   2.073761    0.42082     24.28371       0.0001   7.955

~john-c/5421/endometrial.sas  20APR04  21:23
--------------------------------------------------------------------------------

        Case-control status vs. gallbladder dis, hypertension, obesity
                                                 21:23 Tuesday, April 20, 2004

                              The PHREG Procedure

          Data Set: WORK.ENDOMETR
          Dependent Variable: TIME
          Censoring Variable: CASE
          Censoring Value(s): 0
          Ties Handling: BRESLOW

              Summary of the Number of Event and Censored Values

                                                             Percent
          Stratum    SET      Total     Event   Censored    Censored

                1      1          5         1          4       80.00
                2      2          5         1          4       80.00
                3      3          5         1          4       80.00
                        [observations deleted ...]
               61     61          5         1          4       80.00
               62     62          5         1          4       80.00
               63     63          5         1          4       80.00
          -------------------------------------------------------------------
            Total               315        63        252       80.00

                    Testing Global Null Hypothesis: BETA=0

                        Without         With
          Criterion   Covariates   Covariates   Model Chi-Square

          -2 LOG L       202.789      188.428   14.361 with 3 DF (p=0.0025)
          Score             .            .      15.971 with 3 DF (p=0.0011)
          Wald              .            .      14.158 with 3 DF (p=0.0027)

                   Analysis of Maximum Likelihood Estimates

                        Parameter   Standard         Wald         Pr >    Risk
          Variable  DF   Estimate      Error   Chi-Square   Chi-Square   Ratio

          GALLBD     1   1.258795    0.37817     11.07970       0.0009   3.521
          HYPERTEN   1   0.345745    0.31468      1.20721       0.2719   1.413
          OBESITY    1  -0.048959    0.05987      0.66881       0.4135   0.952

~john-c/5421/endometrial.sas  20APR04  21:23
--------------------------------------------------------------------------------

             1:4 matched case-control study of endometrial cancer
        Case-control status vs. gallbladder dis, hypertension, obesity
                             plus estrogen use ...
                                                 21:23 Tuesday, April 20, 2004

                              The PHREG Procedure

          Data Set: WORK.ENDOMETR
          Dependent Variable: TIME
          Censoring Variable: CASE
          Censoring Value(s): 0
          Ties Handling: BRESLOW

              Summary of the Number of Event and Censored Values

                                                             Percent
          Stratum    SET      Total     Event   Censored    Censored

                1      1          5         1          4       80.00
                2      2          5         1          4       80.00
                3      3          5         1          4       80.00
                        [observations deleted ...]
               61     61          5         1          4       80.00
               62     62          5         1          4       80.00
               63     63          5         1          4       80.00
          -------------------------------------------------------------------
            Total               315        63        252       80.00

                    Testing Global Null Hypothesis: BETA=0

                        Without         With
          Criterion   Covariates   Covariates   Model Chi-Square

          -2 LOG L       202.789      156.957   45.832 with 4 DF (p=0.0001)
          Score             .            .      40.442 with 4 DF (p=0.0001)
          Wald              .            .      30.135 with 4 DF (p=0.0001)

~john-c/5421/endometrial.sas  20APR04  21:23
--------------------------------------------------------------------------------

             1:4 matched case-control study of endometrial cancer
        Case-control status vs. gallbladder dis, hypertension, obesity
                             plus estrogen use ...
                                                 21:23 Tuesday, April 20, 2004

                              The PHREG Procedure

                   Analysis of Maximum Likelihood Estimates

                        Parameter   Standard         Wald         Pr >    Risk
          Variable  DF   Estimate      Error   Chi-Square   Chi-Square   Ratio

          GALLBD     1   1.301275    0.41601      9.78416       0.0018   3.674
          HYPERTEN   1   0.000880    0.34773   6.40398E-6       0.9980   1.001
          OBESITY    1   0.063646    0.07184      0.78486       0.3757   1.066
          ESTROGEN   1   2.266769    0.48884     21.50206       0.0001   9.648

~john-c/5421/endometrial.sas  20APR04  21:23
=================================================================================

     Note the definition of the 'time' variable in the data step:

          time = 1 - case ;

     Note here that the first analysis examines only estrogen use (coded as
1 vs. 0 for yes vs. no).  The second analysis includes other risk factors:
gall-bladder disease, hypertension, and obesity.  Note that gall-bladder
disease appears to be a risk factor, with a 'Risk ratio' of 3.521.  The
third analysis includes the variables in the second analysis, plus estrogen.
Estrogen here has a 'Risk ratio' of 9.648 and appears to be highly
significant, even after controlling for the other risk factors.

     IN THIS ANALYSIS, the 'Risk Ratio' can be correctly interpreted as an
odds ratio.  It is the exponential of the parameter estimate: for estrogen
in the third analysis, exp(2.266769) = 9.648.

     Note also that, because cases and controls are closely matched on age,
there is not much point in doing an analysis with age as one of the
covariates: age is a known risk factor for endometrial cancer, but here it
is essentially matched out and cannot be studied (and is not of much
interest anyway).

=================================================================================

PROBLEM 1:

     Use PROC PHREG to replicate the results obtained from PROC LOGISTIC in
n54703.012 regarding endometrial cancer, gallbladder disease, and
hypertension [matched pairs].

=================================================================================

n54703.017     Last update: May 8, 2004.