SURVIVAL ANALYSIS, IV: PROC PHREG USED FOR MATCHED STUDIES n54703.017
This is actually not a survival analysis topic; rather it is about
matched case-control studies. It generalizes the method in which
PROC LOGISTIC was used to analyze pair-matched studies. It uses the survival
analysis procedure PROC PHREG, but not in the way in which it was originally
designed to be used. In a way, PROC PHREG is tricked into carrying out a
correct analysis of matched studies.
Assume that cases and matched controls occur together on one file.
There is an indicator variable called 'case': it has value 1 if the person is
a case and value 0 if he/she is a control.
There is another variable called 'set', denoting to which case-control set
the person belongs. That is, all the people who have, say, 'set = 15' all
belong to case-control set number 15. In general you expect there is at least
one case in each set and a variable number of controls.
Assume there are various risk factors on the file, e.g., X1 = smoking (1 = yes,
0 = no), X2 = blood pressure, etc.
Here is the syntax for using PROC PHREG to analyze such data.
    proc phreg data = datafile ;
        model time * case(0) = X1 X2 / ties = discrete ;
        strata set ;
    run ;
Note that the 'strata' variable must be equal to the case-control set
indicator.
The 'time' variable must be defined in a surprising way, so that 'time'
has a lower value for all the cases than it does for all the controls. This is
the part where PROC PHREG is being tricked into analyzing a case-control
study. One must regard 'control' observations as 'censored', and 'case'
observations as uncensored or as 'events' with an "event-time" which is
earlier than the censoring time of the controls.
Thus in the DATA step preceding the PROC PHREG, one would define
time = 1 - case,
provided, as above, case = 1 indicates that the person has the condition (disease)
of interest, and case = 0 indicates that the person is a control. Thus:
if the person is a CASE, time = 0
if the person is a CONTROL, time = 1.
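To see why this coding works, it may help to write out the partial likelihood
that PROC PHREG maximizes under it. The following is only a sketch, assuming
exactly one case per set. The notation is introduced here for illustration and
is not part of the SAS syntax: x_{sj} is the covariate vector of person j in
set s, with j = 0 the case and j = 1, ..., M_s the controls.

```latex
L(\beta) \;=\; \prod_{s=1}^{S}
  \frac{\exp\!\left(\beta^{\top} x_{s0}\right)}
       {\sum_{j=0}^{M_s} \exp\!\left(\beta^{\top} x_{sj}\right)}
```

Within each stratum the only event occurs at time 0, when every member of the
set is still at risk, so the risk set is the whole matched set. The product
above is the conditional logistic likelihood for 1:M matched sets.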
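Finally, note the 'ties = discrete' option on the MODEL statement. It is
needed whenever a set contains more than one case: all cases in a set share
the event time 0, and 'ties = discrete' makes PROC PHREG handle those ties
with the exact conditional logistic likelihood. When every set has exactly
one case, there are no tied event times within a stratum and all tie-handling
methods give the same estimates. (The printout below reports 'Ties Handling:
BRESLOW'; with one case per set this does not affect the estimates.) In
either situation, a set whose members all have the same covariate values
contributes no information; for matched pairs this means that only discordant
pairs count in the analysis.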
Below is an example of how this works with data from a case-control
study of endometrial cancer [Mack TM, Pike MC, Henderson BE et al.
(1976) Estrogens and endometrial cancer in a retirement community. NEJM
294: 1262-1267]. In this study, cases and controls were matched approximately
on age. They were not matched on other variables of interest. There were
63 cases and 4 * 63 = 252 controls (total observations, 315). The following
is documentation of the datafile:
=================================================================================
DOCUMENTATION FOR ENDOMETRIAL CANCER DATA
This file contains 315 records with data on cases and controls from the
Leisure World study of endometrial cancer as related to treatment with
estrogens for menopausal symptoms and other risk factors. See the article
by Mack et al in NEJM 294:1262-1267, 1976 for a full description.
The variables are as follows:
Number  Name  Description               Codes/Range
-------------------------------------------------------------------------
   1    SET   Matched set indicator     1-63
   2    CASE  Case-control indicator    0 = Control, 1 = Case
   3    AGE   Age in years              55-83
   4    GALL  Gallbladder disease       0 = No, 1 = Yes
   5    HYP   Hypertension              0 = No, 1 = Yes
   6    OB    Obesity                   0 = No, 1 = Yes, 9 = Unknown
   7    EST   Estrogen usage            0 = No, 1 = Yes
   8    DOSE  Dose of conjugated        0 = 0
              estrogen                  1 = 0.3
                                        2 = 0.301-0.624
                                        3 = 0.625
                                        4 = 0.626-1.249
                                        5 = 1.25
                                        6 = 1.26-2.50
                                        9 = Unknown
   9    DUR   Duration of estrogen      0-95
              use (months)              96 = 96+
                                        99 = Unknown
  10    NON   Non-estrogen drug         0 = No, 1 = Yes
=================================================================================
This documentation and the datafile itself may be downloaded from the
Computer Programs and Datafiles web page.
Here is a program with 3 PROC PHREG analyses of this dataset,
and the accompanying printout:
options linesize = 80 ;
footnote "~john-c/5421/endometrial.sas &sysdate &systime" ;

data endometr ;
    infile '/home/gnome/john-c/5421/endometrial.data' ;
    input set case age gallbd hyperten obesity estrogen edose
          edur nonestro ;
    time = 1 - case ;
    agegroup = . ;
    if age ge 55 and age le 64 then agegroup = 1 ;
    if age ge 65 and age le 74 then agegroup = 2 ;
    if age ge 75 then agegroup = 3 ;
run ;

proc print data = endometr ;
    where set le 5 ;
    title 'First 5 case-control sets on the Endometrial Cancer file ...' ;
run ;

proc phreg data = endometr ;
    model time * case(0) = estrogen / ties = discrete ;
    strata set ;
    title1 'Case-control status vs. estrogen only' ;
run ;

proc phreg data = endometr ;
    model time * case(0) = gallbd hyperten obesity / ties = discrete ;
    strata set ;
    title1 'Case-control status vs. gallbladder dis, hypertension, obesity' ;
run ;

proc phreg data = endometr ;
    model time * case(0) = gallbd hyperten obesity estrogen /
          ties = discrete ;
    strata set ;
    title1 '1:4 matched case-control study of endometrial cancer' ;
    title2 'Case-control status vs. gallbladder dis, hypertension, obesity' ;
    title3 'plus estrogen use ...' ;
run ;
================================================================================
First 5 case-control sets on the Endometrial Cancer file ... 1
21:23 Tuesday, April 20, 2004
OBS SET CASE AGE GALLBD HYPERTEN OBESITY ESTROGEN EDOSE EDUR NONESTRO TIME AGEGROUP
1 1 1 74 0 0 1 1 4 96 1 0 2
2 1 0 75 0 0 9 0 0 0 0 1 3
3 1 0 74 0 0 9 0 0 0 0 1 2
4 1 0 74 0 0 9 0 0 0 0 1 2
5 1 0 75 0 0 1 1 1 48 1 1 3
6 2 1 67 0 0 0 1 6 96 1 0 2
7 2 0 67 0 0 0 1 6 5 0 1 2
8 2 0 67 0 1 1 0 0 0 1 1 2
9 2 0 67 0 0 0 1 3 53 0 1 2
10 2 0 68 0 0 0 1 3 45 1 1 2
11 3 1 76 0 1 1 1 1 9 1 0 3
12 3 0 76 0 1 1 1 2 96 1 1 3
13 3 0 76 0 1 0 1 1 3 1 1 3
14 3 0 76 0 1 1 1 3 15 1 1 3
15 3 0 77 0 0 0 1 1 36 1 1 3
16 4 1 71 0 0 9 1 9 96 0 0 2
17 4 0 70 1 0 0 1 2 7 1 1 2
18 4 0 70 0 0 0 1 0 0 1 1 2
19 4 0 71 0 1 1 1 3 7 1 1 2
20 4 0 70 0 0 1 1 2 27 1 1 2
21 5 1 69 1 0 1 1 3 36 1 0 2
22 5 0 69 0 1 0 1 1 96 1 1 2
23 5 0 69 0 0 1 1 3 1 1 1 2
24 5 0 69 0 0 0 1 0 0 1 1 2
25 5 0 68 0 0 9 0 0 0 0 1 2
~john-c/5421/endometrial.sas 20APR04 21:23
--------------------------------------------------------------------------------
Case-control status vs. estrogen only 2
21:23 Tuesday, April 20, 2004
The PHREG Procedure
Data Set: WORK.ENDOMETR
Dependent Variable: TIME
Censoring Variable: CASE
Censoring Value(s): 0
Ties Handling: BRESLOW
Summary of the Number of Event and Censored Values
Percent
Stratum SET Total Event Censored Censored
1 1 5 1 4 80.00
2 2 5 1 4 80.00
3 3 5 1 4 80.00
[observations deleted ...]
61 61 5 1 4 80.00
62 62 5 1 4 80.00
63 63 5 1 4 80.00
-------------------------------------------------------------------
Total 315 63 252 80.00
Testing Global Null Hypothesis: BETA=0
Without With
Criterion Covariates Covariates Model Chi-Square
-2 LOG L 202.789 167.443 35.346 with 1 DF (p=0.0001)
Score . . 31.156 with 1 DF (p=0.0001)
Wald . . 24.284 with 1 DF (p=0.0001)
Analysis of Maximum Likelihood Estimates
Parameter Standard Wald Pr > Risk
Variable DF Estimate Error Chi-Square Chi-Square Ratio
ESTROGEN 1 2.073761 0.42082 24.28371 0.0001 7.955
~john-c/5421/endometrial.sas 20APR04 21:23
--------------------------------------------------------------------------------
Case-control status vs. gallbladder dis, hypertension, obesity 4
21:23 Tuesday, April 20, 2004
The PHREG Procedure
Data Set: WORK.ENDOMETR
Dependent Variable: TIME
Censoring Variable: CASE
Censoring Value(s): 0
Ties Handling: BRESLOW
Summary of the Number of Event and Censored Values
Percent
Stratum SET Total Event Censored Censored
1 1 5 1 4 80.00
2 2 5 1 4 80.00
3 3 5 1 4 80.00
[observations deleted ...]
61 61 5 1 4 80.00
62 62 5 1 4 80.00
63 63 5 1 4 80.00
-------------------------------------------------------------------
Total 315 63 252 80.00
Testing Global Null Hypothesis: BETA=0
Without With
Criterion Covariates Covariates Model Chi-Square
-2 LOG L 202.789 188.428 14.361 with 3 DF (p=0.0025)
Score . . 15.971 with 3 DF (p=0.0011)
Wald . . 14.158 with 3 DF (p=0.0027)
Analysis of Maximum Likelihood Estimates
Parameter Standard Wald Pr > Risk
Variable DF Estimate Error Chi-Square Chi-Square Ratio
GALLBD 1 1.258795 0.37817 11.07970 0.0009 3.521
HYPERTEN 1 0.345745 0.31468 1.20721 0.2719 1.413
OBESITY 1 -0.048959 0.05987 0.66881 0.4135 0.952
~john-c/5421/endometrial.sas 20APR04 21:23
--------------------------------------------------------------------------------
1:4 matched case-control study of endometrial cancer 6
Case-control status vs. gallbladder dis, hypertension, obesity
plus estrogen use ...
21:23 Tuesday, April 20, 2004
The PHREG Procedure
Data Set: WORK.ENDOMETR
Dependent Variable: TIME
Censoring Variable: CASE
Censoring Value(s): 0
Ties Handling: BRESLOW
Summary of the Number of Event and Censored Values
Percent
Stratum SET Total Event Censored Censored
1 1 5 1 4 80.00
2 2 5 1 4 80.00
3 3 5 1 4 80.00
[observations deleted ...]
61 61 5 1 4 80.00
62 62 5 1 4 80.00
63 63 5 1 4 80.00
-------------------------------------------------------------------
Total 315 63 252 80.00
Testing Global Null Hypothesis: BETA=0
Without With
Criterion Covariates Covariates Model Chi-Square
-2 LOG L 202.789 156.957 45.832 with 4 DF (p=0.0001)
Score . . 40.442 with 4 DF (p=0.0001)
Wald . . 30.135 with 4 DF (p=0.0001)
Analysis of Maximum Likelihood Estimates
Parameter Standard Wald Pr > Risk
Variable DF Estimate Error Chi-Square Chi-Square Ratio
GALLBD 1 1.301275 0.41601 9.78416 0.0018 3.674
HYPERTEN 1 0.000880 0.34773 6.40398E-6 0.9980 1.001
OBESITY 1 0.063646 0.07184 0.78486 0.3757 1.066
ESTROGEN 1 2.266769 0.48884 21.50206 0.0001 9.648
~john-c/5421/endometrial.sas 20APR04 21:23
=================================================================================
Note the definition of the 'time' variable in the data step:
time = 1 - case ;
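The first analysis examines only estrogen use (coded as 1 vs. 0 for yes
vs. no). The estimated 'Risk ratio' is 7.955 (Wald chi-square = 24.28,
p = 0.0001).
The second analysis looks at other risk factors instead: gallbladder disease,
hypertension, and obesity. Gallbladder disease appears to be a risk factor,
with a 'Risk ratio' of 3.521 (p = 0.0009); hypertension (p = 0.27) and
obesity (p = 0.41) are not statistically significant.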
The third analysis includes the variables in the second analysis, plus
estrogen. Estrogen here has a 'Risk ratio' of 9.648 and appears to be highly
significant, even after controlling for the other risk factors.
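One caveat about the obesity term: the documentation codes OBESITY as
9 = Unknown, and the program above enters it as a numeric covariate without
recoding, so the 9s are treated as the number 9. A minimal sketch of one way
to handle this, setting the unknowns to missing, is given below. The dataset
name 'endometr2' is a hypothetical choice, and the results will differ from
the printout above.

```sas
/* Sketch: treat obesity = 9 (Unknown) as missing rather than as the number 9. */
/* 'endometr2' is a hypothetical dataset name.                                 */
data endometr2 ;
    set endometr ;
    if obesity = 9 then obesity = . ;
run ;

/* How many cases and controls have unknown obesity? */
proc freq data = endometr ;
    tables obesity * case / norow nocol nopercent ;
    title1 'Obesity codes by case-control status' ;
run ;

/* Refit the third model on the recoded data. */
proc phreg data = endometr2 ;
    model time * case(0) = gallbd hyperten obesity estrogen / ties = discrete ;
    strata set ;
    title1 'Third model with obesity = 9 set to missing' ;
run ;
```

PROC PHREG drops observations with a missing covariate, so a set whose case
has unknown obesity no longer contributes to the likelihood, and other sets
lose controls. An alternative is to keep 'Unknown' as its own category, for
example through the CLASS statement in versions of SAS that support it in
PROC PHREG.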
IN THIS ANALYSIS, the 'Risk Ratio' can be correctly interpreted as an
odds ratio.
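As a cross-check on the odds-ratio interpretation, the same conditional
logistic model can be requested directly in releases of SAS whose
PROC LOGISTIC supports the STRATA statement (an assumption about your
installation; older releases do not have it). A sketch for the estrogen-only
model:

```sas
/* Sketch: conditional logistic regression, conditioning on matched set. */
proc logistic data = endometr ;
    strata set ;
    model case (event = '1') = estrogen ;
    title1 'Conditional logistic regression via PROC LOGISTIC (sketch)' ;
run ;
```

If both procedures are fitting the same conditional likelihood, the odds
ratio estimate for estrogen should agree with the 'Risk Ratio' reported by
PROC PHREG.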
Note also that, because cases and controls are closely matched on age,
there is not much point in doing an analysis with age as one of the
covariates: it is a known risk factor for endometrial cancer, but here it
is essentially matched out and cannot be studied (and is not of much interest
anyway).
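How closely the sets are matched on age can be checked directly. The sketch
below computes the range of age within each set; the names 'agerange',
'minage', 'maxage', and 'agediff' are hypothetical and chosen only for
illustration.

```sas
/* Sketch: spread of age within each matched set. */
proc means data = endometr noprint nway ;
    class set ;
    var age ;
    output out = agerange min = minage max = maxage ;
run ;

data agerange ;
    set agerange ;
    agediff = maxage - minage ;   /* within-set range of age, in years */
run ;

proc freq data = agerange ;
    tables agediff ;
    title1 'Within-set range of age (years)' ;
run ;
```

In the five sets printed above, ages within a set differ by at most one year.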
=================================================================================
PROBLEM 1:
Use PROC PHREG to replicate the results obtained from PROC LOGISTIC in n54703.012
regarding endometrial cancer, gallbladder disease, and hypertension
[matched pairs].
=================================================================================
n54703.017 Last update: May 8, 2004.