SURVIVAL ANALYSIS, IV: PROC PHREG USED FOR MATCHED STUDIES       n54703.017

     This is actually not a survival analysis topic; rather it is about
matched case-control studies.  It generalizes the method in which
PROC LOGISTIC was used to analyze pair-matched studies.  It uses the survival
analysis procedure PROC PHREG, but not in the way in which it was originally
designed to be used.  In a way, PROC PHREG is tricked into carrying out a
correct analysis of matched studies.

     Assume that cases and matched controls occur together on one file.
There is an indicator variable called 'case': it has value 1 if the person is
a case and value 0 if he/she is a control.

     There is another variable called 'set', denoting to which case-control set
the person belongs.  That is, the people who have, say, 'set = 15' all
belong to case-control set number 15.  In general each set is expected to
contain at least one case and a variable number of controls.

     Assume there are various risk factors on the file, e.g., X1 = smoking
(1 = yes, 0 = no), X2 = blood pressure, etc.
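     To make the layout concrete, here is a minimal sketch of such a file
with two 1:2 matched sets.  The dataset name 'example' and every data value
are invented for illustration; they do not come from any real study.

```sas
/* Hypothetical 1:2 matched data: one record per person.          */
/* All values below are invented purely to show the file layout.  */
data example ;
     input set case x1 x2 ;
     datalines ;
1 1 1 150
1 0 0 132
1 0 1 128
2 1 0 141
2 0 0 139
2 0 1 120
;
run ;

/* List the records to confirm each set has one case (case = 1). */
proc print data = example ;
run ;
```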

     Here is the syntax for using PROC PHREG to analyze such data.

       proc phreg data = datafile ;
            model time * case(0) = X1 X2 / ties = discrete ;
            strata set ;
       run ;

     Note that the 'strata' variable must be equal to the case-control set
indicator.

     The 'time' variable must be defined in a surprising way, so that 'time'
has a lower value for all the cases than it does for all the controls.  This is
the part where PROC PHREG is being tricked into analyzing a case-control
study.  One must regard 'control' observations as 'censored', and 'case'
observations as uncensored or as 'events' with an "event-time" which is
earlier than the censoring time of the controls.

     Thus in the DATA step preceding the PROC PHREG, one would define

          time = 1 - case ;

provided, as above, case = 1 indicates that the person has the condition (disease)
of interest, and case = 0 indicates that the person is a control.  Thus:

          if the person is a CASE,    time = 0

          if the person is a CONTROL, time = 1.
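
     Why this coding gives a correct analysis can be seen by writing out the
likelihood that PROC PHREG maximizes.  The following is a sketch only; the
notation (i for the set, j for the person within the set with j = 0 the
case, M_i for the number of controls, x_ij for the covariate vector) is
introduced here and is not part of the SAS output.

```latex
% Stratified partial likelihood when, in set i, the case (j = 0) has the
% only event, at time 0, and all members of the set are at risk at time 0:
\[
  L(\beta) \;=\; \prod_{i=1}^{n}
     \frac{\exp(\beta^{\top} x_{i0})}
          {\sum_{j=0}^{M_i} \exp(\beta^{\top} x_{ij})}
\]
```

     This has the same form as the conditional logistic likelihood for 1:M
matched sets.  The controls, censored at time 1, enter only through the
denominators, which is why their 'time' must be later than that of the case.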

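     Finally, note the 'ties = discrete' option on the MODEL statement.
With exactly one case per set there are no tied event times, so every
method of handling ties gives the same answer; the option matters when a
set contains more than one case, where it makes PROC PHREG fit the
conditional logistic likelihood.  It is the 'strata' statement, not this
option, that restricts comparisons to members of the same set, so that in
pair matching only discordant pairs contribute to the analysis.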
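
     The remark about discordant pairs can be checked with a short worked
formula for pair matching (one case, one control per set).  The symbols
below (x_i0 for the case's covariate, x_i1 for the control's, beta for the
coefficient) are notation assumed here for illustration.

```latex
% Contribution of matched pair i to the conditional likelihood:
\[
  L_i(\beta) \;=\;
     \frac{\exp(\beta x_{i0})}{\exp(\beta x_{i0}) + \exp(\beta x_{i1})}
  \;=\; \frac{1}{1 + \exp\{\beta\,(x_{i1} - x_{i0})\}}
\]
% Concordant pair (x_{i0} = x_{i1}): the contribution is the constant 1/2,
% which does not depend on beta and so carries no information about it.
\[
  x_{i0} = x_{i1} \;\Longrightarrow\; L_i(\beta) = \tfrac{1}{2}
\]
```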

     Below is an example of how this actually works with data from a
case-control study of endometrial cancer [Mack TM, Pike MC, Henderson BE et al.
(1976) Estrogens and endometrial cancer in a retirement community.  NEJM
294:1262-1267].  In this study, cases and controls were matched approximately
on age.  They were not matched on other variables of interest.  There were
63 cases and 4 * 63 = 252 controls (315 observations in total).  The following
is documentation of the datafile:

=================================================================================

DOCUMENTATION FOR ENDOMETRIAL CANCER DATA

This file contains 315 records with data on cases and controls from the
Leisure World study of endometrial cancer as related to treatment with
estrogens for menopausal symptoms and other risk factors.  See the article
by Mack et al in NEJM 294:1262-1267, 1976 for a full description.

The variables are as follows:

Number  Name        Description             Codes/Range
-------------------------------------------------------------------------
   1    SET         Matched set indicator   1-63

   2    CASE        Case-control indicator  0 = Control, 1 = Case 

   3    AGE         Age in years            55-83

   4    GALL        Gallbladder disease     0 = No, 1 = Yes

   5    HYP         Hypertension            0 = No, 1 = Yes 

   6    OB          Obesity                 0 = No, 1 = Yes; 9 = Unknown

   7    EST         Estrogen usage          0 = No, 1 = Yes

   8    DOSE        Dose of conjugated      0 = 0
                    estrogen                1 = 0.3
                                            2 = 0.301-0.624
                                            3 = 0.625
                                            4 = 0.626-1.249
                                            5 = 1.25
                                            6 = 1.26-2.50
                                            9 = Unknown

   9    DUR         Duration of estrogen    0-95
                    use (months)            96=96+
                                            99=Unknown

  10    NON         Non-estrogen drug       0 = No, 1 = Yes

=================================================================================

     This documentation and the datafile itself may be downloaded from the
Computer Programs and Datafiles web page.

     Here is a program with 3 PROC PHREG analyses of this dataset,
and the accompanying printout:


options linesize = 80 ;
footnote "~john-c/5421/endometrial.sas &sysdate &systime" ;

data endometr ;
     infile '/home/gnome/john-c/5421/endometrial.data' ;
     input  set   case   age   gallbd   hyperten   obesity  estrogen  edose
            edur  nonestro ;
     time = 1 - case ;

     agegroup = . ;
     if age ge 55 and age le 64 then agegroup = 1 ;
     if age ge 65 and age le 74 then agegroup = 2 ;
     if age ge 75               then agegroup = 3 ;

run ;

proc print data = endometr ;
     where set le 5 ;
title 'First 5 case-control sets on the Endometrial Cancer file ...' ;
run ;

proc phreg data = endometr ;
     model time * case(0) = estrogen / ties = discrete ;
     strata set ;
title1 'Case-control status vs. estrogen only' ;
run ;

proc phreg data = endometr ;
     model time * case(0) = gallbd hyperten obesity / ties = discrete ;
     strata set ;
title1 'Case-control status vs. gallbladder dis, hypertension, obesity' ;
run ;

proc phreg data = endometr ;
     model time * case(0) = gallbd hyperten obesity estrogen /
                            ties = discrete ;
     strata set ;
title1 '1:4 matched case-control study of endometrial cancer' ;
title2 'Case-control status vs. gallbladder dis, hypertension, obesity' ;
title3 'plus estrogen use ...' ;
run ;

================================================================================
          First 5 case-control sets on the Endometrial Cancer file ...         1
                                                   21:23 Tuesday, April 20, 2004

                                   H         E               N         A
                                   Y    O    S               O         G
                              G    P    B    T               N         E
                              A    E    E    R    E          E         G
                   C          L    R    S    O    D     E    S    T    R
        O     S    A     A    L    T    I    G    O     D    T    I    O
        B     E    S     G    B    E    T    E    S     U    R    M    U
        S     T    E     E    D    N    Y    N    E     R    O    E    P

         1    1    1    74    0    0    1    1    4    96    1    0    2
         2    1    0    75    0    0    9    0    0     0    0    1    3
         3    1    0    74    0    0    9    0    0     0    0    1    2
         4    1    0    74    0    0    9    0    0     0    0    1    2
         5    1    0    75    0    0    1    1    1    48    1    1    3
         6    2    1    67    0    0    0    1    6    96    1    0    2
         7    2    0    67    0    0    0    1    6     5    0    1    2
         8    2    0    67    0    1    1    0    0     0    1    1    2
         9    2    0    67    0    0    0    1    3    53    0    1    2
        10    2    0    68    0    0    0    1    3    45    1    1    2
        11    3    1    76    0    1    1    1    1     9    1    0    3
        12    3    0    76    0    1    1    1    2    96    1    1    3
        13    3    0    76    0    1    0    1    1     3    1    1    3
        14    3    0    76    0    1    1    1    3    15    1    1    3
        15    3    0    77    0    0    0    1    1    36    1    1    3
        16    4    1    71    0    0    9    1    9    96    0    0    2
        17    4    0    70    1    0    0    1    2     7    1    1    2
        18    4    0    70    0    0    0    1    0     0    1    1    2
        19    4    0    71    0    1    1    1    3     7    1    1    2
        20    4    0    70    0    0    1    1    2    27    1    1    2
        21    5    1    69    1    0    1    1    3    36    1    0    2
        22    5    0    69    0    1    0    1    1    96    1    1    2
        23    5    0    69    0    0    1    1    3     1    1    1    2
        24    5    0    69    0    0    0    1    0     0    1    1    2
        25    5    0    68    0    0    9    0    0     0    0    1    2
 
 
 
                     ~john-c/5421/endometrial.sas 20APR04 21:23
--------------------------------------------------------------------------------
                      Case-control status vs. estrogen only                     2
                                                   21:23 Tuesday, April 20, 2004

                              The PHREG Procedure

     Data Set: WORK.ENDOMETR
     Dependent Variable: TIME    
     Censoring Variable: CASE    
     Censoring Value(s): 0  
     Ties Handling: BRESLOW 


              Summary of the Number of Event and Censored Values
 
                                                                  Percent
      Stratum    SET            Total       Event    Censored    Censored

            1    1                  5           1           4       80.00
            2    2                  5           1           4       80.00
            3    3                  5           1           4       80.00

          [observations deleted ...]

           61    61                 5           1           4       80.00
           62    62                 5           1           4       80.00
           63    63                 5           1           4       80.00
      -------------------------------------------------------------------
        Total                     315          63         252       80.00


                     Testing Global Null Hypothesis: BETA=0
 
                   Without        With   
    Criterion    Covariates    Covariates    Model Chi-Square

    -2 LOG L        202.789       167.443      35.346 with 1 DF (p=0.0001)  
    Score              .             .         31.156 with 1 DF (p=0.0001)  
    Wald               .             .         24.284 with 1 DF (p=0.0001)  


                    Analysis of Maximum Likelihood Estimates
 
                    Parameter     Standard      Wald         Pr >          Risk
 Variable   DF       Estimate       Error    Chi-Square   Chi-Square      Ratio

 ESTROGEN    1       2.073761      0.42082     24.28371       0.0001      7.955
 
 
 
 
 
 
 
                     ~john-c/5421/endometrial.sas 20APR04 21:23
--------------------------------------------------------------------------------
          Case-control status vs. gallbladder dis, hypertension, obesity        4
                                                   21:23 Tuesday, April 20, 2004

                              The PHREG Procedure

     Data Set: WORK.ENDOMETR
     Dependent Variable: TIME    
     Censoring Variable: CASE    
     Censoring Value(s): 0  
     Ties Handling: BRESLOW 


              Summary of the Number of Event and Censored Values
 
                                                                  Percent
      Stratum    SET            Total       Event    Censored    Censored

            1    1                  5           1           4       80.00
            2    2                  5           1           4       80.00
            3    3                  5           1           4       80.00

           [observations deleted ...]

           61    61                 5           1           4       80.00
           62    62                 5           1           4       80.00
           63    63                 5           1           4       80.00
      -------------------------------------------------------------------
        Total                     315          63         252       80.00


                     Testing Global Null Hypothesis: BETA=0
 
                   Without        With   
    Criterion    Covariates    Covariates    Model Chi-Square

    -2 LOG L        202.789       188.428      14.361 with 3 DF (p=0.0025)  
    Score              .             .         15.971 with 3 DF (p=0.0011)  
    Wald               .             .         14.158 with 3 DF (p=0.0027)  


                    Analysis of Maximum Likelihood Estimates
 
                    Parameter     Standard      Wald         Pr >          Risk
 Variable   DF       Estimate       Error    Chi-Square   Chi-Square      Ratio

 GALLBD      1       1.258795      0.37817     11.07970       0.0009      3.521
 HYPERTEN    1       0.345745      0.31468      1.20721       0.2719      1.413
 OBESITY     1      -0.048959      0.05987      0.66881       0.4135      0.952
 
 
                     ~john-c/5421/endometrial.sas 20APR04 21:23
--------------------------------------------------------------------------------
               1:4 matched case-control study of endometrial cancer             6
         Case-control status vs. gallbladder dis, hypertension, obesity
                             plus estrogen use ...
                                                   21:23 Tuesday, April 20, 2004

                              The PHREG Procedure

     Data Set: WORK.ENDOMETR
     Dependent Variable: TIME    
     Censoring Variable: CASE    
     Censoring Value(s): 0  
     Ties Handling: BRESLOW 


              Summary of the Number of Event and Censored Values
 
                                                                  Percent
      Stratum    SET            Total       Event    Censored    Censored

            1    1                  5           1           4       80.00
            2    2                  5           1           4       80.00
            3    3                  5           1           4       80.00

           [observations deleted ...]

           61    61                 5           1           4       80.00
           62    62                 5           1           4       80.00
           63    63                 5           1           4       80.00
      -------------------------------------------------------------------
        Total                     315          63         252       80.00


                     Testing Global Null Hypothesis: BETA=0
 
                   Without        With   
    Criterion    Covariates    Covariates    Model Chi-Square

    -2 LOG L        202.789       156.957      45.832 with 4 DF (p=0.0001)  
    Score              .             .         40.442 with 4 DF (p=0.0001)  
    Wald               .             .         30.135 with 4 DF (p=0.0001)  

 
 
 
                     ~john-c/5421/endometrial.sas 20APR04 21:23
--------------------------------------------------------------------------------
               1:4 matched case-control study of endometrial cancer             8
         Case-control status vs. gallbladder dis, hypertension, obesity
                             plus estrogen use ...
                                                   21:23 Tuesday, April 20, 2004

                              The PHREG Procedure

                    Analysis of Maximum Likelihood Estimates
 
                    Parameter     Standard      Wald         Pr >          Risk
 Variable   DF       Estimate       Error    Chi-Square   Chi-Square      Ratio

 GALLBD      1       1.301275      0.41601      9.78416       0.0018      3.674
 HYPERTEN    1       0.000880      0.34773   6.40398E-6       0.9980      1.001
 OBESITY     1       0.063646      0.07184      0.78486       0.3757      1.066
 ESTROGEN    1       2.266769      0.48884     21.50206       0.0001      9.648
 
 
                     ~john-c/5421/endometrial.sas 20APR04 21:23
=================================================================================

     Note the definition of the 'time' variable in the data step:

       time = 1 - case ;

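     Note here that the first analysis examines only estrogen use (coded as 1
vs. 0 for yes vs. no).  The printout lists 'Ties Handling: BRESLOW' although
the program requests 'ties = discrete'; with exactly one case per set there
are no tied event times, so the two methods give identical estimates.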

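     The second analysis includes other risk factors: gall-bladder disease,
hypertension, and obesity.  Note that gall-bladder disease appears to be
a risk factor, with a 'Risk ratio' of 3.521.  Obesity is entered exactly as
coded on the file, so the value 9 (Unknown) is treated as a number; its
coefficient should not be read as the effect of obesity.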
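
     The obesity variable is documented as 0 = No, 1 = Yes, 9 = Unknown, and
the DATA step above passes it to the model unchanged.  A minimal sketch of
one possible recoding follows.  The dataset name 'endometr2' and the
indicator names 'obese' and 'obunk' are hypothetical names chosen here; this
run was not part of the original printout, so no results are shown.

```sas
/* Sketch: replace the 0/1/9 obesity code by two 0/1 indicators. */
/* 'endometr2', 'obese' and 'obunk' are hypothetical names.      */
data endometr2 ;
     set endometr ;
     obese = (obesity = 1) ;   /* 1 if obese, 0 otherwise            */
     obunk = (obesity = 9) ;   /* 1 if obesity unknown, 0 otherwise  */
run ;

proc phreg data = endometr2 ;
     model time * case(0) = gallbd hyperten obese obunk / ties = discrete ;
     strata set ;
title1 'Obesity entered as indicators for Yes and Unknown' ;
run ;
```

     Using indicators keeps all 315 records in the analysis.  Setting the
9's to missing instead would cause those records to be dropped, which would
change the composition of some matched sets.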

     The third analysis includes the variables in the second analysis, plus
estrogen.  Estrogen here has a 'Risk ratio' of 9.648 and appears to be highly
significant, even after controlling for the other risk factors.
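
     The printed 'Risk Ratio' values and chi-squares can be verified by hand
from the other numbers on the printout.  The arithmetic below is a check
added here; the 95% limits are approximate Wald limits computed from the
printed estimate and standard error and do not appear on the printout.

```latex
% Risk ratio = exp(parameter estimate)
\[ \exp(2.073761) \approx 7.955 \quad \text{(estrogen alone)} \]
\[ \exp(2.266769) \approx 9.648 \quad \text{(estrogen, adjusted)} \]

% Approximate 95% Wald limits for the adjusted estrogen ratio
\[ \exp(2.266769 \pm 1.96 \times 0.48884) \approx (3.70,\; 25.15) \]

% -2 log L without covariates: each of the 63 sets of 5 contributes 1/5
\[ -2 \log L_0 = 2 \times 63 \times \ln 5 \approx 202.789 \]

% Likelihood ratio chi-square for adding estrogen to the second model
\[ 188.428 - 156.957 = 31.471 \quad \text{with 1 DF} \]
```

     PROC PHREG can print confidence limits for the ratios itself if the
RISKLIMITS option is added to the MODEL statement.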

     IN THIS ANALYSIS, the 'Risk Ratio' can be correctly interpreted as an
odds ratio.
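
     In SAS releases whose PROC LOGISTIC supports a STRATA statement, the
same conditional logistic model can be requested directly, without defining
a 'time' variable.  The following is a sketch under that assumption; whether
the statement is available depends on the SAS version installed, and this
run was not part of the original printout.

```sas
/* Sketch: conditional logistic regression requested directly.        */
/* Assumes a SAS release in which PROC LOGISTIC accepts STRATA.       */
proc logistic data = endometr ;
     strata set ;                         /* one stratum per matched set */
     model case(event = '1') = gallbd hyperten obesity estrogen ;
title1 'Conditional logistic regression via PROC LOGISTIC' ;
run ;
```

     The odds ratio estimates from such a run would be expected to agree
with the 'Risk Ratio' column from PROC PHREG, since both maximize the same
conditional likelihood.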

     Note also that, because cases and controls are closely matched on age,
there is not much point in doing an analysis with age as one of the
covariates: it is a known risk factor for endometrial cancer, but here it
is essentially matched out and cannot be studied (and is not of much interest
anyway).

=================================================================================

PROBLEM 1:

   Use PROC PHREG to replicate the results obtained from PROC LOGISTIC in n54703.012 
regarding endometrial cancer, gallbladder disease, and hypertension
[matched pairs].

=================================================================================
n54703.017  Last update: May 8, 2004.