PROC LOGISTIC, IV: Matched Pairs                                 n54703.012

     A matched-pairs study has the following structure.  Two experimental units are
paired together, generally because they have some key characteristics in common.
This provides automatic control for factors that are not of interest to the
investigator.

     A typical study might involve identical twins.  They have all their genetic
factors in common.  One twin might be randomized to a blood-pressure-lowering drug and
the other to a placebo.  They are given their assigned treatments for 5 years.  The
endpoint of the study might be stroke.

     The simplest analysis of such a study uses the McNemar test, which is based only
on the discordant pairs.  It yields an odds ratio, a 95% CI for the OR, and a test of
significance.  As noted earlier in the course, this can be carried out using PROC FREQ.
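     As a cross-check outside SAS, the McNemar arithmetic can be sketched in a few
lines of Python.  The sketch assumes the discordant pairs have already been tallied:
n10 = pairs in which only the case is exposed, n01 = pairs in which only the control
is exposed.  The odds ratio is n10/n01, the McNemar chi-square (1 DF, no continuity
correction) is (n10 - n01)^2/(n10 + n01), and the interval shown is one common
large-sample (Wald) interval on the log scale; other intervals exist.  The function
name mcnemar_summary is hypothetical.  The counts 13 and 5 are the discordant GALL
pairs in the endometrial cancer data listed below.

```python
import math


def mcnemar_summary(n10, n01, z=1.96):
    """Matched-pairs odds ratio, McNemar chi-square (1 DF), p-value and Wald CI.

    n10: discordant pairs in which only the case is exposed
    n01: discordant pairs in which only the control is exposed
    Concordant pairs do not enter any of these quantities.
    """
    odds_ratio = n10 / n01
    chi_square = (n10 - n01) ** 2 / (n10 + n01)      # no continuity correction
    p_value = math.erfc(math.sqrt(chi_square / 2))   # P(chi-square_1 > x) = erfc(sqrt(x/2))
    se_log_or = math.sqrt(1 / n10 + 1 / n01)         # large-sample SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, chi_square, p_value, (lower, upper)


# Discordant GALL pairs in the endometrial cancer data.
odds_ratio, chi_square, p_value, ci = mcnemar_summary(13, 5)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi_square:.3f}, "
      f"p = {p_value:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

     With these counts the odds ratio is 13/5 = 2.60 and the chi-square is 64/18 =
3.556; the same 3.556 (p = 0.0593) appears as the Score chi-square in the first PROC
LOGISTIC listing below.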

     However, even identical twins are not identical in all respects.  They might live
in different cities and have different sources of medical care. They might have
different dietary intake.  The McNemar analysis does not provide any way to control
for factors other than those directly involved in the matching.

     PROC LOGISTIC can also be used to analyze pair-matched studies, and, unlike the
McNemar analysis, it allows the inclusion of risk factors other than the main factor
of interest.  Both the syntax and the interpretation differ from the usual use of
PROC LOGISTIC.

     Below is an example.  The data are 63 matched pairs from the Los Angeles Study of
Endometrial Cancer [See: Breslow and Day, Statistical Methods in Cancer Research,
1980.].  This is a case-control study.  Each pair consists of a case and a control.
The cases are women who have endometrial cancer (OUTCOME = 1), and the controls are
women who do not (OUTCOME = 0).  The ID is the same for each case and her matched
control.

     There are two risk factors of interest: gall-bladder disease and hypertension.
Gall-bladder disease is indicated by the variable GALL: GALL = 1 means the person has
gall-bladder disease, and GALL = 0 means she does not.  Similarly, HYPER = 1 means the
person has hypertension, and HYPER = 0 means she does not.

     The variables recorded for each person are ID, OUTCOME, GALL, and HYPER.  Each
line of the data file holds two observations, first the case and then her matched
control, with these four variables in that order for each.  For example, the 8th
line of the file is:

      8     1     1     1     8     0       0      1

     This means that in pair #8, the CASE has OUTCOME = 1, GALL = 1, and HYPER = 1.  
The CONTROL has OUTCOME = 0, GALL = 0, and HYPER = 1.
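     The layout can be checked with a short Python sketch that splits one data line
into its case and control halves.  The names Subject and parse_pair_line are
hypothetical; the field order (ID, OUTCOME, GALL, HYPER for the case, then the same
four for the control) follows the description above.

```python
from typing import NamedTuple, Tuple


class Subject(NamedTuple):
    pair_id: int
    outcome: int   # 1 = case (endometrial cancer), 0 = control
    gall: int      # 1 = gall-bladder disease
    hyper: int     # 1 = hypertension


def parse_pair_line(line: str) -> Tuple[Subject, Subject]:
    """Split one data line into (case, control) records.

    Assumes eight whitespace-separated integers: ID OUTCOME GALL HYPER
    for the case, followed by the same four fields for the control.
    """
    values = [int(tok) for tok in line.split()]
    if len(values) != 8:
        raise ValueError(f"expected 8 fields, got {len(values)}")
    return Subject(*values[:4]), Subject(*values[4:])


case, control = parse_pair_line("  8     1     1     1     8     0       0      1")
print(case)     # Subject(pair_id=8, outcome=1, gall=1, hyper=1)
print(control)  # Subject(pair_id=8, outcome=0, gall=0, hyper=1)
```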

     Here is how you can analyze this using PROC LOGISTIC.  First, for each risk
factor of interest, compute the difference between the case's value for that factor
and the control's value.  For example, compute

     GALLDIFF = GALLCASE - GALLCONT ;

     HYPEDIFF = HYPECASE - HYPECONT ;
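     The same differences can be sketched in Python for the first eight lines of the
data file (the list name first_pairs is hypothetical).  Each difference can only be
-1, 0, or +1: +1 means only the case has the factor, -1 means only the control has
it, and 0 means the pair is concordant for that factor.

```python
from collections import Counter

# First eight lines of the data file, keeping only
# (gallcase, hypecase, gallcont, hypecont).
first_pairs = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    (0, 1, 0, 1),
    (0, 0, 1, 0),
    (1, 0, 0, 1),
    (0, 1, 0, 0),
    (1, 0, 0, 0),
    (1, 1, 0, 1),
]

# Case-minus-control differences, as computed in the DATA step.
galldiff = [gallcase - gallcont for gallcase, _, gallcont, _ in first_pairs]
hypediff = [hypecase - hypecont for _, hypecase, _, hypecont in first_pairs]

print("GALLDIFF:", galldiff, Counter(galldiff))
print("HYPEDIFF:", hypediff, Counter(hypediff))
```

     In these eight pairs, three have GALLDIFF = +1 (pairs 5, 7, 8) and one has
GALLDIFF = -1 (pair 4); the rest are concordant for gall-bladder disease.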

     Two unusual features of this kind of PROC LOGISTIC analysis are that the outcome
variable is set equal to a *constant*, and that the NOINT option (no intercept) is
used on the MODEL statement.  Here is the program:


==================================================================================
     options linesize = 80 ;
     footnote "~john-c/5421/gallhype &sysdate &systime" ;

     data gallhype ;
     input idcase outcome1 gallcase hypecase idcont outcome2 gallcont hypecont ;
     * case-minus-control differences, each equal to -1, 0 or +1 ;
     galldiff = gallcase - gallcont;
     hypediff = hypecase - hypecont;

     cards;
      1     1     0     0     1     0       0      0
      2     1     0     0     2     0       0      0
      3     1     0     1     3     0       0      1
      4     1     0     0     4     0       1      0
      5     1     1     0     5     0       0      1
      6     1     0     1     6     0       0      0
      7     1     1     0     7     0       0      0
      8     1     1     1     8     0       0      1
      9     1     0     0     9     0       0      0
     10     1     0     0    10     0       0      0
     11     1     1     0    11     0       0      0
     12     1     0     0    12     0       0      1
     13     1     1     0    13     0       0      1
     14     1     1     0    14     0       1      0
     15     1     1     0    15     0       0      1
     16     1     0     1    16     0       0      0
     17     1     0     0    17     0       1      1
     18     1     0     0    18     0       1      1
     19     1     0     0    19     0       0      1
     20     1     0     1    20     0       0      0
     21     1     0     0    21     0       1      1
     22     1     0     1    22     0       0      1
     23     1     0     1    23     0       0      0
     24     1     0     0    24     0       0      0
     25     1     0     0    25     0       0      0
     26     1     0     0    26     0       0      1
     27     1     1     0    27     0       0      1
     28     1     0     0    28     0       0      1
     29     1     1     0    29     0       0      0
     30     1     0     1    30     0       0      0
     31     1     0     1    31     0       0      0
     32     1     0     1    32     0       0      0
     33     1     0     1    33     0       0      0
     34     1     0     0    34     0       0      0
     35     1     1     1    35     0       1      1
     36     1     0     0    36     0       0      1
     37     1     0     1    37     0       0      0
     38     1     0     1    38     0       0      1
     39     1     0     1    39     0       0      1
     40     1     0     1    40     0       0      0
     41     1     0     0    41     0       0      0
     42     1     0     1    42     0       1      0
     43     1     0     0    43     0       0      1
     44     1     0     0    44     0       0      0
     45     1     1     0    45     0       0      0
     46     1     0     0    46     0       0      0
     47     1     1     1    47     0       0      0
     48     1     0     1    48     0       0      0
     49     1     0     0    49     0       0      0
     50     1     0     1    50     0       0      1
     51     1     0     0    51     0       0      0
     52     1     0     1    52     0       0      1
     53     1     0     1    53     0       0      0
     54     1     0     1    54     0       0      0
     55     1     1     0    55     0       0      0
     56     1     0     0    56     0       0      0
     57     1     1     1    57     0       1      0
     58     1     0     0    58     0       0      0
     59     1     0     0    59     0       0      0
     60     1     1     1    60     0       0      0
     61     1     1     0    61     0       1      0
     62     1     0     1    62     0       0      0
     63     1     1     0    63     0       0      0
     ;
     run;

     proc logistic data = gallhype ;
       * response OUTCOME1 is the constant 1, and NOINT drops the intercept ;
       model outcome1 = galldiff / noint ;
     title1 'Endometrial cancer case-control study: effect of gall bladder dis.' ;
     run ;

     proc logistic data = gallhype ;
       model outcome1 = galldiff hypediff / noint ;
     title1 'Endometrial cancer case-control study:' ;
     title2 'Effect of gall bladder disease and hypertension.' ;
     run ;

=================================================================================
       Endometrial cancer case-control study: effect of gall bladder dis.      1
                                                  19:34 Saturday, March 27, 2004

                             The LOGISTIC Procedure

     Data Set: WORK.GALLHYPE
     Response Variable: OUTCOME1  
     Response Levels: 1
     Number of Observations: 63
     Link Function: Logit


                                Response Profile
 
                          Ordered
                            Value  OUTCOME1     Count

                                1         1        63



      Model Fitting Information and Testing Global Null Hypothesis BETA=0
 
                  Without        With   
   Criterion    Covariates    Covariates    Chi-Square for Covariates

   AIC              87.337        85.654         .                          
   SC               87.337        87.797         .                          
   -2 LOG L         87.337        83.654        3.683 with 1 DF (p=0.0550)  
   Score              .             .           3.556 with 1 DF (p=0.0593)  


                    Analysis of Maximum Likelihood Estimates
 
               Parameter Standard    Wald       Pr >    Standardized     Odds
   Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

   GALLDIFF 1     0.9555   0.5262     3.2970     0.0694     0.275723    2.600


NOTE: Since there is only one response level, measures of association between 
      the observed and predicted values were not calculated.
 
 
 
                    ~john-c/5421/gallhype.sas 27MAR04 19:34
=================================================================================
                      Endometrial cancer case-control study:                    2
                Effect of gall bladder disease and hypertension.
                                                  19:34 Saturday, March 27, 2004

                             The LOGISTIC Procedure

     Data Set: WORK.GALLHYPE
     Response Variable: OUTCOME1  
     Response Levels: 1
     Number of Observations: 63
     Link Function: Logit


                                Response Profile
 
                          Ordered
                            Value  OUTCOME1     Count

                                1         1        63



      Model Fitting Information and Testing Global Null Hypothesis BETA=0
 
                  Without        With   
   Criterion    Covariates    Covariates    Chi-Square for Covariates

   AIC              87.337        86.788         .                          
   SC               87.337        91.074         .                          
   -2 LOG L         87.337        82.788        4.549 with 2 DF (p=0.1029)  
   Score              .             .           4.362 with 2 DF (p=0.1129)  


                    Analysis of Maximum Likelihood Estimates
 
               Parameter Standard    Wald       Pr >    Standardized     Odds
   Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

   GALLDIFF 1     0.9704   0.5307     3.3432     0.0675     0.280021    2.639
   HYPEDIFF 1     0.3481   0.3770     0.8526     0.3558     0.134822    1.416


NOTE: Since there is only one response level, measures of association between 
      the observed and predicted values were not calculated.
 
 
                    ~john-c/5421/gallhype.sas 27MAR04 19:34
=================================================================================
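     Both fits can be reproduced outside SAS.  The Python sketch below maximizes, by
Newton-Raphson, the log-likelihood sum of w * log p(d) with p(d) = 1/(1 + exp(-b.d)):
every response is 1 and there is no intercept.  Instead of the 63 raw lines it uses
the number of pairs with each (GALLDIFF, HYPEDIFF) pattern, tallied from the data
listing in the program.  The names PATTERNS, solve, score_info_loglik and
fit_constant_response_logit are hypothetical.  Pairs whose differences are all 0 have
p = 1/2 whatever b is, so they add to -2 LOG L but carry no information about b.

```python
import math

# Number of matched pairs with each (GALLDIFF, HYPEDIFF) pattern,
# tallied from the 63 data lines in the program above.
PATTERNS = {
    (0, 0): 24, (0, 1): 15, (0, -1): 6,
    (1, 0): 7, (1, 1): 2, (1, -1): 4,
    (-1, 0): 1, (-1, 1): 1, (-1, -1): 3,
}


def solve(a, rhs):
    """Solve a x = rhs for a small nonsingular matrix by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def score_info_loglik(b, rows):
    """Score, information matrix and log-likelihood when every response equals 1."""
    k = len(b)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    loglik = 0.0
    for d, w in rows:
        p = 1.0 / (1.0 + math.exp(-sum(bj * dj for bj, dj in zip(b, d))))  # no intercept
        loglik += w * math.log(p)
        for j in range(k):
            score[j] += w * (1.0 - p) * d[j]
            for j2 in range(k):
                info[j][j2] += w * p * (1.0 - p) * d[j] * d[j2]
    return score, info, loglik


def fit_constant_response_logit(rows, tol=1e-10, max_iter=50):
    """Newton-Raphson fit; returns (estimates, -2 log L, standard errors)."""
    k = len(rows[0][0])
    b = [0.0] * k
    for _ in range(max_iter):
        score, info, _ = score_info_loglik(b, rows)
        step = solve(info, score)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info, loglik = score_info_loglik(b, rows)
    # Standard errors: square roots of the diagonal of the inverse information.
    se = [math.sqrt(solve(info, [float(i == j) for i in range(k)])[j]) for j in range(k)]
    return b, -2.0 * loglik, se


gall_only = [((g,), w) for (g, h), w in PATTERNS.items()]
both = [((g, h), w) for (g, h), w in PATTERNS.items()]

for label, rows in [("GALLDIFF", gall_only), ("GALLDIFF HYPEDIFF", both)]:
    est, m2ll, se = fit_constant_response_logit(rows)
    print(label, [round(x, 4) for x in est], round(m2ll, 3), [round(s, 4) for s in se])
```

     The printed estimates, -2 LOG L values, and standard errors should agree, to the
rounding shown, with the two listings above.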

     Note the MODEL statement in the first PROC LOGISTIC analysis:

       model outcome1 = galldiff / noint ;

     The response variable is OUTCOME1.  Each observation in the data set is one pair,
and OUTCOME1 is the case's outcome indicator, so it always equals 1.  That is, it is
a *constant*.  In fact, *any* constant could be used as the response in this analysis,
and the results would be the same.
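     Why a constant response and NOINT give the right answer can be sketched with the
standard conditional-likelihood argument for 1:1 matching.  Suppose that within pair
i the log odds of disease are alpha_i + beta'x, where the pair-specific intercept
alpha_i absorbs the matching factors; x_{i1} is the case's covariate vector and x_{i0}
the control's.  Given that exactly one member of the pair is a case, the probability
that it is the member actually observed as the case is:

```latex
\[
\Pr(\text{member with } x_{i1} \text{ is the case} \mid \text{one case in pair } i)
  = \frac{e^{\alpha_i + \beta^{\top} x_{i1}}}
         {e^{\alpha_i + \beta^{\top} x_{i1}} + e^{\alpha_i + \beta^{\top} x_{i0}}}
  = \frac{e^{\beta^{\top} d_i}}{1 + e^{\beta^{\top} d_i}},
  \qquad d_i = x_{i1} - x_{i0},
\]
\[
L(\beta) = \prod_{i=1}^{63} \frac{e^{\beta^{\top} d_i}}{1 + e^{\beta^{\top} d_i}} .
\]
```

     The intercept alpha_i cancels, so the matching factors are controlled without
being estimated.  L(beta) is exactly the likelihood of an ordinary logistic regression
in which every response is 1, the covariates are the differences d_i, and there is no
intercept (NOINT).  A pair with d_i = 0 contributes the constant factor 1/2.  The same
form holds under case-control sampling, where alpha_i also absorbs the sampling
fractions (see Breslow and Day, 1980).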

     In the printout for this analysis, the coefficient of GALLDIFF is 0.9555.  Its
exponential is an odds ratio: exp(0.9555) = 2.60, which is the same as the odds ratio
estimated from the discordant pairs in a McNemar analysis (13 pairs in which only the
case has gall-bladder disease versus 5 in which only the control does: 13/5 = 2.60).
It is the odds ratio for gall-bladder disease in a woman with endometrial cancer
relative to her matched control, who does not have endometrial cancer.
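     For a single binary exposure the maximization can be done by hand, which makes
the link to McNemar explicit.  A pair with d = +1 has fitted probability
e^b/(1 + e^b), a pair with d = -1 has 1/(1 + e^b), and a concordant pair has 1/2, so
the MLE is b = log(n10/n01) with SE = sqrt(1/n10 + 1/n01).  The Python sketch below
(function name matched_binary_fit is hypothetical) evaluates these closed forms with
the counts tallied from the data: 13 and 5 discordant pairs, 45 concordant.

```python
import math


def matched_binary_fit(n10, n01, n_concordant):
    """Closed-form conditional ML fit for one binary exposure in 1:1 matched pairs."""
    beta = math.log(n10 / n01)                  # so exp(beta) = n10 / n01
    se = math.sqrt(1 / n10 + 1 / n01)
    p = n10 / (n10 + n01)                       # fitted P(Y = 1) for a pair with d = +1
    m2ll_fitted = -2 * (n_concordant * math.log(0.5)
                        + n10 * math.log(p) + n01 * math.log(1 - p))
    # With beta = 0 every pair, concordant or not, has probability 1/2.
    m2ll_null = -2 * (n_concordant + n10 + n01) * math.log(0.5)
    return beta, se, m2ll_null, m2ll_fitted, m2ll_null - m2ll_fitted


beta, se, m2ll_null, m2ll_fitted, lr_chi_square = matched_binary_fit(13, 5, 45)
print(f"b = {beta:.4f}, SE = {se:.4f}, -2 log L: {m2ll_null:.3f} -> {m2ll_fitted:.3f}, "
      f"LR chi-square = {lr_chi_square:.3f}")
```

     These reproduce the first listing: 0.9555, 0.5262, 87.337, 83.654, and the
likelihood-ratio chi-square 3.683.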

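     The second PROC LOGISTIC analysis has the same structure, but it includes both
GALLDIFF and HYPEDIFF as covariates.  Again exp(coefficient) is an odds ratio, but
each is now adjusted for the other factor: exp(0.9704) = 2.64 for gall-bladder
disease and exp(0.3481) = 1.42 for hypertension.  These adjusted matched-pairs odds
ratios generalize the McNemar odds ratio, which cannot adjust for a second factor.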
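     Odds ratios and approximate 95% limits can be computed from the printed
estimates and standard errors with the large-sample Wald formula exp(b +/- 1.96*SE).
This Python sketch copies the numbers from the second listing; the function name
wald_odds_ratio is hypothetical.

```python
import math


def wald_odds_ratio(b, se, z=1.96):
    """Odds ratio exp(b) with a large-sample Wald interval exp(b +/- z*se)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Estimates and standard errors copied from the second PROC LOGISTIC listing.
second_model = {"GALLDIFF": (0.9704, 0.5307), "HYPEDIFF": (0.3481, 0.3770)}

results = {name: wald_odds_ratio(b, se) for name, (b, se) in second_model.items()}
for name, (odds_ratio, lower, upper) in results.items():
    print(f"{name}: OR = {odds_ratio:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

     Both intervals include 1, consistent with the Wald p-values of 0.0675 and 0.3558
in the listing.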

=================================================================================

PROBLEM 1:

     Carry out a PROC FREQ matched-pairs analysis, and check whether it gives the same
odds ratio as the first PROC LOGISTIC analysis above.


PROBLEM 2:

     Carry out an analysis on this same data set that evaluates whether there is
significant interaction between gall bladder disease and hypertension as risk factors
for endometrial cancer.



=================================================================================

n54703.012  Last update: March 30, 2005.