PROC LOGISTIC, IV: Matched Pairs                                    n54703.012

The structure of a matched-pairs study is as follows. Two experimental units are paired together, usually because they share some key characteristics. This provides automatic control for factors that are not of interest to the investigator.

A typical study might involve identical twins, who have all their genetic factors in common. One twin might be randomized to a blood-pressure-lowering drug and the other to a placebo. Both are given their assigned treatments for 5 years. The endpoint of the study might be stroke.

The simplest analysis of such a study is McNemar's analysis of the discordant pairs. It yields an odds ratio, a 95% CI for the OR, and a test of significance (the McNemar statistic). As noted earlier in the course, this can be carried out using PROC FREQ.

However, even identical twins are not identical in all respects. They might live in different cities and have different sources of medical care, and they might have different dietary intakes. The McNemar analysis provides no way to control for factors other than those directly involved in the matching.

PROC LOGISTIC can be used to analyze pair-matched studies, and it does allow other risk factors to be included alongside the main factor of interest. Both the syntax and the interpretation differ from the usual usage of PROC LOGISTIC.

Below is an example. The data are 63 matched pairs from the Los Angeles Study of Endometrial Cancer [See: Breslow and Day, Statistical Methods in Cancer Research, 1980.]. This is a case-control study, and each pair consists of a case and a control. The cases are women who have endometrial cancer (OUTCOME = 1), and the controls are women who do not (OUTCOME = 0). Each case and her matched control have the same ID. There are two risk factors of interest: gall-bladder disease and hypertension.
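For reference, the McNemar quantities mentioned above depend only on the two counts of discordant pairs. The Python sketch below is an illustration, not the PROC FREQ procedure; the function name mcnemar_matched_or is hypothetical. It returns the discordant-pair odds ratio, a 95% interval, and the McNemar chi-square without continuity correction. The interval is the simple Wald interval on the log-odds scale, which is an assumption here; PROC FREQ may report a different interval.

```python
import math


def mcnemar_matched_or(n_case_only, n_control_only, z=1.96):
    """Matched-pairs odds ratio, Wald CI and McNemar chi-square from discordant pairs.

    n_case_only:    pairs in which only the case has the risk factor
    n_control_only: pairs in which only the control has the risk factor
    Concordant pairs (both or neither exposed) do not enter any of these quantities.
    """
    b, c = n_case_only, n_control_only
    odds_ratio = b / c
    se_log_or = math.sqrt(1.0 / b + 1.0 / c)   # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    chi_square = (b - c) ** 2 / (b + c)          # McNemar statistic, 1 df, no continuity correction
    return odds_ratio, (lower, upper), chi_square


if __name__ == "__main__":
    # 13 and 5: discordant-pair counts for GALL, tallied by hand from the data lines.
    odds_ratio, (lower, upper), chi_square = mcnemar_matched_or(13, 5)
    print(f"OR = {odds_ratio:.3f}, 95% CI ({lower:.3f}, {upper:.3f}), chi-square = {chi_square:.3f}")
```

With the gall-bladder counts from this study (13 pairs in which only the case has gall-bladder disease and 5 in which only the control does, tallied by hand from the data lines in the program), it gives an odds ratio of 2.600 and a chi-square of 64/18 = 3.556. The chi-square is the same number as the Score chi-square that PROC LOGISTIC prints for the GALLDIFF-only model.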
Gall-bladder disease is indicated by the variable GALL: GALL = 1 means the woman has gall-bladder disease, and GALL = 0 means she does not. Similarly, HYPER = 1 means she has hypertension, and HYPER = 0 means she does not. The variables recorded for each person are ID, OUTCOME, GALL, and HYPER.

Each line of the data file holds two observations: one for the case and one for the control. For example, the 8th line of the file is:

     8 1 1 1 8 0 0 1

This means that in pair #8, the CASE has OUTCOME = 1, GALL = 1, and HYPER = 1, and the CONTROL has OUTCOME = 0, GALL = 0, and HYPER = 1.

Here is how you can analyze this using PROC LOGISTIC. First, for each risk factor of interest, compute the difference between the case's value for that factor and the control's value. For example:

     GALLDIFF = GALLCASE - GALLCONT ;
     HYPEDIFF = HYPECASE - HYPECONT ;

An unusual feature of this kind of PROC LOGISTIC analysis is that the outcome variable is set equal to a *constant*. Another is that the NOINT option is used on the MODEL statement.
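Why a constant outcome and NOINT work: for a 1:1 matched pair, the conditional probability that the case (rather than the control) is the member with covariates X_case, given that exactly one member of the pair is a case, is exp(B*D) / (1 + exp(B*D)) with D = X_case - X_control. That is a logistic model in the differences with no intercept and a response that always equals 1. The Python sketch below is an illustration only, not part of the SAS material, and all of its names (pair_differences, fit_noint_logistic, GALL_ONLY, BOTH) are hypothetical. It mirrors the DATA step for one data line and then maximizes that likelihood by Newton-Raphson. The pattern counts in GALL_ONLY and BOTH are a hand tally of the 63 data lines.

```python
import math

# (GALLDIFF,) patterns with the number of pairs showing each one.
# Hand tally of the 63 data lines; re-check before relying on it.
GALL_ONLY = [((1,), 13), ((-1,), 5), ((0,), 45)]

# (GALLDIFF, HYPEDIFF) patterns with pair counts, tallied the same way.
BOTH = [((0, 0), 24), ((0, 1), 15), ((0, -1), 6), ((1, 0), 7), ((1, -1), 4),
        ((1, 1), 2), ((-1, 0), 1), ((-1, -1), 3), ((-1, 1), 1)]


def pair_differences(line):
    """Mirror the DATA step: case fields first, then control fields; return the differences."""
    idcase, outcome1, gallcase, hypecase, idcont, outcome2, gallcont, hypecont = (
        int(tok) for tok in line.split())
    return gallcase - gallcont, hypecase - hypecont


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        d = a[col][col]
        a[col] = [v / d for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def score_info_loglik(beta, rows):
    """Score vector, information matrix and log-likelihood when the response is always 1."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    loglik = 0.0
    for x, count in rows:
        eta = sum(b * xj for b, xj in zip(beta, x))
        prob = 1.0 / (1.0 + math.exp(-eta))   # P(response = 1), no intercept
        loglik += count * math.log(prob)
        for j in range(p):
            score[j] += count * x[j] * (1.0 - prob)
            for k in range(p):
                info[j][k] += count * x[j] * x[k] * prob * (1.0 - prob)
    return score, info, loglik


def fit_noint_logistic(rows, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (estimates, standard errors, -2 log L)."""
    beta = [0.0] * len(rows[0][0])
    for _ in range(max_iter):
        score, info, _ = score_info_loglik(beta, rows)
        step = [sum(c * s for c, s in zip(row, score)) for row in invert(info)]
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    _, info, loglik = score_info_loglik(beta, rows)
    cov = invert(info)
    se = [math.sqrt(cov[j][j]) for j in range(len(beta))]
    return beta, se, -2.0 * loglik


if __name__ == "__main__":
    print(pair_differences("8 1 1 1 8 0 0 1"))   # (1, 0)
    est, se, m2ll = fit_noint_logistic(GALL_ONLY)
    print(f"{est[0]:.4f} {se[0]:.4f} {m2ll:.3f} {math.exp(est[0]):.3f}")   # 0.9555 0.5262 83.654 2.600
```

On GALL_ONLY the fit reproduces the coefficient, standard error, and -2 LOG L in the first PROC LOGISTIC printout, and exp(0.9555) is exactly 13/5. On BOTH it lands close to the second printout's 0.9704 and 0.3481. Concordant pairs (difference 0) contribute log(1/2) to the log-likelihood whatever the coefficients are, which is why they do not affect the estimates.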
Here is the program:
==================================================================================
options linesize = 80 ;
footnote "~john-c/5421/gallhype &sysdate &systime" ;

data gallhype ;
     input idcase outcome1 gallcase hypecase
           idcont outcome2 gallcont hypecont ;
     galldiff = gallcase - gallcont;
     hypediff = hypecase - hypecont;
cards;
1 1 0 0 1 0 0 0
2 1 0 0 2 0 0 0
3 1 0 1 3 0 0 1
4 1 0 0 4 0 1 0
5 1 1 0 5 0 0 1
6 1 0 1 6 0 0 0
7 1 1 0 7 0 0 0
8 1 1 1 8 0 0 1
9 1 0 0 9 0 0 0
10 1 0 0 10 0 0 0
11 1 1 0 11 0 0 0
12 1 0 0 12 0 0 1
13 1 1 0 13 0 0 1
14 1 1 0 14 0 1 0
15 1 1 0 15 0 0 1
16 1 0 1 16 0 0 0
17 1 0 0 17 0 1 1
18 1 0 0 18 0 1 1
19 1 0 0 19 0 0 1
20 1 0 1 20 0 0 0
21 1 0 0 21 0 1 1
22 1 0 1 22 0 0 1
23 1 0 1 23 0 0 0
24 1 0 0 24 0 0 0
25 1 0 0 25 0 0 0
26 1 0 0 26 0 0 1
27 1 1 0 27 0 0 1
28 1 0 0 28 0 0 1
29 1 1 0 29 0 0 0
30 1 0 1 30 0 0 0
31 1 0 1 31 0 0 0
32 1 0 1 32 0 0 0
33 1 0 1 33 0 0 0
34 1 0 0 34 0 0 0
35 1 1 1 35 0 1 1
36 1 0 0 36 0 0 1
37 1 0 1 37 0 0 0
38 1 0 1 38 0 0 1
39 1 0 1 39 0 0 1
40 1 0 1 40 0 0 0
41 1 0 0 41 0 0 0
42 1 0 1 42 0 1 0
43 1 0 0 43 0 0 1
44 1 0 0 44 0 0 0
45 1 1 0 45 0 0 0
46 1 0 0 46 0 0 0
47 1 1 1 47 0 0 0
48 1 0 1 48 0 0 0
49 1 0 0 49 0 0 0
50 1 0 1 50 0 0 1
51 1 0 0 51 0 0 0
52 1 0 1 52 0 0 1
53 1 0 1 53 0 0 0
54 1 0 1 54 0 0 0
55 1 1 0 55 0 0 0
56 1 0 0 56 0 0 0
57 1 1 1 57 0 1 0
58 1 0 0 58 0 0 0
59 1 0 0 59 0 0 0
60 1 1 1 60 0 0 0
61 1 1 0 61 0 1 0
62 1 0 1 62 0 0 0
63 1 1 0 63 0 0 0
;
run;

proc logistic data = gallhype ;
     model outcome1 = galldiff / noint ;
     title1 'Endometrial cancer case-control study: effect of gall bladder dis.' ;
run ;

proc logistic data = gallhype ;
     model outcome1 = galldiff hypediff / noint ;
     title1 'Endometrial cancer case-control study:' ;
     title2 'Effect of gall bladder disease and hypertension.' ;
run ;
=================================================================================
      Endometrial cancer case-control study: effect of gall bladder dis.       1
                                                 19:34 Saturday, March 27, 2004

                             The LOGISTIC Procedure

     Data Set: WORK.GALLHYPE
     Response Variable: OUTCOME1
     Response Levels: 1
     Number of Observations: 63
     Link Function: Logit

                                Response Profile

          Ordered
            Value     OUTCOME1     Count

                1            1        63

      Model Fitting Information and Testing Global Null Hypothesis BETA=0

                   Without        With
     Criterion  Covariates  Covariates    Chi-Square for Covariates

     AIC            87.337      85.654    .
     SC             87.337      87.797    .
     -2 LOG L       87.337      83.654    3.683 with 1 DF (p=0.0550)
     Score               .           .    3.556 with 1 DF (p=0.0593)

                  Analysis of Maximum Likelihood Estimates

             Parameter  Standard        Wald        Pr >  Standardized     Odds
Variable  DF  Estimate     Error  Chi-Square  Chi-Square      Estimate    Ratio

GALLDIFF   1    0.9555    0.5262      3.2970      0.0694      0.275723    2.600

NOTE: Since there is only one response level, measures of association between
      the observed and predicted values were not calculated.

                    ~john-c/5421/gallhype.sas 27MAR04 19:34
=================================================================================
               Endometrial cancer case-control study:                          2
             Effect of gall bladder disease and hypertension.
                                                 19:34 Saturday, March 27, 2004

                             The LOGISTIC Procedure

     Data Set: WORK.GALLHYPE
     Response Variable: OUTCOME1
     Response Levels: 1
     Number of Observations: 63
     Link Function: Logit

                                Response Profile

          Ordered
            Value     OUTCOME1     Count

                1            1        63

      Model Fitting Information and Testing Global Null Hypothesis BETA=0

                   Without        With
     Criterion  Covariates  Covariates    Chi-Square for Covariates

     AIC            87.337      86.788    .
     SC             87.337      91.074    .
     -2 LOG L       87.337      82.788    4.549 with 2 DF (p=0.1029)
     Score               .           .    4.362 with 2 DF (p=0.1129)

                  Analysis of Maximum Likelihood Estimates

             Parameter  Standard        Wald        Pr >  Standardized     Odds
Variable  DF  Estimate     Error  Chi-Square  Chi-Square      Estimate    Ratio

GALLDIFF   1    0.9704    0.5307      3.3432      0.0675      0.280021    2.639
HYPEDIFF   1    0.3481    0.3770      0.8526      0.3558      0.134822    1.416

NOTE: Since there is only one response level, measures of association between
      the observed and predicted values were not calculated.
                    ~john-c/5421/gallhype.sas 27MAR04 19:34
=================================================================================

Note the MODEL statement in the first PROC LOGISTIC analysis:

     model outcome1 = galldiff / noint ;

The response variable is OUTCOME1. But OUTCOME1 is the indicator variable for cases, and it is always equal to 1; that is, it is a *constant*. As it turns out, *any* constant can be used in this analysis and the result will be the same.

In the printout for this analysis, the coefficient of GALLDIFF is 0.9555. This coefficient is a log odds ratio: exp(0.9555) = 2.60, which is the same as the odds ratio estimated from the discordant pairs in a McNemar analysis (13 pairs in which only the case has gall-bladder disease, 5 in which only the control does; 13/5 = 2.60). It is the odds ratio for gall-bladder disease in a woman with endometrial cancer versus her matched control, who does not have endometrial cancer.

The second PROC LOGISTIC analysis has the same structure, but it includes both GALLDIFF and HYPEDIFF as covariates. Again exp(coefficient) can be interpreted as a matched-pairs odds ratio, but each one is now adjusted for the other factor: exp(0.9704) = 2.64 for gall-bladder disease and exp(0.3481) = 1.42 for hypertension.
=================================================================================

PROBLEM 1: Carry out a PROC FREQ matched-pairs analysis to see whether it gives the same odds ratio as the first PROC LOGISTIC analysis above.

PROBLEM 2: Carry out an analysis of this same data set that evaluates whether there is a significant interaction between gall-bladder disease and hypertension as risk factors for endometrial cancer.
=================================================================================
n54703.012                                        Last update: March 30, 2005.