PROC LOGISTIC, II:  More complicated tables.                      n54703.010

     The previous notes dealt primarily with the use of PROC LOGISTIC to
analyze one 2 x 2 table, and showed that much of what PROC LOGISTIC does in
that case can be done in PROC FREQ.  Here we will examine what happens when
PROC LOGISTIC is applied to 2 x M tables, and to data in the form of
multiple 2 x 2 tables.

1.  2 X M CONTINGENCY TABLE:

     Consider the following 2 x 3 table:

                 X = 1     X = 2     X = 3
             -------------------------------
             |         |         |         |
      Y = 0  |   10    |   20    |   30    |    60
             |         |         |         |
             -------------------------------
             |         |         |         |
      Y = 1  |   30    |   20    |   10    |    60
             |         |         |         |
             -------------------------------
                 40        40        40        120

     Here Y is the outcome variable, and X is a predictor or covariate.
The question is whether there is statistical evidence for a relationship
between X and Y.  The null hypothesis is that there is not, i.e., that for
each of the three columns, the true proportion for which Y = 1 is the same.

     The covariate X here is intended as a categorical variable.  This
means that the actual values taken on by X are not important, and even that
their order is not important.  If this were an analysis of variance, X would
be a *factor*; it would be entered as a CLASS variable, and the different
columns would be represented by indicator (or dummy) variables.

     PROC LOGISTIC in SAS version 8 has a lot in common with PROC GLM.  It
provides for the use of CLASS variables, but the coding of them is somewhat
different from that for PROC GLM, as will be explained below.
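     The null hypothesis above is a statement about the proportion with
Y = 1 in each column.  As a quick look at the sample versions of those
proportions, here is a small sketch.  The data set name colprop and the
variable names n0, n1, and p1 are made up for this illustration and are not
used elsewhere in these notes.

==================================================================================
/* Sketch only: observed proportion with Y = 1 in each column of  */
/* the 2 x 3 table.  n0 = count with Y = 0, n1 = count with Y = 1 */

data colprop ;
     input x n0 n1 ;
     p1 = n1 / (n0 + n1) ;      /* sample proportion with Y = 1 */
cards ;
1 10 30
2 20 20
3 30 10
;
run ;

proc print data = colprop noobs ;
run ;
==================================================================================

     The observed proportions are .75, .50, and .25.  The analyses that
follow ask whether differences of this size could plausibly arise by chance
if the three true proportions were equal.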
Here is a program which analyzes the table above, using both PROC FREQ and PROC LOGISTIC: ================================================================================== options linesize = 80 ; footnote "~john-c/5421/n54703.010.sas &sysdate &systime" ; data x23 x23xpand ; input x y count ; do i = 1 to count ; output x23xpand ; end ; output x23 ; cards ; 1 0 10 2 0 20 3 0 30 1 1 30 2 1 20 3 1 10 ; run ; proc freq data = x23 ; weight count ; tables y * x / chisq ; title1 'PROC FREQ analysis of a 2 x 3 contingency table' ; run ; proc logistic descending data = x23xpand ; class x ; model y = x / clodds = pl ; title1 'PROC LOGISTIC analysis of a 2 x 3 contingency table' ; title2 'Using covariate x as a CLASS variable ...' ; run ; ================================================================================ PROC FREQ analysis of a 2 x 3 contingency table 1 19:18 Tuesday, March 9, 2004 The FREQ Procedure Table of y by x y x Frequency| Percent | Row Pct | Col Pct | 1| 2| 3| Total ---------+--------+--------+--------+ 0 | 10 | 20 | 30 | 60 | 8.33 | 16.67 | 25.00 | 50.00 | 16.67 | 33.33 | 50.00 | | 25.00 | 50.00 | 75.00 | ---------+--------+--------+--------+ 1 | 30 | 20 | 10 | 60 | 25.00 | 16.67 | 8.33 | 50.00 | 50.00 | 33.33 | 16.67 | | 75.00 | 50.00 | 25.00 | ---------+--------+--------+--------+ Total 40 40 40 120 33.33 33.33 33.33 100.00 Statistics for Table of y by x Statistic DF Value Prob ------------------------------------------------------ Chi-Square 2 20.0000 <.0001 Likelihood Ratio Chi-Square 2 20.9299 <.0001 Mantel-Haenszel Chi-Square 1 19.8333 <.0001 Phi Coefficient 0.4082 Contingency Coefficient 0.3780 Cramer's V 0.4082 Sample Size = 120 ~john-c/5421/n54703.010.sas 09MAR04 19:18 ================================================================================ PROC LOGISTIC analysis of a 2 x 3 contingency table 2 Using covariate x as a CLASS variable ... 
19:18 Tuesday, March 9, 2004 The LOGISTIC Procedure Model Information Data Set WORK.X23XPAND Response Variable y Number of Response Levels 2 Number of Observations 120 Link Function Logit Optimization Technique Fisher's scoring Response Profile Ordered Total Value y Frequency 1 1 60 2 0 60 Class Level Information Design Variables Class Value 1 2 x 1 1 0 2 0 1 3 -1 -1 Model Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 168.355 151.425 SC 171.143 159.788 -2 Log L 166.355 145.425 ~john-c/5421/n54703.010.sas 09MAR04 19:18 ================================================================================ PROC LOGISTIC analysis of a 2 x 3 contingency table 3 Using covariate x as a CLASS variable ... 19:18 Tuesday, March 9, 2004 The LOGISTIC Procedure Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 20.9299 2 <.0001 Score 20.0000 2 <.0001 Wald 18.1042 2 0.0001 Type III Analysis of Effects Wald Effect DF Chi-Square Pr > ChiSq x 2 18.1042 0.0001 Analysis of Maximum Likelihood Estimates Standard Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 -93E-18 0.2018 0.0000 1.0000 x 1 1 1.0986 0.2919 14.1685 0.0002 x 2 1 -821E-19 0.2722 0.0000 1.0000 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits x 1 vs 3 9.000 3.271 24.763 x 2 vs 3 3.000 1.164 7.732 Association of Predicted Probabilities and Observed Responses Percent Concordant 58.3 Somers' D 0.444 Percent Discordant 13.9 Gamma 0.615 Percent Tied 27.8 Tau-a 0.224 Pairs 3600 c 0.722 Profile Likelihood Confidence Interval for Adjusted Odds Ratios Effect Unit Estimate 95% Confidence Limits x 1 vs 3 1.0000 9.000 3.395 26.001 x 2 vs 3 1.0000 3.000 1.186 7.973 ~john-c/5421/n54703.010.sas 09MAR04 19:18 ================================================================================ The PROC FREQ analysis is straightforward, and indicates that there is statistically significant 
relationship between X and Y.  Note that the likelihood ratio chi-square
equals 20.9299.  This is compared to a chi-square distribution with 2
degrees of freedom.  [Why 2?]  The associated p-value is < .0001.

     The PROC LOGISTIC analysis yields essentially the same result.  This
can be seen from the Model Fit Statistics table in the printout.  The change
in -2 Log L from the Intercept Only model to the Intercept and Covariates
model is

             166.355 - 145.425 = 20.930,

which agrees, up to rounding, with the Likelihood Ratio chi-square of 20.9299
that PROC LOGISTIC reports under 'Testing Global Null Hypothesis'.  This
should be compared to a chi-square distribution with 2 degrees of freedom
(because SAS enters 2 indicator variables into the model), and the
associated p-value is < 0.0001, just as with PROC FREQ.

     SAS goes on to compute two odds ratios: one for X = 1 versus X = 3,
and the other for X = 2 versus X = 3.  This corresponds exactly to computing
odds ratios for the following two tables:

                X = 3     X = 1                     X = 3     X = 2
             ---------------------               ---------------------
             |         |         |               |         |         |
      Y = 0  |   30    |   10    |               |   30    |   20    |
             |         |         |               |         |         |
             ---------------------               ---------------------
             |         |         |               |         |         |
      Y = 1  |   10    |   30    |               |   10    |   20    |
             |         |         |               |         |         |
             ---------------------               ---------------------

           OR = 30*30/(10*10) = 9              OR = 30*20/(20*10) = 3

     Here I have put the X = 3 column on the left because SAS treats it as
the 'default' category, i.e., the one to which the other two are to be
compared.

     SAS represents the categories in a somewhat unexpected way.  SAS makes
use of two 'indicator' variables, X1 and X2, which are defined as follows:

             If X = 1, then X1 =  1 and X2 =  0.
             If X = 2, then X1 =  0 and X2 =  1.
             If X = 3, then X1 = -1 and X2 = -1.

     The model that SAS uses here is the following:

             Prob(Y = 1 | X1 and X2) = 1 / (1 + exp(-b0 - b1*X1 - b2*X2)).
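     The coding rule for X1 and X2 given above can be written out directly
in a data step.  The following is only a sketch: the data set name effcode
is chosen here for illustration, and the expressions rely on the fact that
SAS evaluates a true comparison as 1 and a false one as 0.

==================================================================================
/* Sketch only: reproduce the two design variables that PROC       */
/* LOGISTIC builds for a 3-level CLASS variable (see the Class     */
/* Level Information table in the printout).                       */

data effcode ;
     do x = 1 to 3 ;
        x1 = (x = 1) - (x = 3) ;   /* 1 if x = 1, -1 if x = 3, else 0 */
        x2 = (x = 2) - (x = 3) ;   /* 1 if x = 2, -1 if x = 3, else 0 */
        output ;
     end ;
run ;

proc print data = effcode noobs ;
run ;
==================================================================================

     The three rows printed are (x1, x2) = (1, 0), (0, 1), and (-1, -1) for
x = 1, 2, and 3, matching the design variables listed in the printout.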
     The printout gives the coefficient estimates for b0, b1, and b2:

----------------------------------------------------------------------------------
                                  Standard
   Parameter      DF    Estimate     Error    Chi-Square    Pr > ChiSq

   Intercept       1     -93E-18    0.2018        0.0000        1.0000
   x         1     1      1.0986    0.2919       14.1685        0.0002
   x         2     1    -821E-19    0.2722        0.0000        1.0000
----------------------------------------------------------------------------------

     What this says essentially is:  b0 = 0, b1 = 1.0986, and b2 = 0.  (The
values -93E-18 and -821E-19 are zero up to numerical rounding error.)

     To compute the odds ratio for X = 1 versus X = 3, you need to compute
two odds: Odds(Y = 1 | X = 1) and Odds(Y = 1 | X = 3).  Recall that Odds
equals:  prob / (1 - prob).

     Note that  Prob(Y = 1 | X = 1) = 1 / (1 + exp(-0 - 1.0986*1 - 0*0)) = .75.

     Therefore  Odds(Y = 1 | X = 1) = .75 / .25 = 3.

     Now the more difficult part:

     Note that  Prob(Y = 1 | X = 3) = 1 / (1 + exp(-0 - 1.0986*(-1) - 0*(-1)))
                                    = 1 / (1 + exp(+1.0986))
                                    = 1/4.

     Therefore  Odds(Y = 1 | X = 3) = (1/4)/(3/4) = 1/3.

     Finally, therefore, the *odds ratio* for X = 1 versus X = 3 is:

                OR = 3/(1/3) = 9.

     This is given in the PROC LOGISTIC printout.  Note that it agrees with
the value given above based on consideration of the comparison of the X = 1
column with the X = 3 column.

     To be sure you understand this, you should go through the same process
to compute the odds ratio for X = 2 versus X = 3, using the PROC LOGISTIC
coefficients.

     PROC LOGISTIC also provides confidence intervals for both of these odds
ratio estimates.  PROC FREQ does not display either the odds ratios or their
confidence limits for 2 x M tables when M > 2.

     You may not like the way SAS codes the indicator variables (I don't!).
In this case and many others, you can easily write your own in the data step
preceding the PROC LOGISTIC.
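     The odds ratio arithmetic above, for X = 1 versus X = 3, can be checked
in a short data step.  This is only a sketch: the variable names (p1, p3,
odds1, odds3, or13) are invented for the illustration, and b0 and b2 are set
to exactly 0, which is how the notes above read the printed estimates.

==================================================================================
/* Sketch only: odds ratio for X = 1 versus X = 3 from the        */
/* coefficients of the CLASS-variable model.                      */

data _null_ ;
     b0 = 0 ;  b1 = 1.0986 ;  b2 = 0 ;

     /* X = 1 is coded (X1, X2) = (1, 0); X = 3 is coded (-1, -1) */
     p1 = 1 / (1 + exp(-(b0 + b1*1    + b2*0   ))) ;
     p3 = 1 / (1 + exp(-(b0 + b1*(-1) + b2*(-1)))) ;

     odds1 = p1 / (1 - p1) ;
     odds3 = p3 / (1 - p3) ;
     or13  = odds1 / odds3 ;

     put p1= 6.4 p3= 6.4 odds1= 6.3 odds3= 6.3 or13= 6.3 ;
run ;
==================================================================================

     The values written to the log should be close to .75, .25, 3, 1/3, and
9.  Replacing the codes for X = 1 by those for X = 2, namely (0, 1), gives
the second odds ratio.  With 0/1 indicator variables that you code yourself
in the data step, this two-step calculation is not needed: each odds ratio
is simply exp(coefficient).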
Below is an example of how this works: ================================================================================== options linesize = 80 ; footnote "~john-c/5421/n54703.010.sas &sysdate &systime" ; data x23 x23xpand ; input x y count ; x1 = 0 ; x2 = 0 ; x3 = 0 ; if x = 1 then x1 = 1 ; if x = 2 then x2 = 1 ; if x = 3 then x3 = 1 ; do i = 1 to count ; output x23xpand ; end ; output x23 ; cards ; 1 0 10 2 0 20 3 0 30 1 1 30 2 1 20 3 1 10 ; run ; proc logistic descending data = x23xpand ; model y = x1 x2 / clodds = pl ; title1 'PROC LOGISTIC analysis of a 2 x 3 contingency table' ; title2 'Using indicator variables ...' ; run ; endsas ; --------------------------------------------------------------------------------- PROC LOGISTIC analysis of a 2 x 3 contingency table 1 Using indicator variables ... 18:12 Wednesday, March 10, 2004 The LOGISTIC Procedure Data Set: WORK.X23XPAND Response Variable: Y Response Levels: 2 Number of Observations: 120 Link Function: Logit Response Profile Ordered Value Y Count 1 1 60 2 0 60 Model Fitting Information and Testing Global Null Hypothesis BETA=0 Intercept Intercept and Criterion Only Covariates Chi-Square for Covariates AIC 168.355 151.425 . SC 171.143 159.788 . -2 LOG L 166.355 145.425 20.930 with 2 DF (p=0.0001) Score . . 20.000 with 2 DF (p=0.0001) Analysis of Maximum Likelihood Estimates Parameter Standard Wald Pr > Standardized Odds Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio INTERCPT 1 -1.0986 0.3651 9.0521 0.0026 . . X1 1 2.1972 0.5164 18.1042 0.0001 0.573451 9.000 X2 1 1.0986 0.4830 5.1726 0.0229 0.286725 3.000 Association of Predicted Probabilities and Observed Responses Concordant = 58.3% Somers' D = 0.444 Discordant = 13.9% Gamma = 0.615 Tied = 27.8% Tau-a = 0.224 (3600 pairs) c = 0.722 ~john-c/5421/n54703.010.sas 10MAR04 18:12 ---------------------------------------------------------------------------------- PROC LOGISTIC analysis of a 2 x 3 contingency table 2 Using indicator variables ... 
                                             18:12 Wednesday, March 10, 2004

                           The LOGISTIC Procedure

           Conditional Odds Ratios and 95% Confidence Intervals

                                           Profile Likelihood
                                            Confidence Limits
                                  Odds
          Variable       Unit     Ratio       Lower       Upper

          X1           1.0000     9.000       3.395      26.001
          X2           1.0000     3.000       1.186       7.973

              ~john-c/5421/n54703.010.sas 10MAR04 18:12
==================================================================================

     Note that indicator variables x1, x2, and x3 are defined in the data
step:

          x1 = 1 if x = 1, 0 otherwise;
          x2 = 1 if x = 2, 0 otherwise;
          x3 = 1 if x = 3, 0 otherwise.

     These appear in the MODEL statement in PROC LOGISTIC as follows:

          model y = x1 x2 / clodds = pl ;

     Note that there is no CLASS statement.  Note that indicator variable x3
is omitted from the model: this corresponds to the fact that the third
column is the reference category.

     Note that the odds ratios corresponding to x1 and x2 are computed as

          exp(x1 coeff) = exp(2.1972) = 9;  95% CI, (3.395, 26.001)
          exp(x2 coeff) = exp(1.0986) = 3;  95% CI, (1.186, 7.973)

     The interpretation of the odds ratio is the same as before: the odds
ratio for Y = 1, comparing column 1 with column 3, is exp(x1 coeff) = 9, etc.

     This method of coding variables for PROC LOGISTIC seems a little easier
to use and interpret than the CLASS variable version.

2.  MULTIPLE 2 X 2 TABLES:

     We return to an example that was used in notes n54703.003:

                        Men                            Women
               ---------------------           ---------------------
                 Smoke     No Smoke              Smoke     No Smoke
               ---------------------           ---------------------
               |         |         |           |         |         |
  Heart Dis +  |   24    |   18    |           |   15    |   10    |
               |         |         |           |         |         |
               ---------------------           ---------------------
               |         |         |           |         |         |
  Heart Dis -  |   76    |   82    |           |   85    |   90    |
               |         |         |           |         |         |
               ---------------------           ---------------------
                  100       100                   100       100

                    OR = 1.439                      OR = 1.588

     We will denote the outcome variable, Heart Disease, by Y, with

          Heart Dis + :  Y = 1
          Heart Dis - :  Y = 0.

     We will represent smoking status by A:

          Smoke    :  A = 1
          No Smoke :  A = 0.

     Finally we will represent Gender by B:

          Men      :  B = 0
          Women    :  B = 1.
     We will also need an interaction term, AB, defined simply as

          AB = A*B.

     Note that AB = 0 if A = 0 or B = 0, and AB = 1 *only when* both A and B
are 1.  What part of the 2 x 2 tables is represented by AB = 1 ?

     Several models are possible.  We will consider five here:

     MODEL 0:  Intercept only.

     MODEL A:  Variable 'A' the only covariate:

               Prob(Y = 1 | A) = 1 / (1 + exp(-a0 - a1*A)).

     MODEL B:  Variable 'B' the only covariate:

               Prob(Y = 1 | B) = 1 / (1 + exp(-b0 - b1*B)).

     MODEL 1:  No interaction:

               Prob(Y = 1 | A and B) = 1 / (1 + exp(-c0 - c1*A - c2*B)).

     MODEL 2:  Interaction:

               Prob(Y = 1 | A and B) = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)).

     Below is the corresponding SAS analysis.  The results of the PROC FREQ
analysis are identical to those shown in notes n54703.003, and are excised
from the printout:

==================================================================================
options linesize = 80 ;
footnote "~john-c/5421/n54703.010.2.sas &sysdate &systime" ;

data heart ;
     input y a b count ;
     ab = a * b ;
     do i = 1 to count ;
        output ;
     end ;

cards ;
1 1 0 24
1 0 0 18
0 1 0 76
0 0 0 82
1 1 1 15
1 0 1 10
0 1 1 85
0 0 1 90
;
run ;

proc freq data = heart ;
     tables b * y * a / chisq cmh measures ;
title1 'PROC FREQ analysis of two 2 x 2 tables' ;
run ;

proc logistic descending data = heart ;
     model y = a / clodds = pl ;
title1 'PROC LOGISTIC: two 2 x 2 tables:  Outcome Y = heart dis' ;
title2 'Covariate A only: Smoking.' ;
title3 'Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction.' ;
run ;

proc logistic descending data = heart ;
     model y = b / clodds = pl ;
title1 'PROC LOGISTIC: two 2 x 2 tables:  Outcome Y = heart dis' ;
title2 'Covariate B only: Gender.' ;
title3 'Model Y = 1 / (1 + exp(-b0 - b1*B)), no interaction.' ;
run ;

proc logistic descending data = heart ;
     model y = a b / clodds = pl ;
title1 'PROC LOGISTIC: two 2 x 2 tables:  Outcome Y = heart dis' ;
title2 'Covariate A = smoking, Covariate B = gender' ;
title3 'Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction.'
; run ; proc logistic descending data = heart ; model y = a b ab / clodds = pl ; title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ; title2 'Covariate A = smoking, Covariate B = gender, AB = intxn' ; title3 'Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction.' ; run ; ================================================================================= MODEL A: PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 1 Covariate A only: Smoking. Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction. 18:37 Wednesday, March 10, 2004 The LOGISTIC Procedure Data Set: WORK.HEART Response Variable: Y Response Levels: 2 Number of Observations: 400 Link Function: Logit Response Profile Ordered Value Y Count 1 1 67 2 0 333 Model Fitting Information and Testing Global Null Hypothesis BETA=0 Intercept Intercept and Criterion Only Covariates Chi-Square for Covariates AIC 363.520 363.342 . SC 367.511 371.325 . -2 LOG L 361.520 359.342 2.178 with 1 DF (p=0.1400) Score . . 2.169 with 1 DF (p=0.1408) Analysis of Maximum Likelihood Estimates Parameter Standard Wald Pr > Standardized Odds Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio INTERCPT 1 -1.8153 0.2038 79.3547 0.0001 . . A 1 0.3974 0.2709 2.1528 0.1423 0.109699 1.488 Association of Predicted Probabilities and Observed Responses Concordant = 30.1% Somers' D = 0.099 Discordant = 20.2% Gamma = 0.196 Tied = 49.7% Tau-a = 0.028 (22311 pairs) c = 0.549 ~john-c/5421/n54703.010.2.sas 10MAR04 18:37 --------------------------------------------------------------------------------- PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 2 Covariate A only: Smoking. Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction. 
18:37 Wednesday, March 10, 2004 The LOGISTIC Procedure Conditional Odds Ratios and 95% Confidence Intervals Profile Likelihood Confidence Limits Odds Variable Unit Ratio Lower Upper A 1.0000 1.488 0.878 2.549 ~john-c/5421/n54703.010.2.sas 10MAR04 18:37 --------------------------------------------------------------------------------- MODEL B: PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 3 Covariate B only: Gender. Model Y = 1 / (1 + exp(-a0 - a1*B)), no interaction. 18:37 Wednesday, March 10, 2004 The LOGISTIC Procedure Data Set: WORK.HEART Response Variable: Y Response Levels: 2 Number of Observations: 400 Link Function: Logit Response Profile Ordered Value Y Count 1 1 67 2 0 333 Model Fitting Information and Testing Global Null Hypothesis BETA=0 Intercept Intercept and Criterion Only Covariates Chi-Square for Covariates AIC 363.520 360.291 . SC 367.511 368.274 . -2 LOG L 361.520 356.291 5.229 with 1 DF (p=0.0222) Score . . 5.181 with 1 DF (p=0.0228) Analysis of Maximum Likelihood Estimates Parameter Standard Wald Pr > Standardized Odds Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio INTERCPT 1 -1.3249 0.1736 58.2451 0.0001 . . B 1 -0.6210 0.2754 5.0838 0.0242 -0.171398 0.537 Association of Predicted Probabilities and Observed Responses Concordant = 32.9% Somers' D = 0.152 Discordant = 17.7% Gamma = 0.301 Tied = 49.4% Tau-a = 0.043 (22311 pairs) c = 0.576 ~john-c/5421/n54703.010.2.sas 10MAR04 18:37 --------------------------------------------------------------------------------- PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 4 Covariate B only: Gender. Model Y = 1 / (1 + exp(-a0 - a1*B)), no interaction. 
18:37 Wednesday, March 10, 2004 The LOGISTIC Procedure Conditional Odds Ratios and 95% Confidence Intervals Profile Likelihood Confidence Limits Odds Variable Unit Ratio Lower Upper B 1.0000 0.537 0.310 0.916 PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 6 Covariate A = smoking, Covariate B = gender Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction. 21:05 Tuesday, March 9, 2004 The LOGISTIC Procedure Model Information Data Set WORK.HEART Response Variable y Number of Response Levels 2 Number of Observations 400 Link Function Logit Optimization Technique Fisher's scoring Response Profile Ordered Total Value y Frequency 1 1 67 2 0 333 Model Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 363.520 360.085 SC 367.511 372.059 -2 Log L 361.520 354.085 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 7.4354 2 0.0243 Score 7.3506 2 0.0253 Wald 7.1836 2 0.0275 ~john-c/5421/n54703.010.2.sas 09MAR04 21:05 ================================================================================= MODEL 1: Y = A B: No interaction. PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 7 Covariate A = smoking, Covariate B = gender Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction. 
21:05 Tuesday, March 9, 2004 The LOGISTIC Procedure Analysis of Maximum Likelihood Estimates Standard Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 -1.5380 0.2313 44.1968 <.0001 a 1 0.4027 0.2727 2.1808 0.1397 b 1 -0.6244 0.2762 5.1114 0.0238 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits a 1.496 0.877 2.553 b 0.536 0.312 0.920 Association of Predicted Probabilities and Observed Responses Percent Concordant 47.8 Somers' D 0.202 Percent Discordant 27.6 Gamma 0.267 Percent Tied 24.5 Tau-a 0.056 Pairs 22311 c 0.601 Profile Likelihood Confidence Interval for Adjusted Odds Ratios Effect Unit Estimate 95% Confidence Limits a 1.0000 1.496 0.880 2.571 b 1.0000 0.536 0.308 0.914 ~john-c/5421/n54703.010.2.sas 09MAR04 21:05 ================================================================================= MODEL 2: A B A*B: Interaction. PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 8 Covariate A = smoking, Covariate B = gender, AB = intxn Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction. 21:05 Tuesday, March 9, 2004 The LOGISTIC Procedure Model Information Data Set WORK.HEART Response Variable y Number of Response Levels 2 Number of Observations 400 Link Function Logit Optimization Technique Fisher's scoring Response Profile Ordered Total Value y Frequency 1 1 67 2 0 333 Model Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 363.520 362.053 SC 367.511 378.019 -2 Log L 361.520 354.053 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 7.4668 3 0.0584 Score 7.3686 3 0.0610 Wald 7.0972 3 0.0689 ~john-c/5421/n54703.010.2.sas 09MAR04 21:05 ================================================================================= PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 9 Covariate A = smoking, Covariate B = gender, AB = intxn Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction. 
                                                21:05 Tuesday, March 9, 2004

                           The LOGISTIC Procedure

                 Analysis of Maximum Likelihood Estimates

                                  Standard
   Parameter      DF    Estimate     Error    Chi-Square    Pr > ChiSq

   Intercept       1     -1.5163    0.2603       33.9378        <.0001
   a               1      0.3637    0.3501        1.0790        0.2989
   b               1     -0.6809    0.4229        2.5919        0.1074
   ab              1      0.0989    0.5587        0.0314        0.8594

                           Odds Ratio Estimates

                             Point          95% Wald
                Effect    Estimate      Confidence Limits

                a            1.439       0.724       2.857
                b            0.506       0.221       1.160
                ab           1.104       0.369       3.300

       Association of Predicted Probabilities and Observed Responses

             Percent Concordant     47.8    Somers' D    0.202
             Percent Discordant     27.6    Gamma        0.267
             Percent Tied           24.5    Tau-a        0.056
             Pairs                 22311    c            0.601

       Profile Likelihood Confidence Interval for Adjusted Odds Ratios

          Effect         Unit     Estimate     95% Confidence Limits

          a            1.0000        1.439        0.727        2.888
          b            1.0000        0.506        0.214        1.140
          ab           1.0000        1.104        0.371        3.348

              ~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================

     The main variable of interest here is smoking (A).  Gender is
essentially a confounder, that is, it is another variable which also affects
the risk of heart disease.  As in the PROC FREQ analysis, one wants to know
whether there is an interaction of smoking and gender.  If so, the right
model to report is Model 2.  If not, one should report the results of
Model 1.

     The Model A analysis (A is the only covariate) indicates an odds ratio
for the effect of A of exp(.3974) = 1.488.

     The Model B analysis (B is the only covariate) indicates an odds ratio
for the effect of B of exp(-.621) = 0.537.

     An objective of this analysis is to evaluate the effect of factor A
(smoking) versus non-smoking.  In doing this, one would want to control for
a possible confounder, covariate B (gender).  The proper way to test for the
effect of A is to look at Diff(-2 Log L) between the model containing B
alone (Model B) and the model containing both A and B (Model 1).  This
yields:

          Diff(-2 Log L) = 356.291 - 354.085 = 2.206.

     This should be compared to a chi-square distribution with 1 degree of
freedom (the two models differ by one parameter):  p = .1375.
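     The comparison of -2 Log L values just described can be done in a short
data step rather than by looking up a chi-square table.  This is a sketch
only: the variable names are made up for the illustration, and the two
-2 Log L values are copied by hand from the Model B and Model 1 printouts.
PROBCHI is the SAS function for the chi-square cumulative distribution.

==================================================================================
/* Sketch only: likelihood ratio test for adding A to a model     */
/* that already contains B.                                       */

data _null_ ;
     m2ll_b  = 356.291 ;               /* -2 Log L, model y = b    */
     m2ll_ab = 354.085 ;               /* -2 Log L, model y = a b  */
     df      = 1 ;                     /* one added parameter (A)  */

     diff = m2ll_b - m2ll_ab ;         /* likelihood ratio chi-square */
     p    = 1 - probchi(diff, df) ;    /* upper-tail p-value          */

     put diff= 8.3 p= 8.4 ;
run ;
==================================================================================

     The same step, with the two -2 Log L values and the degrees of freedom
changed, applies to any pair of nested models, where the degrees of freedom
equal the number of parameters added.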
The Model 1 analysis yields a coefficient for A, the smoking variable, of 0.4027, and the corresponding odds ratio is 1.496. The confidence interval is (.877, 2.553), so the evidence that smoking is a risk factor in this model is not terribly strong. The Model 2 analysis yields the following coefficient estimates: Intercept : -1.516 A (smoking) : 0.364 B (gender) : -0.681 AB (interxn) : 0.099 Note that adding the interaction variable AB 'weakened' the effect of smoking. The real question is, what is the effect of the interaction term itself? The soundest way to evaluate the interaction effect statistically is to examine the difference in -2 Log L between Model 1 and Model 2: Model 1 -2 Log L: 354.085 Model 2 -2 Log L: 354.053 -------------------------------- Diff(-2 Log L) : 0.032. This should be compared to a chi-square distribution with 1 degree of freedom. The result is far from significant: p = 0.858. Therefore one would not reject the null hypothesis that there is no interaction. One would report the results of Model 1. Note that this agrees very closely with the results of the PROC FREQ analysis: the Breslow-Day test for homogeneity of the odds ratio between the two tables had a chi-square value of 0.031 with a p-value of 0.859. A key fact to note here is the following: saying there is no interaction is basically the same thing as saying the odds ratios in the two separate tables are indistinguishable. To put it another way, a test for interaction is equivalent to a test for homogeneity of the odds ratios. ================================================================================= Problem 1. 
Use PROC LOGISTIC to analyze the data from notes n54703.003: Men Women --------------------- --------------------- Smoke No Smoke Smoke No Smoke --------------------- --------------------- | | | | | | Heart Dis + | 24 | 18 | | 15 | 10 | | | | | | | --------------------- --------------------- | | | | | | Heart Dis - | 76 | 82 | | 85 | 90 | | | | | | | --------------------- --------------------- 100 100 100 100 OR = 1.439 OR = 1.588 Specifically, 1) Use PROC LOGISTIC to analyze the two strata separately, including estimates and 95% confidence intervals for the odds ratios, and tests of whether smoking status is related to outcome. Discuss how the results are related to PROC FREQ analyses. 2) Use PROC LOGISTIC for all the data stratified by gender. Find the estimated combined odds ratio and 95% confidence intervals. Perform a test of interaction of gender and smoking status. Again discuss how this analysis is related to a PROC FREQ analysis. ================================================================================= Problem 2. Use PROC LOGISTIC to analyze the following data: Minnesota Washington Alabama --------------------------------------------------------- | | | | D + | 1226 | 988 | 564 | | | | | --------------------------------------------------------- | | | | D - | 1358 | 1299 | 582 | | | | | --------------------------------------------------------- The question to be addressed here is, is there a difference between the three States in the proportion of people in the "D +" category? Compare your analysis and conclusions with a PROC FREQ analysis. ------------------------------------------------------------------------ PROBLEM 3. Use PROC LOGISTIC to analyze the relationship between the outcome variable pain, and covariates sex, age, and treatment. Treatment: P = placebo, A = drug A, B = drug B. Sex : F = female, M = male Age : years Pain : No and Yes This is a clinical trial for the treatment of chronic pain. The main questions of interest: 1. 
Is pain related to treatment? 2. Does treatment affect women differently from men? The dataset also includes 'duration', which is the time in months before the study began that the person first reported pain. Your analysis should control for age and duration, but focus on the two questions above. State your conclusions and explain them. The dataset is given below. Note that there are 3 cases on each line. ============================================================= data pain ; input Treatment $ Sex $ Age Duration Pain $ @@; datalines; P F 68 1 No B M 74 16 No P F 67 30 No P M 66 26 Yes B F 67 28 No B F 77 16 No A F 71 12 No B F 72 50 No B F 76 9 Yes A M 71 17 Yes A F 63 27 No A F 69 18 Yes B F 66 12 No A M 62 42 No P F 64 1 Yes A F 64 17 No P M 74 4 No A F 72 25 No P M 70 1 Yes B M 66 19 No B M 59 29 No A F 64 30 No A M 70 28 No A M 69 1 No B F 78 1 No P M 83 1 Yes B F 69 42 No B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes A M 70 12 No A F 69 12 No B F 65 14 No B M 70 1 No B M 67 23 No A M 76 25 Yes P M 78 12 Yes B M 77 1 Yes B F 69 24 No P M 66 4 Yes P F 65 29 No P M 60 26 Yes A M 78 15 Yes B M 75 21 Yes A F 67 11 No P F 72 27 No P F 70 13 Yes A M 75 6 Yes B F 65 7 No P F 68 27 Yes P M 68 11 Yes P M 67 17 Yes B M 70 22 No A M 65 15 No P F 67 1 Yes A M 67 10 No P F 72 11 Yes A F 74 1 No B M 80 21 Yes A F 69 3 No ; ================================================================================= n54703.010 Last update: March 31, 2006.