PROC LOGISTIC, II: More complicated tables. n54703.010
The previous notes dealt primarily with the use of PROC LOGISTIC
to analyze one 2 x 2 table, and showed that much of what PROC LOGISTIC does in
that case can be done in PROC FREQ.
Here we will examine what happens when PROC LOGISTIC is applied to
2 x M tables, and to data in the form of multiple 2 x 2 tables.
1. 2 X M CONTINGENCY TABLE:
Consider the following 2 x 3 table:
X = 1 X = 2 X = 3
-------------------------------
| | | |
Y = 0 | 10 | 20 | 30 | 60
| | | |
-------------------------------
| | | |
Y = 1 | 30 | 20 | 10 | 60
| | | |
-------------------------------
40 40 40 120
Here Y is the outcome variable, and X is a predictor or covariate. The
question is whether there is statistical evidence for a relationship between
X and Y. The null hypothesis is that there is not, i.e., that for each of
the three columns, the true proportion for which Y = 1 is the same.
The covariate X here is intended as a categorical variable. This means
that the actual values taken on by X are not important, and even that their
order is not important. If this were an analysis of variance, X would be
a *factor*; it would be entered as a CLASS variable, and the different
columns would be represented by indicator (or dummy) variables.
PROC LOGISTIC in SAS version 8 has a lot in common with PROC GLM. It
provides for the use of CLASS variables, but the coding of them is somewhat
different from that for PROC GLM, as will be explained below. Here is a
program which analyzes the table above, using both PROC FREQ and PROC LOGISTIC:
==================================================================================
options linesize = 80 ;
footnote "~john-c/5421/n54703.010.sas &sysdate &systime" ;
data x23 x23xpand ;
input x y count ;
do i = 1 to count ;
output x23xpand ;
end ;
output x23 ;
cards ;
1 0 10
2 0 20
3 0 30
1 1 30
2 1 20
3 1 10
;
run ;
proc freq data = x23 ;
weight count ;
tables y * x / chisq ;
title1 'PROC FREQ analysis of a 2 x 3 contingency table' ;
run ;
proc logistic descending data = x23xpand ;
class x ;
model y = x / clodds = pl ;
title1 'PROC LOGISTIC analysis of a 2 x 3 contingency table' ;
title2 'Using covariate x as a CLASS variable ...' ;
run ;
================================================================================
PROC FREQ analysis of a 2 x 3 contingency table 1
19:18 Tuesday, March 9, 2004
The FREQ Procedure
Table of y by x
y x
Frequency|
Percent |
Row Pct |
Col Pct | 1| 2| 3| Total
---------+--------+--------+--------+
0 | 10 | 20 | 30 | 60
| 8.33 | 16.67 | 25.00 | 50.00
| 16.67 | 33.33 | 50.00 |
| 25.00 | 50.00 | 75.00 |
---------+--------+--------+--------+
1 | 30 | 20 | 10 | 60
| 25.00 | 16.67 | 8.33 | 50.00
| 50.00 | 33.33 | 16.67 |
| 75.00 | 50.00 | 25.00 |
---------+--------+--------+--------+
Total 40 40 40 120
33.33 33.33 33.33 100.00
Statistics for Table of y by x
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 2 20.0000 <.0001
Likelihood Ratio Chi-Square 2 20.9299 <.0001
Mantel-Haenszel Chi-Square 1 19.8333 <.0001
Phi Coefficient 0.4082
Contingency Coefficient 0.3780
Cramer's V 0.4082
Sample Size = 120
~john-c/5421/n54703.010.sas 09MAR04 19:18
================================================================================
PROC LOGISTIC analysis of a 2 x 3 contingency table 2
Using covariate x as a CLASS variable ...
19:18 Tuesday, March 9, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.X23XPAND
Response Variable y
Number of Response Levels 2
Number of Observations 120
Link Function Logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value y Frequency
1 1 60
2 0 60
Class Level Information
Design
Variables
Class Value 1 2
x 1 1 0
2 0 1
3 -1 -1
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 168.355 151.425
SC 171.143 159.788
-2 Log L 166.355 145.425
~john-c/5421/n54703.010.sas 09MAR04 19:18
================================================================================
PROC LOGISTIC analysis of a 2 x 3 contingency table 3
Using covariate x as a CLASS variable ...
19:18 Tuesday, March 9, 2004
The LOGISTIC Procedure
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 20.9299 2 <.0001
Score 20.0000 2 <.0001
Wald 18.1042 2 0.0001
Type III Analysis of Effects
Wald
Effect DF Chi-Square Pr > ChiSq
x 2 18.1042 0.0001
Analysis of Maximum Likelihood Estimates
Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -93E-18 0.2018 0.0000 1.0000
x 1 1 1.0986 0.2919 14.1685 0.0002
x 2 1 -821E-19 0.2722 0.0000 1.0000
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
x 1 vs 3 9.000 3.271 24.763
x 2 vs 3 3.000 1.164 7.732
Association of Predicted Probabilities and Observed Responses
Percent Concordant 58.3 Somers' D 0.444
Percent Discordant 13.9 Gamma 0.615
Percent Tied 27.8 Tau-a 0.224
Pairs 3600 c 0.722
Profile Likelihood Confidence Interval for Adjusted Odds Ratios
Effect Unit Estimate 95% Confidence Limits
x 1 vs 3 1.0000 9.000 3.395 26.001
x 2 vs 3 1.0000 3.000 1.186 7.973
~john-c/5421/n54703.010.sas 09MAR04 19:18
================================================================================
The PROC FREQ analysis is straightforward, and indicates that there is a
statistically significant relationship between X and Y. Note that the
likelihood ratio chi-square equals 20.9299. This is compared to a chi-square
distribution with 2 degrees of freedom. [Why 2?] The associated p-value
is < .0001.
The PROC LOGISTIC analysis yields essentially the same result. This can
be seen from the Model Fit Statistics table in the printout. Note that the
change in -2 Log L from the Intercept Only model to the Intercept and
Covariates model is
166.355 - 145.425 = 20.930,
which agrees with the Likelihood Ratio value of 20.9299 printed under
Testing Global Null Hypothesis.
This should be compared to a chi-square distribution with 2 degrees of
freedom (because SAS enters 2 indicator variables into the model), and the
associated p-value is < 0.0001, just as with PROC FREQ.
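The two -2 Log L values can also be checked by hand from the cell counts. The data step below is a minimal sketch of that check, not part of the original program; the array and variable names (n1, n0, m2ll0, m2ll1) are made up for illustration. It relies on the fact that with X entered as a 3-level factor, the fitted proportion in each column equals the observed proportion.

```sas
data _null_ ;
   /* cell counts from the 2 x 3 table (illustrative names):      */
   /* n1 = number with Y = 1, n0 = number with Y = 0, by column   */
   array n1[3] _temporary_ (30 20 10) ;
   array n0[3] _temporary_ (10 20 30) ;
   tot1 = 0 ; tot0 = 0 ;
   do j = 1 to 3 ;
      tot1 = tot1 + n1[j] ;
      tot0 = tot0 + n0[j] ;
   end ;
   /* intercept-only model: one common proportion for all columns */
   p = tot1 / (tot1 + tot0) ;
   m2ll0 = -2 * (tot1 * log(p) + tot0 * log(1 - p)) ;
   /* model with x as a factor: fitted proportion = column proportion */
   /* (this would fail with log(0) if a cell count were zero)         */
   m2ll1 = 0 ;
   do j = 1 to 3 ;
      pj = n1[j] / (n1[j] + n0[j]) ;
      m2ll1 = m2ll1 - 2 * (n1[j] * log(pj) + n0[j] * log(1 - pj)) ;
   end ;
   lr = m2ll0 - m2ll1 ;              /* likelihood ratio chi-square */
   pvalue = 1 - probchi(lr, 2) ;     /* 2 degrees of freedom        */
   put m2ll0= 8.3 m2ll1= 8.3 lr= 8.4 pvalue= best12. ;
run ;
```

The log should show values that agree with the printout: 166.355, 145.425, and a difference of 20.9299.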
SAS goes on to compute two odds ratios: one for X = 1 versus X = 3,
and the other for X = 2 versus X = 3. This corresponds exactly to
computing odds ratios for the following two tables:
X = 3 X = 1 X = 3 X = 2
--------------------- ---------------------
| | | | | |
Y = 0 | 30 | 10 | | 30 | 20 |
| | | | | |
--------------------- ---------------------
| | | | | |
Y = 1 | 10 | 30 | | 10 | 20 |
| | | | | |
--------------------- ---------------------
OR = 30*30/(10*10) = 9 OR = 30*20/(20*10) = 3
Here I have put the X = 3 column on the left because SAS treats
it as the 'default' (reference) category, i.e., the one to which the other
two are to be compared.
SAS represents the categories in a somewhat unexpected way, known as
effect coding. SAS makes use of two 'indicator' variables, X1 and X2 (the
Design Variables in the Class Level Information table), which are defined
as follows:
If X = 1, then X1 = 1 and X2 = 0.
If X = 2, then X1 = 0 and X2 = 1.
If X = 3, then X1 = -1 and X2 = -1.
The model that SAS uses here is the following:
Prob(Y = 1 | X1 and X2) = 1 / (1 + exp(-b0 - b1*X1 - b2*X2)).
The printout gives the coefficient estimates for b0, b1, and b2:
----------------------------------------------------------------------------------
Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -93E-18 0.2018 0.0000 1.0000
x 1 1 1.0986 0.2919 14.1685 0.0002
x 2 1 -821E-19 0.2722 0.0000 1.0000
----------------------------------------------------------------------------------
What this says essentially is: b0 = 0, b1 = 1.0986, and b2 = 0. (The
printed values -93E-18 and -821E-19 are zero to within rounding error.)
To compute the odds ratio for X = 1 versus X = 3, you need to compute
two odds:
Odds(Y = 1 | X = 1) and Odds(Y = 1 | X = 3).
Recall that Odds equals: prob / (1 - prob).
Note that Prob(Y = 1 | X = 1) = 1 / (1 + exp(-0 - 1.0986*1 - 0)) = .75.
Therefore Odds(Y = 1 | X = 1) = .75 / .25 = 3.
Now the more difficult part:
Note that Prob(Y = 1 | X = 3) = 1 / (1 + exp(-0 - 1.0986*(-1) - 0*(-1)))
= 1/(1 + exp(+1.0986)) = 1/4.
Therefore Odds(Y = 1 | X = 3) = (1/4)/(3/4) = 1/3.
Finally, therefore, the *odds ratio* for X = 1 versus X = 3 is:
OR = 3/(1/3) = 9.
This is given in the PROC LOGISTIC printout. Note that it agrees with
the value given above based on consideration of the comparison of the X = 1
column with the X = 3 column.
To be sure you understand this, you should go through the same process
to compute the odds ratio for X = 2 versus X = 3, using the PROC LOGISTIC
coefficients.
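As a check on the hand calculation, the same steps can be run in a data step. This is only a sketch: the coefficient values are typed in from the printout (with the intercept and b2 set to 0), and the variable names are illustrative.

```sas
data _null_ ;
   /* estimates copied from the printout; b0 and b2 are zero to rounding */
   b0 = 0 ; b1 = 1.0986 ; b2 = 0 ;
   array odds[3] ;
   do x = 1 to 3 ;
      /* the coding of X1 and X2 used by the CLASS statement */
      if x = 1 then do ; x1 = 1 ; x2 = 0 ; end ;
      else if x = 2 then do ; x1 = 0 ; x2 = 1 ; end ;
      else do ; x1 = -1 ; x2 = -1 ; end ;
      prob = 1 / (1 + exp(-b0 - b1*x1 - b2*x2)) ;
      oddsval = prob / (1 - prob) ;
      odds[x] = oddsval ;
      put x= prob= 6.4 oddsval= 8.4 ;
   end ;
   /* odds ratios relative to the X = 3 column */
   or13 = odds[1] / odds[3] ;
   or23 = odds[2] / odds[3] ;
   put or13= 8.3 or23= 8.3 ;
run ;
```

The probabilities should come out as .75, .50, and .25, and the two odds ratios as approximately 9 and 3, in agreement with the Odds Ratio Estimates table.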
PROC LOGISTIC also provides confidence intervals for both of these
odds ratio estimates. PROC FREQ does not display either the odds ratios or
their confidence limits for 2 x M tables when M > 2.
You may not like the way SAS codes the indicator variables (I don't!). In
this case and many others, you can easily write your own in the data step
preceding the PROC LOGISTIC. Below is an example of how this works:
==================================================================================
options linesize = 80 ;
footnote "~john-c/5421/n54703.010.sas &sysdate &systime" ;
data x23 x23xpand ;
input x y count ;
x1 = 0 ; x2 = 0 ; x3 = 0 ;
if x = 1 then x1 = 1 ;
if x = 2 then x2 = 1 ;
if x = 3 then x3 = 1 ;
do i = 1 to count ;
output x23xpand ;
end ;
output x23 ;
cards ;
1 0 10
2 0 20
3 0 30
1 1 30
2 1 20
3 1 10
;
run ;
proc logistic descending data = x23xpand ;
model y = x1 x2 / clodds = pl ;
title1 'PROC LOGISTIC analysis of a 2 x 3 contingency table' ;
title2 'Using indicator variables ...' ;
run ;
endsas ;
---------------------------------------------------------------------------------
PROC LOGISTIC analysis of a 2 x 3 contingency table 1
Using indicator variables ...
18:12 Wednesday, March 10, 2004
The LOGISTIC Procedure
Data Set: WORK.X23XPAND
Response Variable: Y
Response Levels: 2
Number of Observations: 120
Link Function: Logit
Response Profile
Ordered
Value Y Count
1 1 60
2 0 60
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Intercept
Intercept and
Criterion Only Covariates Chi-Square for Covariates
AIC 168.355 151.425 .
SC 171.143 159.788 .
-2 LOG L 166.355 145.425 20.930 with 2 DF (p=0.0001)
Score . . 20.000 with 2 DF (p=0.0001)
Analysis of Maximum Likelihood Estimates
Parameter Standard Wald Pr > Standardized Odds
Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio
INTERCPT 1 -1.0986 0.3651 9.0521 0.0026 . .
X1 1 2.1972 0.5164 18.1042 0.0001 0.573451 9.000
X2 1 1.0986 0.4830 5.1726 0.0229 0.286725 3.000
Association of Predicted Probabilities and Observed Responses
Concordant = 58.3% Somers' D = 0.444
Discordant = 13.9% Gamma = 0.615
Tied = 27.8% Tau-a = 0.224
(3600 pairs) c = 0.722
~john-c/5421/n54703.010.sas 10MAR04 18:12
----------------------------------------------------------------------------------
PROC LOGISTIC analysis of a 2 x 3 contingency table 2
Using indicator variables ...
18:12 Wednesday, March 10, 2004
The LOGISTIC Procedure
Conditional Odds Ratios and 95% Confidence Intervals
Profile Likelihood
Confidence Limits
Odds
Variable Unit Ratio Lower Upper
X1 1.0000 9.000 3.395 26.001
X2 1.0000 3.000 1.186 7.973
~john-c/5421/n54703.010.sas 10MAR04 18:12
==================================================================================
Note that indicator variables x1, x2, and x3 are defined in the data step:
x1 = 1 if x = 1, 0 otherwise;
x2 = 1 if x = 2, 0 otherwise;
x3 = 1 if x = 3, 0 otherwise.
These appear in the MODEL statement in PROC LOGISTIC as follows:
model y = x1 x2 / clodds = pl ;
Note that there is no CLASS statement.
Note that indicator variable x3 is omitted from the model: this
corresponds to the fact that the third column is the reference category.
Note that the odds ratios corresponding to x1 and x2 are computed as
exp(x1 coeff) = exp(2.1972) = 9; 95% CI, (3.395, 26.001)
exp(x2 coeff) = exp(1.0986) = 3; 95% CI, (1.186, 7.973)
The interpretation of the odds ratio is the same as before: the odds
ratio for Y = 1, comparing column 1 to column 3, is exp(x1 coeff) = 9, etc.
This method of coding variables for PROC LOGISTIC seems a little easier to
use and interpret than the CLASS variable version.
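Another possibility, not used in these notes, is to keep the CLASS statement but ask for reference coding with the PARAM=REF option. The program below is a sketch which assumes your release of SAS supports this option in PROC LOGISTIC. It also uses a FREQ statement on the cell counts instead of expanding the data to one record per observation.

```sas
data x23 ;
   input x y count ;
   cards ;
1 0 10
2 0 20
3 0 30
1 1 30
2 1 20
3 1 10
;
run ;

proc logistic descending data = x23 ;
   freq count ;              /* each record stands for 'count' observations  */
   class x / param = ref ;   /* 0/1 coding; the last level (x = 3) is the    */
                             /* reference category by default                */
   model y = x / clodds = pl ;
   title1 'PROC LOGISTIC analysis of a 2 x 3 contingency table' ;
   title2 'Using CLASS with reference coding ...' ;
run ;
```

With reference coding the design variables are the same as x1 and x2 above, so the coefficient estimates should agree with those from the indicator-variable run.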
2. MULTIPLE 2 X 2 TABLES:
We return to an example that was used in notes n54703.003:
Men Women
--------------------- ---------------------
Smoke No Smoke Smoke No Smoke
--------------------- ---------------------
| | | | | |
Heart Dis + | 24 | 18 | | 15 | 10 |
| | | | | |
--------------------- ---------------------
| | | | | |
Heart Dis - | 76 | 82 | | 85 | 90 |
| | | | | |
--------------------- ---------------------
100 100 100 100
OR = 1.439 OR = 1.588
We will denote the outcome variable, Heart Disease, by Y, with
Heart Dis + : Y = 1
Heart Dis - : Y = 0.
We will represent smoking status by A:
Smoke : A = 1
No Smoke: A = 0.
Finally we will represent Gender by B:
Men : B = 0
Women : B = 1.
We will also need an interaction term, AB, defined simply as AB = A*B.
Note that AB = 0 if A = 0 or B = 0, and AB = 1 *only when* both A and B are 1.
What part of the 2 x 2 tables is represented by AB = 1 ?
Several models are possible. We will consider five here:
MODEL 0: Intercept only.
MODEL A: Variable 'A' the only covariate: Prob(Y = 1 | A) = 1 / (1 + exp(-a0 - a1*A)).
MODEL B: Variable 'B' the only covariate: Prob(Y = 1 | B) = 1 / (1 + exp(-b0 - b1*B)).
MODEL 1: No interaction:
Prob(Y = 1 | A and B) = 1 / (1 + exp(-c0 - c1*A - c2*B)).
MODEL 2: Interaction:
Prob(Y = 1 | A and B) = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)).
Below is the corresponding SAS analysis. The results of the PROC FREQ
analysis are identical to those shown in notes n54703.003, and are excised
from the printout:
==================================================================================
options linesize = 80 ;
footnote "~john-c/5421/n54703.010.2.sas &sysdate &systime" ;
data heart ;
input y a b count ;
ab = a * b ;
do i = 1 to count ;
output ;
end ;
cards ;
1 1 0 24
1 0 0 18
0 1 0 76
0 0 0 82
1 1 1 15
1 0 1 10
0 1 1 85
0 0 1 90
;
run ;
proc freq data = heart ;
tables b * y * a / chisq cmh measures ;
title1 'PROC FREQ analysis of two 2 x 2 tables' ;
run ;
proc logistic descending data = heart ;
model y = a / clodds = pl ;
title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ;
title2 'Covariate A only: Smoking.' ;
title3 'Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction.' ;
run ;
proc logistic descending data = heart ;
model y = b / clodds = pl ;
title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ;
title2 'Covariate B only: Gender.' ;
title3 'Model Y = 1 / (1 + exp(-b0 - b1*B)), no interaction.' ;
run ;
proc logistic descending data = heart ;
model y = a b / clodds = pl ;
title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ;
title2 'Covariate A = smoking, Covariate B = gender' ;
title3 'Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction.' ;
run ;
proc logistic descending data = heart ;
model y = a b ab / clodds = pl ;
title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ;
title2 'Covariate A = smoking, Covariate B = gender, AB = intxn' ;
title3 'Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction.' ;
run ;
=================================================================================
MODEL A:
PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 1
Covariate A only: Smoking.
Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction.
18:37 Wednesday, March 10, 2004
The LOGISTIC Procedure
Data Set: WORK.HEART
Response Variable: Y
Response Levels: 2
Number of Observations: 400
Link Function: Logit
Response Profile
Ordered
Value Y Count
1 1 67
2 0 333
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Intercept
Intercept and
Criterion Only Covariates Chi-Square for Covariates
AIC 363.520 363.342 .
SC 367.511 371.325 .
-2 LOG L 361.520 359.342 2.178 with 1 DF (p=0.1400)
Score . . 2.169 with 1 DF (p=0.1408)
Analysis of Maximum Likelihood Estimates
Parameter Standard Wald Pr > Standardized Odds
Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio
INTERCPT 1 -1.8153 0.2038 79.3547 0.0001 . .
A 1 0.3974 0.2709 2.1528 0.1423 0.109699 1.488
Association of Predicted Probabilities and Observed Responses
Concordant = 30.1% Somers' D = 0.099
Discordant = 20.2% Gamma = 0.196
Tied = 49.7% Tau-a = 0.028
(22311 pairs) c = 0.549
~john-c/5421/n54703.010.2.sas 10MAR04 18:37
---------------------------------------------------------------------------------
PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 2
Covariate A only: Smoking.
Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction.
18:37 Wednesday, March 10, 2004
The LOGISTIC Procedure
Conditional Odds Ratios and 95% Confidence Intervals
Profile Likelihood
Confidence Limits
Odds
Variable Unit Ratio Lower Upper
A 1.0000 1.488 0.878 2.549
~john-c/5421/n54703.010.2.sas 10MAR04 18:37
---------------------------------------------------------------------------------
MODEL B:
PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 3
Covariate B only: Gender.
Model Y = 1 / (1 + exp(-a0 - a1*B)), no interaction.
18:37 Wednesday, March 10, 2004
The LOGISTIC Procedure
Data Set: WORK.HEART
Response Variable: Y
Response Levels: 2
Number of Observations: 400
Link Function: Logit
Response Profile
Ordered
Value Y Count
1 1 67
2 0 333
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Intercept
Intercept and
Criterion Only Covariates Chi-Square for Covariates
AIC 363.520 360.291 .
SC 367.511 368.274 .
-2 LOG L 361.520 356.291 5.229 with 1 DF (p=0.0222)
Score . . 5.181 with 1 DF (p=0.0228)
Analysis of Maximum Likelihood Estimates
Parameter Standard Wald Pr > Standardized Odds
Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio
INTERCPT 1 -1.3249 0.1736 58.2451 0.0001 . .
B 1 -0.6210 0.2754 5.0838 0.0242 -0.171398 0.537
Association of Predicted Probabilities and Observed Responses
Concordant = 32.9% Somers' D = 0.152
Discordant = 17.7% Gamma = 0.301
Tied = 49.4% Tau-a = 0.043
(22311 pairs) c = 0.576
~john-c/5421/n54703.010.2.sas 10MAR04 18:37
---------------------------------------------------------------------------------
PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 4
Covariate B only: Gender.
Model Y = 1 / (1 + exp(-a0 - a1*B)), no interaction.
18:37 Wednesday, March 10, 2004
The LOGISTIC Procedure
Conditional Odds Ratios and 95% Confidence Intervals
Profile Likelihood
Confidence Limits
Odds
Variable Unit Ratio Lower Upper
B 1.0000 0.537 0.310 0.916
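---------------------------------------------------------------------------------
MODEL 1: Y = A B: No interaction.
PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 6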
Covariate A = smoking, Covariate B = gender
Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction.
21:05 Tuesday, March 9, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.HEART
Response Variable y
Number of Response Levels 2
Number of Observations 400
Link Function Logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value y Frequency
1 1 67
2 0 333
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 363.520 360.085
SC 367.511 372.059
-2 Log L 361.520 354.085
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 7.4354 2 0.0243
Score 7.3506 2 0.0253
Wald 7.1836 2 0.0275
~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================
MODEL 1: Y = A B: No interaction.
PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 7
Covariate A = smoking, Covariate B = gender
Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction.
21:05 Tuesday, March 9, 2004
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -1.5380 0.2313 44.1968 <.0001
a 1 0.4027 0.2727 2.1808 0.1397
b 1 -0.6244 0.2762 5.1114 0.0238
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
a 1.496 0.877 2.553
b 0.536 0.312 0.920
Association of Predicted Probabilities and Observed Responses
Percent Concordant 47.8 Somers' D 0.202
Percent Discordant 27.6 Gamma 0.267
Percent Tied 24.5 Tau-a 0.056
Pairs 22311 c 0.601
Profile Likelihood Confidence Interval for Adjusted Odds Ratios
Effect Unit Estimate 95% Confidence Limits
a 1.0000 1.496 0.880 2.571
b 1.0000 0.536 0.308 0.914
~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================
MODEL 2: Y = A B AB: Interaction.
PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 8
Covariate A = smoking, Covariate B = gender, AB = intxn
Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction.
21:05 Tuesday, March 9, 2004
The LOGISTIC Procedure
Model Information
Data Set WORK.HEART
Response Variable y
Number of Response Levels 2
Number of Observations 400
Link Function Logit
Optimization Technique Fisher's scoring
Response Profile
Ordered Total
Value y Frequency
1 1 67
2 0 333
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics
Intercept
Intercept and
Criterion Only Covariates
AIC 363.520 362.053
SC 367.511 378.019
-2 Log L 361.520 354.053
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 7.4668 3 0.0584
Score 7.3686 3 0.0610
Wald 7.0972 3 0.0689
~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================
PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis 9
Covariate A = smoking, Covariate B = gender, AB = intxn
Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction.
21:05 Tuesday, March 9, 2004
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -1.5163 0.2603 33.9378 <.0001
a 1 0.3637 0.3501 1.0790 0.2989
b 1 -0.6809 0.4229 2.5919 0.1074
ab 1 0.0989 0.5587 0.0314 0.8594
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
a 1.439 0.724 2.857
b 0.506 0.221 1.160
ab 1.104 0.369 3.300
Association of Predicted Probabilities and Observed Responses
Percent Concordant 47.8 Somers' D 0.202
Percent Discordant 27.6 Gamma 0.267
Percent Tied 24.5 Tau-a 0.056
Pairs 22311 c 0.601
Profile Likelihood Confidence Interval for Adjusted Odds Ratios
Effect Unit Estimate 95% Confidence Limits
a 1.0000 1.439 0.727 2.888
b 1.0000 0.506 0.214 1.140
ab 1.0000 1.104 0.371 3.348
~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================
The main variable of interest here is smoking (A). Gender is treated as
a potential confounder, that is, another variable which may also affect the
risk of heart disease. As in the PROC FREQ analysis, one wants to know
whether there is an interaction of smoking and gender. If so, the right
model to report is Model 2. If not, one should report the results of Model 1.
The Model A analysis (A is the only covariate) indicates an odds ratio for
the effect of A of exp(.3974) = 1.488.
The Model B analysis (B is the only covariate) indicates an odds ratio for
the effect of B of exp(-.621) = 0.537.
An objective of this analysis is to evaluate the effect of smoking (A = 1)
versus non-smoking (A = 0). In doing this, one would want to control for
a possible confounder, covariate B (gender). The proper way to test for
the effect of A is to look at Diff(-2 Log L) between Model B and Model 1
(covariates A and B).
This yields: Diff(-2 Log L) = 356.291 - 354.085 = 2.206. This should be
compared to a chi-square distribution with 1 degree of freedom: p = .1375.
The Model 1 analysis yields a coefficient for A, the smoking variable, of
0.4027, and the corresponding odds ratio is 1.496. The Wald confidence
interval is (.877, 2.553), which includes 1, so the evidence that smoking is
a risk factor in this model is not strong.
The Model 2 analysis yields the following coefficient estimates:
Intercept : -1.516
A (smoking) : 0.364
B (gender) : -0.681
AB (interxn) : 0.099
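Note that adding the interaction variable AB appears to 'weaken' the effect
of smoking (0.364 versus 0.403 in Model 1). In Model 2, however, the
coefficient of A is the log odds ratio for smoking among men only (B = 0):
exp(0.364) = 1.439, which is the odds ratio in the men's table.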
The real question is, what is the effect of the interaction term itself?
The soundest way to evaluate the interaction effect statistically is to
examine the difference in -2 Log L between Model 1 and Model 2:
Model 1 -2 Log L: 354.085
Model 2 -2 Log L: 354.053
--------------------------------
Diff(-2 Log L) : 0.032.
This should be compared to a chi-square distribution with 1 degree of
freedom. The result is far from significant: p = 0.858. Therefore one would
not reject the null hypothesis that there is no interaction. One would
report the results of Model 1.
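The likelihood ratio comparisons used above (Model B versus Model 1 for the effect of smoking, and Model 1 versus Model 2 for the interaction) can be computed in a short data step. This is a sketch only: the -2 Log L values are typed in from the printouts, and the variable names are made up for illustration.

```sas
data _null_ ;
   /* -2 Log L values copied from the printouts above */
   m2ll_b = 356.291 ;   /* Model B : y = b      */
   m2ll_1 = 354.085 ;   /* Model 1 : y = a b    */
   m2ll_2 = 354.053 ;   /* Model 2 : y = a b ab */

   /* effect of smoking, adjusted for gender: Model B vs Model 1 */
   lr_a = m2ll_b - m2ll_1 ;
   p_a  = 1 - probchi(lr_a, 1) ;

   /* interaction of smoking and gender: Model 1 vs Model 2 */
   lr_ab = m2ll_1 - m2ll_2 ;
   p_ab  = 1 - probchi(lr_ab, 1) ;

   put lr_a= 6.3 p_a= 6.4 ;
   put lr_ab= 6.3 p_ab= 6.4 ;
run ;
```

Each comparison has 1 degree of freedom because the larger model contains exactly one more coefficient than the smaller one. The values written to the log can be compared with the chi-square differences and p-values quoted in the text.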
Note that this agrees very closely with the results of the PROC FREQ
analysis: the Breslow-Day test for homogeneity of the odds ratio between
the two tables had a chi-square value of 0.031 with a p-value of 0.859.
A key fact to note here is the following: saying there is no interaction
is basically the same thing as saying the odds ratios in the two separate
tables are indistinguishable. To put it another way, a test for interaction
is equivalent to a test for homogeneity of the odds ratios.
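To see this connection in numbers, the odds ratios for the two separate tables can be recovered from the Model 2 coefficients. The data step below is a sketch with the estimates typed in from the printout; the variable names are illustrative.

```sas
data _null_ ;
   /* Model 2 estimates copied from the printout */
   d1 = 0.3637 ;   /* coefficient of A  (smoking)     */
   d3 = 0.0989 ;   /* coefficient of AB (interaction) */

   or_men   = exp(d1) ;        /* B = 0: smoking odds ratio among men   */
   or_women = exp(d1 + d3) ;   /* B = 1: smoking odds ratio among women */
   ratio    = exp(d3) ;        /* ratio of the two odds ratios          */

   put or_men= 6.3 or_women= 6.3 ratio= 6.3 ;
run ;
```

This should give 1.439 for men and 1.588 for women, the same odds ratios shown under the two 2 x 2 tables, and a ratio of 1.104, which is the 'odds ratio' printed for ab. Testing whether d3 = 0 is therefore the same as testing whether the two tables have a common odds ratio.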
=================================================================================
Problem 1.
Use PROC LOGISTIC to analyze the data from notes n54703.003:
Men Women
--------------------- ---------------------
Smoke No Smoke Smoke No Smoke
--------------------- ---------------------
| | | | | |
Heart Dis + | 24 | 18 | | 15 | 10 |
| | | | | |
--------------------- ---------------------
| | | | | |
Heart Dis - | 76 | 82 | | 85 | 90 |
| | | | | |
--------------------- ---------------------
100 100 100 100
OR = 1.439 OR = 1.588
Specifically,
1) Use PROC LOGISTIC to analyze the two strata separately,
including estimates and 95% confidence intervals for
the odds ratios, and tests of whether smoking status is
related to outcome. Discuss how the results are related to
PROC FREQ analyses.
2) Use PROC LOGISTIC for all the data stratified by gender.
Find the estimated combined odds ratio and 95% confidence
intervals. Perform a test of interaction of gender and
smoking status. Again discuss how this analysis is related
to a PROC FREQ analysis.
=================================================================================
Problem 2.
Use PROC LOGISTIC to analyze the following data:
Minnesota Washington Alabama
---------------------------------------------------------
| | | |
D + | 1226 | 988 | 564 |
| | | |
---------------------------------------------------------
| | | |
D - | 1358 | 1299 | 582 |
| | | |
---------------------------------------------------------
The question to be addressed here is, is there a difference between
the three States in the proportion of people in the "D +" category?
Compare your analysis and conclusions with a PROC FREQ analysis.
------------------------------------------------------------------------
Problem 3.
Use PROC LOGISTIC to analyze the relationship between
the outcome variable pain, and covariates sex, age, and treatment.
Treatment: P = placebo, A = drug A, B = drug B.
Sex : F = female, M = male
Age : years
Pain : No and Yes
This is a clinical trial for the treatment of chronic pain.
The main questions of interest:
1. Is pain related to treatment?
2. Does treatment affect women differently from men?
The dataset also includes 'duration', which is the time in
months before the study began that the person first reported
pain. Your analysis should control for age and duration,
but focus on the two questions above. State your conclusions
and explain them.
The dataset is given below. Note that there are 3 cases
on each line.
=============================================================
data pain ;
input Treatment $ Sex $ Age Duration Pain $ @@;
datalines;
P F 68 1 No B M 74 16 No P F 67 30 No
P M 66 26 Yes B F 67 28 No B F 77 16 No
A F 71 12 No B F 72 50 No B F 76 9 Yes
A M 71 17 Yes A F 63 27 No A F 69 18 Yes
B F 66 12 No A M 62 42 No P F 64 1 Yes
A F 64 17 No P M 74 4 No A F 72 25 No
P M 70 1 Yes B M 66 19 No B M 59 29 No
A F 64 30 No A M 70 28 No A M 69 1 No
B F 78 1 No P M 83 1 Yes B F 69 42 No
B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes
A M 70 12 No A F 69 12 No B F 65 14 No
B M 70 1 No B M 67 23 No A M 76 25 Yes
P M 78 12 Yes B M 77 1 Yes B F 69 24 No
P M 66 4 Yes P F 65 29 No P M 60 26 Yes
A M 78 15 Yes B M 75 21 Yes A F 67 11 No
P F 72 27 No P F 70 13 Yes A M 75 6 Yes
B F 65 7 No P F 68 27 Yes P M 68 11 Yes
P M 67 17 Yes B M 70 22 No A M 65 15 No
P F 67 1 Yes A M 67 10 No P F 72 11 Yes
A F 74 1 No B M 80 21 Yes A F 69 3 No
;
=================================================================================
n54703.010 Last update: March 31, 2006.