PROC LOGISTIC, II: More complicated tables.                       n54703.010

     The previous notes dealt primarily with the use of PROC LOGISTIC
to analyze one 2 x 2 table, and showed that much of what PROC LOGISTIC does in
that case can be done in PROC FREQ.

     Here we will examine what happens when PROC LOGISTIC is applied to
2 x M tables, and to data in the form of multiple 2 x 2 tables.


1.  2 X M CONTINGENCY TABLE:

     Consider the following 2 x 3 table:



                     X = 1     X = 2     X = 3
                  -------------------------------
                  |         |         |         |
        Y = 0     |    10   |    20   |    30   |  60
                  |         |         |         |
                  -------------------------------
                  |         |         |         |
        Y = 1     |    30   |    20   |    10   |  60
                  |         |         |         |
                  -------------------------------
                       40        40        40     120


     Here Y is the outcome variable, and X is a predictor or covariate.  The
question is whether there is statistical evidence for a relationship between
X and Y.  The null hypothesis is that there is not, i.e., that for each of
the three columns, the true proportion for which Y = 1 is the same.

     The covariate X here is intended as a categorical variable.  This means
that the actual values taken on by X are not important, and even that their
order is not important.  If this were an analysis of variance, X would be
a *factor*; it would be entered as a CLASS variable, and the different
columns would be represented by indicator (or dummy) variables.

     PROC LOGISTIC in SAS version 8 has a lot in common with PROC GLM.  It
provides for the use of CLASS variables, but the coding of them is somewhat
different from that for PROC GLM, as will be explained below.  Here is a
program which analyzes the table above, using both PROC FREQ and PROC LOGISTIC:
==================================================================================

options linesize = 80 ;
footnote "~john-c/5421/n54703.010.sas &sysdate &systime" ;

data x23 x23xpand ;

     input x y count ;

       do i = 1 to count ;
          output x23xpand ;
       end ;

      output x23 ;

cards ;
  1   0   10
  2   0   20
  3   0   30
  1   1   30
  2   1   20
  3   1   10
  ;

run ;

proc freq data = x23 ;
     weight count ;
     tables y * x / chisq ;
title1 'PROC FREQ analysis of a 2 x 3 contingency table' ;
run ;

proc logistic descending data = x23xpand ;
     class x ;
     model y = x / clodds = pl ;
title1 'PROC LOGISTIC analysis of a 2 x 3 contingency table' ;
title2 'Using covariate x as a CLASS variable ...' ;
run ;

================================================================================
                PROC FREQ analysis of a 2 x 3 contingency table                1
                                                    19:18 Tuesday, March 9, 2004

                               The FREQ Procedure

                                Table of y by x

                  y         x

                  Frequency|
                  Percent  |
                  Row Pct  |
                  Col Pct  |       1|       2|       3|  Total
                  ---------+--------+--------+--------+
                         0 |     10 |     20 |     30 |     60
                           |   8.33 |  16.67 |  25.00 |  50.00
                           |  16.67 |  33.33 |  50.00 |
                           |  25.00 |  50.00 |  75.00 |
                  ---------+--------+--------+--------+
                         1 |     30 |     20 |     10 |     60
                           |  25.00 |  16.67 |   8.33 |  50.00
                           |  50.00 |  33.33 |  16.67 |
                           |  75.00 |  50.00 |  25.00 |
                  ---------+--------+--------+--------+
                  Total          40       40       40      120
                              33.33    33.33    33.33   100.00


                         Statistics for Table of y by x

             Statistic                     DF       Value      Prob
             ------------------------------------------------------
             Chi-Square                     2     20.0000    <.0001
             Likelihood Ratio Chi-Square    2     20.9299    <.0001
             Mantel-Haenszel Chi-Square     1     19.8333    <.0001
             Phi Coefficient                       0.4082          
             Contingency Coefficient               0.3780          
             Cramer's V                            0.4082          

                               Sample Size = 120
 
 
                   ~john-c/5421/n54703.010.sas 09MAR04 19:18
================================================================================
               PROC LOGISTIC analysis of a 2 x 3 contingency table              2
                   Using covariate x as a CLASS variable ...
                                                    19:18 Tuesday, March 9, 2004

                             The LOGISTIC Procedure

                               Model Information

                 Data Set                      WORK.X23XPAND   
                 Response Variable             y               
                 Number of Response Levels     2               
                 Number of Observations        120             
                 Link Function                 Logit           
                 Optimization Technique        Fisher's scoring


                                Response Profile
 
                       Ordered                      Total
                         Value            y     Frequency

                             1            1            60
                             2            0            60


                            Class Level Information
 
                                               Design
                                             Variables
 
                         Class     Value      1      2

                         x         1          1      0
                                   2          0      1
                                   3         -1     -1


                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.          


                             Model Fit Statistics
 
                                                  Intercept
                                   Intercept         and   
                    Criterion        Only        Covariates

                    AIC              168.355        151.425
                    SC               171.143        159.788
                    -2 Log L         166.355        145.425

 
                   ~john-c/5421/n54703.010.sas 09MAR04 19:18
================================================================================
               PROC LOGISTIC analysis of a 2 x 3 contingency table              3
                   Using covariate x as a CLASS variable ...
                                                    19:18 Tuesday, March 9, 2004

                             The LOGISTIC Procedure

                    Testing Global Null Hypothesis: BETA=0
 
            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio        20.9299        2         <.0001
            Score                   20.0000        2         <.0001
            Wald                    18.1042        2         0.0001


                          Type III Analysis of Effects
 
                                           Wald
                   Effect      DF    Chi-Square    Pr > ChiSq

                   x            2       18.1042        0.0001


                   Analysis of Maximum Likelihood Estimates
 
                                      Standard
     Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

     Intercept       1     -93E-18      0.2018        0.0000        1.0000
     x         1     1      1.0986      0.2919       14.1685        0.0002
     x         2     1    -821E-19      0.2722        0.0000        1.0000


                              Odds Ratio Estimates
                                        
                                 Point          95% Wald
                  Effect      Estimate      Confidence Limits

                  x 1 vs 3       9.000       3.271      24.763
                  x 2 vs 3       3.000       1.164       7.732


         Association of Predicted Probabilities and Observed Responses

               Percent Concordant     58.3    Somers' D    0.444
               Percent Discordant     13.9    Gamma        0.615
               Percent Tied           27.8    Tau-a        0.224
               Pairs                  3600    c            0.722


         Profile Likelihood Confidence Interval for Adjusted Odds Ratios
 
          Effect           Unit     Estimate     95% Confidence Limits

          x 1 vs 3       1.0000        9.000        3.395       26.001
          x 2 vs 3       1.0000        3.000        1.186        7.973
 
 
 
                   ~john-c/5421/n54703.010.sas 09MAR04 19:18

================================================================================

     The PROC FREQ analysis is straightforward, and indicates that there is a
statistically significant relationship between X and Y.  Note that the
likelihood ratio chi-square equals  20.9299.  This is compared to a chi-square
distribution with 2 degrees of freedom.  [Why 2?]  The associated p-value
is < .0001.

     The PROC LOGISTIC analysis yields essentially the same result.  This can
be seen from the Model Fit Statistics table in the printout.  Note that the
change in -2 Log L from the Intercept Only model to the Intercept and
Covariates model is

                 166.355 - 145.425 = 20.930,

which agrees with the Likelihood Ratio chi-square of 20.9299 reported under
'Testing Global Null Hypothesis' (the small difference is due to rounding of
the printed -2 Log L values).

     This should be compared to a chi-square distribution with 2 degrees of
freedom (because SAS enters 2 indicator variables into the model), and the
associated p-value is < 0.0001, just as with PROC FREQ.
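
     As a check on where these numbers come from, the data step below is a
minimal sketch (it is not part of the original program) that computes -2 Log L
by hand from the cell counts of the 2 x 3 table.  The variable names (m2ll0,
m2ll1, lrchisq, pvalue) are made up for illustration.  It assumes that the
fitted probability in each column equals the observed proportion with Y = 1,
which holds here because the model has one parameter per column.

================================================================================

data _null_ ;

     /* Counts with Y = 1, and column totals, from the 2 x 3 table */
     array n1[3] _temporary_ (30 20 10) ;
     array n[3]  _temporary_ (40 40 40) ;

     tot1 = 0 ; tot = 0 ; ll1 = 0 ;

       do j = 1 to 3 ;
          p = n1[j] / n[j] ;            /* observed proportion in column j */
          ll1 = ll1 + n1[j] * log(p) + (n[j] - n1[j]) * log(1 - p) ;
          tot1 = tot1 + n1[j] ;
          tot  = tot  + n[j] ;
       end ;

     /* Intercept-only model: one common proportion for all columns */
     p0  = tot1 / tot ;
     ll0 = tot1 * log(p0) + (tot - tot1) * log(1 - p0) ;

     m2ll0   = -2 * ll0 ;
     m2ll1   = -2 * ll1 ;
     lrchisq = m2ll0 - m2ll1 ;
     pvalue  = 1 - probchi(lrchisq, 2) ;

     put m2ll0= 8.3 m2ll1= 8.3 lrchisq= 8.4 pvalue= ;

run ;

================================================================================

     The values written to the log should reproduce the 166.355, 145.425, and
20.9299 shown in the printout.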

     SAS goes on to compute two odds ratios: one for X = 1 versus X = 3,
and the other for X = 2 versus X = 3.  This corresponds exactly to
computing odds ratios for the following two tables:


                     X = 3     X = 1                X = 3    X = 2
                  ---------------------         ---------------------
                  |         |         |         |         |         |
        Y = 0     |    30   |    10   |         |    30   |    20   |
                  |         |         |         |         |         |
                  ---------------------         ---------------------
                  |         |         |         |         |         |
        Y = 1     |    10   |    30   |         |    10   |    20   |
                  |         |         |         |         |         |
                  ---------------------         ---------------------

                 OR = 30*30/(10*10) = 9       OR = 30*20/(20*10) = 3

     Here I have put the X = 3 column on the left because SAS treats
it as the 'default' category, i.e., the one to which the other two are
to be compared.

     SAS represents the categories in a somewhat unexpected way.  SAS makes
use of two 'indicator' variables, X1 and X2 (called Design Variables in the
Class Level Information table of the printout), which are defined as follows:

     If X = 1,  then  X1 = 1  and X2 = 0.

     If X = 2,  then  X1 = 0  and X2 = 1.

     If X = 3,  then  X1 = -1 and X2 = -1.


     The model that SAS uses here is the following:

         Prob(Y = 1 | X1 and X2) = 1 / (1 + exp(-b0 - b1*X1 - b2*X2)).

     The printout gives the coefficient estimates for b0, b1, and b2:

----------------------------------------------------------------------------------
                                      Standard
     Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

     Intercept       1     -93E-18      0.2018        0.0000        1.0000
     x         1     1      1.0986      0.2919       14.1685        0.0002
     x         2     1    -821E-19      0.2722        0.0000        1.0000
----------------------------------------------------------------------------------

     What this says essentially is: b0 = 0, b1 = 1.0986, and b2 = 0.

     To compute the odds ratio for X = 1 versus X = 3, you need to compute
two odds:

            Odds(Y = 1 | X = 1)  and Odds(Y = 1 | X = 3).

     Recall that Odds equals:   prob / (1 - prob).

     Note that Prob(Y = 1 | X = 1) = 1 / (1 + exp(-0 - 1.0986*1 - 0)) = .75.

     Therefore Odds(Y = 1 | X = 1) = .75 / .25 = 3.

     Now the more difficult part:

     Note that Prob(Y = 1 | X = 3) = 1 / (1 + exp(-0 - 1.0986*(-1) - 0*(-1)))

                                   = 1/(1 + exp(+1.0986)) = 1/4.

     Therefore Odds(Y = 1 | X = 3) = (1/4)/(3/4) = 1/3.

     Finally, therefore, the *odds ratio* for X = 1 versus X = 3 is:

              OR = 3/(1/3) = 9.

     This is given in the PROC LOGISTIC printout.  Note that it agrees with
the value given above based on consideration of the comparison of the X = 1
column with the X = 3 column.

     To be sure you understand this, you should go through the same process
to compute the odds ratio for X = 2 versus X = 3, using the PROC LOGISTIC
coefficients.
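
     The hand calculation above can also be sketched as a short data step.
This is only an illustration, not part of the original program: the names
(odds, oddsx, or13, or23) are invented here, and the coefficients are typed in
from the printout with b0 and b2 set to 0.

================================================================================

data _null_ ;

     /* Estimates copied from the printout (b0 and b2 are 0 up to rounding) */
     b0 = 0 ; b1 = 1.0986 ; b2 = 0 ;

     array odds[3] ;

       do x = 1 to 3 ;
          /* Coding used by the CLASS statement: X = 3 gets -1 on both */
          x1 = (x = 1) - (x = 3) ;
          x2 = (x = 2) - (x = 3) ;
          prob    = 1 / (1 + exp(-b0 - b1*x1 - b2*x2)) ;
          odds[x] = prob / (1 - prob) ;
          oddsx   = odds[x] ;
          put x= prob= 6.4 oddsx= 6.4 ;
       end ;

     /* Odds ratios relative to the reference column, X = 3 */
     or13 = odds[1] / odds[3] ;
     or23 = odds[2] / odds[3] ;
     put or13= 6.3 or23= 6.3 ;

run ;

================================================================================

     The two odds ratios written to the log can be compared with the Odds
Ratio Estimates table in the PROC LOGISTIC printout.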

     PROC LOGISTIC also provides confidence intervals for both of these
odds ratio estimates.  PROC FREQ does not display either the odds ratios or
their confidence limits for 2 x M tables when M > 2.
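
     One way around this, sketched below as a suggestion rather than as part
of the original program, is to restrict PROC FREQ to two columns at a time
with a WHERE statement, so that each run sees a 2 x 2 table.  The sketch
assumes the data set x23 created above and uses the RELRISK option of the
TABLES statement.  The odds ratio that PROC FREQ reports depends on the order
of the rows and columns, so it may appear as a reciprocal (for example 1/9
rather than 9).

================================================================================

proc freq data = x23 ;
     where x in (1, 3) ;
     weight count ;
     tables x * y / relrisk ;
title1 'PROC FREQ: 2 x 2 subtable, X = 1 versus X = 3' ;
run ;

proc freq data = x23 ;
     where x in (2, 3) ;
     weight count ;
     tables x * y / relrisk ;
title1 'PROC FREQ: 2 x 2 subtable, X = 2 versus X = 3' ;
run ;

================================================================================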

     You may not like the way SAS codes the indicator variables (I don't!).  In
this case and many others, you can easily write your own in the data step
preceding the PROC LOGISTIC.  Below is an example of how this works:

==================================================================================

options linesize = 80 ;
footnote "~john-c/5421/n54703.010.sas &sysdate &systime" ;

data x23 x23xpand ;

     input x y count ;

       x1 = 0 ; x2 = 0 ; x3 = 0 ;
       if x = 1 then x1 = 1 ;
       if x = 2 then x2 = 1 ;
       if x = 3 then x3 = 1 ;

       do i = 1 to count ;
          output x23xpand ;
       end ;

      output x23 ;

cards ;
  1   0   10
  2   0   20
  3   0   30
  1   1   30
  2   1   20
  3   1   10
  ;

run ;

proc logistic descending data = x23xpand ;
     model y = x1 x2 / clodds = pl ;
title1 'PROC LOGISTIC analysis of a 2 x 3 contingency table' ;
title2 'Using indicator variables ...' ;
run ;

endsas ;
---------------------------------------------------------------------------------

              PROC LOGISTIC analysis of a 2 x 3 contingency table              1
                         Using indicator variables ...
                                                 18:12 Wednesday, March 10, 2004

                             The LOGISTIC Procedure

     Data Set: WORK.X23XPAND
     Response Variable: Y         
     Response Levels: 2
     Number of Observations: 120
     Link Function: Logit


                                Response Profile
 
                           Ordered
                             Value       Y     Count

                                 1       1        60
                                 2       0        60

      Model Fitting Information and Testing Global Null Hypothesis BETA=0
 
                               Intercept
                 Intercept        and   
   Criterion       Only       Covariates    Chi-Square for Covariates

   AIC             168.355       151.425         .                          
   SC              171.143       159.788         .                          
   -2 LOG L        166.355       145.425       20.930 with 2 DF (p=0.0001)  
   Score              .             .          20.000 with 2 DF (p=0.0001)  


                    Analysis of Maximum Likelihood Estimates
 
               Parameter Standard    Wald       Pr >    Standardized     Odds
   Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

   INTERCPT 1    -1.0986   0.3651     9.0521     0.0026            .     .   
   X1       1     2.1972   0.5164    18.1042     0.0001     0.573451    9.000
   X2       1     1.0986   0.4830     5.1726     0.0229     0.286725    3.000


         Association of Predicted Probabilities and Observed Responses

                   Concordant = 58.3%          Somers' D = 0.444
                   Discordant = 13.9%          Gamma     = 0.615
                   Tied       = 27.8%          Tau-a     = 0.224
                   (3600 pairs)                c         = 0.722

 
                   ~john-c/5421/n54703.010.sas 10MAR04 18:12
----------------------------------------------------------------------------------
               PROC LOGISTIC analysis of a 2 x 3 contingency table              2
                         Using indicator variables ...
                                                 18:12 Wednesday, March 10, 2004

                             The LOGISTIC Procedure

              Conditional Odds Ratios and 95% Confidence Intervals
 
                                                 Profile Likelihood
                                                  Confidence Limits
                                        Odds
            Variable        Unit       Ratio       Lower       Upper

            X1            1.0000       9.000       3.395      26.001
            X2            1.0000       3.000       1.186       7.973

 
                   ~john-c/5421/n54703.010.sas 10MAR04 18:12


==================================================================================

     Note that indicator variables x1, x2, and x3 are defined in the data step:

     x1 = 1 if x = 1, 0 otherwise;

     x2 = 1 if x = 2, 0 otherwise;

     x3 = 1 if x = 3, 0 otherwise.

     These appear in the MODEL statement in PROC LOGISTIC as follows:

     model y = x1 x2 / clodds = pl ;

     Note that there is no CLASS statement.

     Note that indicator variable x3 is omitted from the model: this
corresponds to the fact that the third column is the reference category.

     Note that the odds ratios corresponding to x1 and x2 are computed as

        exp(x1 coeff) = exp(2.1972) = 9;    95% CI, (3.395, 26.001)

        exp(x2 coeff) = exp(1.0986) = 3;    95% CI, (1.186, 7.973)
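
     The intervals quoted above are the profile likelihood limits.  For
comparison, the sketch below shows how the simpler Wald limits can be obtained
by hand from a coefficient and its standard error, as exp(coeff +/- z * s.e.).
It is an illustration only; the variable names (oddsr, lcl, ucl) are invented
here, and the estimates are typed in from the printout.

================================================================================

data _null_ ;

     input variable $ est se ;

     z     = probit(0.975) ;        /* about 1.96 for a 95% interval */
     oddsr = exp(est) ;
     lcl   = exp(est - z * se) ;
     ucl   = exp(est + z * se) ;

     put variable= oddsr= 7.3 lcl= 7.3 ucl= 7.3 ;

cards ;
x1  2.1972  0.5164
x2  1.0986  0.4830
;

run ;

================================================================================

     The results should be close to the 95% Wald Confidence Limits in the
CLASS variable printout (3.271 to 24.763 and 1.164 to 7.732), which are
somewhat narrower than the profile likelihood limits.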

     The interpretation of the odds ratio is the same as before: the odds
ratio for Y = 1 in column 1 versus column 3 is exp(x1 coeff) = 9, etc.  This
method of coding variables for PROC LOGISTIC seems a little easier to use
and interpret than the CLASS variable version.
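
     If you prefer to keep the CLASS statement, the same 0/1 coding can
usually be requested directly.  The sketch below assumes that your release of
SAS supports the PARAM= and REF= options on the CLASS statement of PROC
LOGISTIC; it is a suggestion, not part of the original program.  With
PARAM = REF and REF = LAST, the last level (X = 3) is the reference category,
and the coefficients should correspond to those of the x1, x2 model above.

================================================================================

proc logistic descending data = x23xpand ;
     class x / param = ref ref = last ;
     model y = x / clodds = pl ;
title1 'PROC LOGISTIC analysis of a 2 x 3 contingency table' ;
title2 'Using covariate x as a CLASS variable with reference coding ...' ;
run ;

================================================================================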



2.  MULTIPLE 2 X 2 TABLES:

     We return to an example that was used in notes n54703.003:


                          Men                      Women
                 ---------------------     ---------------------
                    Smoke    No Smoke         Smoke   No Smoke
                 ---------------------     ---------------------
                 |         |         |     |         |         |
    Heart Dis +  |    24   |    18   |     |    15   |    10   |
                 |         |         |     |         |         |
                 ---------------------     ---------------------
                 |         |         |     |         |         |
    Heart Dis -  |    76   |    82   |     |    85   |    90   |
                 |         |         |     |         |         |
                 ---------------------     ---------------------
                     100       100             100       100

                     OR = 1.439                OR = 1.588


     We will denote the outcome variable, Heart Disease, by Y, with

              Heart Dis + :   Y = 1

              Heart Dis - :   Y = 0.

     We will represent smoking status by A:

              Smoke   :  A = 1

              No Smoke:  A = 0.

     Finally we will represent Gender by B:

              Men     :  B = 0

              Women   :  B = 1.

     We will also need an interaction term, AB, defined simply as AB = A*B.
Note that AB = 0 if A = 0 or B = 0, and AB = 1 *only when* both A and B are 1.
What part of the 2 x 2 tables is represented by AB = 1 ?

     Several models are possible.  We will consider five here:

MODEL 0:  Intercept only.

MODEL A:  Variable 'A' the only covariate: Prob(Y = 1 | A) = 1 / (1 + exp(-a0 - a1*A)).

MODEL B:  Variable 'B' the only covariate: Prob(Y = 1 | B) = 1 / (1 + exp(-b0 - b1*B)).

MODEL 1:  No interaction:

      Prob(Y = 1 | A and B) = 1 / (1 + exp(-c0 - c1*A - c2*B)).

MODEL 2:  Interaction:

      Prob(Y = 1 | A and B) = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)).

     Below is the corresponding SAS analysis.  The results of the PROC FREQ
analysis are identical to those shown in notes n54703.003, and are excised
from the printout:

==================================================================================

options linesize = 80 ;
footnote "~john-c/5421/n54703.010.2.sas &sysdate &systime" ;

data heart ;

     input y a b count ;

     ab = a * b ;

       do i = 1 to count ;
          output ;
       end ;

 cards ;
      1   1   0   24
      1   0   0   18
      0   1   0   76
      0   0   0   82
      1   1   1   15
      1   0   1   10
      0   1   1   85
      0   0   1   90
;
run ;

 proc freq data = heart ;
      tables b * y * a / chisq cmh measures ;
 title1 'PROC FREQ analysis of two 2 x 2 tables' ;
 run ;

 proc logistic descending data = heart ;
      model y = a / clodds = pl ;
 title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ;
 title2 'Covariate A only: Smoking.' ;
 title3 'Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction.' ;
 run ;

 proc logistic descending data = heart ;
      model y = b / clodds = pl ;
 title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ;
 title2 'Covariate B only: Gender.' ;
 title3 'Model Y = 1 / (1 + exp(-b0 - b1*B)), no interaction.' ;
 run ;

 proc logistic descending data = heart ;
      model y = a b / clodds = pl ;
 title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ;
 title2 'Covariate A = smoking, Covariate B = gender' ;
 title3 'Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction.' ;
 run ;

 proc logistic descending data = heart ;
      model y = a b ab / clodds = pl ;
 title1 'PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis' ;
 title2 'Covariate A = smoking, Covariate B = gender,  AB = intxn' ;
 title3 'Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction.' ;
 run ;

=================================================================================

MODEL A:

             PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis            1
                           Covariate A only: Smoking.
              Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction.
                                                 18:37 Wednesday, March 10, 2004

                             The LOGISTIC Procedure

     Data Set: WORK.HEART   
     Response Variable: Y         
     Response Levels: 2
     Number of Observations: 400
     Link Function: Logit


                                Response Profile
 
                           Ordered
                             Value       Y     Count

                                 1       1        67
                                 2       0       333



      Model Fitting Information and Testing Global Null Hypothesis BETA=0
 
                               Intercept
                 Intercept        and   
   Criterion       Only       Covariates    Chi-Square for Covariates

   AIC             363.520       363.342         .                          
   SC              367.511       371.325         .                          
   -2 LOG L        361.520       359.342        2.178 with 1 DF (p=0.1400)  
   Score              .             .           2.169 with 1 DF (p=0.1408)  


                    Analysis of Maximum Likelihood Estimates
 
               Parameter Standard    Wald       Pr >    Standardized     Odds
   Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

   INTERCPT 1    -1.8153   0.2038    79.3547     0.0001            .     .   
   A        1     0.3974   0.2709     2.1528     0.1423     0.109699    1.488


         Association of Predicted Probabilities and Observed Responses

                   Concordant = 30.1%          Somers' D = 0.099
                   Discordant = 20.2%          Gamma     = 0.196
                   Tied       = 49.7%          Tau-a     = 0.028
                   (22311 pairs)               c         = 0.549

 
 
 
 
 
 
 
                  ~john-c/5421/n54703.010.2.sas 10MAR04 18:37
---------------------------------------------------------------------------------
              PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis            2
                           Covariate A only: Smoking.
              Model Y = 1 / (1 + exp(-a0 - a1*A)), no interaction.
                                                 18:37 Wednesday, March 10, 2004

                             The LOGISTIC Procedure

              Conditional Odds Ratios and 95% Confidence Intervals
 
                                                 Profile Likelihood
                                                  Confidence Limits
                                        Odds
            Variable        Unit       Ratio       Lower       Upper

            A             1.0000       1.488       0.878       2.549

 
 
                  ~john-c/5421/n54703.010.2.sas 10MAR04 18:37
---------------------------------------------------------------------------------

MODEL B:

             PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis            3
                           Covariate B only: Gender.
              Model Y = 1 / (1 + exp(-a0 - a1*B)), no interaction.
                                                 18:37 Wednesday, March 10, 2004

                             The LOGISTIC Procedure

     Data Set: WORK.HEART   
     Response Variable: Y         
     Response Levels: 2
     Number of Observations: 400
     Link Function: Logit


                                Response Profile
 
                           Ordered
                             Value       Y     Count

                                 1       1        67
                                 2       0       333



      Model Fitting Information and Testing Global Null Hypothesis BETA=0
 
                               Intercept
                 Intercept        and   
   Criterion       Only       Covariates    Chi-Square for Covariates

   AIC             363.520       360.291         .                          
   SC              367.511       368.274         .                          
   -2 LOG L        361.520       356.291        5.229 with 1 DF (p=0.0222)  
   Score              .             .           5.181 with 1 DF (p=0.0228)  


                    Analysis of Maximum Likelihood Estimates
 
               Parameter Standard    Wald       Pr >    Standardized     Odds
   Variable DF  Estimate   Error  Chi-Square Chi-Square   Estimate      Ratio

   INTERCPT 1    -1.3249   0.1736    58.2451     0.0001            .     .   
   B        1    -0.6210   0.2754     5.0838     0.0242    -0.171398    0.537


         Association of Predicted Probabilities and Observed Responses

                   Concordant = 32.9%          Somers' D = 0.152
                   Discordant = 17.7%          Gamma     = 0.301
                   Tied       = 49.4%          Tau-a     = 0.043
                   (22311 pairs)               c         = 0.576

 
 
 
 
 
 
 
                  ~john-c/5421/n54703.010.2.sas 10MAR04 18:37
---------------------------------------------------------------------------------
              PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis            4
                           Covariate B only: Gender.
              Model Y = 1 / (1 + exp(-a0 - a1*B)), no interaction.
                                                 18:37 Wednesday, March 10, 2004

                             The LOGISTIC Procedure

              Conditional Odds Ratios and 95% Confidence Intervals
 
                                                 Profile Likelihood
                                                  Confidence Limits
                                        Odds
            Variable        Unit       Ratio       Lower       Upper

            B             1.0000       0.537       0.310       0.916
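
     Because Models A and B each contain a single 0/1 covariate, their odds
ratios are just the crude odds ratios from the 2 x 2 tables obtained by
collapsing over the other variable.  The data step below is a sketch of that
check, with the counts typed in from the two tables at the start of this
section; the variable names (or_a, or_b, b_a, b_b) are invented here.

================================================================================

data _null_ ;

     /* Collapsed over gender: smokers versus non-smokers */
     or_a = ((24 + 15) * (82 + 90)) / ((18 + 10) * (76 + 85)) ;

     /* Collapsed over smoking: women (B = 1) versus men (B = 0) */
     or_b = ((15 + 10) * (76 + 82)) / ((24 + 18) * (85 + 90)) ;

     /* Logs of the odds ratios, to compare with the coefficients */
     b_a = log(or_a) ;
     b_b = log(or_b) ;

     put or_a= 6.3 b_a= 7.4 ;
     put or_b= 6.3 b_b= 7.4 ;

run ;

================================================================================

     The results can be compared with the odds ratios 1.488 and 0.537, and
with the coefficients 0.3974 and -0.6210, in the Model A and Model B
printouts.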

 
              PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis            6
                  Covariate A = smoking, Covariate B = gender
          Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction.
                                                    21:05 Tuesday, March 9, 2004

                             The LOGISTIC Procedure

                               Model Information

                 Data Set                      WORK.HEART      
                 Response Variable             y               
                 Number of Response Levels     2               
                 Number of Observations        400             
                 Link Function                 Logit           
                 Optimization Technique        Fisher's scoring


                                Response Profile
 
                       Ordered                      Total
                         Value            y     Frequency

                             1            1            67
                             2            0           333


                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.          


                             Model Fit Statistics
 
                                                  Intercept
                                   Intercept         and   
                    Criterion        Only        Covariates

                    AIC              363.520        360.085
                    SC               367.511        372.059
                    -2 Log L         361.520        354.085


                    Testing Global Null Hypothesis: BETA=0
 
            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio         7.4354        2         0.0243
            Score                    7.3506        2         0.0253
            Wald                     7.1836        2         0.0275


 
                  ~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================

MODEL 1: Y = A B: No interaction.

              PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis            7
                  Covariate A = smoking, Covariate B = gender
          Model Y = 1 / (1 + exp(-c0 - c1*A - c2*B)), no interaction.
                                                    21:05 Tuesday, March 9, 2004

                             The LOGISTIC Procedure

                   Analysis of Maximum Likelihood Estimates
 
                                     Standard
      Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

      Intercept     1     -1.5380      0.2313       44.1968        <.0001
      a             1      0.4027      0.2727        2.1808        0.1397
      b             1     -0.6244      0.2762        5.1114        0.0238


                              Odds Ratio Estimates
                                        
                                Point          95% Wald
                   Effect    Estimate      Confidence Limits

                   a            1.496       0.877       2.553
                   b            0.536       0.312       0.920


         Association of Predicted Probabilities and Observed Responses

               Percent Concordant     47.8    Somers' D    0.202
               Percent Discordant     27.6    Gamma        0.267
               Percent Tied           24.5    Tau-a        0.056
               Pairs                 22311    c            0.601


         Profile Likelihood Confidence Interval for Adjusted Odds Ratios
 
           Effect         Unit     Estimate     95% Confidence Limits

           a            1.0000        1.496        0.880        2.571
           b            1.0000        0.536        0.308        0.914
 
 
 
                  ~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================

MODEL 2: Y = A B A*B: Interaction.


              PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis            8
            Covariate A = smoking, Covariate B = gender,  AB = intxn
        Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction.
                                                    21:05 Tuesday, March 9, 2004

                             The LOGISTIC Procedure

                               Model Information

                 Data Set                      WORK.HEART      
                 Response Variable             y               
                 Number of Response Levels     2               
                 Number of Observations        400             
                 Link Function                 Logit           
                 Optimization Technique        Fisher's scoring


                                Response Profile
 
                       Ordered                      Total
                         Value            y     Frequency

                             1            1            67
                             2            0           333


                            Model Convergence Status

                 Convergence criterion (GCONV=1E-8) satisfied.          


                             Model Fit Statistics
 
                                                  Intercept
                                   Intercept         and   
                    Criterion        Only        Covariates

                    AIC              363.520        362.053
                    SC               367.511        378.019
                    -2 Log L         361.520        354.053


                    Testing Global Null Hypothesis: BETA=0
 
            Test                 Chi-Square       DF     Pr > ChiSq

            Likelihood Ratio         7.4668        3         0.0584
            Score                    7.3686        3         0.0610
            Wald                     7.0972        3         0.0689

 
                  ~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================
              PROC LOGISTIC: two 2 x 2 tables: Outcome Y = heart dis            9
            Covariate A = smoking, Covariate B = gender,  AB = intxn
        Model Y = 1 / (1 + exp(-d0 - d1*A - d2*B - d3*AB)), interaction.
                                                    21:05 Tuesday, March 9, 2004

                             The LOGISTIC Procedure

                   Analysis of Maximum Likelihood Estimates
 
                                     Standard
      Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

      Intercept     1     -1.5163      0.2603       33.9378        <.0001
      a             1      0.3637      0.3501        1.0790        0.2989
      b             1     -0.6809      0.4229        2.5919        0.1074
      ab            1      0.0989      0.5587        0.0314        0.8594


                              Odds Ratio Estimates
                                        
                                Point          95% Wald
                   Effect    Estimate      Confidence Limits

                   a            1.439       0.724       2.857
                   b            0.506       0.221       1.160
                   ab           1.104       0.369       3.300


         Association of Predicted Probabilities and Observed Responses

               Percent Concordant     47.8    Somers' D    0.202
               Percent Discordant     27.6    Gamma        0.267
               Percent Tied           24.5    Tau-a        0.056
               Pairs                 22311    c            0.601


         Profile Likelihood Confidence Interval for Adjusted Odds Ratios
 
           Effect         Unit     Estimate     95% Confidence Limits

           a            1.0000        1.439        0.727        2.888
           b            1.0000        0.506        0.214        1.140
           ab           1.0000        1.104        0.371        3.348
 
                  ~john-c/5421/n54703.010.2.sas 09MAR04 21:05
=================================================================================

     The main variable of interest here is smoking (A).  Gender (B) is treated
as a potential confounder, that is, another variable which also affects the
risk of heart disease.  As in the PROC FREQ analysis, one wants to know whether
there is an interaction of smoking and gender.  If so, the right model to
report is Model 2.  If not, one should report the results of Model 1.

     The Model A analysis (A is the only covariate) indicates an odds ratio for
the effect of A of exp(.3974) = 1.488.

     The Model B analysis (B is the only covariate) indicates an odds ratio for
the effect of B of exp(-.621) = 0.537.
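
     The step from a logistic coefficient to an odds ratio is just
exponentiation.  A minimal DATA step sketch, using the two coefficients
quoted above (the variable names or_a and or_b are made up for this
illustration):

```sas
data _null_;
   or_a = exp(0.3974);    /* Model A: coefficient of A (smoking) */
   or_b = exp(-0.621);    /* Model B: coefficient of B (gender)  */
   put or_a= 8.3 or_b= 8.3;   /* results are written to the SAS log */
run;
```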

     An objective of this analysis is to evaluate the effect of factor A
(smoking versus non-smoking).  In doing this, one would want to control for
a possible confounder, covariate B (gender).  The proper way to test for
the effect of A is to look at Diff(-2 Log L) between Model B (B alone) and
Model 1 (A and B).  This yields:  Diff(-2 Log L) = 356.291 - 354.085 = 2.206.
This should be compared to a chi-square distribution with 1 degree of
freedom:  p = 0.1375.
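
     The p-value for a difference in -2 Log L can be obtained in a DATA step
with the PROBCHI function, which returns the chi-square cumulative
distribution function.  The following is only a sketch; the variable names
are invented here, and the two -2 Log L values are typed in by hand from
the PROC LOGISTIC output:

```sas
data _null_;
   dev_b  = 356.291;                 /* -2 Log L, model with B alone    */
   dev_ab = 354.085;                 /* -2 Log L, model with A and B    */
   chisq  = dev_b - dev_ab;          /* likelihood ratio chi-square     */
   df     = 1;                       /* one added parameter (A)         */
   p      = 1 - probchi(chisq, df);  /* upper-tail probability          */
   put chisq= 8.3 df= p= 6.4;
run;
```

     The same few lines serve for any pair of nested models; only the two
-2 Log L values and the degrees of freedom change.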

     The Model 1 analysis yields a coefficient for A, the smoking variable, of
0.4027, and the corresponding odds ratio is 1.496.  The 95% confidence
interval is (0.877, 2.553), which includes 1, so this model does not provide
strong evidence that smoking is a risk factor.
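
     The Wald confidence limits for the odds ratio come from exponentiating
the coefficient plus or minus about 1.96 standard errors.  A sketch, using
the estimate and standard error for A from the Model 1 output (variable
names are illustrative only):

```sas
data _null_;
   est   = 0.4027;               /* coefficient of A, Model 1       */
   se    = 0.2727;               /* its standard error              */
   z     = probit(0.975);        /* normal quantile, about 1.96     */
   or_a  = exp(est);             /* point estimate of odds ratio    */
   lower = exp(est - z*se);      /* lower 95% Wald limit            */
   upper = exp(est + z*se);      /* upper 95% Wald limit            */
   put or_a= 8.3 lower= 8.3 upper= 8.3;
run;
```

     Because the inputs are rounded to four decimals, the last digit may
differ slightly from the limits printed by PROC LOGISTIC.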

     The Model 2 analysis yields the following coefficient estimates:

               Intercept     :  -1.516
               A (smoking)   :   0.364
               B (gender)    :  -0.681
               AB (interxn)  :   0.099

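     Note that adding the interaction variable AB appears to 'weaken' the
effect of smoking: the coefficient of A drops from 0.403 to 0.364.  In
Model 2, however, the coefficient of A describes the effect of smoking only
in the stratum with B = 0, so the two coefficients are not directly
comparable.  The real question is, what is the effect of the interaction
term itself?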
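
     In the interaction model, the odds ratio for smoking is exp(d1) when
B = 0 and exp(d1 + d3) when B = 1, so exp(d3) is the ratio of the two
stratum-specific odds ratios.  A sketch using the Model 2 estimates (the
names d1 and d3 follow the model formula in the output title; the other
names are invented here):

```sas
data _null_;
   d1 = 0.3637;              /* coefficient of A                      */
   d3 = 0.0989;              /* coefficient of AB                     */
   or_b0 = exp(d1);          /* odds ratio for smoking when B = 0     */
   or_b1 = exp(d1 + d3);     /* odds ratio for smoking when B = 1     */
   ratio = exp(d3);          /* ratio of the two odds ratios          */
   put or_b0= 8.3 or_b1= 8.3 ratio= 8.3;
run;
```

     The two stratum-specific values can be checked against the odds ratios
computed directly from the two 2 x 2 tables (1.439 and 1.588 in the table
given with Problem 1 below).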

     The soundest way to evaluate the interaction effect statistically is to
examine the difference in -2 Log L between Model 1 and Model 2:

         Model 1   -2 Log L:    354.085

         Model 2   -2 Log L:    354.053
        --------------------------------
         Diff(-2 Log L)    :      0.032.

     This should be compared to a chi-square distribution with 1 degree of
freedom.  The result is far from significant: p = 0.858.  Therefore one would
not reject the null hypothesis that there is no interaction.  One would
report the results of Model 1.
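
     As a cross-check, the likelihood ratio test of the interaction can be
set beside the Wald test for the ab coefficient, which is the square of the
estimate divided by its standard error.  A sketch, with values copied by
hand from the Model 1 and Model 2 output (variable names are illustrative):

```sas
data _null_;
   est    = 0.0989;                  /* coefficient of ab, Model 2     */
   se     = 0.5587;                  /* its standard error             */
   wald   = (est / se)**2;           /* Wald chi-square, 1 df          */
   p_wald = 1 - probchi(wald, 1);
   lr     = 354.085 - 354.053;       /* Diff(-2 Log L), 1 df           */
   p_lr   = 1 - probchi(lr, 1);
   put wald= 8.4 p_wald= 6.4 lr= 8.4 p_lr= 6.4;
run;
```

     With only one parameter being tested and a large sample, the two
statistics are expected to be close, as they are here.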

     Note that this agrees very closely with the results of the PROC FREQ
analysis: the Breslow-Day test for homogeneity of the odds ratio between
the two tables had a chi-square value of 0.031 with a p-value of 0.859.

     A key fact to note here is the following: saying there is no interaction
is the same thing as saying that the true odds ratios in the two separate
tables are equal.  To put it another way, a test for interaction is
equivalent to a test for homogeneity of the odds ratios.
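
     The two approaches can be run side by side.  The sketch below assumes
that WORK.HEART holds one record per person with 0/1 variables y, a, b and
ab = a*b, as the output above suggests; the program which actually produced
that output is not reproduced here, so treat the details as assumptions.

```sas
/* Stratified 2 x 2 tables: the CMH option prints the common odds     */
/* ratio estimates and the Breslow-Day test for homogeneity.          */
proc freq data=heart;
   tables b*a*y / cmh;        /* b defines the strata */
run;

/* Logistic model with interaction: the test for ab plays the same    */
/* role as the Breslow-Day test.                                      */
proc logistic data=heart descending;
   model y = a b ab;
run;
```

     The DESCENDING option is used so that PROC LOGISTIC models the
probability that y = 1, matching the response ordering shown in the output.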

=================================================================================

Problem 1.

   Use PROC LOGISTIC to analyze the data from notes n54703.003:


                          Men                      Women
                 ---------------------     ---------------------
                    Smoke    No Smoke         Smoke   No Smoke
                 ---------------------     ---------------------
                 |         |         |     |         |         |
    Heart Dis +  |    24   |    18   |     |    15   |    10   |
                 |         |         |     |         |         |
                 ---------------------     ---------------------
                 |         |         |     |         |         |
    Heart Dis -  |    76   |    82   |     |    85   |    90   |
                 |         |         |     |         |         |
                 ---------------------     ---------------------
                     100       100             100       100

                     OR = 1.439                OR = 1.588



  Specifically, 

       1)  Use PROC LOGISTIC to analyze the two strata separately,
           including estimates and 95% confidence intervals for 
           the odds ratios, and tests of whether smoking status is
           related to outcome.  Discuss how the results are related to 
           PROC FREQ analyses.
 
       2)  Use PROC LOGISTIC for all the data stratified by gender.  
           Find the estimated combined odds ratio and 95% confidence
           intervals.  Perform a test of interaction of gender and 
           smoking status.  Again discuss how this analysis is related
           to a PROC FREQ analysis.

=================================================================================

Problem 2.

     Use PROC LOGISTIC to analyze the following data:


                   Minnesota          Washington          Alabama
              ---------------------------------------------------------
              |                  |                  |                 |
       D +    |       1226       |       988        |       564       |
              |                  |                  |                 |
              ---------------------------------------------------------
              |                  |                  |                 |
       D -    |       1358       |      1299        |       582       |
              |                  |                  |                 |
              ---------------------------------------------------------

     The question to be addressed here is, is there a difference between
the three States in the proportion of people in the "D +" category?

     Compare your analysis and conclusions with a PROC FREQ analysis.

=================================================================================

Problem 3.

Use PROC LOGISTIC to analyze the relationship between
the outcome variable pain, and covariates sex, age, and treatment.  

Treatment:  P = placebo, A = drug A, B = drug B.

Sex      :  F = female, M = male

Age      :  years

Pain     :  No and Yes

This is a clinical trial for the treatment of chronic pain.

The main questions of interest:

    1.  Is pain related to treatment?

    2.  Does treatment affect women differently from men?

The dataset also includes 'duration', which is the length of time,
in months, between the person's first report of pain and the start
of the study.  Your analysis should control for age and duration,
but focus on the two questions above.  State your conclusions
and explain them.

The dataset is given below.  Note that there are 3 cases 
on each line.

=============================================================

data pain ;
      input Treatment $ Sex $ Age Duration Pain $ @@;
      datalines;
   P  F  68   1  No   B  M  74  16  No  P  F  67  30  No
   P  M  66  26  Yes  B  F  67  28  No  B  F  77  16  No
   A  F  71  12  No   B  F  72  50  No  B  F  76   9  Yes
   A  M  71  17  Yes  A  F  63  27  No  A  F  69  18  Yes
   B  F  66  12  No   A  M  62  42  No  P  F  64   1  Yes
   A  F  64  17  No   P  M  74   4  No  A  F  72  25  No
   P  M  70   1  Yes  B  M  66  19  No  B  M  59  29  No
   A  F  64  30  No   A  M  70  28  No  A  M  69   1  No
   B  F  78   1  No   P  M  83   1  Yes B  F  69  42  No
   B  M  75  30  Yes  P  M  77  29  Yes P  F  79  20  Yes
   A  M  70  12  No   A  F  69  12  No  B  F  65  14  No
   B  M  70   1  No   B  M  67  23  No  A  M  76  25  Yes
   P  M  78  12  Yes  B  M  77   1  Yes B  F  69  24  No
   P  M  66   4  Yes  P  F  65  29  No  P  M  60  26  Yes
   A  M  78  15  Yes  B  M  75  21  Yes A  F  67  11  No
   P  F  72  27  No   P  F  70  13  Yes A  M  75   6  Yes
   B  F  65   7  No   P  F  68  27  Yes P  M  68  11  Yes
   P  M  67  17  Yes  B  M  70  22  No  A  M  65  15  No
   P  F  67   1  Yes  A  M  67  10  No  P  F  72  11  Yes
   A  F  74   1  No   B  M  80  21  Yes A  F  69   3  No
   ;


=================================================================================
n54703.010  Last update: March 31, 2006.