PROC GLM, II:  Two-way Analysis of Variance and Interaction         n54703.008

     One-way analysis of variance is a fairly straightforward generalization
of the t-test.

     Two-way analysis of variance adds several complications.  First, there
is the idea that experiments can be designed to estimate and test for the effects
of two or more factors simultaneously.  This is essentially R. A. Fisher's
invention, and it happened in part because he worked at Rothamsted Agricultural
Station.  The agricultural scientists with whom he worked wanted to study
factors which affect the yields of crops (wheat, etc.).  Because it takes
essentially a whole year to collect data on crop yields, it would be good to
be able to study several factors at the same time.  Fisher realized that this
could be done with some reasonable assumptions.

     But it gets complicated rather quickly.  The second complication is that
when you have two factors that can affect a quantitative outcome like crop
yield - say, one factor is fertilizer and the other is amount of irrigation -
one possibility is that when both are used, their effects simply add together.
Another possibility is that they are synergistic - that is, the effect of
fertilizer is considerably accelerated by the presence of irrigation.  Another
possibility is that they interact negatively (antagonistically) - that is,
fertilizer alone increases crop yields, and irrigation alone increases crop
yields, but the two together actually decrease crop yields for some reason -
perhaps by causing the plant to expend its energies into growing leaves and
stalks rather than in growing more and larger seeds.

     Thus in two-way analysis of variance, one wants to estimate the effects
of each factor and test for whether they are significant, but also test
for the presence of interactions (which be either synergistic or antagonistic)
and to estimate the magnitude of the interaction in some way.

     Here is an example from the Lung Health Study.  The outcome or dependent
variable is percent weight gain from baseline to Year 5.  The two factors of
interest are (1) smoking status at Year 5, and (2) gender.  Thus the questions
of interest are:

     1)  Is Year 5 smoking status related to percent weight gain ?

     2)  Does gender affect percent weight gain ?

     3)  Within smoking status categories, are their differences between the
         genders in weight gain?

     Year 5 smoking status is called 'IVPCSUS5' and it has three categories:

         IVPCSUS5 = 1:   sustained quitter

         IVPCSUS5 = 2:   intermittent quitter

         IVPCSUS5 = 3:   continuing smoker

     Gender, as before, has two categories:

         GENDER = 0:  male

         GENDER = 1:  female


* ==================================================================== ;        

proc means data = smoke ;
     class gender ivpcsus5 ;
     var   wgtchg05 pwtchg05 ;
title1 'PROC MEANS: Weight change, baseline to Year 5' ;
title2 'versus gender and Year 5 smoking status' ;
format gender gender. ivpcsus5 ivpcsus. ;
run ;

proc glm data = smoke ;
     class gender ivpcsus5 ;
     model pwtchg05 = gender ivpcsus5 / solution ;
title1 'Lung Health Study:  % weight change, Baseline to Year 5' ;
title2 'versus gender and Year 5 smoking status' ;
format gender gender. ivpcsus5 ivpcsus. ;
run ;

proc glm data = smoke ;
     class gender ivpcsus5 ;
     model pwtchg05 = gender | ivpcsus5 / solution ;
title1 'Lung Health Study:  % weight change, Baseline to Year 5' ;
title2 'versus gender and Year 5 smoking status, and interaction' ;
title3 'of gender and smoking status' ;
means  gender | ivpcsus5 / bon ;
format gender gender. ivpcsus5 ivpcsus. ;
run ;

endsas ;

========================================================================================================================
              PROC MEANS: Weight change, baseline to Year 5
               versus gender and Year 5 smoking status                                                                               1
                                                                                            19:21 Tuesday, March 2, 2004

    GENDER        IVPCSUS5  N Obs  Variable  Label                                    N          Mean       Std Dev
    ---------------------------------------------------------------------------------------------------------------
    MEN     SUSTAINED QUIT    616  WGTCHG05  Weight change, Year 5 - Baseline       616     7.5814935     6.9734536
                                   PWTCHG05  Pct weight change, Year 5 - Baseline   616     9.4168640     8.2872195

            INTERMITT QUIT    963  WGTCHG05  Weight change, Year 5 - Baseline       935     5.1493048     6.9321518
                                   PWTCHG05  Pct weight change, Year 5 - Baseline   935     6.3261682     8.3515810

            CONTIN SMOKER    2022  WGTCHG05  Weight change, Year 5 - Baseline      1870     1.2616043     5.3922831
                                   PWTCHG05  Pct weight change, Year 5 - Baseline  1870     1.5892326     6.5480143

    WOMEN   SUSTAINED QUIT    321  WGTCHG05  Weight change, Year 5 - Baseline       319     8.6595611     6.8005707
                                   PWTCHG05  Pct weight change, Year 5 - Baseline   319    13.4672099    10.2055760

            INTERMITT QUIT    622  WGTCHG05  Weight change, Year 5 - Baseline       612     5.7668301     6.8957798
                                   PWTCHG05  Pct weight change, Year 5 - Baseline   612     8.8775309    10.0542127

            CONTIN SMOKER    1194  WGTCHG05  Weight change, Year 5 - Baseline      1117     1.9316920     5.5939001
                                   PWTCHG05  Pct weight change, Year 5 - Baseline  1117     3.0335047     8.5554308
    ---------------------------------------------------------------------------------------------------------------

       GENDER        IVPCSUS5  N Obs  Variable  Label                                      Minimum       Maximum
       ---------------------------------------------------------------------------------------------------------
       MEN     SUSTAINED QUIT    616  WGTCHG05  Weight change, Year 5 - Baseline       -30.9000000    30.4000000
                                      PWTCHG05  Pct weight change, Year 5 - Baseline   -26.6609146    42.0168067

               INTERMITT QUIT    963  WGTCHG05  Weight change, Year 5 - Baseline       -19.8000000    51.3000000
                                      PWTCHG05  Pct weight change, Year 5 - Baseline   -23.1971154    59.1439689

               CONTIN SMOKER    2022  WGTCHG05  Weight change, Year 5 - Baseline       -42.7000000    32.4000000
                                      PWTCHG05  Pct weight change, Year 5 - Baseline   -38.5031560    62.7906977

       WOMEN   SUSTAINED QUIT    321  WGTCHG05  Weight change, Year 5 - Baseline       -12.2000000    39.5000000
                                      PWTCHG05  Pct weight change, Year 5 - Baseline   -16.4864865    57.1635311

               INTERMITT QUIT    622  WGTCHG05  Weight change, Year 5 - Baseline       -17.5000000    29.4000000
                                      PWTCHG05  Pct weight change, Year 5 - Baseline   -21.8750000    45.4400000

               CONTIN SMOKER    1194  WGTCHG05  Weight change, Year 5 - Baseline       -24.5000000    39.7000000
                                      PWTCHG05  Pct weight change, Year 5 - Baseline   -30.7692308    50.7024266
       ---------------------------------------------------------------------------------------------------------
 
 
                                  LUNG HEALTH STUDY :  WBJEC5.SAS (JEC) 02MAR04 19:21
--------------------------------------------------------------------------------------------------------------------------
                                 Lung Health Study:  % weight change, Baseline to Year 5                                2
                                        versus gender and Year 5 smoking status             19:21 Tuesday, March 2, 2004

                                            General Linear Models Procedure
                                                Class Level Information

                             Class    Levels    Values

                             GENDER        2    MEN WOMEN

                             IVPCSUS5      3    CONTIN SMOKER INTERMITT QUIT SUSTAINED QUIT


                                       Number of observations in data set = 5887


          NOTE: Due to missing values, only 5469 observations can be used in this analysis.

 
                                  LUNG HEALTH STUDY :  WBJEC5.SAS (JEC) 02MAR04 19:21
--------------------------------------------------------------------------------------------------------------------------
                                 Lung Health Study:  % weight change, Baseline to Year 5                                3
                                        versus gender and Year 5 smoking status             19:21 Tuesday, March 2, 2004

                                            General Linear Models Procedure

   Dependent Variable: PWTCHG05   Pct weight change, Year 5 - Baseline

   Source                  DF             Sum of Squares               Mean Square          F Value            Pr > F

   Model                    3             70604.24233908            23534.74744636           352.13            0.0001

   Error                 5465            365254.38689887               66.83520346

   Corrected Total       5468            435858.62923795

                     R-Square                       C.V.                  Root MSE                      PWTCHG05 Mean

                     0.161989                   160.7998                8.17528002                         5.08413493


   Source                  DF                  Type I SS               Mean Square          F Value            Pr > F

   GENDER                   1              5712.48856447             5712.48856447            85.47            0.0001
   IVPCSUS5                 2             64891.75377461            32445.87688730           485.46            0.0001

   Source                  DF                Type III SS               Mean Square          F Value            Pr > F

   GENDER                   1              6149.73514550             6149.73514550            92.01            0.0001
   IVPCSUS5                 2             64891.75377461            32445.87688730           485.46            0.0001


                                                             T for H0:             Pr > |T|            Std Error of
   Parameter                           Estimate             Parameter=0                                  Estimate

   INTERCEPT                        12.24319792 B                 39.90              0.0001              0.30684994
   GENDER    MEN                    -2.19247064 B                 -9.59              0.0001              0.22856421
             WOMEN                   0.00000000 B                   .                 .                   .        
   IVPCSUS5  CONTIN SMOKER          -8.74128640 B                -28.52              0.0001              0.30645226
             INTERMITT QUIT         -3.58257980 B                -10.57              0.0001              0.33887958
             SUSTAINED QUIT          0.00000000 B                   .                 .                   .        

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations.   
      Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.


 
                                  LUNG HEALTH STUDY :  WBJEC5.SAS (JEC) 02MAR04 19:21
--------------------------------------------------------------------------------------------------------------------------
                                 Lung Health Study:  % weight change, Baseline to Year 5                                4
                                versus gender and Year 5 smoking status, and interaction    19:21 Tuesday, March 2, 2004
                                              of gender and smoking status

                                            General Linear Models Procedure
                                                Class Level Information

                             Class    Levels    Values

                             GENDER        2    MEN WOMEN

                             IVPCSUS5      3    CONTIN SMOKER INTERMITT QUIT SUSTAINED QUIT


                                       Number of observations in data set = 5887


          NOTE: Due to missing values, only 5469 observations can be used in this analysis.

 
 
                                  LUNG HEALTH STUDY :  WBJEC5.SAS (JEC) 02MAR04 19:21
--------------------------------------------------------------------------------------------------------------------------
                                 Lung Health Study:  % weight change, Baseline to Year 5                                5
                                versus gender and Year 5 smoking status, and interaction    19:21 Tuesday, March 2, 2004
                                              of gender and smoking status

                                            General Linear Models Procedure

   Dependent Variable: PWTCHG05   Pct weight change, Year 5 - Baseline

   Source                  DF             Sum of Squares               Mean Square          F Value            Pr > F

   Model                    5             71768.77643291            14353.75528658           215.37            0.0001

   Error                 5463            364089.85280504               66.64650427

   Corrected Total       5468            435858.62923795

                     R-Square                       C.V.                  Root MSE                      PWTCHG05 Mean

                     0.164661                   160.5727                8.16373103                         5.08413493


   Source                  DF                  Type I SS               Mean Square          F Value            Pr > F

   GENDER                   1              5712.48856447             5712.48856447            85.71            0.0001
   IVPCSUS5                 2             64891.75377461            32445.87688730           486.84            0.0001
   GENDER*IVPCSUS5          2              1164.53409383              582.26704692             8.74            0.0002

   Source                  DF                Type III SS               Mean Square          F Value            Pr > F

   GENDER                   1              7280.70647178             7280.70647178           109.24            0.0001
   IVPCSUS5                 2             64528.56647825            32264.28323912           484.11            0.0001
   GENDER*IVPCSUS5          2              1164.53409383              582.26704692             8.74            0.0002


                                                             T for H0:             Pr > |T|            Std Error of
   Parameter                           Estimate             Parameter=0                                  Estimate

   INTERCEPT                        13.46720987 B                 29.46              0.0001              0.45708119
   GENDER          MEN              -4.05034591 B                 -7.19              0.0001              0.56313017
                   WOMEN             0.00000000 B                   .                 .                   .        
   IVPCSUS5        CONTIN SMOKER   -10.43370520 B                -20.13              0.0001              0.51825557
                   INTERMITT QUIT   -4.58967892 B                 -8.14              0.0001              0.56375769
                   SUSTAINED QUIT    0.00000000 B                   .                 .                   .        
   GENDER*IVPCSUS5 MEN CONTIN SMOKER
                                     2.60607382 B                  4.06              0.0001              0.64220017
                   MEN INTERMITT QUIT
                                     1.49898314 B                  2.13              0.0336              0.70519131
                   MEN SUSTAINED QUIT
                                     0.00000000 B                   .                 .                   .        
                   WOMEN CONTIN SMOKER
                                     0.00000000 B                   .                 .                   .        
                   WOMEN INTERMITT QUIT
                                     0.00000000 B                   .                 .                   .        
                   WOMEN SUSTAINED QUIT
                                     0.00000000 B                   .                 .                   .        

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations.   
      Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.
 
                                  LUNG HEALTH STUDY :  WBJEC5.SAS (JEC) 02MAR04 19:21
--------------------------------------------------------------------------------------------------------------------------
                                 Lung Health Study:  % weight change, Baseline to Year 5                                6
                                versus gender and Year 5 smoking status, and interaction    19:21 Tuesday, March 2, 2004
                                              of gender and smoking status

                                            General Linear Models Procedure

                                    Bonferroni (Dunn) T tests for variable: PWTCHG05

                              NOTE: This test controls the type I experimentwise error rate, but generally has a higher 
                                    type II error rate than REGWQ.

                                          Alpha= 0.05  df= 5463  MSE= 66.6465
                                               Critical Value of T= 1.96
                                         Minimum Significant Difference= 0.4471
                                           WARNING: Cell sizes are not equal.
                                         Harmonic Mean of cell sizes= 2562.153

                              Means with the same letter are not significantly different.

                                     Bon Grouping              Mean      N  GENDER
                                                        
                                                A            6.4050   2048  WOMEN
                                                        
                                                B            4.2934   3421  MEN

                                  LUNG HEALTH STUDY :  WBJEC5.SAS (JEC) 02MAR04 19:21
                                 Lung Health Study:  % weight change, Baseline to Year 5                                7
                                versus gender and Year 5 smoking status, and interaction    19:21 Tuesday, March 2, 2004
                                              of gender and smoking status

                                            General Linear Models Procedure

                                    Bonferroni (Dunn) T tests for variable: PWTCHG05

                              NOTE: This test controls the type I experimentwise error rate but generally has a higher 
                                    type II error rate than Tukey's for all pairwise comparisons.

                                 Alpha= 0.05  Confidence= 0.95  df= 5463  MSE= 66.6465
                                              Critical Value of T= 2.39472

                           Comparisons significant at the 0.05 level are indicated by '***'.

                                                          Simultaneous            Simultaneous
                                                              Lower    Difference     Upper
                                   IVPCSUS5                Confidence    Between   Confidence
                                  Comparison                  Limit       Means       Limit

                       SUSTAINED QUIT - INTERMITT QUIT       2.6534      3.4632      4.2731   ***
                       SUSTAINED QUIT - CONTIN SMOKER        7.9368      8.6694      9.4020   ***

                       INTERMITT QUIT - SUSTAINED QUIT      -4.2731     -3.4632     -2.6534   ***
                       INTERMITT QUIT - CONTIN SMOKER        4.5938      5.2062      5.8186   ***

                       CONTIN SMOKER  - SUSTAINED QUIT      -9.4020     -8.6694     -7.9368   ***
                       CONTIN SMOKER  - INTERMITT QUIT      -5.8186     -5.2062     -4.5938   ***



                            Level of   Level of                -----------PWTCHG05----------
                            GENDER     IVPCSUS5            N       Mean              SD

                            MEN        CONTIN SMOKER    1870      1.5892326        6.5480143
                            MEN        INTERMITT QUIT    935      6.3261682        8.3515810
                            MEN        SUSTAINED QUIT    616      9.4168640        8.2872195
                            WOMEN      CONTIN SMOKER    1117      3.0335047        8.5554308
                            WOMEN      INTERMITT QUIT    612      8.8775309       10.0542127
                            WOMEN      SUSTAINED QUIT    319     13.4672099       10.2055760
 
 
                                  LUNG HEALTH STUDY :  WBJEC5.SAS (JEC) 02MAR04 19:21


========================================================================================================================

     There are two analyses shown here.  The first is a two-way analysis of
variance with smoking status and gender as factors, and no interaction.  The
second is a two-way analysis of variance with smoking status, gender and their
interaction included.

     To really understand what is going on here, you need to know what the
underlying model actually is.  This is not always made clear in textbooks dealing
with the analysis of variance.  For Model I (no interaction) the model is:


      PWTCHG05 = a0 + a1*Male + a2*Cont + a3*Intermitt + error,

where Male is an indicator variable (or dummy variable) representing male
gender: that is, Male = 1 represents male gender, Male = 0 represents female.
Also, Cont is an indicator variable representing Continuing Smokers:
Cont = 1 means that the person is a Continuing Smoker, while Cont = 0 means
the person is NOT a continuing smoker.  Finally, Intermitt is an indicator
variable for Intermittent Quitters.

      Note that female (Women) is not represented explicitly in the model.

      Note also that Sustained Quitters are not represented explicitly.

      Both of these categories are, however, represented: if Male = 0, then
the person must be female.  If Cont = 0 and Intermitt = 0, the person must
be a sustained quitter.  What this means is that the value of the coefficient
a0 represents the mean percent weight change for *female sustained quitters*.

      If you want to find the predicted percent weight change for *male intermittent
quitters*, it would be  a0 + a1 + a3:

      a0           = 12.24

      a1           = -2.19

      a3           = -3.58
    -----------------------
      a0 + a1 + a3 =  6.47

     Note that the *observed* mean % weight gain in male intermittent quitters is 6.33 (from
the proc means printout).

     Also note that the 'Model' sum of squares for this model is  3.  This
corresponds to the three non-intercept coefficients a1, a2, and a3.  The
degrees of freedom of the corrected total sum of squares is 5469 - 1,
and the degrees of freedom for the error sum of squares is 5469 - 1 - 3 = 5465.

     There is no test for interaction with this model.  There are however
tests for both of the 'main effects': gender and smoking status.  These are
shown in the printout in the 'Type I SS' and 'Type III SS' sections.  For
a no-interaction model like this one, the Type I SS and the Type III SS agree.
Both indicate highly significant effects of both factors on percent weight
change.

     It may be useful to put the observed and predicted percent weight gains in a
2 x 3 table, as follows:

                                Year 5 Smoking Status
                   ----------------------------------------------
                       Sust Quit    Intermitt Q     Contin Smk
                   ----------------------------------------------
                   |              |              |              |
                   | obs    9.42  | obs    6.33  | obs    1.59  |
        Men        |              |              |              |
                   | pred  10.04  | pred   6.46  | pred   1.31  |
                   |              |              |              |
                   ----------------------------------------------
                   |              |              |              |
                   | obs   13.47  | obs    8.88  | obs    3.03  |
        Women      |              |              |              |
                   | pred  12.23  | pred   8.65  | pred   3.50  |
                   |              |              |              |
                   ----------------------------------------------


     Notice that in this table, there is not perfect agreement between
the observed and predicted (from the Model) mean percent weight changes.
The predicteds tend to be higher than the observeds for men, but lower
than the observeds for women.  This is an indication that there is room
for improvement with this model; in fact it suggests that there is an
interaction of gender with smoking status.

     This is the motivation for the second model, which does include
interaction terms:

--------------------------------------------------------------------------------------------------------------------------
proc glm data = smoke ;
     class gender ivpcsus5 ;
     model pwtchg05 = gender | ivpcsus5 / solution ;
title1 'Lung Health Study:  % weight change, Baseline to Year 5' ;
title2 'versus gender and Year 5 smoking status, and interaction' ;
title3 'of gender and smoking status' ;
means  gender | ivpcsus5 / bon ;
format gender gender. ivpcsus5 ivpcsus. ;
run ;
--------------------------------------------------------------------------------------------------------------------------

     Note that the 'model' statement represents the model in the form:
gender | ivpcsus5.  This is SAS shorthand for saying:

         gender ivpcsus5   gender*ivpcsus5,

where gender and ivpcsus5 represent *main effects* and gender*ivpcsus5
represents the interaction terms.

     Again it is a good idea to know what the correct, full expression for this
model actually is.  It is:

     PWTCHG05 = a0 + a1*Male + a2*Cont + a3*Intermitt
                   + a4*Male*Cont + a5*Male*Intermitt + error.

     Notice that there are 6 coefficients in this model: the intercept term
plus 5 others.  Not coincidentally, this equals the number of cells in the
table above.

     We can compute the predicted means from the coefficients for this
model also.  For example, for Male Intermittent Quitters, the predicted
mean would be

       a0 + a1 + a3 + a5 = 13.47 - 4.05 - 4.58 + 1.50 = 6.34.

which is, also not coincidentally, very close (to within roundoff) of the observed
mean value for this category, 6.33.

     Computing the observed and predicted values for the other cells in the
table, we have:


                                Year 5 Smoking Status
                   ----------------------------------------------
                       Sust Quit    Intermitt Q     Contin Smk
                   ----------------------------------------------
                   |              |              |              |
                   | obs    9.42  | obs    6.33  | obs    1.59  |
        Men        |              |              |              |
                   | pred   9.42  | pred   6.34  | pred   1.59  |
                   |              |              |              |
                   ----------------------------------------------
                   |              |              |              |
                   | obs   13.47  | obs    8.88  | obs    3.03  |
        Women      |              |              |              |
                   | pred  13.47  | pred   8.88  | pred   3.04  |
                   |              |              |              |
                   ----------------------------------------------

     This shows that there is a substantial improvement in the agreement
of observed and predicted when the interaction terms is included.

     This is sometimes described as a *saturated* model: the predicted
means are equal to the observed means.  Given only these two categorical
predictors, GENDER and IVPCSUS5, there are no more terms to be added to
the model.

     Note that in the ANOVA table for this model, the degrees of freedom
for Model SS is 5: i.e., the number of coefficients in the model minus 1.

     Note also that for this model, the Type I SS and the Type III SS do not agree.
Note that sum of all the Type I squares equals the overall regression sum of
squares, but the same is not true for the Type III SS.

     So what are these Type I and Type II Sums of Squares ?

     The Type I sum of squares is essentially hierarchical.  The first Type I
sum of squares is for Gender.  That means what the effect of Gender is if Gender
is the only term in the model.  The second Type I SS is for IVPCSUS5.  That
represents the *additional* effect that IVPCSUS5 has *after* accounting for the
effect of gender.  Finally, the third Type I SS is GENDER*IVPCSUS5.  This
represents the additional effect of the interaction term AFTER both of the main effects,
GENDER and IVPCSUS5, are in the model.  In the example given above, this sum of
squares equals 1164.53.

     The Type III sum of squares is NOT hierarchical.  Each of the Type III
sum of squares represents the effect of that term after *all* of the other
terms, whether they be main effects or interactions, are in the model.  Note
that the Type III sums of squares for GENDER and IVPCSUS5 are not the same
as the corresponding Type I SS.  However, as may be expected, the Type III
interaction SS   *is*   the same as the Type I interaction SS.

     Some statisticians like Type III sums of squares.  Others prefer
Type I sums of squares.

     Those who do NOT like Type III sums of squares say: it makes no sense
to estimate a main effect (like gender, e.g.) after you have already entered
the interaction term in the model.

     Those who do NOT like Type I sums of squares say: since they are
hierarchical, it matters what order the terms are entered into the regression.
That is correct.  If you reverse GENDER and IVPCSUS4, you will get different
Type I sums of squares for each, although the overall sum stays the same.

     The Der-Everitt textbook expresses a clear, strong preference for Type I
sum of squares.  I am in agreement.  However, if you want to rely on Type I
sums of squares, it is a good idea to try both orders for entering the variables.
This means that it is not so easy to describe the results.  You have to report
and describe one or the other.  If there is a 'natural' order to these
variables, you should use that.  For example, suppose you are mainly interested
in the effect of smoking status and you regard gender as a nuisance variable
for which you want to 'adjust'.  In that case it would make sense to determine
the effect of smoking status AFTER you see the effect of gender.  Thus the
model above,

           model pwtchg05 = gender ivpcsus5 gender*ivpcsus5

makes sense, because the Type I SS will list the effect of GENDER first,
and then will show the effect of IVPCSUS5 after gender is in the model.

     Finally, how do you report results, when you observe, as in this
analysis, both significant main effects and a significant interaction?

     Most texts in this situation recommend saying that there is an
interaction effect, but not claiming that there are main effects.  This
is somewhat unsatisfactory.  The only thing that tells the whole story,
in a sense, are the *observed means*.

     When there is interaction, what can you report, without making some
statistical purist angry?

     Say the two factors are A and B, and A has two levels: A = 1 and A = 2, and B has
two levels also, B = 1 and B = 2.  Then you can consider whether, when you
restrict to people assigned to B1, whether there is a difference between
A1 and A2.  You could do this as follows:

     proc glm data = dataset ;
          where B = 1 ;
          class A ;
          model Y = A / solution ;
     title 'Effect of factor A on outcome Y when B = 1.' ;
     run ;

     You would similarly consider A = 1 versus A = 2 when B = 2.

     And similarly, B = 1 versus B = 2 when A = 1.  And finally
B = 1 versus B = 2 when A = 2.

     You would also report the interaction from the analysis

     proc glm data = dataset ;
          class A  B ;
          model Y = A B A*B / solution ;
     title 'Effects of A, B, and their interaction, A*B.' ;
     run ;

     If the interaction is not significant, many investigators report
only the results of the no-interaction analysis:

     proc glm data = dataset ;
          class A  B ;
          model Y = A B / solution ;
     title 'Effects of A and B - no interaction assumed.' ;
     run ;



     Interactions in general are often difficult to interpret.  The situation
can be considerably more complex than what is seen here with smoking status
and gender.  One of the two factors has only two levels.  In general, each
can have several levels.  There can be more than two factors.  This raises the
possibility of multiple two-way interactions and three-way or higher interactions.
The number of coefficients for such models can be very large, and the patterns
of interactions very difficult to understand.

* ==================================================================== ;        
n54703.008  Last update: March 6, 2004.