PubH 5470-3  Statistical Analysis Using SAS Procedures                         page 1 of 6

Final Exam - May 15, 2004                              Name: _____________________________
==========================================================================================

1.  Data on individuals enrolled in the Happy Health HMO are recorded in two
    files:


    File 1: Demographic Data              File 2: Drug Prescription Data
    -----------------------------------   -------------------------------------------
                                               Date of Last
    ID   Gender   Age    Weight   Race    ID   Prescription   Drug Code       Age
    ---- ------   ---    ------   ----    ---- ------------   ----------  -----------
    0611    M      18      129     W      0345  2004-02-29        124          41
    0345    M      40      158     A      0111  2003-12-16        288          52
    0260    F      79      107     B      0064       .             .           63

            [more observations]                 [more observations]

    Your task is to merge the two files so that both the demographic data and
the drug prescription data for a given person are part of one observation for
that person.  The two files both include 'ID', which is a unique identifier
for the person.  However, File 1 is sorted in ascending order by age, while
File 2 is sorted in reverse order by Date of Last Prescription.

     Write a SAS program which merges the two files.

--------------------------------------------------------------------------------------

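       data file1 ;
            length id $4  gender $1  race $1 ;
            infile 'file1.data' ;       /* hypothetical file name */
            input id gender age weight race ;
       run ;

[13]   data file2 ;
            length id $4 ;
            infile 'file2.data' ;       /* hypothetical file name */
            input id  presdate : yymmdd10.  drugcode  age ;
            format presdate yymmdd10. ;
       run ;

       proc sort data = file1 ;  by id ;  run ;
       proc sort data = file2 ;  by id ;  run ;

       data twofiles ;
            merge file1 file2 ;
            by id ;
       run ;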

--------------------------------------------------------------------------------------




     Note that one person, ID #0345, has a different Age on File 2 than on
File 1.  What will happen to this variable when the files are merged?



--------------------------------------------------------------------------------------

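     Age takes its value from the last file named in the merge statement.
     Here that is file2, so for ID 0345 the merged observation has Age = 41
     and the value 40 from file1 is overwritten.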

--------------------------------------------------------------------------------------
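
     If both ages are wanted, one option is to rename Age on one of the files
     before merging.  The lines below are only a sketch: they assume file2 was
     read with an 'age' variable, and the names age2 and agediff are made up
     for illustration.

```sas
data twofiles ;
     /* keep file1's age as 'age' and file2's age as 'age2' (hypothetical name) */
     merge file1  file2 (rename = (age = age2)) ;
     by id ;
     agediff = age2 - age ;     /* nonzero where the two files disagree */
run ;
```

     With the sample data shown, ID 0345 would have age = 40 and age2 = 41.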


[7]



2.  The relationship of height to weight was studied in men and women in
    the Lung Health Study.  On the following pages are 4 PROC GLM analyses,
    and the corresponding printouts.

    a) Use the results of Model 1 and Model 2 to evaluate whether the
       relationship between height and weight is the same for men as it
       is for women.  Describe your reasoning in detail.


--------------------------------------------------------------------------------------
       Consider approximate 95% confidence intervals for the intercepts and
       slopes, computed as parameter estimate +/- 2*std error.

[6]    Men:   Intercept = -81.44 +/- 2*5.06 = (-91.56, -71.32)
              Slope     =  92.75 +/- 2*2.86 = ( 87.03,  98.47)


       Women: Intercept = -54.83 +/- 2*6.36 = (-67.55, -42.11)
              Slope     =  73.14 +/- 2*3.88 = ( 65.38,  80.90)


       Neither the intercept intervals nor the slope intervals overlap, so
       the relationship between height and weight is not the same for men
       as it is for women.
--------------------------------------------------------------------------------------


    b) What do you conclude by comparing Model 3 and Model 4 ?  Justify your
       answer in detail.


--------------------------------------------------------------------------------------
       Model 4 adds 'gender' to Model 3.  The 'gender' term is highly
       significant (t = -14.92, p = 0.0001), so in a two-parallel-lines
       model the intercepts are not the same.
[6]
       The slopes probably differ as well, but Models 3 and 4 cannot show
       this: the change in slope from 110.60 (Model 3) to 86.46 (Model 4)
       reflects adjustment for gender, not separate slopes.
--------------------------------------------------------------------------------------





    c) Write another 'proc glm' ('Model 5') which models weight as a function of
       height with separate intercepts and separate slopes for women and
       men in one procedure.

--------------------------------------------------------------------------------------
       proc glm data = smoke ;
            class gender ;
[6]         model weight = height gender height * gender ;
       run ;
--------------------------------------------------------------------------------------







    d) Describe how you would use the printout from Model 5 and Model 3 to
       formally test whether Model 5 is better than Model 3.

--------------------------------------------------------------------------------------
       Compute the F-statistic:


[6]                    (ErrSS3 - ErrSS5) / 2
                 F =  -----------------------
                        ErrSS5 / (n - 4)


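        Compare to the F-distribution with (2, n - 4) degrees of freedom, where
        n = 5886 is the number of observations and ErrSS3, ErrSS5 are the Error
        sums of squares from Models 3 and 5.  A large F (small p-value) means
        Model 5 fits better than Model 3.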

--------------------------------------------------------------------------------------
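
       The Model 5 printout is not shown.  However, since Model 5 fits a
       separate intercept and slope for each gender, its Error SS should equal
       the Error SS of Model 1 plus that of Model 2 (3699 + 2183 = 5882 = n - 4
       error d.f.).  Under that assumption the test can be sketched as below;
       the dataset and variable names are made up for illustration.

```sas
data ftest ;
     errss3 = 765796.48491 ;                   /* Error SS from Model 3 printout    */
     errss5 = 486458.48624 + 249437.325009 ;   /* assumed: Model 1 + Model 2 Error SS */
     n      = 5886 ;                           /* Corrected Total d.f. + 1          */

     f = ((errss3 - errss5) / 2) / (errss5 / (n - 4)) ;
     p = 1 - probf(f, 2, n - 4) ;              /* upper-tail p-value */
run ;

proc print data = ftest ;
     var errss3 errss5 f p ;
run ;
```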






========================================================================
Program and Printout for Problem 2

 proc glm data = smoke ;
      where gender eq 0 ;
      model weight = height ;
 title1 'Model 1: gender = 0 [Men] only: regress weight on height' ;

 proc glm data = smoke ;
      where gender eq 1 ;
      model weight = height ;
 title1 'Model 2: gender = 1 [Women] only: regress weight on height' ;

 proc glm data = smoke ;
      model weight = height ;
 title1 'Model 3: both genders combined: regress weight on height' ;

 proc glm data = smoke ;
      model weight = gender height ;
 title1 'Model 4: both genders, same slope, different intercepts' ;

------------------------------------------------------------------------
             Model 1: gender = 0 [Men] only: regress weight on height           2

                                                       16:24 Sunday, May 9, 2004

                        General Linear Models Procedure

Dependent Variable: WEIGHT   
                                     Sum of            Mean
Source                  DF          Squares          Square   F Value     Pr > F

Model                    1     138223.66876    138223.66876   1051.04     0.0001

Error                 3699     486458.48624       131.51081

Corrected Total       3700     624682.15500

                  R-Square             C.V.        Root MSE          WEIGHT Mean

                  0.221270         13.89803       11.467816            82.513996


Source                  DF        Type I SS     Mean Square   F Value     Pr > F

HEIGHT                   1     138223.66876    138223.66876   1051.04     0.0001

Source                  DF      Type III SS     Mean Square   F Value     Pr > F

HEIGHT                   1     138223.66876    138223.66876   1051.04     0.0001

                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

INTERCEPT              -81.43835688         -16.09     0.0001     5.06067757
HEIGHT                  92.75304821          32.42     0.0001     2.86099902

 
------------------------------------------------------------------------

            Model 2: gender = 1 [Women] only: regress weight on height          4
                                                       16:24 Sunday, May 9, 2004

                        General Linear Models Procedure

Dependent Variable: WEIGHT   
                                     Sum of            Mean
Source                  DF          Squares          Square   F Value     Pr > F

Model                    1     40585.548007    40585.548007    355.19     0.0001

Error                 2183    249437.325009      114.263548

Corrected Total       2184    290022.873016

                  R-Square             C.V.        Root MSE          WEIGHT Mean

                  0.139939         16.44993       10.689413            64.981510


Source                  DF        Type I SS     Mean Square   F Value     Pr > F

HEIGHT                   1     40585.548007    40585.548007    355.19     0.0001

Source                  DF      Type III SS     Mean Square   F Value     Pr > F

HEIGHT                   1     40585.548007    40585.548007    355.19     0.0001


                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

INTERCEPT              -54.83183244          -8.62     0.0001     6.36142075
HEIGHT                  73.14141859          18.85     0.0001     3.88089171
------------------------------------------------------------------------
             Model 3: both genders combined: regress weight on height           6
                                                       16:24 Sunday, May 9, 2004

                        General Linear Models Procedure

Dependent Variable: WEIGHT   
                                     Sum of            Mean
Source                  DF          Squares          Square   F Value     Pr > F

Model                    1     571224.28119    571224.28119   4389.00     0.0001

Error                 5884     765796.48491       130.14896

Corrected Total       5885    1337020.76610

                  R-Square             C.V.        Root MSE          WEIGHT Mean

                  0.427237         15.00980       11.408285            76.005590


Source                  DF        Type I SS     Mean Square   F Value     Pr > F

HEIGHT                   1     571224.28119    571224.28119   4389.00     0.0001

Source                  DF      Type III SS     Mean Square   F Value     Pr > F

HEIGHT                   1     571224.28119    571224.28119   4389.00     0.0001


                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

INTERCEPT              -114.1720521         -39.72     0.0001     2.87447389
HEIGHT                  110.5977830          66.25     0.0001     1.66941166
------------------------------------------------------------------------

             Model 4: both genders, same slope, different intercepts            8
                                                       16:24 Sunday, May 9, 2004

                        General Linear Models Procedure

Dependent Variable: WEIGHT   
                                     Sum of            Mean
Source                  DF          Squares          Square   F Value     Pr > F

Model                    2     599142.93697    299571.46849   2388.44     0.0001

Error                 5883     737877.82913       125.42543

Corrected Total       5885    1337020.76610

                  R-Square             C.V.        Root MSE          WEIGHT Mean

                  0.448118         14.73490       11.199350            76.005590


Source                  DF        Type I SS     Mean Square   F Value     Pr > F

GENDER                   1     422315.73809    422315.73809   3367.07     0.0001
HEIGHT                   1     176827.19888    176827.19888   1409.82     0.0001

Source                  DF      Type III SS     Mean Square   F Value     Pr > F

GENDER                   1      27918.65578     27918.65578    222.59     0.0001
HEIGHT                   1     176827.19888    176827.19888   1409.82     0.0001


                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

INTERCEPT              -70.31957232         -17.26     0.0001     4.07456022
GENDER                  -6.33408349         -14.92     0.0001     0.42455048
HEIGHT                  86.46279900          37.55     0.0001     2.30275410

 
              LUNG HEALTH STUDY :  WBJEC5.SAS (JEC) 09MAY04 16:24



3.  A cohort of people aged 80-100 was followed for one year.  A datafile was
    constructed with the following information:

    Person's ID
    Age in years
    Death (0 = no, 1 = yes)

    a)  Describe the model in which probability of death is represented
        as a function of age.

--------------------------------------------------------------------------------------
 [3]    Model:  Prob(Death | age) = 1 / (1 + exp(-b0 -b1*age)).


--------------------------------------------------------------------------------------

    b)  Write a proc logistic procedure to analyze the data.


--------------------------------------------------------------------------------------
 [3]    proc logistic data = agedeath descending ;   /* descending: model Prob(death = 1) */
             model death = age / clodds = pl ;
        run ;
--------------------------------------------------------------------------------------

    c)  Suppose the coefficient estimates from proc logistic are as
        follows:

        Analysis of Maximum Likelihood Estimates
 
                     Parameter Standard
         Variable DF  Estimate   Error

         INTERCPT 1    -1.000    0.3000
         AGE      1     -.010    0.0020


        Compute the probability of dying within one year for a person
        who is 90 years old.



--------------------------------------------------------------------------------------
                prob = 1 / (1 + exp(1 + .01*90)) = 1 / (1 + exp(1 + .9)) = 1 / (1 + exp(1.9))

 [4]                 = 1 / (1 + 6.686) = 1 / 7.686 = .13

--------------------------------------------------------------------------------------


    d)  Using the coefficient estimates in c), find the age for which
        a third of the people of that age would be expected to die
        within a year.

--------------------------------------------------------------------------------------
 [6]    Let 1/3 = 1 / (1 + exp(1 + .01*age)), or

              3 = 1 + exp(1 + .01*age)

              2 = exp(1 + .01*age)

              ln(2) = .693 = 1 + .01*age

              Therefore age = (.693 - 1) / .01 = -30.7.
--------------------------------------------------------------------------------------
        Note: although the calculations above are technically correct, the answer
              -30.7 years does not really make sense.  The coefficient b1 of age
              was erroneously printed as -.010 when it should have been +.010.
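
        The arithmetic in c) and d) can be checked with a short data step.  This
        is only a sketch using the coefficients as printed in c); the dataset
        and variable names are made up for illustration.

```sas
data check ;
     b0 = -1.000 ;                  /* INTERCPT estimate as printed */
     b1 = -0.010 ;                  /* AGE estimate as printed      */

     /* c) probability of death within one year at age 90 */
     p90 = 1 / (1 + exp(-b0 - b1*90)) ;

     /* d) age at which Prob(death) = 1/3: solve logit(1/3) = b0 + b1*age */
     target    = 1/3 ;
     age_third = (log(target / (1 - target)) - b0) / b1 ;
run ;

proc print data = check ;
     var p90 age_third ;
run ;
```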




4.  An investigator wants to conduct a clinical trial.  A total of 200 people
    will be randomized: 100 to drug X and 100 to drug Y (placebo).  The outcome
    is resolution of depression.  The investigator thinks that the probability
    of resolution of depression in each of the two groups will be the following:

    Drug X    :  .75
    Drug Y    :  .60

    a)  Write a SAS program which will simulate this clinical trial.  The
        output from the program should show the number of events in each
        of the two groups.

--------------------------------------------------------------------------------------
        data trial ;

             n = 100 ;
             px = .75 ;
             py = .60 ;
             seed = 20040515 ;
[6]
             do i = 1 to n ;

                group = 'X' ;
                rx = ranuni(seed) ;
                x = 0 ;
                if rx < px then x = 1 ;
                output ;

             end ;

             do i = 1 to n ;

                group = 'Y' ;
                ry = ranuni(seed) ;
                x = 0 ;
                if ry < py then x = 1 ;
                output ;

             end ;

        run ;

        proc freq data = trial ;
             tables group * x ;       /* number of events (x = 1) in each group */
        run ;
--------------------------------------------------------------------------------------

    b)  How would you test whether the results of your simulated clinical
        trial indicated a significant difference between the two groups?

--------------------------------------------------------------------------------------
        Use PROC FREQ as follows:

        proc freq data = trial ;
[6]          tables x * group / chisq ;
        run ;
--------------------------------------------------------------------------------------

    c)  Suppose you carried out 1000 separate simulations of the clinical
        trial with the probabilities given as above.  How could you use the
        results to estimate the power of the clinical trial (i.e., the
        probability that the results would be significant)?

--------------------------------------------------------------------------------------
        Generate 1000 datasets as in a) above and test each using proc freq
        as in b).  Let M = the number of times in 1000 that the p-value is < .05.
[8]
        Compute M / 1000.   This is an estimate of power.
--------------------------------------------------------------------------------------
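
        One way to carry this out in a single run is sketched below.  It is
        only an illustration: the names trials, tests, sim and signif are made
        up, and the chi-square test of part b) is taken as the test of interest.

```sas
data trials ;
     n = 100 ;  px = .75 ;  py = .60 ;
     seed = 20040515 ;

     do sim = 1 to 1000 ;                  /* one simulated trial per value of sim */
        do i = 1 to n ;
           group = 'X' ;  x = (ranuni(seed) < px) ;  output ;
           group = 'Y' ;  x = (ranuni(seed) < py) ;  output ;
        end ;
     end ;

     keep sim group x ;
run ;

proc freq data = trials noprint ;
     by sim ;                              /* data are already in order of sim */
     tables x * group / chisq ;
     output out = tests pchi ;             /* P_PCHI = Pearson chi-square p-value */
run ;

data tests ;
     set tests ;
     signif = (p_pchi < .05) ;             /* 1 if this trial was significant */
run ;

proc means data = tests n mean ;
     var signif ;                          /* mean of signif = M / 1000 */
run ;
```

        The mean of signif printed by proc means is the estimated power.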


5.  The following dataset summarizes data from a followup study of heart attack,
    where the risk factor of interest was aspirin use.  Systolic blood pressure,
    age, and gender are also risk factors for heart attack.

    Participant                  Systolic    Aspirin   Followup     Heart
    Sequence No.   Gender  Age     B. P.       Use     Time(mos.)   Attack?
    ------------   ------  ---   --------  ----------  ----------  --------
          1           2     67      145         0          28          0
          2           2     48      132         1          36          0
          3           1     72      168         0          48          1

                              [more observations]

    a)  Write a SAS program to read the datafile and analyze this data using 
        PROC LIFETEST.

--------------------------------------------------------------------------------------
        data heart ;
             infile 'heart.data' ;
             input seqno gender age sbp aspirin folltime heartatt ;
             aspgend = aspirin * gender ;
        run ;

[6]     proc lifetest data = heart ;
             time folltime * heartatt(0) ;
             strata aspirin ;
        run ;
--------------------------------------------------------------------------------------

    b)  How can you tell, from PROC LIFETEST output, whether the people who
        use aspirin are at higher or lower risk than the people who do not?
        What test for the difference between aspirin users and non-users would
        you use ?

--------------------------------------------------------------------------------------
        Examine the estimated survival curves for the two strata.  The group whose
        curve is higher (greater probability of remaining free of heart attack) is
        the group that has lower risk.

[6]     Test: PROC LIFETEST produces tests based on -2*Log(Likelihood), Wilcoxon statistic,
              and Logrank statistic.  Any of these can be used.  Conservative approach would
              be to take the largest p-value produced by these three tests.
--------------------------------------------------------------------------------------

    c)  Write a PROC PHREG procedure to analyze the effects of aspirin use,
        controlling for age, gender, and SBP.  How would you test for an
        interaction between aspirin use and gender ?

--------------------------------------------------------------------------------------
        1.  proc phreg data = heart ;
                 model folltime * heartatt(0) = age gender sbp ;
            run ;

        2.  proc phreg data = heart ;
                 model folltime * heartatt(0) = aspirin age gender sbp ;
            run ;

            Look at difference in (-2 Log L) with these two models, compare to
            chi-square distrib with 1 d.f.

        3.  proc phreg data = heart ;
                 model folltime * heartatt(0) = aspirin age gender sbp aspgend ;
            run ;

            Look at difference in (-2 Log L) between 3. and 2., compare to chi-square
            distrib. with 1 d.f.

            Note: you need to create a variable called aspgend in the dataset ...
[8]         see the data step in part a) above.
--------------------------------------------------------------------------------------