December 22, 2004                                                     page 1 of 8

 SPH 5421 Final Examination         Name: ________________________________________
 =================================================================================

 1.  A datafile includes data on a random sample of 250 people, and  three 
     variables are recorded for each person:

     Variable 1 is ID, an identifying number
     Variable 2 is gender:   Gender = 'M' or 'F' (male and female)
     Variable 3 is serum cholesterol

     There are 100 males and 150 females in the sample represented on the file.

     Let S be the set of all possible pairs of males and females; that is,

         S = { (IDi, IDj) }, where IDi is the ID for a male and
                             IDj is the ID for a female.

    (a)  How many possible pairs are there in the set S ?


         100 x 150 = 15000
[2]



     Let M1 = the number of pairs in S for which the male person in the pair has 
     a lower serum cholesterol than the female person.  Let M2 = the number of 
     pairs in S for which the female person has a lower serum cholesterol than 
     the male person.


    (b)  If there is no difference between males and females in the distribution 
         of serum cholesterol in the population from which the sample is drawn, 
         what is the expected value  of M1 / (M1 + M2) ?



[2]        1/2.





    (c)  Write a program using proc iml that will compute M1 and M2
         for a given datafile, and will compute the fraction M1 / (M1 + M2).

[16]

         data cholm cholf ;
             infile 'cholmf.data' ; 
             input id gender $ chol ;

              if gender eq 'M' then output cholm ;
              if gender eq 'F' then output cholf ;

         run ;

         proc iml ;
              use cholm ;
              read all var {chol} into males ;

              use cholf ;
              read all var {chol} into females ;

              rm = nrow(males) ;
              rf = nrow(females) ;

              m1 = 0 ; m2 = 0 ;
              do i = 1 to rm ;
              do j = 1 to rf ;

                 if males[i] < females[j] then m1 = m1 + 1 ;
                 if males[i] > females[j] then m2 = m2 + 1 ;

              end ;
              end ;

              estprob = m1 / (m1 + m2) ;

              file 'estprob.out' ;

              put ' m1 = ' m1 '  m2 = ' m2  ' estprob = ' estprob ;

         quit ;
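
     As a cross-check outside SAS, the same pair counting can be sketched in
     Python. This is only an illustration of the double loop above: the function
     name count_pairs and the cholesterol values are hypothetical, and tied pairs
     are counted in neither M1 nor M2, as in the IML program.

```python
def count_pairs(males, females):
    """Return (m1, m2): pairs where male < female, and pairs where female < male."""
    m1 = sum(1 for m in males for f in females if m < f)
    m2 = sum(1 for m in males for f in females if m > f)
    return m1, m2


# Hypothetical cholesterol values, for illustration only.
males = [180, 210, 195]
females = [200, 185, 220, 195]

m1, m2 = count_pairs(males, females)
estprob = m1 / (m1 + m2)          # ties are excluded from the denominator
print(m1, m2, estprob)
```

     With 3 males and 4 females there are 12 pairs; one pair is tied, so
     M1 + M2 = 11 in this small example.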




 2.  Check digits are often used for numerical IDs.  Assume that the main part 
     of the ID is a 4-digit number, like N = 7629.  To compute the check digit:

     Multiply the rightmost digit by:         2,
     Multiply the next digit to the left by:  1,
     Multiply the next digit to the left by:  2,
     Multiply the next digit to the left by:  1, etc.

     For 7629, the process is:

     Digits of N:                 7        6        2        9

     Multipliers:               7 x 1    6 x 2    2 x 1    9 x 2

     Products:                    7       12        2       18

     Add the resulting digits:    7  +  1 + 2  +  2  +  1 + 8 = 21
                       ------
     Subtract this from the next largest multiple of 10:  30 - 21 = 9

     The check digit is 9.

    (a)  Compute the check digit for  N = 8536.


     Digits of N:                 8        5        3        6

                                8 x 1    5 x 2    3 x 1    6 x 2

[2]                               8 + 1 + 0 + 3 + 1 + 2 = 15

     Therefore check digit is:  5.


    (b)  Write a macro which computes the check digit for any 3-digit number, N.
         The call to the macro should look like:

           %check(n, checkdig) ;

         where  n  is the input and  checkdig  is the output.



         %macro check(n, checkdig) ;

                d3 = int(&n / 100) ;                      /* 100s digit */

[18]            n2 = &n - 100 * d3 ;

                d2 = int(n2 / 10) ;                       /* 10s digit */

                d1 = n2 - 10 * d2 ;                       /* 1s digit */

                d12 = d1 * 2 ;                            /* 1s digit x 2 */

                d122 = int(d12 / 10) ;                    /* 10s digit of d12 */

                d121 = d12 - 10 * d122 ;                  /* 1s digit of d12 */

                d21 = d2 ;                                /* 10s digit x 1 */

                d211 = d21 ;                              /* same */

                d32 = d3 * 2 ;                            /* 100s digit x 2 */

                d322 = int(d32 / 10) ;                    /* 10s digit of d32 */

                d321 = d32 - 10 * d322 ;                  /* 1s digit of d32 */

                sum = d121 + d122 + d211 + d321 + d322 ;  /* sum of digits */

                summod10 = sum - 10 * int(sum/10) ;       /* sum of digits mod 10 */

                &checkdig = mod(10 - summod10, 10) ;      /* check digit; 0 if the sum ends in 0 */

         %mend ;
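
     The check-digit rule can also be sketched for a number with any number of
     digits. The Python function below is an illustration, not part of the SAS
     answer; the name check_digit is hypothetical, and it assumes that a digit
     sum ending in 0 should give check digit 0, so that the result is always a
     single digit.

```python
def check_digit(n):
    """Check digit for a non-negative integer n, following the rule in problem 2."""
    total = 0
    multiplier = 2                      # the rightmost digit is multiplied by 2
    while n > 0:
        n, digit = divmod(n, 10)        # peel off the rightmost digit
        product = digit * multiplier
        total += product // 10 + product % 10   # add the digits of the product
        multiplier = 3 - multiplier     # alternate 2, 1, 2, 1, ...
    return (10 - total % 10) % 10       # assumption: a sum ending in 0 gives 0


print(check_digit(7629), check_digit(8536))
```

     For the two worked examples, 7629 and 8536, this reproduces the check
     digits 9 and 5 found by hand.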





 3.  A linear transformation  T  from  R^2 to R^2  has eigenvalues a1 = 3 and

                                                         | 1 |           | 1 |
     a2 = 2, and the corresponding eigenvectors are X1 = |   |  and X2 = |   | .
                                                         | 1 |           |-1 |


     (a)  Find the matrix of the linear transformation T.


           | A    B |   | 1 |   | 3 |       | A    B |   | 1 |   | 2 |
           |        | * |   | = |   |       |        | * |   | = |   |
           | C    D |   | 1 |   | 3 |       | C    D |   |-1 |   |-2 |

             A + B = 3                    A - B = 2

             C + D = 3                    C - D = -2 .

[12]                                                              | 2.5   0.5 |
             Thus A = 2.5, B = .5, C = .5, D = 2.5.  Matrix is:   |           |
                                                                  | 0.5   2.5 |.

     (b)  Let S be the unit square - that is, S has vertices (0, 0), (1, 0), 
          (1, 1), and (0, 1).  Draw a picture of T(S), specifying  all of its 
          vertices.  What is the area of T(S) ?

                                                           ---(3,3)
          (0, 0) ---> (0, 0)                           ----     /
                                            (.5,2.5)---        /
          (1, 0) ---> (2.5, 0.5)                /             /
                                               /             /
          (1, 1) ---> (3.0, 3.0)              /             /
                                             /             /
[5]       (0, 1) ---> (0.5, 2.5)            /       ----(2.5,.5)
                                           /     ----
                                         (0,0)---

          Area: 2.5 x 2.5 - .5 x .5 = 6.25 - .25 = 6.



     (c)  What are the eigenvalues of the inverse of T ?


          1/3 and 1/2.
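
     The answers to parts (a) through (c) can be verified numerically. The
     short Python sketch below uses plain nested lists instead of a matrix
     library; the helper name matvec is hypothetical. It checks the two
     eigenvector equations, computes det T (the area of T(S)), and applies the
     2 x 2 adjugate formula for the inverse.

```python
def matvec(t, x):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [t[0][0] * x[0] + t[0][1] * x[1],
            t[1][0] * x[0] + t[1][1] * x[1]]


T = [[2.5, 0.5],
     [0.5, 2.5]]

# Part (a): T X1 should equal 3 X1 and T X2 should equal 2 X2.
tx1 = matvec(T, [1, 1])
tx2 = matvec(T, [1, -1])

# Part (b): area of T(S) is |det T| times the area of the unit square.
det = T[0][0] * T[1][1] - T[0][1] * T[1][0]

# Part (c): the inverse has the same eigenvectors, with eigenvalues 1/3 and 1/2.
T_inv = [[T[1][1] / det, -T[0][1] / det],
         [-T[1][0] / det, T[0][0] / det]]
inv1 = matvec(T_inv, [1, 1])
inv2 = matvec(T_inv, [1, -1])

print(tx1, tx2, det, inv1, inv2)
```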


[3]









 4.  An experiment is conducted in which water is allowed to evaporate from a 
     number of 1-cup containers where the containers are exposed to different 
     temperatures.  Each cup contains 8 ounces of water at the beginning.  After 
     being exposed to a temperature  Ti  for  Mi  minutes, the amount of water
     Wi  remaining in the cup is weighed.

     Assume that the amount of water that has evaporated is the following 
     function of temperature  Ti  and minutes exposed Mi:

         Ei = a * Mi * (Ti - b) + e,

     where e is a normally distributed error term, e ~ N(0, v), and where v is 
     an unknown constant variance.  The constants a and b are also unknown and 
     must be estimated from the data.

     Given a data file which includes Ti, Mi, and Wi, write a PROC NLIN program 
     which will estimate  a, and b.  Explain  how this program will also 
     estimate  v.


     data water ;

          input ti mi wi ;
                ei = 8 - wi ;
     run ;

     proc nlin method = marquardt data = water ;
          parms a = 1
                b = 1 ;
[20]

          obsd = ei ;
          expd = a * mi * (ti - b) ;

          der.a = mi * (ti - b) ;
          der.b = -a * mi ;

          model obsd = expd ;

     run ;


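     The program prints the residual (error) sum of squares.  The estimate of
     v is the residual mean square, which is the residual sum of squares
     divided by n - p, where n is the number of observations and p = 2 is the
     number of parameters.  The printout includes this mean square also.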
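
     One way to check the NLIN estimates by hand: the model can be rewritten as
     Ei = a * (Mi * Ti) + c * Mi + e with c = -a * b, which is linear in (a, c),
     so ordinary least squares without an intercept minimizes the same sum of
     squares. The Python sketch below uses that reparameterization rather than
     the Marquardt iteration; the function name fit_evaporation and the data
     values are hypothetical.

```python
def fit_evaporation(rows):
    """Least-squares fit of E = a*M*(T - b), where E = 8 - W.

    rows: list of (T, M, W) tuples. Returns (a_hat, b_hat, v_hat).
    """
    u = [m * t for t, m, w in rows]      # regressor multiplying a
    z = [m for t, m, w in rows]          # regressor multiplying c = -a*b
    e = [8.0 - w for t, m, w in rows]    # amount evaporated

    suu = sum(x * x for x in u)
    szz = sum(x * x for x in z)
    suz = sum(x * y for x, y in zip(u, z))
    sue = sum(x * y for x, y in zip(u, e))
    sze = sum(x * y for x, y in zip(z, e))

    # Solve the 2x2 normal equations by Cramer's rule.
    det = suu * szz - suz * suz
    a_hat = (sue * szz - sze * suz) / det
    c_hat = (suu * sze - suz * sue) / det
    b_hat = -c_hat / a_hat               # requires a_hat != 0

    rss = sum((ei - a_hat * m * (t - b_hat)) ** 2
              for (t, m, w), ei in zip(rows, e))
    v_hat = rss / (len(rows) - 2)        # residual mean square, n - p with p = 2
    return a_hat, b_hat, v_hat


# Hypothetical (Ti, Mi, Wi) observations, for illustration only.
rows = [(20, 60, 7.41), (30, 60, 6.79), (25, 120, 6.21), (40, 30, 7.09)]
a_hat, b_hat, v_hat = fit_evaporation(rows)
print(a_hat, b_hat, v_hat)
```

     The last line of the function is the same mean-square calculation that
     NLIN reports: the residual sum of squares divided by n - 2.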







 5.  Assume X and Y are dichotomous random variables, both of which  take on only
     values of 0 or 1, and that each has a Bernoulli distribution,

            X ~ Ber(.5)

            Y ~ Ber(.8),

     and X and Y are correlated:  corr(X, Y) = .3.

     {Recall that corr(X, Y) = cov(X, Y) / [sqrt(var(X)*var(Y))].}

     (a) What are the expected cell counts and margins in the following 
         table, where 100 observations (X, Y) are made ?

                   X = 0   X = 1
                 -----------------
                 |       |       |
          Y = 0  | E(a)  | E(b)  |  E(n1)
                 |       |       |
                 -----------------
                 |       |       |
          Y = 1  | E(c)  | E(d)  |  E(n2)
                 |       |       |
                 -----------------------
                   E(m1)   E(m2) |  100
                                 |


          Since corr(X, Y) = cov(X, Y)/(sqrt(Vx)*sqrt(Vy)),

[8]       .3 = cov(X, Y) / (.5 * .4), or cov(X, Y) = .3 * .2 = .06.


          Noting that cov(X, Y) = E(XY) - E(X)E(Y), we have

                 .06 = E(XY) - .5 * .8 = E(XY) - .4, or

                 E(XY) = .46.

           Hence probabilities:

                   X = 0   X = 1
                 -----------------
                 |       |       |
          Y = 0  |  .16  |  .04  | .20
                 |       |       |
                 -----------------
                 |       |       |
          Y = 1  |  .34  |  .46  | .80
                 |       |       |
                 -----------------------
                    .50     .50  | 1.00
                                 |

          So the expected cell counts are E(a) = 16, E(b) = 4, E(c) = 34, and
          E(d) = 46, with margins E(n1) = 20, E(n2) = 80, E(m1) = 50, and
          E(m2) = 50.
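
     The derivation in part (a) can be packaged as a small calculation. The
     Python sketch below is illustrative only (the function name joint_cells is
     hypothetical); it follows the same steps: covariance from the correlation,
     then E(XY) = P(X = 1, Y = 1), then the remaining cells from the margins.

```python
from math import sqrt


def joint_cells(px, py, rho):
    """Joint probabilities for Bernoulli X, Y with P(X=1)=px, P(Y=1)=py, corr=rho.

    Returns a dict keyed by (x, y).
    """
    cov = rho * sqrt(px * (1 - px) * py * (1 - py))
    p11 = cov + px * py                  # E(XY) = P(X=1, Y=1)
    return {
        (1, 1): p11,
        (1, 0): px - p11,
        (0, 1): py - p11,
        (0, 0): 1 - px - py + p11,
    }


cells = joint_cells(0.5, 0.8, 0.3)
for (x, y), p in sorted(cells.items()):
    print(f"X={x} Y={y}: probability {p:.2f}, expected count {100 * p:.0f}")
```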


     (b)  Write a program which simulates 100 observations from (X, Y).



 [12]  data xysim ;

            n = 100 ;

            p00 = .16 ;
            p01 = .04 ;
            p10 = .34 ;
            p11 = .46 ;

            psum00 = p00 ;
            psum01 = p00 + p01 ;
            psum10 = p00 + p01 + p10 ;

            cell00 = 0 ;
            cell01 = 0 ;
            cell10 = 0 ;
            cell11 = 0 ;

            do i = 1 to n ;

               r = ranuni(-1) ;

               if      r < psum00 then do; X = 0; Y = 0 ; end ;
               else if r < psum01 then do; X = 1; Y = 0 ; end ;
               else if r < psum10 then do; X = 0; Y = 1 ; end ;
               else                    do; X = 1; Y = 1 ; end ;

               output ;

            end ;

     run ;

     proc corr ;
          var x y ;
     title1 'Correlation of simulated X and Y ...' ;
     run ;
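
     The same cumulative-probability method can be sketched in Python as a
     cross-check. The function name simulate_xy and the seed argument are
     hypothetical; the cutoffs .16, .20, and .54 are the cumulative sums of
     p00, p01, and p10 from the data step above.

```python
import random


def simulate_xy(n, seed=None):
    """Draw n (X, Y) pairs from the joint table of problem 5(a)."""
    rng = random.Random(seed)
    # (upper cumulative bound, (x, y)) in the order p00, p01, p10, p11
    cutoffs = [(0.16, (0, 0)), (0.20, (1, 0)), (0.54, (0, 1)), (1.00, (1, 1))]
    draws = []
    for _ in range(n):
        r = rng.random()                 # uniform on [0, 1)
        for bound, xy in cutoffs:
            if r < bound:                # first interval containing r
                draws.append(xy)
                break
    return draws


sample = simulate_xy(100)
print(sum(x for x, y in sample), sum(y for x, y in sample))   # counts of X=1 and Y=1
```

     With only 100 observations the sample correlation will vary noticeably
     around .3 from run to run; a larger n gives proportions closer to the
     table in part (a).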