October 27, 2005                                                      Page 1 of 4

PubH 7460 -  Fall 2005 - Exam 1          Name:___________________________________
=================================================================================

1.  X and Y are independent random variables, each with a uniform distribution on
    the interval [0, 1].  Let Z = max(X, Y).

    a)  What is the expectation of Z ?

        First, P(Z <= z) = P(X <= z)*P(Y <= z) = z^2 for 0 < z < 1, so
        the density of Z is f(z) = 2*z, 0 < z < 1.

[8]     The expectation is therefore the integral of 2*z*z from
        0 to 1.  This is 2/3.


    b)  What is the median of Z ?

        This is a little harder.  You need to find "m" such that the
        integral from 0 to m of f(z) is 1/2.  The integral of
        2*z from 0 to m is m^2.  Therefore m^2 = 1/2 and m = 1/sqrt(2),
        which is about 0.707.
[10]


    c)  Write a program (SAS, Splus, or R) which will simulate the distribution 
        of Z.  The program should include (1) a way to estimate the expectation 
        of Z, (2) a way to estimate the variance of Z, and (3) a way to estimate 
        the median of Z.


        data max ;

[9]          n = 1000 ;

             do i = 1 to n ;

                x = ranuni(-1) ;
                y = ranuni(-1) ;
                z = max(x, y) ;
                output ;               * one simulated value of z per row ;

             end ;

             keep z ;

        run ;

        proc univariate data = max ;
             var z ;
        run ;

        [Note: proc univariate reports the mean and the variance of z, and
         also the 50th percentile, i.e., the median.]
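
        As a cross-check on parts a) and b), here is a minimal sketch of the
        same simulation in Python (not one of the languages the exam asks
        for; it is used here only because it needs nothing beyond the
        standard library).  The function name, the seed, and the sample size
        are illustrative assumptions.  The comparison value 1/18 for the
        variance comes from E(Z^2) = integral of 2*z^3 from 0 to 1 = 1/2,
        so var(Z) = 1/2 - (2/3)^2.

```python
import random
import statistics


def simulate_max_uniform(n=100_000, seed=7460):
    """Draw n values of Z = max(X, Y), X and Y independent Uniform(0, 1).

    The seed and default n are arbitrary choices for this sketch.
    """
    rng = random.Random(seed)
    return [max(rng.random(), rng.random()) for _ in range(n)]


z = simulate_max_uniform()

mean_z = statistics.fmean(z)       # compare with 2/3
var_z = statistics.variance(z)     # compare with 1/18
median_z = statistics.median(z)    # compare with 1/sqrt(2)

print(f"mean = {mean_z:.4f}  variance = {var_z:.4f}  median = {median_z:.4f}")
```

        With a sample this large the three estimates should land close to
        2/3, 1/18, and 1/sqrt(2), although the exact digits depend on the
        seed.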


2.  Assume that X and Y are two Bernoulli random variables, X ~ Ber(r) and
    Y ~ Ber(s), where r and s are numbers between 0 and 1, and corr(X, Y) = rho.
    That is, if rho is not zero, then X and Y are not independent.

    a)  Write a program which simulates the bivariate distribution of X and Y.

        What this means is that the program should produce estimates of the
        cell-probabilities u, v, w and x for the following 2 x 2 table:


[15]              X = 0      X = 1
               ---------------------
              |          |          |
        Y = 0 |     u    |     v    |
              |          |          |
               ---------------------
              |          |          |
        Y = 1 |     w    |     x    |
              |          |          |
               ---------------------


     This part of the problem is harder than the second part, and
     essentially you have to solve the second part in order to do
     the first part.

     Note that rho = corr(X, Y) = cov(X, Y)/sqrt(var(X) * var(Y)).

     Note further that var(X) = r*(1 - r), var(Y) = s*(1 - s).

     Note that cov(X, Y) = E(X*Y) - E(X)*E(Y).

     E(X) = r and E(Y) = s.  Therefore


     rho*sqrt(r*(1-r)*s*(1-s)) + r*s = E(X*Y).

     Finally, X*Y is also a Bernoulli random variable, and its
     expectation is the value in the lower right-hand cell in
     the diagram above, i.e., "x".  Therefore

         x = rho*sqrt(r*(1-r)*s*(1-s)) + r*s.

     From this it is easy to solve for u, v, and w in the
     diagram above, using the margins of the table:
     v = r - x, w = s - x, and u = 1 - v - w - x.

     To simulate counts in these four cells:

        data cell4 ;

             r = .3 ; s = .6 ; rho = .5 ;   * assumed example values (part b) ;

             x = rho*sqrt(r*(1-r)*s*(1-s)) + r*s ;
             w = s - x ;
             v = r - x ;
             u = 1 - v - w - x ;

             cell1 = 0 ; cell2 = 0 ; cell3 = 0 ; cell4 = 0 ;

             n = 1000 ;

             do i = 1 to n ;

                p = ranuni(-1) ;
                if      p < u         then cell1 = cell1 + 1 ;
                else if p < u + v     then cell2 = cell2 + 1 ;
                else if p < u + v + w then cell3 = cell3 + 1 ;
                else                       cell4 = cell4 + 1 ;

             end ;

             * estimated cell probabilities ;
             uhat = cell1 / n ;  vhat = cell2 / n ;
             what = cell3 / n ;  xhat = cell4 / n ;

             output ;

        run ;


    b)  Suppose r = .3, s = .6, and rho = .5.  What is the true value of x ?


             x = rho*sqrt(r*(1-r)*s*(1-s)) + r*s.

[10]           = .5*sqrt(.3*.7*.6*.4) + .18 = .5*sqrt(.0504) + .18 = .2922
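
        The arithmetic in part b) and the cell-by-cell simulation in part a)
        can be checked together.  The following is a minimal sketch in
        Python (standard library only); the function names cell_probs and
        simulate_table, the seed, and the sample size are assumptions made
        for illustration and are not part of the exam.  The cells are filled
        from the margins of the table: v + x = r and w + x = s.

```python
import math
import random


def cell_probs(r, s, rho):
    """Return (u, v, w, x) for the 2 x 2 table of X ~ Ber(r), Y ~ Ber(s)."""
    x = rho * math.sqrt(r * (1 - r) * s * (1 - s)) + r * s  # P(X=1, Y=1)
    v = r - x          # P(X=1, Y=0)
    w = s - x          # P(X=0, Y=1)
    u = 1 - v - w - x  # P(X=0, Y=0)
    if min(u, v, w, x) < 0:
        raise ValueError("r, s, rho do not give a valid joint distribution")
    return u, v, w, x


def simulate_table(r, s, rho, n=100_000, seed=7460):
    """Estimate the four cell probabilities from n simulated draws."""
    u, v, w, x = cell_probs(r, s, rho)
    rng = random.Random(seed)  # seed is an arbitrary choice for this sketch
    counts = [0, 0, 0, 0]
    for _ in range(n):
        p = rng.random()
        if p < u:
            counts[0] += 1
        elif p < u + v:
            counts[1] += 1
        elif p < u + v + w:
            counts[2] += 1
        else:
            counts[3] += 1
    return [c / n for c in counts]


u, v, w, x = cell_probs(0.3, 0.6, 0.5)
estimates = simulate_table(0.3, 0.6, 0.5)

print(f"true x = {x:.4f}")
print("estimated (u, v, w, x) =", [round(e, 4) for e in estimates])
```

        Not every combination of r, s, and rho gives four non-negative
        cells, which is why the sketch raises an error rather than
        simulating from an invalid table.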



3.  This problem is concerned with estimating regression to the mean.  Assume you
    measure the blood pressure of a large group of N people on one occasion.  The 
    i-th person has a 'true' blood pressure BPi.  The population mean of all the 
    BPi is 80.  The standard deviation of the BPi (that is, the between-person
    standard deviation) is 10.  If you did repeated measurements of just one person's 
    blood pressure, the standard deviation of the measurement (i.e., the within-
    person standard deviation) is 6.

    After measuring the blood pressure of N people, you select those people whose
    measurement was bigger than or equal to 90.

    Later, you measure the BP of the people in this subset.

    a)  What can you say about the expectation of this second blood pressure measure-
        ment relative to the first one?  Why?

[10]


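        Smaller.  People are selected because their first measurement was
        high, so on average that measurement includes a positive measurement
        error.  The error in the second measurement is independent of the
        first and has mean zero, so the expected second measurement falls
        back toward the person's true BP, and hence toward the population
        mean of 80.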


    b)  Write a program (SAS, Splus, or R) which will simulate the difference between
        the first BP measurement and the second one in the subset described above.


        data regtomean ;

[15]         mu = 80 ;
             sigmabetween = 10 ;
             sigmawithin  = 6  ;

             n = 1000 ;

             do i = 1 to n ;

                truebp = mu + sigmabetween * rannor(-1) ;
                measbp1 = truebp + sigmawithin * rannor(-1) ;

                if measbp1 ge 90 then do ;

                   measbp2 = truebp + sigmawithin * rannor(-1) ;
                   diff12 = measbp1 - measbp2 ;

                   output ;

                end ;

             end ;

        run ;

        proc means data = regtomean ;
             var   measbp1 measbp2 diff12 ;
        title1 'Illustration of regression to the mean ...' ;
        run ;
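
        The same experiment can be sketched in Python (standard library
        only) as a cross-check.  The function name, seed, and sample size
        are illustrative assumptions.  The sketch also assumes, as the SAS
        program does, that both the true BP and the measurement error are
        normally distributed; under that assumption the mean excess over 80
        should shrink by roughly the factor 10^2/(10^2 + 6^2) = 100/136
        between the first and second measurement.

```python
import random
import statistics


def simulate_regression_to_mean(n=200_000, mu=80.0, sd_between=10.0,
                                sd_within=6.0, cutoff=90.0, seed=7460):
    """Return first and second measurements for people selected at visit 1.

    Normality of true BP and of measurement error is an assumption here.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true_bp = rng.gauss(mu, sd_between)
        bp1 = true_bp + rng.gauss(0.0, sd_within)
        if bp1 >= cutoff:  # selection uses the first measurement only
            first.append(bp1)
            second.append(true_bp + rng.gauss(0.0, sd_within))
    return first, second


first, second = simulate_regression_to_mean()

mean1 = statistics.fmean(first)
mean2 = statistics.fmean(second)
shrinkage = (mean2 - 80.0) / (mean1 - 80.0)  # compare with 100/136

print(f"selected = {len(first)}  mean1 = {mean1:.2f}  mean2 = {mean2:.2f}")
print(f"observed shrinkage = {shrinkage:.3f}")
```

        The second mean stays above 80 because the selected people do have
        higher true blood pressure on average; only the measurement-error
        part of the excess disappears.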


4.  Consider the equation

             x^3 = 4 * exp(-x) + 3

    a)  How do you know this equation has a solution for some positive real number x ?

        Let G(x) = x^3 - 4*exp(-x) - 3.

[10]    Note G(0) =  -7 < 0.

        Note G(2) = 8 - 4*exp(-2) - 3 > 8 - 4 - 3 = 1 > 0.

        Since G is a continuous function, there must be a value "a"
        between 0 and 2 such that G(a) = 0 [intermediate value theorem].


    b)  Write a program which uses Newton's method to find an approximate
        solution.


        data solution ;

             eps = 1e-8 ;
             x0 = 1.5 ;
             diff = 1 ;                  * start above eps so the loop runs ;

             do i = 1 to 50 while (diff gt eps) ;

                Gx  = x0**3 - 4*exp(-x0) - 3 ;
                Gpx = 3*x0**2 + 4*exp(-x0) ;

                x1 = x0 - Gx/Gpx ;
                diff = abs(x1 - x0) ;    * size of the Newton step ;

[13]            output ;

                x0 = x1 ;

             end ;

        run ;

        proc print data = solution ;
        title "Results of iterations to a solution of Gx = x**3 - 4*exp(-x) - 3 = 0" ;
        run ;
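
        For comparison, here is a minimal sketch of the same Newton
        iteration in Python (standard library only).  The function names
        and the choice to stop when the step size falls to eps or below are
        assumptions made for this sketch; the starting value 1.5, the
        tolerance 1e-8, and the limit of 50 iterations are taken from the
        program above.

```python
import math


def G(x):
    """G(x) = x^3 - 4*exp(-x) - 3; a root of G solves x^3 = 4*exp(-x) + 3."""
    return x ** 3 - 4 * math.exp(-x) - 3


def G_prime(x):
    """Derivative of G: 3*x^2 + 4*exp(-x), which is positive for all x."""
    return 3 * x ** 2 + 4 * math.exp(-x)


def newton(x0=1.5, eps=1e-8, max_iter=50):
    """Iterate x <- x - G(x)/G'(x) until the step is no larger than eps."""
    x = x0
    for i in range(1, max_iter + 1):
        x_new = x - G(x) / G_prime(x)
        if abs(x_new - x) <= eps:
            return x_new, i
        x = x_new
    raise RuntimeError("Newton's method did not converge")


root, iterations = newton()
print(f"root = {root:.8f} after {iterations} iterations, G(root) = {G(root):.2e}")
```

        Because G(0) < 0 < G(2) and the derivative is positive everywhere,
        the root found this way is the only real solution and lies between
        0 and 2.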