Grades:   100  96  85  83  70  65  63  58


October 28, 2008                                                      Page 1 of 5

PubH 7460 -  Fall 2008 - Exam 1          Name:___________________________________
=================================================================================

1.  A person throws darts at a map.  The map includes only the States of Missouri,
    Iowa and Minnesota.  The probability that the dart lands in State X
    is proportional to the area of State X.  Dart-throws where the dart misses
    the map entirely are not counted.  The areas of the states are:

        Missouri :  69,709 square miles
        Iowa     :  56,276 square miles
        Minnesota:  86,943 square miles

    a) Write a program which simulates 1,000 dart-throws.  The program should produce
    the simulated count of darts that land within each of the three States.


[13]
      data darts ;

           /* State areas in square miles */
           amissouri  = 69709 ;
           aiowa      = 56276 ;
           aminnesota = 86943 ;
           totalarea  = amissouri + aiowa + aminnesota ;

           /* Landing probabilities are proportional to area */
           pmissouri  = amissouri / totalarea ;
           piowa      = aiowa / totalarea ;
           pminnesota = aminnesota / totalarea ;

           nmissouri = 0 ; niowa = 0 ; nminnesota = 0 ;

           do i = 1 to 1000 ;

              /* One uniform draw on (0,1) decides which State is hit */
              r = ranuni(-1) ;
              if r < pmissouri then nmissouri = nmissouri + 1 ;
              else if r < pmissouri + piowa then niowa = niowa + 1 ;
              else nminnesota = nminnesota + 1 ;

           end ;
           output ;

      run ;

      proc print data = darts ;
           var nmissouri niowa nminnesota ;
      run ;


    b) Let M = the number of darts that land in Missouri, and I = number of darts
    that land in Iowa, where again you assume that 1,000 darts are thrown.   How would 
    you use the simulated data from your program to estimate the covariance of M and I ?

        Easiest answer: Let p1 = prob(dart lands in Missouri),
                            p2 = prob(dart lands in Iowa).
[12]

        Let Xi = 1 if the i-th dart lands in Missouri, 0 otherwise.

        Let Yi = 1 if the i-th dart lands in Iowa, 0 otherwise.

        Note that M = X1 + X2 + ... + X1000 and I = Y1 + Y2 + ... + Y1000.

        We assume that different throws are independent, so that Xi and Yj
        are independent whenever i <> j.

        COV(M, I) = E((X1 + X2 + ... + X1000)*(Y1 + Y2 + ... + Y1000)) - E(M) * E(I)

        Note that Xi * Yi = 0, because a single dart cannot land in both States.

        Therefore E((X1 + X2 + ... + X1000)*(Y1 + Y2 + ... + Y1000))

           = (Sum over i <> j of) E(Xi * Yj) = (Sum over i <> j of) E(Xi)E(Yj)

           = 1000 * 999 * p1 * p2.

        Also E(M) = 1000 * p1 and E(I) = 1000 * p2.  Therefore

        COV(M, I) = (1000 * 999) * p1 * p2 - 1000 * p1 * 1000 * p2

                  = - 1000 * p1 * p2.

        But M/1000 is an estimate of p1 and I/1000 is an estimate of p2, so
        the simulated counts give the estimate

        COV(M, I) = - 1000 * (M/1000) * (I/1000) = - M * I / 1000  (approx.).
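
        The formula above needs only one pair of simulated counts.  Another
        possible approach, sketched here under assumptions that are not part
        of the original answer, is to repeat the whole 1,000-throw experiment
        many times and take the sample covariance of the resulting (M, I)
        pairs.  The number of replicates (nrep = 5000) and all variable names
        are illustrative choices; covtheory is printed only for comparison
        with - 1000 * p1 * p2.

```sas
data dartcov ;

     /* Landing probabilities from the State areas given in the problem */
     totalarea = 69709 + 56276 + 86943 ;
     p1 = 69709 / totalarea ;     /* Missouri */
     p2 = 56276 / totalarea ;     /* Iowa     */

     nrep = 5000 ;                /* assumed number of replicate experiments */
     summ = 0 ; sumi = 0 ; summi = 0 ;

     do rep = 1 to nrep ;

        /* One experiment: 1,000 throws, counting Missouri and Iowa hits */
        m = 0 ; iowa = 0 ;
        do k = 1 to 1000 ;
           r = ranuni(-1) ;
           if r < p1 then m = m + 1 ;
           else if r < p1 + p2 then iowa = iowa + 1 ;
        end ;

        /* Running sums needed for the sample covariance */
        summ = summ + m ; sumi = sumi + iowa ; summi = summi + m * iowa ;

     end ;

     covsim    = (summi - summ * sumi / nrep) / (nrep - 1) ;
     covtheory = -1000 * p1 * p2 ;

     output ;

run ;

proc print data = dartcov ;
     var nrep covsim covtheory ;
run ;
```

        With enough replicates, covsim should settle near covtheory; the
        two will not agree exactly because covsim is itself a random quantity.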



2.  Assume X has a standard normal distribution (mean 0, variance 1).  Assume Y is
    the absolute value of X.

    a)  What is the CDF for Y [may be stated in terms of the CDF for X].

        For y >= 0, by the symmetry of the standard normal about 0,

        F_Y(y) = prob(Y < y) = prob(abs(X) < y) = 1 - 2 * prob(X < -y)
[5]
               = 1 - 2 * F_X(-y),

        and F_Y(y) = 0 for y < 0.


    b)  What is the pdf for Y ?

        f_Y(y) = 2 * f_X(-y) = 2 * f_X(y)  for y >= 0  (and 0 for y < 0),


        where f_X(x) is the pdf of the standard normal,

        f_X(x) = (1/sqrt(2*pi)) exp(-x^2/2).
[5]


    c)  What is the expectation of Y?

        E(Y) = integral from 0 to infinity of [2 * (1/sqrt(2*pi)) * y * exp(-y^2/2) dy]

             = (2 / sqrt(2*pi)) * [-exp(-y^2/2)] evaluated from 0 to infinity

             = 2 / sqrt(2*pi) = .798 approx.

[7]
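
        As a rough numerical check on part c), one could average abs(X)
        over many simulated standard normal draws.  This is only a sketch:
        the sample size (n = 100000) and the variable names are assumptions
        made for illustration.

```sas
data absnorm ;

     n = 100000 ;                 /* assumed simulation size */
     sumy = 0 ;

     do i = 1 to n ;
        y = abs(rannor(-1)) ;     /* Y = abs(X), X standard normal */
        sumy = sumy + y ;
     end ;

     meany = sumy / n ;                          /* simulated E(Y) */
     exact = 2 / sqrt(2 * constant('pi')) ;      /* value from part c) */

     output ;

run ;

proc print data = absnorm ;
     var n meany exact ;
run ;
```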


    d)  How might you use distribution functions in SAS or R to find 
        median(Y) ?


        The median of Y is the value of y such that F_Y(y) = 1/2.

        From part a),  let 1/2 = 1 - 2 * F_X(-y), so
[8]
        F_X(-y) = 1/4.  Therefore -y = (inverse of F_X)(1/4).

        In SAS, the inverse of the normal CDF is PROBIT; in R it is qnorm.

        Therefore y = -PROBIT(1/4) = .6745 approx. (in R, -qnorm(1/4)).
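
        A minimal sketch of that calculation in a SAS data step, with a
        check that the result really is the median.  The variable names y
        and fy are illustrative; PROBIT and PROBNORM are the standard normal
        quantile and CDF functions.

```sas
data _null_ ;

     y  = -probit(0.25) ;            /* median of Y = abs(X) */
     fy = 1 - 2 * probnorm(-y) ;     /* F_Y(y) from part a); should be 0.5 */

     put y= fy= ;                    /* written to the SAS log */

run ;
```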



3.  Given a set of 100 numbers  x_1, x_2, x_3, ..., x_100, the 5% Winsorized mean
    is defined by replacing the lowest 5 numbers by the 6th lowest number, and the
    highest 5 numbers by the 6th highest number, and then computing the mean of this
    modified set of numbers.

    a)  Why might someone compute a Winsorized mean instead of the usual mean?
        What is a disadvantage of the Winsorized mean?

        Advantage: it is less influenced by outliers, so it will in general
        give a more robust estimate of the true mean when the assumed
        model is incorrect.
[10]

        Disadvantage: the standard error computed from the Winsorized
        values will be underestimated.  Plus, for skewed distributions,
        the Winsorized mean may be a biased estimate of the true mean.


    b)  Write a program which will estimate the variance of the 5% Winsorized mean of
        a set of 100 numbers having an exponential distribution with hazard 0.5.

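        The following is a clever way to do this problem and is due to one of the
        students in the class.  It assumes that 'x100.file' holds the 100
        exponential values, one per line, sorted in increasing order:

[15]
        data xdata ;
             infile 'x100.file' end = endmark ;
             retain n 0 sumx 0 sumxx 0 ;
             input x ;

             n = n + 1 ;

             /* The 6th lowest value stands in for observations 1-6, and the */
             /* 6th highest value (n = 95) for observations 95-100.          */
             if n = 6 then do ; sumx = sumx + 6*x ; sumxx = sumxx + 6*x*x ; end ;
             if n > 6 and n < 95 then do ; sumx = sumx + x ; sumxx = sumxx + x*x ; end ;
             if n = 95 then do ; sumx = sumx + 6*x ; sumxx = sumxx + 6*x*x ; end ;

             /* Sample variance of the 100 Winsorized values, set on the     */
             /* last record.  Dividing it by 100 gives a naive estimate of   */
             /* the variance of the Winsorized mean.                         */
             if endmark = 1 then winsorvariance = (sumxx - sumx * sumx / 100)/ 99 ;

         run ;

         proc print data = xdata ;
         run ;
         endsas ;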
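
        The program above works from a single file of 100 values.  A
        simulation-based alternative, sketched here and not part of the
        original answer, generates many samples of 100 exponential values
        with hazard 0.5 (mean 2), computes the 5% Winsorized mean of each,
        and takes the sample variance of those means.  The number of
        replicates (nrep = 2000) and the variable names are assumptions, and
        the sketch assumes a SAS release that provides CALL SORTN.

```sas
data winsim ;

     array x[100] x1-x100 ;

     nrep = 2000 ;                       /* assumed number of replicates */
     sumw = 0 ; sumww = 0 ;

     do rep = 1 to nrep ;

        /* RANEXP has mean 1; dividing by the hazard 0.5 gives mean 2 */
        do j = 1 to 100 ;
           x[j] = ranexp(-1) / 0.5 ;
        end ;

        call sortn(of x[*]) ;            /* sort the sample in increasing order */

        /* Winsorized mean: x[6] replaces x[1]-x[5], x[95] replaces x[96]-x[100] */
        w = 6 * x[6] + 6 * x[95] ;
        do j = 7 to 94 ;
           w = w + x[j] ;
        end ;
        w = w / 100 ;

        sumw = sumw + w ; sumww = sumww + w * w ;

     end ;

     /* Sample variance of the nrep Winsorized means */
     varwmean = (sumww - sumw * sumw / nrep) / (nrep - 1) ;

     keep nrep varwmean ;
     output ;

run ;

proc print data = winsim ;
run ;
```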



4.  Assume X and Y are independent random variables, both having standard normal
    distributions (that is, mean 0 and variance 1).  Let T be the linear transformation
    defined by:

              | X |     | U |     | 2*X - 5*Y |
            T |   |  =  |   |  =  |           |
              | Y |     | V |     |  X + 3*Y  |

    a)  Sketch the image:  T(unit square).  What is the area
        of the resulting figure?

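           Y |                             V |
             |                               |
             |                   T           |  Big Parallelogram    Area = 11.
[7]          |                  --->         |
             |                               |
             |                               |
            -|--------------                -|---------------
           0 |              X              0 |               U

        The image is the parallelogram with vertices (0,0), (2,1), (-3,4)
        and (-5,3).  Its area is abs(det T) * area(unit square)
        = abs(2*3 - (-5)*1) = 11.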
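
        One way to check the sketch numerically is to map the four corners
        of the unit square through T and apply the shoelace formula to the
        resulting polygon.  The data step below is an illustrative sketch;
        the array and variable names are assumptions, not part of the exam.

```sas
data tsquare ;

     /* Corners of the unit square, listed counterclockwise */
     array cx[4] _temporary_ (0 1 1 0) ;
     array cy[4] _temporary_ (0 0 1 1) ;
     array u[4] u1-u4 ;
     array v[4] v1-v4 ;

     /* Image of each corner under T */
     do k = 1 to 4 ;
        u[k] = 2 * cx[k] - 5 * cy[k] ;
        v[k] = cx[k] + 3 * cy[k] ;
     end ;

     /* Shoelace formula for the area of the image polygon */
     area = 0 ;
     do k = 1 to 4 ;
        kn = mod(k, 4) + 1 ;             /* index of the next corner */
        area = area + u[k] * v[kn] - u[kn] * v[k] ;
     end ;
     area = abs(area) / 2 ;

     det = 2 * 3 - (-5) * 1 ;            /* determinant of T, for comparison */

     keep u1-u4 v1-v4 area det ;
     output ;

run ;

proc print data = tsquare ;
run ;
```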

    b)  Find Var(U)  and  Cov(U, V).

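        Since X and Y are independent with variance 1, Cov(X, Y) = 0 and

        Var(U) = Var(2*X - 5*Y) = 2*2*Var(X) + 5*5*Var(Y) = 2*2 + 5*5 = 29.

        Cov(U, V) = Cov(2*X - 5*Y, X + 3*Y)

                  = 2*Var(X) + (6 - 5)*Cov(X, Y) - 15*Var(Y)

                  = 2 - 15 = -13.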
[8]


    c)  Write a simulation program in SAS which produces an estimate of
        Corr(U, V), based on a sample of size 10000.  Do not use a SAS procedure.

[10]
        data simuv ;

           n = 10000 ;

           sumu = 0 ; sumuu = 0 ; sumv = 0 ; sumvv = 0 ; sumuv = 0 ;

           do i = 1 to n ;

              /* Independent standard normal draws */
              x = rannor(-1) ; y = rannor(-1) ;

              u = 2 * x - 5 * y ;
              v = x + 3 * y ;

              /* Running sums, sums of squares and cross-products */
              sumu = sumu + u ; sumv = sumv + v ;
              sumuu = sumuu + u*u ; sumvv = sumvv + v*v ;
              sumuv = sumuv + u * v ;

           end ;

           /* Sample covariance and variances, each with divisor n - 1 */
           covuv = (sumuv - sumu * sumv / n) / (n - 1) ;

           varu = (sumuu - sumu * sumu / n) / (n - 1) ;

           varv = (sumvv - sumv * sumv / n) / (n - 1) ;

           corruv = covuv / sqrt(varu * varv) ;

           output ;

      run ;

      proc print data = simuv ;
           var n varu varv covuv corruv ;
      run ;