SPH 7460 Final Exam  December 22, 2007                                    page 1 of 6

Five Problems - 6 pages               Name: ______ANSWER KEY_________________________
=====================================================================================

1.  SAS PROC IML provides the function INV(A) to compute the inverse of a nonsingular n x n
    matrix A.

    SAS also provides the call-function

        call EIGEN(E, P, A)

    which computes the eigenvalues of A.  If A is n x n and symmetric, then the array E
    is an n x 1 array which contains the eigenvalues of A, and P is an n x n matrix whose
    columns are the eigenvectors of A.  By a theorem proved in homework,

        inverse(P) = transpose(P); that is,  P * P` = P` * P = I.


    a)  Suppose D is a 3 x 3 diagonal matrix, e.g.,

               | d1   0   0 |
               |            |
           D = | 0   d2   0 |
               |            |
               | 0    0  d3 |,

        where d1, d2, and d3 are nonzero.

        What is the determinant of D ?


[5]         det(D) = d1 * d2 * d3


     b) What is the inverse of D ?


               | 1/d1   0     0  |
               |                 |
      inv(D) = |  0    1/d2   0  |
               |                 |
               |  0     0   1/d3 |,
[5]


SPH 7460 Final Exam  December 22, 2007                                    page 2 of 6

                                      Name: _________________________________________
=====================================================================================
1., Continued.

     c) Explain using matrix equations why knowing E and P, and using the result in b)
        above, you can compute the inverse of an 3 x 3 nonsingular symmetric matrix A.
        Then write a PROC IML program to do this (without using the INV function).

                                     | e1    0    0 |
                                     |              |
        A * P = P * D,     where D = |  0   e2    0 |
                                     |              |
[20]                                 |  0    0   e3 |


        Therefore A = P * D * P`.

        Therefore inv(A) = inv(P * D * P`) = inv(P')*inv(D)*inv(P) = P * inv(D) * P`.

--------------------------------------------------------------------------------------

proc iml ;

     a = {1  2  4, 2  5  6, 4  6  3} ;

     call eigen(e, p, a) ;

     d = j(3, 3, 0) ; dinv = d ;

     d[1, 1] = e[1] ; dinv[1, 1] = 1/e[1] ;
     d[2, 2] = e[2] ; dinv[2, 2] = 1/e[2] ;
     d[3, 3] = e[3] ; dinv[3, 3] = 1/e[3] ;

     ainv = inv(a) ;
     pdinvpt = p * dinv * p` ;

     print "a, e, d, p, ainv, pdinvpt = " a  e d  p  ainv  pdinvpt ;

  quit ;

endsas ;

---------------------------------------------------------------------------------------


SPH 7460 Final Exam  December 22, 2007                                    page 3 of 6

                                      Name: _________________________________________
=====================================================================================

2.  Write a program which simulates a bivariate distribution for (X, Y) such that

    (1) X and Y have the following Bernoulli distributions:  X ~ Ber(.3), Y ~ Ber(.5).

    (2) Corr(X, Y) = 0.2

     
     First, note that the following probabilities apply in the 2 x 2 table:

                     X
                  0      1
              -------------------
              |        |        |
           0  |    a   | .5 - a |  .5
              |        |        |
       Y      --------------------
              |        |        |
           1  | .7 - a | a - .2 |  .5
              |        |        |
              -------------------
                  .7      .3     1.00

     Since CORR(X, Y) = COV(X, Y)/Sqrt(Var(X)*Var(Y)), we have

           CORR(X, Y) = (E(XY) - E(X)*E(Y))/sqrt(.3 * .7 * .5 * 5.), or
[30]
           CORR(X, Y) = ((a - .2) - .3 * .5)/sqrt(.0525) = (a - .35)/.229.

     Therefore      a - .35 = CORR(X, Y) * .229 = .2 * .229 = .0458,
     which implies  a = .3958.

 ---------------------------------------------------------------------------------

 data bernoulli;

      n = 1000 ; seed = 20071215 ;

      a = .3958 ; b = .5 - a ; c = .7 - a ; d = a - .2 ;

      apb = a + b ;
      apbpc = a + b + c ;

      do i = 1 to n ;

         x = 0 ; y = 0 ;

         r = ranuni(seed) ;

         if r > a     then do ; x = 1 ; y = 0 ; end ;
         if r > apb   then do ; x = 0 ; y = 1 ; end ;
         if r > apbpc then do ; x = 1 ; y = 1 ; end ;

         output ;

      end ;
 run ;

 proc freq data = bernoulli ;
      tables x * y ;
 title1 'Prob 2 Final Exam PubH 7460 Fall 2007: frequencies' ;
 run ;

 proc corr data = bernoulli pearson ;
      var x y ;
 title1 'Prob 2 Final Exam PubH 7460 Fall 2007: Correlations' ;
 run ;

 endsas ;

-------------------------------------------------------------------------------------

SPH 7460 Final Exam   December 22, 2007                                   page 4 of 6

                                      Name: _________________________________________
=====================================================================================

3.  Random variable X has a normal distribution with mean m and variance 1:

      X ~ N(m, 1).

    A laboratory instrument for measuring X can only measure values which are less
    than some constant c.  The other values of X are discarded.  Let Y be the
    values which are observed.

    The cumulative distribution (cdf) of Y, G(y), is defined by

        G(y) =  1 if y > c,  and

        G(y) =  F(y)/(F(c)  if y <= c,

    where F is the cdf of X.

    a)  What is the pdf of X ?  What is the pdf of Y (in terms of the pdf 
        and cdf of X)?

            f(x) = (1/sqrt(2*pi))*exp(-.5*(x - m)**2)
[10]
            g(y) = F'(y)/F(c) = f(y)/F(c)


    b)  What is the loglikelihood of a single observation of Y ?

            loglike = log(f(y)) - log(F(c)).

[10]        The tricky part of this is, F(c) is also a function of m.  That is,
            after a change of variable,

            F(c) = (1/sqrt(2*pi))*Integral[-infinity, c - m] exp(-.5 * x^2) dx, or

            F(c) = PHI(c - m), where PHI is the CDF for the standard normal distr.


    c)  Suppose Y1, Y2, ..., Y100 are 100 random observations of Y.  Write a
        program to compute the maximum likelihood estimate of m.

[20]


This was a difficult problem.  The following program, based on explicit
computation of 2nd derivatives, converges very quickly and is not very
sensitive to the starting value.

-----------------------------------------------------------------------------------
options linesize = 80 ;

 data ydata ;
      infile 'prob3.yvalues' ;
      input i y ;

 run ;

 proc iml ;

   use ydata ;

   read all var {y} into y ;
   n = nrow(y) ;

   beta = 2.81 ;
   c = 4 ;

*-------------------------------------------------------------------------- ;

   start loglike(n, beta, c, y, l, dl, d2l) ;

       l = 0 ;
      dl = 0 ;
     d2l = 0 ;

     pi = 4 * atan(1) ;

     do i = 1 to n ;

        fc = (1/sqrt(2*pi)) * exp(-.5*(c - beta)**2) ;
       fyi = (1/sqrt(2*pi)) * exp(-.5*(y[i] - beta)**2) ;
       Fcm = probnorm(c - beta) ;

         l  =   l - .5*(y[i] - beta)**2 - log(Fcm) - log(sqrt(2*pi)) ;
        dl  =  dl + (y[i] - beta) + fc / Fcm ;
       d2li = -1 + fc * ((c - beta)) / Fcm + fc * fc / (Fcm * Fcm) ;
       d2l  = d2l + d2li ;

     end ;

     finish loglike ;
*---------------------------------------------------------------------------------;
  eps = 1e-7 ;
  diff = 1 ;

  do iter = 1 to 300 while (diff > eps) ;

     run loglike(n, beta, c, y, l, dl, d2l) ;

     invd2l = 1/d2l ;
     beta = beta - invd2l * dl ;

     diff = abs(d2l * dl) ;

     print iter l dl d2l diff beta ;

  end ;

  mlestm = beta ;

  print 'Prob 3 Final PubH 7460 Fall 2007' ;
  print 'ML Estimate of m: ' mlestm ;

  quit ;


-------------------------------------------------------------------------------------

SPH 7460 Final Exam  December 22, 2007                                    page 5 of 6

                                      Name: _________________________________________
=====================================================================================

4.  Assume X and Y both have the uniform distribution on the interval [0, 1], and
    are independent.

    Write a program which simulates the probability that .20 < X*Y < .25.


      data region ;

           n = 1000 ;
[20]       count = 0 ;

           do i = 1 to n ;

              x = ranuni(-1) ;
              y = ranuni(-1) ;

              if x*y > .2 and x*y < .25 then count = count + 1 ;

           end ;

           estprob = count / n ;

           output ;
 run ;

 proc print data = region ;
 title1 'Prob 4 Final Exam PubH 7460 Fall 2007' ;
 title2 'The variable estprob is the estimated probability' ;
 run ;


SPH 7460 Final Exam  December 22, 2007                                    page 6 of 6

                                      Name: _________________________________________
=====================================================================================

5.  To say that "X has a lognormal distribution" means that log(X) has a normal
    distribution.  More completely, you would say that "X has a lognormal distribution
    with parameters mu and sigma^2" if

         log(X) ~ N(mu, sigma^2).

    Write a macro in SAS which produces a graph of the CDF and the PDF for X,
    A call to the macro should look like:

          %graphlogn(mu, s2) ;

[30]

    The key thing to note with this problem is that

        prob(X < x) = prob(log(X) < log(x)) = prob(Y < log(x)),
        where Y has distribution N(mu, s2).

        Therefore the cdf of x is  probnorm((log(x) - mu)/s).

        You then find the pdf by estimating a derivative of the cdf.

*---------------------------------------------------------------------------------;

options linesize = 80 ;
footnote "~john-c/5421/prob5.f2007.sas &sysdate &systime" ;

FILENAME GRAPH 'gsas.grf' ;
LIBNAME  loc '' ;

OPTIONS  LINESIZE = 80 MPRINT ;

GOPTIONS
         RESET = GLOBAL
         ROTATE = PORTRAIT
         FTEXT = SWISSB
         DEVICE = PSCOLOR
         GACCESS = SASGASTD
         GSFNAME = GRAPH
         GSFMODE = REPLACE
         GUNIT = PCT BORDER
         CBACK = WHITE
         HTITLE = 2 HTEXT = 1 ;

*===================================================================== ;        

    %macro graphlogn(mu, s2) ;

       data graph ;

         n = 500 ;

         h = 1e-6 ;
         s = sqrt(&s2) ;
         lowery = &mu - 2 * s ;
         uppery = &mu + 2 * s ;

         do i = 1  to n ;

            y = lowery + (uppery - lowery) * i / n ;
            x = exp(y) ;
            yh = log(x + h) ;
            cdfx = probnorm((y - &mu)/s) ;
            cdfxh = probnorm((yh - &mu)/s) ;
            pdfx = (cdfxh - cdfx)/h ;

            output ;

         end ;

   run ;

   symbol1 i = j v = none l = 1 ;
   symbol2 i = j v = none l = 3 ;

   proc gplot data = graph ;
        plot cdfx * x  pdfx * x / overlay ;
   title1 h = 2 'Prob 5  Final Exam PubH 7460 Fall 2007' ;
   title2 h = 2 "CDF and PDF of lognormal ...mu = &mu s2 = &s2" ;
   run ;

 %mend ;

 %graphlogn(0, 1) ;

 endsas ;