SPH 7460 Final Exam December 22, 2007 page 1 of 6 Five Problems - 6 pages Name: ______ANSWER KEY_________________________ ===================================================================================== 1. SAS PROC IML provides the function INV(A) to compute the inverse of a nonsingular n x n matrix A. SAS also provides the call-function call EIGEN(E, P, A) which computes the eigenvalues of A. If A is n x n and symmetric, then the array E is an n x 1 array which contains the eigenvalues of A, and P is an n x n matrix whose columns are the eigenvectors of A. By a theorem proved in homework, inverse(P) = transpose(P); that is, P * P` = P` * P = I. a) Suppose D is a 3 x 3 diagonal matrix, e.g., | d1 0 0 | | | D = | 0 d2 0 | | | | 0 0 d3 |, where d1, d2, and d3 are nonzero. What is the determinant of D ? [5] det(D) = d1 * d2 * d3 b) What is the inverse of D ? | 1/d1 0 0 | | | inv(D) = | 0 1/d2 0 | | | | 0 0 1/d3 |, [5] SPH 7460 Final Exam December 22, 2007 page 2 of 6 Name: _________________________________________ ===================================================================================== 1., Continued. c) Explain using matrix equations why knowing E and P, and using the result in b) above, you can compute the inverse of an 3 x 3 nonsingular symmetric matrix A. Then write a PROC IML program to do this (without using the INV function). | e1 0 0 | | | A * P = P * D, where D = | 0 e2 0 | | | [20] | 0 0 e3 | Therefore A = P * D * P`. Therefore inv(A) = inv(P * D * P`) = inv(P')*inv(D)*inv(P) = P * inv(D) * P`. -------------------------------------------------------------------------------------- proc iml ; a = {1 2 4, 2 5 6, 4 6 3} ; call eigen(e, p, a) ; d = j(3, 3, 0) ; dinv = d ; d[1, 1] = e[1] ; dinv[1, 1] = 1/e[1] ; d[2, 2] = e[2] ; dinv[2, 2] = 1/e[2] ; d[3, 3] = e[3] ; dinv[3, 3] = 1/e[3] ; ainv = inv(a) ; pdinvpt = p * dinv * p` ; print "a, e, d, p, ainv, pdinvpt = " a e d p ainv pdinvpt ; quit ; endsas ; --------------------------------------------------------------------------------------- SPH 7460 Final Exam December 22, 2007 page 3 of 6 Name: _________________________________________ ===================================================================================== 2. Write a program which simulates a bivariate distribution for (X, Y) such that (1) X and Y have the following Bernoulli distributions: X ~ Ber(.3), Y ~ Ber(.5). (2) Corr(X, Y) = 0.2 First, note that the following probabilities apply in the 2 x 2 table: X 0 1 ------------------- | | | 0 | a | .5 - a | .5 | | | Y -------------------- | | | 1 | .7 - a | a - .2 | .5 | | | ------------------- .7 .3 1.00 Since CORR(X, Y) = COV(X, Y)/Sqrt(Var(X)*Var(Y)), we have CORR(X, Y) = (E(XY) - E(X)*E(Y))/sqrt(.3 * .7 * .5 * 5.), or [30] CORR(X, Y) = ((a - .2) - .3 * .5)/sqrt(.0525) = (a - .35)/.229. Therefore a - .35 = CORR(X, Y) * .229 = .2 * .229 = .0458, which implies a = .3958. --------------------------------------------------------------------------------- data bernoulli; n = 1000 ; seed = 20071215 ; a = .3958 ; b = .5 - a ; c = .7 - a ; d = a - .2 ; apb = a + b ; apbpc = a + b + c ; do i = 1 to n ; x = 0 ; y = 0 ; r = ranuni(seed) ; if r > a then do ; x = 1 ; y = 0 ; end ; if r > apb then do ; x = 0 ; y = 1 ; end ; if r > apbpc then do ; x = 1 ; y = 1 ; end ; output ; end ; run ; proc freq data = bernoulli ; tables x * y ; title1 'Prob 2 Final Exam PubH 7460 Fall 2007: frequencies' ; run ; proc corr data = bernoulli pearson ; var x y ; title1 'Prob 2 Final Exam PubH 7460 Fall 2007: Correlations' ; run ; endsas ; ------------------------------------------------------------------------------------- SPH 7460 Final Exam December 22, 2007 page 4 of 6 Name: _________________________________________ ===================================================================================== 3. Random variable X has a normal distribution with mean m and variance 1: X ~ N(m, 1). A laboratory instrument for measuring X can only measure values which are less than some constant c. The other values of X are discarded. Let Y be the values which are observed. The cumulative distribution (cdf) of Y, G(y), is defined by G(y) = 1 if y > c, and G(y) = F(y)/(F(c) if y <= c, where F is the cdf of X. a) What is the pdf of X ? What is the pdf of Y (in terms of the pdf and cdf of X)? f(x) = (1/sqrt(2*pi))*exp(-.5*(x - m)**2) [10] g(y) = F'(y)/F(c) = f(y)/F(c) b) What is the loglikelihood of a single observation of Y ? loglike = log(f(y)) - log(F(c)). [10] The tricky part of this is, F(c) is also a function of m. That is, after a change of variable, F(c) = (1/sqrt(2*pi))*Integral[-infinity, c - m] exp(-.5 * x^2) dx, or F(c) = PHI(c - m), where PHI is the CDF for the standard normal distr. c) Suppose Y1, Y2, ..., Y100 are 100 random observations of Y. Write a program to compute the maximum likelihood estimate of m. [20] This was a difficult problem. The following program, based on explicit computation of 2nd derivatives, converges very quickly and is not very sensitive to the starting value. ----------------------------------------------------------------------------------- options linesize = 80 ; data ydata ; infile 'prob3.yvalues' ; input i y ; run ; proc iml ; use ydata ; read all var {y} into y ; n = nrow(y) ; beta = 2.81 ; c = 4 ; *-------------------------------------------------------------------------- ; start loglike(n, beta, c, y, l, dl, d2l) ; l = 0 ; dl = 0 ; d2l = 0 ; pi = 4 * atan(1) ; do i = 1 to n ; fc = (1/sqrt(2*pi)) * exp(-.5*(c - beta)**2) ; fyi = (1/sqrt(2*pi)) * exp(-.5*(y[i] - beta)**2) ; Fcm = probnorm(c - beta) ; l = l - .5*(y[i] - beta)**2 - log(Fcm) - log(sqrt(2*pi)) ; dl = dl + (y[i] - beta) + fc / Fcm ; d2li = -1 + fc * ((c - beta)) / Fcm + fc * fc / (Fcm * Fcm) ; d2l = d2l + d2li ; end ; finish loglike ; *---------------------------------------------------------------------------------; eps = 1e-7 ; diff = 1 ; do iter = 1 to 300 while (diff > eps) ; run loglike(n, beta, c, y, l, dl, d2l) ; invd2l = 1/d2l ; beta = beta - invd2l * dl ; diff = abs(d2l * dl) ; print iter l dl d2l diff beta ; end ; mlestm = beta ; print 'Prob 3 Final PubH 7460 Fall 2007' ; print 'ML Estimate of m: ' mlestm ; quit ; ------------------------------------------------------------------------------------- SPH 7460 Final Exam December 22, 2007 page 5 of 6 Name: _________________________________________ ===================================================================================== 4. Assume X and Y both have the uniform distribution on the interval [0, 1], and are independent. Write a program which simulates the probability that .20 < X*Y < .25. data region ; n = 1000 ; [20] count = 0 ; do i = 1 to n ; x = ranuni(-1) ; y = ranuni(-1) ; if x*y > .2 and x*y < .25 then count = count + 1 ; end ; estprob = count / n ; output ; run ; proc print data = region ; title1 'Prob 4 Final Exam PubH 7460 Fall 2007' ; title2 'The variable estprob is the estimated probability' ; run ; SPH 7460 Final Exam December 22, 2007 page 6 of 6 Name: _________________________________________ ===================================================================================== 5. To say that "X has a lognormal distribution" means that log(X) has a normal distribution. More completely, you would say that "X has a lognormal distribution with parameters mu and sigma^2" if log(X) ~ N(mu, sigma^2). Write a macro in SAS which produces a graph of the CDF and the PDF for X, A call to the macro should look like: %graphlogn(mu, s2) ; [30] The key thing to note with this problem is that prob(X < x) = prob(log(X) < log(x)) = prob(Y < log(x)), where Y has distribution N(mu, s2). Therefore the cdf of x is probnorm((log(x) - mu)/s). You then find the pdf by estimating a derivative of the cdf. *---------------------------------------------------------------------------------; options linesize = 80 ; footnote "~john-c/5421/prob5.f2007.sas &sysdate &systime" ; FILENAME GRAPH 'gsas.grf' ; LIBNAME loc '' ; OPTIONS LINESIZE = 80 MPRINT ; GOPTIONS RESET = GLOBAL ROTATE = PORTRAIT FTEXT = SWISSB DEVICE = PSCOLOR GACCESS = SASGASTD GSFNAME = GRAPH GSFMODE = REPLACE GUNIT = PCT BORDER CBACK = WHITE HTITLE = 2 HTEXT = 1 ; *===================================================================== ; %macro graphlogn(mu, s2) ; data graph ; n = 500 ; h = 1e-6 ; s = sqrt(&s2) ; lowery = &mu - 2 * s ; uppery = &mu + 2 * s ; do i = 1 to n ; y = lowery + (uppery - lowery) * i / n ; x = exp(y) ; yh = log(x + h) ; cdfx = probnorm((y - &mu)/s) ; cdfxh = probnorm((yh - &mu)/s) ; pdfx = (cdfxh - cdfx)/h ; output ; end ; run ; symbol1 i = j v = none l = 1 ; symbol2 i = j v = none l = 3 ; proc gplot data = graph ; plot cdfx * x pdfx * x / overlay ; title1 h = 2 'Prob 5 Final Exam PubH 7460 Fall 2007' ; title2 h = 2 "CDF and PDF of lognormal ...mu = &mu s2 = &s2" ; run ; %mend ; %graphlogn(0, 1) ; endsas ;