October 27, 2005                                                    Page 1 of 4

PubH 7460 - Fall 2005 - Exam 1          Name:___________________________________
=================================================================================

1. X and Y are independent random variables, each with a uniform distribution
   on the interval [0, 1]. Let Z = max(X, Y).

   a) What is the expectation of Z ?

      First, the density of Z is f(z) = 2*z, 0 < z < 1.                     [8]
      (This follows because P(Z <= z) = P(X <= z)*P(Y <= z) = z^2.)
      The expectation is therefore the integral of 2*z*z from 0 to 1.
      This is 2/3.

   b) What is the median of Z ?

      This is a little harder. You need to find "m" such that the integral
      from 0 to m of f(z) is 1/2. The integral of 2*z from 0 to m is m^2.
      Therefore m = 1/sqrt(2).                                             [10]

   c) Write a program (SAS, Splus, or R) which will simulate the distribution
      of Z. The program should include (1) a way to estimate the expectation
      of Z, (2) a way to estimate the variance of Z, and (3) a way to estimate
      the median of Z.

      data max ;                                                            [9]
        n = 1000 ;
        sum = 0 ;
        do i = 1 to n ;
          x = ranuni(-1) ;
          y = ranuni(-1) ;
          z = max(x, y) ;
          sum = sum + z ;
          output ;
        end ;
        meanz = sum / n ;
        put meanz= ;    /* meanz is computed after the last output, so write it to the log */
      run ;

      proc univariate data = max ;
        var z ;
      run ;

      [Note: proc univariate reports the mean, the variance, and the 50th
       percentile, i.e., the median.]

2. Assume that X and Y are two Bernoulli random variables, X --> Ber(r) and
   Y --> Ber(s), where r and s are numbers between 0 and 1, and
   corr(X, Y) = rho. That is, if rho is not zero, then X and Y are not
   independent.

   a) Write a program which simulates the bivariate distribution of X and Y.
      What this means is that the program should produce estimates of the
      cell-probabilities u, v, w and x for the following 2 x 2 table:      [15]

                     X = 0      X = 1
                  ---------------------
                  |         |         |
            Y = 0 |    u    |    v    |
                  |         |         |
                  ---------------------
                  |         |         |
            Y = 1 |    w    |    x    |
                  |         |         |
                  ---------------------

      This part of the problem is harder than the second part, and
      essentially you have to solve the second part in order to do the
      first part.

      Note that rho = corr(X, Y) = cov(X, Y)/sqrt(var(X) * var(Y)).
      Note further that var(X) = r*(1 - r), var(Y) = s*(1 - s).
      Note that cov(X, Y) = E(X*Y) - E(X)*E(Y), where E(X) = r and E(Y) = s.
      Therefore rho*sqrt(r*(1-r)*s*(1-s)) + r*s = E(X*Y).

      Finally, X*Y is also a Bernoulli random variable, and its expectation
      is the value in the lower right-hand cell in the diagram above, i.e.,
      "x". Therefore

         x = rho*sqrt(r*(1-r)*s*(1-s)) + r*s.

      From this it is easy to solve for u, v, and w in the diagram above,
      using the margins: w = s - x, v = r - x, and u = (1 - r) - w.

      To simulate counts in these four cells:

      data cell4 ;
        r = .3 ; s = .6 ; rho = .5 ;    /* parameter values taken from part b) */
        x = rho*sqrt(r*(1-r)*s*(1-s)) + r*s ;
        w = s - x ;
        v = r - x ;
        u = (1-r) - w ;
        cell1 = 0 ; cell2 = 0 ; cell3 = 0 ; cell4 = 0 ;
        n = 1000 ;
        do i = 1 to n ;
          p = ranuni(-1) ;
          if p < u then cell1 = cell1 + 1 ;
          else if p < u + v then cell2 = cell2 + 1 ;
          else if p < u + v + w then cell3 = cell3 + 1 ;
          else cell4 = cell4 + 1 ;
        end ;
        output ;
      run ;

   b) Suppose r = .3, s = .6, and rho = .5. What is the true value of x ?

      x = rho*sqrt(r*(1-r)*s*(1-s)) + r*s                                  [10]
        = .5*sqrt(.3*.7*.6*.4) + .18
        = .5*sqrt(.0504) + .18
        = .2922

3. This problem is concerned with estimating regression to the mean. Assume
   you measure the blood pressure of a large group of N people on one
   occasion. The i-th person has a 'true' blood pressure BPi. The population
   mean of all the BPi is 80.
   The standard deviation of the BPi (that is, the between-person standard
   deviation) is 10. If you did repeated measurements of just one person's
   blood pressure, the standard deviation of the measurement (i.e., the
   within-person standard deviation) is 6. After measuring the blood pressure
   of N people, you select those people whose measurement was bigger than or
   equal to 90. Later, you measure the BP of the people in this subset.

   a) What can you say about the expectation of this second blood pressure
      measurement relative to the first one? Why?                          [10]

      Smaller. People selected because their first measurement was at least
      90 tend to have a positive measurement error on that occasion, while
      the error in the second measurement is independent with mean zero, so
      on average the second measurement falls back toward the true BP.

   b) Write a program (SAS, Splus, or R) which will simulate the difference
      between the first BP measurement and the second one in the subset
      described above.

      data regtomean ;                                                     [15]
        mu = 80 ;
        sigmabetween = 10 ;
        sigmawithin = 6 ;
        n = 1000 ;
        do i = 1 to n ;
          truebp = mu + sigmabetween * rannor(-1) ;
          measbp1 = truebp + sigmawithin * rannor(-1) ;
          if measbp1 ge 90 then do ;
            measbp2 = truebp + sigmawithin * rannor(-1) ;
            diff12 = measbp1 - measbp2 ;
            output ;
          end ;
        end ;
      run ;

      proc means data = regtomean ;
        var measbp1 measbp2 diff12 ;
        title1 'Illustration of regression to the mean ...' ;
      run ;

4. Consider the equation

      x^3 = 4 * exp(-x) + 3

   a) How do you know this equation has a solution for some positive real
      number x ?

      Let G(x) = x^3 - 4*exp(-x) - 3.                                      [10]
      Note G(0) = -7 < 0.
      Note G(2) = 8 - 4*exp(-2) - 3 > 8 - 4 - 3 = 1 > 0.
      Since G is a continuous function, there must be a value "a" between
      0 and 2 such that G(a) = 0 [intermediate value theorem].

   b) Write a program which uses Newton's method to find an approximate
      solution.

      data solution ;
        eps = 1e-8 ;
        x0 = 1.5 ;
        diff = 1 ;    /* start above eps so the loop is entered */
        do i = 1 to 50 while (diff gt eps) ;
          Gx = x0**3 - 4*exp(-x0) - 3 ;
          Gpx = 3*x0**2 + 4*exp(-x0) ;    /* derivative of G at x0 */
          x1 = x0 - Gx/Gpx ;                                               [13]
          diff = abs(x1 - x0) ;           /* size of the Newton step */
          output ;
          x0 = x1 ;
        end ;
        output ;
      run ;

      proc print data = solution ;
        title "Results of iterations to a solution of Gx = x**3 - 4*exp(-x) - 3 = 0" ;
      run ;
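
      As a quick numerical cross-check of problem 4, the same Newton iteration
      can be sketched in Python. Python is not one of the languages the exam
      asks for, and the names G, G_prime and newton are hypothetical, chosen
      only for this illustration. The starting value 1.5, the tolerance 1e-8
      and the cap of 50 iterations are the ones used in the SAS program. The
      derivative is G'(x) = 3*x^2 + 4*exp(-x).

```python
import math


def G(x):
    """G(x) = x^3 - 4*exp(-x) - 3, the function whose root is sought."""
    return x**3 - 4.0 * math.exp(-x) - 3.0


def G_prime(x):
    """Derivative of G: 3*x^2 + 4*exp(-x)."""
    return 3.0 * x**2 + 4.0 * math.exp(-x)


def newton(x0=1.5, eps=1e-8, max_iter=50):
    """Iterate x1 = x0 - G(x0)/G'(x0) until the step is at most eps.

    Returns the last iterate and the number of iterations performed.
    """
    x = x0
    for i in range(1, max_iter + 1):
        x_new = x - G(x) / G_prime(x)
        if abs(x_new - x) <= eps:
            return x_new, i
        x = x_new
    return x, max_iter


root, iterations = newton()
print("approximate root:", root)
print("iterations used: ", iterations)
print("G(root):         ", G(root))
```

      The printed root should lie between 0 and 2, in agreement with the
      intermediate value argument in part a), and G(root) should be very
      close to zero.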