Grades on this exam: 70 73 76 83 89 89 93 93 94 94 96 96 97 99

November 6, 2007                                                     Page 1 of 6
PubH 7460 - Fall 2007 - Exam 1          Name:___________________________________
=================================================================================

1.  Assume you want to create a randomization schedule for a clinical trial
    which will have 50 patients assigned to group R and 50 assigned to group B.
    One way to create such a schedule would be to put 50 Red marbles and 50
    Blue marbles in a jar, and then, when you find a patient who is eligible
    for the trial, randomly take a marble from the jar. If it is Red, assign
    the patient to group R, and if it is Blue, assign the patient to group B.

    a)  What are advantages and disadvantages of this kind of randomization
        schedule?                                                       [12]

        Advantages:

        1. Achieves overall balance if all the randomization assignments
           are used.
        2. Is usually relatively unpredictable until you get near the end.

        Disadvantages:

        1. Can have long runs of the same treatment assignment.
        2. The last assignment is 100% predictable. If there is a large
           imbalance near the end, the last several assignments are
           predictable.
        3. If it is actually implemented with a jar of marbles, it would be
           possible to cheat by looking at the marbles, or by putting a
           drawn marble back in and trying again if you don't like the
           treatment assignment.

    b)  Write a program which would simulate creating a randomization
        schedule based on this idea.                                    [13]

        data marbles ;
           seed = 20071106 ;
           n = 50 ;
           length assign $1 ;
           /* one record per marble, each with a uniform random key */
           do i = 1 to n ;
              assign = 'R' ; rand = ranuni(seed) ; output ;
              assign = 'B' ; rand = ranuni(seed) ; output ;
           end ;
        run ;

        /* sorting by the random key puts the 100 marbles in random order */
        proc sort data = marbles ; by rand ; run ;

        proc print data = marbles ;
           var rand assign ;
           title1 'Treatment Assignments: 50 to group R, 50 to group B.' ;
        run ;

2.
    Let X have a Bernoulli distribution with parameter 1/3, and let Y have a
    Bernoulli distribution with parameter 2/3.

    a)  Describe a sampling experiment and a statistical procedure to test
        whether X and Y are independent (this is not a computing question).
                                                                         [9]

        Sample pairs of X's and Y's and collect the data in a 2 x 2 table
        as follows (N = a + b + c + d is the number of pairs):

                             X
                         0       1
                     -----------------
                   0 |   a   |   b   |
              Y      -----------------
                   1 |   c   |   d   |
                     -----------------
                                        N

        Then compute the p-value from the Fisher Exact Test or the corrected
        chi-square test.

    b)  Assume X and Y as described above are independent. Let Z = X + Y.
        Write a program which simulates the distribution of Z.           [8]

        data sums ;
           n = 1000 ;
           seed = today() ;
           do i = 1 to n ;
              rx = ranuni(seed) ;
              ry = ranuni(seed) ;
              x = 0 ;
              y = 0 ;
              if rx < 1/3 then x = 1 ;   /* prob(X = 1) = 1/3 */
              if ry < 2/3 then y = 1 ;   /* prob(Y = 1) = 2/3 */
              z = x + y ;
              output ;
           end ;
        run ;

        proc freq data = sums ;
           tables z ;
           title1 'Distribution of the Sum of Two Bernoulli Random Variables' ;
        run ;

    c)  Assume X is Bernoulli(.5) and Y is Bernoulli(.5), and X and Y have a
        correlation of .3. Write a program to simulate the bivariate
        distribution of X and Y.                                         [8]

        Consider the following 2 x 2 table of probabilities:

                               X
                          0          1
                     ---------------------
                   0 |    p     | .5 - p |   .5
              Y      ---------------------
                   1 |  .5 - p  |   p    |   .5
                     ---------------------
                         .5         .5       1.0

        What you need to know here is the value of p. You are given that
        corr(X, Y) = .3. But

            corr(X, Y) = cov(X, Y) / sqrt(var(X) * var(Y)).

            var(X) = var(Y) = .5 * .5 = .25, so sqrt(var(X) * var(Y)) = .25.

            cov(X, Y) = E(X*Y) - E(X) * E(Y) = p - .5 * .5 = p - .25.

        Therefore (p - .25) / .25 = .3. Solve this for p:

            p = .3 * .25 + .25 = .325.
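        The arithmetic above can be checked with a short data step. This is
        only a sketch: the variable names cov and corr are chosen here for
        illustration and are not part of the original answer.

```sas
data _null_ ;
   p = .325 ;
   cov  = p - .5 * .5 ;             /* cov(X, Y) = E(X*Y) - E(X) * E(Y)  */
   corr = cov / sqrt(.25 * .25) ;   /* var(X) = var(Y) = .25             */
   put p= cov= corr= ;              /* writes the values to the SAS log  */
run ;
```

        The value of corr written to the log should agree with the
        correlation of .3 specified in the question.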
        data bivar ;
           p = .325 ;
           p00 = .325 ;   /* prob(X = 0, Y = 0) = p      */
           p01 = .175 ;   /* prob(X = 0, Y = 1) = .5 - p */
           p10 = .175 ;   /* prob(X = 1, Y = 0) = .5 - p */
           p11 = .325 ;   /* prob(X = 1, Y = 1) = p      */
           n = 1000 ;
           seed = 987654321 ;
           do i = 1 to n ;
              r = ranuni(seed) ;
              x = 0 ; y = 0 ;     /* default cell: r < p00 */
              if p00 <= r < p00 + p01 then do ; x = 0 ; y = 1 ; end ;
              if p00 + p01 <= r < p00 + p01 + p10 then do ; x = 1 ; y = 0 ; end ;
              if p00 + p01 + p10 <= r then do ; x = 1 ; y = 1 ; end ;
              output ;
           end ;
        run ;

        proc freq data = bivar ;
           tables x * y ;
           title1 'Bivariate distribution of 1000 observations of correlated Bernoulli' ;
           title2 'variables X and Y, with corr(X, Y) = .3, prob(X = 1) = prob(Y = 1) = .5' ;
        run ;

3.  You have 14 coins in your pocket: 3 pennies, 2 nickels, 4 dimes, and 5
    quarters. You want to buy a bar of soap in the drugstore. The cashier
    says, "That will be 69 cents." You reach into your pocket and take out
    4 coins at random.

    a)  What is the probability that the four coins will total at least
        enough to buy the bar of soap?                                  [12]

        The only ways to get 4 coins that total 69 cents or more are:

        Ways to get   Value   Number of Ways
        -----------   -----   -----------------------------------
        2Q + 2D         70    C(5, 2) * C(4, 2) = 10 * 6 =  60
        3Q + 1P         76    C(5, 3) * C(3, 1) = 10 * 3 =  30
        3Q + 1N         80    C(5, 3) * C(2, 1) = 10 * 2 =  20
        3Q + 1D         85    C(5, 3) * C(4, 1) = 10 * 4 =  40
        4Q             100    C(5, 4)           =  5 * 1 =   5
        ------------------------------------------------------------
        Total                                              155

        The total number of ways to take out 4 coins: C(14, 4) = 1001.

        The probability of getting 4 coins that total 69 cents or more:

            155/1001 = .1548

    b)  Write a program which simulates the distribution of the total of
        four coins taken out of your pocket at random (without replacement).
                                                                        [13]
        data coinsum ;
           seed = -1 ;
           file 'coinsum.out' ;
           n = 100000 ;
           do j = 1 to n ;
              /* start each trial with the full pocket of 14 coins */
              sum = 0 ;
              n1 = 3 ; n5 = 2 ; n10 = 4 ; n25 = 5 ;
              ncoins = 14 ;
              p1 = n1/ncoins ; p5 = n5/ncoins ;
              p10 = n10/ncoins ; p25 = n25/ncoins ;
              do i = 1 to 4 ;
                 r = ranuni(seed) ;
                 if r < p1 then do ;
                    sum = sum + 1 ; ncoins = ncoins - 1 ; n1 = n1 - 1 ;
                 end ;
                 if p1 <= r < p1 + p5 then do ;
                    sum = sum + 5 ; ncoins = ncoins - 1 ; n5 = n5 - 1 ;
                 end ;
                 if p1 + p5 <= r < p1 + p5 + p10 then do ;
                    sum = sum + 10 ; ncoins = ncoins - 1 ; n10 = n10 - 1 ;
                 end ;
                 if p1 + p5 + p10 <= r then do ;
                    sum = sum + 25 ; ncoins = ncoins - 1 ; n25 = n25 - 1 ;
                 end ;
                 /* update the probabilities for the coins that remain */
                 p1 = n1 / ncoins ; p5 = n5 / ncoins ;
                 p10 = n10 / ncoins ; p25 = n25 / ncoins ;
              end ;
              output ;
           end ;
        run ;

        proc freq data = coinsum ;
           tables sum ;
        run ;

4.  Assume the following dataset:

        OBS    U     Y
        ---   ---   ---
         1     3    10
         2     2     8
         3     5    15
         4     7     4
         5     1     0
         6     2    12

    You want to estimate the parameters for the least-squares regression of
    Y as a function of U. That is, the model is Y = b0 + b1 * U + e, where e
    is random error and has expectation 0 and variance V.

    a)  What is the design matrix X ?                                    [5]

                | 1  3 |
                | 1  2 |
            X = | 1  5 |
                | 1  7 |
                | 1  1 |
                | 1  2 |

    b)  What is X` * X, where X` denotes the transpose of X ? Compute this
        explicitly for the data above.                                   [5]

                     |  6   20 |
            X` * X = |         |
                     | 20   92 |

        Note det(X` * X) = 552 - 400 = 152.

    c)  What is the inverse of X` * X ?                                  [5]

                          |  92/152   -20/152 |
            inv(X` * X) = |                   |
                          | -20/152     6/152 |

    d)  Suppose the model is Y = b0 + b1*U + b2*U^2 + e.
        Write a PROC IML program which reads in the data given above and
        computes least-squares estimates of b0, b1, and b2, and provides an
        estimate s2 of the error variance V.                            [10]

        data UY ;
           input U Y ;
           one = 1 ;
           U2 = U * U ;
           datalines ;
        3 10
        2  8
        5 15
        7  4
        1  0
        2 12
        ;
        run ;

        proc iml ;
           use UY ;
           read all var {one U U2} into X ;
           read all var {Y} into Y ;
           n = nrow(X) ;
           p = 3 ;        /* number of columns of X, including the intercept */
           bhat = inv(X` * X) * X` * Y ;
           SSRES = Y` * Y - bhat` * X` * Y ;
           s2 = SSRES/(n - p) ;   /* residual degrees of freedom = n - p */
           print bhat s2 ;
        quit ;
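        As a cross-check on the matrix computation in part d), the same
        quadratic model can be fit with PROC REG. This is a sketch that
        assumes the data set UY created above (with U2 = U * U) is available.
        It is not part of the original answer, and the title text is
        illustrative.

```sas
proc reg data = UY ;
   model Y = U U2 ;   /* the intercept b0 is included by default */
   title1 'Cross-check of the PROC IML estimates using PROC REG' ;
run ;
quit ;
```

        The Parameter Estimates table can be compared with bhat. The Error
        row of the Analysis of Variance table reports the residual degrees
        of freedom and the mean square error, which can be compared with s2.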