Grades on this exam: 100 99 91 77 73 65

October 15, 2013                                                     Page 1 of 7
PubH 7460 - Fall 2013 - Exam 1            Name:___________________________________
=================================================================================

1. Measurement of systolic blood pressure (SBP) has two sources of variability:
the person's "true" blood pressure, which varies from one person to the next,
and pure measurement error, which differs from day to day within a person. That
is, SBP = T + E, where T is the person's true blood pressure and E is the
measurement error. Assume T has a normal distribution with mean 128 and
variance 10, E has a normal distribution with mean 0 and variance 5, and T and
E are independent.

You are conducting a screening exercise to find people who may qualify for a
clinical trial. If the first screening measurement is over 136, the person is
asked to come back for a second screening measurement on a different day. The
measurement error on the second day is independent of the measurement error on
the first day; however, the person's true blood pressure is the same on both
days. You expect some regression toward the mean to occur, that is, the second
measurement is likely to be smaller than the first. If the second measurement
is less than 132, the person is considered not eligible for the trial.

Write a simulation program which estimates the expected value of the
difference between the two measurements. ALSO, the program should estimate
what fraction of people (of all those who come to the first screening visit)
end up being eligible for the trial.
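The simulation the question asks for can also be sketched in Python using only the standard library. This is an illustrative cross-check, not the graded SAS answer; the function names, n = 100,000 and the seed are arbitrary choices. Because bp1 - bp2 = E1 - E2 and, for jointly normal T and E, E[E1 | bp1] = (5/15)*(bp1 - 128), the expected difference among those rescreened also has a closed form, E[bp1 - bp2 | bp1 > 136] = (1/3)*(E[bp1 | bp1 > 136] - 128) with bp1 ~ Normal(128, variance 15). The second function evaluates it with the truncated-normal mean formula; it works out to about 3.14.

```python
import math
import random


def simulate_screening(n=100_000, seed=20131015):
    """Simulate two-visit SBP screening under SBP = T + E.

    T ~ Normal(mean 128, variance 10) is the person's true SBP (same on both days);
    E ~ Normal(mean 0, variance 5) is a fresh, independent error at each visit.
    Returns (mean of bp1 - bp2 among the rescreened, fraction of all n eligible).
    """
    rng = random.Random(seed)
    sd_t, sd_e = math.sqrt(10), math.sqrt(5)
    diffs = []
    n_eligible = 0
    for _ in range(n):
        true_bp = rng.gauss(128, sd_t)
        bp1 = true_bp + rng.gauss(0, sd_e)
        if bp1 > 136:                             # over 136: asked to come back
            bp2 = true_bp + rng.gauss(0, sd_e)    # same T, new independent error
            diffs.append(bp1 - bp2)
            if bp2 >= 132:                        # below 132 means not eligible
                n_eligible += 1
    mean_diff = sum(diffs) / len(diffs) if diffs else float("nan")
    return mean_diff, n_eligible / n              # denominator: everyone at visit 1


def expected_diff_closed_form(cut=136.0, mu=128.0, var_t=10.0, var_e=5.0):
    """E[bp1 - bp2 | bp1 > cut] = var_e / (var_t + var_e) * (E[bp1 | bp1 > cut] - mu)."""
    sd1 = math.sqrt(var_t + var_e)                # SD of a single measurement
    a = (cut - mu) / sd1
    pdf = math.exp(-a * a / 2) / math.sqrt(2 * math.pi)
    tail = 0.5 * math.erfc(a / math.sqrt(2))      # P(bp1 > cut)
    mean_bp1 = mu + sd1 * pdf / tail              # truncated-normal mean
    return var_e / (var_t + var_e) * (mean_bp1 - mu)


if __name__ == "__main__":
    mean_diff, frac = simulate_screening()
    print(f"mean(bp1 - bp2) among rescreened: {mean_diff:.3f}")
    print(f"fraction of all attendees eligible: {frac:.4f}")
    print(f"closed-form E[bp1 - bp2 | bp1 > 136]: {expected_diff_closed_form():.3f}")
```

Using a much larger n than 1000 matters here: only about 2% of people exceed 136 at the first visit, so a small run leaves very few rescreened people to average over.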
[25 points]

options linesize = 150 ;
footnote "~john-c/5421/regress.to.mean.sas &sysdate &systime" ;

data screens ;
   mean = 128 ;                            * mean of true BP, T ;
   n    = 1000 ;                           * number of people at the first screen ;
   var1 = 10 ;  sd1 = sqrt(var1) ;         * variance and SD of T ;
   var2 = 5 ;   sd2 = sqrt(var2) ;         * variance and SD of measurement error E ;
   nelig = 0 ;                             * running count of eligible people ;
   do i = 1 to n ;
      truebp = mean + sd1 * rannor(-1) ;
      bp1 = truebp + sd2 * rannor(-1) ;    * first screen ;
      if bp1 ge 136 then do ;              * invited back for a second screen ;
         bp2 = truebp + sd2 * rannor(-1) ; * same T, new independent error ;
         diff = bp1 - bp2 ;
         if bp2 ge 132 then do ;
            nelig = nelig + 1 ;
         end ;
         output ;                          * one record per rescreened person ;
      end ;
   end ;
run ;

proc means data = screens n mean stddev stderr clm min max ;
   var bp1 bp2 diff nelig ;
run ;

* Note that max(nelig) is the number of eligible people. Dividing it by n
  gives the fraction of all first-visit attendees who end up eligible. ;

                                 The SAS System                 16:37 Saturday, October 19, 2013   1

                                              The MEANS Procedure

                                                           Lower 95%     Upper 95%
Variable   N         Mean      Std Dev    Std Error    CL for Mean   CL for Mean       Minimum       Maximum
-------------------------------------------------------------------------------------------------------------
bp1       18  137.5496965    0.8683842    0.2046801    137.1178592   137.9815338   136.2042208   138.9453566
bp2       18  135.2439803    2.9347348    0.6917236    133.7845710   136.7033895   129.1760206   139.6680830
diff      18    2.3057162    2.8379993    0.6689229      0.8944124     3.7170201    -1.3984453     9.3190451
nelig     18    8.6111111    4.5650706    1.0759975      6.3409549    10.8812673     1.0000000    16.0000000
-------------------------------------------------------------------------------------------------------------

From this run: 18 of the 1000 people were rescreened, the estimated mean
difference bp1 - bp2 is 2.31 (95% CI 0.89 to 3.72), and max(nelig) = 16, so the
estimated fraction eligible is 16/1000 = 0.016.

2. You need to write a randomization schedule for a clinical trial with two
groups, A and B.
Your schedule must satisfy the following constraints. If at any point in the
schedule equal numbers of patients have been assigned to each group, then the
probability of assignment to either group is 0.5. If at any point in the
schedule the two groups are out of balance by m, then the probability of
assignment to the smaller group is 1 - (1/2^m).

a) Suppose that at the 50th point in the schedule there have been 27
assignments to group A and 23 to group B. What is the probability that the
next assignment is to group A?                                      [3 points]

   The imbalance is m = 27 - 23 = 4 and A is the larger group, so

   prob(assign to A | NA = 27, NB = 23) = 1 - (1 - 2**(-4)) = 2**(-4) = 1/16 = 0.0625

b) Describe two advantages of this randomization scheme and two
disadvantages.                                                     [10 points]

   Advantages:
      1) It is not completely predictable at any point.
      2) If the groups are out of balance, the schedule tends to move toward
         balance, and a larger imbalance implies a higher probability of
         moving toward balance.
   Disadvantages:
      1) Long runs of assignments to the same group are possible.
      2) Large imbalances are possible.

c) Write a program which produces a randomization schedule subject to the two
constraints.
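The allocation rule itself can be sketched in Python with only the standard library. This is an illustration alongside the SAS answer to part (c), not a replacement for it; `prob_assign_a` and `randomization_schedule` are hypothetical names, and n and the seed are arbitrary. The first function returns P(next assignment is A) from the current counts, which reproduces part (a); the second draws a schedule by comparing a uniform random number with that probability.

```python
import random


def prob_assign_a(na, nb):
    """P(next assignment is A) given current counts na, nb.

    Balanced: 0.5. Otherwise, with m = |na - nb|, the smaller group
    gets probability 1 - 1/2**m.
    """
    m = abs(na - nb)
    if m == 0:
        return 0.5
    p_smaller = 1 - 1 / 2**m
    return p_smaller if na < nb else 1 - p_smaller


def randomization_schedule(n=100, seed=20131015):
    """Return a list of n assignments ('A' or 'B') generated under the rule."""
    rng = random.Random(seed)
    na = nb = 0
    schedule = []
    for _ in range(n):
        if rng.random() < prob_assign_a(na, nb):
            na += 1
            schedule.append("A")
        else:
            nb += 1
            schedule.append("B")
    return schedule


if __name__ == "__main__":
    print(prob_assign_a(27, 23))            # part (a): 0.0625
    print("".join(randomization_schedule()))
```

Computing P(A) in one place replaces the four-way branch on r < p used in the SAS program; both give the same assignment probabilities.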
[12 points]

options linesize = 80 ;
footnote "~john-c/5421/randsched2m.sas &sysdate &systime" ;

data sched ;
   m = 0 ; na = 0 ; nb = 0 ;     * m = current imbalance abs(na - nb) ;
   n = 100 ;
   seed = 20131015 ;
   do i = 1 to n ;
      r = ranuni(seed) ;
      mbefore = m ;              * imbalance before this assignment ;
      if m = 0 then do ;
         * Balanced: 50/50 split. Here p = 0 and is not used for the assignment ;
         p = 1 - (1 / 2**m) ;
         if r ge .5 then do ; na = na + 1 ; assign = 'A' ; end ;
         if r lt .5 then do ; nb = nb + 1 ; assign = 'B' ; end ;
      end ;
      if m gt 0 then do ;
         * p = probability of assignment to the smaller group ;
         p = 1 - (1 / 2**m) ;
         if r lt p and na > nb then do ; nb = nb + 1 ; assign = 'B' ; end ;
         if r lt p and na < nb then do ; na = na + 1 ; assign = 'A' ; end ;
         if r ge p and na > nb then do ; na = na + 1 ; assign = 'A' ; end ;
         if r ge p and na < nb then do ; nb = nb + 1 ; assign = 'B' ; end ;
      end ;
      m = abs(na - nb) ;
      output ;
   end ;
run ;

proc print data = sched ;
   var i na nb mbefore m p r assign ;
run ;
endsas ;

                         The SAS System                                  1
                                           16:52 Saturday, October 19, 2013

 Obs     i    na    nb    mbefore    m      p          r       assign
   1     1     0     1       0       1    0.000    0.26753       B
   2     2     1     1       1       0    0.500    0.48610       A
   3     3     1     2       0       1    0.000    0.10325       B
   4     4     2     2       1       0    0.500    0.02105       A
   5     5     2     3       0       1    0.000    0.31906       B
   6     6     2     4       1       2    0.500    0.86702       B
   7     7     2     5       2       3    0.750    0.87936       B
   8     8     3     5       3       2    0.875    0.04745       A
   9     9     4     5       2       1    0.750    0.28850       A
  10    10     5     5       1       0    0.500    0.24877       A

3. The equation 9*u^2 + 4*v^2 = 25 describes an ellipse in the plane R^2. This
ellipse is the image of a circle under a certain linear transformation T. The
equation of the circle is x^2 + y^2 = 1.

a) What is the linear transformation?                              [15 points]

   First divide both sides of the equation by 25:

      (9/25)*u^2 + (4/25)*v^2 = ((3/5)*u)^2 + ((2/5)*v)^2 = 1

   Then it is clear that you can write (3/5)*u = x and (2/5)*v = y, so that
   x^2 + y^2 = 1, or equivalently u = (5/3)*x and v = (5/2)*y.
   Therefore the matrix of the transformation is:

           | 5/3    0  |
       T = |           |
           |  0    5/2 |

   so that (u, v) = T(x, y) = ((5/3)*x, (5/2)*y).

b) What is the area of the ellipse? [Hint: what is the determinant of the
matrix corresponding to the linear transformation?]                [10 points]

   The determinant of the matrix is (5/3)*(5/2) = 25/6. The area of the circle
   is pi, and a linear transformation multiplies areas by the absolute value of
   its determinant. Therefore the area of the ellipse is 25*pi/6.

4. Assume the no-intercept model Y = b1 * X + e, where e has a distribution
with mean 0 and variance sigma^2. Given the dataset:

   Obs    X     Y
   ---   ---   ---
    1    X1    Y1
    2    X2    Y2
   etc   etc   etc
    n    Xn    Yn

the least-squares estimate of b1 is sum(Xi * Yi) / sum(Xi * Xi), so it depends
on the data only through the two sums sum(Xi * Yi) and sum(Xi * Xi). Write a
program (not using a SAS or R procedure) which computes the least-squares
estimate of b1 AND which also computes an estimate of sigma^2. Your program
should take into account the fact that there may be missing values.
                                                                   [25 points]

options linesize = 80 ;
footnote "~john-c/5421/noint.regress.sas &sysdate &systime" ;

* Step 1: accumulate the sums over complete cases and compute b1. ;
data noint ;
   infile 'xy' end = endmark ;
   retain n 0 ;
   retain sumxy 0 ;
   retain sumx2 0 ;
   input x y ;
   if x ne . and y ne . then do ;     * skip records with a missing value ;
      n = n + 1 ;
      sumxy = sumxy + x * y ;
      sumx2 = sumx2 + x * x ;
   end ;
   if endmark = 1 then do ;
      b1 = sumxy / sumx2 ;
      put "b1 = " b1 ;
   end ;
run ;

* Step 2 (separate program, run after you have the estimate of b1):
  accumulate the residual sum of squares over complete cases. ;
data noint ;
   retain ssres 0 ;
   set noint end = endmark ;
   b1 = 5.172815534 ;                 * value printed by step 1 ;
   predy = b1 * x ;
   if x ne . and y ne . then ssres = ssres + (y - predy)**2 ;
   if endmark eq 1 then do ;
      * n, read from step 1, is the count of complete cases. ;
      s2 = ssres / (n - 1) ;          * one parameter, so n - 1 error df ;
   end ;
run ;

proc print data = noint ;
   var n x y predy ssres s2 ;
run ;
* Note that only the last line of the printout contains the desired
  estimate of sigma^2. ;

proc reg data = noint ;
   model y = x / noint ;
run ;
endsas ;

                         The SAS System                  1
                                  13:07 Sunday, October 20, 2013

 Obs    n     x      y      predy      ssres        s2
   1    1    3.0    17    15.5184      2.195        .
   2    2    2.5    16    12.9320     11.607        .
   3    3    4.0    25    20.6913     30.173        .
   4    4    5.0    33    25.8641     81.094        .
   5    5    6.0    32    31.0369     82.022        .
   6    6    7.0    35    36.2097     83.485        .
   7    7    8.0    41    41.3825     83.631        .
   8    8    9.0    43    46.5553     96.272        .
   9    9   10.0    49    51.7282    103.715     12.9643

                         The SAS System                  2
                                  13:07 Sunday, October 20, 2013

                        The REG Procedure
                          Model: MODEL1
                     Dependent Variable: y

            Number of Observations Read          9
            Number of Observations Used          9

       NOTE: No intercept in model. R-Square is redefined.

                       Analysis of Variance

                                Sum of          Mean
 Source               DF       Squares        Square    F Value    Pr > F
 Model                 1         10335         10335     797.21    <.0001
 Error                 8     103.71456      12.96432
 Uncorrected Total     9         10439

      Root MSE              3.60060    R-Square     0.9901
      Dependent Mean       32.33333    Adj R-Sq     0.9888
      Coeff Var            11.13588

                       Parameter Estimates

                     Parameter     Standard
 Variable     DF      Estimate        Error    t Value    Pr > |t|
 x             1       5.17282      0.18321      28.23      <.0001
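As a cross-check of the two-step SAS program, the same formulas can be sketched in Python with only the standard library: b1 = sum(Xi*Yi) / sum(Xi*Xi) over complete pairs, then s2 = SSres / (n - 1). The name `no_intercept_fit` is hypothetical, and representing a missing value as None or NaN is an assumed encoding (SAS uses '.'). The nine (x, y) pairs are the ones in the printout; the extra record with a missing y is a hypothetical addition that only shows the skip.

```python
import math


def no_intercept_fit(pairs):
    """Least-squares fit of y = b1 * x with no intercept, skipping incomplete pairs.

    pairs: iterable of (x, y); None or NaN marks a missing value (an assumed
    encoding; SAS uses '.').
    Returns (n_used, b1_hat, s2), where s2 = SSres / (n_used - 1).
    """
    def present(v):
        return v is not None and not math.isnan(v)

    complete = [(x, y) for x, y in pairs if present(x) and present(y)]
    n = len(complete)
    if n < 2:
        raise ValueError("need at least two complete (x, y) pairs")
    sum_xy = sum(x * y for x, y in complete)
    sum_x2 = sum(x * x for x, _ in complete)
    b1 = sum_xy / sum_x2                      # sum(Xi*Yi) / sum(Xi*Xi)
    ss_res = sum((y - b1 * x) ** 2 for x, y in complete)
    s2 = ss_res / (n - 1)                     # one estimated parameter -> n - 1 error df
    return n, b1, s2


if __name__ == "__main__":
    # The nine (x, y) pairs from the printout, plus one hypothetical
    # incomplete record to show that it is skipped.
    data = [(3.0, 17), (2.5, 16), (4.0, 25), (5.0, 33), (6.0, 32),
            (7.0, 35), (8.0, 41), (9.0, 43), (10.0, 49), (4.0, None)]
    n, b1, s2 = no_intercept_fit(data)
    print(f"n = {n}, b1 = {b1:.5f}, s2 = {s2:.4f}")  # n = 9, b1 = 5.17282, s2 = 12.9643
```

Filtering to complete pairs once, before either pass, keeps the count n and both sums on exactly the same records, which is what the missing-value checks in the SAS steps are meant to ensure.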