October 31, 2006                                                    Page 1 of 4

PubH 7460 - Fall 2006 - Exam 1        Name:___________________________________
=================================================================================

1.  Systolic blood pressure has an approximately normal distribution in human
    populations.  In African-American men, the mean is 85 and the standard
    deviation is 10.  In non-African-American men, the mean is 80 and the
    standard deviation is 8.  African-Americans comprise about 15% of the
    U.S. population.

    Write a program which will simulate the distribution of blood pressures
    for a sample of size 1000 of men chosen at random from the U.S.
    population.  Your program should estimate the mean, median, and standard
    deviation of systolic blood pressure.                                 [25]

------------------------------------------------------------------------

%let meanafrican = 85 ;
%let meanwhite   = 80 ;
%let sdafrican   = 10 ;
%let sdwhite     = 8 ;
%let probafrican = .15 ;
%let n           = 1000 ;

data bpsimulate ;

     do i = 1 to &n ;

        /* race is uniform on (0,1); values below &probafrican  */
        /* are treated as African-American.                     */
        race = ranuni(-1) ;

        if race lt &probafrican then do ;
           bp = &meanafrican + &sdafrican * rannor(-1) ;
        end ;
        else do ;
           bp = &meanwhite + &sdwhite * rannor(-1) ;
        end ;

        output ;

     end ;

run ;

/* PROC UNIVARIATE reports the mean, median, and standard deviation. */
proc univariate data = bpsimulate plot normal ;
     var bp ;
     title1 'Descriptive stats, including mean, sdev, median, for BP:' ;
     title2 'Simulated data ... ' ;
run ;

------------------------------------------------------------------------

2.  The 90% TRIMMED MEAN of a set of observations of the random variable X is
    computed by discarding the upper and lower 5% of the observed X values
    and then calculating the mean of the remainder.

    (a) What advantage might there be to computing a trimmed mean rather than
        the ordinary mean?  Would the trimmed mean have the same expectation
        as the ordinary mean?
        Would it have the same standard deviation?                        [10]

        The trimmed mean is less affected by outliers and by values that may
        be incorrect.  In general, the trimmed mean will not have the same
        expectation as the ordinary mean (the two agree when the distribution
        of X is symmetric).  The standard deviation of the trimmed mean will
        generally differ from that of the ordinary mean: it tends to be
        smaller when the distribution is heavy-tailed or contains outliers,
        but it can be slightly larger when X is exactly normal.

    (b) Write a program which computes the 90% trimmed mean for a set of 1000
        observed values of X (assumed to be on an external data file).    [15]

------------------------------------------------------------------------

/* Read x, keep the nonmissing values, and store their count */
/* in the macro variable &nonmisscount.                      */
data trim ;

     retain count 0 ;
     infile 'x.data' end = endmark ;
     input x ;

     if x ne . then do ;
        count = count + 1 ;
        output ;
     end ;

     if endmark eq 1 then do ;
        call symput('nonmisscount', count) ;
     end ;

run ;

proc sort data = trim ; by x ;
run ;

/* Drop the lowest 5% and the highest 5% of the sorted values.   */
/* The upper cutoff uses GT so that, with 1000 values, exactly   */
/* 50 are removed from each tail (GE would remove 51 on top).    */
data trim ;

     retain newcount 0 ;
     set trim ;

     newcount = newcount + 1 ;
     if newcount le .05 * &nonmisscount then delete ;
     if newcount gt .95 * &nonmisscount then delete ;

run ;

/* The mean reported here is the 90% trimmed mean. */
proc univariate data = trim ;
     title1 '90% trimmed mean problem ... ' ;
     title2 'Simulated data ... some missing.' ;
     var x ;
run ;

------------------------------------------------------------------------

3.  Write a program to show how you would use PROC IML to solve the following
    set of simultaneous linear equations:

          x + 2y + 3z =  1
          x - 3y + 5z = 11
         2x - 6y +  z =  4
                                                                          [25]

------------------------------------------------------------------------

proc iml ;

     /* Coefficient matrix: one row per equation, columns x, y, z. */
     A = {1  2  3,
          1 -3  5,
          2 -6  1 } ;

     u = {1, 11, 4} ;

     /* Solve A * y = u.  (y = solve(A, u) would also work.) */
     ainv = inv(A) ;
     y = ainv * u ;

     /* Check: Ay should reproduce u.  The solution is       */
     /* x = -2.6, y = -1.2, z = 2.                           */
     Ay = A * y ;

     print A u y Ay ;

quit ;

------------------------------------------------------------------------

4.
    A statistician wants to write a program which will produce treatment
    assignments for a two-group clinical trial.  It is desirable that
    assignments to the two groups do not get too far out of balance.  If at a
    certain point, group A has t more patients than group B, then the
    probability that the next assignment is to group B is

          prob = .5 + .5 * t / (abs(t) + 1).

    However, if t is greater than or equal to 5, then the next assignment
    must be to group B regardless of the value of prob.  For example, if
    there are 10 patients in group A and 7 in group B, then the probability
    that the next treatment assignment is to group B should be
    .5 + .5 * 3/4 = .875.  Similar rules apply if there are t more patients
    in group B than in group A.

    Every time a new patient is eligible for the trial, the program is run to
    produce the next treatment assignment.  Of course it needs to take the
    previous treatment assignments into account.

    Write a program to implement this randomization scheme.               [25]

------------------------------------------------------------------------

data randomize ;

     length group $1 ;

     t = 0 ;
     na = 0 ;
     nb = 0 ;

     do i = 1 to 1000 ;

        /* t = (number in A) - (number in B) before this assignment. */
        probb = .5 + .5 * t / (abs(t) + 1) ;
        r = ranuni(-1) ;

        /* Forced assignments when the imbalance reaches 5;       */
        /* otherwise assign to B with probability probb.          */
        if t ge 5 then group = 'B' ;
        else if t le -5 then group = 'A' ;
        else if r lt probb then group = 'B' ;
        else group = 'A' ;

        if group eq 'A' then na = na + 1 ;
        else nb = nb + 1 ;

        t = na - nb ;

        output ;

     end ;

run ;

proc print data = randomize ;
     title1 'Randomization schedule which guarantees approx balance ' ;
     var i group na nb t probb ;
run ;

------------------------------------------------------------------------
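
As a hand check on the rule in question 4, the probability of assigning the next patient to group B can be tabulated for each imbalance t. This is only an illustrative worked calculation; the symbol p_B(t) is notation introduced here and does not appear in the exam. The values for negative t follow from the same formula, and the forced assignments at |t| >= 5 follow from the stated rule and its mirror image for group B.

```latex
\begin{align*}
p_B(t)  &= 0.5 + 0.5\,\frac{t}{|t| + 1}, \qquad |t| < 5 \\[4pt]
p_B(0)  &= 0.5 \\
p_B(1)  &= 0.5 + 0.5\cdot\tfrac{1}{2} = 0.75   \\
p_B(2)  &= 0.5 + 0.5\cdot\tfrac{2}{3} \approx 0.833 \\
p_B(3)  &= 0.5 + 0.5\cdot\tfrac{3}{4} = 0.875  \\
p_B(4)  &= 0.5 + 0.5\cdot\tfrac{4}{5} = 0.9    \\
p_B(-t) &= 1 - p_B(t), \quad\text{e.g. } p_B(-3) = 0.125 \\[4pt]
p_B(t)  &= 1 \text{ for } t \ge 5, \qquad p_B(t) = 0 \text{ for } t \le -5
\end{align*}
```

The value p_B(3) = 0.875 matches the example in the question (10 patients in group A, 7 in group B), and the symmetry p_B(-t) = 1 - p_B(t) is why a single formula in t = na - nb covers both directions of imbalance.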