Scores on this exam: 100 97 75 55

November 11, 2010                                                    Page 1 of 5
PubH 7460 - Fall 2010 - Exam 1        Name:___________________________________
=================================================================================

1. (a) [15] Write a simulation program which estimates the area under the curve
f(x) = x^(.5) * (1 - x)^(2.4) between x = 0 and x = 1.

Note that the function f(x) is never less than zero and never bigger than 1 in
the interval from x = 0 to x = 1. Therefore the graph of f(x) lies above the
x-axis and below the line y = 1. That is, it lies inside the unit square, which
has area 1.

   data area ;
     n = 1000 ;
     numberinside = 0 ;
     do i = 1 to n ;
       x = ranuni(-1) ;          /* random point (x, y) in the unit square */
       y = ranuni(-1) ;
       fx = x**.5 * (1 - x)**2.4 ;
       if y < fx then numberinside = numberinside + 1 ;
     end ;
     /* the fraction of points falling under the curve estimates the area */
     estimated_area = numberinside / n ;
   run ;   /* a single observation is written when the step ends */

   proc print data = area ;
     var numberinside estimated_area ;
   run ;

1. (b) Note that the function in part (a) is closely related to the beta
distribution; in fact, if K is the area under the curve in part (a), then
(1/K) * f(x) is the density of the beta distribution with parameters
alpha = 1.5, beta = 3.4. Write another simulation program which estimates the
mean of this beta distribution.
(You can use the estimate of K obtained in part (a).)

[10] This can be done as in part (a), but it can also be done using Riemann sums
as in calculus, since the mean is (1/K) times the integral of x * f(x) from
x = 0 to x = 1:
----------------------------------------------------------------------------------
   data mean ;
     n = 1000 ;
     K = .314159 ;   /* placeholder; substitute the estimate of K from part (a) */
     dx = 1 / n ;    /* width of each subinterval */
     sum = 0 ;
     do i = 1 to n ;
       x = i / n ;   /* right endpoint of the i-th subinterval */
       fx = x**.5 * (1 - x)**2.4 ;
       xfx = x * fx ;
       sum = sum + xfx * dx ;
     end ;
     mean = sum / K ;
   run ;

   proc print data = mean ;
     var K sum mean ;
   run ;
----------------------------------------------------------------------------------

2. (a) [10] A random survey is taken of n people, asking each one whether they
have had chicken pox. A total of m people say yes.

What is the estimated variance of the sample proportion of people who say yes?

   phat * (1 - phat) / n = (m/n) * (1 - m/n) / n = m * (n - m) / n^3

What are 95% confidence limits for the true proportion of people who will say
yes?

   phat +/- 1.96 * sqrt(var) = m/n +/- 1.96 * sqrt(m * (n - m) / n^3)

(b) Write a SAS or R program which computes bootstrap estimates of the 95%
confidence limits for the true proportion.
[15]
   %let n = 500 ;       /* sample size                    */
   %let m = 243 ;       /* number who said yes            */
   %let nrep = 1000 ;   /* number of bootstrap replicates */

   data sample ;
     do i = 1 to &m ; y = 1 ; output ; end ;
     do i = &m + 1 to &n ; y = 0 ; output ; end ;
   run ;

   /* remove results left over from an earlier run, if any */
   proc datasets lib = work nolist nowarn ;
     delete outfile ;
   quit ;

   %macro boots(n, dataset, nrep) ;
     %do i = 1 %to &nrep ;
       proc iml ;
         use &dataset ;
         read all var {y} into y ;
         close &dataset ;
         yboot = j(&n, 1, 0) ;
         do k = 1 to &n ;                       /* resample with replacement */
           rindex = 1 + int(&n * ranuni(-1)) ;
           yboot[k] = y[rindex] ;
         end ;
         varnames = "y" ;
         create yboots from yboot [colname = varnames] ;
         append from yboot ;
         close yboots ;
       quit ;
       proc means data = yboots mean noprint ;
         var y ;
         output out = ymean mean = bootmean ;
       run ;
       /* PROC APPEND creates outfile on the first replicate */
       proc append base = outfile data = ymean ;
       run ;
     %end ;
   %mend ;

   %boots(&n, sample, &nrep) ;

   proc sort data = outfile ;
     by bootmean ;
   run ;

   /* percentile limits: 2.5th and 97.5th percentiles of the sorted means */
   data outfile ;
     retain obsnum 0 low95cl up95cl 0 ;
     set outfile ;
     obsnum = obsnum + 1 ;
     if obsnum = int(.025 * &nrep) then low95cl = bootmean ;
     if obsnum = int(.975 * &nrep) then up95cl = bootmean ;
   run ;

   proc print data = outfile ;
     where obsnum eq &nrep ;   /* last observation carries both limits */
     var low95cl up95cl ;
   run ;

3. Write a program which produces a randomization schedule for a two-group
clinical trial with randomly selected blocks of sizes 2 and 4.
[25]
   data sched ;
     array block(4) ;
     /* the 2 balanced orderings of a block of size 2 */
     array b12(2) _temporary_ (1 2) ;
     array b22(2) _temporary_ (2 1) ;
     /* the 6 balanced orderings of a block of size 4 */
     array b14(4) _temporary_ (1 1 2 2) ;
     array b24(4) _temporary_ (1 2 1 2) ;
     array b34(4) _temporary_ (1 2 2 1) ;
     array b44(4) _temporary_ (2 1 1 2) ;
     array b54(4) _temporary_ (2 1 2 1) ;
     array b64(4) _temporary_ (2 2 1 1) ;
     target = 400 ;   /* planned number of assignments */
     count = 0 ;      /* assignments generated so far  */
     /* i is the block number; the last block may run slightly past target */
     do i = 1 by 1 while (count lt target) ;
       twoorfour = 1 + int(2 * ranuni(-1)) ;
       if twoorfour eq 1 then do ;
         blocksize = 2 ;
         perm = 1 + int(2 * ranuni(-1)) ;
         do j = 1 to blocksize ;
           if perm eq 1 then block(j) = b12(j) ;
           if perm eq 2 then block(j) = b22(j) ;
         end ;
       end ;
       if twoorfour eq 2 then do ;
         blocksize = 4 ;
         perm = 1 + int(6 * ranuni(-1)) ;
         do j = 1 to blocksize ;
           if perm eq 1 then block(j) = b14(j) ;
           if perm eq 2 then block(j) = b24(j) ;
           if perm eq 3 then block(j) = b34(j) ;
           if perm eq 4 then block(j) = b44(j) ;
           if perm eq 5 then block(j) = b54(j) ;
           if perm eq 6 then block(j) = b64(j) ;
         end ;
       end ;
       do k = 1 to blocksize ;
         assign = block(k) ;
         output ;
       end ;
       count = count + blocksize ;
     end ;
     keep i assign ;
   run ;

   proc print data = sched ;
     var i assign ;
   run ;

4. [25] Assume you are presented with a dataset which has N observations of two
variables, X and Y. Write a program in PROC IML which produces least-squares
estimates of the coefficients b0, b1, and b2 for the model

   Y = b0 + b1*X + b2*X^2 + e, where e ~ N(0, sigma^2).

Your program should give parameter estimates based only on the observations
where neither X nor Y is missing. Your program should also estimate the
residual sum of squares and the R-square.

   data xydata ;
     set xydata ;
     if missing(x) or missing(y) then delete ;   /* complete cases only */
     x2 = x * x ;
     one = 1 ;
   run ;

   proc iml ;
     use xydata ;
     read all var {one x x2} into x ;   /* design matrix: 1, X, X^2 */
     read all var {y} into y ;
     close xydata ;
     n = nrow(y) ;
     betahat = inv(x` * x) * x` * y ;
     ssres = y` * y - betahat` * x` * y ;
     sstot = y` * y - (1/n) * sum(y)##2 ;   /* total SS corrected for the mean */
     ssreg = sstot - ssres ;
     rsquare = ssreg / sstot ;
     print betahat ssres rsquare ;
   quit ;
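
The matrix formulas used in Question 4 can be checked outside SAS. The sketch below is a minimal illustration in plain Python and is not part of the exam solution: the function name fit_quadratic, the use of None to stand in for a SAS missing value, and the small data set are all assumptions made for the example. It solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination, then forms SSres = y'y - b'X'y and R-square = 1 - SSres / SStot, where SStot = y'y - (sum of y)^2 / n.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations.

    Hypothetical helper for illustration; None marks a missing value.
    Returns (beta, ssres, rsquare).
    """
    # Keep only the pairs where neither x nor y is missing.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    rows = [[1.0, float(x), float(x) * float(x)] for x, _ in pairs]
    y = [float(v) for _, v in pairs]

    # Build X'X (3 x 3) and X'y (length 3).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]

    # Solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.
    a = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    beta = [a[i][3] / a[i][i] for i in range(3)]

    yty = sum(v * v for v in y)
    ssres = yty - sum(b * t for b, t in zip(beta, xty))  # y'y - b'X'y
    sstot = yty - sum(y) ** 2 / n                        # corrected total SS
    rsquare = 1.0 - ssres / sstot
    return beta, ssres, rsquare


# Made-up data lying exactly on y = 1 + 2x + 3x^2, plus two incomplete pairs
# that the function should drop.
xs = [0, 1, 2, 3, 4, None, 6]
ys = [1, 6, 17, 34, 57, 9, None]

beta, ssres, rsquare = fit_quadratic(xs, ys)
print("betahat:", beta)
print("ssres:", ssres)
print("rsquare:", rsquare)
```

Because the example data lie exactly on y = 1 + 2x + 3x^2, the fitted coefficients should come out very close to 1, 2, and 3, with a residual sum of squares near zero and an R-square near 1. Solving the normal equations directly mirrors the inv(x` * x) approach in the IML answer; for badly scaled X values a QR-based solver would be the more stable choice.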