Grades on this exam (100 points possible): 80 77 75 73 64 61

SPH 7460 Final Exam          December 13, 2013          page 1 of 5
Name: Answer Key
=====================================================================================

1. Given a data file in the following format,

       Obs   X    Y
       ---  ---  ---
        1    x1   y1
        2    x2   y2
        3    x3   y3
       etc

   write a SAS program which computes bootstrap estimates of the 95% confidence
   limits for the correlation of X and Y. [20]

       /* Assumes the file holds only the X and Y values; if the Obs column */
       /* is present in the file, read it as well: input obs x y ;          */
       data pairs ;
          infile 'xy.data' ;
          input x y ;
          if x eq . or y eq . then delete ;
       run ;

       proc iml ;
          use pairs ;
          read all var {x} into x ;
          read all var {y} into y ;

          nboots = 1000 ;
          n = nrow(x) ;
          corrxy = j(nboots, 1) ;

          /* b indexes the bootstrap replicates (avoids reusing the name of the J function) */
          do b = 1 to nboots ;
             sumx = 0 ; sumxx = 0 ;
             sumy = 0 ; sumyy = 0 ;
             sumxy = 0 ;

             /* Resample n (x, y) pairs with replacement */
             do i = 1 to n ;
                randindex = 1 + int(n * ranuni(-1)) ;
                xrep = x[randindex] ;      /* IML subscripts use square brackets */
                yrep = y[randindex] ;
                sumx  = sumx  + xrep ;
                sumxx = sumxx + xrep * xrep ;
                sumy  = sumy  + yrep ;
                sumyy = sumyy + yrep * yrep ;
                sumxy = sumxy + xrep * yrep ;
             end ;

             varx  = (sumxx - sumx * sumx / n) / (n - 1) ;
             vary  = (sumyy - sumy * sumy / n) / (n - 1) ;
             /* Same divisor (n - 1) as the variances, so the ratio is the correlation */
             covxy = (sumxy - sumx * sumy / n) / (n - 1) ;
             corrxy[b] = covxy / sqrt(varx * vary) ;
          end ;

          /* Name the output column 'corr' so the later steps can refer to it */
          varname = 'corr' ;
          create correlations from corrxy [colname = varname] ;
          append from corrxy ;
       quit ;

       proc sort data = correlations ;
          by corr ;
       run ;

       /* With 1000 sorted replicates, the 25th and 975th values bound the middle 95% */
       data correlations ;
          retain obsnum 0 cl025 cl975 ;
          set correlations ;
          obsnum = obsnum + 1 ;
          if obsnum = 25 then cl025 = corr ;
          if obsnum = 975 then cl975 = corr ;
       run ;

       proc print data = correlations ;
          where obsnum = 1000 ;
          var cl025 cl975 ;
       run ;

2. A random variable X has the following distribution:

       1. X = 0 with probability p
       2. Conditional on X being > 0, the distribution is uniform on the interval [0, a].

   You are given a file which has a sample of 100 values of X.
a) Write the expression for the likelihood of a single observation, X_i. [12]

       ix = 0 ;
       if x eq 0 then ix = 1 ;
       lxi = p * ix + (1 - p) * (1 - ix) / a ;

   This expression holds for 0 <= x <= a; the likelihood is zero for x > a.

b) Write a PROC NLP procedure which will provide maximum likelihood estimates of
   p and a. [13]

   The following program is an acceptable answer, but doesn't work:

       proc nlp data = xvalues cov = 1 ;
          parms p = 0.5, a = 2 ;
          max loglike ;
          if x = 0 then loglike = log(p) ;
          if x > 0 then loglike = log(1 - p) - log(a) ;
       run ;

   The reason it doesn't work is that the likelihood is flat in x for x > 0 and
   x <= a, and the likelihood is zero for x > a. The program above never imposes
   the second condition, so log(1 - p) - log(a) keeps increasing as a shrinks and
   there is no interior maximum for a. The following variant of the program does
   work:

       proc nlp data = xvalues cov = 1 tech = nmsimp ;
          parms p = 0.5, a = 2 ;
          max loglike ;
          if x = 0 then loglike = log(p) ;
          if x > 0 then loglike = log(1 - p) - log(a) ;
          /* Large penalty stands in for log(0) when an observation exceeds a */
          if x > a then loglike = -99999 ;
       run ;

   The "tech = nmsimp" option causes the computations to be done using the
   "Nelder-Mead Simplex" algorithm, which does not depend on derivatives. That
   matters here because the penalized log likelihood is discontinuous in a.

3. Assume random variable Y has a Poisson distribution with unknown Poisson
   parameter c. Given a datafile with 100 samples of Y,

a) Derive the maximum likelihood estimate of c. [5]

   prob(Yi = yi) = c^yi * exp(-c) / yi!.

   The loglikelihood for a single observation is, up to the constant -log(yi!),
   yi * log(c) - c.
   The loglikelihood for n observations is
   Sum(yi * log(c) - c) = log(c) * Sum(yi) - n * c.
   The derivative of this expression with respect to c is Sum(yi)/c - n.
   Setting the derivative to zero, the MLE for c is therefore Sum(yi) / n = ybar.

b) What is the corresponding Fisher information? [10]

   Fisher information is approximated as the negative of the second derivative of
   the log likelihood at the MLE for the parameter. In this case, the second
   derivative is -Sum(yi)/c^2 which, evaluated at c = ybar, is
   -n * ybar / (ybar * ybar) = -n / ybar. The Fisher information is therefore
   n / ybar.
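   The results in 3a) and 3b) can be checked numerically. The following is a
   minimal sketch, not part of the graded answer: the data set name counts, the
   variable name y, and the output names chat and info are all hypothetical.

```sas
/* Sketch only: 'counts' and 'y' are assumed names for the data set and variable. */
data poisson_mle ;
   set counts end = last ;
   if y ne . then do ;
      sumy + y ;          /* sum statement: the total is retained across observations */
      n + 1 ;             /* count of nonmissing observations */
   end ;
   if last then do ;
      chat = sumy / n ;   /* MLE of c: the sample mean ybar */
      info = n / chat ;   /* negative second derivative of the log likelihood at chat */
      output ;
   end ;
   keep n chat info ;
run ;

proc print data = poisson_mle ;
run ;
```

   The data step writes a single observation holding n, the estimate, and the
   information, so the values can be compared with the hand derivation.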
c) How do you use Fisher information to compute the standard deviation of the
   MLE of c? [10]

   var(MLE(c)) = 1/FisherInfo = 1/(n / ybar) = ybar / n.
   sdev(MLE(c)) = sqrt(ybar / n).

4. You go to the bus stop. The probability that a bus will arrive in the next
   hour is 50%. The time for a bus to arrive follows an exponential distribution,
   i.e., the hazard function for the bus arrival time is constant.

a) What is that constant? [12]

   .5 = 1 - exp(-h * 1) ---> h = -log(.5) = log(2) per hour.

b) Write a simulation program which estimates the expected value of how long you
   have to wait for a bus after you come to the bus stop. The program should also
   estimate the standard deviation of the arrival times. [13]

       data wait ;
          h = log(2) ;
          n = 1000 ;
          sumt = 0 ;
          sumtt = 0 ;

          do i = 1 to n ;
             u = ranuni(-1) ;
             /* Inverse-CDF draw from an exponential distribution with hazard h */
             time = -log(1 - u) / h ;
             sumt = sumt + time ;
             sumtt = sumtt + time * time ;
          end ;

          /* For comparison, the exact mean and standard deviation are both 1/h = 1/log(2) */
          expecttime = sumt / n ;
          vartime = (sumtt - sumt * sumt / n) / (n - 1) ;
          sdevtime = sqrt(vartime) ;
          output ;
       run ;

       proc print data = wait ;
          var n expecttime vartime sdevtime ;
          title 'Expected wait time, variance and standard deviation' ;
       run ;

5. Why is SYMPUT useful in SAS? [5]

   SYMPUT makes it possible to carry data-dependent values (or statistics) from
   one data step into a later data step or procedure, as macro variables. Because
   SAS data steps are independent of each other, other ways of doing this would
   be quite cumbersome (although in general it can be done by other methods).
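   As an illustration of the point in question 5, here is a minimal sketch that
   computes a mean in one data step and uses it in a later one. It is not part of
   the graded answer: the data set scores, the variable x, and the macro variable
   xbar are hypothetical names chosen for the example.

```sas
/* Sketch only: 'scores', 'x' and 'xbar' are assumed names. */

/* Step 1: compute the mean of x and store it in the macro variable xbar */
data _null_ ;
   set scores end = last ;
   if x ne . then do ;
      sumx + x ;          /* sum statement: the total is retained across observations */
      n + 1 ;
   end ;
   if last then call symput('xbar', put(sumx / n, best12.)) ;
run ;

/* Step 2: a later, independent data step uses the stored value */
data centered ;
   set scores ;
   xcentered = x - (&xbar) ;
run ;
```

   The macro variable is resolved when the second data step is compiled, which is
   why the value has to be stored in an earlier step that has already run.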