December 17, 2003                                                    page 1 of 6
SPH 5421 Final Examination          Name: ________________________________________
=================================================================================

1. The random variable X has the distribution specified by:

      prob(X = n) = 1 / 2^n,   where n = 1, 2, 3, ....

   Write a SAS program (not using any SAS procedures) which
   (1) Generates 100 independent observations from this distribution.
   (2) Computes the mean of the observations.
   (3) Computes the variance of the observations.                          [25]

   options linesize = 80 ;
   data geom ;
      n = 100 ;
      sum = 0 ;
      sumsq = 0 ;
      do i = 1 to n ;
         /* inverse-CDF method: X = the first m with r < 1/2 + ... + 1/2^m */
         r = ranuni(-1) ;
         sumprob = 0 ;
         m = 1 ;
         do j = 1 to 100 ;
            sumprob = sumprob + 1 / 2**m ;
            if r < sumprob then leave ;
            m = m + 1 ;
         end ;
         sum = sum + m ;
         sumsq = sumsq + m*m ;
         output ;                   /* one record per generated value */
      end ;
      /* for this distribution E(X) = 2 and var(X) = 2 */
      mean = sum / n ;
      var = (sumsq - n*mean*mean) / (n - 1) ;
      output ;                      /* last record carries mean and var */
   run ;
   proc print ;
   run ;

2. Let X1, X2, ..., Xn be a sample of observations of the random variable X.
   Define the LOWEST TERTILE to be the [n/3] smallest values in the sample,
   where [n/3] is the largest integer less than or equal to n/3.  Define the
   HIGHEST TERTILE to be the [n/3] largest values.  Define the "1/3 trimmed
   mean" to be the mean of the sample after the lowest tertile and the
   highest tertile are thrown out.

   Write a SAS macro to compute the 1/3 trimmed mean of a sample.  The call
   to the macro should look like the following:

      %trim3 (dataset, n, x, tmean),

   where
      dataset = a data set that includes the values for x
      n       = number of observations in the dataset (you can assume none are missing)
      x       = the variable of interest
      tmean   = output trimmed mean.
                                                                           [25]

   %macro trim3 (dataset, n, x, tmean) ;
      proc sort data = &dataset ;
         by &x ;
      run ;
      /* xobs = rank of x: 1 for the smallest value, &n for the largest */
      data xsort ;
         retain xobs 0 ;
         set &dataset ;
         xobs = xobs + 1 ;
      run ;
      /* keep ranks [n/3] + 1 through n - [n/3], dropping both tertiles */
      proc means data = xsort n mean std ;
         where xobs gt int(&n / 3) and xobs le &n - int(&n / 3) ;
         var &x ;
         output out = xmean mean = &tmean ;   /* data set xmean holds &tmean */
      run ;
   %mend ;

3. A datafile has the following structure:

      OBS   ID    X
      ---   --   ---
       1     1    16
       2     1    15
       3     1    18
       4     2     4
       5     2     7
       6     2     2
       7     3    X7
       8     3    X8
       9     3    X9
      10     4    X10
      11     4    X11
      12     4    X12
      13     5    X13
      14     5    X14
      15     5    X15
      etc.

   That is, there are 3 consecutive observations for each ID.  Write a SAS
   program which reads in this datafile and writes out another datafile
   which has the following structure:

      OBS   ID    R     S     T
      ---   --   ---   ---   ---
       1     1    X1    X2    X3
       2     2    X4    X5    X6
       3     3    X7    X8    X9
      etc.                                                                 [25]

   /* indata and outdata are hypothetical filerefs, assumed to be */
   /* assigned by FILENAME statements before this step runs       */
   data xobs ;
      infile indata ;
      file outdata ;
      retain casecount 0 r s ;
      input id x ;
      casecount = casecount + 1 ;
      if casecount = 1 then r = x ;
      if casecount = 2 then s = x ;
      if casecount = 3 then do ;
         t = x ;
         put id r s t ;             /* one output record per ID */
         output ;
         casecount = 0 ;
      end ;
      keep id r s t ;
   run ;
   proc print ;
   run ;
   endsas ;

4. A program produces maximum likelihood estimates s and t of two parameters
   S and T.  It also produces a covariance matrix A for s and t:

   $$A = \begin{pmatrix} .02 & -.01 \\ -.01 & .03 \end{pmatrix}$$

   Find an estimated standard error of r = s^2 + 3 * s * t.                [25]

   By the delta method, with dr/ds = 2*s + 3*t and dr/dt = 3*s,

   var(r) = (approx) (dr/ds)^2 * var(s) + 2 * (dr/ds) * (dr/dt) * cov(s, t) + (dr/dt)^2 * var(t)
          = (2*s + 3*t)^2 * (.02) + 2 * (2*s + 3*t) * (3*s) * (-.01) + (3*s)^2 * (.03),

   and the estimated standard error is the square root of this, evaluated at
   the estimates s and t:

   se(r) = sqrt( (2*s + 3*t)^2 * (.02) - .06 * s * (2*s + 3*t) + .27 * s^2 ).

5.
   Levels of cortisol in a person's blood tend to vary according to the time
   of day that the blood is drawn.  Here is a graph of cortisol levels for
   individuals, plotted against time of day on a 24-hour clock:

   [Graph: cortisol level (vertical axis, marked .05, .10, .15, .20) plotted
   against time of day t in hours (horizontal axis, marked 0 to 24 in steps of 4).]

   A reasonable model for the expected cortisol level might be:

      E(C(t)) = a + b * cos(c + d*pi*t),

   where time t is in hours.

   a) Describe what the parameters are in terms of the graph.
   b) Specify further assumptions which are needed to justify using a
      least-squares procedure to obtain estimates of the parameters
      a, b, c, and d.
   c) What would good initial guesses be for parameters a, b, c, and d?
   d) Write a PROC NLIN program which produces least-squares estimates of
      the parameters.                                                      [25]

   a): a = overall mean level (the horizontal midline of the curve)
       b = amplitude (half the distance from the lowest to the highest level)
       c = phase offset (locates the peak and the trough in time)
       d = frequency (d*pi is the angular frequency in radians per hour, so
           one full cycle takes 2/d hours)

   b): The observations are independent, and the errors C(t) - E(C(t)) have
       mean 0 and constant variance sig^2, e.g. error ~ N(0, sig^2).  Under
       normality the least-squares estimates are also maximum likelihood
       estimates.

   c): a = .13 (midway between the lowest values, near .05, and the highest,
       near .20), b = .08 (about half of that range), c = 3.8, d = .08 (about
       1/12, so that d*pi*t runs through one full cycle of 2*pi every 24
       hours).  With d = 1/12 the curve peaks where c + d*pi*t = 2*pi, so
       c = 3.8 puts the peak near t = 9.5 hours.

   d): /* assumes the data set has cortisol level y and time of day t */
       proc nlin method = marquardt ;
          parms a = .13  b = .08  c = 3.8  d = .08 ;
          pi = constant('pi') ;     /* SAS has no built-in variable named pi */
          f = a + b * cos(c + d * pi * t) ;
          der.a = 1 ;
          der.b = cos(c + d * pi * t) ;
          der.c = -b * sin(c + d * pi * t) ;
          der.d = -b * pi * t * sin(c + d * pi * t) ;
          model y = f ;
       run ;
       endsas ;

6. Short answers:

   1) What is an eigenvector?

      Given an n x n matrix A, an eigenvector of A is a nonzero n x 1 column
      vector v such that A * v = a * v for some scalar a (the corresponding
      eigenvalue, which may be 0).

   2) What is an advantage of the simplex method of computing a minimum of a
      function?

      It usually converges, and it does not need expressions for derivatives.

   3) What is a disadvantage of the simplex method?

      It is slow to converge, and it does not automatically give an estimate
      of the variance of the estimates, since it computes no second-derivative
      matrix.

   4) Suppose f(x) = 5*x - exp(x).
      You can find a solution to f(x) = 0 by the use of Newton's method.
      The key equation is x(n + 1) = x(n) - ??? / ???.

      x(n + 1) = x(n) - f(x(n)) / f'(x(n))
               = x(n) - (5*x(n) - exp(x(n))) / (5 - exp(x(n))).

   5) If X has a Poisson distribution with parameter h = 1, give the
      probabilities that:

      X = 0 :  h^0 * exp(-h) / 0! = 1/e        (about .368)
      X = 1 :  h^1 * exp(-h) / 1! = 1/e        (about .368)
      X = 2 :  h^2 * exp(-h) / 2! = 1/(2*e)    (about .184)             [25]
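A small illustration of the eigenvector definition in item 1 of question 6. The 2 x 2 matrix is chosen only for this example and does not come from the exam. Each vector below is mapped to a multiple of itself, so both are eigenvectors of A, with eigenvalues 3 and 1.

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
A\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix} = 3\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
A\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 1\cdot\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
```

Any nonzero multiple of either vector is also an eigenvector, with the same eigenvalue.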
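The following worked sketch shows the Newton update from item 4 of question 6 in action for f(x) = 5*x - exp(x). The starting value x_0 = 0 is an arbitrary choice made for this illustration, and the values are rounded to four decimals.

```latex
\begin{aligned}
x_{n+1} &= x_n - \frac{5x_n - e^{x_n}}{5 - e^{x_n}}, \qquad x_0 = 0,\\
x_1 &= 0 - \frac{0 - 1}{5 - 1} = 0.25,\\
x_2 &= 0.25 - \frac{1.25 - e^{0.25}}{5 - e^{0.25}} \approx 0.25 - \frac{-0.0340}{3.7160} \approx 0.2592,\\
x_3 &\approx 0.2592 .
\end{aligned}
```

The update is undefined at x = ln 5 (about 1.609), because f'(x) = 5 - exp(x) is 0 there. Starting values below ln 5 lead to the root near 0.2592. Starting values above ln 5 lead to the second root of f, near x = 2.54.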