SPH 7460 Final Exam December 20, 2006 page 1 of 6 Five Problems - 6 pages Name: _________________________________________ ===================================================================================== 1. A man has 14 coins in his pocket. The numbers of coins by their values are: Value of Number of Coin Coins --------- -------- 1 cent 4 5 cents 2 10 cents 3 25 cents 5 He is going to remove 3 coins at random from his pocket. He wants to know the probability that the total value of these 3 coins is at least 30 cents. Write a program which will estimate this probability by means of a simulation. [30] ---------------------------------------------------------------------------------- The natural interpretation here is that sampling is being done without replacement. The natural approach is to list all 14 coins and choose 3 of them at random without replacement. The following is one way to do this: data coins14 ; array coins(14) ; array coincopy(14) ; coins(1) = 1 ; coins(2) = 1 ; coins(3) = 1; coins(4) = 1; coins(5) = 5 ; coins(6) = 5 ; coins(7) = 10 ; coins(8) = 10; coins(9) = 10; coins(10)=25; coins(11)=25; coins(12)=25; coins(13)=25; coins(14)=25 ; do i = 1 to 14; coincopy(i) = coins(i); next i ; countge30 = 0 ; do j = 1 to 1000 ; coinsum = 0 ; m = 14 ; do k = 1 to 3 ; r = 1 + int(m * ranuni(-1)) ; coinsum = coinsum + coincopy(r) ; if r < m then do ; coincopy(r) = coincopy(m); end ; m = m - 1 ; end ; if coinsum ge 30 then countge30 = countge30 + 1 ; do i = 1 to 14; coincopy(i) = coins(i); next i ; end ; probge30 = countge30 / nsim ; output ; run ; proc print data = coins14; var nsim countge30 probge30 ; run ; ---------------------------------------------------------------------------------- SPH 7460 Final Exam December 20, 2006 page 2 of 6 Name: _________________________________________ ===================================================================================== 2. A researcher wants to find the equation of a curve that goes through two given points in the x-y plane. The two points are (x1, y1) and (x2, y2). The curve is restricted to be of the form: y = A * exp(-A*x - B*x^2). Write a program in PROC IML or R or S+ which will compute estimates of the unknown coefficients A and B, given the values of x1, y1, x2 and y2. ----------------------------------------------------------------------------- This is not a maximum likelihood problem nor even a minimization problem. It is just a solve-nonlinear-equations problem. The approach is Newton's method. [30] proc iml ; *------------------------------------------------------------------------------; start pointf(x1, y1, x2, y2, beta, fAB, dfAB) ; A = beta[1] ; B = beta[2] ; E1 = exp(-A*x1 - B*x1*x1) ; E2 = exp(-A*x2 - B*x2*x2) ; d1 = y1 - A * E1 ; d2 = y2 - A * E2 ; fAB = j(2, 1, 0) ; fAB[1] = d1 ; fAB[2] = d2 ; dd1A = E1 - A * x1 * E1 ; dd1B = -A * x1 * x1 * E1 ; dd2A = E2 - A * x2 * E2 ; dd2B = -A * x2 * x2 * E2 ; dfAB = j(2, 2, 1) ; dfAB[1, 1] = dd1A ; dfAB[1, 2] = dd1B ; dfAB[2, 1] = dd2A ; dfAB[2, 2] = dd2B ; finish pointf ; *------------------------------------------------------------------------------; eps = 1e-8 ; x1 = 1 ; y1 = 2 ; x2 = 3; y2 = .5 ; * Given values ... ; beta0 = j(2, 1, 0) ; beta0[1] = 1 ; beta0[2] = 1; * Initial values ... ; call pointf(x1, y1, x2, y2, beta0, fAB, dfAB) ; i = 0 ; print i fAB dfAB beta0 beta1 diff dist ; dist = 10 ; do i = 1 to 100 while (dist > eps) ; call pointf(x1, y1, x2, y2, beta0, fAB, dfAB) ; diff = inv(dfAB) * fAB ; beta1 = beta0 - diff ; dist = max(abs(diff)) ; print i fAB dfAB beta0 beta1 diff dist ; beta0 = beta1 ; end ; quit ; ----------------------------------------------------------------------------- SPH 7460 Final Exam December 20, 2006 page 3 of 6 Name: _________________________________________ ===================================================================================== 3. "weights.data" is a datafile with the following structure: ID date1 weight1 date2 weight2 date3 weight3 date4 weight4 ---- ------- ------- ------- ------- ------- ------- ------- ------- 001 010305 120 032805 124 021305 119 060905 121 002 011506 149 120405 150 111705 143 021506 143 etc. That is, for each ID there are 4 dates and 4 weights. Some of these may be missing. Note that some of the dates, which are in MMDDYY format, are not in chronological order. a) Write a SAS macro which sorts the dates and weights in order by the date. -------------------------------------------------------------------------------- *-------------------------------------------------------------------------------; %macro sortcw(d, w, n) ; do i = 2 to n ; * This is essentially a ; do j = 1 to i - 1 ; * sort-and-carry routine ; * using a bubble sort. ; if d(i) < d(j) then do ; tempd = d(i) ; tempw = w(i) ; d(i) = d(j) ; w(i) = w(j) ; d(j) = tempd ; w(j) = tempw ; end ; end ; end ; %mend ; *-------------------------------------------------------------------------------; See next page for how this is used ... [18] SPH 7460 Final Exam December 20, 2006 page 4 of 6 Name: _________________________________________ ===================================================================================== 3. continued b) Include this macro in a program which will read the file above and produce a report that looks like the following: ------------------------------- ID = 001: Date Weight ------ ------- 010305 120 021305 119 032805 124 060905 121 ------------------------------- ID = 002: Date Weight ------ ------ 111705 143 120405 150 011506 149 021506 143 ------------------------------- etc. ---------------------------------------------------------------------------------- data weights ; infile 'weights.data' ; array date(4) date1-date4 ; array weight(4) weight1-weight4 ; length id $4 ; input id date1 MMDDYY6. weight1 date2 MMDDYY6. weight2 date3 MMDDYY6. weight3 data4 MMDDYY6. weight4 ; %sortcw(date, weight, 4) ; [12] put "--------------------------------" ; put "ID = " ID " Date Weight" ; put " ------ ------" ; put " " date1 " " weight1 ; put " " date2 " " weight2 ; put " " date3 " " weight3 ; put " " date4 " " weight4 ; run ; SPH 7460 Final Exam December 20, 2006 page 5 of 6 Name: _________________________________________ ===================================================================================== 4. The function f(A, B) is defined as f(A, B) = A * log(B). Assume that estimates of Ae and Be of A and B have been provided by a maximum likelihood program, and that the corresponding Fisher information matrix is: | 2.1 -1.2 | F = | | | -1.2 0.8 |. Write a program using PROC IML which will provide an estimated standard error for f(Ae, Be). ---------------------------------------------------------------------------------- This problem (the easiest on the exam) assumes you know how Fisher information is related to the covariance estimate of parameter estimates: namely, inverse(Fisher information) = est covar of parameters. Then you apply the delta method: proc iml ; fisher = j(2, 2, 0) ; fisher[1, 1] = 2.1 ; fisher[1, 2] = -1.2 ; fisher[2, 1] = -1.2 ; fisher[2, 2] = 0.8 ; AE = 1 ; BE = 1.7 ; * Assume values known ... ; fAB = AE * log(BE) ; dfA = log(BE); dfB = AE/BE ; df = j(2, 1, 0) ; df[1] = dfA ; df[2] = dfB ; covarfAB = df` * inv(fisher) * df ; print AE BE fAB df fisher covarfAB ; quit ; [30] SPH 7460 Final Exam December 20, 2006 page 6 of 6 Name: _________________________________________ ===================================================================================== 5. An observation X in a dataset is called an OUTLIER if X > Q3 + 1.5 * (Q3 - Q1) or X < Q1 - 1.5 * (Q3 - Q1), where Q3 is the third quartile of dataset (i.e., 3/4 of the observations are less than Q3) and Q1 is the first quartile. Write a macro which lists all the outliers for a given variable in a dataset. The call to the macro should be of the form %outliers (indata, xvar, outdata), where indata is the input dataset, xvar is the variable of interest, and outdata is the output dataset of outlying values. ------------------------------------------------------------------------------ This problem is a little trickier than it looks, because you have to remember that you are looking for the quartiles of xvar, not of the observation numbers ... *------------------------------------------------------------------------------- ; %macro outliers(indata, xvar, outdata) ; retain count 0 ; data &indata ; set &indata end = eof ; if xvar ne . then do ; count = count + 1 ; output ; end ; if endmark = 1 then do ; call symput(fcount, count) ; run ; proc sort data = &indata ; by &xvar ; run ; data outliers ; retain quart1 int(&fcount/4) ; retain quart3 int(&fcount/4) ; retain count 0 ; retain xquart1 xquart3 ; set &indata end = endmark ; if count eq quart1 then xquart1 = &xvar ; * These are approximate; if count eq quart3 then xquart3 = &xvar ; if endmark = 1 then do ; upper = xquart3 + 1.5*(xquart3 - xquart1) ; lower = xquart1 - 1.5*(xquart3 - xquart1) ; call symput(outupper, upper) ; call symput(outlower, lower) ; end ; run ; file "outlier.list" ; data outdata ; length outliermessage $20 ; retain count 0 ; set &indata ; if &xvar le &outlower then do ; [30] outliermessage = "Below lower bound: " ; put outliermessage &outlower ": " &xvar ; end ; if &xvar le &outupper then do ; outliermessage = "Above upper bound: " ; put outliermessage &outlupper ": " &xvar ; end ; run ; %mend ; *--------------------------------------------------------------------------------;