December 22, 2004      page 1 of 8      SPH 5421 Final Examination
Name: ________________________________________
=================================================================================

1. A data file includes data on a random sample of 250 people. Three variables are recorded for each person:

   Variable 1 is ID, an identifying number.
   Variable 2 is gender: Gender = 'M' or 'F' (male or female).
   Variable 3 is serum cholesterol.

There are 100 males and 150 females in the sample. Let S be the set of all possible male-female pairs; that is, S = { (IDi, IDj) }, where IDi is the ID of a male and IDj is the ID of a female.

(a) How many possible pairs are there in the set S?   [2]

    100 x 150 = 15000.

Let M1 = the number of pairs in S in which the male has a lower serum cholesterol than the female.
Let M2 = the number of pairs in S in which the female has a lower serum cholesterol than the male.

(b) If there is no difference between males and females in the distribution of serum cholesterol in the population from which the sample is drawn, what is the expected value of M1 / (M1 + M2)?   [2]

    1/2. If cholesterol is measured on a continuous scale (so that ties have probability zero), then M1 + M2 = 15000, and under the null hypothesis each pair is equally likely to be counted in M1 or in M2, so E(M1) = E(M2) = 7500.

(c) Write a program using PROC IML that computes M1 and M2 for a given data file, and computes the fraction M1 / (M1 + M2).   [16]
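The pair counting behind parts (a) and (b) can be cross-checked outside SAS. The following is a minimal Python sketch, not the PROC IML program the question asks for. The function name `pair_counts`, the toy values, and the assumed normal distribution (mean 200, SD 40) used in the simulation are all hypothetical illustrations, not exam data. Tied pairs are counted in neither M1 nor M2, which matches the IML loop.

```python
import random
from itertools import product


def pair_counts(males, females):
    """Return (m1, m2, m1 / (m1 + m2)) over all male-female pairs.

    m1 counts pairs where the male value is lower, m2 pairs where the
    female value is lower; tied pairs are counted in neither.
    """
    m1 = m2 = 0
    for m, f in product(males, females):
        if m < f:
            m1 += 1
        elif f < m:
            m2 += 1
    return m1, m2, m1 / (m1 + m2)


# Hypothetical toy values (not exam data): 2 males x 3 females = 6 pairs.
print(pair_counts([180, 210], [170, 200, 220]))  # (3, 3, 0.5)

# Part (b) check under an ASSUMED common normal distribution for both sexes:
# with no ties, m1 + m2 = 100 x 150 = 15000 and the fraction is near 1/2.
rng = random.Random(5421)
males = [rng.gauss(200, 40) for _ in range(100)]
females = [rng.gauss(200, 40) for _ in range(150)]
m1, m2, frac = pair_counts(males, females)
print(m1 + m2, round(frac, 3))
```

The double loop mirrors the IML solution. M1 is the Mann-Whitney U count, so for large samples a rank-based computation would be faster.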
data cholm cholf ;
   infile 'cholmf.data' ;
   input id gender $ chol ;          /* gender is character ('M' or 'F'), so it needs $ */
   if gender eq 'M' then output cholm ;
   if gender eq 'F' then output cholf ;
run ;

proc iml ;
   use cholm ;  read all var {chol} into males ;    close cholm ;
   use cholf ;  read all var {chol} into females ;  close cholf ;
   rm = nrow(males) ;                /* number of males   */
   rf = nrow(females) ;              /* number of females */
   m1 = 0 ;                          /* pairs with male chol < female chol */
   m2 = 0 ;                          /* pairs with male chol > female chol */
   do i = 1 to rm ;
      do j = 1 to rf ;
         if males[i] < females[j] then m1 = m1 + 1 ;
         if males[i] > females[j] then m2 = m2 + 1 ;
      end ;
   end ;
   estprob = m1 / (m1 + m2) ;        /* tied pairs are counted in neither m1 nor m2 */
   file 'estprob.out' ;
   put ' m1 = ' m1 ' m2 = ' m2 ' estprob = ' estprob ;
quit ;

2. Check digits are often used for numerical IDs. Assume that the main part of the ID is a 4-digit number, such as N = 7629. To compute the check digit:

   Multiply the rightmost digit by 2,
   multiply the next digit to the left by 1,
   multiply the next digit to the left by 2,
   multiply the next digit to the left by 1, and so on.
   Add the digits of the resulting products.
   Subtract this sum from the next higher multiple of 10.

For 7629, the process is:

   Digits of N:     7        6        2        9
   Multipliers:     7 x 1    6 x 2    2 x 1    9 x 2
   Products:        7        12       2        18

   Add the digits of the products:          7 + 1 + 2 + 2 + 1 + 8 = 21
   Subtract from the next multiple of 10:   30 - 21 = 9

The check digit is 9.

(a) Compute the check digit for N = 8536.   [2]

   Digits of N:     8        5        3        6
   Multipliers:     8 x 1    5 x 2    3 x 1    6 x 2
   Products:        8        10       3        12

   Add the digits of the products:          8 + 1 + 0 + 3 + 1 + 2 = 15
   Subtract from the next multiple of 10:   20 - 15 = 5

Therefore the check digit is 5.

(b) Write a macro that computes the check digit for any 3-digit number N. The call to the macro should look like:

   %check(n, checkdig) ;

where n is the input and checkdig is the output.   [18]
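Before the macro solution below, the rule from question 2 can be checked for an ID of any length. The following is a small Python sketch; the function name `check_digit` is hypothetical. The sketch assumes that when the digit sum is already a multiple of 10, the check digit is 0 rather than 10. The question does not settle that case, but it keeps the result a single digit. The rule appears to be the same as the Luhn (mod 10) check-digit scheme.

```python
def check_digit(n: int) -> int:
    """Check digit for a non-negative integer ID n.

    Multipliers alternate 2, 1, 2, 1, ... starting from the rightmost digit;
    the digits of the products are summed, and the check digit is the
    distance from that sum up to the next multiple of 10. When the sum is
    already a multiple of 10 this sketch returns 0 (an assumed convention).
    """
    total = 0
    for position, ch in enumerate(reversed(str(n))):
        multiplier = 2 if position % 2 == 0 else 1
        prod = int(ch) * multiplier          # at most 9 x 2 = 18
        total += prod // 10 + prod % 10      # add the digits of the product
    return (10 - total % 10) % 10


print(check_digit(7629), check_digit(8536))  # 9 5
print(check_digit(123))                      # 0: digit sum 6 + 2 + 2 = 10
```

The final `% 10` is what prevents a "check digit" of 10 for IDs such as 123.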
%macro check(n, checkdig) ;
   /* Generates DATA step statements, so call it inside a DATA step.       */
   /* n must be a 3-digit number; multipliers are 2 (units), 1 (tens),     */
   /* and 2 (hundreds), following the rule above.                          */
   d3   = int(&n / 100) ;              /* hundreds digit                   */
   n2   = &n - 100 * d3 ;
   d2   = int(n2 / 10) ;               /* tens digit                       */
   d1   = n2 - 10 * d2 ;               /* units digit                      */
   d12  = d1 * 2 ;                     /* units digit x 2                  */
   d122 = int(d12 / 10) ;              /* tens digit of d12                */
   d121 = d12 - 10 * d122 ;            /* units digit of d12               */
   d21  = d2 ;                         /* tens digit x 1                   */
   d211 = d21 ;                        /* a single digit, so unchanged     */
   d32  = d3 * 2 ;                     /* hundreds digit x 2               */
   d322 = int(d32 / 10) ;              /* tens digit of d32                */
   d321 = d32 - 10 * d322 ;            /* units digit of d32               */
   sum  = d121 + d122 + d211 + d321 + d322 ;   /* sum of the digits        */
   summod10 = sum - 10 * int(sum / 10) ;       /* sum mod 10               */
   &checkdig = mod(10 - summod10, 10) ;        /* check digit: 0, not 10, when sum is a multiple of 10 */
%mend check ;

3. A linear transformation T from R^2 to R^2 has eigenvalues a1 = 3 and a2 = 2, with corresponding eigenvectors

$$X_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad X_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

(a) Find the matrix of the linear transformation T.   [12]

Write the matrix of T with unknown entries A, B, C, D. The eigenvector equations T X1 = 3 X1 and T X2 = 2 X2 are

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix}, \qquad \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \\ -2 \end{pmatrix},$$

so A + B = 3, A - B = 2, C + D = 3, and C - D = -2. Thus A = 2.5, B = 0.5, C = 0.5, D = 2.5, and the matrix is

$$T = \begin{pmatrix} 2.5 & 0.5 \\ 0.5 & 2.5 \end{pmatrix}.$$

(b) Let S be the unit square, that is, the square with vertices (0, 0), (1, 0), (1, 1), and (0, 1). Draw a picture of T(S), specifying all of its vertices. What is the area of T(S)?   [5]

   (0, 0) ---> (0, 0)
   (1, 0) ---> (2.5, 0.5)
   (1, 1) ---> (3.0, 3.0)
   (0, 1) ---> (0.5, 2.5)

[Figure: T(S) is the parallelogram with vertices (0, 0), (2.5, 0.5), (3, 3), and (0.5, 2.5).]

Area: |det T| = 2.5 x 2.5 - 0.5 x 0.5 = 6.25 - 0.25 = 6. (Equivalently, the product of the eigenvalues, 3 x 2 = 6.)

(c) What are the eigenvalues of the inverse of T?   [3]

   1/3 and 1/2. (T^{-1} has the same eigenvectors, with reciprocal eigenvalues.)
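The answers to question 3 can be verified numerically. The following is a plain-Python sketch with no external libraries; the helper names `matmul`, `apply`, and `inverse` are hypothetical. It builds T as P D P^(-1) from the given eigenvectors and eigenvalues. It then maps the corners of the unit square and takes |det T| as the area. Finally, it checks that T^(-1) scales X1 by 1/3 and X2 by 1/2.

```python
def matmul(a, b):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, v):
    """Image of the vector v = (x, y) under the 2 x 2 matrix m."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def inverse(m):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


P = [[1, 1], [1, -1]]   # columns are the eigenvectors X1 = (1, 1)', X2 = (1, -1)'
D = [[3, 0], [0, 2]]    # eigenvalues a1 = 3, a2 = 2
T = matmul(matmul(P, D), inverse(P))   # T = P D P^(-1)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
images = [apply(T, v) for v in square]
area = abs(T[0][0] * T[1][1] - T[0][1] * T[1][0])   # |det T| = area scale factor

print(T)        # [[2.5, 0.5], [0.5, 2.5]]
print(images)   # [(0.0, 0.0), (2.5, 0.5), (3.0, 3.0), (0.5, 2.5)]
print(area)     # 6.0

T_inv = inverse(T)
# Approximately (1/3, 1/3) and (1/2, -1/2): eigenvalues 1/3 and 1/2.
print(apply(T_inv, (1, 1)), apply(T_inv, (1, -1)))
```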
4. An experiment is conducted in which water is allowed to evaporate from a number of 1-cup containers, each exposed to a different temperature. Each cup contains 8 ounces of water at the beginning. After cup i has been exposed to temperature Ti for Mi minutes, the amount of water Wi remaining in the cup is weighed. Assume that the amount of water that has evaporated, Ei = 8 - Wi, is the following function of temperature Ti and minutes of exposure Mi:

   Ei = a * Mi * (Ti - b) + e,

where e is a normally distributed error term, e ~ N(0, v), and v is an unknown constant variance. The constants a and b are also unknown and must be estimated from the data.

Given a data file that includes Ti, Mi, and Wi, write a PROC NLIN program that will estimate a and b. Explain how this program will also estimate v.   [20]

data water ;
   infile 'water.data' ;             /* hypothetical file name */
   input ti mi wi ;
   ei = 8 - wi ;                     /* amount evaporated */
run ;

proc nlin method = marquardt data = water ;
   parms a = 1  b = 1 ;              /* starting values */
   model ei = a * mi * (ti - b) ;    /* mean function; the error e is not part of it */
   der.a = mi * (ti - b) ;           /* derivative of the mean with respect to a */
   der.b = -a * mi ;                 /* derivative of the mean with respect to b */
run ;

The program prints the residual (error) sum of squares. Because e has constant variance v, v is estimated by the residual mean square. This is the residual sum of squares divided by n - p, where n is the number of cups and p = 2 is the number of estimated parameters (a and b). The printed analysis-of-variance table includes this mean square in its Error row.

5. Assume X and Y are dichotomous random variables, each taking only the values 0 or 1, with Bernoulli distributions X ~ Ber(.5) and Y ~ Ber(.8). Assume also that X and Y are correlated: corr(X, Y) = .3. (Recall that corr(X, Y) = cov(X, Y) / sqrt(var(X) * var(Y)).)

(a) What are the expected cell counts and margins in the following table, where 100 observations (X, Y) are made?
              X = 0      X = 1
           +----------+----------+
   Y = 0   |   E(a)   |   E(b)   |   E(n1)
           +----------+----------+
   Y = 1   |   E(c)   |   E(d)   |   E(n2)
           +----------+----------+
               E(m1)      E(m2)      100
                                                   [8]

Since corr(X, Y) = cov(X, Y) / (sqrt(var(X)) * sqrt(var(Y))), with sqrt(var(X)) = sqrt(.5 x .5) = .5 and sqrt(var(Y)) = sqrt(.8 x .2) = .4,

   .3 = cov(X, Y) / (.5 x .4),   so   cov(X, Y) = .3 x .2 = .06.

Noting that cov(X, Y) = E(XY) - E(X)E(Y), we have

   .06 = E(XY) - .5 x .8 = E(XY) - .4,   so   E(XY) = P(X = 1, Y = 1) = .46.

The remaining cells follow from the margins: P(X = 1, Y = 0) = .50 - .46 = .04, P(X = 0, Y = 1) = .80 - .46 = .34, and P(X = 0, Y = 0) = .20 - .04 = .16. Hence the cell probabilities are:

              X = 0      X = 1
           +----------+----------+
   Y = 0   |   .16    |   .04    |   .20
           +----------+----------+
   Y = 1   |   .34    |   .46    |   .80
           +----------+----------+
               .50        .50       1.00

So the expected cell counts are E(a) = 16, E(b) = 4, E(c) = 34, and E(d) = 46. The expected margins are E(n1) = 20, E(n2) = 80, E(m1) = 50, and E(m2) = 50.

(b) Write a program that simulates 100 observations from (X, Y).   [12]

data xysim ;
   n = 100 ;
   /* cell probabilities from part (a); in pYX the first index is Y, the second X */
   p00 = .16 ; p01 = .04 ; p10 = .34 ; p11 = .46 ;
   /* cumulative probabilities that split (0, 1) into four intervals */
   psum00 = p00 ;
   psum01 = p00 + p01 ;
   psum10 = p00 + p01 + p10 ;
   do i = 1 to n ;
      r = ranuni(-1) ;                 /* uniform(0, 1); a seed <= 0 uses the clock */
      if r < psum00 then do ; X = 0 ; Y = 0 ; end ;
      else if r < psum01 then do ; X = 1 ; Y = 0 ; end ;
      else if r < psum10 then do ; X = 0 ; Y = 1 ; end ;
      else do ; X = 1 ; Y = 1 ; end ;
      output ;
   end ;
   keep X Y ;
run ;

proc corr data = xysim ;
   var X Y ;
   title1 'Correlation of simulated X and Y ...' ;
run ;

With only 100 observations, the sample correlation printed by PROC CORR will vary around .3 from run to run.
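Both parts of question 5 can be reproduced outside SAS. The following is a Python sketch, offered only as a cross-check. The helpers `joint_table`, `simulate`, and `pearson` are hypothetical names, and the seeds are arbitrary. `joint_table` repeats the part (a) derivation for general marginals and correlation. `simulate` partitions (0, 1) by cumulative cell probabilities, the same idea as the DATA step above. `pearson` is the ordinary sample correlation, which is what PROC CORR reports by default.

```python
import math
import random


def joint_table(px, py, rho):
    """Cell probabilities keyed by (x, y) for X ~ Ber(px), Y ~ Ber(py), corr rho."""
    cov = rho * math.sqrt(px * (1 - px) * py * (1 - py))
    p11 = px * py + cov          # E(XY) = E(X)E(Y) + cov(X, Y)
    p10 = px - p11               # X = 1, Y = 0
    p01 = py - p11               # X = 0, Y = 1
    p00 = 1 - p11 - p10 - p01    # X = 0, Y = 0
    return {(0, 0): p00, (1, 0): p10, (0, 1): p01, (1, 1): p11}


def simulate(table, n, rng):
    """Draw n (x, y) pairs by splitting (0, 1) into intervals of the cell probabilities."""
    cells = list(table.items())
    draws = []
    for _ in range(n):
        r = rng.random()
        cum = 0.0
        chosen = cells[-1][0]        # fallback if rounding leaves cum slightly below 1
        for cell, p in cells:
            cum += p
            if r < cum:
                chosen = cell
                break
        draws.append(chosen)
    return draws


def pearson(xs, ys):
    """Ordinary sample (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


table = joint_table(0.5, 0.8, 0.3)
print({cell: round(100 * p, 2) for cell, p in table.items()})
# {(0, 0): 16.0, (1, 0): 4.0, (0, 1): 34.0, (1, 1): 46.0}

sample = simulate(table, 100, random.Random(2004))
print(pearson([x for x, _ in sample], [y for _, y in sample]))  # varies around .3
```

Not every combination of marginals and correlation is feasible for two Bernoulli variables. If any cell probability returned by `joint_table` is negative, the requested correlation cannot be attained.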