PubH 6470 SAS Procedures and Data Analysis                        page 1 of 6
Exam 1 - March 9, 2006                    Name: __Answer Key_________________
==========================================================================================

1.  Given the following program:

-----------------------------------
data xyz ;
     input x y z ;
datalines ;
5 13 8
4 1 2
9 19 12
[more lines]
12 -5 2
;
run ;
-----------------------------------

    Write another data step which includes new variables a b c where

          a = smallest of x, y, and z
          b = middle value of x, y, and z
    and   c = largest of x, y, and z.

    That is, the new data set should look like this:

      x    y    z    a    b    c
     ---  ---  ---  ---  ---  ---
      5   13    8    5    8   13
      4    1    2    1    2    4
      9   19   12    9   12   19
     [more lines]
     12   -5    2   -5    2   12

--------------------------------------------------------------------------
                                                                      [15]
--------------------------------------------------------------------------
data xyzabc ;
     set xyz ;

     a = x ;
     if a ge y then a = y ;
     if a ge z then a = z ;

     c = y ;
     if c le x then c = x ;
     if c le z then c = z ;

*    [Comment: the preceding lines set a = minimum(x, y, z) and
      c = maximum(x, y, z).  The middle value is what remains of the
      total after the minimum and the maximum are subtracted.  This
      also gives the right answer when two or three values are tied.] ;

     b = x + y + z - a - c ;

     output ;
run ;

proc print data = xyzabc ;
     var x y z a b c ;
run ;
--------------------------------------------------------------------------

2.  Heights are measured on three groups of 5-year-old children with
    15 children in each group.  The first group was raised in Beijing.
    The second group was raised in Houston.  The third group was raised
    in Mombasa.  The objective is to see whether the distribution of
    heights in the three groups is the same.
    Here is the program:

--------------------------------------------------------------------------
data heights ;
     length city $7 ;
     input city h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 h14 h15 ;
datalines ;
Beijing 40 43 29 32 32 41 34 29 50 42 41 39 30 28 33
Houston 51 38 28 33 50 43 44 37 50 44 51 34 44 38 40
Mombasa 28 31 39 48 29 40 31 33 29 30 45 31 35 46 38
;
--------------------------------------------------------------------------

    Write a program to restructure the data set as necessary and use
    PROC NPAR1WAY to test whether the distribution of heights in the
    three cities is the same.
                                                                      [20]
---------------------------------------------------------------------------
options linesize = 80 ;
footnote "~john-c/5421/exam1prob2.sas &sysdate &systime" ;

data heights ;
     length city $7 ;
     input city @@ ;
     do i = 1 to 15 ;
        input height @@ ;
        output ;
     end ;
datalines ;
Beijing 40 43 29 32 32 41 34 29 50 42 41 39 30 28 33
Houston 51 38 28 33 50 43 44 37 50 44 51 34 44 38 40
Mombasa 28 31 39 48 29 40 31 33 29 30 45 31 35 46 38
;
run ;

proc print data = heights ;
     var city height ;
run ;

proc npar1way data = heights ;
     class city ;
     var height ;
run ;
---------------------------------------------------------------------------

3.  A case-control study was performed in which the cases were people
    over 60 who had liver cancer, and the controls were people over 60
    who did not have liver cancer.  There were twice as many controls
    as cases.  The risk factor of interest was exposure to abdominal
    CT scans.
    The data were as follows:

                    Case        Control
               -------------------------
     Abdom     |           |           |
     CT Yes    |    104    |    150    |  n1 =  254
               |           |           |
               | exp=84.67 | exp=169.33|
               -------------------------
     Abdom     |           |           |
     CT No     |    496    |   1050    |  n2 = 1546
               |           |           |
               | exp=515.33|exp=1030.67|
               -------------------------
                 m1 = 600    m2 = 1200     N = 1800

a)  What null hypothesis might you want to test?

    H0:  prob(Abdom CT = Yes | case) = prob(Abdom CT = Yes | control)
                                                                       [5]
    Or:  H0:  Odds ratio = 1.

b)  Compute an appropriate odds ratio estimate and explain what it
    means.  If your odds ratio estimate is bigger than 1 or less than 1,
    tell what that means in terms of the relationship between exposure
    to CT scans and liver cancer.

    OR estimate = 104*1050/(150*496) = 1.468.
                                                                      [10]
    This means that a person with liver cancer appears more likely to
    have had exposure to abdominal CT scans than a person without liver
    cancer: the odds of exposure among cases are about 1.47 times the
    odds of exposure among controls.

c)  Compute expected counts for each cell in the table, assuming the
    null hypothesis.

    See above:  (row margin) X (column margin) / Total
                                                                       [5]

d)  Write a SAS program (PROC FREQ) to analyze this data, including
    computation of the odds ratio estimate and a 95% confidence interval
    for the true odds ratio.  Tell how you might explain to a
    nonstatistician what it means if the 95% confidence interval does
    not include the number 1.0.
                                                                      [15]
---------------------------------------------------------------------------
data livercan ;
     input case abdomct count ;
datalines ;
1 1 104
0 1 150
1 0 496
0 0 1050
;
run ;

proc freq data = livercan ;
     weight count ;
     tables abdomct * case / chisq measures ;
run ;
---------------------------------------------------------------------------
    If the null hypothesis is true, you would expect that in repetitions
    of this study, 95% of the confidence intervals for the OR would
    contain the value "1".
    So a 95% confidence interval that does not include 1 is interpreted
    as evidence that the null hypothesis is not true.
---------------------------------------------------------------------------

e)  If the chi-square test in PROC FREQ is highly significant
    (p < .0001), does that mean that abdominal CT scans cause liver
    cancer?  Why or why not?

    Not necessarily.  It could be that a history of liver disease causes
    CT scans.  Or it could be that liver cancer is correlated with some
    other variable that results in more CT scans.  In general,
    case-control studies show association only and do not support
    causal conclusions.
                                                                       [5]

4.  The graph below shows mean cholesterol levels for 4 groups of people:

    [Plot not reproduced.  Vertical axis: cholesterol, about 150 to 250.
     Horizontal axis: four groups, in the order Men / No bacon,
     Women / No bacon, Men / Bacon, Women / Bacon.]

    The vertical axis is cholesterol level.  The four groups are defined
    by gender and whether or not the person habitually eats bacon.
    There are 25 people in each group.

    Assume you have the data on a file with the following structure:

          Gender          Bacon
         (0 = male       (1 = yes,      Cholesterol
          1 = female)     0 = no)      (mg/deciliter)
         ----------      -------       -------------
             1              0               231
             0              0               177
             0              1               250
            etc.

    Write a SAS program to read in the data.  Write appropriate analysis
    routines to see what the effects of gender and bacon are on
    cholesterol level.  There should be two analyses: one for a model
    that includes only main effects, and one for a model that includes
    main effects and an interaction term.
    What does it mean if the interaction of gender and bacon is positive
    and significant?

    [Continue on next page if necessary]
                                                                      [25]
---------------------------------------------------------------------------
data bacon ;
     infile 'diet.bacon' ;
     input gender bacon chol ;
run ;

proc glm data = bacon ;
     class gender bacon ;
     model chol = gender bacon / solution ss2 ;
title1 'Model 1:  cholesterol = b0 + b1*gender + b2*bacon + error' ;
run ;

proc glm data = bacon ;
     class gender bacon ;
     model chol = gender bacon gender * bacon / solution ss2 ;
title1 'Model 2:  cholesterol = b0 + b1*gender + b2*bacon + b3*gender*bacon + error' ;
run ;
---------------------------------------------------------------------------
    If the coefficient of the interaction term is positive and
    significant, it means that bacon raises cholesterol more in women
    than it does in men, by more than chance alone would explain.
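
    As a worked sketch of the interaction interpretation in Problem 4,
    assume gender and bacon enter the model as the 0/1 indicators given
    in the problem (G = 0 for male, 1 for female; B = 0 for no bacon,
    1 for bacon), as written in the Model 2 title.  The symbols G, B and
    the group means mu_GB are introduced here for illustration only and
    are not part of the original key.  PROC GLM with a CLASS statement
    parameterizes the effects differently, so its printed main-effect
    estimates need not match b1 and b2 term for term.

```latex
% Assumption: G and B are 0/1 indicators; \mu_{GB} is the mean cholesterol
% in the group with gender G and bacon status B (hypothetical notation).
\begin{align*}
\mu_{GB} &= b_0 + b_1 G + b_2 B + b_3 G B, \qquad G, B \in \{0, 1\} \\[4pt]
\text{bacon effect in men:}   \quad \mu_{01} - \mu_{00} &= b_2 \\
\text{bacon effect in women:} \quad \mu_{11} - \mu_{10} &= b_2 + b_3 \\
\text{difference of the two effects:} \quad
(\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00}) &= b_3
\end{align*}
```

    Under these assumptions, b3 > 0 says that the increase in mean
    cholesterol associated with bacon is larger for women than for men
    by b3 mg/deciliter.  Whether that difference is significant is
    judged from the test of the gender*bacon term in Model 2.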