August 19, 2011

Some Sample Questions for Stat Computing II (SPH 5421)

1.  Given a dataset of observations of the form (Xi, Yi), what
    computations would you do to decide between a linear model with an
    intercept and a linear model with no intercept?

2.  How would you generate pseudorandom observations from a lognormal
    distribution?

3.  You can estimate the number pi by (1) randomly generating points
    within the square which has vertices at (-1, -1), (-1, 1), (1, 1),
    and (1, -1), and then (2) computing what proportion of these points
    lie within a circle of radius 1 centered at the origin.

    Write SAS or Splus code which accomplishes this and produces an
    estimate of pi. Execute your program with 10000 simulated points
    and print the answer.

    Can you think of a way to assign a standard error to the estimate
    of pi that you obtain by this method (assuming you didn't already
    know the true value)?

4.  Given a list of 120 people, find an efficient way to randomly
    assign them to three groups so that there are exactly 40 in each
    group.

5.  An investigator gives drug A to 100 people and drug B to another
    100 people for four weeks. The drug is supposed to lower blood
    pressure. Thus the outcome of the study is the change in blood
    pressure from baseline to the last day they are on drug. He finds
    that the mean blood pressure change in the A group is -12.4, and in
    the B group it is -15.8. The difference between groups A and B is
    -12.4 - (-15.8) = 3.4. He wants to know how likely it is that this
    might have occurred by chance.

    He decides to test the null hypothesis of no difference between the
    drugs by means of a permutation test. To do this he takes the list
    of 200 people and their changes in blood pressure and randomly
    assigns them to a pseudo-A group and a pseudo-B group. He then
    computes the difference in mean change in blood pressure between
    these pseudo-groups. He repeats this process 10,000 times.

    a)  What do you expect the mean difference over the 10,000
        pseudo-trials to be?

    b)  How can he estimate the probability that he would obtain by
        chance a difference greater than that which he observed?

    c)  Does this seem to be a valid way to carry out a statistical
        test?

    d)  Write a SAS or Splus program which carries out his plan.

6.  Assume a data file with the following structure:

        Obs     X     Y
        ---    ---   ---
         1     12     3
         2     16     2
         .      .     .
         .      .     .
         .      .     .
         N     29     7

    Write an IML program which computes regression coefficients for the
    model

        E(Y) = b0 + b1 * X + b2 * X^2.

    Include a computation of R-square.

7.  White blood cell counts may be related to age. Suppose N people are
    sampled with ages A1, A2, ..., AN, weights W1, W2, ..., WN, and
    white cell counts Y1, Y2, ..., YN. The white cell count Yi for a
    given person can be assumed to have a Poisson distribution with
    expectation equal to

        E(Yi) = exp(a*Wi + b*Ai),

    where Ai is the person's age and Wi is the person's weight.

    Write a proc iml procedure which will compute maximum likelihood
    estimates of a and b and the approximate covariance matrix of these
    estimates.

8.  Given a 2 x 2 table as follows,

                         Disease
                    ----------------
                       0        1
                    ----------------
                    |       |       |
                  0 |   a   |   b   |
                    |       |       |
        Exposure    ----------------
                    |       |       |
                  1 |   c   |   d   |
                    |       |       |
                    ----------------

    define the estimated ODDS RATIO of disease for exposed versus
    nonexposed people as

        OR = (d / c) / (b / a).

    Write a macro which will compute and print the estimated odds ratio
    and a 95% confidence interval for the true odds ratio. The call to
    the macro should look like:

        %oddsmac (a, b, c, d, title);

9.  Write a program to generate random samples from the negative
    binomial distribution; from the geometric distribution; from the
    hypergeometric distribution.

10. Write a macro which shows the image of the unit circle
    (x^2 + y^2 = 1) under a linear transformation A: R^2 --> R^2,
    specified by the matrix

            | a  b |
        A = |      |
            | c  d |

    The call to the macro should look like

        %circimg (a, b, c, d, title) ;

11. Describe why SYMPUT is a useful command.

12. List 6 reasons why macros are useful.

13. List some possible disadvantages of macros.

14. What is a good reason to avoid using the 'goto' statement in SAS
    (or any other language)? What should you do instead?

15. Why is it a good idea to include 'mprint' on an options card?

16. Which is better: SAS or Splus?

17. What things can go wrong in programs which are designed to minimize
    functions iteratively?

18. Write a program that (1) reads a dataset A which includes diastolic
    blood pressure (DBP) as a variable, (2) reads a second dataset B
    which also includes DBP as a variable, (3) computes the percentages
    of people in dataset B that are above and below the median DBP of
    dataset A, and (4) tests whether these percentages are
    significantly different from 50%.

19. Write a macro for doing sample size calculations for a two-group
    clinical trial with a dichotomous endpoint. The call to the macro
    should look like

        %sampsize(p1, p2, alpha, beta, s) ;

    where

        p1    = the event probability in group 1,
        p2    = the event probability in group 2,
        alpha = significance level (two-sided),
        beta  = probability of Type II error (= 1 - power),
        s     = the desired total sample size.

20. An observation X in a dataset is called an OUTLIER if

        X > Q3 + 1.5 * (Q3 - Q1)   or   X < Q1 - 1.5 * (Q3 - Q1),

    where Q3 is the third quartile of the dataset (i.e., 3/4 of the
    observations are less than Q3) and Q1 is the first quartile.

    Write a macro which lists all the outliers for a given variable in
    a dataset. The call to the macro should be of the form

        %outliers (indata, xvar, outdata),

    where indata is the input dataset, xvar is the variable of
    interest, and outdata is the output dataset of outlying values.
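The sampling scheme described in question 3 can be illustrated with a minimal sketch. It is written in Python rather than the SAS or Splus the question asks for, so it shows the method only and is not a model answer. The function name estimate_pi, the seed value, and the choice of a binomial standard error for the proportion (scaled by 4, the area of the square) are assumptions made for this illustration.

```python
import math
import random


def estimate_pi(n_points, seed=5421):
    """Estimate pi from uniform points in the square [-1, 1] x [-1, 1].

    Hypothetical helper for illustration; the seed is arbitrary.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count the point if it falls within the unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # The proportion inside estimates (circle area) / (square area) = pi / 4.
    p_hat = inside / n_points
    pi_hat = 4.0 * p_hat
    # Assumed approach: binomial standard error of p_hat, scaled by 4.
    # It uses only the simulated proportion, not the true value of pi.
    se = 4.0 * math.sqrt(p_hat * (1.0 - p_hat) / n_points)
    return pi_hat, se


pi_hat, se = estimate_pi(10000)
print(f"estimate of pi = {pi_hat:.4f}, standard error = {se:.4f}")
```

With 10000 points the standard error should come out somewhere near 0.016, so the estimate would typically agree with pi to about one decimal place.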
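The outlier rule defined in question 20 can be sketched in the same way. This is a Python illustration of the rule, not the SAS macro the question asks for. The function name find_outliers and the sample data are hypothetical, and the quartiles come from Python's statistics.quantiles with its default "exclusive" method, which is only one of several common quartile definitions and may not match the SAS default.

```python
import statistics


def find_outliers(values):
    """Return the values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR.

    Hypothetical helper; quartiles use statistics.quantiles (default
    'exclusive' method), which is an assumption of this sketch.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


# Made-up data for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(find_outliers(sample))
```

A different quartile definition can move the fences slightly, so borderline observations may be classified differently from one package to another.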
Web address of this page: http://www.biostat.umn.edu/~john-c/review.s2003.html
Most recent update: August 19, 2011.