1. (a) You have a stick which is 1 unit long. You make two marks on the stick at points X and Y, where both X and Y have a uniform[0, 1] distribution (the left end of the stick is at 0). This divides the stick into three pieces of lengths a, b, and c. Use a pseudo-random number generator to estimate the probability that a, b, and c can form the sides of a triangle.
   (b) You have a stick which is 1 unit long. You make two marks on the stick at points X and Y, where X has a uniform[0, 1] distribution and, given X = x, Y has a uniform distribution on the interval [x, 1]. This divides the stick into three pieces of lengths a, b, and c. Again use a pseudo-random number generator to estimate the probability that a, b, and c can form the sides of a triangle.
   (c) You have a stick which is 1 unit long. You make two marks on the stick at points X and Y, where X has a uniform[0, 1] distribution and, given X = x, Y has a uniform distribution on either [0, x] or [x, 1], whichever of these two subintervals is longer. This divides the stick into three pieces of lengths a, b, and c. Again use a pseudo-random number generator to estimate the probability that a, b, and c can form the sides of a triangle.
   What do you think the true answer is for part (a)? Can you prove it?
2. The random number generator rexpon in SAS generates observations having an exponential distribution with expectation 1.0. Use rexpon to generate observations from exponential distributions with expectations 0.5 and 2.0, respectively.
3. Use the random number generator rannor in SAS to generate two random variables X and Y such that (1) X and Y both have normal distributions, and (2) Corr(X, Y) = .5. Generate 1000 (X, Y) pairs and find the estimated correlation of the two variables.
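A minimal sketch of the simulation asked for in Problem 1, written in Python for illustration rather than in SAS. The function names (is_triangle, second_mark, estimate), the seed, and the number of trials are assumptions made for this sketch and are not part of the assignment.

```python
import random

def is_triangle(x, y):
    """True if the three pieces cut at marks x and y can form a triangle."""
    lo, hi = min(x, y), max(x, y)
    a, b, c = lo, hi - lo, 1.0 - hi
    # Because a + b + c = 1, the three triangle inequalities hold
    # exactly when every piece is shorter than 1/2.
    return max(a, b, c) < 0.5

def second_mark(x, scheme, rng):
    """Draw Y given X = x under scheme 'a', 'b', or 'c' of Problem 1."""
    if scheme == "a":
        return rng.random()              # Y ~ uniform[0, 1], independent of X
    if scheme == "b":
        return rng.uniform(x, 1.0)       # Y ~ uniform[x, 1]
    # Scheme 'c': Y is uniform on the longer of [0, x] and [x, 1].
    return rng.uniform(x, 1.0) if x < 0.5 else rng.uniform(0.0, x)

def estimate(scheme, n_trials=200_000, seed=1):
    """Monte Carlo estimate of the triangle probability for one scheme."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x = rng.random()
        y = second_mark(x, scheme, rng)
        hits += is_triangle(x, y)
    return hits / n_trials

if __name__ == "__main__":
    for scheme in ("a", "b", "c"):
        print(scheme, estimate(scheme))
```

Each estimate is a binomial proportion, so its standard error is roughly sqrt(p(1 - p) / n_trials); increase n_trials if more digits are needed.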
1. A clinical trial is planned of two blood pressure drugs, A and B. Each person will be randomly assigned to take one of these two for 30 days. At the end, the systolic blood pressure (SBP) will be measured. Here are some relevant assumptions:
   1) The standard deviation of SBP is 10 mmHg.
   2) The significance level is alpha = 0.05, two-sided.
   3) Null hypothesis: mean(A) = mean(B).
   4) Alternative hypothesis: mean(B) - mean(A) = 2 mmHg.
   5) N = 60 people in each group.
   6) The test will be an unpaired t-test.
   Estimate the power of this clinical trial. Do this in three ways:
   a. By the use of a simulation study with at least 10,000 simulated trials.
   b. By the use of a formula for power (cite your source).
   c. By the use of PROC POWER in SAS or an equivalent routine in R.
2. You need to write a program for a randomization schedule for a clinical trial with 3 groups, A, B, and C. Assume you will use a randomized block design with block sizes 3 and 6. Assume there are two clinics and that your program allows for at least 120 treatment assignments for each clinic.
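A rough sketch of parts a and b of Problem 1, in Python for illustration only. It rests on two assumptions that are not in the assignment: the formula is the usual large-sample normal approximation to two-sample power, and the simulated pooled t statistic is compared with the normal critical value rather than the t critical value on 118 degrees of freedom, which is slightly larger. The function names and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist

def power_normal_approx(delta, sd, n_per_group, alpha=0.05):
    """Large-sample (normal) approximation to two-sided two-sample power."""
    nd = NormalDist()
    se = sd * math.sqrt(2.0 / n_per_group)   # SE of the difference in means
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = delta / se
    # Probability of rejecting in either tail under the alternative.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)

def simulate_power(delta, sd, n_per_group, alpha=0.05,
                   n_trials=10_000, seed=7460):
    """Fraction of simulated trials in which the pooled t statistic rejects."""
    rng = random.Random(seed)
    # Assumption: normal critical value used as a stand-in for the t quantile.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_trials):
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [rng.gauss(delta, sd) for _ in range(n_per_group)]
        mean_a, var_a = mean_and_var(a)
        mean_b, var_b = mean_and_var(b)
        pooled_var = (var_a + var_b) / 2.0   # equal group sizes
        t_stat = (mean_b - mean_a) / math.sqrt(pooled_var * 2.0 / n_per_group)
        rejections += abs(t_stat) > z_crit
    return rejections / n_trials

if __name__ == "__main__":
    print("formula   :", power_normal_approx(2.0, 10.0, 60))
    print("simulation:", simulate_power(2.0, 10.0, 60))
```

The two numbers should agree to within simulation error; an exact t-test calculation, such as the one in part c, can be expected to give a slightly smaller power than either.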
1. (a) Write a program to compute sample size for a clinical trial with two groups, where the endpoint is time-to-event (i.e., survival). The sample size computation should be based on the description in Biostatistical Methods, by John Lachin, pages 409-412. A copy is included on the class website, right after notes.004.1. The test statistic is the logrank test. Constant exponential hazards are assumed. You can assume that the sample sizes in the two groups will be equal. Input parameters should include the following:

==============================================================================
*  alpha = two-sided signif level
*  power = 1 - beta
*
*  f     = Maximal follow-up time
*  a     = Accrual time (assuming uniform accrual)
*
*  r1    = proportion having event in group 1 at time = 1
*  r2    = proportion having event in group 2 at time = 1
*
==============================================================================

Output from the program should look like the following:

==============================================================================
Logrank sample size program: {program name}   27AUG07 17:26

Computation based on Biostatistical Methods, John Lachin (2000)
Two groups with exponential hazard in each group

Two-sided alpha = 0.05
Power = 0.85
Maximal follow-up time f = 2.5
Accrual time = 1.5 (uniform accrual assumed)

Expected proportion of events in Group 1 in time = 1 : 0.55
Expected proportion of events in Group 2 in time = 1 : 0.44
Expected number of events in Group 1 : 189
Expected number of events in Group 2 : 161
Proportion of patients in Group 1: 0.5
Proportion of patients in Group 2: 0.5
Hazard in Group 1: 0.799
Hazard in Group 2: 0.580
Average hazard : 0.689
Relative hazard (Group 2 relative to Group1) : 0.726
Required total sample size : 513
==============================================================================

   (b) Check that your program is giving approximately the right values by comparing the results to those you can obtain from PROC POWER in SAS version 9.
2. Use Newton's Method to solve the following equation for x: u = CDF(x), where u = .1, .3, .5, .7, .9 and CDF(x) is the cumulative distribution function of the lognormal distribution with theta = 0 and lambda = 1. Note that the CDF for the lognormal distribution is available in SAS 9.3: look in the SAS 9.3 documentation under "SAS(R) 9.3 Functions and CALL Routines: Reference".
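A minimal sketch of the Newton iteration for Problem 2, in Python for illustration. It assumes that theta is the mean and lambda the standard deviation of log(x), so that CDF(x) = Phi((log(x) - theta) / lambda); check this against the SAS documentation before relying on it. The function names, the starting value (the median), and the step-halving safeguard that keeps x positive are choices made for this sketch.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lognormal_cdf(x, theta=0.0, lam=1.0):
    """Lognormal CDF, assuming theta and lam are the mean and SD of log(x)."""
    return STD_NORMAL.cdf((math.log(x) - theta) / lam)

def lognormal_pdf(x, theta=0.0, lam=1.0):
    """Derivative of lognormal_cdf with respect to x."""
    z = (math.log(x) - theta) / lam
    return STD_NORMAL.pdf(z) / (x * lam)

def lognormal_quantile(u, theta=0.0, lam=1.0, tol=1e-10, max_iter=100):
    """Solve u = CDF(x) for x by Newton's method."""
    x = math.exp(theta)                  # start at the median
    for _ in range(max_iter):
        step = (lognormal_cdf(x, theta, lam) - u) / lognormal_pdf(x, theta, lam)
        x_new = x - step
        if x_new <= 0.0:
            # A full Newton step would leave the domain x > 0; halve x instead.
            x_new = x / 2.0
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Newton's method did not converge")

if __name__ == "__main__":
    for u in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(u, lognormal_quantile(u))
```

Under the stated parameterization the answers can be checked against exp(z_u), where z_u is the standard normal quantile for u.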
1. A permutation test problem: the datafile asthma.out is on the course website. It includes three variables: group (= 1 or 2), number of events, and follow-up time. There are two ways to summarize the data. The first is the average rate of events per follow-up time, where you compute rate = (number of events) / (follow-up time) for each person. You can then compare the two groups using either a t-test or a nonparametric test such as the Wilcoxon. The second is to compute rateg = (total events) / (total follow-up time), where this is done separately for each group. You can then use a permutation test to compare the groups. Do both of these analyses and compare the results. For the permutation test, base your results on at least 1000 permutations of the group assignments.
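A minimal sketch of the permutation test on rateg, in Python for illustration. The function names, the two-sided test statistic (absolute difference of the two group rates), and the "+1" convention in the p-value are assumptions of this sketch. No data from asthma.out are reproduced here; the arguments are three parallel lists that you would read from that file.

```python
import random

def group_rate_difference(groups, events, followup):
    """rateg for group 1 minus rateg for group 2."""
    totals = {1: [0.0, 0.0], 2: [0.0, 0.0]}   # group -> [events, follow-up]
    for g, e, t in zip(groups, events, followup):
        totals[g][0] += e
        totals[g][1] += t
    return totals[1][0] / totals[1][1] - totals[2][0] / totals[2][1]

def permutation_test(groups, events, followup, n_perm=1000, seed=1):
    """Two-sided permutation p-value for the difference in group rates."""
    observed = group_rate_difference(groups, events, followup)
    rng = random.Random(seed)
    labels = list(groups)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)               # reassign group labels at random
        diff = group_rate_difference(labels, events, followup)
        # Small tolerance so ties with the observed value count as extreme.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Count the observed assignment itself as one of the permutations.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Shuffling the labels keeps each person's events and follow-up time together and preserves the two group sizes, which is what the permutation null hypothesis requires.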
For this problem use the same dataset as that given for Homework #4.
1. Compute the bootstrap estimate of the variance of the median of the rate of events, where a separate rate is computed for each person. Do this for each group separately.
2. Let the number of bootstrap samples be 1000. Graph a histogram of the bootstrap medians.
3. Use the estimate of variance in part 1 to estimate 95% confidence intervals for the medians of each group and for the difference of the medians.
4. Use the bootstrap to compute a direct estimate of the 95% confidence intervals for the medians of each group and for the difference of the medians.
5. Also compute 95% confidence limits for the medians of each group using the method described in: https://epilab.ich.ucl.ac.uk/coursematerial/statistics/non_parametric/confidence_interval.html
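A minimal sketch of parts 1, 3, and 4 for a single group, in Python for illustration. The function names are hypothetical, and two choices are assumptions of this sketch: "direct estimate" in part 4 is read as the percentile interval of the bootstrap medians, and the interval in part 3 is a normal-theory interval centered at the sample median. The input is a list of per-person rates for one group.

```python
import math
import random
from statistics import NormalDist, median, variance

def bootstrap_medians(values, n_boot=1000, seed=1):
    """Medians of n_boot resamples drawn with replacement from values."""
    rng = random.Random(seed)
    n = len(values)
    return [median(rng.choices(values, k=n)) for _ in range(n_boot)]

def normal_interval(values, boot_stats, level=0.95):
    """Sample median +/- z * bootstrap standard error (part 3)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = math.sqrt(variance(boot_stats))   # bootstrap variance from part 1
    center = median(values)
    return center - z * se, center + z * se

def percentile_interval(boot_stats, level=0.95):
    """Lower and upper percentiles of the bootstrap medians (part 4)."""
    ordered = sorted(boot_stats)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lo = ordered[math.floor(tail * last)]
    hi = ordered[math.ceil((1.0 - tail) * last)]
    return lo, hi
```

For the difference of the medians, one option is to resample each group independently and take the difference of the two resampled medians in each bootstrap replicate.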
1. The function f(x) is defined as f(x) = log(x) + x. It is defined only for x > 0. Use Newton's method to find an approximate solution to the equation f(x) = 0. The solution should be correct to within 10^(-6).
2. The function G(x, y) is defined as G(x, y) = (xy - 1)^2 + cos(x - y) + sin(x^2 + y^2). Use Newton's method to find a vector (x, y) such that dG(x, y)/dx = 0 and dG(x, y)/dy = 0. Draw a contour plot of the function to see if your answer looks reasonable.
3. The random variable X has the following distribution: Pr(X = n) = (1 - r) * r^n, where n is a nonnegative integer, that is, n = 0, 1, 2, 3, ... If y is not a nonnegative integer, Pr(X = y) = 0. You don't know what r is, but it must be between 0 and 1. Assume you have the following observations for X: 1, 2, 0, 3, 5, 2, 3, 8.
   a) Find the maximum likelihood estimate of r.
   b) Verify that the solution provides a maximum of the (log)likelihood, not a minimum.
   c) How would you generate random observations from this distribution?
   d) Generate 100 pseudo-random observations from this distribution for r = 0.2 and find the mean, variance, and standard deviation of these values.
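A minimal sketch of a one-dimensional Newton iteration that applies to Problem 1 and, through the score function, to Problem 3a and 3b. It is written in Python for illustration; the function names, starting values, and stopping rule (stop when the step is smaller than the tolerance) are assumptions of this sketch. The score is the derivative of the log-likelihood n*log(1 - r) + S*log(r), where n is the number of observations and S is their sum.

```python
import math

def newton(f, fprime, x0, tol=1e-6, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Problem 1: f(x) = log(x) + x, with f'(x) = 1/x + 1, started at x = 1.
root = newton(lambda x: math.log(x) + x, lambda x: 1.0 / x + 1.0, x0=1.0)

# Problem 3: the observations given in the assignment.
obs = [1, 2, 0, 3, 5, 2, 3, 8]
n, total = len(obs), sum(obs)

def score(r):
    """First derivative of the log-likelihood."""
    return total / r - n / (1.0 - r)

def score_prime(r):
    """Second derivative of the log-likelihood (negative for 0 < r < 1)."""
    return -total / r ** 2 - n / (1.0 - r) ** 2

r_hat = newton(score, score_prime, x0=0.5)

if __name__ == "__main__":
    print("root of log(x) + x:", root)
    print("MLE of r:", r_hat, "second derivative:", score_prime(r_hat))
```

Setting the score to zero by hand gives r = S / (n + S), which provides a check on the numerical answer; the sign of score_prime at the solution addresses part b.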
1. Problem 18A, notes.019: Do this using proc nlp and a similar routine in R.
2. Problem 19.1, notes.020: Do this using both proc iml and proc nlp.
3. Problem 19.2, notes.020: Do this also using both proc iml and proc nlp.
Problem 20, notes.021. Use SAS PROC IML.
Problem 21, notes.021.
Problem 21a, notes.021. See notes.021a. Assume the model S2FEVPOS = b0 + b1*age + b2*bodymass + error, E(error) = 0, Var(error) = w^2. Define two other variables U and V, linear combinations of age and bodymass, such that:
(1) U and V are uncorrelated.
(2) The regression of S2FEVPOS on U and V has the same sum of squared residuals as the regression of S2FEVPOS on age and bodymass.
(3) The regression sum of squares for the model in part (2) is the sum of the regression sum of squares for the regression of S2FEVPOS on U and the regression sum of squares for the regression of S2FEVPOS on V.
This problem should be done using both R and SAS PROC IML.
1. Let A be the matrix {5 2, 2 8}. Let V be the vector {1, 0}. Perform the following steps:
   1. V = (1/w) * V, where w = sqrt(V`*V).
   2. V = A*V.
   Repeat these steps 50 times. What is the result?
2. You have two datasets, one of which has the heights and weights of some men, and the other of which has the heights and weights of some women. Write a macro which does the following:
   1. Computes the body mass index, BMI, for each person in the two datasets. BMI = weight / (height * height), where weight is in kilograms and height is in meters. (If the body mass index is already nonblank on the file, the above computation is skipped.)
   2. For each man in dataset 1, computes the number of women in dataset 2 who have a smaller BMI than that man. The percent of women who have a smaller BMI than that man is also computed.
   3. Computes the average of the percents for the men in step 2.
   4. Prints the results in a table.
   Apply this macro to the Lung Health Study dataset, lhs.data. Note that the variable 'gender' on that file is coded as 0 = men, 1 = women. Also note that BMI is already computed there, with the label 'bodymass'.
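A minimal sketch of the iteration in Problem 1, in Python for illustration rather than in SAS PROC IML. The function names are hypothetical. The steps are applied in the order given: normalize V, then multiply by A, repeated 50 times.

```python
import math

def mat_vec(a, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def iterate(a, v, n_steps=50):
    """Repeat: scale v to unit length, then replace v by a * v."""
    for _ in range(n_steps):
        w = math.sqrt(sum(x * x for x in v))   # w = sqrt(V`*V)
        v = [x / w for x in v]                 # step 1: V = (1/w) * V
        v = mat_vec(a, v)                      # step 2: V = A*V
    return v

A = [[5.0, 2.0], [2.0, 8.0]]
V = iterate(A, [1.0, 0.0])

if __name__ == "__main__":
    print("V =", V, "length =", math.sqrt(sum(x * x for x in V)))
```

This is the same recursion as the power method, so the length of the final V can be compared with the largest eigenvalue of A, and V itself with the corresponding eigenvector.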
Web address: http://www.biostat.umn.edu/~john-c/assign7460.f2013.html Most recent update: November 29, 2013.