Most recent update: November 19, 2007.
1. Write a macro to sort an array in SAS. Show how it works by sorting the following array elements:
18 -12 . 41 2 2 2 95 -95 . . -14 21
2. Suppose X and Y are independent random variables with the same distribution, and let Z = max(X, Y). Perform simulations of size N = 1000 and use PROC UNIVARIATE to describe the distribution of Z when:
(a) X and Y are both uniform on [0, 1]
(b) X and Y are both N(0, 1)
In case (b), how can you test whether Z has a normal distribution?
3. Project Assignment 3 in notes.001.
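Item 1 above asks for a SAS macro. As a language-neutral illustration of the logic such a macro might generate, here is a minimal Python sketch of a bubble sort over the listed values. It assumes that a missing value (SAS ".", written here as None) sorts below every number, which is how SAS orders missing numeric values. The function names are hypothetical.

```python
# Sketch only: the assignment asks for a SAS macro; this shows the same
# bubble-sort logic in Python. Missing values (SAS ".") are modeled as None
# and, as in SAS comparisons, are treated as smaller than any number.

def sas_order_key(value):
    """Missing values sort first; numbers follow in ascending order."""
    return (0, 0) if value is None else (1, value)


def bubble_sort(values):
    """Return a sorted copy using repeated swaps of adjacent out-of-order pairs."""
    items = list(values)
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if sas_order_key(items[i]) > sas_order_key(items[i + 1]):
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:          # no swaps in this pass: already in order
            break
    return items


data = [18, -12, None, 41, 2, 2, 2, 95, -95, None, None, -14, 21]
print(bubble_sort(data))
# prints [None, None, None, -95, -14, -12, 2, 2, 2, 18, 21, 41, 95]
```

In a SAS macro the same idea would be written as nested DO loops over the array indices, swapping adjacent elements through a temporary variable.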
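For item 2, the following Python sketch (not SAS) shows the simulation logic: it draws 1000 pairs (X, Y), keeps Z = max(X, Y), and reports a few of the summaries PROC UNIVARIATE prints. The seed and function names are arbitrary choices. As a check, E[Z] = 2/3 when X and Y are uniform on [0, 1], and E[Z] = 1/sqrt(pi), about 0.564, when they are N(0, 1). The skewness uses the adjusted formula n/((n-1)(n-2)) * sum(((x - mean)/s)^3), which we believe matches the PROC UNIVARIATE default. For case (b), the NORMAL option of PROC UNIVARIATE adds formal tests of normality such as Shapiro-Wilk.

```python
import math
import random
import statistics


def simulate_max(draw, n=1000, seed=2007):
    """Simulate n values of Z = max(X, Y), with X and Y drawn independently by draw(rng)."""
    rng = random.Random(seed)
    return [max(draw(rng), draw(rng)) for _ in range(n)]


def describe(z):
    """A few PROC UNIVARIATE-style summaries; skewness uses the n/((n-1)(n-2)) adjustment."""
    n, m, s = len(z), statistics.fmean(z), statistics.stdev(z)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in z)
    return {"n": n, "mean": m, "std": s, "skewness": skew, "min": min(z), "max": max(z)}


uniform_z = simulate_max(lambda rng: rng.random())        # case (a): uniform on [0, 1)
normal_z = simulate_max(lambda rng: rng.gauss(0.0, 1.0))  # case (b): N(0, 1)

for label, z, theory_mean in [("uniform", uniform_z, 2 / 3),
                              ("normal", normal_z, 1 / math.sqrt(math.pi))]:
    summary = describe(z)
    print(label, {k: round(v, 4) for k, v in summary.items()},
          "theoretical mean:", round(theory_mean, 4))
```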
1. Assume the following 2 x 2 table:
            A       B
        -----------------
        |       |       |
    1   |   a   |   b   |   30
        |       |       |
        -----------------
        |       |       |
    2   |   c   |   d   |   20
        |       |       |
        -----------------
            20      30      50
The margins are fixed as shown. The counts in the cells are
variable.
Let 'a' denote the count of observations in the upper left cell
(the [1, A] cell). Assume 'a' has a hypergeometric distribution,
as described in class.
a) Display the true distribution of 'a' as a histogram.
b) Simulate 1000 observations of the variable 'a', assuming
as above that 'a' has the hypergeometric distribution.
Display the results again as a histogram.
c) Compare the two histograms.
2. Assume you randomize 200 people, 100 to drug A and 100 to
drug B. The outcome is classified as either Success or
Failure. Assume that under the alternative hypothesis, the
success rate with drug A is 70%, while the success rate with
drug B is 55%. Assume you are going to carry out a statistical
test at the end of the study with a significance level of 0.05.
Carry out a simulation study to estimate the statistical power
for three different tests for a 2 x 2 table: the chi-square
test, the continuity adjusted chi-square test, and Fisher's
exact test. Include a scatterplot of the p-values of the
chi-square test versus Fisher's exact test. The simulation
study should be based on at least 500 simulated clinical trials.
3. Problem 7, parts 1 and 2, notes.005.
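For item 1 of this assignment (the 2 x 2 table with fixed margins), a minimal Python sketch of parts (a) and (b). The exact distribution is P(a = k) = C(30, k) C(20, 20 - k) / C(50, 20) for k = 0, ..., 20: column A takes 20 of the 50 observations, 30 of which lie in row 1. The simulation fills column A by drawing 20 observations without replacement and counts how many come from row 1. Text histograms stand in for graphics; the seed and the star scaling are arbitrary.

```python
import math
import random
from collections import Counter

ROW1, ROW2, COL_A = 30, 20, 20   # fixed margins from the table: row 1, row 2, column A
TOTAL = ROW1 + ROW2              # 50 observations in all


def true_pmf():
    """P(a = k) = C(30, k) * C(20, 20 - k) / C(50, 20) over the feasible range of k."""
    lo, hi = max(0, COL_A - ROW2), min(ROW1, COL_A)
    denom = math.comb(TOTAL, COL_A)
    return {k: math.comb(ROW1, k) * math.comb(ROW2, COL_A - k) / denom
            for k in range(lo, hi + 1)}


def simulate_a(n_sims=1000, seed=7460):
    """Fill column A by drawing 20 of the 50 observations without replacement;
    'a' is how many of those draws come from row 1."""
    rng = random.Random(seed)
    urn = [1] * ROW1 + [2] * ROW2
    return Counter(rng.sample(urn, COL_A).count(1) for _ in range(n_sims))


def text_histogram(heights, stars_per_unit):
    """Print one row of asterisks per value of 'a'."""
    for k in sorted(heights):
        print(f"{k:3d} | {'*' * round(heights[k] * stars_per_unit)}")


pmf = true_pmf()
sims = simulate_a()
print("True distribution of a (one * = 0.5% probability):")
text_histogram(pmf, 200)
print("1000 simulated values of a (one * = 5 observations):")
text_histogram(sims, 0.2)
```

Because both histograms use the same scale (1000 x 0.5% = 5 observations per star), they can be compared row by row for part (c).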
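For item 2 of this assignment, here is a hedged Python sketch (not SAS code) of the power simulation. Each simulated trial draws the number of successes on each drug as Binomial(100, 0.70) and Binomial(100, 0.55), then computes the Pearson chi-square p-value, the continuity-adjusted (Yates) chi-square p-value, and a two-sided Fisher exact p-value. The Fisher p-value here uses the common two-sided convention: the total probability of all tables with the observed margins that are no more likely than the observed table. Power is estimated as the fraction of trials with p < 0.05. The seed and names are arbitrary, and plotting is left to whatever graphics tool you use.

```python
import math
import random

N_PER_ARM = 100          # 100 patients randomized to each drug
P_A, P_B = 0.70, 0.55    # success rates under the alternative hypothesis
ALPHA = 0.05


def chisq1_pvalue(stat):
    """Upper-tail p-value for chi-square with 1 df: P(|Z| > sqrt(stat))."""
    return math.erfc(math.sqrt(stat / 2.0))


def chisq_pvalues(a, b, c, d):
    """(Pearson, continuity-adjusted) p-values for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                       # a zero margin: the statistic is undefined
        return 1.0, 1.0
    diff = abs(a * d - b * c)
    pearson = n * diff ** 2 / denom
    yates = n * max(0.0, diff - n / 2) ** 2 / denom
    return chisq1_pvalue(pearson), chisq1_pvalue(yates)


def fisher_pvalue(a, b, c, d):
    """Two-sided Fisher exact p-value: total probability of tables with the same
    margins that are no more likely than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    # Unnormalized hypergeometric weights; exact integers avoid rounding ties.
    weights = {k: math.comb(r1, k) * math.comb(r2, c1 - k)
               for k in range(max(0, c1 - r2), min(r1, c1) + 1)}
    observed = weights[a]
    return sum(w for w in weights.values() if w <= observed) / sum(weights.values())


def simulate_trials(n_trials=500, seed=2007):
    """Return (Pearson p, adjusted p, Fisher p) for each simulated trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        a = sum(rng.random() < P_A for _ in range(N_PER_ARM))   # successes on drug A
        c = sum(rng.random() < P_B for _ in range(N_PER_ARM))   # successes on drug B
        b, d = N_PER_ARM - a, N_PER_ARM - c
        results.append((*chisq_pvalues(a, b, c, d), fisher_pvalue(a, b, c, d)))
    return results


def estimated_power(results, column):
    """Fraction of simulated trials whose p-value falls below ALPHA."""
    return sum(r[column] < ALPHA for r in results) / len(results)


results = simulate_trials()
for name, col in [("chi-square", 0), ("continuity-adjusted", 1), ("Fisher exact", 2)]:
    print(f"{name:>20s}: estimated power {estimated_power(results, col):.3f}")
# (chi-square p, Fisher p) pairs for the requested scatterplot.
scatter_points = [(r[0], r[2]) for r in results]
```

Since the continuity correction can only shrink the statistic, the adjusted test's estimated power can never exceed that of the uncorrected chi-square test in the same set of simulated trials.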
Problem 10, parts 1 and 2, notes.008
Problem 11, parts 1, 2, and 3, notes.010
Write a program in SAS or R to perform simple linear regression without using procedures. The program should compute the least-squares estimates of beta0 and beta1. It should also compute the model, error, and corrected total sums of squares, the F-statistic and its p-value, the estimate of s^2, R-square, and the standard errors of the estimates of beta0 and beta1. Generate a sample data set of 100 observations to illustrate how the program works, and check that your program gives the same answers for all of these quantities as PROC REG or the corresponding R routine.
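The regression program can be sketched in Python with only the standard library (the assignment itself asks for SAS or R). The F-test p-value needs the F distribution, so the sketch includes a continued-fraction evaluation of the regularized incomplete beta function (the classic modified Lentz scheme), using P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1*f) for F(df1, df2). The simulated data (y = 1 + 2x plus normal noise with SD 3) and all function names are hypothetical; results should still be checked against PROC REG or R's lm, as the assignment requires.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for F ~ F(df1, df2)."""
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x plus the usual ANOVA quantities."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_total = sum((yi - ybar) ** 2 for yi in y)                       # corrected total SS
    ss_error = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))   # residual SS
    ss_model = ss_total - ss_error
    s2 = ss_error / (n - 2)                                            # MSE, estimates sigma^2
    f_stat = ss_model / s2
    return {"b0": b0, "b1": b1, "ss_model": ss_model, "ss_error": ss_error,
            "ss_total": ss_total, "F": f_stat, "p_value": f_upper_tail(f_stat, 1, n - 2),
            "s2": s2, "r_square": ss_model / ss_total,
            "se_b0": math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx)),
            "se_b1": math.sqrt(s2 / sxx)}


# Hypothetical sample of 100 observations: y = 1 + 2x + N(0, 3^2) noise.
rng = random.Random(7460)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [1.0 + 2.0 * xi + rng.gauss(0.0, 3.0) for xi in xs]
for key, value in simple_linear_regression(xs, ys).items():
    print(f"{key:>9s} = {value:.6g}")
```

With one predictor, F = t^2 for the slope, so the same p-value could also be obtained from the t distribution with n - 2 degrees of freedom.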
Problem 12.A, notes.011
Problem 13, notes.012
Write a program in SAS to simulate observations from the geometric distribution. A geometric random variable is the number of independent Bernoulli trials required before a success occurs, where the probability of success on any given trial is 'p'. Your program should have 'p' as a variable parameter. With p = 0.03, generate 10000 simulated observations from the geometric distribution and use the results to estimate the mean and standard deviation.
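A minimal Python sketch of the geometric simulation (the assignment asks for SAS). The definition can be read two ways; this sketch assumes the count includes the successful trial, so the support is 1, 2, 3, ... with mean 1/p and standard deviation sqrt(1 - p)/p (about 33.3 and 32.8 when p = 0.03). The seed and function names are arbitrary.

```python
import math
import random
import statistics


def geometric_trials(p, rng):
    """Count Bernoulli(p) trials up to and including the first success (assumed convention)."""
    trials = 1
    while rng.random() >= p:     # rng.random() < p counts as a success
        trials += 1
    return trials


def simulate_geometric(p, n=10000, seed=12345):
    """Draw n independent geometric observations with success probability p."""
    rng = random.Random(seed)
    return [geometric_trials(p, rng) for _ in range(n)]


P = 0.03
draws = simulate_geometric(P)
print(f"simulated mean = {statistics.fmean(draws):.2f}  (theory 1/p = {1 / P:.2f})")
print(f"simulated SD   = {statistics.stdev(draws):.2f}  "
      f"(theory sqrt(1-p)/p = {math.sqrt(1 - P) / P:.2f})")
```

If the intended count is the number of failures before the first success, subtract 1 from each draw: the mean drops to (1 - p)/p and the standard deviation is unchanged.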
Problem 15, part 2, notes.017
Problem 16, part 1, notes.017
Problem 17, notes.018
Write a program to compute the sample size for a clinical trial
with two groups, where the endpoint is time-to-event (i.e.,
survival). The sample size computation should be based on
the description in Biostatistical Methods, by John Lachin,
pages 409-412 [See class handout]. The test statistic is
the logrank test. Constant exponential hazards are assumed.
You can assume that the sample sizes in the two groups will be
equal. Input parameters should include the following:
==============================================================================
* alpha = two-sided signif level
* power = power = 1 - beta
*
* f = Maximal follow-up time
* a = Accrual time (assuming uniform accrual)
*
* r1 = proportion having event in group 1 at time = 1
* r2 = proportion having event in group 2 at time = 1
*
==============================================================================
Output from the program should look like the following:
==============================================================================
Logrank sample size program: {program name} 27AUG07 17:26
Computation based on Biostatistical Methods, John Lachin (2000)
Two groups with exponential hazard in each group
Two-sided alpha = 0.05
Power = 0.85
Maximal follow-up time f = 2.5
Accrual time = 1.5 (uniform accrual assumed)
Expected proportion of events in Group 1 in time = 1 : 0.55
Expected proportion of events in Group 2 in time = 1 : 0.44
Expected number of events in Group 1 : 189
Expected number of events in Group 2 : 161
Proportion of patients in Group 1: 0.5
Proportion of patients in Group 2: 0.5
Hazard in Group 1: 0.799
Hazard in Group 2: 0.580
Average hazard : 0.689
Relative hazard (Group 2 relative to Group 1) : 0.726
Required total sample size : 513
===============================================================================
You can check that your program is giving approximately the
right values by comparing the results to those you can obtain
from PROC POWER in SAS version 9.
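Here is a hedged Python sketch of the sample-size computation, not a transcription of the handout. It assumes a log-hazard-ratio formula with separate variances under the null and alternative hypotheses, sqrt(N) |log(lam2/lam1)| = z_alpha * sd0 + z_beta * sd1, where each sd depends on the probability of observing an event. That probability assumes uniform accrual over [0, a] and that f is the total study length, so follow-up ranges from f - a to f. Hazards come from r = 1 - exp(-lam), i.e. lam = -log(1 - r). These are assumptions, not checked against pages 409-412; with the example inputs, however, the sketch reproduces the hazards, expected event counts, and total of 513 in the sample output. Function and key names are hypothetical.

```python
import math
from statistics import NormalDist


def prob_event(hazard, f, a):
    """P(event) under a constant (exponential) hazard, uniform accrual over [0, a],
    and study end at time f, so follow-up ranges from f - a to f."""
    return 1.0 - (math.exp(-hazard * (f - a)) - math.exp(-hazard * f)) / (hazard * a)


def logrank_sample_size(alpha, power, f, a, r1, r2, q1=0.5):
    """Total N from the assumed formula sqrt(N)|log(lam2/lam1)| = z_alpha*sd0 + z_beta*sd1."""
    q2 = 1.0 - q1
    lam1, lam2 = -math.log(1.0 - r1), -math.log(1.0 - r2)   # from P(event by t = 1) = r
    lam_bar = q1 * lam1 + q2 * lam2                          # allocation-weighted average hazard
    e1, e2, e_bar = (prob_event(lam, f, a) for lam in (lam1, lam2, lam_bar))
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)        # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    sd0 = math.sqrt((1.0 / q1 + 1.0 / q2) / e_bar)           # under H0 (common hazard)
    sd1 = math.sqrt(1.0 / (q1 * e1) + 1.0 / (q2 * e2))       # under H1
    n_total = math.ceil(((z_alpha * sd0 + z_beta * sd1) / math.log(lam2 / lam1)) ** 2)
    return {"lam1": lam1, "lam2": lam2, "lam_bar": lam_bar, "hazard_ratio": lam2 / lam1,
            "events1": n_total * q1 * e1, "events2": n_total * q2 * e2, "n_total": n_total}


res = logrank_sample_size(alpha=0.05, power=0.85, f=2.5, a=1.5, r1=0.55, r2=0.44)
print(f"Hazard in Group 1: {res['lam1']:.3f}")
print(f"Hazard in Group 2: {res['lam2']:.3f}")
print(f"Average hazard : {res['lam_bar']:.3f}")
print(f"Relative hazard (Group 2 relative to Group 1) : {res['hazard_ratio']:.3f}")
print(f"Expected number of events in Group 1 : {res['events1']:.0f}")
print(f"Expected number of events in Group 2 : {res['events2']:.0f}")
print(f"Required total sample size : {res['n_total']}")
```

The allocation parameter q1 defaults to 0.5, matching the equal-allocation assumption; with equal allocation the weighted average hazard is the simple mean of the two hazards.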
Problem 18, part 2, notes.019
Problem 19, notes.020
Problem 20, notes.021
Problem 21a, notes.021
.
.
.
Web address of this page: http://www.biostat.umn.edu/~john-c/assign7460.f2007.html