1. Write a macro to sort an array in SAS. Show how it works by sorting the following array elements:
18 -12 . 41 2 2 2 95 -95 . . -14 21
2. Suppose X and Y are two independent random variables each having the same distribution. Let Z = max(X, Y). Perform simulations of size N = 1000 to describe (using PROC UNIVARIATE) the distribution of Z, if:
(a) X and Y are both uniform on [0, 1]
(b) X and Y are both N(0, 1)
In case (b), how can you test whether Z has a normal distribution?
3. Project 3 - Problem 3, notes.001.
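For Problem 2 above, the simulation logic can be sketched outside SAS before writing the DATA step. The following is a minimal illustration in Python, not a SAS solution: the function names `simulate_max` and `describe` are hypothetical, and the summary uses simple moment-based skewness and kurtosis, which may differ slightly from the sample-adjusted values PROC UNIVARIATE reports.

```python
import math
import random
import statistics

def simulate_max(draw, n=1000, seed=1):
    """Return n simulated values of Z = max(X, Y), with X and Y drawn independently."""
    rng = random.Random(seed)
    return [max(draw(rng), draw(rng)) for _ in range(n)]

def describe(z):
    """Moment-based summary of a sample (population-style skewness and kurtosis)."""
    n = len(z)
    mean = statistics.fmean(z)
    m2 = sum((v - mean) ** 2 for v in z) / n
    m3 = sum((v - mean) ** 3 for v in z) / n
    m4 = sum((v - mean) ** 4 for v in z) / n
    return {
        "n": n,
        "mean": mean,
        "sd": statistics.stdev(z),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "min": min(z),
        "median": statistics.median(z),
        "max": max(z),
    }

# (a) X, Y uniform on [0, 1]: Z has CDF z^2 on [0, 1], so E[Z] = 2/3.
z_unif = simulate_max(lambda rng: rng.random())
# (b) X, Y standard normal: E[Z] = 1/sqrt(pi), Var[Z] = 1 - 1/pi.
z_norm = simulate_max(lambda rng: rng.gauss(0.0, 1.0))

for label, z in (("uniform", z_unif), ("normal", z_norm)):
    print(label)
    for name, value in describe(z).items():
        print(f"  {name:>16s} = {value:.4f}")
```

A normal sample would have skewness and excess kurtosis near zero, so these two numbers give an informal check for case (b); a formal normality test or a normal probability plot is the more usual route.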
1. Assume the following 2 x 2 table:
            A       B
        -----------------
        |       |       |
      1 |   a   |   b   |  11
        |       |       |
        -----------------
        |       |       |
      2 |   c   |   d   |   9
        |       |       |
        -----------------
            8      12      20
The margins are fixed as shown. The counts in the cells are
variable.
Let 'a' denote the count of observations in the upper left cell
(the [1, A] cell). Assume 'a' has a hypergeometric distribution.
a) Display the true distribution of 'a' as a histogram.
b) Simulate 1000 observations of the variable 'a', assuming
as above that 'a' has the hypergeometric distribution.
Display the results again as a histogram.
c) Compare the two histograms.
2. Assume you randomize 200 people, 100 each to drug A and
drug B. The outcome is classified as either Success or
Failure. Assume that under the alternative hypothesis, the
success rate with drug A is 70%, while the success rate with
drug B is 55%. Assume you are going to carry out a statistical
test at the end of the study with a significance level of 0.05.
Carry out a simulation study to estimate the statistical power
for three different tests for a 2 x 2 table: the chi-square
test, the continuity adjusted chi-square test, and Fisher's
exact test. Include a scatterplot of the p-values of the
chi-square test versus Fisher's exact test. The simulation
study should be based on at least 500 simulated clinical trials.
3. Problem 7 parts 1. and 2., notes.005.
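Problems 1 and 2 above share one building block, the hypergeometric probability of a 2 x 2 table with fixed margins. The sketch below is an illustration in Python rather than a SAS solution; all function names are hypothetical. It assumes the usual definitions: the Pearson chi-square with 1 df, the continuity-adjusted version that subtracts N/2 from |ad - bc|, and a two-sided Fisher test that sums the probabilities of all tables no more likely than the observed one.

```python
import math
import random

def hypergeom_pmf(k, row1, row2, col1):
    """P(a = k) for a 2x2 table with fixed margins row1, row2, col1."""
    return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(row1 + row2, col1)

def support(row1, row2, col1):
    """Possible values of the upper-left count given the margins."""
    return range(max(0, col1 - row2), min(row1, col1) + 1)

def simulate_a(n_draws, row1, row2, col1, rng):
    """Draw col1 items without replacement from an urn; count those from row 1."""
    urn = [1] * row1 + [0] * row2
    return [sum(rng.sample(urn, col1)) for _ in range(n_draws)]

def chisq_pvalues(a, b, c, d):
    """Return (Pearson, continuity-adjusted) chi-square p-values, 1 df."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty margin: no evidence against the null
        return 1.0, 1.0
    diff = abs(a * d - b * c)
    chi2 = n * diff ** 2 / denom
    chi2_adj = n * max(0.0, diff - n / 2) ** 2 / denom
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2)), math.erfc(math.sqrt(chi2_adj / 2))

def fisher_pvalue(a, b, c, d):
    """Two-sided Fisher exact p-value (sum of tables no more likely than observed)."""
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    probs = [hypergeom_pmf(k, row1, row2, col1) for k in support(row1, row2, col1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

def simulate_power(n_trials=500, n_per_arm=100, p_a=0.70, p_b=0.55,
                   alpha=0.05, seed=7460):
    """Estimate power of the three tests; also return the p-values per trial."""
    rng = random.Random(seed)
    pvalues = []
    for _ in range(n_trials):
        a = sum(rng.random() < p_a for _ in range(n_per_arm))  # successes on A
        c = sum(rng.random() < p_b for _ in range(n_per_arm))  # successes on B
        table = (a, n_per_arm - a, c, n_per_arm - c)
        p_chi, p_adj = chisq_pvalues(*table)
        pvalues.append((p_chi, p_adj, fisher_pvalue(*table)))
    power = [sum(p[i] < alpha for p in pvalues) / n_trials for i in range(3)]
    return power, pvalues

# Problem 1: true distribution of 'a' (margins 11, 9, 8) as a text histogram.
true_dist = {k: hypergeom_pmf(k, 11, 9, 8) for k in support(11, 9, 8)}
sim_a = simulate_a(1000, 11, 9, 8, random.Random(1))
for k, p in true_dist.items():
    print(f"a={k}  true={p:.4f}  simulated={sim_a.count(k) / 1000:.4f}  " + "*" * round(60 * p))

# Problem 2: estimated power for chi-square, adjusted chi-square, Fisher.
power, pvalues = simulate_power()
print("power (chi-square, adjusted, Fisher):", power)
```

The list `pvalues` holds one (chi-square, adjusted, Fisher) triple per simulated trial, so the first and third entries are the coordinates for the requested scatterplot.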
4.1 Problem 10, Parts 1 & 2, notes.008. Note that the file 'lhs.listing' is on
the course website, right after notes.008.
4.2 Problem 11, Parts 1, 2, 3, notes.010. Note that the datafile 'lhs.data' is on
the course website, right after notes.010.
4.3 Write a program in SAS or R to perform simple linear regression,
without using procedures. The program should compute least-squares
estimates of beta0 and beta1. It should compute the model, error,
and corrected total sums of squares, the F-statistic and corresponding
p-value, the estimate of s^2, R-square, and the standard errors of
the estimates of beta0 and beta1. You should generate a sample data
set of 100 observations to illustrate how the program works. You
should check that your program gives the same answers for all of
these quantities as PROC REG or the corresponding R routine.
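The quantities listed in 4.3 all follow from a few sums of squares and cross-products. The sketch below shows the arithmetic in Python as an illustration only (the assignment itself asks for SAS or R); the function names are hypothetical. Because the Python standard library has no F distribution, the p-value here is an assumption-laden workaround: it uses the fact that F(1, n-2) is the square of a t variable with n-2 df and integrates the t density with Simpson's rule. In SAS or R a built-in F distribution function would normally be used instead.

```python
import math
import random

def t_density(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def f_pvalue_1df(f_stat, df2, steps=2000):
    """P(F(1, df2) > f_stat), using F = T^2 and Simpson's rule (steps must be even)."""
    upper = math.sqrt(f_stat)
    h = upper / steps
    total = t_density(0.0, df2) + t_density(upper, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df2)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def simple_linreg(x, y):
    """Least-squares fit of y = beta0 + beta1 * x with the usual ANOVA quantities."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_total = sum((yi - ybar) ** 2 for yi in y)   # corrected total SS
    ss_model = b1 * sxy                            # model SS
    ss_error = ss_total - ss_model                 # error SS
    mse = ss_error / (n - 2)                       # estimate of s^2
    f_stat = ss_model / mse
    return {
        "beta0": b0,
        "beta1": b1,
        "ss_model": ss_model,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "s2": mse,
        "f_stat": f_stat,
        "p_value": f_pvalue_1df(f_stat, n - 2),
        "r_square": ss_model / ss_total,
        "se_beta0": math.sqrt(mse * (1.0 / n + xbar ** 2 / sxx)),
        "se_beta1": math.sqrt(mse / sxx),
    }

# Sample data: 100 observations from y = 2 + 3x + N(0, 1) noise (values chosen arbitrarily).
rng = random.Random(2009)
x = [rng.uniform(0.0, 10.0) for _ in range(100)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
fit = simple_linreg(x, y)
for name, value in fit.items():
    print(f"{name:>9s} = {value:.6g}")
```

For a quick hand check, the five points x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5 give beta1 = 0.6, beta0 = 2.2, and R-square = 0.6.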
1. Problem 12.A, notes.011
2. Problem 13, notes.012
3. Find the matrix of the linear transformation T: R^2 ---> R^2
which is reflection through the line y = 2*x.
4. Find the matrix of the linear transformation S(T), where T
is the linear transformation in the preceding problem 3 and S is
the linear transformation of counterclockwise rotation by
30 degrees. Is S(T) the same thing as T(S)?
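A numerical check for problems 3 and 4 can be sketched as follows. This is an illustration in Python with hypothetical function names. It assumes the standard formula for reflection through the line y = m*x, namely (1/(1+m^2)) times the matrix with rows (1-m^2, 2m) and (2m, m^2-1), and it reads S(T) as "apply T first, then S", so its matrix is the product S*T.

```python
import math

def reflection_matrix(slope):
    """Matrix of reflection through the line y = slope * x."""
    m = slope
    d = 1.0 + m * m
    return [[(1.0 - m * m) / d, 2.0 * m / d],
            [2.0 * m / d, (m * m - 1.0) / d]]

def rotation_matrix(degrees):
    """Matrix of counterclockwise rotation by the given angle."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(a, v):
    """Apply a 2x2 matrix to a vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

T = reflection_matrix(2.0)
S = rotation_matrix(30.0)
ST = matmul(S, T)   # T first, then S
TS = matmul(T, S)   # S first, then T

print("T  =", T)
print("ST =", ST)
print("TS =", TS)
# Sanity checks: points on y = 2x are fixed; the normal direction is flipped.
print("T(1, 2)  =", apply(T, [1.0, 2.0]))
print("T(2, -1) =", apply(T, [2.0, -1.0]))
```

Comparing the printed entries of ST and TS answers the last question of problem 4 numerically; the hand derivation should still be shown.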
Problem 14, both parts, notes.016
Problem 15, both parts, notes.017
Problem 16, part 1, notes.017
Problem 18 part 2, notes.019
Problem 19, notes.020
Write a program to compute sample size for a clinical trial
with two groups, where the endpoint is time-to-event (i.e.,
survival). The sample size computation should be based on the
description in Biostatistical Methods, by John Lachin,
pages 409-412 [See class handout]. The test statistic is
the logrank test. Constant exponential hazards are assumed.
You can assume that the sample sizes in the two groups will be
equal. Input parameters should include the following:
==============================================================================
* alpha = two-sided signif level
* power = 1 - beta
*
* f = Maximal follow-up time
* a = Accrual time (assuming uniform accrual)
*
* r1 = proportion having event in group 1 at time = 1
* r2 = proportion having event in group 2 at time = 1
*
==============================================================================
Output from the program should look like the following:
==============================================================================
Logrank sample size program: {program name} 27AUG07 17:26
Computation based on Biostatistical Methods, John Lachin (2000)
Two groups with exponential hazard in each group
Two-sided alpha = 0.05
Power = 0.85
Maximal follow-up time f = 2.5
Accrual time = 1.5 (uniform accrual assumed)
Expected proportion of events in Group 1 in time = 1 : 0.55
Expected proportion of events in Group 2 in time = 1 : 0.44
Expected number of events in Group 1 : 189
Expected number of events in Group 2 : 161
Proportion of patients in Group 1: 0.5
Proportion of patients in Group 2: 0.5
Hazard in Group 1: 0.799
Hazard in Group 2: 0.580
Average hazard : 0.689
Relative hazard (Group 2 relative to Group 1) : 0.726
Required total sample size : 513
===============================================================================
You can check that your program is giving approximately the
right values by comparing the results to those you can obtain
from PROC POWER in SAS version 9.
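The computation can be sketched as below. This is an illustration in Python, not the required program, and it rests on assumptions that should be checked against the class handout: (1) the hazard in each group is -log(1 - r), where r is the event proportion at time 1; (2) f is read as the total study duration, so a patient entering at the end of accrual is followed for f - a; (3) the total sample size has the form N = [z_alpha * sqrt((1/xi1 + 1/xi2) / E(lam_bar)) + z_beta * sqrt(1/(xi1*E(lam1)) + 1/(xi2*E(lam2)))]^2 / [log(lam1/lam2)]^2, where E(lam) is the probability of observing an event and xi1, xi2 are the group proportions. The function and key names are hypothetical.

```python
import math
from statistics import NormalDist

def event_probability(hazard, f, a):
    """P(event observed): exponential hazard, uniform accrual on [0, a], study ends at f."""
    return 1.0 - (math.exp(-hazard * (f - a)) - math.exp(-hazard * f)) / (hazard * a)

def logrank_sample_size(alpha, power, f, a, r1, r2, xi1=0.5):
    """Total sample size under the assumed formula described in the lead-in."""
    xi2 = 1.0 - xi1
    lam1 = -math.log(1.0 - r1)           # hazard, group 1
    lam2 = -math.log(1.0 - r2)           # hazard, group 2
    lam_bar = xi1 * lam1 + xi2 * lam2    # average hazard
    e1 = event_probability(lam1, f, a)
    e2 = event_probability(lam2, f, a)
    e_bar = event_probability(lam_bar, f, a)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided alpha
    z_beta = NormalDist().inv_cdf(power)
    null_term = z_alpha * math.sqrt((1.0 / xi1 + 1.0 / xi2) / e_bar)
    alt_term = z_beta * math.sqrt(1.0 / (xi1 * e1) + 1.0 / (xi2 * e2))
    n_exact = ((null_term + alt_term) / math.log(lam1 / lam2)) ** 2
    n_total = math.ceil(n_exact)
    return {
        "hazard1": lam1,
        "hazard2": lam2,
        "average_hazard": lam_bar,
        "relative_hazard": lam2 / lam1,   # group 2 relative to group 1
        "events1": round(n_total * xi1 * e1),
        "events2": round(n_total * xi2 * e2),
        "total_n": n_total,
    }

# Inputs taken from the sample output above.
result = logrank_sample_size(alpha=0.05, power=0.85, f=2.5, a=1.5, r1=0.55, r2=0.44)
for name, value in result.items():
    print(f"{name:>16s} : {value:.3f}" if isinstance(value, float) else f"{name:>16s} : {value}")
```

With the sample inputs, the hazards and relative hazard can be compared directly against the listing above; if the total differs noticeably from the listed sample size, one of the three assumptions is the first place to look.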
Web address: http://www.biostat.umn.edu/~john-c/assign7460.f2009.html
Most recent update: November 20, 2009.