SAMPLE SIZE, POWER, AND SIMULATION STUDIES SPH 7460 notes.004
Drugs A and B are both supposed to be effective in lowering serum
cholesterol. You want to do an experiment to see if one drug is better
than the other at reducing serum cholesterol.
You decide to carry out a clinical trial. You identify 200 people
with high serum cholesterol (> 240 mg/dl). You randomize 100 of them
to drug A and 100 to drug B. You measure their baseline levels of
serum cholesterol and ask them to take their assigned drugs for 6 weeks.
At the end of that time, you measure their serum cholesterol again. The
outcome variable Y for this experiment (clinical trial) is the change
in serum cholesterol from baseline to 6 weeks.
From prior information you have an estimate of the standard
deviation of Y. It is sigma = 20 mg/dl.
You want to know the statistical power of your study to detect
meaningful differences between the two drugs in reducing serum
cholesterol. For example, if the true mean change associated
with drug A is -12 mg/dl, and that for B is -15 mg/dl, what is the
probability that you will see a significant difference in mean changes
between drugs A and B at the end of the study?
To answer this, you first need to define 'significant'. Here we
mean statistically significant, not necessarily clinically significant.
A significance level alpha is chosen. You also have to specify whether
you are doing a two-sided test or a one-sided test.
Let's assume you are doing a two-sided test with significance
level alpha = .02.
Then the POWER of the test for drug A effect: muA = -12 mg/dl versus
drug B effect: muB = -15 mg/dl is
[1] prob(abs(W) > 2.33 | muA - muB = +3), where
W = (YAbar - YBbar)/stderr(YAbar - YBbar)
and 2.33 is (approximately) the two-sided critical value for alpha = .02.
Note that stderr(YAbar - YBbar) = sqrt(var(YAbar - YBbar)).
Further, you can probably assume that
var(YAbar - YBbar) = var(YAbar) + var(YBbar) [why?]
Note that var(YAbar) = var(YBbar) = sigma^2 / N, where
N is the sample size for each group. Here N = 100 and sigma = 20,
so we get
var(YAbar) = var(YBbar) = 400 /100 = 4,
and therefore
stderr(YAbar - YBbar) = sqrt(4 + 4) = 2.828.
How nice that the arithmetic is reasonably simple.
Now go back to equation [1]. It says
prob(abs((YAbar - YBbar)/2.828) > 2.33 | muA - muB = +3),
which simplifies to
[2] prob(Z > 2.33 - 3/2.828) + prob(Z < -2.33 - 3/2.828)
where Z is N(0, 1).
These two probabilities add up to: .1022 + .0003 = .1025
and that is the power of the study.
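The arithmetic above can be checked directly. Here is a minimal sketch (in Python rather than the SAS used later in these notes; `statistics.NormalDist` stands in for a Z table):

```python
from statistics import NormalDist  # standard normal CDF (Python 3.8+)

Z = NormalDist()                   # N(0, 1)
delta = 3.0                        # muA - muB = +3 mg/dl
stderr = (4.0 + 4.0) ** 0.5        # sqrt(var(YAbar) + var(YBbar)) = 2.828
cut = 2.33                         # two-sided cutoff for alpha = .02

# Formula [2]: prob(Z > 2.33 - 3/2.828) + prob(Z < -2.33 - 3/2.828)
power = (1 - Z.cdf(cut - delta / stderr)) + Z.cdf(-cut - delta / stderr)
print(round(power, 4))             # close to the .1025 computed above
```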
Implicit in all this is a formula for power for a two-group
clinical trial with a quantitative outcome. The formula has the
following ingredients:
1. The sample sizes in each of two groups [note: NA and NB
in general may not be equal]
2. The standard deviation sigma of the outcome measure.
3. An expected or projected treatment effect difference, delta.
4. A significance level, alpha (plus specification of
whether it is a one or two-sided test).
The output of the formula is the power of the study.
Note that evaluating the formula requires that you have a way of
evaluating probabilities associated with Z-statistics and vice versa.
You can do this with tables or you can do it with a stat package like
SAS.
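As one sketch of how the four ingredients fit together (in Python here; the problem below asks you to write your own SAS or Splus version, so treat this only as an illustration of the formula, not as the assigned solution):

```python
from statistics import NormalDist

def power_two_group(nA, nB, sigma, delta, alpha, two_sided=True):
    """Normal-approximation power for a two-group trial with a
    quantitative outcome, per the ingredients listed above."""
    Z = NormalDist()
    stderr = (sigma ** 2 / nA + sigma ** 2 / nB) ** 0.5
    cut = Z.inv_cdf(1 - alpha / 2) if two_sided else Z.inv_cdf(1 - alpha)
    shift = delta / stderr
    p = 1 - Z.cdf(cut - shift)
    if two_sided:
        p += Z.cdf(-cut - shift)   # usually negligible lower tail
    return p

# Serum cholesterol example: NA = NB = 100, sigma = 20, delta = 3, alpha = .02
print(round(power_two_group(100, 100, 20, 3, 0.02), 4))  # near .10, as above
```

Note that using the exact critical value 2.3263 (rather than the rounded 2.33) gives a power very slightly different from the hand calculation.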
PROBLEM 5
1. Write a SAS or Splus program to compute power for a two-group
clinical trial as described above.
2. Apply the formula to graph power curves. Assume the same sigma
as specified above for the serum cholesterol trial. Assume a
two-sided test. Assume the possible treatment effect differences
range from -15 mg/dl to +15 mg/dl. Assume three different
significance levels are specified: alpha = .05, alpha = .02, and
alpha = .01. Also assume (1) NA = NB = 100, and (2) NA = NB = 80.
3. Turn the formula around. Assume you specify the power, and you
want the sample size in each of the two groups. Here, assume the
two groups are of equal size. In other words, write a program
which computes sample size for a two-group study with equal groups,
a specified treatment effect, a specified significance level,
a specified power, and a specified standard deviation of
measurement.
{In doing this part of the problem, note that in formula [2] above,
one of the two terms is essentially negligible. This will always
be the case for any realistic sample size calculation, which makes
it much easier to write sample size as a function of power.}
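Dropping the negligible term, power is approximately prob(Z > z_alpha/2 - delta/stderr), which inverts to N = 2 * ((z_alpha/2 + z_power) * sigma / delta)^2 per group. A hedged sketch of that inversion (Python again; the problem asks for your own program):

```python
import math
from statistics import NormalDist

def sample_size_two_group(sigma, delta, alpha, power, two_sided=True):
    """Per-group N for equal groups, normal approximation,
    ignoring the negligible tail term in formula [2]."""
    Z = NormalDist()
    z_a = Z.inv_cdf(1 - alpha / 2) if two_sided else Z.inv_cdf(1 - alpha)
    z_b = Z.inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) * sigma / abs(delta)) ** 2)

# e.g. sigma = 20, delta = 4 mg/dl, alpha = .05 two-sided, 80% power
print(sample_size_two_group(20, 4, 0.05, 0.80))   # about 393 per group
```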
========================================================================
SIMULATING A CLINICAL TRIAL
How reliable is your power computation back in Problem 5?
Suppose you specify alpha, delta, sigma, and the sample sizes
(say NA = NB for simplicity). If your formula says it has, for
example, 80% power, is it correct? In other words, if you did
10,000 trials with the specified parameters, would about 8,000 of
them turn out to have a significant result?
How can you test the formula?
You can simulate a single trial as follows. Start with the 100
people on drug A. Simulate a treatment effect for each person. For drug
A, you need to sample from a normal distribution with mean -12 and
standard deviation sigma = 20. Compute the mean YAbar for all of group A,
and a standard error of the mean, SEA. Do the same for Drug B, except
assume there that the true mean is -15. Then compute the test statistic
W = (YAbar - YBbar) / stderr(YAbar - YBbar).
Note that W may or may not exceed the cutoff for significance.
Repeat this process M times. Let m be the number of times that the
test statistic W achieves significance. What is the estimated
power? What is the approximate standard error of this estimate of power?
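The loop just described might be sketched as follows (Python here rather than SAS; the trial parameters follow the running example, while the choice of M and the use of the sample-based standard error in W are assumptions for illustration):

```python
import math, random
from statistics import NormalDist

def simulate_power(muA=-12.0, muB=-15.0, sigma=20.0, n=100,
                   alpha=0.02, M=2000, seed=1):
    """Estimate power by simulation: the fraction of M simulated
    trials in which abs(W) exceeds the two-sided cutoff."""
    rng = random.Random(seed)
    cut = NormalDist().inv_cdf(1 - alpha / 2)
    m = 0
    for _ in range(M):
        a = [rng.gauss(muA, sigma) for _ in range(n)]
        b = [rng.gauss(muB, sigma) for _ in range(n)]
        abar, bbar = sum(a) / n, sum(b) / n
        va = sum((v - abar) ** 2 for v in a) / (n - 1)   # sample variances
        vb = sum((v - bbar) ** 2 for v in b) / (n - 1)
        w = (abar - bbar) / math.sqrt(va / n + vb / n)
        if abs(w) > cut:
            m += 1
    phat = m / M                                   # estimated power m/M
    return phat, math.sqrt(phat * (1 - phat) / M)  # binomial stderr

phat, se = simulate_power()
print(round(phat, 3), round(se, 4))
```

The estimated power is m/M, and its approximate standard error is the binomial sqrt(phat*(1 - phat)/M).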
PROBLEM 6
Carry out a simulation study to estimate the power of a clinical
trial with the following parameters:
1. significance level alpha = .05 (two-sided)
2. sample size in each group: 100
3. drug A mean effect, -12 mg/dl; drug B mean effect, -16 mg/dl.
4. standard deviation of the outcome measure: 20 mg/dl
Your program should compute the estimated power and 95% confidence
limits for the true power.
SOME QUESTIONS
Shouldn't you be using a t-test rather than a z-test for testing the
outcome of a clinical trial as described above?
Do you think your power formula is wrong if in fact you use a t-test?
Is the formula likely to overestimate power or underestimate it if a
t-test is used ?
What might you do to improve the formula?
========================================================================
SIMULATIONS FOR ESTIMATION OF LOGISTIC REGRESSION COEFFICIENTS
Assume that the relationship between a dichotomous outcome
Y and a quantitative predictor X is logistic: that is, Y
can take on only values 0 (no event) and 1 (event), and
prob(Y = 1 | X) = 1 / (1 + exp(-b0 -b1*X)).
For example: you are doing an experiment on toxicity of a
certain substance to mice. You assign a dose X, chosen at
random between 0 and 1, to each mouse. The outcome event is
death; that is, if the mouse dies, Y = 1, and if the mouse
lives, Y = 0.
You want to estimate the coefficient b1. The larger b1 is,
the more toxic the drug.
You want your estimate of b1 to have a certain level of
precision. That is, you would like to have a confidence
interval for b1 that is small. You might want the confidence
interval to have a radius (half-width) of, say, .5 units.
The width of the confidence interval is a function of the estimated
standard error of b1_hat (the estimate of b1). Specifically,
a 95% confidence interval has radius equal to:
1.96 * serr(b1_hat).
In ordinary linear regression, the standard error of b1_hat
can be computed rather easily if you know the distribution of
the predictor variable X. In this case X has a uniform [0, 1]
distribution. However, this is not linear regression. In
logistic regression, there is no simple formula for the
standard error. The standard error can be computed after the
data are collected - PROC LOGISTIC gives an estimated standard
error - but it cannot be easily estimated in advance of having
the data.
The standard error is partly a function of sample size.
Large sample size is associated with smaller standard error.
What you want for the problem at hand is some way to estimate
the standard error for a range of sample sizes.
This can be done as follows. Choose a sample size N. Simulate
N observations from your proposed study. Use PROC LOGISTIC to
get the estimated standard error.
Repeat this for different values of N. Start with a small N
and work up to large values until you get a standard error of the
desired size. You may want to repeat this several times at each
value of N that you choose.
This is what is done in the following program with 9 different
sample sizes. Part of the printout appears after the program,
edited to show the standard errors.
Note that the standard errors get smaller as N increases. For
N = 100, the standard error of b1 was .8511. For N = 1600, the
standard error was .2310.
This is a rather crude and inefficiently written program. It can
be made more concise by the use of PROC IML and macros, as will be
explained later.
======================================================================
options linesize = 100 ;
footnote "prog = ~john-c/logrand.sas &sysdate &systime" ;
%let b0 = 1 ;
%let b1 = 1 ;
%let n1 = 100 ;
%let n2 = 150 ;
%let n3 = 200 ;
%let n4 = 300 ;
%let n5 = 400 ;
%let n6 = 600 ;
%let n7 = 800 ;
%let n8 = 1200;
%let n9 = 1600;
*======================================================================;
data dataset ;
n = &n1 ;
do i = 1 to n ;
x = ranuni(-1) ;
p = 1/(1 + exp(-&b0 -&b1*x)) ;
r = ranuni(-1) ;
y = 0 ;
if r < p then y = 1 ;
output ;
end ;
run ;
proc logistic data = dataset ;
model y = x ;
title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
title2 "N = &n1" ;
run ;
*======================================================================;
data dataset ;
n = &n2 ;
do i = 1 to n ;
x = ranuni(-1) ;
p = 1/(1 + exp(-&b0 -&b1*x)) ;
r = ranuni(-1) ;
y = 0 ;
if r < p then y = 1 ;
output ;
end ;
run ;
proc logistic data = dataset ;
model y = x ;
title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
title2 "N = &n2" ;
run ;
*======================================================================;
.... other similar data steps and PROC LOGISTIC calls omitted ....
*======================================================================;
data dataset ;
n = &n9 ;
do i = 1 to n ;
x = ranuni(-1) ;
p = 1/(1 + exp(-&b0 -&b1*x)) ;
r = ranuni(-1) ;
y = 0 ;
if r < p then y = 1 ;
output ;
end ;
run ;
proc logistic data = dataset ;
model y = x ;
title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
title2 "N = &n9" ;
run ;
*======================================================================;
Proc logistic with b0 = 1, b1 = 1 1
N = 100 17:57 Monday, September 15, 2003
The LOGISTIC Procedure
Data Set: WORK.DATASET
Response Variable: Y
Response Levels: 2
Number of Observations: 100
Link Function: Logit
Response Profile
Ordered
Value Y Count
1 0 19
2 1 81
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Intercept
Intercept and
Criterion Only Covariates Chi-Square for Covariates
AIC 99.245 100.566 .
SC 101.850 105.776 .
-2 LOG L 97.245 96.566 0.679 with 1 DF (p=0.4100)
Score . . 0.682 with 1 DF (p=0.4088)
Analysis of Maximum Likelihood Estimates
Parameter Standard Wald Pr > Standardized Odds
Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio
INTERCPT 1 -1.0915 0.4911 4.9385 0.0263 . .
X 1 -0.7003 0.8511 0.6771 0.4106 -0.116044 0.496
N = 150
INTERCPT 1 -0.7231 0.3813 3.5961 0.0579 . .
X 1 -1.6454 0.7218 5.1970 0.0226 -0.277841 0.193
N = 200
INTERCPT 1 -1.5484 0.3497 19.6008 0.0001 . .
X 1 0.1407 0.6356 0.0490 0.8248 0.022254 1.151
N = 300
INTERCPT 1 -1.0860 0.2821 14.8236 0.0001 . .
X 1 -0.9856 0.5468 3.2485 0.0715 -0.155407 0.373
N = 400
INTERCPT 1 -0.9346 0.2427 14.8309 0.0001 . .
X 1 -1.3676 0.4747 8.3019 0.0040 -0.219096 0.255
N = 600
INTERCPT 1 -1.1729 0.2059 32.4535 0.0001 . .
X 1 -0.5274 0.3615 2.1283 0.1446 -0.083545 0.590
N = 800
INTERCPT 1 -1.0774 0.1705 39.9108 0.0001 . .
X 1 -1.0150 0.3251 9.7455 0.0018 -0.163858 0.362
N = 1200
INTERCPT 1 -1.0068 0.1371 53.9625 0.0001 . .
X 1 -0.9458 0.2627 12.9601 0.0003 -0.149178 0.388
N = 1600
INTERCPT 1 -0.9706 0.1210 64.2948 0.0001 . .
X 1 -1.1857 0.2310 26.3500 0.0001 -0.190150 0.306
prog = ~john-c/logrand.sas 15SEP03 17:57
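For comparison, the repeated data-step / PROC LOGISTIC pattern above can be sketched as a single loop. This is a Python illustration, with a hand-rolled Newton-Raphson fit standing in for PROC LOGISTIC; b0 = b1 = 1 and the nine N values follow the program, while the seed is arbitrary:

```python
import math, random

def fit_logistic(xs, ys, iters=30):
    """Newton-Raphson fit of prob(Y=1|X) = 1/(1 + exp(-b0 - b1*x)).
    Returns (b0, b1, se_b0, se_b1); the standard errors come from the
    inverse Fisher information, as PROC LOGISTIC reports."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-b0 - b1 * x))
            w = p * (1.0 - p)
            g0 += y - p          # score vector
            g1 += (y - p) * x
            i00 += w             # 2x2 information matrix entries
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        b0 += ( i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    return b0, b1, math.sqrt(i11 / det), math.sqrt(i00 / det)

rng = random.Random(20030915)
for n in [100, 150, 200, 300, 400, 600, 800, 1200, 1600]:
    xs = [rng.random() for _ in range(n)]                      # dose in [0, 1]
    ys = [1.0 if rng.random() < 1 / (1 + math.exp(-1 - x)) else 0.0
          for x in xs]                                         # b0 = b1 = 1
    b0, b1, se0, se1 = fit_logistic(xs, ys)
    print(n, round(se1, 4))   # se(b1_hat) shrinks roughly like 1/sqrt(n)
```

The qualitative conclusion matches the SAS printout: the standard error of b1_hat falls by about half each time N is quadrupled.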
===============================================================================
~john-c/5421/notes.004 Last update: September 19, 2007.