SAMPLE SIZE, POWER, AND SIMULATION STUDIES        SPH 7460 notes.004

     Drugs A and B are both supposed to be effective in lowering serum
cholesterol.  You want to do an experiment to see if one drug is better
than the other at reducing serum cholesterol.

     You decide to carry out a clinical trial.  You identify 200 people
with high serum cholesterol (> 240 mg/dl).  You randomize 100 of them
to drug A and 100 to drug B.  You measure their baseline levels of
serum cholesterol and ask them to take their assigned drugs for 6 weeks.
At the end of that time, you measure their serum cholesterol again.  The
outcome variable Y for this experiment (clinical trial) is the change
in serum cholesterol from baseline to 6 weeks.

     From prior information you have an estimate of the standard
deviation of Y.  It is sigma = 20 mg/dl.

     You want to know the statistical power of your study to detect
meaningful differences between the two drugs in reducing serum
cholesterol.  For example, if the true mean change associated
with drug A is -12 mg/dl, and that for B is -15 mg/dl, what is the
probability that you will see a significant difference in mean changes
between drugs A and B at the end of the study?

     To answer this, you first need to define 'significant'.  Here we
mean statistically significant, not necessarily clinically significant.
A significance level alpha is chosen.  You also have to specify whether
you are doing a two-sided test or a one-sided test.

     Let's assume you are doing a two-sided test with significance
level alpha = .02.

     Then the POWER of the test for drug A effect: muA = -12 mg/dl versus
drug B effect: muB = -15 mg/dl is


[1]  prob(abs(W) > 2.33 | muA - muB = +3), where

     W = (YAbar - YBbar)/stderr(YAbar - YBbar).

     Note that stderr(YAbar - YBbar) = sqrt(var(YAbar - YBbar)).

     Further, you can probably assume that

              var(YAbar - YBbar) = var(YAbar) + var(YBbar)  [why?]

     Note that var(YAbar) = var(YBbar) = sigma^2 / N, where
N is the sample size for each group.  Here N = 100 and sigma = 20,
so we get

              var(YAbar) = var(YBbar) = 400 /100 = 4,

     and therefore

              stderr(YAbar - YBbar) = sqrt(4 + 4) = 2.828.

     How nice that the arithmetic is reasonably simple.

     Now go back to equation [1].  It says

            prob(abs((YAbar - YBbar)/2.828) > 2.33 | muA - muB = +3),

            which simplifies to

[2]         prob(Z > 2.33 - 3/2.828) + prob(Z < -2.33 - 3/2.828)

     where Z is N(0, 1).

     These two probabilities add up to:  .1022 + .0003 = .1025
and that is the power of the study.
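     The arithmetic above is easy to check by machine.  Here is a short
sketch in Python (used here in place of SAS or Splus), with the standard
library's NormalDist playing the role of the Z table:

```python
# Check of the hand calculation above, using Python's standard library.
from statistics import NormalDist

z = NormalDist()          # standard normal N(0, 1)
cutoff = 2.33             # two-sided alpha = .02 critical value, as in the text
delta  = 3.0              # muA - muB = +3 mg/dl
stderr = (4 + 4) ** 0.5   # stderr(YAbar - YBbar) = 2.828...

# power = prob(Z > 2.33 - 3/2.828) + prob(Z < -2.33 - 3/2.828)
power = (1 - z.cdf(cutoff - delta / stderr)) + z.cdf(-cutoff - delta / stderr)
print(round(power, 4))    # matches the .1025 obtained by hand
```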

     Implicit in all this is a formula for power for a two-group
clinical trial with a quantitative outcome.  The formula has the
following ingredients:

     1.  The sample sizes in each of two groups [note: NA and NB
         in general may not be equal]

     2.  The standard deviation sigma of the outcome measure.

     3.  An expected or projected treatment effect difference, delta.

     4.  A significance level, alpha (plus specification of
         whether it is a one or two-sided test).

     The outcome of the formula is power of the study.

     Note that evaluating the formula requires that you have a way of
evaluating probabilities associated with Z-statistics and vice versa.
You can do this with tables or you can do it with a stat package like
SAS.
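     The four ingredients above can be packaged as a small function.  A
hedged Python sketch (the function name and argument order are my own
choices, not from the problem statement):

```python
# Power of a two-group z-test with a quantitative outcome: a sketch
# assembling ingredients 1-4 from the text.
from statistics import NormalDist

def power_two_group(nA, nB, sigma, delta, alpha, two_sided=True):
    """Power for group sizes nA, nB, outcome sd sigma, effect difference
    delta, significance level alpha."""
    z = NormalDist()
    stderr = (sigma**2 / nA + sigma**2 / nB) ** 0.5
    cut = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    shift = delta / stderr            # mean of W under the alternative
    p = 1 - z.cdf(cut - shift)
    if two_sided:
        p += z.cdf(-cut - shift)      # the (usually tiny) other tail
    return p

print(round(power_two_group(100, 100, 20, 3, .02), 4))
# ~ .103: close to the .1025 hand value, which rounded the cutoff to 2.33
```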

PROBLEM 5

 1.  Write a SAS or Splus program to compute power for a two-group
     clinical trial as described above.

 2.  Apply the formula to graph power curves.  Assume the same sigma
     as specified above for the serum cholesterol trial.  Assume a
     two-sided test.  Assume the possible treatment effect differences
     range from -15 mg/dl to +15 mg/dl.  Assume three different signif-
     icance levels are specified: alpha = .05, alpha = .02, and
     alpha = .01.  Also assume (1) NA = NB = 100, and (2) NA = NB = 80.

 3.  Turn the formula around.  Assume you specify the power, and you
     want the sample size in each of the two groups.  Here, assume the
     two groups are of equal size.  In other words, write a program
     which computes sample size for a two-group study with equal groups,
     a specified treatment effect, a specified significance level,
     a specified power, and a specified standard deviation of measure-
     ment.

     {In doing this part of the problem, note that in formula [2] above,
     one of the two terms is essentially negligible.  This will always
     be the case for any realistic sample size calculation, and it makes
     it much easier to write sample size as a function of power.}
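     Dropping the negligible term and solving for the common group size N
gives N = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2.  A Python
sketch of that inversion (names are my own; a sketch, not a checked answer
to the problem):

```python
# Sample size per group for a two-group trial with equal groups,
# inverting formula [2] with its negligible second term dropped.
from math import ceil
from statistics import NormalDist

def sample_size(sigma, delta, alpha, target_power, two_sided=True):
    """N per group given sd sigma, effect delta, level alpha, and power."""
    z = NormalDist()
    c = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    shift = c + z.inv_cdf(target_power)   # required delta/stderr
    return ceil(2 * sigma**2 * shift**2 / delta**2)

print(sample_size(20, 3, .02, .80))
```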

========================================================================

SIMULATING A CLINICAL TRIAL

     How reliable is your power computation back in Problem 5?

     Suppose you specify alpha, delta, sigma, and the sample sizes
(say NA = NB for simplicity).  If your formula says it has, for
example, 80% power, is it correct?  In other words, if you did
10,000 trials with the specified parameters, would about 8,000 of
them turn out to have a significant result?

     How can you test the formula?

     You can simulate a single trial as follows.  Start with the 100
people on drug A.  Simulate a treatment effect for each person.  For drug
A, you need to sample from a normal distribution with mean -12 and
standard deviation sigma = 20.  Compute the mean YAbar for all of group A,
and a standard error of the mean, SEA.  Do the same for drug B, except
assume that the true mean is -15.  Then compute the test statistic
W = (YAbar - YBbar) / stderr(YAbar - YBbar).

     Note that W may or may not exceed the cutoff for significance.

     Repeat this process M times.  Let m be the number of times that the 
test statistic W achieves significance.  What is the estimated
power?  What is the approximate standard error of this estimate of power?
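     The simulation loop just described can be sketched in Python (in
place of SAS or Splus; the parameter defaults are the ones used in the
text, and the function names are my own):

```python
# Sketch of the simulation: M trials, count how many reach significance.
import random
from statistics import NormalDist, mean, stdev

def one_trial(rng, muA=-12, muB=-15, sigma=20, n=100):
    """Simulate one trial; return the test statistic W."""
    yA = [rng.gauss(muA, sigma) for _ in range(n)]
    yB = [rng.gauss(muB, sigma) for _ in range(n)]
    seA = stdev(yA) / n**0.5      # standard error of the mean, group A
    seB = stdev(yB) / n**0.5
    return (mean(yA) - mean(yB)) / (seA**2 + seB**2) ** 0.5

def estimated_power(M=1000, alpha=.02, seed=1):
    """Fraction of M simulated trials reaching significance, with the
    binomial standard error of that fraction."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    m = sum(abs(one_trial(rng)) > cutoff for _ in range(M))
    p_hat = m / M
    return p_hat, (p_hat * (1 - p_hat) / M) ** 0.5

p_hat, se = estimated_power()
```

With these parameters the estimate should land near the .10 power computed
analytically above, give or take a couple of binomial standard errors.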
 
PROBLEM 6

     Carry out a simulation study to estimate the power of a clinical
trial with the following parameters:

     1.  significance level alpha = .05 (two-sided)

     2.  sample size in each group: 100

     3.  drug A mean effect, -12 mg/dl; drug B mean effect, -16 mg/dl.

     4.  standard deviation of the outcome measure: 20 mg/dl

     Your program should compute the estimated power and 95% confidence 
limits for the true power.


SOME QUESTIONS

     Shouldn't you be using a t-test rather than a z-test for testing the
outcome of a clinical trial as described above?

     Do you think your power formula is wrong if in fact you use a t-test?
Is the formula likely to overestimate power or underestimate it if a
t-test is used?

     What might you do to improve the formula?


========================================================================

SIMULATIONS FOR ESTIMATION OF LOGISTIC REGRESSION COEFFICIENTS

  Assume that the relationship between a dichotomous outcome
Y and a quantitative predictor X is logistic: that is, Y
can take on only values 0 (no event) and 1 (event), and

    prob(Y = 1 | X) = 1 / (1 + exp(-b0 -b1*X)).

  For example: you are doing an experiment on toxicity of a
certain substance to mice.  You assign a dose X, chosen at
random between 0 and 1, to each mouse.  The outcome event is
death; that is, if the mouse dies, Y = 1, and if the mouse
lives, Y = 0.

  You want to estimate the coefficient b1.  The larger b1 is,
the more toxic the drug.

  You want your estimate of b1 to have a certain level of
precision.  That is, you would like to have a confidence
interval for b1 that is small.  You might want the confidence
interval to have a radius (half-width) of, say, .5 units.

  The width of the confidence interval is a function of the estimated
standard error of b1_hat (the estimate of b1).  Specifically,
a 95% confidence interval has radius equal to:

            1.96 * serr(b1_hat).

  In ordinary linear regression, the standard error of b1_hat
can be computed rather easily if you know the distribution of
the predictor variable X.  In this case X has a uniform [0, 1]
distribution.  However, this is not linear regression.  In
logistic regression, there is no simple formula for the
standard error.  The standard error can be computed after the
data are collected - PROC LOGISTIC gives an estimated standard
error - but it cannot be easily estimated in advance of having 
the data.

  The standard error is partly a function of sample size.
Large sample size is associated with smaller standard error.

  What you want for the problem at hand is some way to estimate
the standard error for a range of sample sizes.

  This can be done as follows.  Choose a sample size N.  Simulate
N observations from your proposed study.  Use PROC LOGISTIC to
get the estimated standard error.

  Repeat this for different values of N.  Start with a small N
and work up to large values until you get a standard error of the
desired size.  You may want to repeat this several times at each
value of N that you choose.

  This is what is done in the following program with 9 different
sample sizes.  A part of the printout occurs after the program,
edited to show the standard errors.

  Note that the standard errors get smaller as N increases.  For
N = 100, the standard error of b1 was .8511.  For N = 1600, the
standard error was .2310.

  This is a rather crude and inefficiently written program.  It can
be made more concise by the use of PROC IML and macros, as will be
explained later.
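     For comparison, here is a rough Python analogue of the program below
(my own sketch, not PROC LOGISTIC): it generates the mouse data the same
way and fits the logistic model by Newton-Raphson to get the standard
errors.  One caveat: the SAS printout models the probability of the first
ordered response value, here Y = 0, which is why its estimates sit near
-1 rather than +1; the sketch models prob(Y = 1), so the signs flip but
the standard errors agree.

```python
# One simulated dose-toxicity dataset and a tiny Newton-Raphson logistic
# fit (my own implementation; SEs come from the inverse information matrix).
import math
import random

def fit_logistic(xs, ys, iters=25):
    """Fit prob(Y=1|X) = 1/(1+exp(-b0-b1*x)); return (b0, b1, se0, se1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = s00 = s01 = s11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-b0 - b1 * x))
            w = p * (1.0 - p)
            g0 += y - p;   g1 += (y - p) * x             # gradient X'(y-p)
            s00 += w;  s01 += w * x;  s11 += w * x * x   # information X'WX
        det = s00 * s11 - s01 * s01
        b0 += ( s11 * g0 - s01 * g1) / det               # Newton step
        b1 += (-s01 * g0 + s00 * g1) / det
    det = s00 * s11 - s01 * s01
    return b0, b1, math.sqrt(s11 / det), math.sqrt(s00 / det)

rng = random.Random(2003)
n = 1600
xs = [rng.random() for _ in range(n)]            # like ranuni: uniform(0,1)
ys = [1 if rng.random() < 1 / (1 + math.exp(-1 - x)) else 0 for x in xs]
b0, b1, se0, se1 = fit_logistic(xs, ys)
```

At n = 1600 the slope standard error should be in the same ballpark as
the .2310 shown in the edited printout below.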

======================================================================
 options linesize = 100 ;
 footnote "prog = ~john-c/logrand.sas &sysdate &systime" ;

 %let b0 = 1 ;
 %let b1 = 1 ;

 %let n1 = 100 ;
 %let n2 = 150 ;
 %let n3 = 200 ;
 %let n4 = 300 ;
 %let n5 = 400 ;
 %let n6 = 600 ;
 %let n7 = 800 ;
 %let n8 = 1200;
 %let n9 = 1600;

*======================================================================;
 data dataset ;

      n = &n1 ;

      do i = 1 to n ;

         x = ranuni(-1) ;

         p = 1/(1 + exp(-&b0 -&b1*x)) ;
         r = ranuni(-1) ;
         y = 0 ;
         if r < p then y = 1 ;
         output ;

      end ;

run ;

proc logistic data = dataset ;
     model y = x ;
title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
title2 "N = &n1" ;
run ;
*======================================================================;


 data dataset ;

      n = &n2 ;

      do i = 1 to n ;

         x = ranuni(-1) ;

         p = 1/(1 + exp(-&b0 -&b1*x)) ;
         r = ranuni(-1) ;
         y = 0 ;
         if r < p then y = 1 ;
         output ;

      end ;

run ;

proc logistic data = dataset ;
     model y = x ;
title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
title2 "N = &n2" ;
run ;
*======================================================================;


 .... other similar datasteps and PROC LOGISTICS omitted ....


*======================================================================;

 data dataset ;

      n = &n9 ;

      do i = 1 to n ;

         x = ranuni(-1) ;

         p = 1/(1 + exp(-&b0 -&b1*x)) ;
         r = ranuni(-1) ;
         y = 0 ;
         if r < p then y = 1 ;
         output ;

      end ;

run ;

proc logistic data = dataset ;
     model y = x ;
title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
title2 "N = &n9" ;
run ;
*======================================================================;
                                 Proc logistic with b0 = 1, b1 = 1
                                              N = 100               17:57 Monday, September 15, 2003

                                       The LOGISTIC Procedure

               Data Set: WORK.DATASET 
               Response Variable: Y         
               Response Levels: 2
               Number of Observations: 100
               Link Function: Logit


                                          Response Profile
 
                                     Ordered
                                       Value       Y     Count

                                           1       0        19
                                           2       1        81



                Model Fitting Information and Testing Global Null Hypothesis BETA=0
 
                                         Intercept
                           Intercept        and   
             Criterion       Only       Covariates    Chi-Square for Covariates

             AIC              99.245       100.566         .                          
             SC              101.850       105.776         .                          
             -2 LOG L         97.245        96.566        0.679 with 1 DF (p=0.4100)  
             Score              .             .           0.682 with 1 DF (p=0.4088)  


                             Analysis of Maximum Likelihood Estimates
 
                    Parameter    Standard       Wald          Pr >       Standardized        Odds
  Variable    DF     Estimate      Error     Chi-Square    Chi-Square      Estimate         Ratio

  INTERCPT    1       -1.0915      0.4911        4.9385        0.0263               .        .   
  X           1       -0.7003      0.8511        0.6771        0.4106       -0.116044       0.496

                                              N = 150

  INTERCPT    1       -0.7231      0.3813        3.5961        0.0579               .        .   
  X           1       -1.6454      0.7218        5.1970        0.0226       -0.277841       0.193

                                              N = 200

  INTERCPT    1       -1.5484      0.3497       19.6008        0.0001               .        .   
  X           1        0.1407      0.6356        0.0490        0.8248        0.022254       1.151

                                              N = 300

  INTERCPT    1       -1.0860      0.2821       14.8236        0.0001               .        .   
  X           1       -0.9856      0.5468        3.2485        0.0715       -0.155407       0.373

                                              N = 400

  INTERCPT    1       -0.9346      0.2427       14.8309        0.0001               .        .   
  X           1       -1.3676      0.4747        8.3019        0.0040       -0.219096       0.255

                                              N = 600

  INTERCPT    1       -1.1729      0.2059       32.4535        0.0001               .        .   
  X           1       -0.5274      0.3615        2.1283        0.1446       -0.083545       0.590

                                              N = 800

  INTERCPT    1       -1.0774      0.1705       39.9108        0.0001               .        .   
  X           1       -1.0150      0.3251        9.7455        0.0018       -0.163858       0.362

                                              N = 1200

  INTERCPT    1       -1.0068      0.1371       53.9625        0.0001               .        .   
  X           1       -0.9458      0.2627       12.9601        0.0003       -0.149178       0.388

                                              N = 1600

  INTERCPT    1       -0.9706      0.1210       64.2948        0.0001               .        .   
  X           1       -1.1857      0.2310       26.3500        0.0001       -0.190150       0.306

 
                              prog = ~john-c/logrand.sas 15SEP03 17:57
===============================================================================

~john-c/5421/notes.004   Last update: September 19, 2007.