BINOMIAL OUTCOMES                                 SPH 7460 notes.005

     Many clinical trials are conducted in which the endpoint for each
participant is dichotomous (for example, death or kidney transplant
failure).  Assume you assign N_A people to drug A and N_B people to
drug B.  You assume a probability P_A of death in the A group, and P_B
in the B group.

     At the end of the trial you count how many have died in each
group.  Let M_A and M_B be these counts.  These should be binomial random
variables,

        M_A ~ binom( N_A, P_A) and M_B ~ binom( N_B, P_B).

     To test the null hypothesis that P_A = P_B, you can carry out an
ordinary 2 x 2 chi-square test.
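
     As an illustration, here is a minimal Python sketch of the
uncorrected 2 x 2 chi-square test.  It relies on the fact that, for a
2 x 2 table, the chi-square statistic (1 degree of freedom) equals the
square of the standardized difference in observed proportions.  The
function name and the counts in the usage example are made up for
illustration; they do not come from any real trial.

```python
import math
from statistics import NormalDist


def chi_square_2x2(m_a, n_a, m_b, n_b):
    """Uncorrected chi-square test of P_A = P_B.

    m_a, m_b: number of events in groups A and B
    n_a, n_b: number of participants in groups A and B
    Returns (chi-square statistic, two-sided p-value).
    """
    p_a_hat = m_a / n_a
    p_b_hat = m_b / n_b
    # Pooled event rate, used for the standard error under the null.
    p_bar = (m_a + m_b) / (n_a + n_b)
    se0 = math.sqrt(p_bar * (1 - p_bar) * (1 / n_a + 1 / n_b))
    if se0 == 0:
        # No events at all, or everyone had an event: no evidence either way.
        return 0.0, 1.0
    w = (p_a_hat - p_b_hat) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(w)))
    return w * w, p_value


if __name__ == "__main__":
    # Hypothetical counts: 90 of 150 events on drug A, 75 of 150 on drug B.
    print(chi_square_2x2(90, 150, 75, 150))
```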

     The formulas for power and sample size for a study of this kind are a
bit more complicated than they are for a quantitative outcome trial.  The
reason is that the standard error of the proportion of failures in each
group is a function of the failure rate itself.


     The derivation of the power formula for a two-sided test of the
hypothesis that two proportions, P_A and P_B, are equal is as follows:
 
      power = prob(abs(W) > Z_a | altern hypoth H₁),

where Z_a is such that the probability that an N(0, 1) random variable is
less than Z_a is 1 - alpha/2 (for example, if alpha = .05, Z_a = 1.96),

      and W = (P_Ahat - P_Bhat) / serr_H0(P_Ahat - P_Bhat) 		

            = (P_Ahat - P_Bhat) / sqrt(Pbar * Qbar * (1/N_A + 1/N_B)).

      Thus prob(abs(W) > Z_a | H₁) = prob(W > Z_a | H₁)
                                   + prob(W < -Z_a | H₁).

Consider the first of these two probabilities.  The inequality inside is
equivalent to

      [P_Ahat - P_Bhat - (P_A - P_B)] / serr_H1(P_Ahat - P_Bhat)

      >  [Z_a * sqrt(Pbar * Qbar * (1/N_A + 1/N_B)) - (P_A - P_B)] /
         sqrt[P_A*Q_A / N_A + P_B*Q_B / N_B]

      The left side of this inequality has approximately an N(0, 1)
distribution under the alternative hypothesis.

      Similarly, the second inequality is equivalent to

      [P_Ahat - P_Bhat - (P_A - P_B)] / serr_H1(P_Ahat - P_Bhat)

      <  [-Z_a * sqrt(Pbar * Qbar * (1/N_A + 1/N_B)) - (P_A - P_B)] /
         sqrt[P_A*Q_A / N_A + P_B*Q_B / N_B]

      where again the left side is approximately N(0, 1).

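      Note that P_Ahat = M_A / N_A and P_Bhat = M_B / N_B are the
      observed proportions, Pbar = average of P_A and P_B (weighted by
      N_A and N_B if the group sizes differ), Qbar = 1 - Pbar,
      and Q_A = 1 - P_A,  Q_B = 1 - P_B.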
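
      The two tail probabilities above can be combined into a single
      function.  What follows is a sketch in Python, not a definitive
      implementation.  The function name is made up, and Pbar is taken
      to be the average of P_A and P_B weighted by group size, which is
      an assumption (it reduces to the simple average when N_A = N_B).
      The function does not handle the degenerate case in which both
      P_A and P_B are 0 or 1.

```python
import math
from statistics import NormalDist


def power_two_proportions(alpha, p_a, p_b, n_a, n_b):
    """Approximate power of the two-sided test of P_A = P_B."""
    std_normal = NormalDist()
    # Z_a: the N(0, 1) quantile with probability 1 - alpha/2 below it.
    z_a = std_normal.inv_cdf(1 - alpha / 2)

    # ASSUMPTION: Pbar is the average of P_A and P_B weighted by group size.
    p_bar = (n_a * p_a + n_b * p_b) / (n_a + n_b)
    q_bar = 1 - p_bar

    # Standard error of P_Ahat - P_Bhat under H0 and under H1.
    se_h0 = math.sqrt(p_bar * q_bar * (1 / n_a + 1 / n_b))
    se_h1 = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

    delta = p_a - p_b
    upper_cut = (z_a * se_h0 - delta) / se_h1    # right side, first inequality
    lower_cut = (-z_a * se_h0 - delta) / se_h1   # right side, second inequality

    # prob(N(0,1) > upper_cut) + prob(N(0,1) < lower_cut)
    return (1 - std_normal.cdf(upper_cut)) + std_normal.cdf(lower_cut)


if __name__ == "__main__":
    for alpha in (0.01, 0.05):
        print(alpha, power_two_proportions(alpha, 0.6, 0.5, 150, 150))
```

      A quick sanity check on any such program: when P_A = P_B the two
      standard errors coincide and the computed power equals alpha.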
    

PROBLEM 7

1. Write a program for the power formula for a dichotomous-outcome
   clinical trial.  Assume a two-sided test will be used.  Input
   parameters for the program will include:

     1.  The desired significance level

     2.  P_A and P_B

     3.  N_A and N_B

2.  Use your program to compute power for the following configurations:

    N_A = N_B = 150
    P_A = .6, P_B = .5
    alpha = .01 and alpha = .05

3.  Graph the power curve assuming N_A = N_B = 150, alpha = .05,
    P_A = .5, and P_B ranges from .3 to .8.


SIMULATION STUDIES

     As with trials which have a quantitative outcome, you can do
simulations to check whether the power formulas are about right.  For this
purpose you can use the fact that SAS (or Splus) has a binomial random
number generator.  You don't need to simulate each participant
individually; you just use the binomial random number generator once for
each group in each simulated trial.
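
     A simulation of this kind might be sketched as follows.  Python
is used here in place of SAS or Splus, and all function names, the
default number of trials, and the seed are illustrative choices.  Each
simulated trial draws one binomial count per group and applies the
uncorrected chi-square test.  The confidence interval for the true
power is the usual normal-approximation interval for a proportion,
which is an assumption of this sketch.

```python
import math
import random
from statistics import NormalDist


def draw_binomial(rng, n, p):
    """One binomial(n, p) count.

    Uses Random.binomialvariate where it exists (Python 3.12+); on older
    versions it falls back to summing n Bernoulli draws, which gives the
    same distribution but is slower.
    """
    if hasattr(rng, "binomialvariate"):
        return rng.binomialvariate(n, p)
    return sum(rng.random() < p for _ in range(n))


def simulate_power(alpha, p_a, p_b, n_a, n_b, n_trials=10000, seed=12345):
    """Return (simulated power, (lower, upper) 95% confidence limits)."""
    rng = random.Random(seed)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        m_a = draw_binomial(rng, n_a, p_a)
        m_b = draw_binomial(rng, n_b, p_b)
        p_bar = (m_a + m_b) / (n_a + n_b)
        se0 = math.sqrt(p_bar * (1 - p_bar) * (1 / n_a + 1 / n_b))
        # Reject when abs(W) > Z_a; a trial with se0 == 0 cannot reject.
        if se0 > 0 and abs(m_a / n_a - m_b / n_b) / se0 > z_a:
            rejections += 1
    power_hat = rejections / n_trials
    # ASSUMPTION: normal-approximation 95% interval for a proportion.
    half_width = 1.96 * math.sqrt(power_hat * (1 - power_hat) / n_trials)
    return power_hat, (max(0.0, power_hat - half_width),
                       min(1.0, power_hat + half_width))


if __name__ == "__main__":
    print(simulate_power(0.05, 0.6, 0.5, 150, 150))
```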

PROBLEM 8

   Carry out a simulation study of the power for a clinical trial with
   the parameters specified in Problem 7, part 2.  Compute the
   simulated power and a 95% confidence interval for the true power.

SOME MORE QUESTIONS

     Some statisticians think the Fisher exact test should be used
whenever possible in analyzing 2 x 2 tables, instead of the chi-square
statistic.  The Fisher exact test tends to be more conservative than the
(uncorrected) chi-square test; that is, it is less likely to give a
significant result.  

     What implications does that have for power estimates?

     [The Fisher exact test is based on the computation of the probability
of the observed data within a 2 x 2 table, given the observed margins (row
and column totals).  The computation involves the hypergeometric
distribution.  In practice, the margins are usually not all fixed (in a
clinical trial the group sizes are set by design, but the total number of
events is random), and the hypergeometric distribution does not reflect
the unconditional distribution of the cell counts.  The uncorrected
chi-square statistic seems to have better operating characteristics for
this situation than does the Fisher exact test.]
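
     One way to explore the question is to compute the rejection
probability of each test exactly, by summing over every possible pair
of counts (M_A, M_B) weighted by its binomial probability.  The Python
sketch below does this for small groups.  The function names are made
up, and the two-sided Fisher p-value is defined here as the total
probability of all tables (with the observed margins) that are no more
likely than the observed one, which is a common convention but an
assumption of this sketch.

```python
import math
from statistics import NormalDist


def binom_pmf(m, n, p):
    """Probability that a binomial(n, p) count equals m."""
    return math.comb(n, m) * p**m * (1 - p) ** (n - m)


def fisher_p_value(m_a, n_a, m_b, n_b):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    k = m_a + m_b                      # total events (column margin)
    denom = math.comb(n_a + n_b, k)

    def prob(x):                       # P(x events in group A | margins)
        return math.comb(n_a, x) * math.comb(n_b, k - x) / denom

    p_obs = prob(m_a)
    lo, hi = max(0, k - n_b), min(n_a, k)
    # Small relative tolerance so ties with p_obs are counted.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def chi_square_p_value(m_a, n_a, m_b, n_b):
    """Two-sided p-value of the uncorrected chi-square test."""
    p_bar = (m_a + m_b) / (n_a + n_b)
    se0 = math.sqrt(p_bar * (1 - p_bar) * (1 / n_a + 1 / n_b))
    if se0 == 0:
        return 1.0
    w = (m_a / n_a - m_b / n_b) / se0
    return 2 * (1 - NormalDist().cdf(abs(w)))


def exact_rejection_rates(alpha, p_a, p_b, n_a, n_b):
    """Return (chi-square rejection rate, Fisher rejection rate)."""
    chi_rate = fisher_rate = 0.0
    for m_a in range(n_a + 1):
        for m_b in range(n_b + 1):
            weight = binom_pmf(m_a, n_a, p_a) * binom_pmf(m_b, n_b, p_b)
            if chi_square_p_value(m_a, n_a, m_b, n_b) < alpha:
                chi_rate += weight
            if fisher_p_value(m_a, n_a, m_b, n_b) < alpha:
                fisher_rate += weight
    return chi_rate, fisher_rate


if __name__ == "__main__":
    # Hypothetical small trial, 30 per group: size under H0, then power.
    print(exact_rejection_rates(0.05, 0.5, 0.5, 30, 30))
    print(exact_rejection_rates(0.05, 0.6, 0.3, 30, 30))
```

     With P_A = P_B the two numbers are the actual type I error rates
of the tests; with P_A different from P_B they are the actual powers.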


MORE DIFFICULT QUESTION

     Recall that back in notes.003, there was a formula for risk of
heart disease as a function of diastolic blood pressure.

     Suppose you wanted to carry out a clinical trial of a drug D
versus placebo.  You think the drug will lower blood pressure by
an average of 3 mm Hg.

     You have a population of 1000 men aged 40-59 who have diastolic
blood pressures between 80 and 110 mm Hg.  In fact, the distribution of
blood pressures among these men is approximately uniform between 80
and 110 mm Hg.

     You want to randomize half the men to drug D and half to placebo.
You study them for 6 years.  At the end of 6 years, you count how many
in each group have new heart disease.  You perform a chi-square test
to compare the groups.

     How can you compute the power for this study?

     Here are some of the complicating factors:

     1.  The men have differing levels of risk at baseline, depending
         on their blood pressure.  The number of men who have an
         event (heart disease) can be considered to be a binomial
         random variable in either group, but it is not necessarily
         very easy to compute what the binomial probability is.  You
         would have to compute the risk for each man using the logistic
         formula and then average.

     2.  The treatment effect of the drug is also not constant.  There
         are two problems.  The first is that the 3 mm Hg effect is only
         an average: the actual effect will vary from man to man in
         drug group D.  It may be reasonable to assume the drug
         effect has an approximately normal distribution with a mean of
         3 mm Hg and a standard deviation of 10 mm Hg.  The second
         is that the risk for a given man is a function of both his
         starting DBP and his DBP after he starts taking the drug, and
         they do not all have the same starting DBP.

         How might you use the available information to compute the expected
         event rate in the drug group?
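
     One possible approach is Monte Carlo averaging: draw a baseline
DBP and a drug effect for each simulated man, compute his risk from the
logistic formula, and average.  The Python sketch below illustrates
the idea only.  The logistic coefficients are HYPOTHETICAL placeholders
(the actual formula is in notes.003 and is not reproduced here), the
function names are made up, and the sketch assumes that a treated
man's risk depends only on his on-treatment DBP, which simplifies the
second problem described above.

```python
import math
import random

# HYPOTHETICAL placeholder coefficients, for illustration only.
# Replace them with the logistic formula from notes.003.
INTERCEPT = -6.0
SLOPE_PER_MMHG = 0.04


def logistic_risk(dbp, intercept=INTERCEPT, slope=SLOPE_PER_MMHG):
    """6-year risk of heart disease at a given DBP (placeholder model)."""
    return 1 / (1 + math.exp(-(intercept + slope * dbp)))


def expected_event_rates(n_draws=100000, mean_effect=3.0, sd_effect=10.0,
                         seed=2000):
    """Return (expected placebo event rate, expected drug event rate)."""
    rng = random.Random(seed)
    placebo_total = drug_total = 0.0
    for _ in range(n_draws):
        baseline = rng.uniform(80, 110)            # uniform baseline DBP
        drop = rng.gauss(mean_effect, sd_effect)   # this man's drug effect
        placebo_total += logistic_risk(baseline)
        # ASSUMPTION: risk on drug is driven by on-treatment DBP alone.
        drug_total += logistic_risk(baseline - drop)
    return placebo_total / n_draws, drug_total / n_draws


if __name__ == "__main__":
    print(expected_event_rates())
```

     The two averages could then serve as the assumed event rates in
the standard power formula, with 500 men per group.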

     The bottom line is that computing power for a trial like this
with the standard formulas is actually rather crude.  Formulas for
expected event rates in the two groups may be extremely complicated.

     You can apply the standard formulas, making some crude guesses
about treatment effects.  In fact that is what people usually do.
However, you may want to check the resulting power estimates by carrying
out simulation studies, which can take into account the variability
described above and other complicating factors.


~john-c/5421/notes.005   Last update: July 8, 2000.