SAMPLE SIZE, POWER, AND SIMULATION STUDIES                SPH 7460 notes.004

Drugs A and B are both supposed to be effective in lowering serum
cholesterol. You want to do an experiment to see whether one drug is better
than the other at reducing serum cholesterol, so you decide to carry out a
clinical trial. You identify 200 people with high serum cholesterol
(> 240 mg/dl). You randomize 100 of them to drug A and 100 to drug B. You
measure their baseline levels of serum cholesterol and ask them to take
their assigned drugs for 6 weeks. At the end of that time, you measure
their serum cholesterol again.

The outcome variable Y for this experiment (clinical trial) is the change
in serum cholesterol from baseline to 6 weeks. From prior information you
have an estimate of the standard deviation of Y: sigma = 20 mg/dl.

You want to know the statistical power of your study to detect meaningful
differences between the two drugs in reducing serum cholesterol. For
example, if the true mean change associated with drug A is -12 mg/dl, and
that for drug B is -15 mg/dl, what is the probability that you will see a
significant difference in mean changes between drugs A and B at the end of
the study?

To answer this, you first need to define 'significant'. Here we mean
statistically significant, not necessarily clinically significant. A
significance level alpha must be chosen, and you must also specify whether
you are doing a two-sided test or a one-sided test. Let's assume you are
doing a two-sided test with significance level alpha = .02. Then the POWER
of the test for drug A effect muA = -12 mg/dl versus drug B effect
muB = -15 mg/dl is

 [1]  prob(abs(W) > 2.33 | muA - muB = +3),

where W = (YAbar - YBbar) / stderr(YAbar - YBbar). Note that
stderr(YAbar - YBbar) = sqrt(var(YAbar - YBbar)). Further, you can probably
assume that var(YAbar - YBbar) = var(YAbar) + var(YBbar) [why?]. Note that
var(YAbar) = var(YBbar) = sigma^2 / N, where N is the sample size for each
group.
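The power in [1] can be evaluated numerically. Below is a minimal check of
the arithmetic in Python (the notes use SAS or Splus; Python is used here
only for illustration), with the normal CDF built from the error function:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Parameters from the worked example above.
sigma = 20.0     # sd of the outcome Y (change in cholesterol, mg/dl)
n = 100          # sample size in each group
delta = 3.0      # muA - muB = (-12) - (-15) = +3 mg/dl
z_crit = 2.33    # two-sided cutoff for alpha = .02

stderr = sqrt(2 * sigma**2 / n)   # sqrt(4 + 4) = 2.828
power = (1.0 - norm_cdf(z_crit - delta / stderr)) \
        + norm_cdf(-z_crit - delta / stderr)
print(round(stderr, 3), round(power, 4))   # 2.828 0.1025
```

This reproduces the two tail probabilities (.1022 and .0003) that are added
together in the derivation that follows.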
Here N = 100 and sigma = 20, so we get var(YAbar) = var(YBbar) = 400/100
= 4, and therefore stderr(YAbar - YBbar) = sqrt(4 + 4) = 2.828. How nice
that the arithmetic is reasonably simple. Now go back to equation [1]. It
says prob(abs((YAbar - YBbar)/2.828) > 2.33 | muA - muB = +3), which
simplifies to

 [2]  prob(Z > 2.33 - 3/2.828) + prob(Z < -2.33 - 3/2.828),

where Z is N(0, 1). These two probabilities add up to .1022 + .0003 =
.1025, and that is the power of the study.

Implicit in all this is a formula for power for a two-group clinical trial
with a quantitative outcome. The formula has the following ingredients:

 1. The sample sizes in each of two groups [note: NA and NB in general may
    not be equal].
 2. The standard deviation sigma of the outcome measure.
 3. An expected or projected treatment effect difference, delta.
 4. A significance level, alpha (plus specification of whether it is a
    one-sided or two-sided test).

The output of the formula is the power of the study. Note that evaluating
the formula requires that you have a way of converting Z-statistics to
probabilities and vice versa. You can do this with tables, or you can do it
with a stat package like SAS.

PROBLEM 5

 1. Write a SAS or Splus program to compute power for a two-group clinical
    trial as described above.

 2. Apply the formula to graph power curves. Assume the same sigma as
    specified above for the serum cholesterol trial. Assume a two-sided
    test. Assume the possible treatment effect differences range from
    -15 mg/dl to +15 mg/dl. Assume three different significance levels are
    specified: alpha = .05, alpha = .02, and alpha = .01. Also assume
    (1) NA = NB = 100, and (2) NA = NB = 80.

 3. Turn the formula around. Assume you specify the power, and you want the
    sample size in each of the two groups. Here, assume the two groups are
    of equal size.
    In other words, write a program which computes sample size for a
    two-group study with equal groups, a specified treatment effect, a
    specified significance level, a specified power, and a specified
    standard deviation of measurement. {In doing this part of the problem,
    note that back up in formula [2] above, one of the two parts is
    essentially negligible. This will always be the case for any realistic
    sample size calculation. This makes it much easier to write sample size
    as a function of power.}

========================================================================

SIMULATING A CLINICAL TRIAL

How reliable is your power computation back in Problem 5? Suppose you
specify alpha, delta, sigma, and the sample sizes (say NA = NB for
simplicity). If your formula says the trial has, for example, 80% power, is
it correct? In other words, if you did 10,000 trials with the specified
parameters, would about 8,000 of them turn out to have a significant
result? How can you test the formula?

You can simulate a single trial as follows. Start with the 100 people on
drug A. Simulate a treatment effect for each person: for drug A, you need
to sample from a normal distribution with mean -12 and standard deviation
sigma = 20. Compute the mean YAbar for all of group A, and the standard
error of the mean, SEA. Do the same for drug B, except assume there that
the true mean is -15. Then compute the test statistic
W = (YAbar - YBbar) / stderr(YAbar - YBbar). Note that W may or may not
exceed the cutoff for significance.

Repeat this process M times. Let m be the number of times that the test
statistic W achieves significance. What is the estimated power? What is the
approximate standard error of this estimate of power?

PROBLEM 6

Carry out a simulation study to estimate the power of a clinical trial with
the following parameters:

 1. significance level alpha = .05 (two-sided)
 2. sample size in each group: 100
 3. drug A mean effect, -12 mg/dl; drug B mean effect, -16 mg/dl
 4. standard deviation of the outcome measure: 20 mg/dl

Your program should compute the estimated power and 95% confidence limits
for the true power.

SOME QUESTIONS

Shouldn't you be using a t-test rather than a z-test for testing the
outcome of a clinical trial as described above? Do you think your power
formula is wrong if in fact you use a t-test? Is the formula likely to
overestimate power or underestimate it if a t-test is used? What might you
do to improve the formula?

========================================================================

SIMULATIONS FOR ESTIMATION OF LOGISTIC REGRESSION COEFFICIENTS

Assume that the relationship between a dichotomous outcome Y and a
quantitative predictor X is logistic: that is, Y can take on only the
values 0 (no event) and 1 (event), and

      prob(Y = 1 | X) = 1 / (1 + exp(-b0 - b1*X)).

For example: you are doing an experiment on the toxicity of a certain
substance to mice. You assign a dose X, chosen at random between 0 and 1,
to each mouse. The outcome event is death; that is, if the mouse dies,
Y = 1, and if the mouse lives, Y = 0.

You want to estimate the coefficient b1. The larger b1 is, the more toxic
the drug. You want your estimate of b1 to have a certain level of
precision. That is, you would like the confidence interval for b1 to be
small. You might want the confidence interval to have a radius (half-width)
of, say, .5 units. The width of the confidence interval is a function of
the estimated standard error of b1_hat (the estimate of b1). Specifically,
a 95% confidence interval has radius equal to 1.96 * serr(b1_hat).

In ordinary linear regression, the standard error of b1_hat can be computed
rather easily if you know the distribution of the predictor variable X. In
this case X has a uniform [0, 1] distribution. However, this is not linear
regression, and in logistic regression there is no simple formula for the
standard error.
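The precision target above translates directly into a target standard
error, since the 95% CI radius is 1.96 * serr(b1_hat). A one-line Python
check (illustrative only):

```python
# Desired 95% CI radius (half-width) for b1_hat, from the text above.
radius = 0.5
target_se = radius / 1.96   # SE needed so that 1.96 * SE = 0.5
print(round(target_se, 4))  # 0.2551
```

So the goal in what follows is to find a sample size N whose typical
standard error for b1_hat is at or below roughly .255.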
The standard error can be computed after the data are collected - PROC
LOGISTIC gives an estimated standard error - but it cannot be easily
estimated in advance of having the data. The standard error is partly a
function of sample size: large sample sizes are associated with smaller
standard errors.

What you want for the problem at hand is some way to estimate the standard
error for a range of sample sizes. This can be done as follows. Choose a
sample size N. Simulate N observations from your proposed study. Use PROC
LOGISTIC to get the estimated standard error. Repeat this for different
values of N: start with a small N and work up to large values until you get
a standard error of the desired size. You may want to repeat this several
times at each value of N that you choose.

This is what is done in the following program with 9 different sample
sizes. A part of the printout appears after the program, edited to show the
standard errors. Note that the standard errors get smaller as N increases:
for N = 100, the standard error of b1 was .8511; for N = 1600, it was
.2310.

This is a rather crude and inefficiently written program. It can be made
more concise by the use of PROC IML and macros, as will be explained later.
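The same simulate-and-fit loop can be sketched compactly outside SAS. The
following Python version (numpy only, illustrative, and fitting the
logistic model by Newton-Raphson rather than PROC LOGISTIC) tracks how
serr(b1_hat) shrinks as N grows:

```python
import numpy as np

def fit_logistic_se(x, y, iters=25):
    """Fit prob(Y=1|X) = 1/(1+exp(-b0-b1*x)) by Newton-Raphson;
    return (b1_hat, serr(b1_hat)) from the inverse information matrix."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
    b = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1.0 - p)
        info = X.T @ (X * W[:, None])           # Fisher information X'WX
        b = b + np.linalg.solve(info, X.T @ (y - p))
    cov = np.linalg.inv(info)                   # asymptotic covariance
    return b[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)  # fixed seed for reproducibility
b0, b1 = 1.0, 1.0               # true coefficients, as in the SAS program
se_by_n = {}
for n in (100, 400, 1600):
    x = rng.uniform(0.0, 1.0, n)                 # dose ~ uniform[0, 1]
    p = 1.0 / (1.0 + np.exp(-b0 - b1 * x))
    y = (rng.uniform(size=n) < p).astype(float)  # event indicator
    se_by_n[n] = fit_logistic_se(x, y)[1]
print(se_by_n)   # standard errors shrink roughly like 1/sqrt(N)
```

A single simulated dataset per N is used here, as in the SAS program below;
in practice you may want several replicates at each N, since the estimated
standard error is itself subject to sampling variability.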
======================================================================

options linesize = 100 ;
footnote "prog = ~john-c/logrand.sas &sysdate &systime" ;

%let b0 = 1 ;
%let b1 = 1 ;

%let n1 = 100 ;
%let n2 = 150 ;
%let n3 = 200 ;
%let n4 = 300 ;
%let n5 = 400 ;
%let n6 = 600 ;
%let n7 = 800 ;
%let n8 = 1200 ;
%let n9 = 1600 ;

*======================================================================;

data dataset ;
   n = &n1 ;
   do i = 1 to n ;
      x = ranuni(-1) ;
      p = 1 / (1 + exp(-&b0 - &b1*x)) ;
      r = ranuni(-1) ;
      y = 0 ;
      if r < p then y = 1 ;
      output ;
   end ;
run ;

proc logistic data = dataset ;
   model y = x ;
   title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
   title2 "N = &n1" ;
run ;

*======================================================================;

data dataset ;
   n = &n2 ;
   do i = 1 to n ;
      x = ranuni(-1) ;
      p = 1 / (1 + exp(-&b0 - &b1*x)) ;
      r = ranuni(-1) ;
      y = 0 ;
      if r < p then y = 1 ;
      output ;
   end ;
run ;

proc logistic data = dataset ;
   model y = x ;
   title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
   title2 "N = &n2" ;
run ;

*======================================================================;

.... other similar data steps and PROC LOGISTIC calls omitted ....
*======================================================================;

data dataset ;
   n = &n9 ;
   do i = 1 to n ;
      x = ranuni(-1) ;
      p = 1 / (1 + exp(-&b0 - &b1*x)) ;
      r = ranuni(-1) ;
      y = 0 ;
      if r < p then y = 1 ;
      output ;
   end ;
run ;

proc logistic data = dataset ;
   model y = x ;
   title1 "Proc logistic with b0 = &b0, b1 = &b1" ;
   title2 "N = &n9" ;
run ;

*======================================================================;

Proc logistic with b0 = 1, b1 = 1                                      1
N = 100                                17:57 Monday, September 15, 2003

                       The LOGISTIC Procedure

     Data Set: WORK.DATASET
     Response Variable: Y
     Response Levels: 2
     Number of Observations: 100
     Link Function: Logit

                         Response Profile

                     Ordered
                       Value     Y     Count
                           1     0        19
                           2     1        81

   Model Fitting Information and Testing Global Null Hypothesis BETA=0

                Intercept   Intercept and
   Criterion    Only        Covariates      Chi-Square for Covariates
   AIC           99.245     100.566         .
   SC           101.850     105.776         .
   -2 LOG L      97.245      96.566         0.679 with 1 DF (p=0.4100)
   Score         .           .              0.682 with 1 DF (p=0.4088)

               Analysis of Maximum Likelihood Estimates

               Parameter  Standard   Wald        Pr >        Standardized  Odds
 Variable  DF  Estimate   Error      Chi-Square  Chi-Square  Estimate      Ratio
 INTERCPT   1  -1.0915    0.4911      4.9385     0.0263      .             .
 X          1  -0.7003    0.8511      0.6771     0.4106      -0.116044     0.496

 [Parameter estimates from the remaining pages of the printout:]

 N = 150
 INTERCPT   1  -0.7231    0.3813      3.5961     0.0579      .             .
 X          1  -1.6454    0.7218      5.1970     0.0226      -0.277841     0.193

 N = 200
 INTERCPT   1  -1.5484    0.3497     19.6008     0.0001      .             .
 X          1   0.1407    0.6356      0.0490     0.8248       0.022254     1.151

 N = 300
 INTERCPT   1  -1.0860    0.2821     14.8236     0.0001      .             .
 X          1  -0.9856    0.5468      3.2485     0.0715      -0.155407     0.373

 N = 400
 INTERCPT   1  -0.9346    0.2427     14.8309     0.0001      .             .
 X          1  -1.3676    0.4747      8.3019     0.0040      -0.219096     0.255

 N = 600
 INTERCPT   1  -1.1729    0.2059     32.4535     0.0001      .             .
 X          1  -0.5274    0.3615      2.1283     0.1446      -0.083545     0.590

 N = 800
 INTERCPT   1  -1.0774    0.1705     39.9108     0.0001      .             .
 X          1  -1.0150    0.3251      9.7455     0.0018      -0.163858     0.362

 N = 1200
 INTERCPT   1  -1.0068    0.1371     53.9625     0.0001      .             .
 X          1  -0.9458    0.2627     12.9601     0.0003      -0.149178     0.388

 N = 1600
 INTERCPT   1  -0.9706    0.1210     64.2948     0.0001      .             .
 X          1  -1.1857    0.2310     26.3500     0.0001      -0.190150     0.306

 prog = ~john-c/logrand.sas 15SEP03 17:57

===============================================================================

~john-c/5421/notes.004 Last update: September 19, 2007.