Example: Simulating a Clinical Trial                                                      n54703.021

Suppose you are planning a clinical trial of a blood pressure medication. The outcome is survival
over a 7-year period. You expect to randomize men in the trial to use either a blood-pressure
lowering drug, A, or a placebo, B.

The risk factor of interest is systolic blood pressure (SBP). The men have varying levels of SBP,
and their risk of dying depends on these levels. Some men are at higher risk than others. You can
carry out a logistic regression to find how SBP and other risk factors (age, smoking habit,
cholesterol level) affect a person's risk.

The blood pressure medication A is expected to lower SBP by 15 mmHg on average. However, there is
considerable variability between individuals in their response to medication A. The mean response
to medication A is -15 mmHg, but the standard deviation of that response is 8 mmHg.

You need to know how large the sample size should be for a clinical trial of A versus B. For this,
you need to know the expected proportion of deaths within 7 years in both the placebo group (B) and
the active drug group (A).

To find these proportions, you plan to carry out a simulation study. You will randomly generate N
men in group A and N men in group B, and then randomly simulate whether they remain alive or die.
You will count up the number of deaths in each group, thereby arriving at the proportions you need
to compute sample size.

To carry out a realistic simulation, you first need to know something about the distributions of
the risk factors. You can obtain this from a previous population study. You find means and standard
deviations of the risk factors using PROC MEANS. You will use these to generate the men in your
simulated clinical trial.

You also need estimates of the effects of these risk factors on the probability of dying within
7 years. You can estimate these from a previous study as well, using PROC LOGISTIC.
You use these coefficients and the simulated risk factors to compute the probability that each
simulated man will die within 7 years.

Below are a program fragment, a table of baseline risk factors for men in the MRFIT study, and the
logistic regression output. Both are based only on the men in the MRFIT control group.

----------------------------------------------------------------------------------------------------
OPTIONS LINESIZE = 100 ;

PROC MEANS DATA = SEL N MEAN STD MIN MAX ;
     WHERE STUDYGP EQ 1 ;
     VAR AGE BASECHOL F10CIGS F10SBP ;
TITLE1 'Basic Baseline Stats from the MRFIT (Control Group Only)' ;
TITLE2 'for a Simulated Trial of Blood Pressure Medication' ;
RUN ;

PROC LOGISTIC DATA = SEL DESCENDING ;
     WHERE STUDYGP EQ 1 ;
     MODEL DEATH7 = AGE BASECHOL F10CIGS F10SBP ;
TITLE1 'PROC LOGISTIC: MRFIT data: Factors Predicting Death' ;
TITLE2 'Within 7 Years of Randomization: Control Group Only' ;
RUN ;

endsas ;
----------------------------------------------------------------------------------------------------

                      Basic Baseline Stats from the MRFIT (Control Group Only)                     1
                         for a Simulated Trial of Blood Pressure Medication
                                                                           17:20 Sunday, May 1, 2005

Variable  Label                            N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------------------------------------------------
AGE       AGE IN YEARS: NOT TRUNCATED   6428    46.9402687     5.9355795    35.1642710    58.3832991
BASECHOL  BASELINE CHOLESTEROL MG/DL    6428   253.7708463    36.3836324   120.0000000   393.0000000
F10CIGS   BASELINE CIGS PER DAY         6428    21.5085563    20.2588989             0    99.0000000
F10SBP    BASELINE SYSTOLIC B.P.        6412   147.6218029    15.3454267   104.0000000   232.0000000
----------------------------------------------------------------------------------------------------

                                billings.sas (joanneb) 01MAY05 17:20
----------------------------------------------------------------------------------------------------

                        PROC LOGISTIC: MRFIT data: Factors Predicting Death                        2
                        Within 7 Years of Randomization: Control Group Only
                                                                           17:20 Sunday, May 1, 2005

                                       The LOGISTIC Procedure

     Data Set: WORK.SEL
     Response Variable: DEATH7
     Response Levels: 2
     Number of Observations: 6412
     Link Function: Logit

                                          Response Profile

                                   Ordered
                                     Value    DEATH7     Count

                                         1         1       265
                                         2         0      6147

WARNING: 16 observation(s) were deleted due to missing values for the response or explanatory
         variables.

              Model Fitting Information and Testing Global Null Hypothesis BETA=0

                                          Intercept
                            Intercept        and
              Criterion       Only       Covariates    Chi-Square for Covariates

              AIC           2209.578      2148.179          .
              SC            2216.344      2182.008          .
              -2 LOG L      2207.578      2138.179       69.399 with 4 DF (p=0.0001)
              Score             .             .          70.006 with 4 DF (p=0.0001)

                             Analysis of Maximum Likelihood Estimates

                   Parameter    Standard        Wald        Pr >      Standardized      Odds
   Variable   DF    Estimate       Error  Chi-Square  Chi-Square          Estimate     Ratio

   INTERCPT    1     -9.2378      0.9589     92.8129      0.0001           .            .
   AGE         1      0.0674      0.0116     33.6343      0.0001          0.220441     1.070
   BASECHOL    1     0.00172     0.00178      0.9377      0.3329          0.034511     1.002
   F10CIGS     1      0.0174     0.00322     29.2204      0.0001          0.194379     1.018
   F10SBP      1      0.0135     0.00413     10.6962      0.0011          0.114353     1.014

                Association of Predicted Probabilities and Observed Responses

                        Concordant = 63.4%          Somers' D = 0.294
                        Discordant = 34.0%          Gamma     = 0.302
                        Tied       =  2.6%          Tau-a     = 0.023
                        (1628955 pairs)             c         = 0.647

                                billings.sas (joanneb) 01MAY05 17:20
----------------------------------------------------------------------------------------------------

From the above, we obtain the following statistics that we will use in the simulation:

     Variable        Mean    Std Dev    Logistic Coefficient
     ------------   ------  ---------  ----------------------
     Age               47        6           0.0674
     Cholesterol      254       36           0.00172
     Cigs/Day          22       20           0.0174
     Systolic BP      148       15           0.0135
     Intercept         --       --          -9.2378

----------------------------------------------------------------------------------------------------
options linesize = 100 ;
footnote "~john-c/5421/simcity.sas &sysdate &systime" ;

data simmen ;

     n = 100000 ;                        /* men per group                          */
     seed = 20050501 ;

     /* means and standard deviations of the risk factors (from PROC MEANS)        */
     mage  =  47 ;   sdage  =  6 ;
     mchol = 254 ;   sdchol = 36 ;
     mcigs =  22 ;   sdcigs = 20 ;
     msbp  = 148 ;   sdsbp  = 15 ;

     /* assumed drug effect on SBP: mean and standard deviation                    */
     meffect = -15 ; sdeffect = 8 ;

     /* logistic coefficients (from PROC LOGISTIC)                                 */
     intcpt = -9.2378 ;
     cage   = .0674 ;
     cchol  = .00172 ;
     ccigs  = .0174 ;
     csbp   = .0135 ;

     do i = 1 to n ;
       do j = 1 to 2 ;

          group = 'A' ;
          if j eq 2 then group = 'B' ;

          /* redraw age until it falls in the range 35 to 60                       */
goback:
          age = mage + sdage * rannor(seed) ;
          if age lt 35 or age gt 60 then goto goback ;

          chol = mchol + sdchol * rannor(seed) ;

          cigs = mcigs + sdcigs * rannor(seed) ;
          if cigs lt 0 then cigs = 0 ;          /* negative draws are set to 0     */

          sbp = msbp + sdsbp * rannor(seed) ;
          if j eq 1 then sbp = sbp + meffect + sdeffect * rannor(-1) ;

          risk = 1 / (1 + exp(-intcpt - cage*age - cchol * chol - ccigs * cigs - csbp * sbp)) ;

          death = 0 ;
          r = ranuni(seed) ;
          if r < risk then death = 1 ;

          output ;

       end ;
     end ;

run ;

/* Note: i le 20 selects the first 20 pairs (40 rows), although the title says 30. */
proc print data = simmen ;
     where i le 20 ;
     var i group age chol cigs sbp risk death ;
title1 'Printout of first 30 simulated pairs.' ;
format age chol cigs sbp 5.1 risk 8.5 death 2.0 ;
run ;

proc means data = simmen ;
     class group ;
     var age chol cigs sbp risk death ;
title1 'Descriptive Stats for a Simulated Clinical Trial ' ;
title2 'N = 100,000 Men in Each of Two Groups A and B' ;
format age chol cigs sbp 5.1 risk 8.5 death 2.0 ;
run ;

proc freq data = simmen ;
     tables death * group / chisq ;
title1 'Simulated Estimates Based on N = 100000 Men in Each of Two Groups' ;
title2 'Of the Numbers of Deaths in a Simulated Clinical Trial' ;
run ;
---------------------------------------------------------------------------------------

                         Printout of first 30 simulated pairs.                        1
                                                               09:19 Monday, May 2, 2005

           OBS     I    GROUP      AGE     CHOL     CIGS      SBP       RISK    DEATH

             1     1      A       47.9    264.0      0.0    112.7    0.01741      0
             2     1      B       39.1    272.4     31.8    154.6    0.02948      0
             3     2      A       44.8    234.8     23.0    127.7    0.02431      0
             4     2      B       52.2    268.5     20.7    143.3    0.04902      0
             5     3      A       43.7    238.0      0.0    104.5    0.01128      0
             6     3      B       50.9    204.0     10.7    155.2    0.04016      0
             7     4      A       35.5    297.5      7.0    127.2    0.01108      0
             8     4      B       55.2    285.4     13.2    146.3    0.05621      0
             9     5      A       41.0    240.0      0.0    144.8    0.01619      0
            10     5      B       47.1    313.8      1.1    146.3    0.02847      0
            11     6      A       45.2    261.7     54.1    111.6    0.03584      0
            12     6      B       53.3    264.6     19.9    138.7    0.04879      0
            13     7      A       48.6    298.1      2.8    124.8    0.02385      0
            14     7      B       56.5    220.7      2.9    154.5    0.05133      0
            15     8      A       48.7    256.3      0.0    112.3    0.01804      0
            16     8      B       41.2    288.2     11.9    137.8    0.01991      0
            17     9      A       42.6    251.7     44.6    124.4    0.02990      0
            18     9      B       52.7    249.0     42.9    130.6    0.06031      0
            19    10      A       51.4    259.7     38.3    132.7    0.05365      0
            20    10      B       53.6    240.7     17.0    145.7    0.04985      0
            21    11      A       46.6    220.8     13.5    158.6    0.03412      0
            22    11      B       46.0    260.1     46.1    136.7    0.04553      0
            23    12      A       57.5    282.9      7.0    152.0    0.06281      0
            24    12      B       48.7    220.4     23.7    140.5    0.03677      0
            25    13      A       39.5    210.3     20.2    117.5    0.01371      0
            26    13      B       40.1    224.5      6.9    164.3    0.02164      0
            27    14      A       47.5    197.1     36.4    149.9    0.04571      0
            28    14      B       40.8    296.1      0.0    168.7    0.02410      0
            29    15      A       49.2    336.0      4.2    155.1    0.04015      0
            30    15      B       48.7    210.7     33.2    136.2    0.04007      0
            31    16      A       40.1    287.0     47.3    122.8    0.02760      0
            32    16      B       51.9    233.1     23.8    146.4    0.04980      0
            33    17      A       44.5    215.8     37.1    117.4    0.02558      0
            34    17      B       48.1    311.1     29.1    120.2    0.03445      0
            35    18      A       49.3    283.5     18.6    172.1    0.05849      0
            36    18      B       52.8    193.3     18.4    154.0    0.04989      0
            37    19      A       42.5    262.7      0.0    133.6    0.01605      0
            38    19      B       46.8    214.0     38.4    143.7    0.04286      0
            39    20      A       56.3    261.5      0.0    154.8    0.05193      0
            40    20      B       46.9    260.3     16.6    137.0    0.02969      0

                          ~john-c/5421/simcity.sas 02MAY05 09:19
---------------------------------------------------------------------------------------

                    Descriptive Stats for a Simulated Clinical Trial                  2
                     N = 100,000 Men in Each of Two Groups A and B
                                                               09:19 Monday, May 2, 2005

GROUP   N Obs  Variable        N          Mean       Std Dev       Minimum       Maximum
---------------------------------------------------------------------------------------
A      100000  AGE        100000    47.0788745     5.3722424    35.0008355    59.9984669
               CHOL       100000   253.9125808    35.9845865    96.8446007   401.8331571
               CIGS       100000    23.3428431    17.7081055             0   109.5310886
               SBP        100000   133.0145802    16.9874675    65.6720524   208.4419826
               RISK       100000     0.0357545     0.0195739     0.0044108     0.2760082
               DEATH      100000     0.0352200     0.1843364             0     1.0000000

B      100000  AGE        100000    47.0890849     5.3782966    35.0001232    59.9969396
               CHOL       100000   253.9049176    36.0942547    82.3931852   410.5082411
               CIGS       100000    23.3908179    17.7088696             0   102.6874489
               SBP        100000   147.9565927    14.9534868    77.3451272   208.7581353
               RISK       100000     0.0431605     0.0228288     0.0068171     0.2810298
               DEATH      100000     0.0428500     0.2025198             0     1.0000000
---------------------------------------------------------------------------------------

                          ~john-c/5421/simcity.sas 02MAY05 09:19
---------------------------------------------------------------------------------------

            Simulated Estimates Based on N = 100000 Men in Each of Two Groups         3
                 Of the Numbers of Deaths in a Simulated Clinical Trial
                                                               09:19 Monday, May 2, 2005

                                TABLE OF DEATH BY GROUP

                         DEATH     GROUP

                         Frequency|
                         Percent  |
                         Row Pct  |
                         Col Pct  |A       |B       |  Total
                         ---------+--------+--------+
                                0 |  96478 |  95715 | 192193
                                  |  48.24 |  47.86 |  96.10
                                  |  50.20 |  49.80 |
                                  |  96.48 |  95.71 |
                         ---------+--------+--------+
                                1 |   3522 |   4285 |   7807
                                  |   1.76 |   2.14 |   3.90
                                  |  45.11 |  54.89 |
                                  |   3.52 |   4.29 |
                         ---------+--------+--------+
                         Total      100000   100000   200000
                                     50.00    50.00   100.00

                         STATISTICS FOR TABLE OF DEATH BY GROUP

                Statistic                     DF     Value        Prob
                ------------------------------------------------------
                Chi-Square                     1    77.599       0.001
                Likelihood Ratio Chi-Square    1    77.718       0.001
                Continuity Adj. Chi-Square     1    77.396       0.001
                Mantel-Haenszel Chi-Square     1    77.599       0.001
                Fisher's Exact Test (Left)                       1.000
                                    (Right)                   6.59E-19
                                    (2-Tail)                  1.32E-18
                Phi Coefficient                      0.020
                Contingency Coefficient              0.020
                Cramer's V                           0.020

                Sample Size = 200000

                          ~john-c/5421/simcity.sas 02MAY05 09:19
=====================================================================================================

Simulated SBP is generated in the following program steps:

          sbp = msbp + sdsbp * rannor(seed) ;
          if j eq 1 then sbp = sbp + meffect + sdeffect * rannor(-1) ;

This means that we are assuming SBP has a normal distribution in the population, with a mean of
msbp = 148 and a standard deviation of sdsbp = 15. However, if the person is in the active drug
group (j = 1), then there is a random drug effect which has mean meffect = -15 and standard
deviation sdeffect = 8. We should therefore expect that people in drug group 'A' (j = 1) have an
SBP about 15 points lower than those in drug group 'B' (j = 2). This is confirmed by the
PROC MEANS output: the mean SBP in simulated group A is 133.01, while that in simulated group B is
147.96, a difference of 14.94.

The crucial section in the program is where deaths are simulated:

          risk = 1 / (1 + exp(-intcpt - cage*age - cchol * chol - ccigs * cigs - csbp * sbp)) ;

          death = 0 ;
          r = ranuni(seed) ;
          if r < risk then death = 1 ;

First, the person's risk is computed from the simulated risk factors, using the logistic risk
function. Then a random number r between 0 and 1 is generated, with a uniform distribution. If
this random number is less than the value of 'risk', a simulated death occurs. Otherwise the
simulated person does not die.
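As a small worked illustration of the logistic risk function described above, the sketch below
evaluates the risk for one hypothetical man whose age, cholesterol, and cigarettes per day all sit
at the rounded means used in the simulation, once with SBP = 148 (untreated) and once with
SBP = 133 (after a 15-point drop). The choice of this "average man" and the data step itself are
illustrative assumptions, not part of the original program.

```sas
/* Sketch only: logistic risk at the rounded mean risk factors.            */
/* The "average man" evaluated here is a hypothetical illustration.        */
data _null_ ;

     intcpt = -9.2378 ;
     cage   = .0674 ;
     cchol  = .00172 ;
     ccigs  = .0174 ;
     csbp   = .0135 ;

     age = 47 ; chol = 254 ; cigs = 22 ;

     do sbp = 148, 133 ;      /* 148 = untreated mean; 133 = mean after a 15-point drop */
        logit = intcpt + cage*age + cchol*chol + ccigs*cigs + csbp*sbp ;
        risk  = 1 / (1 + exp(-logit)) ;
        put sbp= logit= 8.4 risk= 8.5 ;
     end ;

run ;
```

Worked by hand, the two risks come to roughly 0.037 and 0.031. Both are somewhat lower than the
mean simulated risks in the two groups, which is plausible: the mean of the risks over a spread of
risk factors is not the same as the risk at the mean risk factors.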
The results are summarized as follows:

                                                                                    Number of
              N      Mean Age   Mean Chol   Mean Cigs   Mean SBP   Mean Risk   Sim'd Deaths
           -------  ----------  ---------   ---------   --------   ---------   ------------
Group A    100,000     47.1       253.9       23.3        133.0      0.0358         3522
Group B    100,000     47.1       253.9       23.4        148.0      0.0432         4285

The bottom line is that the expected death rate in group A is about 3.6%, while that in group B is
about 4.3% (provided the drug works the way it is supposed to). The sample size in this simulation
is large, so these proportions carry very little simulation error: the standard error of each
simulated death proportion is about 0.06 percentage points. That precision holds only under the
assumed model.

You can now use a sample size program to estimate how large a clinical trial must be to show that
the drug is effective in lowering the rate of death (assuming it has the effect projected here).
The answer is that you will need 13,900+ men in each group (85% power, 2-sided significance level
of 0.05).

It is worth noting that one aspect of the simulation study is not realistic. In the MRFIT
population, the mean number of cigarettes per day was about 22, while in the simulated trial it was
about 23. The simulation draws cigarettes per day from a normal distribution and sets negative
draws to 0, which pushes the simulated mean up. The underlying problem is that the distribution of
cigarettes per day in the MRFIT population was not normal. It was somewhat skewed because about 35%
of the MRFIT men were nonsmokers. Thus the mean and standard deviation of cigarettes per day were
not sufficient to characterize the population distribution. It would have been better to simulate
35% of the men as nonsmokers, and then use a normal distribution (with a mean of about 31) to
simulate the rest. The age distribution in the MRFIT also was non-normal, but this had a smaller
effect.

===========================================================================================================
n54703.021  Last update: May 2, 2005.
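
Addendum: checking the sample size. The handout does not say which sample size program was used,
so the sketch below is only one plausible check. It assumes the usual normal-approximation formula
for comparing two proportions with equal group sizes, and it uses the rounded death rates of 3.6%
and 4.3%. The variable names (pa, pb, zalpha, zbeta, pbar) are introduced here for illustration.

```sas
/* Sketch only: per-group sample size for comparing two proportions.       */
/* Assumes the normal-approximation formula; the original sample size      */
/* program is not identified in the handout.                               */
data _null_ ;

     pa    = 0.036 ;          /* expected death rate, drug A (rounded)     */
     pb    = 0.043 ;          /* expected death rate, placebo B (rounded)  */
     alpha = 0.05 ;           /* two-sided significance level              */
     power = 0.85 ;

     zalpha = probit(1 - alpha/2) ;
     zbeta  = probit(power) ;
     pbar   = (pa + pb) / 2 ;

     n = ( zalpha * sqrt(2 * pbar * (1 - pbar))
         + zbeta  * sqrt(pa * (1 - pa) + pb * (1 - pb)) )**2
         / (pa - pb)**2 ;
     n = ceil(n) ;

     put 'Approximate men needed per group: ' n comma8. ;

run ;
```

Under these assumptions the formula gives a figure of about 13,900 per group, in line with the
number quoted above. The result is sensitive to the inputs: substituting the simulated death
proportions (0.0352 and 0.0429) for the rounded mean risks gives a noticeably smaller figure, on
the order of 11,600 per group.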