Example: Simulating a Clinical Trial                               n54703.021


     Suppose you are planning a clinical trial of a blood pressure medication.
The outcome is survival over a 7-year period.  You expect to randomize men
in the trial to use either a blood-pressure lowering drug A, or a placebo, B.

     The risk factor of interest is systolic blood pressure.  The men have
varying levels of SBP, and their risk of dying depends on these levels.
Some men are at higher risk than others.  You can carry out a logistic
regression to find how SBP and other risk factors (age, smoking habit,
cholesterol level) affect the person's risk.

     The blood pressure medication A is expected to lower SBP.  The mean
response to medication A is -15 mmHg (a lowering of 15 points).  However,
there is considerable variability between individuals in their response:
the standard deviation of the response is 8 mmHg.

     You need to know how large the sample size should be for a clinical trial
of A versus B.  For this, you need to know the expected proportion of deaths
within 7 years in both the placebo group (B) and in the active drug group (A).

     To find these proportions, you plan to carry out a simulation study.

     You will randomly generate N men in group A and N men in group B, and
then randomly simulate whether they remain alive or die.  You will count up
the number of deaths in each group, thereby arriving at the proportions you
need to compute sample size.

     To carry out a realistic simulation, you first need to know something
about the distributions of the risk factors.  You can obtain this from a
previous population study.  You find means and standard deviations of the
risk factors using PROC MEANS.  You will use these to simulate men in your 
simulated clinical trial.

     You also need estimates of the effects of these risk factors on the
probability of dying within 7 years.  You can estimate these from the same
previous study, using PROC LOGISTIC.  You then combine the fitted
coefficients with the simulated risk factors to compute the probability
that each simulated man will die within 7 years.

     Below are a program fragment and its output: a table of baseline risk
factors for men in the MRFIT study, followed by the logistic regression.
Both are based only on the men in the MRFIT control group.


----------------------------------------------------------------------------------------------------

OPTIONS LINESIZE = 100 ;

PROC MEANS DATA = SEL N MEAN STD MIN MAX ;
     WHERE STUDYGP EQ 1 ;
     VAR   AGE BASECHOL F10CIGS F10SBP ;
TITLE1 'Basic Baseline Stats from the MRFIT (Control Group Only)' ;
TITLE2 'for a Simulated Trial of Blood Pressure Medication' ;

RUN ;

PROC LOGISTIC DATA = SEL DESCENDING ;
     WHERE STUDYGP EQ 1 ;
     MODEL DEATH7 = AGE BASECHOL F10CIGS F10SBP ;
TITLE1 'PROC LOGISTIC: MRFIT data: Factors Predicting Death' ;
TITLE2 'Within 7 Years of Randomization: Control Group Only' ;
RUN ;

endsas ;
----------------------------------------------------------------------------------------------------



                      Basic Baseline Stats from the MRFIT (Control Group Only)                     1
                         for a Simulated Trial of Blood Pressure Medication
                                                                           17:20 Sunday, May 1, 2005

Variable  Label                            N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------------------------------------------------
AGE       AGE IN YEARS: NOT TRUNCATED   6428    46.9402687     5.9355795    35.1642710    58.3832991
BASECHOL  BASELINE CHOLESTEROL MG/DL    6428   253.7708463    36.3836324   120.0000000   393.0000000
F10CIGS   BASELINE CIGS PER DAY         6428    21.5085563    20.2588989             0    99.0000000
F10SBP    BASELINE SYSTOLIC B.P.        6412   147.6218029    15.3454267   104.0000000   232.0000000
----------------------------------------------------------------------------------------------------
 
 
 
                                billings.sas (joanneb) 01MAY05 17:20
----------------------------------------------------------------------------------------------------


                         PROC LOGISTIC: MRFIT data: Factors Predicting Death                        2
                        Within 7 Years of Randomization: Control Group Only
                                                                           17:20 Sunday, May 1, 2005

                                       The LOGISTIC Procedure

               Data Set: WORK.SEL     
               Response Variable: DEATH7    
               Response Levels: 2
               Number of Observations: 6412
               Link Function: Logit


                                          Response Profile
 
                                     Ordered
                                       Value  DEATH7     Count

                                           1       1       265
                                           2       0      6147

WARNING: 16 observation(s) were deleted due to missing values for the response or explanatory 
         variables.



                Model Fitting Information and Testing Global Null Hypothesis BETA=0
 
                                         Intercept
                           Intercept        and   
             Criterion       Only       Covariates    Chi-Square for Covariates

             AIC            2209.578      2148.179         .                          
             SC             2216.344      2182.008         .                          
             -2 LOG L       2207.578      2138.179       69.399 with 4 DF (p=0.0001)  
             Score              .             .          70.006 with 4 DF (p=0.0001)  


                             Analysis of Maximum Likelihood Estimates
 
                  Parameter    Standard       Wald          Pr >       Standardized        Odds
Variable    DF     Estimate      Error     Chi-Square    Chi-Square      Estimate         Ratio

INTERCPT    1       -9.2378      0.9589       92.8129        0.0001               .        .       
AGE         1        0.0674      0.0116       33.6343        0.0001        0.220441       1.070    
BASECHOL    1       0.00172     0.00178        0.9377        0.3329        0.034511       1.002    
F10CIGS     1        0.0174     0.00322       29.2204        0.0001        0.194379       1.018    
F10SBP      1        0.0135     0.00413       10.6962        0.0011        0.114353       1.014    


                   Association of Predicted Probabilities and Observed Responses

                             Concordant = 63.4%          Somers' D = 0.294
                             Discordant = 34.0%          Gamma     = 0.302
                             Tied       =  2.6%          Tau-a     = 0.023
                             (1628955 pairs)             c         = 0.647
 
 
 
                                billings.sas (joanneb) 01MAY05 17:20

----------------------------------------------------------------------------------------------------

     From the above, we obtain the following statistics that we will use in the simulation:


     Variable          Mean      Std Dev          Logistic Coefficient
     ------------     ------    ---------        ----------------------
     Age                47          6                   0.0674
     Cholesterol       254         36                   0.00172
     Cigs/Day           22         20                   0.0174
     Systolic BP       148         15                   0.0135

     Intercept          --         --                  -9.2378
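
     As a quick check of how these numbers combine, the sketch below computes the
logistic risk for one hypothetical "average" man, first at the untreated mean SBP of
148 and then at an SBP 15 points lower.  The choice of an average man is an
illustrative assumption and is not part of the original analysis.

```sas
data _null_ ;
     /* coefficients copied from the table above */
     intcpt = -9.2378 ;
     cage   = 0.0674 ;
     cchol  = 0.00172 ;
     ccigs  = 0.0174 ;
     csbp   = 0.0135 ;

     /* hypothetical "average" man: each risk factor at its rounded mean */
     age = 47 ;  chol = 254 ;  cigs = 22 ;

     do sbp = 148, 133 ;     /* untreated mean SBP, then 15 points lower */
        logit = intcpt + cage*age + cchol*chol + ccigs*cigs + csbp*sbp ;
        risk  = 1 / (1 + exp(-logit)) ;
        put sbp= logit= 8.4 risk= 8.5 ;    /* results are written to the SAS log */
     end ;
run ;
```

     The two risks come out at roughly 0.037 and 0.031.  A mean risk averaged over
a whole population of men can be expected to be somewhat higher than the risk of the
average man, because the logistic curve bends upward at risks this low.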


----------------------------------------------------------------------------------------------------
options linesize = 100 ;
footnote "~john-c/5421/simcity.sas &sysdate &systime" ;

data simmen ;

     n = 100000 ;
     seed = 20050501 ;

     mage   =  47 ;     sdage =   6 ;
     mchol  = 254 ;     sdchol = 36 ;
     mcigs  =  22 ;     sdcigs = 20 ;
     msbp   = 148 ;     sdsbp  = 15 ;

     meffect  = -15 ;
     sdeffect =   8 ;

     intcpt = -9.2378 ;
     cage   = .0674 ;
     cchol  = .00172 ;
     ccigs  = .0174 ;
     csbp   = .0135 ;

     do i = 1 to n ;

        do j = 1 to 2 ;

           group = 'A' ;
           if j eq 2 then group = 'B' ;

goback:

           age  = mage + sdage * rannor(seed) ;
           if age lt 35 or age gt 60 then goto goback ;
           chol = mchol + sdchol * rannor(seed) ;
           cigs = mcigs + sdcigs * rannor(seed) ;
           if cigs lt 0 then cigs = 0 ;
           sbp = msbp + sdsbp * rannor(seed) ;

           if j eq 1 then sbp = sbp + meffect + sdeffect * rannor(-1) ;

           risk = 1 / (1 + exp(-intcpt - cage*age - cchol * chol - ccigs * cigs - csbp * sbp)) ;

           death = 0 ;
           r = ranuni(seed) ;
           if r < risk then death = 1 ;

           output ;

        end ;

     end ;

run ;

proc print data = simmen ;
     where i le 20 ;
     var   i group age chol cigs sbp risk death ;
title1 'Printout of first 20 simulated pairs.' ;
format age chol cigs sbp 5.1 risk 8.5 death 2.0 ;
run ;

proc means data = simmen ;
     class group ;
     var age chol cigs sbp risk death ;
title1 'Descriptive Stats for a Simulated Clinical Trial ' ;
title2 'N = 100,000 Men in Each of Two Groups A and B' ;
format age chol cigs sbp 5.1 risk 8.5 death 2.0 ;
run ;

proc freq data = simmen ;
     tables death * group / chisq ;
title1 'Simulated Estimates Based on N = 100000 Men in Each of Two Groups' ;
title2 'Of the Numbers of Deaths in a Simulated Clinical Trial' ;
run ;
---------------------------------------------------------------------------------------
                               Printout of first 20 simulated pairs.                               1
                                                                           09:19 Monday, May 2, 2005

              OBS     I    GROUP      AGE     CHOL     CIGS      SBP        RISK    DEATH

                1     1      A       47.9    264.0      0.0    112.7     0.01741      0  
                2     1      B       39.1    272.4     31.8    154.6     0.02948      0  
                3     2      A       44.8    234.8     23.0    127.7     0.02431      0  
                4     2      B       52.2    268.5     20.7    143.3     0.04902      0  
                5     3      A       43.7    238.0      0.0    104.5     0.01128      0  
                6     3      B       50.9    204.0     10.7    155.2     0.04016      0  
                7     4      A       35.5    297.5      7.0    127.2     0.01108      0  
                8     4      B       55.2    285.4     13.2    146.3     0.05621      0  
                9     5      A       41.0    240.0      0.0    144.8     0.01619      0  
               10     5      B       47.1    313.8      1.1    146.3     0.02847      0  
               11     6      A       45.2    261.7     54.1    111.6     0.03584      0  
               12     6      B       53.3    264.6     19.9    138.7     0.04879      0  
               13     7      A       48.6    298.1      2.8    124.8     0.02385      0  
               14     7      B       56.5    220.7      2.9    154.5     0.05133      0  
               15     8      A       48.7    256.3      0.0    112.3     0.01804      0  
               16     8      B       41.2    288.2     11.9    137.8     0.01991      0  
               17     9      A       42.6    251.7     44.6    124.4     0.02990      0  
               18     9      B       52.7    249.0     42.9    130.6     0.06031      0  
               19    10      A       51.4    259.7     38.3    132.7     0.05365      0  
               20    10      B       53.6    240.7     17.0    145.7     0.04985      0  
               21    11      A       46.6    220.8     13.5    158.6     0.03412      0  
               22    11      B       46.0    260.1     46.1    136.7     0.04553      0  
               23    12      A       57.5    282.9      7.0    152.0     0.06281      0  
               24    12      B       48.7    220.4     23.7    140.5     0.03677      0  
               25    13      A       39.5    210.3     20.2    117.5     0.01371      0  
               26    13      B       40.1    224.5      6.9    164.3     0.02164      0  
               27    14      A       47.5    197.1     36.4    149.9     0.04571      0  
               28    14      B       40.8    296.1      0.0    168.7     0.02410      0  
               29    15      A       49.2    336.0      4.2    155.1     0.04015      0  
               30    15      B       48.7    210.7     33.2    136.2     0.04007      0  
               31    16      A       40.1    287.0     47.3    122.8     0.02760      0  
               32    16      B       51.9    233.1     23.8    146.4     0.04980      0  
               33    17      A       44.5    215.8     37.1    117.4     0.02558      0  
               34    17      B       48.1    311.1     29.1    120.2     0.03445      0  
               35    18      A       49.3    283.5     18.6    172.1     0.05849      0  
               36    18      B       52.8    193.3     18.4    154.0     0.04989      0  
               37    19      A       42.5    262.7      0.0    133.6     0.01605      0  
               38    19      B       46.8    214.0     38.4    143.7     0.04286      0  
               39    20      A       56.3    261.5      0.0    154.8     0.05193      0  
               40    20      B       46.9    260.3     16.6    137.0     0.02969      0  
 
 
                               ~john-c/5421/simcity.sas 02MAY05 09:19
---------------------------------------------------------------------------------------
                          Descriptive Stats for a Simulated Clinical Trial                          2
                           N = 100,000 Men in Each of Two Groups A and B   09:19 Monday, May 2, 2005

      GROUP   N Obs  Variable       N          Mean       Std Dev       Minimum       Maximum
      ---------------------------------------------------------------------------------------
      A      100000  AGE       100000    47.0788745     5.3722424    35.0008355    59.9984669
                     CHOL      100000   253.9125808    35.9845865    96.8446007   401.8331571
                     CIGS      100000    23.3428431    17.7081055             0   109.5310886
                     SBP       100000   133.0145802    16.9874675    65.6720524   208.4419826
                     RISK      100000     0.0357545     0.0195739     0.0044108     0.2760082
                     DEATH     100000     0.0352200     0.1843364             0     1.0000000

      B      100000  AGE       100000    47.0890849     5.3782966    35.0001232    59.9969396
                     CHOL      100000   253.9049176    36.0942547    82.3931852   410.5082411
                     CIGS      100000    23.3908179    17.7088696             0   102.6874489
                     SBP       100000   147.9565927    14.9534868    77.3451272   208.7581353
                     RISK      100000     0.0431605     0.0228288     0.0068171     0.2810298
                     DEATH     100000     0.0428500     0.2025198             0     1.0000000
      ---------------------------------------------------------------------------------------
 
 
                               ~john-c/5421/simcity.sas 02MAY05 09:19
---------------------------------------------------------------------------------------
                  Simulated Estimates Based on N = 100000 Men in Each of Two Groups                 3
                       Of the Numbers of Deaths in a Simulated Clinical Trial
                                                                           09:19 Monday, May 2, 2005

                                      TABLE OF DEATH BY GROUP

                                DEATH     GROUP

                                Frequency|
                                Percent  |
                                Row Pct  |
                                Col Pct  |A       |B       |  Total
                                ---------+--------+--------+
                                       0 |  96478 |  95715 | 192193
                                         |  48.24 |  47.86 |  96.10
                                         |  50.20 |  49.80 |
                                         |  96.48 |  95.71 |
                                ---------+--------+--------+
                                       1 |   3522 |   4285 |   7807
                                         |   1.76 |   2.14 |   3.90
                                         |  45.11 |  54.89 |
                                         |   3.52 |   4.29 |
                                ---------+--------+--------+
                                Total      100000   100000   200000
                                            50.00    50.00   100.00


                               STATISTICS FOR TABLE OF DEATH BY GROUP

                       Statistic                     DF     Value        Prob
                       ------------------------------------------------------
                       Chi-Square                     1    77.599       0.001
                       Likelihood Ratio Chi-Square    1    77.718       0.001
                       Continuity Adj. Chi-Square     1    77.396       0.001
                       Mantel-Haenszel Chi-Square     1    77.599       0.001
                       Fisher's Exact Test (Left)                       1.000
                                           (Right)                   6.59E-19
                                           (2-Tail)                  1.32E-18
                       Phi Coefficient                      0.020            
                       Contingency Coefficient              0.020            
                       Cramer's V                           0.020            

                       Sample Size = 200000

 
                               ~john-c/5421/simcity.sas 02MAY05 09:19
=====================================================================================================

     Simulated SBP is generated in the following program steps:

           sbp = msbp + sdsbp * rannor(seed) ;
           if j eq 1 then sbp = sbp + meffect + sdeffect * rannor(-1) ;

     This means that we are assuming SBP has a normal distribution in the population,
with a mean of msbp = 148 and a standard deviation of sdsbp = 15.  If the person is
in the active drug group (j = 1), a random drug effect with mean meffect = -15 and
standard deviation sdeffect = 8 is added.  We should therefore expect that men in the
active drug group 'A' (j = 1) have an SBP about 15 points lower, on average, than
those in the placebo group 'B' (j = 2).

     This is confirmed by the PROC MEANS output: the mean SBP in simulated group A is
133.01, while that in simulated group B is 147.96, a difference of about 14.9.
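
     The spread of SBP in group A can be checked the same way.  Because the baseline
SBP and the drug effect are drawn independently, their variances add.  The short
sketch below works out the mean and standard deviation one would expect in group A
from the program's own parameters.

```sas
data _null_ ;
     msbp    = 148 ;   sdsbp    = 15 ;   /* population SBP, as in the program */
     meffect = -15 ;   sdeffect =  8 ;   /* drug effect in group A            */

     mean_a = msbp + meffect ;                  /* 148 - 15 = 133         */
     sd_a   = sqrt(sdsbp**2 + sdeffect**2) ;    /* sqrt(225 + 64) = 17    */
     put mean_a= sd_a= ;                        /* written to the SAS log */
run ;
```

     The PROC MEANS output above reports a standard deviation of 16.99 for SBP in
group A and 14.95 in group B, in line with the expected values of 17 and 15.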

     The crucial section in the program is where deaths are simulated:


       risk = 1 / (1 + exp(-intcpt - cage*age - cchol * chol - ccigs * cigs - csbp * sbp)) ;

       death = 0 ;
       r = ranuni(seed) ;
       if r < risk then death = 1 ;

     First, the person's risk is computed, based on the simulated risk factors.
The logistic risk function is used.  Then a random number r between 0 and 1 is 
generated, with a uniform distribution.  If this random number is less than the 
value of 'risk', a simulated death occurs.  Otherwise the simulated person does 
not die.
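
     One way to convince yourself that this step behaves as described is to hold the
risk fixed and count the simulated deaths.  The sketch below does this for a
hypothetical risk of 0.04, a value chosen only for illustration.  The proportion of
deaths reported by PROC MEANS should be close to 0.04.

```sas
data check ;
     seed = 20050501 ;    /* same seed value as the main program             */
     risk = 0.04 ;        /* hypothetical fixed risk, for illustration only  */

     do i = 1 to 100000 ;
        death = 0 ;
        if ranuni(seed) < risk then death = 1 ;   /* same rule as above */
        output ;
     end ;

     keep i death ;
run ;

proc means data = check n mean ;
     var death ;
title1 'Check: Proportion of Simulated Deaths at a Fixed Risk of 0.04' ;
run ;
```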


     The results are summarized as follows:
                                                                                      Number of
                 N       Mean Age    Mean Chol   Mean Cigs   Mean SBP    Mean Risk    Sim'd Deaths
              -------   ----------   ---------   ---------   --------    ---------    ------------
     Group A  100,000      47.1        253.9       23.3       133.0       0.0358          3522

     Group B  100,000      47.1        253.9       23.4       148.0       0.0432          4285


     The bottom line is that the expected death rate in group A is about 3.6%, while
that in group B is about 4.3% (provided the drug works the way it is supposed to).
These figures are the mean risks; the simulated death rates were 3.52% and 4.29%.
Because the simulation is large (100,000 men per group), its random error is small:
the standard error of each simulated death rate is about 0.06 percentage points.  The
results still depend, of course, on the assumptions built into the simulation.  You
can now use a sample size program to estimate how large a clinical trial must be to
show that the drug is effective in lowering the rate of death (assuming it has the
effect projected here).  The answer is that you will need 13,900+ men in each group
(85% power, 2-sided significance level of 0.05).
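
     The handout does not say which sample size program was used.  As a rough check,
the sketch below applies the usual normal-approximation formula for comparing two
proportions with equal group sizes.  The formula and the variable names are
assumptions made for illustration, not the author's program.

```sas
data _null_ ;
     pa    = 0.036 ;    /* expected death rate, active drug A */
     pb    = 0.043 ;    /* expected death rate, placebo B     */
     alpha = 0.05 ;     /* 2-sided significance level         */
     power = 0.85 ;

     za   = probit(1 - alpha/2) ;    /* normal quantile for the test size */
     zb   = probit(power) ;          /* normal quantile for the power     */
     pbar = (pa + pb) / 2 ;          /* pooled rate under the null        */

     n = ( za * sqrt(2 * pbar * (1 - pbar))
         + zb * sqrt(pa * (1 - pa) + pb * (1 - pb)) )**2 / (pa - pb)**2 ;

     npergroup = ceil(n) ;
     put za= zb= n= npergroup= ;     /* written to the SAS log */
run ;
```

     With these inputs the formula gives a little over 13,900 men per group, which
agrees with the figure quoted above.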

     It is worth noting that one aspect of the simulation study is not realistic.
In the MRFIT population, the mean number of cigarettes per day was about 22, while
in the simulated trial it was about 23.  The reason is that cigarettes per day in the
MRFIT population was not normally distributed.  It was skewed because about 35% of
the MRFIT men were nonsmokers.  The program draws cigarettes per day from a normal
distribution with mean 22 and standard deviation 20 and resets negative values to 0,
which raises the simulated mean.  Thus the mean and standard deviation of cigarettes
per day were not sufficient to characterize the population distribution.  It would
have been better to simulate 35% of the men as nonsmokers, and then use a normal
distribution (with a mean of about 31) to simulate the rest.  The age distribution in
the MRFIT also was non-normal, but this had a smaller effect.

===========================================================================================================

n54703.021  Last update: May 2, 2005.