SAS MACROS: Simulations Using SAS Procedures             SPH 5421 notes.023b

     The purpose of simulations frequently is to investigate properties of
a hypothetical distribution of data.  An example: you want to simulate a
clinical trial of a drug which would be given to people hospitalized for
heart disease.  You want to know whether the drug will decrease the average
length of stay in the hospital.  This is important because (1) staying in the
hospital is very expensive and uncomfortable for the patient, and (2) a shorter
length of hospital stay generally indicates that the person has recovered more
quickly from the original illness.  The difficulty is, the length of hospital
stay has an unknown distribution.  Certainly it is not normally distributed -
since hospital length of stay is usually recorded in days, it takes on only
nonnegative integer values.  But it does not have a Poisson distribution either.
You may have access to an empirical distribution from a previous study, but
you may not know of any explicit parametric distribution which fits the data.

     What you would most want to do in this case is simulate your clinical
trial, specifying the distribution of length-of-stay in the placebo group
as being identical to your empirical distribution, and specifying a certain
shift of this distribution in the active drug group.  Then simulate a two-
group clinical trial using these two distributions.  Compute a test statistic
to compare the two groups.  You may want to use a nonparametric test since
the two distributions do not fit any parametric form.  Therefore you might
use PROC NPAR1WAY to do the test, perhaps focussing on the value of the
Wilcoxon statistic (or its p-value) for comparison of two independent groups.
Ideally, you might simulate the trial 10,000 times under the alternative
hypothesis, and then examine the results to see how often the p-value is
below your desired critical level (say, 0.05).

     How do you do this?

     You could submit 10,000 copies of the simulation, each with a different
output file, and then combine all the results into one file at the end, and
then count how many times out of 10,000 the p-value was less than 0.05.  Then
divide this count by 10,000 to obtain an estimate of the power of the proposed
trial, given the alternative hypothesis.

     There are several things wrong with trying to do this.  It is excessively
cumbersome.  It is expensive in terms of time and space.  You have to somehow 
write 10,000 programs and somehow combine all the output.  The output from each 
run of PROC NPAR1WAY is several pages, and all you want from it are one or two 
numbers (e.g., Wilcoxon statistic and the p-value).  You have to extract those 
things and discard the rest, collect all the results, count the number of "significant" 
results, and divide by 10,000 to estimate the power.

   It should be possible to do the whole thing in one program.  In fact
this can be done using a macro.  The following outline shows the strategy:

      1.  Specify the input parameters to the macro.  This would include the
          arrays specifying the distribution of the outcome variable under
          both the placebo condition and the active-drug condition; the
          critical level for the Wilcoxon test; and the number N of simulated
          clinical trials, and the number 2*m of the size of each trial.

      2.  For each i, 1 <= i <= N, simulate a clinical trial with output equal
          to m simulated lengths-of-hospital-stay in each of the treatment
          groups.  Store the results in a temporary dataset D.

      3.  Apply PROC NPAR1WAY to dataset D.  Include computation of the Wilcoxon
          statistics.  Specify an Output Delivery System (ODS) dataset WOUT which
          will include the the Wilcoxon statistic and its p-value.

      4.  Access the dataset WOUT.  From it, extract the Wilcoxon statistic and
          its p-value.  Append these two values as an observation in another dataset.

      5.  Continue doing steps 2.-4. until you have done N simulations.

      6.  At the end, count how many of the simulations result in a Wilcoxon p-value
          less than (say) 0.05.  This is your estimate of power.


     There are key features of SAS macros which makes all this possible.  A macro can
include calls to SAS procedures.  However, a macro can also include a "do" loop.
This makes it possible to carry out any number of repeated calls to procedures
without writing out those calls explicitly.  This cannot be done in "ordinary"
SAS, and the "do" loop needs to have different syntax from that used in ordinary
SAS, as follows:

-----------------------------------------------------------------------------------

     %macro trialsim(nsim, ) ;

            %do jsim = 1 %to &nsim ;

                * data sets *

                * procedures *

                * output files *

                * accumulate output *

            %end ;

     %mend ;

-----------------------------------------------------------------------------------

     The following program illustrates the method with a simple example.  In this
case, m observations from two distributions are simulated.  The random variable X1
is distributed as the absolute value of an N(0, 1) random variable, and X2 is
distributed as the absolute value of an N(0, 2.25) random variable.  The objective is
to see what the power of the Wilcoxon test will be for comparisons of a sample of m
observations of X1 with a sample of m observations of X2.

-----------------------------------------------------------------------------------

options linesize = 80 ;
footnote "~john-c/5421/notes23b.sas &sysdate &systime" ;

%macro tsim (nsim, m, sig1, sig2) ;

  %do i = 1 %to &nsim ;

      data simtrial ;

        do j = 1 to &m ;

           group = 1 ;
           x = abs(&sig1 * rannor(-1)) ;
           output ;

           group = 2 ;
           x = abs(&sig2 * rannor(-1)) ;
           output ;

        end ;

run ;

ods select none ;                                  * Note: Printed output turned off here ... ;

        proc npar1way data = simtrial ;
             class group ;
             var x ;
             ods output WilcoxonTest = wilcox ;    * ODS dataset defined here ... ;
        run ;

ods select all;                                    * Note: Printing turned back on here ... ;

        proc print data = wilcox ;
        title1 '***** printout of datafile wilcox: *****' ;
        run ;

        data wilcox ;
             retain obsnum 0 ;
             set wilcox ;
             obsnum = obsnum + 1 ;

        data cumulate ;
             set cumulate wilcox ;
             if obsnum ne 6 then delete ;  * Note: Obs 6 has the desired test stats ;
        run;

    %end ;

%mend ;

data cumulate ;
     Label1 = '' ;

%tsim (5, 50, 1, 1.50) ;

proc print data = cumulate ;
title1 '>>>>> Printout of dataset cumulate ... <<<<<' ;
run ;

endsas ;

-----------------------------------------------------------------------------------

                    ***** printout of datafile wilcox: *****                   1
                                               17:13 Saturday, November 12, 2005

            Obs    Label1                  cValue1           nValue1

              1    Statistic               2111.0000     2111.000000
              2                                                    .
              3    Normal Approximation                            .
              4    Z                       -2.8506         -2.850595
              5    One-Sided Pr <  Z       0.0022           0.002182
              6    Two-Sided Pr > |Z|      0.0044           0.004364
              7                                                    .
              8    t Approximation                                 .
              9    One-Sided Pr <  Z       0.0027           0.002655
             10    Two-Sided Pr > |Z|      0.0053           0.005311
--------------------------------------------------------------------
            Obs    Label1                  cValue1           nValue1

              1    Statistic               2268.0000     2268.000000
              2                                                    .
              3    Normal Approximation                            .
              4    Z                       -1.7683         -1.768265
              5    One-Sided Pr <  Z       0.0385           0.038508
              6    Two-Sided Pr > |Z|      0.0770           0.077017
              7                                                    .
              8    t Approximation                                 .
              9    One-Sided Pr <  Z       0.0400           0.040049
             10    Two-Sided Pr > |Z|      0.0801           0.080097
--------------------------------------------------------------------
            Obs    Label1                  cValue1           nValue1

              1    Statistic               2360.0000     2360.000000
              2                                                    .
              3    Normal Approximation                            .
              4    Z                       -1.1340         -1.134033
              5    One-Sided Pr <  Z       0.1284           0.128390
              6    Two-Sided Pr > |Z|      0.2568           0.256781
              7                                                    .
              8    t Approximation                                 .
              9    One-Sided Pr <  Z       0.1298           0.129760
             10    Two-Sided Pr > |Z|      0.2595           0.259519
--------------------------------------------------------------------
            Obs    Label1                  cValue1           nValue1

              1    Statistic               2233.0000     2233.000000
              2                                                    .
              3    Normal Approximation                            .
              4    Z                       -2.0095         -2.009548
              5    One-Sided Pr <  Z       0.0222           0.022239
              6    Two-Sided Pr > |Z|      0.0445           0.044479
              7                                                    .
              8    t Approximation                                 .
              9    One-Sided Pr <  Z       0.0236           0.023600
             10    Two-Sided Pr > |Z|      0.0472           0.047201
--------------------------------------------------------------------
            Obs    Label1                  cValue1           nValue1

              1    Statistic               2416.0000     2416.000000
              2                                                    .
              3    Normal Approximation                            .
              4    Z                       -0.7480         -0.747979
              5    One-Sided Pr <  Z       0.2272           0.227236
              6    Two-Sided Pr > |Z|      0.4545           0.454473
              7                                                    .
              8    t Approximation                                 .
              9    One-Sided Pr <  Z       0.2281           0.228123
             10    Two-Sided Pr > |Z|      0.4562           0.456246
--------------------------------------------------------------------
                  >>>>> Printout of dataset cumulate ... <<<<<                 6
                                               17:13 Saturday, November 12, 2005

                                            c
               Obs    Label1    obsnum    Value1         nValue1

                1       T          6      0.0044        0.004364
                2       T          6      0.0770        0.077017
                3       T          6      0.2568        0.256781
                4       T          6      0.0445        0.044479
                5       T          6      0.4545        0.454473
 
 
 
                    ~john-c/5421/notes23b.sas 12NOV05 17:13
-----------------------------------------------------------------------------------

     Note here that the printout shows the values of variables in the ODS output dataset 
'wilcox'.  The variable of interest is 'nValue1', which is the 2-sided p-value
corresponding to the Wilcoxon Z-statistic.  The final printout shows the accumulated
values of nValue1.  There were only 5 simulations in this case, and in only 2 of the
5 cases, the p-value was less than 0.05.  Thus the estimated power would be 40%.  Of
course in a practical problem, one would want to do many more simulations to estimate
power.  For that purpose, you would probably sort the accumulated data set by the
value of nValue1 and count how many observations had nValue1 < 0.05.

--------------------------------------------------------------------------------------

Problem X:

Use a simulation study macro as described above to estimate
the power of a two-group clinical trial in which the endpoint is survival time.

The assumptions are the following:

    1.  Group 1: exponential survival with pdf:

              f1(t) = exp(-t).

    2.  Group 2: exponential survival with pdf:

              f2(t) = 1.4 * exp(-1.4 * t).

    3.  N = 100 people in each group.

    4.  No censoring.

    5.  alpha = 0.05

    6.  The test of significance will be the Wilcoxon test in PROC LIFETEST.

--------------------------------------------------------------------------------------

/home/walleye/john-c/5421/notes.023b   Last update: November 29, 2010