SAS MACROS: Simulations Using SAS Procedures SPH 5421 notes.023b The purpose of simulations frequently is to investigate properties of a hypothetical distribution of data. An example: you want to simulate a clinical trial of a drug which would be given to people hospitalized for heart disease. You want to know whether the drug will decrease the average length of stay in the hospital. This is important because (1) staying in the hospital is very expensive and uncomfortable for the patient, and (2) a shorter length of hospital stay generally indicates that the person has recovered more quickly from the original illness. The difficulty is, the length of hospital stay has an unknown distribution. Certainly it is not normally distributed - since hospital length of stay is usually recorded in days, it takes on only nonnegative integer values. But it does not have a Poisson distribution either. You may have access to an empirical distribution from a previous study, but you may not know of any explicit parametric distribution which fits the data. What you would most want to do in this case is simulate your clinical trial, specifying the distribution of length-of-stay in the placebo group as being identical to your empirical distribution, and specifying a certain shift of this distribution in the active drug group. Then simulate a two- group clinical trial using these two distributions. Compute a test statistic to compare the two groups. You may want to use a nonparametric test since the two distributions do not fit any parametric form. Therefore you might use PROC NPAR1WAY to do the test, perhaps focussing on the value of the Wilcoxon statistic (or its p-value) for comparison of two independent groups. Ideally, you might simulate the trial 10,000 times under the alternative hypothesis, and then examine the results to see how often the p-value is below your desired critical level (say, 0.05). How do you do this? You could submit 10,000 copies of the simulation, each with a different output file, and then combine all the results into one file at the end, and then count how many times out of 10,000 the p-value was less than 0.05. Then divide this count by 10,000 to obtain an estimate of the power of the proposed trial, given the alternative hypothesis. There are several things wrong with trying to do this. It is excessively cumbersome. It is expensive in terms of time and space. You have to somehow write 10,000 programs and somehow combine all the output. The output from each run of PROC NPAR1WAY is several pages, and all you want from it are one or two numbers (e.g., Wilcoxon statistic and the p-value). You have to extract those things and discard the rest, collect all the results, count the number of "significant" results, and divide by 10,000 to estimate the power. It should be possible to do the whole thing in one program. In fact this can be done using a macro. The following outline shows the strategy: 1. Specify the input parameters to the macro. This would include the arrays specifying the distribution of the outcome variable under both the placebo condition and the active-drug condition; the critical level for the Wilcoxon test; and the number N of simulated clinical trials, and the number 2*m of the size of each trial. 2. For each i, 1 <= i <= N, simulate a clinical trial with output equal to m simulated lengths-of-hospital-stay in each of the treatment groups. Store the results in a temporary dataset D. 3. Apply PROC NPAR1WAY to dataset D. Include computation of the Wilcoxon statistics. Specify an Output Delivery System (ODS) dataset WOUT which will include the the Wilcoxon statistic and its p-value. 4. Access the dataset WOUT. From it, extract the Wilcoxon statistic and its p-value. Append these two values as an observation in another dataset. 5. Continue doing steps 2.-4. until you have done N simulations. 6. At the end, count how many of the simulations result in a Wilcoxon p-value less than (say) 0.05. This is your estimate of power. There are key features of SAS macros which makes all this possible. A macro can include calls to SAS procedures. However, a macro can also include a "do" loop. This makes it possible to carry out any number of repeated calls to procedures without writing out those calls explicitly. This cannot be done in "ordinary" SAS, and the "do" loop needs to have different syntax from that used in ordinary SAS, as follows: ----------------------------------------------------------------------------------- %macro trialsim(nsim,) ; %do jsim = 1 %to &nsim ; * data sets * * procedures * * output files * * accumulate output * %end ; %mend ; ----------------------------------------------------------------------------------- The following program illustrates the method with a simple example. In this case, m observations from two distributions are simulated. The random variable X1 is distributed as the absolute value of an N(0, 1) random variable, and X2 is distributed as the absolute value of an N(0, 2.25) random variable. The objective is to see what the power of the Wilcoxon test will be for comparisons of a sample of m observations of X1 with a sample of m observations of X2. ----------------------------------------------------------------------------------- options linesize = 80 ; footnote "~john-c/5421/notes23b.sas &sysdate &systime" ; %macro tsim (nsim, m, sig1, sig2) ; %do i = 1 %to &nsim ; data simtrial ; do j = 1 to &m ; group = 1 ; x = abs(&sig1 * rannor(-1)) ; output ; group = 2 ; x = abs(&sig2 * rannor(-1)) ; output ; end ; run ; ods select none ; * Note: Printed output turned off here ... ; proc npar1way data = simtrial ; class group ; var x ; ods output WilcoxonTest = wilcox ; * ODS dataset defined here ... ; run ; ods select all; * Note: Printing turned back on here ... ; proc print data = wilcox ; title1 '***** printout of datafile wilcox: *****' ; run ; data wilcox ; retain obsnum 0 ; set wilcox ; obsnum = obsnum + 1 ; data cumulate ; set cumulate wilcox ; if obsnum ne 6 then delete ; * Note: Obs 6 has the desired test stats ; run; %end ; %mend ; data cumulate ; Label1 = '' ; %tsim (5, 50, 1, 1.50) ; proc print data = cumulate ; title1 '>>>>> Printout of dataset cumulate ... <<<<<' ; run ; endsas ; ----------------------------------------------------------------------------------- ***** printout of datafile wilcox: ***** 1 17:13 Saturday, November 12, 2005 Obs Label1 cValue1 nValue1 1 Statistic 2111.0000 2111.000000 2 . 3 Normal Approximation . 4 Z -2.8506 -2.850595 5 One-Sided Pr < Z 0.0022 0.002182 6 Two-Sided Pr > |Z| 0.0044 0.004364 7 . 8 t Approximation . 9 One-Sided Pr < Z 0.0027 0.002655 10 Two-Sided Pr > |Z| 0.0053 0.005311 -------------------------------------------------------------------- Obs Label1 cValue1 nValue1 1 Statistic 2268.0000 2268.000000 2 . 3 Normal Approximation . 4 Z -1.7683 -1.768265 5 One-Sided Pr < Z 0.0385 0.038508 6 Two-Sided Pr > |Z| 0.0770 0.077017 7 . 8 t Approximation . 9 One-Sided Pr < Z 0.0400 0.040049 10 Two-Sided Pr > |Z| 0.0801 0.080097 -------------------------------------------------------------------- Obs Label1 cValue1 nValue1 1 Statistic 2360.0000 2360.000000 2 . 3 Normal Approximation . 4 Z -1.1340 -1.134033 5 One-Sided Pr < Z 0.1284 0.128390 6 Two-Sided Pr > |Z| 0.2568 0.256781 7 . 8 t Approximation . 9 One-Sided Pr < Z 0.1298 0.129760 10 Two-Sided Pr > |Z| 0.2595 0.259519 -------------------------------------------------------------------- Obs Label1 cValue1 nValue1 1 Statistic 2233.0000 2233.000000 2 . 3 Normal Approximation . 4 Z -2.0095 -2.009548 5 One-Sided Pr < Z 0.0222 0.022239 6 Two-Sided Pr > |Z| 0.0445 0.044479 7 . 8 t Approximation . 9 One-Sided Pr < Z 0.0236 0.023600 10 Two-Sided Pr > |Z| 0.0472 0.047201 -------------------------------------------------------------------- Obs Label1 cValue1 nValue1 1 Statistic 2416.0000 2416.000000 2 . 3 Normal Approximation . 4 Z -0.7480 -0.747979 5 One-Sided Pr < Z 0.2272 0.227236 6 Two-Sided Pr > |Z| 0.4545 0.454473 7 . 8 t Approximation . 9 One-Sided Pr < Z 0.2281 0.228123 10 Two-Sided Pr > |Z| 0.4562 0.456246 -------------------------------------------------------------------- >>>>> Printout of dataset cumulate ... <<<<< 6 17:13 Saturday, November 12, 2005 c Obs Label1 obsnum Value1 nValue1 1 T 6 0.0044 0.004364 2 T 6 0.0770 0.077017 3 T 6 0.2568 0.256781 4 T 6 0.0445 0.044479 5 T 6 0.4545 0.454473 ~john-c/5421/notes23b.sas 12NOV05 17:13 ----------------------------------------------------------------------------------- Note here that the printout shows the values of variables in the ODS output dataset 'wilcox'. The variable of interest is 'nValue1', which is the 2-sided p-value corresponding to the Wilcoxon Z-statistic. The final printout shows the accumulated values of nValue1. There were only 5 simulations in this case, and in only 2 of the 5 cases, the p-value was less than 0.05. Thus the estimated power would be 40%. Of course in a practical problem, one would want to do many more simulations to estimate power. For that purpose, you would probably sort the accumulated data set by the value of nValue1 and count how many observations had nValue1 < 0.05. -------------------------------------------------------------------------------------- Problem X: Use a simulation study macro as described above to estimate the power of a two-group clinical trial in which the endpoint is survival time. The assumptions are the following: 1. Group 1: exponential survival with pdf: f1(t) = exp(-t). 2. Group 2: exponential survival with pdf: f2(t) = 1.4 * exp(-1.4 * t). 3. N = 100 people in each group. 4. No censoring. 5. alpha = 0.05 6. The test of significance will be the Wilcoxon test in PROC LIFETEST. -------------------------------------------------------------------------------------- /home/walleye/john-c/5421/notes.023b Last update: November 29, 2010