RANDOMIZATION SCHEDULES                              SPH 7460 notes.002

Randomized treatment assignments for clinical trials are usually generated using pseudo-random number generators in SAS, FORTRAN, or other languages.

If a clinical trial has several different clinical centers, usually separate randomization schedules are generated for each center. These separate groups are referred to as 'strata'. Sometimes there are stratifying factors in addition to clinical center. Typically, investigators want to stratify on variables which are known to have a strong influence on outcome. In a clinical trial in which the outcome is heart attack, it is reasonable to stratify on gender or age group.

The object of stratifying is to achieve approximate balance of the treatment groups within each stratum. Say, for example, you want a schedule of treatment assignments to drug D or placebo P for 100 people within a stratum. You want approximately equal numbers assigned to D and P. You don't want, for example, 70 people assigned to D and 30 people assigned to P. Such imbalances can affect the power of the study. If the imbalance occurs on a risk factor for the study's outcome, the results may appear to favor one drug over the other even though there is no real difference between the drugs.

Randomization to two groups is basically just like flipping a coin. Unless you take some precautions, you might end up with a bad imbalance between the groups. This weakens the power of the study and makes it more likely that the groups are also imbalanced on other factors.

You don't know in advance how many people will ultimately be entered into the trial from a given stratum. You would therefore like to write the randomization schedule so that

   (1) people have equal probability of being assigned to either group, and

   (2) approximate balance between the groups is guaranteed at any point in the schedule.

How can you do this?
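The coin-flip problem described above can be seen in a small simulation. The following is only a sketch under stated assumptions: the seed, the data set name coinflip, and the variable names rep, nd, np, and imbalance are made up for illustration and are not part of the course materials. It assigns 100 people to D or P by independent fair coin flips, repeats this for 1000 simulated strata, and summarizes how far from 50-50 the final split ends up.

```sas
* Sketch: simple coin-flip randomization of 100 people, repeated 1000 times ;
* The seed, data set name, and variable names are illustrative assumptions ;

data coinflip ;
  call streaminit(5421) ;            * fix the seed so the run is reproducible ;
  do rep = 1 to 1000 ;               * 1000 simulated strata ;
    nd = 0 ;                         * number assigned to drug D ;
    do person = 1 to 100 ;
      if rand('uniform') < 0.5 then nd = nd + 1 ;
    end ;
    np = 100 - nd ;                  * number assigned to placebo P ;
    imbalance = abs(nd - np) ;       * final absolute D - P difference ;
    output ;
  end ;
  keep rep nd np imbalance ;
run ;

proc means data = coinflip n mean min max ;
  var nd imbalance ;
  title "Coin-flip randomization: final imbalance in 100 assignments" ;
run ;
```

The exact figures depend on the seed. The maximum of the imbalance column shows how badly an unrestricted schedule can drift, and this sketch looks only at the final split, not at the imbalance along the way.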
The most common method is to create a randomization schedule from a series of 'permuted blocks' of specified lengths. A permuted block of size N is based on a random permutation of the N numbers 1, 2, 3, ..., N.

Say for example N = 4. You start with the block [1 2 3 4]. You randomly permute it and get [2 4 3 1]. You assign people to drug D if the number is even and to P if it is odd. Thus the permuted block gives rise to the following sequence of treatment assignments:

    [ 2  4  3  1]
      D  D  P  P

If you string together five randomly permuted blocks like this, you could get the following treatment schedule:

    [ 2  4  3  1] [ 1  4  2  3] [ 4  3  1  2] [ 3  2  1  4] [ 4  3  2  1]
      D  D  P  P    P  D  D  P    D  P  P  D    P  D  P  D    D  P  D  P
      1  2  1  0   -1  0  1  0    1  0 -1  0   -1  0 -1  0    1  0  1  0   <--- D - P imbalance

Note that, at ANY POINT in the schedule, the imbalance between D and P is never greater than two in absolute value. Also, you never have runs of the same treatment assignment of length greater than 4.

Also notice that, with all the block sizes being 4, the imbalance between D and P is zero after every 4th randomization. This feature unfortunately makes it possible for patients or clinic coordinators to know the next treatment assignment in certain cases. For example, in a schedule built from blocks of size 4, if you are at the 16th spot in the schedule, and the three previous treatment assignments (spots 13, 14, and 15, which begin the fourth block) are D D P, then you know with absolute certainty that the next assignment is P.

To remedy this problem, randomization schedules are often composed as mixtures of two or more different sizes of permuted blocks. This makes it impossible to know exactly where block boundaries are.

For example, in the MRFIT clinical trial, three block sizes were used: 2, 4, and 6. Whenever one block was completed, the size K of the next block was chosen at random, a random permutation of [1 2 ... K] was generated, and the next K treatment assignments were made. The MRFIT randomization schedule was stratified by clinical center. There were 22 clinical centers. There were two treatment groups.
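The running D - P imbalance and the longest run in the five-block example above can also be checked mechanically. The following is a minimal sketch, not part of the original notes: the data set name checksched and the variables sched, spot, trt, imbalance, runlen, maxabs, and maxrun are made up for illustration. The schedule is typed in as a character string, and the data step walks along it one assignment at a time.

```sas
* Sketch: running D - P imbalance and longest run for a given schedule ;
* Data set and variable names are illustrative assumptions ;

data checksched ;
  length trt prev $ 1 ;
  sched = "DDPPPDDPDPPDPDPDDPDP" ;   * the five blocks of size 4 from the example ;
  imbalance = 0 ;                    * running count of D minus count of P ;
  runlen = 0 ;                       * length of the current run ;
  maxabs = 0 ;                       * largest absolute imbalance so far ;
  maxrun = 0 ;                       * longest run so far ;
  prev = " " ;
  do spot = 1 to length(sched) ;
    trt = substr(sched, spot, 1) ;
    if trt = "D" then imbalance = imbalance + 1 ;
    else imbalance = imbalance - 1 ;
    if trt = prev then runlen = runlen + 1 ;
    else runlen = 1 ;
    maxabs = max(maxabs, abs(imbalance)) ;
    maxrun = max(maxrun, runlen) ;
    prev = trt ;
    output ;                         * one observation per spot in the schedule ;
  end ;
  keep spot trt imbalance runlen maxabs maxrun ;
run ;

proc print data = checksched ;
  title "Running imbalance and run lengths for the example schedule" ;
run ;
```

For this schedule the last observation should show maxabs = 2 and maxrun = 3. The same check could be applied to any other schedule, including one built from mixed block sizes, by changing the string assigned to sched.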
The MRFIT randomization schedule guaranteed that the treatments were never out of balance by more than __?__. It also guaranteed that there were no runs of treatment assignments greater than __?__.

So the main question is: how do you use pseudo-random number generators to create random permutations of [1 2 ... N], for any given N?


ARRAYS IN SAS ...

An ARRAY is a vector or matrix. SAS permits vector arrays in data steps. This is not the same as the data arrays which occur in SAS PROC IML, which is basically a matrix language. Data arrays in an ordinary data step in SAS are included as part of the data in each observation in the data set. Consider the following example:

==================================================================================
data fvctimes ;
     array fvc(6) fvc1-fvc6 ;

     infile 'fvcdata.txt' ;   * placeholder name: substitute the actual external file ;
     input id age gender fvc1 fvc2 fvc3 fvc4 fvc5 fvc6 ;
run ;

proc print data = fvctimes ;
     var id age gender fvc1 fvc2 fvc3 fvc4 fvc5 fvc6 ;
     title "Example of the use of arrays ..." ;
run ;
==================================================================================

Here is what is being done in this program. There is an external file which has data including ID, age, gender, and 6 FVC (forced vital capacity) measurements at times 1, 2, 3, 4, 5, and 6. When you write

     array fvc(6) fvc1-fvc6 ;

you are indicating that you can refer to the elements of the array in two different ways. In other words,

     fvc(1) is the same thing as fvc1
     fvc(2) is the same thing as fvc2, etc.

Here is one reason that it is handy to sometimes express several numbers as an array. Perhaps you want to sort those numbers in ascending order.
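The equivalence of fvc(i) and fvci is also what makes loops over the measurements possible. As a minimal sketch (the data set name fvcsummary, the variables maxfvc and maxtime, and the FVC values are made up for illustration), the following assigns values by the ordinary variable names and then reads them back through the array to find the largest measurement and the time at which it occurred:

```sas
* Sketch: the same six variables used by name and through the array ;
* Data set name, variable names, and values are illustrative assumptions ;

data fvcsummary ;
  array fvc(6) fvc1-fvc6 ;

  fvc1 = 4.3 ; fvc2 = 2.3 ; fvc3 = 5.0 ;    * assigned by variable name ;
  fvc4 = 1.1 ; fvc5 = 0.9 ; fvc6 = 2.3 ;

  maxfvc = fvc(1) ;                         * read back through the array ;
  maxtime = 1 ;
  do i = 2 to 6 ;
    if fvc(i) > maxfvc then do ;
      maxfvc = fvc(i) ;
      maxtime = i ;
    end ;
  end ;
  drop i ;
run ;

proc print data = fvcsummary ;
  var fvc1 fvc2 fvc3 fvc4 fvc5 fvc6 maxfvc maxtime ;
  title "Largest FVC found by looping over the array" ;
run ;
```

With these values the loop should end with maxfvc = 5 and maxtime = 3. Without the array, the same comparison would have to be written out once for each of the six variables.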
The following program sorts fvc(1), fvc(2), fvc(3), ..., fvc(6) in ascending order:

==================================================================================
     do i = 2 to 6 ;
        do j = 1 to i - 1 ;

           if fvc(j) > fvc(i) then do ;

              temp = fvc(j) ;
              fvc(j) = fvc(i) ;
              fvc(i) = temp ;

           end ;

        end ;
     end ;
==================================================================================

This little code fragment is an example of a "bubble sort". It is a not-very-efficient way to sort things in ascending order. Note that what is happening inside the "if ... then" section is that if fvc(j) and fvc(i) are not in ascending order, then they are interchanged. You need that extra variable "temp" as a placeholder to hold the old value of fvc(j) while you replace fvc(j) with fvc(i).

There is a variant of the bubble-sort algorithm which is valuable also. This is the 'sort-and-carry' bubblesort algorithm. What this does is sort one array while at the same time permuting another array in parallel with the sorting of the first array. The following is an example of how this works:

--------------------------------------------------------------------
options linesize = 80 ;
footnote "~john-c/5421/bubblesort.sas &sysdate &systime" ;

* Variant of the bubblesort algorithm: Sort-and-Carry ;

data sortanarray ;
     array fvc(6) fvc1-fvc6 ;
     array drugs(6) drugs1-drugs6 ;

     fvc(1) = 4.3 ;  drugs(1) = 1 ;
     fvc(2) = 2.3 ;  drugs(2) = 1 ;
     fvc(3) = 5.0 ;  drugs(3) = 1 ;
     fvc(4) = 1.1 ;  drugs(4) = 2 ;
     fvc(5) = 0.9 ;  drugs(5) = 2 ;
     fvc(6) = 2.3 ;  drugs(6) = 2 ;

     output ;
run ;

proc print data = sortanarray ;
     var fvc1 fvc2 fvc3 fvc4 fvc5 fvc6
         drugs1 drugs2 drugs3 drugs4 drugs5 drugs6 ;
     title "Print of the arrays BEFORE sorting:" ;
run ;

data sortanarray ;

* Variant of the bubblesort algorithm: Sort-and-Carry ;

     set sortanarray ;
     array fvc(6) fvc1-fvc6 ;
     array drugs(6) drugs1-drugs6 ;

     do i = 2 to 6 ;
        do j = 1 to i - 1 ;

           if fvc(j) > fvc(i) then do ;

              * interchange the two fvc values ... ;
              ftemp = fvc(j) ;
              fvc(j) = fvc(i) ;
              fvc(i) = ftemp ;

              * ... and carry the matching drugs values along ;
              dtemp = drugs(j) ;
              drugs(j) = drugs(i) ;
              drugs(i) = dtemp ;

           end ;

        end ;
     end ;
run ;

proc print data = sortanarray ;
     var fvc1 fvc2 fvc3 fvc4 fvc5 fvc6
         drugs1 drugs2 drugs3 drugs4 drugs5 drugs6 ;
     title "Print of the arrays AFTER sorting:" ;
run ;
---------------------------------------------------------------

                  Print of the arrays BEFORE sorting:                         1
                                               16:02 Tuesday, September 13, 2011

Obs  fvc1  fvc2  fvc3  fvc4  fvc5  fvc6  drugs1  drugs2  drugs3  drugs4  drugs5  drugs6

  1   4.3   2.3     5   1.1   0.9   2.3       1       1       1       2       2       2


                  Print of the arrays AFTER sorting:                          2
                                               16:02 Tuesday, September 13, 2011

Obs  fvc1  fvc2  fvc3  fvc4  fvc5  fvc6  drugs1  drugs2  drugs3  drugs4  drugs5  drugs6

  1   0.9   1.1   2.3   2.3   4.3     5       2       2       1       2       1       1

                  ~john-c/5421/bubblesort.sas 13SEP11 16:02
=======================================================================

PROJECT ASSIGNMENT 4

1.  Find an efficient way to generate random permutations of [1 2 ... N].
    Write SAS code which does this.

2.  Write a complete SAS or SPLUS program to generate randomization
    schedules based on permuted blocks of varying sizes.  The key
    parameters to the program are:

       1)  The number of treatments
       2)  The acceptable block sizes (and the number of such sizes)
       3)  The number of strata
       4)  The length of the schedule within each stratum (can be
           assumed to be the same for each stratum).

    You can assume that all the treatments will be assigned with equal
    probability.

~john-c/5421/notes.002  Revised Sept 14, 2011.