RANDOMIZATION SCHEDULES                           SPH 7460 notes.002

Randomized treatment assignments for clinical trials are usually
generated using pseudo-random number generators in SAS, FORTRAN,
or other languages.

If a clinical trial has several different clinical centers, a separate
randomization schedule is usually generated for each center.
The centers are then referred to as 'strata'.

Sometimes there are stratifying factors in addition to clinical
center.  Typically, investigators want to stratify on variables which are
known to have a strong influence on outcome.  In a clinical trial in which the
outcome is heart attack, it is reasonable to stratify on gender or age group.

The object of stratifying is to achieve approximate balance of the
treatment groups within each stratum.  Say, for example, you want
a schedule of treatment assignments to drug D or placebo P for
100 people within a stratum.  You want approximately equal numbers
assigned to D and P.  You don't want, for example, 70 people assigned
to D and 30 people assigned to P.  Such imbalances can affect the power of
the study.  If the imbalance occurs on a risk factor for the study's outcome,
the results may appear to favor one drug over the other even though there is
no real difference between the drugs.

Randomization to two groups is basically just like flipping a coin.
Unless you take some precautions, you might end up with a bad
imbalance between the groups.  This weakens the power of the study
and makes it more likely that the groups are also imbalanced on other
factors.
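
To get a feeling for how bad the imbalance can be, here is a small
simulation sketch.  It is only an illustration, not part of any actual
trial software: the seed 7460, the data set name 'coinflip', and the choice
of 1000 simulated strata of 100 people each are all arbitrary assumptions.

==================================================================================

data coinflip ;

     call streaminit(7460) ;          * arbitrary seed, assumed for illustration ;

     do stratum = 1 to 1000 ;

        nd = 0 ;

        * Flip a fair coin for each of 100 people and count the D assignments ;
        do person = 1 to 100 ;
           if rand('uniform') < 0.5 then nd = nd + 1 ;
        end ;

        np = 100 - nd ;
        absdiff = abs(nd - np) ;      * size of the D versus P imbalance ;
        output ;

     end ;

     keep stratum nd np absdiff ;

run ;

proc means data = coinflip n mean min max ;
     var nd absdiff ;
title "Unrestricted randomization of 100 people: D counts and imbalance" ;
run ;

==================================================================================

The MIN and MAX columns for nd show how far from 50 the number assigned to
D can drift when nothing is done to force balance.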

You don't know in advance how many people will ultimately be entered
into the trial from a given stratum.  You would therefore like to
write the randomization schedule so that (1) people have equal
probability of being assigned to either group, and (2) approximate
balance between the groups is guaranteed at any point in the
schedule.

How can you do this?

The most common method is by creating a randomization schedule from
a series of 'permuted blocks' of specified lengths.

A permuted block of size N is based on a random permutation of the
N numbers 1, 2, 3, ..., N.

Say for example N = 4.  You start with the block [1 2 3 4].  You
randomly permute it and get [2 4 3 1].  You assign people to drug D
if the number is even and to placebo P if it is odd.  Thus the permuted
block gives rise to the following sequence of treatment assignments:

                      [2 4 3 1]

                      [D D P P]

If you string together five randomly permuted blocks like this, you
could get the following treatment schedule:

    [2 4 3 1][1 4 2 3][4 3 1 2][3 2 1 4][4 3 2 1]

     D D P P  P D D P  D P P D  P D P D  D P D P

     1 2 1 0 -1 0 1 0  1 0-1 0 -1 0-1 0  1 0 1 0  <--- D - P imbalance

Note that, at ANY POINT in the schedule, the imbalance between D and P
is never greater than two.  Also, you never have runs of the same
treatment assignment of length greater than 4.
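
You can check statements like these by computing the running imbalance
and the run lengths directly.  The following is a minimal sketch which does
this for the 20 assignments in the schedule above; the data set name and
variable names (sched, imbal, runlen, and so on) are made up for this
illustration.

==================================================================================

data imbalance ;

     length sched $ 20 trt prev $ 1 ;

     sched = 'DDPPPDDPDPPDPDPDDPDP' ;   * the five blocks strung together ;

     imbal = 0 ;  maximbal = 0 ;
     runlen = 0 ; maxrun = 0 ;
     prev = ' ' ;

     do pos = 1 to length(sched) ;

        trt = substr(sched, pos, 1) ;

        * Running D minus P difference ;
        if trt = 'D' then imbal = imbal + 1 ;
        else imbal = imbal - 1 ;

        * Length of the current run of identical assignments ;
        if trt = prev then runlen = runlen + 1 ;
        else runlen = 1 ;
        prev = trt ;

        maximbal = max(maximbal, abs(imbal)) ;
        maxrun   = max(maxrun, runlen) ;

        output ;

     end ;

     keep pos trt imbal runlen maximbal maxrun ;

run ;

proc print data = imbalance ;
title "Running D - P imbalance and run lengths" ;
run ;

==================================================================================

For this particular schedule the last observation should show maximbal = 2
and maxrun = 3, both within the limits stated above.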

Also notice that, with all the block sizes being 4, the imbalance
between D and P is zero after every 4th randomization.  This feature
unfortunately makes it possible for patients or clinic coordinators
to know the next treatment assignment in certain cases.  For example,
if you are at the 16th spot in the schedule (the last spot in the 4th block),
and the three previous treatment assignments are  D D P,  then you know with
absolute certainty that the next assignment is also P.

To remedy this problem, randomization schedules are often composed as
mixtures of two or more different sizes of permuted blocks.  This
makes it much harder to tell where the block boundaries are.  For
example, in the MRFIT clinical trial, three block sizes were used:
2, 4, and 6.  Whenever one block was completed, the size K of the next
block was chosen at random, a random permutation of [1 2 ... K] was
generated, and the next K treatment assignments were made.
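
The block-size step can be sketched as follows.  This is not the actual
MRFIT program.  It assumes that the three sizes are chosen with equal
probability (the notes above say only 'at random'), and the schedule
length of 60, the seed, and all of the names are made up for illustration.
The sketch only draws the sequence of block sizes; it does not fill in the
treatment assignments.

==================================================================================

data blocksizes ;

     array sizes(3) _temporary_ (2 4 6) ;   * the three allowed block sizes ;

     call streaminit(5421) ;                * arbitrary seed ;

     total = 0 ;

     * Keep adding blocks until at least 60 assignments are covered ;
     do block = 1 by 1 until (total >= 60) ;

        * rand returns a number strictly between 0 and 1, so the index is 1, 2 or 3 ;
        k = sizes( ceil(3 * rand('uniform')) ) ;
        total = total + k ;
        output ;

     end ;

run ;

proc print data = blocksizes ;
     var block k total ;
title "Randomly chosen block sizes" ;
run ;

==================================================================================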

The MRFIT randomization schedule was stratified by clinical center.
There were 22 clinical centers.  There were two treatment groups.

The MRFIT randomization schedule guaranteed that the treatments were
never out of balance by more than __?__.  It also guaranteed that there
were no runs of treatment assignments greater than __?__.

So the main question is: how do you use pseudo-random number
generators to create random permutations of [1 2 ... N], for any
given N?

ARRAYS IN SAS ...

An ARRAY is a vector or matrix.

SAS permits vector arrays in data steps.  These are not the same
as the data arrays which occur in SAS PROC IML, which is basically
a matrix language.

Data arrays in an ordinary data step in SAS are included as part of
the data in each observation in the data set.  Consider the following
example:

==================================================================================

data fvctimes ;

     * fvctimes.dat is a hypothetical name for the external data file ;
     infile 'fvctimes.dat' ;

     array fvc(6) fvc1-fvc6 ;
     input id age gender fvc1 fvc2 fvc3 fvc4 fvc5 fvc6 ;

run ;

proc print data = fvctimes ;
     var   id age gender fvc1 fvc2 fvc3 fvc4 fvc5 fvc6 ;
title "Example of the use of arrays ..." ;
run ;

==================================================================================

Here is what is being done in this program.  There is an external file
which has data including ID, age, gender, and 6 FVC (forced vital capacity)
measurements at times 1, 2, 3, 4, 5, and 6.

When you write

     array fvc(6) fvc1-fvc6

you are indicating that you can refer to the elements of the array
in two different ways.  In other words,

   fvc(1) is the same thing as fvc1
   fvc(2) is the same thing as fvc2,   etc.
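
The array form is what lets you loop over the measurements.  Here is a
minimal sketch which averages the six values with a DO loop.  The data
set name 'fvcloop', the variable 'meanfvc', and the id value are made up;
the six FVC values are the ones used in the sort-and-carry example later
in these notes.

==================================================================================

data fvcloop ;

     array fvc(6) fvc1-fvc6 ;
     input id fvc1-fvc6 ;

     * Add up the six measurements by stepping through the array index ;
     total = 0 ;
     do i = 1 to 6 ;
        total = total + fvc(i) ;
     end ;

     meanfvc = total / 6 ;

     drop i total ;

datalines ;
1  4.3 2.3 5.0 1.1 0.9 2.3
;
run ;

proc print data = fvcloop ;
     var id fvc1-fvc6 meanfvc ;
title "Looping over an array ..." ;
run ;

==================================================================================

For this one observation meanfvc should come out as 2.65.  Writing the same
sum without the array would mean typing out fvc1 + fvc2 + ... + fvc6.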

Here is one reason that it is handy to sometimes express several numbers
as an array.  Perhaps you want to sort those numbers in ascending order.
The following program sorts fvc(1), fvc(2), fvc(3), ..., fvc(6) in ascending
order:

==================================================================================

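   * After pass i, the first i elements are in ascending order ;
   do i = 2 to 6 ;

      do j = 1 to i - 1 ;

         * Interchange fvc(j) and fvc(i) if they are out of order ;
         if fvc(j) > fvc(i) then do ;

            temp = fvc(j) ;
            fvc(j) = fvc(i) ;
            fvc(i) = temp ;

         end ;

      end ;

   end ;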

==================================================================================

This little code fragment is an example of a "bubble sort" (loosely speaking;
a textbook bubble sort compares only adjacent elements).  It is not a very
efficient way to sort things in ascending order: for N numbers it makes
N(N-1)/2 comparisons.  Note that what is happening inside the "if ... then"
section is that if fvc(j) and fvc(i) are not in ascending order, then they are
interchanged.  You need that extra variable "temp" as a placeholder while you
replace fvc(j) with fvc(i).

There is a useful variant of the bubble sort algorithm called
'sort-and-carry'.  It sorts one array while at the same time applying
the same interchanges to a second array, so that each element of the
second array stays paired with its partner in the first.  The following
is an example of how this works:

--------------------------------------------------------------------

options linesize = 80 ;
footnote "~john-c/5421/bubblesort.sas &sysdate &systime" ;

* Variant of the bubblesort algorithm: Sort-and-Carry ;

data sortanarray ;

     array fvc(6) fvc1-fvc6 ;
     array drugs(6) drugs1-drugs6 ;

     fvc(1) = 4.3 ;  drugs(1) = 1 ;
     fvc(2) = 2.3 ;  drugs(2) = 1 ;
     fvc(3) = 5.0 ;  drugs(3) = 1 ;
     fvc(4) = 1.1 ;  drugs(4) = 2 ;
     fvc(5) = 0.9 ;  drugs(5) = 2 ;
     fvc(6) = 2.3 ;  drugs(6) = 2 ;

output ;

run;

proc print data = sortanarray ;
     var fvc1 fvc2 fvc3 fvc4 fvc5 fvc6
         drugs1 drugs2 drugs3 drugs4 drugs5 drugs6 ;
title "Print of the arrays BEFORE sorting:" ;
run ;

data sortanarray ;

* Variant of the bubblesort algorithm: Sort-and-Carry ;

     set sortanarray ;
     array fvc(6) fvc1-fvc6 ;
     array drugs(6) drugs1-drugs6 ;
     do i = 2 to 6 ;
        do j = 1 to i - 1 ;

           if fvc(j) > fvc(i) then do ;

              ftemp = fvc(j) ;
              fvc(j) = fvc(i) ;
              fvc(i) = ftemp ;
              dtemp = drugs(j) ;
              drugs(j) = drugs(i) ;
              drugs(i) = dtemp ;

           end ;

        end ;
     end ;

run ;

proc print data = sortanarray ;
     var fvc1 fvc2 fvc3 fvc4 fvc5 fvc6
         drugs1 drugs2 drugs3 drugs4 drugs5 drugs6 ;
title "Print of the arrays AFTER sorting:" ;
run ;

---------------------------------------------------------------

                    Print of the arrays BEFORE sorting:                    1
                                               16:02 Tuesday, September 13, 2011

  Obs fvc1 fvc2 fvc3 fvc4 fvc5 fvc6 drugs1 drugs2 drugs3 drugs4 drugs5 drugs6

   1   4.3  2.3   5   1.1  0.9  2.3    1      1      1      2      2      2  
 


                     Print of the arrays AFTER sorting:                    2
                                               16:02 Tuesday, September 13, 2011

  Obs fvc1 fvc2 fvc3 fvc4 fvc5 fvc6 drugs1 drugs2 drugs3 drugs4 drugs5 drugs6

   1   0.9  1.1  2.3  2.3  4.3   5     2      2      1      2      1      1  
 
 
                   ~john-c/5421/bubblesort.sas 13SEP11 16:02

=======================================================================
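
Sort-and-carry gives one possible way to produce a random permutation:
fill one array with uniform random numbers, fill a second array with
1, 2, ..., N, and sort the first array while carrying the second along.
The following is only a sketch of that idea for N = 4.  The seed and the
names (u, perm, utemp, ptemp) are made up for illustration, and because
this sort is slow for large N it is not necessarily the efficient method
asked for in the assignment below.

==================================================================================

data permute ;

     array u(4)    u1-u4 ;          * random sort keys ;
     array perm(4) perm1-perm4 ;    * the numbers to be permuted ;

     call streaminit(5421) ;        * arbitrary seed ;

     do i = 1 to 4 ;
        u(i) = rand('uniform') ;
        perm(i) = i ;
     end ;

     * Sort the keys in ascending order and carry perm along ;
     do i = 2 to 4 ;
        do j = 1 to i - 1 ;

           if u(j) > u(i) then do ;

              utemp = u(j) ;
              u(j) = u(i) ;
              u(i) = utemp ;
              ptemp = perm(j) ;
              perm(j) = perm(i) ;
              perm(i) = ptemp ;

           end ;

        end ;
     end ;

     output ;
     keep perm1-perm4 ;

run ;

proc print data = permute ;
title "A random permutation of [1 2 3 4]" ;
run ;

==================================================================================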

PROJECT ASSIGNMENT 4

1.  Find an efficient way to generate random permutations of
    [1 2 ... N].  Write SAS code which does this.


2.  Write a complete SAS or S-PLUS program to generate randomization
    schedules based on permuted blocks of varying sizes.  The
    key parameters to the program are:

    1) The number of treatments

    2) The acceptable block sizes (and the number of such sizes)

    3) The number of strata

    4) The length of the schedule within each stratum (can be assumed
       to be the same for each stratum).


    You can assume that all the treatments will be assigned with equal
    probability.


~john-c/5421/notes.002   Revised Sept 14, 2011.