Random Permutations and Permutation Tests A. Generating Random Permutations Efficiently ---------------------------------------------- Given an array (x1, x2, x3, ..., xN), it is possible to create a random permutation by (1) writing out a list of n random numbers, (r1, r2, r3, ..., rN), (2) sorting the list in ascending order, and carrying along the x's in parallel. This is not an efficient way to create random permutations, because the process of sorting a list of numbers uses a lot of computer time. Some sorting algorithms are more efficient than others; the bubble sort is rather inefficient, requiring N * (N - 1) comparisons and moves. It is possible to create a random permutation with almost no comparisons, and with the number of moves being of order N. Here is how this is done. Assume you have two rows of boxes. In the first row of boxes there are the N numbers 1, 2, 3, ... N in order. The second row of boxes is empty. Begin by randomly choosing one number from the Row 1 and putting it in the first box in Row 2. Say that number came from the i-th box in Row 1. Then take the number which is in the last position in Row 1. Put it into the i-th (now empty) box in Row 1. Row 1 now has the first N - 1 boxes nonempty and the remaining box is empty. For the next step, choose a box randomly from the first N - 1 nonempty boxes in Row 1. Put the contents into the SECOND position in Row 2. Again move the contents of the (N - 1)th box in Row 1 into the box that was just vacated. Keep doing this. When you have finished, Row 2 is a random permutation of Row 1. This illustrates a general principle: usually it is easier to scramble something than it is to unscramble it. B. Permutation Tests --------------------- Suppose you have conducted a clinical trial. This is a simple two-group design (Drug A vs. Drug B) with a dichotomous outcome: e.g. X = 0 indicates failure, X = 1 indicates success. The following table summarizes the design: A B |-------|-------| | | | X = 0 | a | b | n1 | | | |-------|-------| | | | X = 1 | c | d | n2 | | | |-------|-------| m1 m2 The usual tests for significance are Fisher's Exact test, a chi-square test, or a Yates-corrected chi-square test. There is however another way to produce a test statistic. In the table above, m1 people were randomized to group A and m2 people were randomized to group B. This resulted in the table entries that are shown: a, b, c, and d. Now suppose after the fact you re-randomized the m1 + m2 people in such a way that m1 of them were assigned to group A and m2 were assigned to group B. The resulting 2 x 2 table has entries a', b', c' and d', which in general are not equal to the original entries. One measure of the treatment effect in such a table is the comparison of proportions of people in each group who have X = 1, that is, a comparison of these proportions within each group. In the original table, these proportions are pA = c / m1 and pB = d / m2. The observed difference of these proportions is d = pA - pB. The question you want to answer is: how likely is it that, just by chance - that is, assuming there is no true treatment effect - that you would observe a difference as large as d, or larger? So assume you carry out 10,000 "pseudo-randomizations". That is, you randomly assign m1 of the people to treatment A and m2 to treatment B. For each one of these pseudo-randomizations, you get table entries a', b', c', and d'. The corresponding proportions are pA' = c' / m1 and pB' = d' / m2. The corresponding difference of proportions is d' = pA' - pB'. You do this computation for each of the 10,000 pseudo-randomizations. [Note that you do NOT change the outcomes for the people in the study. You are just randomly re-assigning them to a different pseudo-treatment-group. Effectively you are just putting a different treatment label on then.] Then you count how many times you find that d' is bigger than or equal to d. Say that happens in 320 cases out of the 10,000 re- randomizations. Then let r = 320 / 10,000 = .032. This is an estimate of the probability that, if there is no true treatment effect, you would end up with an estimated effect which was bigger than or equal to the effect d = pA - pB that was actually observed. So by this means you get an estimated p-value, .032. Note that this is a one-sided test. However it is easy to modify this to get a two-sided test. You would need to count how many times out of 10,000 you find that d' <= -d and add that to the number of times that you found d' >= d. Say the resulting total is 606. Then the resulting estimate of the p-value is 0.0606. This is called a permutation test. This is a special case. You can also carry out permutation tests instead of using other test statistics. A great virtue of permutation tests is that they do not depend on knowing underlying distributions. In that sense they may be preferable to using other tests in small-sample size situations where you are uncertain what about what the variance structure of the outcome variable might be. Permutation tests are essentially non- parametric tests which take advantage of high-speed computing. Problem ------- a) Suppose you are conducting a two-group randomized trial with a quantitative outcome. In many such trials, the test statistic would be a two-sample t-test. Write a macro to carry out a permutation test instead of the t-test. The form of the macro should look like the following: %macro permttest(datafile, treatment_group, outcome, nperm, nsides, permpvalue) ; Here datafile is a file of the following form: Obs treatment_group outcome ----- --------------- ------- 1 A x1 2 B x2 3 B x3 4 A x4 . . . . . . . . . 96 B x96 97 B x97 98 A x98 99 A x99 100 A x100 *** nperm indicates the number of permutations. *** nsides = 1 or 2 depending on whether you want a one-sided test or a 2-sided one. *** permpvalue = the estimated p-value based on the nperm pseudo-randomizations. b) Find a real dataset for a two-group clinical trial with a quantitative outcome (e.g, weight change or blood pressure) and apply your permutation test to it. Compare the result with what you would get from PROC NPAR1WAY and PROC TTEST. Web address: http://www.biostat.umn.edu/~john-c/random.permutations Last modified: September 29, 2012.