PubH 7460 Biostatistical Computing - Fall 2009


  1. Homework #1 - due September 17, 2009

    1. Write a macro to sort an array in SAS. Show how it works by sorting the following array elements:

    18 -12 . 41 2 2 2 95 -95 . . -14 21

    2. Suppose X and Y are two independent random variables each having the same distribution. Let Z = max(X, Y). Perform simulations of size N = 1000 to describe (using PROC UNIVARIATE) the distribution of Z, if:

    (a) X and Y are both uniform on [0, 1]

    (b) X and Y are both N(0, 1)

    In case (b), how can you test whether Z has a normal distribution?
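
    The simulation in problem 2 can be sketched as follows.  This is only an
    illustration in Python, not the SAS solution the assignment asks for; the
    function name simulate_max and the seed are hypothetical choices.  As a
    sanity check, the exact mean of Z is 2/3 in case (a) and 1/sqrt(pi),
    about 0.564, in case (b).

```python
import random
import statistics

def simulate_max(draw, n=1000, seed=7460):
    """Return n simulated values of Z = max(X, Y), with X and Y drawn independently."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is reproducible
    return [max(draw(rng), draw(rng)) for _ in range(n)]

# Case (a): X, Y uniform on [0, 1].  Case (b): X, Y standard normal.
z_unif = simulate_max(lambda rng: rng.random())
z_norm = simulate_max(lambda rng: rng.gauss(0.0, 1.0))

for label, z in (("uniform", z_unif), ("normal", z_norm)):
    print(label, round(statistics.mean(z), 3), round(statistics.stdev(z), 3))
```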

    3. Project 3 - Problem 3, notes.001.

  2. Homework #2 - due September 24, 2009

    Problem 1, Project 4, notes.002

    Note: Here "efficient" means that not too many comparisons of array elements are required.

    Problem 2, Project 4, notes.002

    Problem 5, part 3., notes.004

  3. Homework #3 - due October 6, 2009

    1.  Assume the following 2 x 2 table:
    
    
                 A       B
             -----------------
             |       |       |
          1  |   a   |   b   |  11
             |       |       |
             -----------------
             |       |       |
          2  |   c   |   d   |   9
             |       |       |
             -----------------
                  8      12     20
    
    
      The margins are fixed as shown.  The counts in the cells are
    variable.
    
      Let 'a' denote the count of observations in the upper left cell
    (the [1, A] cell).  Assume 'a' has a hypergeometric distribution.
    
      a) Display the true distribution of 'a' as a histogram.
    
      b) Simulate 1000 observations of the variable 'a', assuming
         as above that 'a' has the hypergeometric distribution.
    
         Display the results again as a histogram.
    
      c) Compare the two histograms.
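
      For orientation, the exact distribution of 'a' and a simulated sample
    can be sketched as below.  This is a minimal Python illustration, not
    the required SAS/R solution; the constant names, the seed, and the text
    histogram are hypothetical choices.  With the margins shown, 'a' can
    take the values 0 through 8 and has mean 11 * 8 / 20 = 4.4.

```python
import random
from math import comb

ROW1, COL_A, TOTAL = 11, 8, 20  # fixed margins taken from the table

def hypergeom_pmf(k):
    """P(a = k): k of the 8 column-A observations fall in row 1."""
    return comb(COL_A, k) * comb(TOTAL - COL_A, ROW1 - k) / comb(TOTAL, ROW1)

support = range(max(0, ROW1 + COL_A - TOTAL), min(ROW1, COL_A) + 1)
pmf = {k: hypergeom_pmf(k) for k in support}

# Simulate by drawing the 11 row-1 observations without replacement
# from an urn holding 8 column-A items (1) and 12 column-B items (0).
rng = random.Random(2009)  # arbitrary seed
urn = [1] * COL_A + [0] * (TOTAL - COL_A)
draws = [sum(rng.sample(urn, ROW1)) for _ in range(1000)]

# Crude text histograms: exact probability next to simulated frequency.
for k, p in pmf.items():
    freq = draws.count(k) / len(draws)
    print(f"{k:2d}  exact {p:.4f} {'#' * round(50 * p):<20s}  simulated {freq:.4f}")
```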
    
    2.  Assume you randomize 200 people, 100 each to drug A and
        drug B.  The outcome is classified as either Success or
        Failure.  Assume that under the alternative hypothesis, the
        success rate with drug A is 70%, while the success rate with
        drug B is 55%.  Assume you are going to carry out a statistical
        test at the end of the study with a significance level of 0.05.
    
        Carry out a simulation study to estimate the statistical power
        for three different tests for a 2 x 2 table: the chi-square
        test, the continuity adjusted chi-square test, and Fisher's
        exact test.  Include a scatterplot of the p-values of the
        chi-square test versus Fisher's exact test.  The simulation
        study should be based on at least 500 simulated clinical trials.
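
        One possible shape for this simulation is sketched below in Python,
        using only the standard library.  It is an illustration under
        stated assumptions, not the expected SAS/R solution: the function
        names and the seed are hypothetical, the two-sided Fisher p-value
        is taken as the sum of the probabilities of all tables no more
        likely than the observed one, and plotting is left out (the pairs
        of p-values are simply returned).

```python
import random
from math import comb, erfc, sqrt

def chi2_pvalues(s1, s2, n=100):
    """Pearson and continuity-adjusted chi-square p-values (1 df), n per group."""
    a, b, c, d = s1, n - s1, s2, n - s2
    col1, col2, total = a + c, b + d, 2 * n
    if col1 == 0 or col2 == 0:          # degenerate table: no evidence either way
        return 1.0, 1.0
    denom = n * n * col1 * col2
    diff = abs(a * d - b * c)
    plain = total * diff ** 2 / denom
    adjusted = total * max(0.0, diff - total / 2) ** 2 / denom
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(plain / 2)), erfc(sqrt(adjusted / 2))

def fisher_pvalue(s1, s2, n=100):
    """Two-sided Fisher exact p-value for n per group and s1, s2 successes."""
    m = s1 + s2
    denom = comb(2 * n, m)
    probs = {k: comb(n, k) * comb(n, m - k) / denom
             for k in range(max(0, m - n), min(n, m) + 1)}
    cutoff = probs[s1] * (1 + 1e-7)     # small tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))

def estimate_power(p1=0.70, p2=0.55, n=100, trials=500, alpha=0.05, seed=7460):
    rng = random.Random(seed)           # arbitrary seed for reproducibility
    rejected = {"chi-square": 0, "adjusted": 0, "fisher": 0}
    pairs = []                          # (chi-square p, Fisher p) for a scatterplot
    for _ in range(trials):
        s1 = sum(rng.random() < p1 for _ in range(n))
        s2 = sum(rng.random() < p2 for _ in range(n))
        p_chi, p_adj = chi2_pvalues(s1, s2, n)
        p_fis = fisher_pvalue(s1, s2, n)
        pairs.append((p_chi, p_fis))
        rejected["chi-square"] += p_chi < alpha
        rejected["adjusted"] += p_adj < alpha
        rejected["fisher"] += p_fis < alpha
    return {name: count / trials for name, count in rejected.items()}, pairs

power, pvalue_pairs = estimate_power()
print(power)
```

        The adjusted statistic is never larger than the unadjusted one, so
        the continuity-adjusted test cannot reject more often than the
        plain chi-square test in the same set of simulated trials.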
    
    3.  Problem 7 parts 1. and 2., notes.005.
    
    
  4. Homework #4 - due Thursday, October 15, 2009

      4.1  Problem 10, Parts 1 & 2, notes.008.  Note that the file 'lhs.listing' is on
           the course website, right after notes.008.
    
      4.2  Problem 11, Parts 1, 2, 3, notes.010.  Note that the datafile 'lhs.data' is on
           the course website, right after notes.010.
    
      4.3  Write a program in SAS or R to perform simple linear regression,
           without using procedures.  The program should compute least-squares
           estimates of beta0 and beta1.  It should compute the model, error,
           and corrected total sums of squares, the F-statistic and corresponding
           p-value, the estimate of s^2, R-square, and the standard errors of
           the estimates of beta0 and beta1.  You should generate a sample data
           set of 100 observations to illustrate how the program works.  You
           should check that, for all of these quantities, your program gives
           the same answers as PROC REG or the corresponding R routine.
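
           The quantities listed in 4.3 follow from a handful of sums, as
           the sketch below illustrates.  It is written in Python rather
           than SAS or R purely as an illustration; the function name, the
           seed, and the simulated model y = 2 + 0.5 x + noise are
           hypothetical choices.  The p-value is omitted because the Python
           standard library has no F distribution function; in SAS or R one
           is available.

```python
import random
from math import sqrt

def simple_linear_regression(x, y):
    """Least-squares fit of y = beta0 + beta1 * x, computed from basic sums."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_total = sum((yi - ybar) ** 2 for yi in y)   # corrected total SS
    ss_model = b1 * sxy                            # model SS (1 df)
    ss_error = ss_total - ss_model                 # error SS (n - 2 df)
    s2 = ss_error / (n - 2)                        # estimate of sigma^2
    return {
        "b0": b0, "b1": b1,
        "ss_model": ss_model, "ss_error": ss_error, "ss_total": ss_total,
        "s2": s2,
        "F": ss_model / s2,                        # F with (1, n - 2) df
        "R2": ss_model / ss_total,
        "se_b0": sqrt(s2 * (1 / n + xbar ** 2 / sxx)),
        "se_b1": sqrt(s2 / sxx),
    }

# Sample data set of 100 observations (simulated; the model is an assumption).
rng = random.Random(7460)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 0.5 * xi + rng.gauss(0, 1) for xi in xs]
fit = simple_linear_regression(xs, ys)
for name, value in fit.items():
    print(f"{name:9s} {value:10.4f}")
```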
    
  5. Homework #5 - due Thursday, October 22, 2009

    
    1.  Problem 12.A, notes.011
    
    2.  Problem 13, notes.012
    
    3.  Find the matrix of the linear transformation T: R^2 ---> R^2
        which is reflection through the line y = 2*x.
    
    4.  Find the matrix of the linear transformation S(T), where T
        is the linear transformation in the preceding problem (problem 3)
        and S is the linear transformation of counterclockwise rotation
        by 30 degrees.  Is S(T) the same thing as T(S)?
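
        A numerical check for problems 3 and 4 can be sketched as below.
        The helper names are hypothetical, and the sketch assumes that
        S(T) means "apply T first, then S", so that its matrix is the
        product S * T.  The reflection formula used is the standard one
        for a line y = m*x through the origin.

```python
from math import cos, sin, radians

def reflection(m):
    """Matrix of reflection through the line y = m * x."""
    s = 1 + m * m
    return [[(1 - m * m) / s, 2 * m / s],
            [2 * m / s, (m * m - 1) / s]]

def rotation(degrees):
    """Matrix of counterclockwise rotation by the given angle."""
    t = radians(degrees)
    return [[cos(t), -sin(t)],
            [sin(t), cos(t)]]

def matmul(A, B):
    """Product of two 2 x 2 matrices (apply B first, then A)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

T = reflection(2)      # reflection through y = 2x
S = rotation(30)       # counterclockwise rotation by 30 degrees
ST = matmul(S, T)      # T first, then S
TS = matmul(T, S)      # S first, then T

for name, M in (("T", T), ("S(T)", ST), ("T(S)", TS)):
    print(name, [[round(v, 4) for v in row] for row in M])
```

        Two quick checks on T: it should leave the vector (1, 2) on the
        line unchanged, and applying it twice should give the identity.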
    
  6. Homework #6 - due Tuesday, November 10, 2009

    
    Problem 14, both parts, notes.016
    
    Problem 15, both parts, notes.017
    
    Problem 16, part 1, notes.017
    
    
  7. Homework #7 - due Thursday, November 19, 2009

    
    Problem 18 part 2, notes.019
    
    Problem 19, notes.020
    
    
  8. Homework #8 - due Tuesday, December 1, 2009
    Write a program to compute sample size for a clinical trial
    with two groups, where the endpoint is time-to-event (i.e.,
    survival).  The sample size computation should be based on the
    description in Biostatistical Methods, by John Lachin,
    pages 409-412 [See class handout].  The test statistic is
    the logrank test.  Constant exponential hazards are assumed.
    You can assume that the sample sizes in the two groups will be 
    equal.  Input parameters should include the following:
    
    ==============================================================================
    
    *  alpha = two-sided signif level
    *  power = 1 - beta
    *
    *  f = Maximal follow-up time
    *  a = Accrual time (assuming uniform accrual)
    *
    *  r1 = proportion having event in group 1 at time = 1
    *  r2 = proportion having event in group 2 at time = 1
    *
    
    ==============================================================================
    
     Output from the program should look like the following:
    
    ==============================================================================
    
    Logrank sample size program: {program name} 27AUG07 17:26
     
    Computation based on Biostatistical Methods, John Lachin (2000)
     
    Two groups with exponential hazard in each group
     
    Two-sided alpha = 0.05
    Power           = 0.85
     
    Maximal follow-up time f = 2.5
    Accrual time             = 1.5  (uniform accrual assumed)
     
    Expected proportion of events in Group 1 in time = 1 : 0.55
    Expected proportion of events in Group 2 in time = 1 : 0.44
     
    Expected number of events in Group 1 :    189
    Expected number of events in Group 2 :    161
     
    Proportion of patients in Group 1: 0.5
    Proportion of patients in Group 2: 0.5
     
    Hazard in Group 1:     0.799
    Hazard in Group 2:     0.580
    Average hazard   :     0.689
     
    Relative hazard (Group 2 relative to Group1) :     0.726
     
    Required total sample size         :    513
    
    ===============================================================================
    
    You can check that your program is giving approximately the
    right values by comparing the results to those you can obtain
    from PROC POWER in SAS version 9.
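
    The core computation can be sketched as follows.  This is a Python
    illustration under stated assumptions, not a transcription of the
    class handout: f is taken to be the total study duration, so the last
    patient accrued is followed for f - a; the hazards are obtained as
    -log(1 - r); and the sample size formula compares log hazards, with
    the variance under the null hypothesis evaluated at the average
    hazard.  The function names are hypothetical.  With the inputs from
    the sample output above, this sketch reproduces the hazards, the
    expected numbers of events, and the total sample size of 513; check
    the formula against the handout before relying on it.

```python
from math import ceil, exp, log, sqrt
from statistics import NormalDist

def event_probability(hazard, f, a):
    """P(event is observed) with exponential hazard, uniform accrual over
    [0, a], and the study ending at time f (assumes a > 0)."""
    return 1 - (exp(-hazard * (f - a)) - exp(-hazard * f)) / (hazard * a)

def logrank_sample_size(alpha, power, f, a, r1, r2):
    """Total sample size for two equal groups (assumes r1 != r2)."""
    h1, h2 = -log(1 - r1), -log(1 - r2)      # hazards from 1-unit event proportions
    h_avg = (h1 + h2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    q = 0.5                                  # proportion of patients in each group
    e1, e2 = event_probability(h1, f, a), event_probability(h2, f, a)
    null_sd = sqrt((1 / q + 1 / q) / event_probability(h_avg, f, a))
    alt_sd = sqrt(1 / (q * e1) + 1 / (q * e2))
    n_total = ceil(((z_alpha * null_sd + z_beta * alt_sd) / log(h1 / h2)) ** 2)
    return {
        "hazard1": h1, "hazard2": h2, "average_hazard": h_avg,
        "relative_hazard": h2 / h1,
        "events1": round(q * n_total * e1),
        "events2": round(q * n_total * e2),
        "n_total": n_total,
    }

result = logrank_sample_size(alpha=0.05, power=0.85, f=2.5, a=1.5, r1=0.55, r2=0.44)
for name, value in result.items():
    print(f"{name:16s} {value:8.3f}" if isinstance(value, float) else f"{name:16s} {value:8d}")
```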
    
    
    Web address: http://www.biostat.umn.edu/~john-c/assign7460.f2009.html

    Most recent update: November 20, 2009.