PubH 7460 Biostatistical Computing - Fall 2007


  1. Homework #1 - due September 11, 2007.

    1. Write a macro to sort an array in SAS. Show how it works by sorting the following array elements:

    18 -12 . 41 2 2 2 95 -95 . . -14 21

    2. Suppose X and Y are two independent random variables each having the same distribution. Let Z = max(X, Y). Perform simulations of size N = 1000 to describe (using PROC UNIVARIATE) the distribution of Z, if:

    (a) X and Y are both uniform on [0, 1]

    (b) X and Y are both N(0, 1)

    In case (b), how can you test whether Z has a normal distribution?
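
    The simulation in problem 2 can be sketched in a few lines. The
    sketch below is in Python rather than SAS, purely as an illustration
    of the logic; the function name simulate_max and the seed are
    hypothetical choices, and PROC UNIVARIATE would take the place of
    the summary statistics printed at the end. As a sanity check, in
    case (a) P(Z <= z) = z^2, so the mean of Z is 2/3, and in case (b)
    the mean of Z is 1/sqrt(pi).

```python
import random
import statistics

def simulate_max(draw, n=1000, seed=7460):
    """Return n simulated values of Z = max(X, Y) with X, Y i.i.d. from draw(rng)."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so runs are repeatable
    return [max(draw(rng), draw(rng)) for _ in range(n)]

# Case (a): uniform on [0, 1); case (b): standard normal.
z_unif = simulate_max(lambda rng: rng.random())
z_norm = simulate_max(lambda rng: rng.gauss(0.0, 1.0))

for label, z in (("uniform", z_unif), ("normal", z_norm)):
    print(label,
          "mean:", round(statistics.mean(z), 3),
          "sd:", round(statistics.stdev(z), 3),
          "median:", round(statistics.median(z), 3))
```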

    3. Project Assignment 3 in notes.001.

  2. Homework #2 - due September 25, 2007 ***** Note Change in Due Date *****

    Problem 1, Project 4, notes.002 ***** Note addition *****

    Problem 2, Project 4, notes.002

    Problem 1, notes.003

    Problem 5, part 3., notes.004 ***** Note addition *****

  3. Homework #3 - due October 4, 2007

    
    1.  Assume the following 2 x 2 table:
    
    
                 A       B
             -----------------
             |       |       |
          1  |   a   |   b   |  30
             |       |       |
             -----------------
             |       |       |
          2  |   c   |   d   |  20
             |       |       |
             -----------------
                 20      30     50
    
    
      The margins are fixed as shown.  The counts in the cells are
    variable.
    
      Let 'a' denote the count of observations in the upper left cell
    (the [1, A] cell).  Assume 'a' has a hypergeometric distribution,
    as described in class.
    
      a) Display the true distribution of 'a' as a histogram.
    
      b) Simulate 1000 observations of the variable 'a', assuming
         as above that 'a' has the hypergeometric distribution.
    
         Display the results again as a histogram.
    
      c) Compare the two histograms.
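
    As a rough guide to parts a) and b), here is a minimal Python sketch
    (an illustration only, not the required solution). It assumes the
    usual hypergeometric form P(a = k) = C(30,k) C(20,20-k) / C(50,20)
    for the margins shown, and it simulates 'a' by drawing the 20
    column-A observations without replacement from the 50 subjects.
    The constant names, the seed, and the text histogram are all
    hypothetical choices.

```python
import math
import random

ROW1, ROW2, COL_A = 30, 20, 20          # fixed margins from the table
N = ROW1 + ROW2

def pmf(k):
    """P(a = k): k of the 20 column-A observations fall in row 1."""
    return math.comb(ROW1, k) * math.comb(ROW2, COL_A - k) / math.comb(N, COL_A)

# Values of 'a' that are compatible with the fixed margins.
support = range(max(0, COL_A - ROW2), min(ROW1, COL_A) + 1)
true_dist = {k: pmf(k) for k in support}

# Simulation: pick the 20 column-A subjects at random, count those in row 1.
rng = random.Random(7460)
population = [1] * ROW1 + [0] * ROW2     # 1 marks a row-1 observation
draws = [sum(rng.sample(population, COL_A)) for _ in range(1000)]
sim_dist = {k: draws.count(k) / len(draws) for k in support}

# Side-by-side text histograms (one '#' per 1/150 of probability).
for k in support:
    true_bar = "#" * round(150 * true_dist[k])
    sim_bar = "#" * round(150 * sim_dist[k])
    print(f"{k:2d} true {true_dist[k]:.4f} {true_bar:<40s} sim {sim_dist[k]:.4f} {sim_bar}")
```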
    
    2.  Assume you randomize 200 people, 100 each to drug A and
        drug B.  The outcome is classified as either Success or
        Failure.  Assume that under the alternative hypothesis, the
        success rate with drug A is 70%, while the success rate with
        drug B is 55%.  Assume you are going to carry out a statistical
        test at the end of the study with a significance level of 0.05.
    
        Carry out a simulation study to estimate the statistical power
        for three different tests for a 2 x 2 table: the chi-square
        test, the continuity adjusted chi-square test, and Fisher's
        exact test.  Include a scatterplot of the p-values of the
        chi-square test versus Fisher's exact test.  The simulation
        study should be based on at least 500 simulated clinical trials.
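
    The structure of such a power simulation might look like the Python
    sketch below. It is an illustration under stated assumptions, not
    the SAS solution: the p-values are computed by hand (Pearson
    chi-square, the continuity-adjusted version with |ad - bc| reduced
    by n/2, and a two-sided Fisher test that sums all table
    probabilities no larger than the observed one), and a test rejects
    when p < 0.05. All function names and the seed are hypothetical.
    The list pvalue_pairs holds the (chi-square, Fisher) p-values that
    the scatterplot would use.

```python
import math
import random

def chi2_pvalues(a, b, c, d):
    """Return (Pearson, continuity-adjusted) chi-square p-values for a 2 x 2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                       # an empty margin: no evidence either way
        return 1.0, 1.0
    diff = abs(a * d - b * c)
    pearson = n * diff ** 2 / denom
    adjusted = n * max(diff - n / 2, 0.0) ** 2 / denom
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(pearson / 2)), math.erfc(math.sqrt(adjusted / 2))

def fisher_pvalue(a, b, c, d):
    """Two-sided Fisher exact p-value: sum of table probabilities <= the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    total = math.comb(r1 + r2, c1)
    probs = {k: math.comb(r1, k) * math.comb(r2, c1 - k) / total
             for k in range(max(0, c1 - r2), min(r1, c1) + 1)}
    cutoff = probs[a] * (1 + 1e-7)       # small tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))

def simulate_power(n_per_arm=100, p_a=0.70, p_b=0.55, trials=500, alpha=0.05, seed=7460):
    """Estimate power of the three tests; also return (chi-square, Fisher) p-value pairs."""
    rng = random.Random(seed)
    rejections = [0, 0, 0]
    pairs = []
    for _ in range(trials):
        succ_a = sum(rng.random() < p_a for _ in range(n_per_arm))
        succ_b = sum(rng.random() < p_b for _ in range(n_per_arm))
        table = (succ_a, n_per_arm - succ_a, succ_b, n_per_arm - succ_b)
        p_chi, p_adj = chi2_pvalues(*table)
        p_fis = fisher_pvalue(*table)
        pairs.append((p_chi, p_fis))
        for i, p in enumerate((p_chi, p_adj, p_fis)):
            rejections[i] += p < alpha
    return [r / trials for r in rejections], pairs

power, pvalue_pairs = simulate_power()
print(dict(zip(("chi-square", "continuity-adjusted", "Fisher exact"), power)))
```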
    
    3.  Problem 7 parts 1. and 2., notes.005.
    
    
  4. Homework #4 - due Tuesday, October 16, 2007

    Problem 10, Parts 1 & 2, notes.008
    
    Problem 11, Parts 1, 2, 3, notes.010.
    
    
    Write a program in SAS or R to perform simple linear regression,
    without using procedures.  The program should compute least-squares
    estimates of beta0 and beta1.  It should compute the model, error,
    and corrected total sums of squares, the F-statistic and corresponding
    p-value, the estimate of s^2, R-square, and the standard errors of
    the estimates of beta0 and beta1.  You should generate a sample data
    set of 100 observations to illustrate how the program works.  You
    should check that your program gives the same answers for all of
    these quantities as PROC REG or the corresponding R routine.
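
    The quantities listed above follow from a handful of sums. The
    Python sketch below shows one way to organize them; it is only an
    illustration (the assignment asks for SAS or R), and the function
    name, the seed, and the true coefficients used to generate the 100
    observations are hypothetical. The p-value is left out because the
    Python standard library has no F distribution; it is the upper tail
    of F with 1 and n - 2 degrees of freedom.

```python
import math
import random

def simple_linear_regression(x, y):
    """Least-squares fit of y = beta0 + beta1 * x, computed from basic sums."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sst = sum((yi - ybar) ** 2 for yi in y)                      # corrected total SS
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # error SS
    ssm = sst - sse                                              # model SS
    s2 = sse / (n - 2)                                           # estimate of sigma^2
    return {
        "b0": b0, "b1": b1, "ssm": ssm, "sse": sse, "sst": sst,
        "f": ssm / s2, "s2": s2, "r2": ssm / sst,
        "se_b0": math.sqrt(s2 * (1 / n + xbar ** 2 / sxx)),
        "se_b1": math.sqrt(s2 / sxx),
    }

# Sample data: 100 observations from y = 2 + 0.5 x + noise (arbitrary choices).
rng = random.Random(7460)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
fit = simple_linear_regression(x, y)
for name, value in fit.items():
    print(f"{name:6s} {value:10.4f}")
```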
    
  5. Homework #5 - due Tuesday, October 30, 2007

    
    Problem 12.A, notes.011
    
    Problem 13, notes.012
    
    Write a program in SAS to simulate observations from the geometric
    distribution.  A geometric random variable is the number
    of independent Bernoulli trials required before a success
    occurs, where the probability of success on any given trial
    is 'p'.  Your program should have 'p' as a variable parameter.
    Letting p = .03, generate 10000 simulated observations from the
    geometric distribution, and use the results to estimate the 
    mean and standard deviation.
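
    A minimal Python sketch of this simulation follows (the assignment
    itself calls for SAS). It assumes the count includes the trial on
    which the first success occurs, so the theoretical mean is 1/p and
    the standard deviation is sqrt(1 - p)/p; if only the failures
    before the first success are counted, the mean drops by 1 and the
    standard deviation is unchanged. The function name and seed are
    hypothetical.

```python
import random
import statistics

def geometric_draw(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:   # rng.random() < p counts as a success
        trials += 1
    return trials

p = 0.03
rng = random.Random(7460)
sample = [geometric_draw(p, rng) for _ in range(10000)]

print("estimated mean:", round(statistics.mean(sample), 2), " theory:", round(1 / p, 2))
print("estimated sd  :", round(statistics.stdev(sample), 2),
      " theory:", round((1 - p) ** 0.5 / p, 2))
```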
    
    
  6. Homework #6 - due Thursday, November 15, 2007
    Problem 15, part 2, notes.017
    
    Problem 16, part 1, notes.017
    
    Problem 17, notes.018
    
    
  7. Homework #7 - due Tuesday, November 27, 2007
      Write a program to compute sample size for a clinical trial
    with two groups, where the endpoint is time-to-event (i.e.,
    survival).  The sample size computation should be based on the
    description in Biostatistical Methods, by John Lachin,
    pages 409-412 [See class handout].  The test statistic is
    the logrank test.  Constant exponential hazards are assumed.
    You can assume that the sample sizes in the two groups will be 
    equal.  Input parameters should include the following:
    
    ==============================================================================
    
    *  alpha = two-sided significance level
    *  power = statistical power = 1 - beta
    *
    *  f = Maximal follow-up time
    *  a = Accrual time (assuming uniform accrual)
    *
    *  r1 = proportion having event in group 1 at time = 1
    *  r2 = proportion having event in group 2 at time = 1
    *
    
    ==============================================================================
    
     Output from the program should look like the following:
    
    ==============================================================================
    
    Logrank sample size program: {program name} 27AUG07 17:26
     
    Computation based on Biostatistical Methods, John Lachin (2000)
     
    Two groups with exponential hazard in each group
     
    Two-sided alpha = 0.05
    Power           = 0.85
     
    Maximal follow-up time f = 2.5
    Accrual time             = 1.5  (uniform accrual assumed)
     
    Expected proportion of events in Group 1 in time = 1 : 0.55
    Expected proportion of events in Group 2 in time = 1 : 0.44
     
    Expected number of events in Group 1 :    189
    Expected number of events in Group 2 :    161
     
    Proportion of patients in Group 1: 0.5
    Proportion of patients in Group 2: 0.5
     
    Hazard in Group 1:     0.799
    Hazard in Group 2:     0.580
    Average hazard   :     0.689
     
    Relative hazard (Group 2 relative to Group 1) :     0.726
     
    Required total sample size         :    513
    
    ===============================================================================
    
    You can check that your program is giving approximately the
    right values by comparing the results to those you can obtain
    from PROC POWER in SAS version 9.
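
    For orientation, the computation can be sketched as below in
    Python. This is one reading of the exponential-hazards approach and
    should be checked against the class handout, not taken as the
    reference formula. Assumptions: f is the total study length
    measured from the start of accrual (so f >= a), the hazards are
    -log(1 - r), the probability of an observed event under uniform
    accrual is 1 - (exp(-lambda (f - a)) - exp(-lambda f)) / (lambda a),
    and the average hazard is the simple mean of the two hazards. The
    function names are hypothetical. For the inputs in the sample
    output above, this reading gives a total sample size of 513.

```python
import math
from statistics import NormalDist

def event_prob(lam, f, a):
    """P(event observed) for hazard lam, uniform accrual on [0, a], study length f."""
    return 1.0 - (math.exp(-lam * (f - a)) - math.exp(-lam * f)) / (lam * a)

def logrank_sample_size(alpha, power, f, a, r1, r2):
    """Total sample size for two equal groups with exponential hazards (r1 != r2)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    lam1, lam2 = -math.log(1 - r1), -math.log(1 - r2)
    lam_bar = (lam1 + lam2) / 2
    e1, e2, e_bar = (event_prob(lam, f, a) for lam in (lam1, lam2, lam_bar))
    xi = 0.5                                        # proportion of patients per group
    null_term = z_alpha * math.sqrt((1 / e_bar) * (1 / xi + 1 / xi))
    alt_term = z_beta * math.sqrt(1 / (xi * e1) + 1 / (xi * e2))
    n_total = math.ceil(((null_term + alt_term) / math.log(lam1 / lam2)) ** 2)
    return {
        "hazard1": lam1, "hazard2": lam2, "avg_hazard": lam_bar,
        "relative_hazard": lam2 / lam1,
        "events1": round(n_total * xi * e1),
        "events2": round(n_total * xi * e2),
        "total_n": n_total,
    }

result = logrank_sample_size(alpha=0.05, power=0.85, f=2.5, a=1.5, r1=0.55, r2=0.44)
for name, value in result.items():
    print(f"{name:16s} {value:8.3f}")
```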
    
    
  8. Homework #8 - due Thursday, December 6, 2007
      Problem 18 part 2., notes.019
    
      Problem 19, notes.020
    
      Problem 20, notes.021
    
      Problem 21a, notes.021
    
    


    Web address of this page: http://www.biostat.umn.edu/~john-c/assign7460.f2007.html