SPH 5421 First Exam  October 26, 2004                                     page 1 of 5

SOLUTION KEY                          Name: _________________________________________
=====================================================================================

1.  Mr. Smith goes to a gambling casino and plays the slot machine.  Every time
    he pulls the lever on the slot machine, he has a probability p = .01 of winning.
    Let N be the number of times Mr. Smith pulls the lever, counting the pull on which
    he finally wins.  Assume that the outcomes of the pulls are independent.

    1.1  What is the probability that N = 3 ?  What is the most probable value of N ?

         Prob (N = 3) = .99 * .99 * .01 = .009801

 { 8}    Most probable: N = 1:  prob(N = 1) = .01.
         (prob(N = k) = .99**(k-1) * .01, which decreases as k increases.)
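
The answers to 1.1 can be cross-checked numerically. The sketch below is in Python rather than SAS and is only an illustration; the helper name geom_pmf is an assumption of the sketch, not part of the exam. It also evaluates the theoretical mean 1/p and standard deviation sqrt(1 - p)/p of N, which give a yardstick for the simulation asked for in 1.2.

```python
# Cross-check for 1.1: geometric probabilities with p = .01.
# geom_pmf is a hypothetical helper name used only in this sketch.

def geom_pmf(k, p):
    """P(first win occurs on pull k) = (1 - p)**(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

p = 0.01
prob_n3 = geom_pmf(3, p)

# The pmf shrinks by a factor (1 - p) with each extra pull, so the mode is k = 1.
most_probable = max(range(1, 1001), key=lambda k: geom_pmf(k, p))

# Theoretical mean and standard deviation of N.
mean_n = 1 / p
sd_n = (1 - p) ** 0.5 / p

print(round(prob_n3, 6), most_probable, mean_n, round(sd_n, 2))
```

The search range 1..1000 is arbitrary; because the probabilities decrease with k, any range starting at 1 gives the same mode.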


    1.2   Write a SAS program which produces 1000 simulated random values of N
          and also computes an estimated mean and standard deviation of N.

          data simslot ;

 {12}          m = 1000 ;
               p = .01 ;
               nsum = 0 ;
               nsum2 = 0 ;

               do i = 1 to m ;

                  problose = 1 - p ;
                  lose = 1 ;
                  n = 0 ;

                  do while (lose eq 1) ;
                     n = n + 1 ;
                     r = ranuni(-1) ;

                     if r lt p then do ;

                        lose = 0 ;
                        problose = problose * p ;

                     end ;

                     if r ge p then do ;
                        problose = problose * (1 - p) ;
                     end ;

                  end ;

                  nsum = nsum + n ;
                  nsum2 = nsum2 + n*n ;

                  output ;

               end ;

               nave = nsum / m ;
               nvar = (nsum2 - nsum*nsum/m)/(m - 1) ;
               nsdev = sqrt(nvar) ;
               output ;

          run ;

          proc print data = simslot ;
               where i ge 1000 ;
               var i n nave nvar nsdev ;
          run ;

          proc means n mean var stddev data = simslot ;
               where i le 1000 ;   /* skip the extra summary record (i = 1001) */
               var n ;
          run ;
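
For comparison, the same simulation can be sketched in Python. This is an illustration under stated assumptions, not the exam's required SAS answer: the function name simulate_pulls and the seed 5421 are arbitrary choices made here. With p = .01 the estimated mean should land near 100 and the estimated standard deviation near 99.5, give or take sampling error.

```python
import random
import statistics

def simulate_pulls(p, rng):
    """Count pulls up to and including the first win."""
    n = 0
    while True:
        n += 1
        if rng.random() < p:   # a win occurs with probability p
            return n

rng = random.Random(5421)   # fixed seed (arbitrary) so the run is repeatable
m = 1000
p = 0.01
values = [simulate_pulls(p, rng) for _ in range(m)]

nave = statistics.mean(values)
nsdev = statistics.stdev(values)   # sample sd with divisor m - 1, as in the SAS code
print(nave, nsdev)
```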


2.  Values of weight, blood pressure, and cholesterol are on three different
    data files for a set of 10 people.  Here are the three data files:

    Data File 1                     Data File 2                Data File 3
    ----------------------------    ---------------------      --------------------------
    ID     Date     Bldpress        ID     Date    Weight      ID     Date       Chol
    ----- ------  --------------    ----- ------   ------      ----- ------   -----------
    0001  040228       88           0004  030401     144       0004  030824       225
    0002  040301        .           0010  030531     207       0012  030829       256
    0003  040327      102           0001  030605     208       0007  030829       169
    0004  040415       92           0006  040714     141       0006  031011       196
    0005  040704       66           0005  040722     130       0005  031017       289
    0006  040901       68           0009  040819      95       0011  031105       144
    0007  041012       70           0003  040820       .       0003  031111         .
    0008  041126      104           0007  040909     130       0008  031207       121
    0009  041225       94           0008  041016     125       0009  031217       361
    0010  041231       80           0002  041030     144       0001  031224       324


    2.1  Write a program which produces a file that has blood pressure, weight,
         and cholesterol for a given ID all on the same line.

         data file1 ;
              infile 'file1' ;
              input  id date1 bldpress ;
         run ;

         data file2 ;
              infile 'file2' ;
              input  id date2 weight ;
         run ;

         data file3 ;
              infile 'file3' ;
              input  id date3 chol ;
         run ;

         proc sort data = file1 ; by id ; run ;
         proc sort data = file2 ; by id ; run ;
         proc sort data = file3 ; by id ; run ;

         data allfiles ;
              merge file1 file2 file3 ; by id ;
         run ;


{12}


    2.2  How many lines will the new file include ?  Explain.

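         12.  The merge produces one line per distinct ID.  Files 1 and 2 both
         contain IDs 1-10; file3 lacks IDs 2 and 10 but contains IDs 11 and 12,
         which appear in neither of the other two files.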

 {5}


    2.3  Show the first three lines of the output file.

      obs  ID    date1    date2    date3     bldpress   weight  chol
 {6}  --- ----  -------  -------  -------    --------   ------  ----
       1    1    040228   030605   031224       88        208    324
       2    2    040301   041030     .           .        144      .
       3    3    040327   040820   031111      102          .      .
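
The match-merge in 2.1 through 2.3 can be mimicked in a few lines of Python as a check on the line count and the first rows. This is only a sketch: the rows are copied from the three data files listed in the problem, None stands for a missing value ("."), and merge_by_id is a hypothetical helper, not a SAS feature.

```python
# Rows copied from Data Files 1-3: (id, date, value); None means missing.
file1 = [(1, "040228", 88), (2, "040301", None), (3, "040327", 102),
         (4, "040415", 92), (5, "040704", 66), (6, "040901", 68),
         (7, "041012", 70), (8, "041126", 104), (9, "041225", 94),
         (10, "041231", 80)]
file2 = [(4, "030401", 144), (10, "030531", 207), (1, "030605", 208),
         (6, "040714", 141), (5, "040722", 130), (9, "040819", 95),
         (3, "040820", None), (7, "040909", 130), (8, "041016", 125),
         (2, "041030", 144)]
file3 = [(4, "030824", 225), (12, "030829", 256), (7, "030829", 169),
         (6, "031011", 196), (5, "031017", 289), (11, "031105", 144),
         (3, "031111", None), (8, "031207", 121), (9, "031217", 361),
         (1, "031224", 324)]

def merge_by_id(*files):
    """One row per distinct id: (id, date1..dateK, value1..valueK)."""
    k = len(files)
    merged = {}
    for slot, rows in enumerate(files):
        for pid, date, value in rows:
            rec = merged.setdefault(pid, [None] * (2 * k))
            rec[slot] = date          # date columns come first
            rec[k + slot] = value     # then the measurement columns
    return [(pid, *merged[pid]) for pid in sorted(merged)]

allfiles = merge_by_id(file1, file2, file3)
print(len(allfiles))
for row in allfiles[:3]:
    print(row)
```

Each printed row is ordered as (ID, date1, date2, date3, bldpress, weight, chol), matching the column order of the listing in 2.3.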



3.  Randomization in clinical trials is sometimes done in such a way as to achieve
    PROBABLE balance between the two treatment groups A and B without using
    permuted blocks.

    3.1  After the i-th treatment assignment, let S(i) be the proportion who are
         assigned to treatment group A.  We specify that the probability that the
         (i + 1)-st treatment assignment is to group A is

            pA = .1 * S(i)  +  .9 * (1 - S(i)).

         (Note S(0) is defined to be 0.5.  The probability of assignment to B is 1 - pA.)

         Suppose the first 9 treatment assignments are:  B B A B B B A B B.  What is the
         probability that the 10th person will be assigned to group A ?

         pA = .1 * (2/9) + .9 * (7/9) = .0222 + .70 = .7222.
{5}
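
The arithmetic in 3.1 can be verified with exact fractions. The Python sketch below is illustrative only; prob_next_a and its weight parameter are names assumed here, with weight = 9/10 reproducing the formula pA = .1 * S(i) + .9 * (1 - S(i)).

```python
from fractions import Fraction

def prob_next_a(assignments, weight=Fraction(9, 10)):
    """pA = (1 - weight) * S + weight * (1 - S), with S = 1/2 before any assignment."""
    if assignments:
        s = Fraction(assignments.count("A"), len(assignments))
    else:
        s = Fraction(1, 2)   # S(0) is defined to be 0.5
    return (1 - weight) * s + weight * (1 - s)

p_a = prob_next_a("BBABBBABB")   # the nine assignments given in the problem
print(p_a, float(p_a))
```

Working in fractions shows that the answer is exactly 13/18, which rounds to .7222.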


    3.2  Write a randomization program which produces 1000 randomized treatment 
         assignments with the probabilities of assignment as specified in 3.1.


         data probrand ;

              n = 1000 ;

              i = 1 ;   /* index the first assignment so every record has a value of i */
              m = 0 ;
              r = ranuni(-1) ;
              if  r < .5 then m = 1 ;
{12}          assign = m ;
              s = assign ;
              output ;

              do i = 2 to n ;

                 assign = 0 ;
                 r = ranuni(-1) ;
                 pA = .1 * s + .9 * (1 - s) ;

                 if r < pA then do ;
                    assign = 1 ;
                    m = m + 1 ;
                 end ;

                 s = m / i ;   /* update S(i) after every assignment, A or B */
                 output ;

              end ;

         run ;
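
The randomization rule of 3.1 can also be sketched in Python to see how the schedule behaves. This is a sketch under assumptions, not the required SAS answer: the function name biased_coin_schedule, the weight parameter, and the seed 2004 are choices made for the illustration. The key point is that the proportion S is recomputed after every assignment, whether it went to A or to B.

```python
import random

def biased_coin_schedule(n, weight=0.9, seed=None):
    """Return n assignments ('A' or 'B') using pA = (1 - weight)*S + weight*(1 - S)."""
    rng = random.Random(seed)
    assignments = []
    count_a = 0
    for i in range(n):
        # S is the proportion assigned to A so far; S(0) is defined as 0.5.
        s = count_a / i if i > 0 else 0.5
        p_a = (1 - weight) * s + weight * (1 - s)
        if rng.random() < p_a:
            assignments.append("A")
            count_a += 1
        else:
            assignments.append("B")
    return assignments

schedule = biased_coin_schedule(1000, seed=2004)
print("".join(schedule[:20]), schedule.count("A"))
```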



    3.3  What is an advantage of this kind of randomization schedule over a permuted-
         blocks randomization schedule?  What is a disadvantage ?


         Advantages:    1.  The more out of balance you are, the more likely
                            you are to return to balance.
                        2.  The schedule is not completely predictable at any
                            point.
 {9}
         Disadvantages: 1.  It does not absolutely prevent a severe imbalance
                            between the two groups.
                        2.  It does not prevent long runs of the same assignment.
                        3.  If the formula for pA is known, you can compute exactly
                            the probability that the next assignment will be A.


    3.4  Suppose the formula for pA is changed to

              pA  = .4 * S(i) + .6 * (1 - S(i)).

         What kind of difference will that make in the randomization schedule?
         (Hint: repeat the computation in 3.1 using this modified formula for pA.)

         pA = .4*(2/9) + .6*(7/9) = .08889 + .46667 = .55556.

         In general, pA stays closer to .5 (here .556 rather than .722), so the
         schedule is closer to being completely random at any point.  Long runs
         and large imbalances are both more likely.

 {9}
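
The claim in 3.4 can be illustrated by simulation. The Python sketch below compares the average final imbalance |#A - #B| under the two formulas; the helper names, the schedule length of 200, the 2000 replicates, and the seed are all arbitrary assumptions of the sketch, so the printed numbers are indicative only.

```python
import random

def final_imbalance(n, weight, rng):
    """Run one schedule of n assignments and return |#A - #B| at the end."""
    count_a = 0
    for i in range(n):
        s = count_a / i if i > 0 else 0.5   # S(0) is defined as 0.5
        p_a = (1 - weight) * s + weight * (1 - s)
        if rng.random() < p_a:
            count_a += 1
    return abs(2 * count_a - n)

def mean_imbalance(weight, n=200, reps=2000, seed=5421):
    """Average final imbalance over reps independent schedules."""
    rng = random.Random(seed)
    return sum(final_imbalance(n, weight, rng) for _ in range(reps)) / reps

imbalance_90 = mean_imbalance(0.9)   # pA = .1 * S + .9 * (1 - S)
imbalance_60 = mean_imbalance(0.6)   # pA = .4 * S + .6 * (1 - S)
print(imbalance_90, imbalance_60)
```

The .4/.6 rule pulls less strongly back toward balance, so its average imbalance should come out larger than under the .1/.9 rule.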


4.  In simple linear regression, where the model is

              Y = b0 + b1*X + e,

    where expectation(e) = 0 and variance(e) = sigma^2, an unbiased estimate of the
    value of sigma^2 can be found by computing the sum of squared residuals and dividing 
    by (n - 2), where n is the number of observations.  The residual for a given
    observation is the difference between the observed value of Y corresponding to
    a given value of X, and the predicted value based on the least-squares estimates
    of the slope and intercept.

    Write a program which performs this computation.  You can assume that the
    input data file has values of X and Y for each of n = 100 observations.


        data regress (keep = nobs x y)
             outstats (keep = nobs slope intcpt kobs) ;
             retain xsum 0  ysum 0  xysum 0  x2sum 0  nobs 0 ;

             infile 'xy' eof = stats ;
             input x y ;

             /* accumulate sums over complete (x, y) pairs only */
             if x ne . and y ne . then do ;
                nobs = nobs + 1 ;
                xsum = xsum + x ;
                ysum = ysum + y ;
 {22}           x2sum = x2sum + x * x ;
                xysum = xysum + x * y ;
                output regress ;
             end ;
             return ;

stats:
             /* reached at end of file: least-squares slope and intercept */
             top = xysum - xsum * ysum / nobs ;
             bot = x2sum - xsum * xsum / nobs ;
             slope = top / bot ;
             yave = ysum / nobs ;
             xave = xsum / nobs ;
             intcpt = yave - slope * xave ;
             kobs = nobs ;

             /* one copy of the estimates per observation, keyed by nobs */
             do nobs = 1 to kobs ;
                output outstats ;
             end ;
             stop ;
        run ;

        /* attach the estimates to each (x, y) pair and sum the squared residuals */
        data regress ; merge regress outstats ; by nobs ;
             retain sumres2 0 ;
             predy = intcpt + slope * x ;
             resid = y - predy ;
             sumres2 = sumres2 + resid**2 ;
             s2 = sumres2 / (kobs - 2) ;
        run ;

        proc print data = regress ;
             where nobs eq kobs ;
             var s2 ;
        run ;
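
The same computation can be sketched in Python to check the formulas on a small example. This is an illustration under assumptions: the function name residual_variance and the four (x, y) pairs are invented for the sketch and are not the exam's data file. The sums mirror the SAS variables xsum, ysum, x2sum, and xysum.

```python
def residual_variance(xs, ys):
    """Return (slope, intercept, s2), where s2 = sum of squared residuals / (n - 2)."""
    nobs = len(xs)
    xsum = sum(xs)
    ysum = sum(ys)
    x2sum = sum(x * x for x in xs)
    xysum = sum(x * y for x, y in zip(xs, ys))

    # Least-squares estimates from the corrected sums of products and squares.
    slope = (xysum - xsum * ysum / nobs) / (x2sum - xsum * xsum / nobs)
    intcpt = ysum / nobs - slope * xsum / nobs

    # Second pass: residuals need the final slope and intercept.
    sumres2 = sum((y - (intcpt + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intcpt, sumres2 / (nobs - 2)

# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intcpt, s2 = residual_variance(xs, ys)
print(slope, intcpt, s2)
```

For these four points the slope is 1.9, the intercept is 0, the residuals are .1, .2, -.7, and .4, and s2 = .70 / 2 = .35, up to floating-point rounding. As in the SAS program, two passes are needed because the residuals cannot be formed until the slope and intercept are known.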