December 17, 2003 page 1 of 6
SPH 5421 Final Examination Name: ________________________________________
=================================================================================
1. The random variable X has the distribution specified by:
prob(X = n) = 1 / 2^n,
where n = 1, 2, 3, ....
Write a SAS program (not using any SAS procedures) which
(1) Generates 100 independent observations
from this distribution.
(2) Computes the mean of the observations.
(3) Computes the variance of the observations.
[25]
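options linesize = 80 ;
data geom ;
   n = 100 ;                    /* sample size                          */
   sum = 0 ;                    /* running sum of the observations      */
   sumsq = 0 ;                  /* running sum of squared observations  */
   do i = 1 to n ;
      r = ranuni(-1) ;          /* uniform(0,1); seed <= 0 uses clock   */
      /* Inverse-CDF search: m is the smallest integer with            */
      /* r < prob(X <= m) = 1/2 + 1/4 + ... + 1/2**m                   */
      m = 1 ;
      sumprob = 1 / 2 ;
      do while (r >= sumprob and m < 100) ;
         m = m + 1 ;
         sumprob = sumprob + 1 / 2**m ;
      end ;
      sum = sum + m ;
      sumsq = sumsq + m*m ;
      output ;                  /* one observation per generated value  */
   end ;
   mean = sum / n ;
   var = (sumsq - n*mean*mean) / (n - 1) ;
   file print ;                 /* report with PUT, not a procedure     */
   put 'Mean = ' mean 8.4 '   Variance = ' var 8.4 ;
   keep i m ;
run ;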
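As a check on the program above, a short derivation (not part of the original solution) gives the closed-form inverse of the distribution function, which could replace the inner search loop, and the population mean and variance that the sample values should approximate:

```latex
P(X \le m) = \sum_{n=1}^{m} 2^{-n} = 1 - 2^{-m},
\qquad
r < 1 - 2^{-m} \iff m > -\log_2(1 - r)
\;\Longrightarrow\;
m = \lfloor -\log_2(1 - r) \rfloor + 1 .

E(X) = \sum_{n \ge 1} \frac{n}{2^n} = 2, \qquad
E(X^2) = \sum_{n \ge 1} \frac{n^2}{2^n} = 6, \qquad
\operatorname{Var}(X) = 6 - 2^2 = 2 .
```

With n = 100, the standard error of the sample mean is about sqrt(2/100), roughly 0.14, so a sample mean well outside roughly 1.7 to 2.3 would suggest an error in the generator.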
=================================================================================
2. Let X1, X2, ..., Xn be a sample of observations of the random
variable X.
Define the LOWEST TERTILE to be the [n/3] smallest values in the
sample, where [n/3] is the largest integer less than or equal
to n/3.
Define the HIGHEST TERTILE to be the [n/3] largest values.
Define the "1/3 trimmed mean" to be the mean of the sample after
the lowest tertile and highest tertile are thrown out.
Write a SAS macro to compute the 1/3 trimmed mean of a sample.
The call to the macro should look like the following:
%trim3 (dataset, n, x, tmean), where
dataset = a data set that includes the values for x
n = number of observations in the dataset (you can assume
none are missing)
x = the variable of interest
tmean = name of the output variable that will hold the trimmed mean.
[25]
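%macro trim3 (dataset, n, x, tmean) ;
   /* Sort a copy so the caller's data set is left unchanged */
   proc sort data = &dataset out = xsort ; by &x ;
   run ;
   /* Number each observation by its rank in the sorted order */
   data xsort ;
      set xsort ;
      xobs = _n_ ;
   run ;
   /* Drop the [n/3] smallest and the [n/3] largest values, then */
   /* store the mean of the remaining values in variable &tmean   */
   proc means data = xsort n mean std ;
      where xobs gt int(&n / 3) and xobs le &n - int(&n / 3) ;
      var &x ;
      output out = xmean mean = &tmean ;
   run ;
%mend trim3 ;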
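In terms of the order statistics x(1) <= x(2) <= ... <= x(n), the definition above amounts to the following formula; the small example (an illustration, not from the original) shows why the upper cut-off on the rank has to be n - k rather than [2n/3]:

```latex
k = \lfloor n/3 \rfloor, \qquad
\bar{x}_{1/3} = \frac{1}{n - 2k} \sum_{i = k+1}^{n-k} x_{(i)} .

% Example: n = 10 gives k = 3, so ranks 4, 5, 6, 7 are kept (n - 2k = 4 values).
% A cut-off of \lfloor 2n/3 \rfloor = \lfloor 20/3 \rfloor = 6 would keep only ranks 4 to 6.
```

The two cut-offs agree only when n is divisible by 3.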
=================================================================================
3. A datafile has the following structure:
   OBS    ID    X
  -----  ----  ---
     1     1    16
     2     1    15
     3     1    18
     4     2     4
     5     2     7
     6     2     2
     7     3    X7
     8     3    X8
     9     3    X9
    10     4    X10
    11     4    X11
    12     4    X12
    13     5    X13
    14     5    X14
    15     5    X15
   etc.
That is, there are 3 consecutive observations for each ID. Below, Xk
denotes the value of X on observation k (so X1 = 16, X2 = 15, ..., X6 = 2).
Write a SAS program which reads in this datafile and writes
out another datafile which has the following structure:
   OBS    ID    R     S     T
  -----  ----  ---   ---   ---
     1     1    X1    X2    X3
     2     2    X4    X5    X6
     3     3    X7    X8    X9
   etc.
[25]
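data xobs ;
   infile 'input.dat' ;        /* hypothetical name of the input file    */
   retain casecount 0 r s ;    /* carry values across input records      */
   input ID x ;
   casecount = casecount + 1 ;
   if casecount = 1 then r = x ;
   if casecount = 2 then s = x ;
   if casecount = 3 then do ;
      t = x ;
      file 'output.dat' ;      /* hypothetical name of the output file   */
      put ID r s t ;           /* write one record per ID to the file    */
      output ;                 /* and one observation to data set XOBS   */
      casecount = 0 ;
   end ;
   keep ID r s t ;
run ;
proc print ;
endsas ;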
=================================================================================
4. A program produces maximum likelihood estimates s and t of
two parameters S and T. It also produces a covariance matrix
A for s and t:
        |  .02   -.01 |
    A = |             |
        | -.01    .03 |
Find an estimated standard error of r = s^2 + 3 * s * t.
[25]
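var(r) = (approx) (dr/ds)^2 * var(s) + 2*(dr/ds)*(dr/dt)*cov(s, t)
                  + (dr/dt)^2 * var(t)          (delta method),
with dr/ds = 2*s + 3*t and dr/dt = 3*s, so
var(r) = (2*s + 3*t)^2*(.02) + 2*(2*s + 3*t)*(3*s)*(-.01) + (3*s)^2*(.03)
       = .23*s^2 + .06*s*t + .18*t^2,
se(r)  = sqrt(.23*s^2 + .06*s*t + .18*t^2), evaluated at the estimates s and t.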
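The same delta-method calculation can be written compactly in matrix form, with the gradient of r evaluated at the estimates (s, t); expanding the quadratic form gives a single polynomial to evaluate:

```latex
\nabla r = \begin{pmatrix} \partial r / \partial s \\ \partial r / \partial t \end{pmatrix}
         = \begin{pmatrix} 2s + 3t \\ 3s \end{pmatrix},
\qquad
A = \begin{pmatrix} .02 & -.01 \\ -.01 & .03 \end{pmatrix},

\widehat{\operatorname{Var}}(r) \approx \nabla r^{T} A \, \nabla r
  = .02\,(2s + 3t)^2 - .06\,s\,(2s + 3t) + .27\,s^2
  = .23\,s^2 + .06\,st + .18\,t^2,

\widehat{\operatorname{se}}(r) = \sqrt{.23\,s^2 + .06\,st + .18\,t^2} .
```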
=================================================================================
5. Levels of cortisol in a person's blood tend to vary according to
the time of day that the blood is drawn. Here is a graph of
cortisol levels for individuals, plotted against time of day
on a 24-hour clock:
[Graph: cortisol level (vertical axis, .05 to .20) plotted against time
of day t (horizontal axis, 0 to 24 hours).]
A reasonable model for the expected cortisol level might be:
E(C(t)) = a + b * cos(c + d*pi*t),
where time t is in hours.
a) Describe what the parameters are in terms of the graph.
b) Specify further assumptions which are needed to justify using
a least-squares procedure to obtain estimates of the
parameters a, b, c, and d.
c) What would good initial guesses be for parameters a, b,
c, and d?
d) Write a PROC NLIN program which produces least-squares
estimates of the parameters.
[25]
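a): a = overall mean level (the midline about which the curve oscillates)
    b = amplitude (half the distance between the highest and lowest levels)
    c = phase offset (determines the time of day at which the peak occurs)
    d = frequency: d*pi is the angular frequency in radians per hour, so
        one full cycle takes 2/d hours
b): C(t) = a + b * cos(c + d*pi*t) + e, where the errors e are independent
    with mean 0 and constant variance sig^2; if also e ~ N(0, sig^2), the
    least-squares estimates are maximum likelihood estimates and the usual
    approximate tests and confidence intervals apply.
c): a = .13, b = .08, c = 3.8, d = .08 (about 1/12, so that one cycle,
    2/d hours, lasts about 24 hours) ;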
d):
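/* Uses the most recent data set: y = cortisol level, t = time (hours) */
proc nlin method = marquardt ;
   parms a = .13
         b = .08
         c = 3.8
         d = .08 ;
   pi = constant('pi') ;          /* pi is not a predefined name in SAS */
   /* Partial derivatives of the model with respect to each parameter */
   der.a = 1 ;
   der.b = cos(c + d * pi * t) ;
   der.c = -b * sin(c + d * pi * t) ;
   der.d = -b * pi * t * sin(c + d * pi * t) ;
   f = a + b * cos(c + d * pi * t) ;
   model y = f ;
run ;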
endsas ;
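The initial guesses and the DER. statements can be derived as follows, assuming (as the answer to c) does) that the cycle lasts 24 hours and that C_max, C_min, and t_peak are read off the graph:

```latex
a_0 \approx \tfrac{1}{2}(C_{\max} + C_{\min}), \qquad
b_0 \approx \tfrac{1}{2}(C_{\max} - C_{\min}), \qquad
\frac{2}{d_0} = 24 \;\Rightarrow\; d_0 = \tfrac{1}{12} \approx .083,

c_0 + d_0 \pi \, t_{\text{peak}} = 2\pi
\;\Rightarrow\;
c_0 = 2\pi - \frac{\pi \, t_{\text{peak}}}{12} .

\frac{\partial f}{\partial a} = 1, \quad
\frac{\partial f}{\partial b} = \cos(c + d\pi t), \quad
\frac{\partial f}{\partial c} = -b \sin(c + d\pi t), \quad
\frac{\partial f}{\partial d} = -b \pi t \sin(c + d\pi t) .
```

For instance, the guesses a = .13 and b = .08 correspond to levels ranging from about .05 to .21, and c = 3.8 with d = 1/12 places the peak at t = 12(2*pi - 3.8)/pi, about 9.5 hours.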
=================================================================================
6. Short answers:
1) What is an eigenvector?
Given an n x n matrix A, an eigenvector of A is a nonzero n x 1 column
vector v such that A * v = a * v for some scalar a (the eigenvalue).
2) What is an advantage of the simplex method of computing
a minimum of a function?
It is robust (it usually converges), and it does not need expressions for derivatives.
3) What is a disadvantage of the simplex method?
It can be slow to converge, and it does not automatically give an estimate of
the variance of the estimates.
4) Suppose f(x) = 5*x - exp(x). You can find a solution
to f(x) = 0 by the use of Newton's method. The key
equation is
x(n + 1) = x(n) - ??? / ???.
x(n + 1) = x(n) - f(x(n)) / f'(x(n))
         = x(n) - (5*x(n) - exp(x(n))) / (5 - exp(x(n))) ;
5) If X has a Poisson distribution with parameter h = 1,
give the probabilities that:
X = 0 : h^0 * exp(-h) / 0! = 1/e
X = 1 : h^1 * exp(-h) / 1! = 1/e
X = 2 : h^2 * exp(-h)/2! = 1 / (2 * e) ;
[25]
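The short answers 1), 4), and 5) can be checked with a few lines of arithmetic; the 2 x 2 matrix and the starting value x_0 = 0 are illustrative choices, not part of the exam:

```latex
% 1) An eigenvector of an illustrative symmetric matrix
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad
v = \begin{pmatrix} 1 \\ 1 \end{pmatrix}: \quad
A v = \begin{pmatrix} 3 \\ 3 \end{pmatrix} = 3 \, v .

% 4) Newton's method for f(x) = 5x - e^x, starting at x_0 = 0
x_{n+1} = x_n - \frac{5 x_n - e^{x_n}}{5 - e^{x_n}}: \quad
x_1 = 0 - \frac{-1}{4} = 0.25, \quad
x_2 = 0.25 - \frac{1.25 - e^{0.25}}{5 - e^{0.25}} \approx 0.25916, \quad
x_3 \approx 0.25917 .

% 5) Poisson probabilities with h = 1
P(X = 0) = P(X = 1) = e^{-1} \approx 0.3679, \quad
P(X = 2) = \frac{1}{2e} \approx 0.1839, \quad
P(X \le 2) = \frac{5}{2e} \approx 0.9197 .
```

The equation 5x = exp(x) has a second root near 2.5426, which Newton's method reaches from starting values near 3; since f'(x) = 0 at x = ln 5 (about 1.609), starting values close to that point should be avoided.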