Assume you have a sample of size n, and that m people in this sample have a specified characteristic. The sample proportion of people who have this characteristic is denoted by pobs = m/n. It has the distribution of a binomial proportion, i.e., the number m has a binomial distribution binom(n, p), where p is the true (and unknown) proportion. The object of this problem is to write an algorithm in SAS, Splus, or R which computes an exact confidence interval (plower, pupper) for the true proportion p, given the sample data (n, m).
This is explained in a 1934 paper in Biometrika by Clopper and Pearson.
Clopper and Pearson show that an exact 95% confidence interval has the following property:
If plower is the lower bound for the exact confidence interval, then if the true proportion were equal to plower, the probability that you would observe a sample proportion of pobs or higher is .025.
If pupper is the upper bound for the exact confidence interval, then if the true proportion were equal to pupper, the probability that you would observe a sample proportion of pobs or lower is .025.
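Written in terms of the count m rather than the proportion pobs, and for a general level alpha (alpha = .05 gives the 95% interval), the two conditions can be stated as follows. Here X denotes a binom(n, p) count; the conventions plower = 0 when m = 0 and pupper = 1 when m = n are the usual ones and are stated as an assumption, since the text above does not cover those cases.

```latex
% Lower limit: upper-tail probability at p_lower equals alpha/2
P(X \ge m \mid p = p_{\mathrm{lower}})
  = \sum_{k=m}^{n} \binom{n}{k}\, p_{\mathrm{lower}}^{\,k}\,(1 - p_{\mathrm{lower}})^{\,n-k}
  = \frac{\alpha}{2}
% equivalently, P(X \le m - 1 \mid p = p_{\mathrm{lower}}) = 1 - \alpha/2

% Upper limit: lower-tail probability at p_upper equals alpha/2
P(X \le m \mid p = p_{\mathrm{upper}})
  = \sum_{k=0}^{m} \binom{n}{k}\, p_{\mathrm{upper}}^{\,k}\,(1 - p_{\mathrm{upper}})^{\,n-k}
  = \frac{\alpha}{2}
```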
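Given a value for plower, you can find the desired probability by using the cumulative binomial distribution function: in SAS, you would use the function probbnml, where probbnml(p, n, m) returns the probability of observing m or fewer successes in n trials with success probability p.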
You can find plower by starting with a guess at what it should be (for example, a good starting value might be pobs/2), and then doing a binary search to improve the guess. The same approach, searching between pobs and 1, gives pupper.
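The search can be sketched as follows. This is a Python illustration only, not the SAS, Splus, or R program the assignment asks for, and the names binom_cdf and exact_ci are hypothetical. It assumes the standard Clopper-Pearson tail conditions (upper tail equal to alpha/2 at plower, lower tail equal to alpha/2 at pupper) and bisects on the interval [0, pobs] for plower, so that the first guess is pobs/2, and on [pobs, 1] for pupper.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ binom(n, p); terms are computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1)
                   - math.lgamma(n - j + 1) + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def exact_ci(n, m, alpha=0.05, iterations=100):
    """Sketch of an exact (1 - alpha) interval for p, found by bisection."""
    half = alpha / 2.0
    p_obs = m / n

    if m == 0:
        lower = 0.0  # assumed convention when no successes are observed
    else:
        lo, hi = 0.0, p_obs  # first midpoint is pobs / 2
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            # P(X >= m | mid) increases with mid
            if 1.0 - binom_cdf(m - 1, n, mid) < half:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2.0

    if m == n:
        upper = 1.0  # assumed convention when every trial is a success
    else:
        lo, hi = p_obs, 1.0
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            # P(X <= m | mid) decreases with mid
            if binom_cdf(m, n, mid) > half:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2.0

    return lower, upper
```

For example, exact_ci(100, 40) returns the pair (plower, pupper) for the first test case with alpha = .05. A fixed number of halvings is used instead of a tolerance test to keep the sketch short; 100 halvings is far more than double precision needs.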
So the assignment is to write a program (or a macro) which has as input: n, m, and alpha, and has as output the lower and upper (1 - alpha) exact confidence interval limits, plower and pupper.
Test your program with the following values for n and m:
n = 100, m = 40
n = 1000, m = 30
n = 10000, m = 5
n = 100000, m = 5
n = 100000, m = 1
For each of these values, compare your results with the confidence interval that can be obtained with the usual normal approximation.
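For the comparison, the normal-approximation interval can be sketched as below. This assumes that "the usual normal approximation" means the interval pobs ± z * sqrt(pobs * (1 - pobs) / n), with z the upper alpha/2 point of the standard normal distribution; the function name normal_approx_ci is hypothetical, and the code is a Python illustration rather than SAS, Splus, or R.

```python
import math
from statistics import NormalDist


def normal_approx_ci(n, m, alpha=0.05):
    """Normal-approximation (1 - alpha) interval for a binomial proportion."""
    p_obs = m / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = .05
    half_width = z * math.sqrt(p_obs * (1.0 - p_obs) / n)
    return p_obs - half_width, p_obs + half_width


# The five test cases listed in the assignment
for n, m in [(100, 40), (1000, 30), (10000, 5), (100000, 5), (100000, 1)]:
    lo, hi = normal_approx_ci(n, m)
    print(f"n = {n}, m = {m}: ({lo:.6g}, {hi:.6g})")
```

The limits are deliberately not truncated to the range [0, 1]: for n = 100000, m = 1 the lower limit comes out negative, which is one of the differences from the exact interval that the comparison is meant to bring out.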
Web address: http://www.biostat.umn.edu/~john-c/assign5421.s2004.html
Most recent update: December 8, 2004