PubH 7407: Analysis of Categorical Data - Spring 2009
Course Objectives
This course provides a graduate-level introduction to models and methods
for analyzing categorical data. The best-known categorical data model is
logistic regression. We will explore this model in detail, as well as
log-linear models, ordinal regression models, and approaches to handling correlated data with categorical outcomes.
Course syllabus (PDF).
Material covered
Tues., Jan. 20: Sections 1.1, 1.2, 1.3. Class notes.
Thur., Jan. 22: Sections 1.4, 1.5. Class notes.
Tues., Jan. 27: Sections 2.1, 2.2. Class notes.
Thur., Jan. 29: Sections 2.3, 2.4. Class notes.
Tues., Feb. 3: Sections 3.1, 3.2, 3.3. Class notes.
Thur., Feb. 5: Sections 3.4, 3.5. Class notes. Some sample code.
Tues., Feb. 10: Sections 4.1, 4.2, 4.3. Class notes.
Thur., Feb. 12: Sections 4.3, 4.4, 4.5, 4.6, 4.7. Class notes.
Tues., Feb. 17: Chapter 4, continued.
Thur., Feb. 19: Sections 5.1, 5.2, 5.3. Class notes.
Tues., Feb. 24: Sections 5.4, 5.5. Class notes. Some material on complete and quasicomplete separation.
Thur., Feb. 26: Chapter 5, continued.
Tues., Mar. 3: Chapter 5, continued.
Thur., Mar. 5: Sections 6.1, 6.2, 6.3. Course notes.
Tues., Mar. 10: Review. Sample problems: most homework problems (including the odd-numbered ones). Midterm from 2007. Midterm from 2008 with answers. Some review notes.
Thur., Mar. 12: Exam I.
Tues., Mar. 24: Sections 6.3, 6.4, 6.5, 6.6. Course notes. Exam I answer key.
Thur., Mar. 26: Chapter 6, continued.
Tues., Mar. 31: Sections 7.1, 7.2, 7.4.3. Course notes.
Thur., Apr. 2: Chapter 7, continued.
Tues., Apr. 7: Sections 10.1, 10.2. Course notes.
Thur., Apr. 9: Sections 11.3, 11.4. Course notes.
Tues., Apr. 14: Sections 11.5, 12.1, 12.2. Course notes.
Thur., Apr. 16: Sections 12.2, 12.3. Course notes.
Tues., Apr. 21: Sections 12.3, 12.6, 12.4. Course notes. Section 12.5. Course notes.
Thur., Apr. 23: SAS's GLIMMIX. Course notes. Sections 10.4.1, 10.5.4, and 4.8. Course notes.
Tues., Apr. 28: Continued.
Thur., Apr. 30: A bit on sensitivity, specificity, and ROC curves. Notes and plots in Word.
Tues., May 5: Review Chapters 6, 7, 10, 11, 12. Review notes. 2007 Exam II. 2008 Exam II.
Thur., May 7: Exam II. Answer key.
Homework assignments
Homework 1, due Thursday Feb. 5, Chapter 1: 1, 2*, 3, 4*, 6*, 7, 8*, 10*, 12ab* (hint: a formula from probability helps), 17ab, 30* (part b is extra credit), 31. Chapter 2: 1, 2*, 3, 4*, 5, 7, 8*, 9, 10*. * = hand in.
Homework 2, due Thursday Feb. 12, Chapter 2: 12*, 15, 18abc*, 19, 20*, 21, 29, 30*. Chapter 3: 1, 2*, 3, 4*, 5, 9. * = hand in. There is an extra credit problem in the notes involving local odds ratios.
Homework 3, due Thursday Feb. 19, Chapter 3: 10*, 11, 12* (also obtain an estimate and a 95% CI for the polychoric correlation), 31, 32ab*. Chapter 4: 1, 2*, 3.
Homework 4, due Thursday Feb. 26, Chapter 4: 5, 6abc*, 7, 8*, 11, 12*, 13, 14*, 15, 17, 18*, 19, 21, 22 (this is why it's important to group!), 28 (extra credit), 30*, 32*.
Homework 5, due Friday Mar. 6 by noon, Chapter 5: 1abcefgh, 2* (also report the Hosmer-Lemeshow (H-L) test), 4* (also report the H-L test for the linear and quadratic models), 6*, 8*, 12* (check for interaction as well; here, the interaction model is also the saturated model), 15, 16a*, 17* (check for interaction and report the H-L test too), 19, 22*, 26*, 28*. Good problems, but beyond our scope: 33, 34, 37, 42. Sample code for space shuttle data.
Homework 6, due Thursday Apr. 2. Vasoconstriction data; heart data. For the problems involving data from the Los Angeles Heart Study and problem 5.26, consider only logistic regression, i.e., not probit or cloglog.
Homework 7, due Thursday, April 9. Chapter 7: 1abc, 2*, 3, 4*, 7, 9*, 10*, 22*, 29.
Homework 8, due Thursday, April 16. Chapter 10: 1*, 4*. For 4, perform a conditional analysis in PROC LOGISTIC using the STRATA and EXACT statements and interpret the results. Do not do parts (a) through (e). Here's the data. Chapter 11: 2* (hint: you will define a CLASS variable called "substance" with three levels), 3, 6, 7b, 8*, 9, 10*. Substance use data set; variables are subject, use (yes/no=1/0), type (1,2,3=alcohol, cigarettes, marijuana), race (1/0=white/other), and gender (1/0=female/male). Sample SAS code to get the data for problem 10 into a form suitable for GENMOD.
Homework 9, due April 28. Paper and data for multi-site clinical trial.
Teratology data.
Extra credit problem and data. Optional, but due April 28 if you do it.
Homework solutions from two years ago: Solutions 1, Solutions 2, Solutions 3, Solutions 4, Solutions 5.
More solutions: solutions 6; solutions 7 with code and output; solutions 8.