School of Public Health

Department of Biostatistics


PubH 7407: Analysis of Categorical Data - Spring 2007


Course Objectives

This course provides a graduate-level introduction to models and methods for analyzing categorical data. The best-known categorical data model is logistic regression. We will explore this model in detail, along with log-linear models, ordinal regression models, discrete choice models, Poisson regression, and random effects models with categorical outcomes. Course syllabus (PDF).


Material covered

  • Tues., Jan. 16: Sections 1.1, 1.2, 1.3. Class notes.
  • Thur., Jan. 18: Sections 1.4, 1.5. Class notes.
  • Tues., Jan. 23: Sections 2.1, 2.2. Class notes.
  • Thur., Jan. 25: Sections 2.3, 2.4. Class notes.
  • Tues., Jan. 30: Sections 3.1, 3.2, 3.3. Class notes.
  • Thur., Feb. 1: Sections 3.4, 3.5. Class notes. Some sample code.
  • Tues., Feb. 6: Sections 4.1, 4.2, 4.3. Class notes.
  • Thur., Feb. 8: Sections 4.3, 4.4, 4.5, 4.6, 4.7. Class notes.
  • Tues., Feb. 13: Sections 5.1, 5.2, 5.3. Class notes.
  • Thur., Feb. 15: Sections 5.3, 5.4, 5.5. Class notes. Some material on complete and quasicomplete separation.
  • Tues., Feb. 20: Sections 6.1, 6.2, 6.3. Course notes.
  • Thur., Feb. 22: Sections 6.2, 6.3, 6.4. Course notes continued from last time.
  • Tues., Feb. 27: Sections 6.5, 6.6, 7.1. Course notes.
  • Thur., Mar. 1: Sections 7.1, 7.2, 7.3. Notes continued from last time; Continuation-ratio logits.
  • Tues., Mar. 6: Review. Sample problems: most homework problems (including the odd-numbered ones), and Dr. Agresti's sample exam questions at the bottom of this page under "Exams". Exam 1: 1abdeg, 2, 3, 4, 5, 6. Exam 2: 1abcd, 2abdef, 3, 4ab, 5.
  • Thur., Mar. 8: Midterm I.
  • Tues., Mar. 20: Sections 10.1, 10.2. Course notes.
  • Thur., Mar. 22: Sections 11.3, 11.4, examples. Course notes.
  • Tues., Mar. 27: Sections 11.5, 12.1, 12.2. Course notes.
  • Thur., Mar. 29: Section 12.3. Course notes.
  • Tues., Apr. 3: Sections 12.6, 12.3, 12.4. Class notes.
  • Thur., Apr. 5: Section 12.5 and more examples. Class notes.
  • Tues., Apr. 10: Class cancelled.
  • Thur., Apr. 12: Section 8.1. Class notes.
  • Tues., Apr. 17: Sections 8.1, 8.2, 8.4. Course notes. A paper on using log-linear and logistic models to find useful subsets of diagnostic tests for animal disease.
  • Thur., Apr. 19: Sections 9.1, 9.3, animal screening example. Course notes.
  • Tues., Apr. 24: Linear models odds and ends. Nothing to download (notes in class).
  • Thur., Apr. 26: Generalized additive models. Course notes.
  • Tues., May 1: Review for Exam II. Review notes.
  • Thur., May 3: Exam II.

Homework assignments

  • Homework 1, due Thursday Jan. 25, Chapter 1: 1, 2*, 3, 4*, 6*, 7, 8*, 10*, 12ab* (hint: a formula from probability helps), 17ab, 30* (part b is extra credit), 31, 33 (tedious). * = hand in. Luping's solutions.
  • Homework 2, due Thursday Feb. 1, Chapter 2: 1, 2*, 3, 4*, 5, 7, 8*, 9, 10*, 12*, 15, 18abc*, 19, 20*, 21, 29, 30*. * = hand in. There is an extra credit problem in the notes involving local odds ratios. Luping's solutions.
  • Homework 3, due Thursday Feb. 8, Chapter 3: 1, 2*, 3, 4*, 5, 9, 10*, 11, 12* (also obtain estimate and 95% CI for polychoric correlation), 31, 32ab*. Luping's solutions.
  • Homework 4, due Thursday Feb. 15, Chapter 4: 1, 2*, 3, 5, 6abc*, 7, 8*, 11, 12*, 13, 14*, 15, 17, 18*, 19, 21, 22 (this is why it's important to group!), 28 (extra credit), 30*, 32*. Luping's solutions.
  • Homework 5, due Thursday Feb. 22, Chapter 5: 1abc, 2* (also report H-L test), 4* (also report H-L test), 6*, 8*, 12* (check for interaction as well -- here, the interaction model is also the saturated model), 15, 16a*, 17* (check for interaction & report H-L too), 19, 22*, 26*, 28*. Luping's solutions. Good problems, but beyond our scope: 33, 34, 37, 42. Sample code for space shuttle data.
  • Homework 6, due Thursday Mar. 1. Vasoconstriction data; heart data. Luping's solutions.
  • Homework 7, due Tuesday, March 20. Chapter 6: 13*, 14a*. Chapter 7: 1abc, 2*, 3, 4*, 7, 9*, 10*, 22*, 29. Juanran's solutions with code and output.
  • Homework 8, due Thursday, March 29. Chapter 10: 1. Chapter 11: 2* (hint: you will define a CLASS variable called "substance" with three levels), 3, 6*, 7b, 8*, 9, 10*. Statistics students: read through Agresti's solutions to 23, 25, and 27, and think about them. Substance use data set; variables are subject, use (1/0 = yes/no), type (1, 2, 3 = alcohol, cigarettes, marijuana), race (1/0 = white/other), and gender (1/0 = female/male). Sample SAS code to get the data for problem 10 in a good form for GENMOD. Juanran's solutions.
  • Homework 9, due April 12. Paper and data for multi-site clinical trial. Sample code for fitting mice toxicity continuation ratio model. Sample code for drug/sequence data. Juanran's solutions.
  • Homework 10, due April 26. Abortion opinion data and muscle tension data. Juanran's solutions.