Materials for PubH8492 Fall 2023 "Richly Parameterized Models".

These are the materials for the current offering (Fall 2023, starting in September 2023). The vintage of each item is indicated by "Updated [date]".

Official Syllabus

Here is the syllabus, updated August 2023 and current for the Fall 2023 offering.

Suggestions for class projects

These are now in my book, Richly Parameterized Linear Models: Additive, Time Series, and Spatial Models Using Random Effects, in the exercises at the end of each chapter headed "Open Questions". Many of those open questions are big enough for an entire dissertation, so I suggest that you consider special cases for class projects.

Papers assigned as reading

Bose M, Hodges J, Banerjee S, Biometrics 2018, using a spectral approximation to a Gaussian Process to get a tractable approximate restricted likelihood, to understand how data drive estimates of variance-structure parameters and ultimately as a basis for diagnostics; and the supplement.

Cui, Hodges, Kong, Carlin, Technometrics 2010, degrees of freedom in some generality.

Cui & Hodges, unpublished manuscript, smoothed ANOVA for the general case.

Hanks et al, Environmetrics 2015, which cordially disagrees with me about using restricted spatial regression to minimize the effect of spatial confounding. Here is my equally cordial (I hope) response.

Hodges and Reich, The American Statistician 2010, more on spatial confounding; here is the supplement to the TAS article.

Hodges, JRSSB 1998, a system for developing diagnostics for hierarchical models.

Hodges JS, "Boundary estimates in random regressions: Statistical methods research done as science rather than math", May 2017 version. This manuscript is about using empirical methods to study statistical methods, using as an example the commonly-occurring bad estimates for variance-structure parameters in the random-regressions model. The supplement is here. Among other things, this updated version omits the parts of the original that were, um, especially self-indulgent. This version is now on arXiv.

Hodges and Clayton 2010, an unpublished manuscript "Random Effects Old and New", mostly preserved in Chapter 13 of my book.

Hodges, Cui, Sargent, Carlin, Technometrics 2007, smoothed ANOVA for balanced, single-error-term ANOVAs.

Hughes & Haran, JRSSB 2013, reformulating the improper CAR model to reduce its dimension while also avoiding spatial confounding.

Martinez-Beneito, Hodges, & Marí-Dell'Olmo (2016; these are the page proofs), CRC Handbook of Spatial Epidemiology, eds. Andrew B Lawson, Sudipto Banerjee, Robert Haining, Lola Ugarte. This handbook paper describes smoothed ANOVA for spatial epidemiology as elaborated by Miguel Martinez-Beneito, Marc Marí-Dell'Olmo, and their colleagues. Here are color versions of Figure 2 and Figure 3.

Papadogeorgou et al, Biostatistics 2018, a causal-inference approach to adjusting for unmeasured spatially-structured confounders. Step 1 in a new approach to doing analyses with spatially-referenced data where the goal is causal interpretation of regression coefficients.

Peterson et al, J. Structural Bio 2001, the paper reporting the viral-structure data.

Reich and Hodges, JSPI 2008, laying bare the deep structure of hierarchical models, or at least that's what we thought until the drugs wore off. This is now superseded by Chapter 15 of my book, but since Brian has become a rock star, I thought I'd leave it here for posterity.

Reich et al, JASA 2007, modeling periodontal data with CAR models having two classes of neighbor pairs. Here is the tech-report version, rr2004-004, which includes a lot of interesting stuff that got cut out of the JASA version.

Reich, Hodges, Zadnik, Biometrics 2006, on spatial confounding (the Slovenia paper). Students in previous classes helpfully pointed out these known errors. Feel free to tell me about other errors that you find.

Reich & Hodges, Biometrics 2008, a spatially-adaptive version of the improper CAR model.

Zhang, Hodges, Banerjee, Annals of Applied Stat 2009, smoothed ANOVA with spatial smoothing for one factor, as a competitor to MCAR models.

Transparencies used in lectures

Mixed linear model syntax and conventional analyses, here, updated Aug 2023.

Mixed linear models, Bayesian analysis and the two analyses compared, here, updated Dec 2017.

Two tools: Constraint-case formulation, measures of complexity, here, updated Dec 2017.

Penalized splines, here, updated Dec 2017.

Additive models and models with interactions except discrete-by-discrete interactions, here, updated Dec 2017.

Discrete-by-discrete interactions (smoothed ANOVA), here, updated Dec 2017.

Spatial smoothing 1 (CAR smoothing on a lattice), here, updated Dec 2017.

Spatial smoothing 2 (2D penalized splines), here, updated Dec 2017.

Time series (dynamic linear models, Kalman filter-style models); two alternative syntaxes (Rue & Held; Lee, Nelder, & Pawitan), here, updated Dec 2017.

Preface to the second half of the course: Doing statistical methods research with a scientific (as opposed to mathematical) style, with the random-regressions model as an example, here, updated Dec 2017.

From linear models to richly-parameterized models: Mean structure. Simple extensions of linear-model diagnostics, here, updated Dec 2017.

Collinearity/confounding and smoothing/shrinkage (a teaser, showing puzzles to be explored in later lectures), here, updated Dec 2017.

Collinearity & smoothing: Adding a random effect can zap a fixed effect or another random effect (oddities 1 and 4); brief summaries of results for oddities 2 and 3, here, updated Dec 2017.

Old- vs. new-style random effects: The difference has practical implications. Here are lecture transparencies, updated Jan 2016. This is covered in Chapter 13 of my book, Richly Parameterized Linear Models: [etc.]; there's also a free-standing but slightly out-of-date manuscript co-authored with Murray Clayton, dated 3/19/10.

Beyond extensions of linear models: Variance structure. Five mysterious, inconvenient, or plainly wrong results obtained doing the obvious things with some mixed linear models, here, updated Dec 2013.

Two-variance models: Re-expressing the restricted likelihood (and thus the marginal posterior) in a simple form, here, updated Feb 2016.

Two-variance models: Using the re-expressed restricted likelihood to understand which functions of the data provide information about which variance, here, updated Feb 2016.

Models more general than two-variance models: The tools in the preceding lectures can be extended to some but not all models more complex than two-variance models. For some of the latter, some expedients (various kinds of approximations) have been developed to some extent. (Is that enough qualifications?) This material is in three pieces: proving you can't extend the tools to all mixed linear models and showing the first expedient, updated November 2023; showing the second expedient, posted 4/25/14; and Maitreyee Bose's work on Gaussian Processes, posted 2/24/20.

What little is known about zero variance estimates (as far as I know), here, updated Feb 2014.

Multiple local maxima in the restricted likelihood or posterior, some weird and inconvenient results intended to give you nightmares, here, updated Feb 2014. And while we're on the subject of weird things, here's a fun oldie for the end of the last lecture: This may be the weirdest thing I've seen in statistics, an example where using more data actually gives you a higher standard error, here.

Homework assignments

I hand these out in class; they're here just in case.

#1, updated for 2018.

#2, updated for 2018.

#3, updated for 2018.

#4, updated for 2018.

#5, posted 3/1/12.

Datasets

All the datasets for this course are accessible on the Datasets page for my book, Richly Parameterized Linear Models: Additive, Time Series, and Spatial Models Using Random Effects.