An Approach to Diagnostics for Multiple-Error-Term Linear Models
Jim Hodges
Division of Biostatistics/School of Dentistry
University of Minnesota
Candidate for the Assistant/Associate Professor Position
Tuesday, February 28th
10:00am
Moos 2-620
Minneapolis Campus
Abstract:
In any statistical analysis, an inferential summary is a function mapping the
data to some value of the summary. In linear regression, for example, coefficient
estimates are a function of the outcome vector y and design matrix X. Regression
diagnostics allow essentially complete understanding of such functions for models
with a mean structure linear in its unknowns and with each outcome measure contaminated
by a single independent, normally distributed error. These diagnostics, developed
mostly in the 1970s, allow users to stop taking inferential summaries on faith
and instead to fit linear models with reasonable confidence that the summaries
are not distorted by a few anomalous cases or by inappropriate mathematical
assumptions. The power of these methods has two deep sources: treating linear-model
fits as orthogonal projections, and algebraic results that permit, for example,
rapid computation of the effect of deleting individual cases.
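As a minimal illustration of the two sources just named, restricted to the ordinary single-error-term regression setting (this is the standard textbook identity, not the speaker's multiple-error-term method), the Python sketch below treats a least-squares fit as the orthogonal projection H = X(X'X)^{-1}X' and uses the leverages h_ii to obtain every case-deleted prediction error, e_i/(1 - h_ii), from a single fit. It then checks the result against refitting the model n times. It assumes NumPy; the function names and the simulated data are hypothetical.

```python
import numpy as np

def hat_matrix(X):
    """Orthogonal projection onto the column space of X: H = X (X'X)^{-1} X'."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def deleted_residuals(X, y):
    """y_i - x_i' beta_(i), where beta_(i) is the OLS fit with case i deleted,
    computed from ONE fit via the identity e_i / (1 - h_ii)."""
    H = hat_matrix(X)
    e = y - H @ y          # ordinary residuals: projection onto the orthogonal complement
    h = np.diag(H)         # leverages h_ii
    return e / (1.0 - h)

def deleted_residuals_brute_force(X, y):
    """The same quantity obtained by refitting n times, for comparison."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out[i] = y[i] - X[i] @ beta_i
    return out

# Hypothetical simulated data with one planted anomalous case.
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)
y[3] += 8.0

fast = deleted_residuals(X, y)
slow = deleted_residuals_brute_force(X, y)
print(np.allclose(fast, slow))  # True
```

The single-fit formula needs only the diagonal of H, so all n case deletions cost about as much as one fit; the abstract's point is that comparable tools are largely missing for models with more than one error term.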
Linear models with more than one error term have been used for well over 50
years, but their use grew with computer speed in the 1980s and exploded in the
1990s with the advent of Markov chain Monte Carlo (MCMC). However, people using
these models are in roughly the same unhappy position as regression users before
1970: their fitting methods, the functions that turn data into summaries, are
poorly understood, and anyone with a deadline has little choice but to take the
computer output on faith. Even worse, pitfalls lie not only in the equations
that turn data into summaries but also in the MCMC routines that compute them.
For roughly the past 13 years, I have worked to adapt the geometric and algebraic
insights of regression diagnostics to provide similar tools for hierarchical
models, conditionally autoregressive (CAR) smoothers, and many other models that can
be expressed as multiple-error-term linear models. This talk surveys work by
me, my students, and my collaborators, and identifies unsolved problems, including
implementation in easy-to-use software.
A social tea will be held at 9:30 a.m. in A434 Mayo. All are welcome.
For more details contact 612-624-4655 or see http://www.biostat.umn.edu/seminar_academic.html