SEMINAR

Bayesian Spatial Scan Statistic Adjusted for Over Dispersion and Spatial Correlation

Deepak Agarwal
Yahoo! Research

Wednesday, March 7, 2007
3:30pm in Moos 2-690
Minneapolis Campus

Abstract:

The spatial scan statistic has become the method of choice for detecting spatial clustering after adjusting for inhomogeneity. It is particularly suitable in applications where the goal is to find the actual location of spatial clusters, or "hotspots", as opposed to testing for global clustering. The method has been extremely successful and has found applications in areas as diverse as biosurveillance, forestry, criminology, and psychology. It proceeds by scanning the study region using all possible spatial sub-regions that conform to some geometric shape (e.g., circle, rectangle, or ellipsoid). Each sub-region is assigned a discrepancy measure, based on a likelihood ratio test that compares the intensity inside the sub-region to the intensity outside. The sub-region with the maximum discrepancy is generally declared a "hotspot", provided it is statistically significant. The significance test is based on an expensive randomization procedure that computes a Monte Carlo p-value by repeatedly (approximately 10,000 times) generating realizations under the null hypothesis of no spatial clustering.
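The classical scan-and-randomize procedure described above can be sketched in Python with NumPy. This is a minimal illustration only, assuming a rectangular grid of cells with observed counts and positive baseline (expected) counts, axis-aligned rectangles as the scan shape, and Kulldorff's Poisson likelihood ratio as the discrepancy measure. The function names and the planted-hotspot data are hypothetical, and 999 replicates are used instead of the roughly 10,000 mentioned above to keep it fast.

```python
import numpy as np

def rectangles(n_rows, n_cols):
    """All axis-aligned sub-rectangles of the grid, one flattened 0/1 mask per row."""
    masks = []
    for r0 in range(n_rows):
        for r1 in range(r0 + 1, n_rows + 1):
            for c0 in range(n_cols):
                for c1 in range(c0 + 1, n_cols + 1):
                    if (r0, r1, c0, c1) == (0, n_rows, 0, n_cols):
                        continue  # skip the whole study region: it has no "outside"
                    m = np.zeros((n_rows, n_cols))
                    m[r0:r1, c0:c1] = 1.0
                    masks.append(m.ravel())
    return np.array(masks)

def x_log_ratio(x, y):
    """Elementwise x * log(x / y) with the convention 0 * log(0) = 0 (requires y > 0)."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        out = x * np.log(x / y)
    return np.where(x > 0, out, 0.0)

def poisson_llr(c_in, b_in, c_tot, b_tot):
    """Poisson log likelihood ratio: separate rates inside/outside vs one common rate."""
    c_out, b_out = c_tot - c_in, b_tot - b_in
    llr = x_log_ratio(c_in, b_in) + x_log_ratio(c_out, b_out) - x_log_ratio(c_tot, b_tot)
    # Only regions with an elevated inside rate count as candidate hotspots.
    return np.where(c_in / b_in > c_out / b_out, llr, 0.0)

def scan(counts, baseline, masks, n_sims=999, seed=0):
    """Return (index of best region, its LLR, Monte Carlo p-value)."""
    rng = np.random.default_rng(seed)
    counts = counts.ravel().astype(float)
    baseline = baseline.ravel().astype(float)
    c_tot, b_tot = counts.sum(), baseline.sum()
    b_in = masks @ baseline
    llr_obs = poisson_llr(masks @ counts, b_in, c_tot, b_tot)
    best = int(np.argmax(llr_obs))
    # Null of no clustering: given the total, counts are multinomial with
    # cell probabilities proportional to the baseline.
    sims = rng.multinomial(int(c_tot), baseline / b_tot, size=n_sims).astype(float)
    max_sim = poisson_llr(sims @ masks.T, b_in, c_tot, b_tot).max(axis=1)
    p_value = (1 + np.sum(max_sim >= llr_obs[best])) / (n_sims + 1)
    return best, llr_obs[best], p_value

# Hypothetical example: a 6 x 6 grid with uniform baseline and a planted hotspot.
rng = np.random.default_rng(1)
rate = np.ones((6, 6))
rate[1:3, 3:5] = 10.0
counts = rng.poisson(rate)
baseline = np.ones((6, 6))
masks = rectangles(6, 6)
best, llr_best, p_value = scan(counts, baseline, masks)
print(masks[best].reshape(6, 6), llr_best, p_value)
```

Vectorizing over all regions with a mask matrix keeps each replicate to one matrix product, but the cost still grows with the number of replicates, which is the expense the Bayesian approach aims to avoid.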
In this talk, we propose a Bayesian solution to the problem, which has several advantages in this setting. First, hotspot detection is based on the posterior probabilities of the models corresponding to each sub-region, so there is no need for the randomization procedure. This gain in computational efficiency comes at the cost of a slightly more expensive discrepancy calculation for each sub-region, in which a simple closed-form likelihood maximization is replaced by a numerical integration routine. Second, whereas the classical approach generally detects multiple hotspots using a conservative test, detecting multiple hotspots in the Bayesian framework is automatic and does not require any additional machinery. Furthermore, we provide a framework to adjust for additional characteristics such as overdispersion and spatial correlation using a Cox process formulation. Such adjustments are potentially useful in biosurveillance, where the analyst might not be interested in investigating clusters caused only by routine overdispersion relative to the usual Poisson or Bernoulli model. Adjusting for such routine characteristics in the baseline model can potentially reduce false positives and enhance the disease-monitoring systems used in public health. We illustrate our method on datasets that have been previously analyzed in the literature.
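To make the posterior-probability idea concrete, here is a deliberately simplified sketch in Python with NumPy. It is not the speaker's method. It assumes known expected counts, a single relative risk r inside each candidate region with an assumed log-normal prior, a uniform prior over regions, and an assumed prior mass `p_null` on "no hotspot". It also ignores the overdispersion and spatial-correlation (Cox process) adjustments. The marginal likelihood is obtained by a simple grid (Riemann-sum) integration over log r, standing in for the numerical integration mentioned above. All names and parameter values are hypothetical.

```python
import numpy as np

def rectangles(n_rows, n_cols):
    """All axis-aligned proper sub-rectangles of the grid, one flattened 0/1 mask per row."""
    masks = []
    for r0 in range(n_rows):
        for r1 in range(r0 + 1, n_rows + 1):
            for c0 in range(n_cols):
                for c1 in range(c0 + 1, n_cols + 1):
                    if (r0, r1, c0, c1) == (0, n_rows, 0, n_cols):
                        continue
                    m = np.zeros((n_rows, n_cols))
                    m[r0:r1, c0:c1] = 1.0
                    masks.append(m.ravel())
    return np.array(masks)

def log_bayes_factor(c_in, b_in, mu=np.log(2.0), sigma=0.5, n_grid=2001):
    """log BF of H_S (risk r inside S) vs H0 (r = 1), integrating over u = log r.

    Assumed prior: u ~ Normal(mu, sigma^2). Given inside totals (C, B), the
    likelihood ratio is r**C * exp(-(r - 1) * B); factorials and baselines cancel.
    """
    u = np.linspace(mu - 8 * sigma, mu + 8 * sigma, n_grid)
    du = u[1] - u[0]
    log_prior = -0.5 * ((u - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    f = np.outer(c_in, u) - np.outer(b_in, np.exp(u) - 1.0) + log_prior
    m = f.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return m[:, 0] + np.log(np.exp(f - m).sum(axis=1) * du)

def posterior_probs(counts, expected, masks, p_null=0.95):
    """Return P(H0 | data) and P(H_S | data) for every candidate region S."""
    c_in = masks @ counts.ravel().astype(float)
    b_in = masks @ expected.ravel().astype(float)
    log_bf = log_bayes_factor(c_in, b_in)
    k = len(masks)
    log_w = np.concatenate(([np.log(p_null)], np.log((1 - p_null) / k) + log_bf))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w[0], w[1:]

# Hypothetical example: known expected count 1 per cell, planted hotspot.
rng = np.random.default_rng(1)
rate = np.ones((6, 6))
rate[1:3, 3:5] = 10.0
counts = rng.poisson(rate)
expected = np.ones((6, 6))
masks = rectangles(6, 6)
p_h0, post = posterior_probs(counts, expected, masks)
top = int(np.argmax(post))
print(p_h0, post[top], masks[top].reshape(6, 6))
```

No randomization loop is needed: each region's evidence comes from one integral. Since every region receives a posterior probability, one could, for instance, report all regions above an analyst-chosen threshold rather than only the single maximum.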

A social tea will be held at 3:00pm in A434 Mayo. All are welcome.
For more details contact 612-624-4655 or see http://www.biostat.umn.edu/seminar_academic.html