Bayesian Spatial Scan Statistic Adjusted for Over Dispersion and Spatial Correlation
Deepak Agarwal
Yahoo! Research
Wednesday, March 7, 2007
3:30pm in Moos 2-690
Minneapolis Campus
Abstract:
The spatial scan statistic has become the method of choice for detecting spatial
clustering after adjusting for inhomogeneity. The method is particularly suitable
in applications where the goal is to find the actual location of spatial clusters
or "hotspots" as opposed to testing for global clustering. The method
has been extremely successful and has found applications in diverse areas such
as biosurveillance, forestry, criminology, and psychology. The method proceeds
by scanning the study region using all possible spatial sub-regions that conform
to some geometric shape (e.g., circle, rectangle, or ellipse). Each sub-region
is assigned a discrepancy measure, which is based on a likelihood ratio test
that compares the intensity inside the sub-region to the intensity outside.
The sub-region with the maximum discrepancy is generally declared to be a "hotspot"
provided it is statistically significant. The significance test is based on
an expensive randomization procedure that computes a Monte Carlo p-value by
repeatedly (approximately 10,000 times) generating realizations under the null
hypothesis of no spatial clustering.
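
As a rough illustration of the classical procedure described above, the following Python sketch scans circular zones on a set of cell centroids, scores each zone with a Poisson log likelihood ratio, and estimates a Monte Carlo p-value for the best zone. It is a minimal sketch under stated assumptions, not the speaker's implementation: the function names, the circular zone construction, the per-cell `baseline` weights, and the small default of 999 replicates (rather than roughly 10,000) are all illustrative choices.

```python
import math
import random


def poisson_llr(c, e, total):
    """Log likelihood ratio for a zone with c observed and e expected cases,
    where expected counts over the whole region sum to `total`.
    Only excesses are scored; zones with c <= e get 0."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total - c > 0:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def circular_zones(coords, radii):
    """All distinct sets of cell indices that fall inside a circle centred
    on some cell, for each radius. The whole region is excluded."""
    zones = set()
    for cx, cy in coords:
        for r in radii:
            zone = frozenset(
                i for i, (x, y) in enumerate(coords)
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r
            )
            if len(zone) < len(coords):
                zones.add(zone)
    return sorted(zones, key=sorted)  # deterministic order


def max_llr(cases, expected, zones):
    """Return (zone, score) for the zone with the largest discrepancy."""
    total = sum(cases)
    best_zone, best = None, 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = sum(expected[i] for i in zone)
        score = poisson_llr(c, e, total)
        if score > best:
            best_zone, best = zone, score
    return best_zone, best


def scan_with_p_value(cases, baseline, zones, n_sim=999, seed=0):
    """Scan the region and attach a Monte Carlo p-value to the best zone.
    Null data sets keep the observed total and spread cases over cells
    in proportion to `baseline` (no spatial clustering)."""
    rng = random.Random(seed)
    total = sum(cases)
    base_sum = sum(baseline)
    expected = [total * b / base_sum for b in baseline]
    zone, observed = max_llr(cases, expected, zones)
    cells = range(len(cases))
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(cases)
        for i in rng.choices(cells, weights=baseline, k=total):
            sim[i] += 1
        if max_llr(sim, expected, zones)[1] >= observed:
            exceed += 1
    return zone, observed, (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical 5x5 grid with an elevated plus-shaped patch around (1, 1).
    coords = [(x, y) for x in range(5) for y in range(5)]
    hot = {(1, 1), (0, 1), (2, 1), (1, 0), (1, 2)}
    cases = [12 if xy in hot else 2 for xy in coords]
    zones = circular_zones(coords, radii=[0, 1, 1.5, 2])
    zone, score, p = scan_with_p_value(cases, [1.0] * 25, zones, n_sim=199)
    print(sorted(zone), round(score, 2), p)
```

The cost of the significance test is visible in the loop inside `scan_with_p_value`: every replicate repeats the full scan over all zones, which is the expense the Bayesian approach described next aims to avoid.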
In this talk, we propose a Bayesian solution to the problem. A Bayesian solution
has several advantages in this scenario. First, hotspot detection is based on
posterior probabilities of models corresponding to each sub-region and hence
there is no need to conduct the randomization procedure. This gain in computational
efficiency is obtained by performing a slightly more expensive discrepancy calculation
for each sub-region wherein a simple and closed form likelihood maximization
is substituted by a numerical integration routine. Second, compared to the classical
approach where multiple hotspots are generally detected using a conservative
test, detecting multiple hotspots in the Bayesian framework is automatic and
doesn't require any additional machinery. Furthermore, we provide a framework
to adjust for additional characteristics such as overdispersion and spatial correlation
using a Cox process formulation. Such adjustments are potentially useful in
the context of biosurveillance where the analyst might not be interested in
investigating clusters that arise solely from routine overdispersion
relative to the usual Poisson or Bernoulli model. Adjusting for such
routine characteristics in the baseline model can potentially reduce false positives
and enhance disease-monitoring systems used in public health. We illustrate
our method on datasets that have been previously analyzed in the literature.
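
To make the idea of replacing the randomization test with posterior model probabilities concrete, the sketch below computes a posterior probability for the null model and for each candidate zone. It is a deliberately simplified stand-in and not the Cox process formulation of the talk: it assumes a Poisson model conditioned on the total count, a single relative risk `theta` inside the zone with a uniform prior on [1, `theta_max`], equal prior weight across zones, and a trapezoid rule for the numerical integration. All names and default values are hypothetical.

```python
import math


def log_bayes_factor(c_z, e_z, n_total, e_total, theta_max=10.0, steps=2000):
    """Log Bayes factor of 'zone has relative risk theta > 1' against the
    null, averaging the likelihood ratio
        theta**c_z * (1 + (theta - 1) * e_z / e_total) ** (-n_total)
    over a uniform prior on [1, theta_max] with the trapezoid rule."""
    share = e_z / e_total
    h = (theta_max - 1.0) / steps
    logs = []
    for k in range(steps + 1):
        theta = 1.0 + k * h
        logs.append(c_z * math.log(theta)
                    - n_total * math.log1p((theta - 1.0) * share))
    m = max(logs)  # factor out the peak for numerical stability
    area = 0.0
    for k, value in enumerate(logs):
        weight = 0.5 if k in (0, steps) else 1.0
        area += weight * math.exp(value - m)
    area *= h
    return m + math.log(area / (theta_max - 1.0))


def posterior_probabilities(cases, baseline, zones, prior_null=0.5):
    """Return (P(null | data), [P(zone | data) for each zone]).
    The non-null prior mass is split equally across the zones."""
    n_total = sum(cases)
    e_total = sum(baseline)
    log_post = [math.log(prior_null)]  # the null has Bayes factor 1
    log_prior_zone = math.log((1.0 - prior_null) / len(zones))
    for zone in zones:
        c_z = sum(cases[i] for i in zone)
        e_z = sum(baseline[i] for i in zone)
        log_post.append(
            log_prior_zone + log_bayes_factor(c_z, e_z, n_total, e_total)
        )
    m = max(log_post)
    norm = sum(math.exp(v - m) for v in log_post)
    probs = [math.exp(v - m) / norm for v in log_post]
    return probs[0], probs[1:]


if __name__ == "__main__":
    # Hypothetical six-cell region split into three candidate zones.
    cases = [2, 2, 12, 12, 2, 2]
    zones = [{0, 1}, {2, 3}, {4, 5}]
    p_null, p_zones = posterior_probabilities(cases, [1.0] * 6, zones)
    print(p_null, p_zones)
```

Because every zone receives its own posterior probability in a single pass, several zones with high probability can be reported together without a separate multiple-testing step, which mirrors the second advantage claimed in the abstract.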
A social tea will be held at 3:00 P.M. in A434 Mayo. All are welcome.
For more details contact 612-624-4655 or see http://www.biostat.umn.edu/seminar_academic.html