Publications
  • 2016
  • Article
  • Journal of Computational and Graphical Statistics

Penalized Fast Subset Scanning

By: Skyler Speakman, Sriram Somanchi, Edward McFowland III and Daniel B. Neill

Abstract

We present the penalized fast subset scan (PFSS), a new and general framework for scalable and accurate pattern detection. PFSS enables exact and efficient identification of the most anomalous subsets of the data, as measured by a likelihood ratio scan statistic. In addition, PFSS allows incorporation of prior information about each data element’s probability of inclusion, which was not previously possible within the subset scan framework. PFSS builds on two main results. First, we prove that a large class of likelihood ratio statistics satisfies a property that allows additional, element-specific penalty terms to be included while maintaining efficient computation. Second, we prove that the penalized statistic can be maximized exactly by evaluating only O(N) subsets. As a concrete example of the PFSS framework, we incorporate “soft” constraints on spatial proximity into the spatial event detection task, enabling more accurate detection of irregularly shaped spatial clusters of varying sparsity. To do so, we develop a distance-based penalty function that rewards spatial compactness and penalizes spatially dispersed clusters. This approach was evaluated on the task of detecting simulated anthrax bio-attacks, using real-world Emergency Department data from a major U.S. city. PFSS demonstrated increased detection power and spatial accuracy compared with competing methods while maintaining efficient computation.
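
The idea of adding element-specific penalties to a likelihood ratio statistic can be illustrated with a small sketch. It is not the authors' algorithm. It assumes an expectation-based Poisson statistic, for which the log-likelihood ratio at a fixed relative risk q is a sum of per-element terms c_i log q + b_i (1 - q), and it adds a hypothetical penalty term per element. At a fixed q the best subset is then simply every element whose total contribution is positive. The sketch searches a coarse grid of q values, which is a simplification; the paper instead proves that exact maximization needs only O(N) subsets. All function names, the data, and the grid are assumptions made for illustration.

```python
import itertools
import math


def element_scores(counts, baselines, penalties, q):
    """Per-element contribution to the penalized log-likelihood ratio at fixed q.

    Assumes an expectation-based Poisson model: c*log(q) + b*(1 - q),
    plus a hypothetical element-specific penalty (positive = prior for inclusion).
    """
    return [c * math.log(q) + b * (1.0 - q) + d
            for c, b, d in zip(counts, baselines, penalties)]


def best_subset_fixed_q(counts, baselines, penalties, q):
    """At fixed q the score is additive, so keep exactly the positive terms."""
    scores = element_scores(counts, baselines, penalties, q)
    subset = [i for i, s in enumerate(scores) if s > 0]
    return sum(scores[i] for i in subset), subset


def penalized_scan(counts, baselines, penalties, q_grid):
    """Maximize over a grid of q values (a simplification, not the exact method)."""
    best_score, best_subset, best_q = 0.0, [], None
    for q in q_grid:
        score, subset = best_subset_fixed_q(counts, baselines, penalties, q)
        if score > best_score:
            best_score, best_subset, best_q = score, subset, q
    return best_score, best_subset, best_q


def brute_force(counts, baselines, penalties, q_grid):
    """Exhaustive check over all non-empty subsets; feasible only for tiny N."""
    n = len(counts)
    best = 0.0
    for q in q_grid:
        scores = element_scores(counts, baselines, penalties, q)
        for r in range(1, n + 1):
            for subset in itertools.combinations(range(n), r):
                best = max(best, sum(scores[i] for i in subset))
    return best


# Made-up example data: observed counts, expected baselines, and penalties.
counts = [12, 9, 30, 4, 15]
baselines = [10, 10, 12, 5, 14]
penalties = [0.5, -1.0, 0.0, -2.0, 1.0]
q_grid = [1.0 + 0.05 * k for k in range(1, 41)]  # q in (1, 3]

score, subset, q = penalized_scan(counts, baselines, penalties, q_grid)
print(f"best q={q:.2f}, subset={subset}, score={score:.3f}")
```

On this toy input the per-q shortcut and the exhaustive search over all subsets agree, which is the point of the additive structure: each fixed-q maximization costs time linear in N rather than exponential.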

Keywords

Disease Surveillance; Likelihood Ratio Statistic; Pattern Detection; Scan Statistic; Mathematical Methods

Citation

Speakman, Skyler, Sriram Somanchi, Edward McFowland III, and Daniel B. Neill. "Penalized Fast Subset Scanning." Journal of Computational and Graphical Statistics 25, no. 2 (2016): 382–404. (Selected for “Best of JCGS” invited session by the journal’s editor in chief.)

About The Author

Edward McFowland III

Technology and Operations Management

More from the Authors

    • 2023
    • Journal of the American Statistical Association

    Estimating Causal Peer Influence in Homophilous Social Networks by Inferring Latent Locations.

    By: Edward McFowland III and Cosma Rohilla Shalizi
    • October–December 2022
    • INFORMS Journal on Data Science

    Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem

    By: Mochen Yang, Edward McFowland III, Gordon Burtch and Gediminas Adomavicius
    • 2022
    • Journal of Computational and Graphical Statistics

    Nonparametric Subset Scanning for Detection of Heteroscedasticity

    By: Charles R. Doss and Edward McFowland III