September 15, 2021
Article
Bioinformatics

Improving Deconvolution Methods in Biology Through Open Innovation Competitions: An Application to the Connectivity Map

By: Andrea Blasco, Ted Natoli, Michael G. Endres, Rinat A. Sergeev, Steven Randazzo, Jin Hyun Paik, N.J. Maximilian Macaluso, Rajiv Narayan, Xiaodong Lu, David Peck, Karim R. Lakhani and Aravind Subramanian

Format: Print

Abstract

A recurring problem in biomedical research is how to isolate signals of distinct populations (cell types, tissues, and genes) from composite measures obtained by a single analyte or sensor. Existing computational deconvolution approaches work well in many specific settings, but they may be suboptimal in more general applications. Here, we describe new methods obtained through an open innovation competition. The goal of the competition was to infer the expression of 1,000 genes from 500 composite measurements. This is the approach underlying L1000, a new assay used to scale up the Connectivity Map (CMap), a catalog of millions of perturbational gene expression profiles. The competition used a novel dataset of 2,200 profiles and attracted 294 competitors from 20 countries. The nine top-performing methods ranged from machine learning approaches (convolutional neural networks and random forests) to more traditional ones (Gaussian mixtures and k-means). These solutions were faster and more accurate than the benchmark and are likely to have applications beyond gene expression.
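Measuring 1,000 genes with 500 measurements means that each analyte carries a mixture of two genes' signals, which must be separated afterward. A minimal sketch of one traditional way to do this, a 1-D k-means split with k = 2, is shown below. It is illustrative only and is not the CMap pipeline, the benchmark, or any competitor's method. It assumes, hypothetically, that the two genes sharing an analyte are represented by unequal numbers of beads, so that the larger cluster can be attributed to the higher-proportion gene. The helper names `two_means_1d` and `deconvolve_analyte` are invented for this example.

```python
from statistics import median


def two_means_1d(values, max_iter=100):
    """Plain 1-D k-means with k=2; returns (lower_cluster, upper_cluster)."""
    xs = sorted(values)
    # Initialize the two centroids at the lower and upper quartiles.
    c_lo, c_hi = xs[len(xs) // 4], xs[(3 * len(xs)) // 4]
    lower, upper = xs, []
    for _ in range(max_iter):
        # Assign each reading to the nearer centroid (ties go to the lower one).
        lower = [x for x in xs if abs(x - c_lo) <= abs(x - c_hi)]
        upper = [x for x in xs if abs(x - c_lo) > abs(x - c_hi)]
        if not lower or not upper:
            break  # degenerate case: only one population present
        new_lo = sum(lower) / len(lower)
        new_hi = sum(upper) / len(upper)
        if (new_lo, new_hi) == (c_lo, c_hi):
            break  # centroids stopped moving: converged
        c_lo, c_hi = new_lo, new_hi
    return lower, upper


def deconvolve_analyte(intensities):
    """Hypothetical helper: estimate expression of the two genes sharing one analyte.

    Assumption (not from the source): the high-proportion gene has more beads,
    so the larger cluster is attributed to it. Each estimate is a cluster median.
    Returns (high_proportion_gene, low_proportion_gene).
    """
    if len(intensities) < 2:
        raise ValueError("need at least two bead readings")
    lower, upper = two_means_1d(intensities)
    if not upper:  # a single peak: both genes receive the same estimate
        m = median(lower)
        return m, m
    big, small = (lower, upper) if len(lower) >= len(upper) else (upper, lower)
    return median(big), median(small)


if __name__ == "__main__":
    # Six readings near 100 (assumed high-proportion gene), three near 20.
    readings = [98, 100, 102, 99, 101, 100, 19, 20, 21]
    print(deconvolve_analyte(readings))  # prints (100.0, 20)
```

Taking the cluster median rather than the mean makes each estimate less sensitive to stray bead readings. A failure mode that a simple split like this cannot resolve is when both genes have similar intensities and the two peaks merge, which is one reason more flexible methods were of interest.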

Keywords

Deconvolution; Methods; Open Innovation Competition; Genomics; Research; Innovation and Invention

Citation

Blasco, Andrea, Ted Natoli, Michael G. Endres, Rinat A. Sergeev, Steven Randazzo, Jin Hyun Paik, N.J. Maximilian Macaluso, Rajiv Narayan, Xiaodong Lu, David Peck, Karim R. Lakhani, and Aravind Subramanian. "Improving Deconvolution Methods in Biology Through Open Innovation Competitions: An Application to the Connectivity Map." Bioinformatics 37, no. 18 (September 15, 2021).