03-028

THE BIOTECHONOMY (1.0): A ROUGH MAP OF BIO-DATA FLOW

Juan Enriquez and Rodrigo Martinez

We are living through a rapidly evolving life sciences revolution. It is based on the ability to identify, read, understand, and manipulate the four nucleotides that code for all life forms on the planet. These four bases form deoxyribonucleic acid (DNA). Over the past decade an increasing number of scientists, labs, and computer centers throughout the world have chosen to produce, store, and use biodata. This can be in the form of full genomes, specific genes, parts of genes, single-letter variations in gene code (SNPs), proteins, or a variety of other variations on organic molecule data.

Bio-literacy is an essential first step in building a bio-based economy (biotechonomy). So far most academic research has focused on sequencing, understanding, and annotating genomes or parts thereof. There has been little focus on the customer. This leaves open a series of interesting questions, such as: Who is accessing and reading these tidal waves of data? What are they being used for? How might this usage pattern change industrial structures and national competitiveness? The Life Sciences Project at HBS has drafted a first, and quite rough, map of who is producing, storing, and using public biodata. We hope this draft will improve and become far more complete as the project evolves. As the project moves forward, we intend to add more data, include key private data providers, and expand the time periods analyzed.

Because just a few companies make the equipment required to produce bio sequence data, one can analyze the sequencer market and build a proxy for the world’s DNA sequencing capacity. Doing so gave us a sense of how much data is being generated, how much is public and how much is private, and what the growth trends are. We then tried to understand who is accessing this data and for what purpose. Some users are carrying out strictly academic research; others are downloading data in an attempt to package and sell results; still others are attempting to patent and commercialize products derived from the data. To get a sense of these patterns, we analyzed the server logs of the three key public biodatabases. Millions of data points give us an initial glimpse of how the biotechonomy is evolving in the academic, non-profit, and private spheres.

To protect privacy, no individual user is identified; instead, we aggregated usage patterns by country, by domain, and, in the case of GenBank in the US, by organism and format. We also created a proxy variable to identify dispersion or concentration of downloads from the European database.
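
As a rough illustration of the aggregation described above, the following Python sketch counts requests per top-level domain and computes one possible concentration measure. It is only a sketch under stated assumptions: the abstract does not say how the logs were processed or which proxy variable was used, so the Herfindahl-Hirschman index here is a stand-in, and the function names and sample hostnames are hypothetical.

```python
from collections import Counter


def aggregate_by_domain(hostnames):
    """Count requests per top-level domain, discarding the individual host."""
    counts = Counter()
    for host in hostnames:
        # Keep only the last label (e.g. "edu", "de"); the host itself is dropped.
        tld = host.rsplit(".", 1)[-1].lower()
        counts[tld] += 1
    return counts


def concentration_index(counts):
    """Herfindahl-Hirschman index: the sum of squared shares.

    Assumed stand-in for the paper's unspecified proxy variable.
    Ranges from 1/n (n groups with equal shares) to 1 (a single group).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical hostnames, invented for illustration only.
sample_hosts = [
    "lab1.example.edu",
    "lab2.example.edu",
    "node.example.de",
    "host.example.com",
]
by_domain = aggregate_by_domain(sample_hosts)
hhi = concentration_index(by_domain)
```

A value of the index near 1 would suggest downloads concentrated in a single group, while a value near 1/n would suggest they are spread evenly across n groups.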

This paper provides a brief overview of the initial research. We present eight key results and note what surprised us within each of them.

Unaffiliated
43 pages
