Monday, December 3, 2018

Current state of the art on genomics projects and technology - Dec 2018

Gene sequencing has its own kind of Moore's Law, but with more cliffs and plateaus.

Technology: The $1,000 genome (2014), Nature News
This was plotted in 2014, when a human genome cost $5000. In 2015 the price suddenly dropped to $1000, and according to the NIH, that's where it is now.
NHGRI Genome Sequencing Program (2017)
BGI claims it can sequence a genome for only $600, so I'd put the current cheapest option at $600.

The $100 genome is still not here. The earliest mention I could find is a 2008 MIT Tech Review report that quotes two predictions:
  1. $1000 genome hopefully before 2011. 
  2. $100 genome after 2013.
The first prediction was wrong. Extrapolating from the graph in 2008, 2011 seemed reasonable, but it turned out there was significant plateauing. The second prediction is right, but boringly so.

There's another mention of a $100 genome from Illumina, but with no promised delivery date. Finally, there's BGI's promise of a $300 genome by 2020.

All this doesn't bode well for very cheap genome sequencing. It seems the $100 genome won't be here for at least another 3 years.
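As a rough check on that guess, here is a minimal sketch that assumes, hypothetically, that prices keep falling at the rate implied by BGI's figures ($600 now, $300 promised by 2020), i.e. halving every two years. The constant rate and the 2018 starting point are my assumptions, not a forecast from BGI or anyone else.

```python
import math

# Assumption: BGI's current $600 price (2018) and its promised $300 (2020)
# define a constant exponential decline, i.e. the price halves every 2 years.
START_YEAR = 2018
START_COST = 600.0   # USD per human genome
HALVING_YEARS = 2.0  # implied by $600 -> $300 over 2018-2020


def cost_in(year: float) -> float:
    """Projected cost of one genome in a given year under the assumed decline."""
    return START_COST * 2 ** (-(year - START_YEAR) / HALVING_YEARS)


def year_reaching(target_cost: float) -> float:
    """Solve START_COST * 2^(-(t - START_YEAR) / HALVING_YEARS) = target_cost for t."""
    return START_YEAR + HALVING_YEARS * math.log2(START_COST / target_cost)


if __name__ == "__main__":
    print(f"$300 genome around {year_reaching(300):.1f}")
    print(f"$100 genome around {year_reaching(100):.1f}")
```

Under this assumption the $100 genome lands around 2023 (600/100 = 6, and log2(6) is about 2.6 halvings of 2 years each), which fits "at least 3 years". Any further plateau, like the one after 2015, would push it later.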

Mass Human Whole-Genome Sequencing (WGS)

There seem to be on the order of 1 million human genomes sequenced so far. Many come from national projects. For example, Genomics England has almost finished sequencing 0.1 million genomes of Englanders. UK Biobank has 0.5 million (I'm not sure whether these are WGS or just genome-wide SNPs), All of Us in America aims for 1 million WGS, and GenomeAsia100K aims to sequence 0.1 million Asians.

Other WGS Projects

1001 Genomes started in 2008, aiming to sequence the genomes of strains of Arabidopsis thaliana (the model plant in biology). It was completed in 2016, with 1135 genomes published.

B10K, started in 2014, aims to sequence all 10,560 species of birds (class Aves) before 2020. As of July 5, 2017, it had acquired 2500 samples and sequenced just 300. It will surely fail to deliver. Another project, OpenWings, has the same aim but has posted no news since this April.
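To see why B10K looks doomed, here is a naive linear extrapolation from the figures above. The exact start date (January 1, 2014) and the reading of "before 2020" as January 1, 2020 are my assumptions; real sequencing throughput usually accelerates, so this overstates the gap somewhat.

```python
from datetime import date

# Figures quoted in the text; START and DEADLINE are assumptions, since the
# project only says "2014" and "before 2020".
START = date(2014, 1, 1)      # assumed start of the project
STATUS = date(2017, 7, 5)     # date of the progress report
DEADLINE = date(2020, 1, 1)   # assumed meaning of "before 2020"
TOTAL_SPECIES = 10_560
SEQUENCED = 300

DAYS_PER_YEAR = 365.25


def years_between(a: date, b: date) -> float:
    """Elapsed time between two dates, in (average) years."""
    return (b - a).days / DAYS_PER_YEAR


# Genomes per year achieved so far (naive linear rate).
current_rate = SEQUENCED / years_between(START, STATUS)

# Genomes per year needed to finish all species by the deadline.
remaining = TOTAL_SPECIES - SEQUENCED
required_rate = remaining / years_between(STATUS, DEADLINE)

if __name__ == "__main__":
    print(f"current rate:    {current_rate:.0f} genomes/year")
    print(f"required rate:   {required_rate:.0f} genomes/year")
    print(f"speed-up needed: {required_rate / current_rate:.0f}x")
```

Under these assumptions the project would need to sequence roughly 48 times faster than it had so far, and even a tenfold ramp-up would not close the gap.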

Bat 1K aims to sequence all bats, i.e. all 1288 species of the order Chiroptera. It has just begun and so far has only 14 genomes, taken from previous projects.

The Vertebrate Genomes Project aims to sequence all 66,000 extant vertebrate species.

Earth BioGenome Project

The logical conclusion is the Earth BioGenome Project, "a moonshot for biology" that aims to sequence the genomes of all known eukaryotic species on Earth within 10 years:
... sequencing and functionally annotating the genomes of 1.5 million known species of eukaryotes, a massive group that includes plants, animals, fungi and other organisms whose cells have a nucleus that houses their chromosomal DNA. To date, the genomes of less than 0.2% of eukaryotic species have been sequenced... The completed project is expected to require about 1 exabyte (1 billion gigabytes) of digital storage capacity.
Currently, only about 3000 eukaryotes have been sequenced, making up the 0.2% (0.2% of 1.5 million is 3000), and only half of those are of high quality.
The tree of all the species sequenced so far.
Note: 1 exabyte isn't that much by the standards of REALLY big science. The voracious LHC produces 1 petabyte per second, too much to record, so its data are filtered before storage. Even after filtering, it had archived 200 petabytes as of June 29, 2017.
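A quick back-of-the-envelope comparison makes the point concrete. It uses decimal (SI) units, matching the "1 billion gigabytes" in the quote, and only the figures given above; the per-species number is a simple average, not a figure from the project.

```python
# Decimal (SI) units, as in the quote: 1 EB = 1000 PB = 10^9 GB.
GB = 10**9    # bytes
PB = 10**15   # bytes
EB = 10**18   # bytes

EBP_TOTAL = 1 * EB            # Earth BioGenome Project's expected storage
EBP_SPECIES = 1_500_000       # known eukaryotic species to sequence
LHC_RAW_RATE = 1 * PB         # bytes per second, before filtering
LHC_ARCHIVE_2017 = 200 * PB   # archived as of June 29, 2017

# How long the LHC's raw output takes to reach the whole EBP storage budget.
seconds_for_lhc = EBP_TOTAL / LHC_RAW_RATE

# The LHC's filtered archive, expressed as a fraction of an exabyte.
archive_in_eb = LHC_ARCHIVE_2017 / EB

# Average storage budget per species (simple division, not a project figure).
gb_per_species = EBP_TOTAL / EBP_SPECIES / GB

print(f"LHC raw output reaches 1 EB in {seconds_for_lhc:.0f} s "
      f"(~{seconds_for_lhc / 60:.0f} min)")
print(f"LHC archive (2017): {archive_in_eb:.1f} EB")
print(f"EBP storage per species: ~{gb_per_species:.0f} GB")
```

In other words, the entire project's storage is what the unfiltered LHC would spew out in about 17 minutes, and the LHC's filtered archive alone was already a fifth of an exabyte in 2017.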
The project also seeks to reveal some of the estimated 10 million to 15 million unknown species of eukaryotes, most of which are single-celled organisms, insects, and small animals in the oceans.
Given that one human genome currently costs about $1000, sequencing 1.5 million species gives an estimate of $1.5 billion. The projected cost is $4.7 billion, about three times that, which passes the Fermi-estimate sanity check.
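The same Fermi check, written out. Two assumptions are mine: that an average eukaryote genome costs about as much as a human one, and that "passes" means the two numbers agree within one order of magnitude (a factor of 10 either way), which is a common rule of thumb for Fermi estimates.

```python
import math

# Inputs taken from the text; the per-genome cost is the human price,
# assumed (hypothetically) to be representative of all eukaryotes.
SPECIES = 1_500_000        # known eukaryotic species
COST_PER_GENOME = 1_000    # USD, current price of one human genome
PROJECTED_COST = 4.7e9     # USD, the project's own estimate

fermi_estimate = SPECIES * COST_PER_GENOME   # USD

# Sanity check: agree to within one order of magnitude either way.
ratio = PROJECTED_COST / fermi_estimate
passes = abs(math.log10(ratio)) < 1

print(f"Fermi estimate: ${fermi_estimate / 1e9:.1f} billion")
print(f"Projected cost is {ratio:.1f}x the estimate; passes: {passes}")
```

The ratio comes out at about 3.1, comfortably inside the factor-of-10 window.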
“Variation is the fount of all genetic knowledge,” says project member and evolutionary geneticist Jenny Graves of La Trobe University in Melbourne, Australia. “The more variation you have the better — so why not sequence everything?”
 

The project is made up of many subprojects. For example:
  • 10KP aims to sequence 10,000 land plants and 4000 protists by 2022, representing every major clade of plants and eukaryotic microbes.
  • The Darwin Tree of Life Project aims to sequence 66,000 UK species by around 2028.

Oh, and some microbes too

The Earth Microbiome Project studies the microbes of Earth:
We use DNA sequencing and mass spectrometry of crowd-sourced samples... set out to analyze 200,000 samples... to produce a global Gene Atlas... approximately 500,000 reconstructed microbial genomes...
At the end of 2017, it reported 28000 species sequenced.
