1
Ray and Ray Cloud Browser for Metagenomics
Sébastien Boisvert @sebhtml
Université Laval, Québec, Canada
Beatles and Bioinformatics! #BeatlesAndBioinformatics University of Liverpool
27th November 2013 13:00
Talk: 40 minutes
Questions: 5 min
2
Where is Laval University?
In Québec City
3
Canada is in the Commonwealth of Nations too!
● Canadian money
Photo: http://www.bridgeandtunnelclub.com/bigmap/outoftown/canada/money/
4
Super computing at Laval University
colosse
#314 top500 06/2012
7616 Intel Xeon X5560 cores
Mellanox Technologies MT26428
332 kW
5
Plan
● Background
● Parallelism
● Ray & metagenomics
● Compare samples with Surveyor
● Interactive visualization
● Futures
6
Background
7
We buy sequencers and computers but...
● We have:
– DNA sequencers to read genetic code (parallel)
– Supercomputers to compute stuff in the general sense (parallel)
Mardis, E. R. (2011, February). A decade's perspective on DNA sequencing technology. Nature 470 (7333), 198-203.
Sanger, F., S. Nicklen, and A. R. Coulson (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences 74 (12), 5463-5467.
Shendure, J. and H. Ji (2008, October). Next-generation DNA sequencing. Nature Biotechnology 26 (10), 1135-1145.
Sanger, F. (2001, March). The early days of DNA sequences. Nat Med 7 (3), 267-268.
Afuah, A. N. and J. M. Utterback (1991, December). The emergence of a new supercomputer architecture. Technological Forecasting and Social Change 40 (4), 315-328.
8
Trend
● However:
– Genomics needs more parallel software that scales with biology's huge problems
Pollack, A. (2011). DNA sequencing caught in deluge of data. New York Times 1.
Baker, M. (2010, July). Next-generation sequencing: adjusting to data overload. Nature Methods 7 (7), 495-499.
Trelles, O., P. Prins, M. Snir, and R. C. Jansen (2011, February). Big data, but are we ready? Nature Reviews Genetics 12 (3), 224.
(2013, October). In need of an upgrade. Nature Biotechnology 31 (10), 857.
McPherson, J. D. (2009, November). Next-generation gap. Nature Methods 6 (11 Suppl), S2-S5.
Mardis, E. (2010). The $1,000 genome, the $100,000 analysis? Genome Medicine 2 (11), 84+.
9
I created some useful software
● Ray: genome assembly, metagenome assembly, taxonomic profiling, sample comparison
● RayPlatform: platform on which Ray is built
● Ray Cloud Browser: visualization of large genome graphs
10
In this talk
● Ray (C++, started with bacterial genome assembly)
● Ray Meta (assembling metagenomes with Ray)
● Ray Communities (profiling metagenomes with Ray)
● Ray Surveyor (comparing DNA sequencing samples without reference; Ray -run-surveyor)
● Ray Cloud Browser (separate project)
11
Our original idea in 2010
● Mixing reads from different technologies (454 + Illumina)
● 2010 paper about Ray heuristics:
Boisvert, S., F. Laviolette, and J. Corbeil (2010, November). Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. Journal of Computational Biology 17 (11), 1519-1533.
12
Mixing sequencing reads
Figure from: Journal of Computational Biology 17 (11), 1519-1533.
13
Platform
● Goal: build a platform for distributed genomic computing
● Thread-based programming is hard
● Message passing is easy to understand and scales, but is harder to program
● Solution: framework to abstract everything
14
Platform perks
● Plugin interface
● Actor model interface
● Runtimes:
– Actor playground
– Standard mode
– Mini-ranks
15
RayPlatform's scalability
● Ray's scalability is measurable
Sample SRS011098 from Human Microbiome Project (202 487 723 reads)
Figure from:
Godzaridis, Boisvert, et al. Big Data (accepted)
16
Parallelism
17
Software should be parallel too
● Highly parallel genomic assays
Nature Reviews Genetics 7, 632-644 (August 2006)
● Couple of reviews about need for speed
Flicek, P. (2009, March). The need for speed. Genome biology 10 (3), 1-4.
Bonetta, L. (2006, February). Genome sequencing in the fast lane. Nature Methods 3 (2), 141-147.
Schatz, M. C., B. Langmead, and S. L. Salzberg (2010, July). Cloud computing and the DNA data race. Nature Biotechnology 28 (7), 691-693.
18
What is concurrency
● Several actions performed simultaneously during a period of time
● Example: give 1000000 sequences to 10 computers: each processes 100000 seq. simultaneously
● Threads are local to 1 computer
● Processes can be distributed
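The 10-computer example can be sketched as a plain partitioning step. This is only an illustration: `partition` is a hypothetical helper, not part of Ray or RayPlatform, and it only decides which worker gets which sequences (the actual distribution would go through message passing).

```python
def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, near-equal slices.

    Hypothetical helper for illustration; worker `rank` would process
    the sequences whose indices fall in slices[rank].
    """
    base, extra = divmod(n_items, n_workers)
    slices = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers take one more item when the split is uneven.
        size = base + (1 if rank < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    # 1000000 sequences over 10 computers: 100000 sequences each.
    print([len(s) for s in partition(1_000_000, 10)])
```

Each worker only needs its own rank and the two totals to find its slice, so no coordination is required before the work starts.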
19
Actor model for programming genomic tools
● In a nutshell: actors send messages to each other and can spawn actors
● Video: http://channel9.msdn.com/Shows/Going+Deep/Hewitt-Meijer-and-Szyperski-The-Actor-Model-everything-you-wanted-to-know-but-were-afraid-to-ask
Hewitt, C., P. Bishop, and R. Steiger (1973). A universal modular ACTOR formalism for artificial intelligence. In Proceedings of the 3rd international joint conference on Artificial intelligence, IJCAI'73, San Francisco, CA, USA, pp. 235-245. Morgan Kaufmann Publishers Inc.
Agha, G. (1986). Actors: a model of concurrent computation in distributed systems. Cambridge, MA, USA: MIT Press.
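A toy version of "actors send messages to each other and can spawn actors", as a minimal sketch. Everything here (`ActorSystem`, `dispatcher`, `count_bases`, `run_example`) is made up for illustration and is not RayPlatform's interface; the runtime is single-threaded and just delivers queued messages in order.

```python
from collections import deque


class ActorSystem:
    """Toy single-threaded runtime: one shared FIFO queue of messages."""

    def __init__(self):
        self.actors = {}        # address -> behavior function
        self.mailbox = deque()  # pending (destination, message) pairs

    def spawn(self, behavior):
        address = len(self.actors)
        self.actors[address] = behavior
        return address

    def send(self, destination, message):
        self.mailbox.append((destination, message))

    def run(self):
        while self.mailbox:
            destination, message = self.mailbox.popleft()
            self.actors[destination](self, destination, message)


def count_bases(system, me, message):
    """Worker actor: measure one sequence and report to `reply_to`."""
    reply_to, sequence = message
    system.send(reply_to, len(sequence))


def dispatcher(system, me, message):
    """Spawn one worker actor per sequence and send it its work."""
    reply_to, sequences = message
    for sequence in sequences:
        worker = system.spawn(count_bases)  # actors can spawn actors
        system.send(worker, (reply_to, sequence))


def run_example(sequences):
    system = ActorSystem()
    lengths = []
    collector = system.spawn(lambda system, me, message: lengths.append(message))
    root = system.spawn(dispatcher)
    system.send(root, (collector, sequences))
    system.run()
    return lengths
```

Actors share no state: the only way the collector learns a length is by receiving a message, which is what lets the same design spread over many processes.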
21
Ray & metagenomics
22
Metagenomics (started in 1998)
● DNA sequencing is cheap
● Bacteria in complex communities cannot be cultured easily
● Metagenomics: direct DNA sequencing from uncultured microorganisms
● Field started by Jo Handelsman in 1998
Handelsman, J. (2004, December). Metagenomics: Application of genomics to uncultured microorganisms. Microbiology and Molecular Biology Reviews 68 (4), 669-685.
The microbiome explored: recent insights and future challenges. Blaser, Bork, Fraser, Knight & Wang Nature Reviews Microbiology 11, 213-217 (March 2013)
Handelsman et al. (Oct 1998) Chemistry & biology 5 (10).
23
Existing metagenomic tools do ABC, we do XYZ
● Metagenomic sequencing data must be analyzed
● Methods A, B, C (16S = metagenomics)
● We propose X, Y and Z (whole genome shotgun + k-mers)
● Also, so many choices (tools, sequencers), most do ABC, we do XYZ
Loman, N. J., C. Constantinidou, J. Z. Chan, M. Halachev, M. Sergeant, C. W. Penn, E. R. Robinson, and M. J. Pallen (2012, September). High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nature Reviews Microbiology 10 (9), 599-606.
Kahvejian, A., J. Quackenbush, and J. F. Thompson (2008, October). What would you do if you could sequence everything? Nature Biotechnology 26 (10), 1125-1133.
Metagenomics: DNA sequencing of environmental samples Nature Reviews Genetics 6, 805-814 (November 2005)
24
Some concepts
● Taxonomy: the branch of science concerned with classification, especially of organisms; systematics.
● Taxon: taxonomic group
● Taxonomic tree: a tree of taxa
● Leaf: a tree node without children
● OTU: operational taxonomic unit
25
Taxonomic profiling with kmers
● Kmers: DNA words of length k
● Given (1) a taxonomic tree and (2) data (usually reads or kmers) on the tree's leaves
● LCA: Last Common Ancestor to classify each kmer to a node (possibly not a leaf)
● Colored = labeled with a taxon or genome identifier
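The idea on this slide can be sketched on a toy tree. This is a simplified illustration, not Ray Communities' code: the tree, the leaf names (`genomeA`, `family1`, ...) and the sequences are invented, and reverse complements are ignored.

```python
def kmers(sequence, k):
    """All DNA words of length k in `sequence` (forward strand only)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def path_to_root(parent, node):
    """List `node` and its ancestors, from `node` up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(parent, nodes):
    """Deepest node that is an ancestor of (or equal to) every given node."""
    nodes = list(nodes)
    common = path_to_root(parent, nodes[0])
    for node in nodes[1:]:
        ancestors = set(path_to_root(parent, node))
        common = [n for n in common if n in ancestors]
    return common[0]


def classify_kmers(parent, leaf_sequences, k):
    """Color each k-mer with the leaves containing it, then map it to their LCA."""
    colors = {}
    for leaf, sequence in leaf_sequences.items():
        for kmer in kmers(sequence, k):
            colors.setdefault(kmer, set()).add(leaf)
    return {kmer: lca(parent, leaves) for kmer, leaves in colors.items()}


if __name__ == "__main__":
    # Toy taxonomy (child -> parent) and toy genomes on the leaves.
    parent = {"genomeA": "family1", "genomeB": "family1",
              "family1": "root", "genomeC": "root"}
    genomes = {"genomeA": "ACGTA", "genomeB": "CGTAC", "genomeC": "GTACG"}
    for kmer, node in sorted(classify_kmers(parent, genomes, 4).items()):
        print(kmer, node)
```

A k-mer found in a single genome stays on that leaf, while a k-mer shared by `genomeA` and `genomeB` moves up to `family1`, which is why a classification can land on an internal node.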
26
Examples
● Annotation with k-mers: Edwards, R. A., R. Olson, T. Disz, G. D. Pusch, V. Vonstein, R. Stevens, and R. Overbeek (2012, December). Real time metagenomics: using k-mers to annotate metagenomes. Bioinformatics (Oxford, England) 28 (24), 3316-3317.
● “Ray Communities” => Boisvert et al. 2012 Genome Biology
● Scalable taxonomic assignation: Ames, S. K., D. A. Hysom, S. N. Gardner, G. S. Lloyd, M. B. Gokhale, and J. E. Allen (2013, September). Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics 29 (18), 2253-2260.
27
Profile with kmers using Ray Communities
● Genome abundance
● Taxon abundance (good correlation with Metaphlan)
● Gene Ontology
28
UniFrac is mathematically sound
● Use taxon profiles
● UniFrac: distance between 2 community samples
Lozupone, C. and R. Knight (2005, December). UniFrac: a new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology 71 (12), 8228-8235.
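A minimal sketch of the unweighted UniFrac idea: the fraction of branch length that leads to taxa seen in only one of the two samples. The tree encoding (`child -> (parent, branch_length)`) and the toy taxa are assumptions made for this example, not a format used by Ray.

```python
def unifrac(tree, sample_a, sample_b):
    """Unweighted UniFrac distance between two sets of leaves.

    tree: {child: (parent, branch_length)}; the root has no entry.
    A branch is identified by the node at its lower (child) end.
    """
    def branches_above(leaves):
        covered = set()
        for node in leaves:
            while node in tree:
                covered.add(node)
                node = tree[node][0]
        return covered

    a = branches_above(sample_a)
    b = branches_above(sample_b)
    total = sum(tree[n][1] for n in a | b)   # branches used by either sample
    unique = sum(tree[n][1] for n in a ^ b)  # branches used by exactly one sample
    return unique / total if total else 0.0


if __name__ == "__main__":
    # Toy tree: t1 and t2 share the internal node x; t3 hangs off the root.
    tree = {"t1": ("x", 1.0), "t2": ("x", 1.0), "x": ("root", 2.0), "t3": ("root", 3.0)}
    print(unifrac(tree, {"t1", "t3"}, {"t2", "t3"}))
```

Identical samples give 0 and samples with no shared branch give 1, so closely related taxa count as "nearly shared" instead of "different".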
29
Ray Meta
● “Ray Meta” => metagenome assembly with Ray
● Binning with coverage may not be accurate because coverage depth changes with GC content and other factors
● Ray trick: instead of binning with coverage, bin with graph seeds (locality)
Boisvert, S., F. Raymond, E. Godzaridis, F. Laviolette, and J. Corbeil (2012, December). Ray meta: scalable de novo metagenome assembly and profiling. Genome Biology 13 (12), R122+.
● http://genomebiology.com/2012/13/12/R122
30
Assembled proportions of bacterial genomes for a simulated metagenome with sequencing errors
1000 bacterial genomes with power law distribution
3*10^9 reads
Simulated errors
Figure 1, Boisvert et al. 2012 Genome Biology
Good assembly proportion of contained genomes within metagenome
31
Estimated bacterial genome proportions
● With kmer
● Uniquely-colored k-mers
A: 100-genome metagenome
B: 1000-genome metagenome
Figure 2, Boisvert et al. 2012 Genome Biology
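A rough sketch of estimating genome proportions from uniquely-colored k-mers. It assumes the proportion is taken from the mean depth of the k-mers that occur in exactly one reference genome; the actual estimator in Ray Communities may differ, and `estimate_proportions` and the toy data are hypothetical.

```python
from collections import Counter


def kmers(sequence, k):
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def estimate_proportions(genomes, reads, k):
    # Color every reference k-mer with the genomes that contain it.
    colors = {}
    for name, sequence in genomes.items():
        for kmer in set(kmers(sequence, k)):
            colors.setdefault(kmer, set()).add(name)
    # Keep only k-mers with a single color (present in exactly one genome).
    unique = {kmer: next(iter(names))
              for kmer, names in colors.items() if len(names) == 1}
    n_unique = Counter(unique.values())
    # Count how often each uniquely-colored k-mer is seen in the reads.
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in unique:
                hits[unique[kmer]] += 1
    # Mean depth per unique k-mer (assumption of this sketch), then normalize.
    depth = {name: hits[name] / n_unique[name] for name in n_unique}
    total = sum(depth.values())
    return {name: (d / total if total else 0.0) for name, d in depth.items()}


if __name__ == "__main__":
    genomes = {"g1": "AAAACCCC", "g2": "CCCCGGGG"}  # CCCC is shared, so ignored
    reads = ["AAAACCCC"] * 3 + ["CCCCGGGG"]
    print(estimate_proportions(genomes, reads, 4))
```

The shared word `CCCC` carries no information about which genome it came from, so only the single-color k-mers contribute to the estimate.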
32
Enterotypes
● 3 enterotypes: Arumugam, M. (...) and P. Bork (2011, April). Enterotypes of the human gut microbiome. Nature 473 (7346), 174-180.
● 2 enterotypes: Wu, G. D. (...) and J. D. Lewis (2011, October). Linking long-term dietary patterns with gut microbial enterotypes. Science (New York, N.Y.) 334 (6052), 105-108.
● Can we reproduce that with k-mer-based classification?
33
Reproduction of enterotypes with k-mer based profiling
● Data: Qin et al. 2010 Nature (MetaHIT)
Figure 4, Boisvert et al. 2012 Genome Biology
34
Some quotes
● Snake assembly in Assemblathon 2:
“The Ray assembly was ranked 1st overall, and also ranked 1st for all individual measures except multiplicity (where it still had a better than average performance).” GigaScience 2013, 2:10
● E. coli sequencing on MiSeq:
“Ray stood apart as the most accurate of the three assemblers, based on the number of inversions, relocations, SNPs, and a visual inspection of the associated dot plots” BMC Genomics 2013, 14:675
● “Ray will be a good validation assembler” Bastien Chevreux (Mira assembler author) http://article.gmane.org/gmane.science.biology.ray-genome-assembler/696
35
Compare samples with Surveyor
36
Using a graph to mine variation
Bubble caused by variation or sequencing error
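To make the bubble concrete, here is a small sketch that builds a de Bruijn graph from two toy sequences differing by one base and finds the resulting bubble. The sequences and the `find_simple_bubbles` helper are illustrative assumptions; real assemblers also use reverse complements and coverage, which this ignores, so it cannot tell a true variant from a sequencing error.

```python
from collections import defaultdict


def de_bruijn(sequences, k):
    """Graph with k-mers as nodes and an edge between consecutive k-mers."""
    graph = defaultdict(set)
    for sequence in sequences:
        for i in range(len(sequence) - k):
            graph[sequence[i:i + k]].add(sequence[i + 1:i + k + 1])
    return graph


def find_simple_bubbles(graph):
    """Return (start, end) pairs where two linear branches leave `start`
    and rejoin at `end`."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    bubbles = []
    for start in list(graph):
        if len(graph[start]) != 2:
            continue
        ends = []
        for node in sorted(graph[start]):
            steps = 0
            # Follow the branch while it stays linear (one way in, one way out).
            while (indegree[node] == 1 and len(graph.get(node, ())) == 1
                   and steps < len(indegree)):
                node = next(iter(graph[node]))
                steps += 1
            ends.append(node)
        if ends[0] == ends[1] and indegree[ends[0]] == 2:
            bubbles.append((start, ends[0]))
    return bubbles


if __name__ == "__main__":
    # Same toy sequence with a single substitution (G -> C) in the middle.
    graph = de_bruijn(["TTACGGATCC", "TTACGCATCC"], 4)
    print(find_simple_bubbles(graph))
```

With k = 4, the single substitution produces two branches of 4 k-mers each between `TACG` and `ATCC`.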
37
Comparing metagenome samples
● Idea: compare samples without a reference
● Be it variants, or kmer content
● For kmer presence/absence, don't use coverage
● For RNA-Seq or taxon abundances, compare normalized kmer counts
38
Compare genomic content without a ref. with Surveyor
● Set of biological samples
● DNA sequencing for each
● Use Actor Model to compare a lot of samples
● Build a de Bruijn graph that contains all of them (à la Fermi or Cortex), but distributed
● In development
Iqbal, Z., I. Turner, and G. McVean (2013, January). High-throughput microbial population genomics using the cortex variation assembler. Bioinformatics 29 (2), 275-276. Cortex for microbial populations
Iqbal, Z., M. Caccamo, I. Turner, P. Flicek, and G. McVean (2012, February). De novo assembly and genotyping of variants using colored de bruijn graphs. Nature Genetics 44 (2), 226-232. Cortex
Li, H. (2012, July). Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics 28 (14), 1838-1844. Fermi
39
Ray -run-surveyor
● Existing methods enumerate variation entries
● Genomic word content may also be interesting
● Compare many samples (their kmer content)
40
Legionella
● 2012 outbreak in Quebec City
● What's the source of contamination?
● 3 suspect cooling towers
● On the Illumina MiSeq
41
Samples
● 22 patient-samples
● 3 source-tower-samples (metagenomic)
● 2 epidemic-strain-environmental-samples
● 7 environmental-samples
● 4 contemporaneous-samples
● 5 old-1996-samples
42
Questions
● Are the 2012 strains similar to the 1996 strains (also from Québec City)?
● Which cooling tower is the most likely source of contamination?
43
Similarity matrix (k spectrum kernel)
Ref. for spectrum kernel: Leslie, C., E. Eskin, and W. S. Noble (2002). The spectrum kernel: a string kernel for SVM protein classification. Pacific Symposium on Biocomputing, 564-575.
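A minimal sketch of a k-spectrum similarity matrix, assuming each sample is summarized by its k-mer counts and the kernel is the dot product of two count vectors. Whether Surveyor uses counts or presence/absence here is not stated on the slide, and the function names and toy samples are made up.

```python
from collections import Counter


def spectrum(reads, k):
    """k-mer counts of one sample (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def spectrum_kernel(spectrum_x, spectrum_y):
    """Dot product of two k-mer count vectors."""
    if len(spectrum_x) > len(spectrum_y):
        spectrum_x, spectrum_y = spectrum_y, spectrum_x  # iterate the smaller one
    return sum(count * spectrum_y[kmer] for kmer, count in spectrum_x.items())


def similarity_matrix(samples, k):
    """Return sample names (sorted) and the matrix of pairwise kernel values."""
    names = sorted(samples)
    spectra = {name: spectrum(samples[name], k) for name in names}
    matrix = [[spectrum_kernel(spectra[a], spectra[b]) for b in names]
              for a in names]
    return names, matrix


if __name__ == "__main__":
    samples = {"a": ["ACGTAC"], "b": ["ACGTTT"]}  # toy samples sharing ACG and CGT
    print(similarity_matrix(samples, 3))
```

Only k-mers present in both samples contribute to an off-diagonal entry, so no reference genome is needed to compare two samples.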
44
Kernel-based distance matrix
For kernel distance formula: Schölkopf, B. (2000). The kernel trick for distances. In NIPS, pp. 301-307.
d(x, y)^2 = k(x, x) + k(y, y) – 2 k(x, y)
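The formula turns a similarity (kernel) matrix into a distance matrix. A short sketch, where `kernel_distance_matrix` is a hypothetical helper and the 2x2 input is a made-up toy matrix:

```python
import math


def kernel_distance_matrix(kernel):
    """d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)) for every pair of samples."""
    n = len(kernel)
    return [
        # max(..., 0.0) guards against tiny negative values from rounding.
        [math.sqrt(max(kernel[i][i] + kernel[j][j] - 2 * kernel[i][j], 0.0))
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy_kernel = [[4, 2], [2, 4]]
    print(kernel_distance_matrix(toy_kernel))
```

For the toy matrix, d^2 = 4 + 4 – 2*2 = 4, so the two samples are at distance 2 and each sample is at distance 0 from itself.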
45
Tree
Towers are outliers and their placement may not be accurate.
46
Similarity between patient samples & tower samples
towers/002-1 towers/006-1 towers/010-1
pat/ID120206 11187 12528 11329
pat/ID120368 11168 12513 11315
pat/ID120369 11282 12617 11427
pat/ID120370 11272 12613 11421
pat/ID120371 11289 12621 11434
pat/ID120713 11225 12566 11368
pat/KID119442 11092 12445 11239
pat/KID119444 11097 12449 11244
pat/KID119445 11117 12468 11261
pat/KID119536 11138 12488 11287
pat/KID119537 11175 12518 11321
pat/KID119788 11193 12536 11336
pat/KID119957 11092 12445 11239
pat/KID119958 11144 12494 11292
pat/KID119960 11265 12602 11408
pat/KID120069 11089 12442 11236
pat/KID120070 11154 12501 11299
pat/KID120071 11116 12467 11261
pat/KID120111 11219 12559 11365
pat/KID120112 11172 12518 11319
pat/KID120113 11357 12686 11497
pat/KID120114 11235 12577 11381
Smallest distance
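Picking the smallest value in each row can be sketched as below, using three rows copied from the table. This assumes the entries are distances (as the "Smallest distance" note suggests, although the slide title says similarity); `closest_tower` is a hypothetical helper.

```python
TOWERS = ["towers/002-1", "towers/006-1", "towers/010-1"]

# Three rows copied from the table above (patient -> value per tower).
ROWS = {
    "pat/ID120206": [11187, 12528, 11329],
    "pat/ID120368": [11168, 12513, 11315],
    "pat/KID120113": [11357, 12686, 11497],
}


def closest_tower(rows, towers):
    """For each patient sample, name the tower with the smallest value."""
    return {patient: towers[values.index(min(values))]
            for patient, values in rows.items()}


if __name__ == "__main__":
    for patient, tower in closest_tower(ROWS, TOWERS).items():
        print(patient, tower)
```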
47
Interactive visualization
48
Visualizing a microbiota with nucleic acid probes
Figure 2, Handelsman (2004) Microbiology and Molecular Biology Reviews 68 (4), 669-685.
49
Observation
● Visualization is important to reach out to the general public
● People like beautiful things
50
Structural metagenomics visualization
● Ray Cloud Browser
● Project started to debug genome assembly code
● http://genome.ulaval.ca:10208/client/
● All you need is a modern web browser
51
Ray Cloud Browser: interactively skim processed genomics data with energy
Frontend: Javascript, canvas
Backend: C++
https://github.com/sebhtml/Ray-Cloud-Browser
52
Computing DNA layout for display
Barnes-Hut algorithm: Barnes, J. and P. Hut (1986, December). A hierarchical O(N log N) force-calculation algorithm. Nature 324 (6096), 446-449.
53
Evolution path: linear -> bubble -> hairy bubble -> super bubble
Onodera, T., K. Sadakane, and T. Shibuya (2013). Detecting superbubbles in assembly graphs. In A. Darling and J. Stoye (Eds.), Algorithms in Bioinformatics, Volume 8126 of Lecture Notes in Computer Science, pp. 338-348. Springer Berlin Heidelberg.
Hairy bubbles
54
Interactive too
55
Bird's view
56
Lumps
Howe, A. C., J. Pell, R. Canino-Koning, R. Mackelprang, S. Tringe, J. Jansson, J. M. Tiedje, and C. T. Brown (2012, December). Illumina sequencing artifacts revealed by connectivity analysis of metagenomic datasets.
http://dskernel.blogspot.ca/2013/01/metagenome-lumps-artifactual-mutations.html
58
Lumps
59
Lumps
60
SRS011134
● Demo (2 min): http://genome.ulaval.ca:10208/client/
● Genomic DNA from stool of a male
● http://sra.dnanexus.com/samples/SRS011134
62
Futures
● Genomics needs more scalable & parallel software
● More parallel
● More push-button
● Robustness
● K-mer-based (paper: realtime kmers)
64
Acknowledgements
● Invitation: Nicholas J. Loman, University of Birmingham
● Arrangements: Lesley Parsons, University of Liverpool
65
Acknowledgements
● Funding: Canadian Institutes of Health Research (doctoral award)
● Compute time: Compute Canada & Calcul Québec (colosse and Mammouth Parallèle II)
● Jacques Corbeil (director) & François Laviolette (codirector)
66
Acknowledgements
● Jean-François Erdelyi (from France) for working on Ray Cloud Browser during the 2013 summer
67
Acknowledgements
● E. Godzaridis for comments and suggestions on my talk
68
Questions
● Don't forget to tweet!
● @sebhtml
● #BeatlesAndBioinformatics