genomic futures v_pitt_kent_osu
TRANSCRIPT
Ben Busby, Ph.D.
Lead, Bioinformatics Training, NCBI
Chair, Department of Bioinformatics and Data Science, [email protected]
Exploring the Many Possible Futures of Genomics
Efficiently Leveraging Commercial and Open Source Bioinformatics Tools for Clinical Interventions and Research Discoveries from Very Large Datasets
Please note that all views are my own and not necessarily those of any Federal agency.
No mention of any commercial or non-profit entity should be considered an endorsement.
Slides available at: https://www.slideshare.net/benbusby/genomic-futures-v2
Exploring the Many Possible Futures of Genomics
• Human Genomic Variants – Chronic Disease and Cancer
• Viruses – Zika
• Bacteria – Food Borne Pathogens
• Data Transfer and Storage
NCBI
Review of terminology and concepts
Next Generation Sequencing
Graphic Credit: Spencer Martin, UBC
Review of terminology and concepts
How Genomes are Mapped and Assembled
© Martine Zilversmit 2013
http://1.usa.gov/1J1xmYs
NCBI NGS Online Workshop – Available on the NCBI YouTube Channel!
dbGaP
dbGaP – Subjects by Year
• 2007: 14,201
• 2008: 53,216
• 2009: 139,311
• 2010: 374,464
• 2011: 485,727
• 2012: 566,181
• 2013: 660,665
• 2014: 876,849
• 2015: 1,002,935
dbGaP – GWAS and PheGenI
dbGaP – ClinVar
ClinVar
ClinVar – Why Should we Care?
Translation to the Clinic
Combined score is the average of structural variants (SVs), mappability, GC, ...
NCBI region list
Encode blacklist
DangerTrack!
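A minimal sketch of the "combined score is the average" idea from this slide, assuming each track (SVs, mappability, GC) has already been binned into the same genomic windows and scaled to a common 0-to-1 range. The function name, track names, and numbers are hypothetical; this is an illustration, not the DangerTrack implementation.

```python
from statistics import mean

def combined_score(tracks):
    """Average several per-bin score tracks into one combined track.

    `tracks` maps a track name to a list of per-bin scores; every list
    must have the same length (one entry per genomic bin).
    """
    lengths = {len(scores) for scores in tracks.values()}
    if len(lengths) != 1:
        raise ValueError("all tracks must cover the same number of bins")
    # zip(*...) walks the tracks bin by bin; mean() averages across tracks.
    return [mean(bin_scores) for bin_scores in zip(*tracks.values())]

# Hypothetical scores for three bins, already scaled to 0..1.
example_tracks = {
    "sv_density": [0.0, 0.5, 1.0],
    "mappability": [0.0, 0.5, 0.5],
    "gc": [0.0, 0.5, 0.0],
}
scores = combined_score(example_tracks)
```

Bins with a high combined score would be the ones to treat with caution when calling variants; how each input track is scaled is a separate choice that this sketch leaves open.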
Genome in a Bottle
We’ve run 9 hackathons over the past two years.
We will run 7 or 8 this year.
Matching Expressed Variants in-memory; Analyzing with Graphs
https://f1000research.com/articles/5-674/v1
phenvar.colorado.edu
Variants are Often Pleiotropic
Translation to the Clinic
Data Science Training!
Carpentries, MOOCs, Semi-Traditional Coursework and Mentoring
NCBI Webinars
Viral Genomes
Virus Variation
Subscribe!
EMRs and NLP
Food Borne Pathogens
• Escherichia and Shigella
• Campylobacter
• Acinetobacter
• Salmonella
• Klebsiella
• Listeria
Investigation of NGS: MagicBLAST!
Extracting Pathogenic Information from Metagenomes
Popular Metagenomics Tools!
• QIIME
• mothur
• Nephele
• MetAMOS
• MetaViz
• mash
Investigation of NGS (esp. metagenomes): SRA BLAST!
Another example – Cas9
Immunogenic Peptides
Where to Get More Information!
Upcoming Hackathons
• March 20-22, NIH Campus
• May 22-24, UC BioFrontiers Institute, Boulder, CO
• June 19-21, NYGC
• August 17-19, NIH Campus
• September 25-27, Pittsburgh, PA
• October 11-13, Microbial and Metagenomics, NCBI (with ASM and CDC)
Intensive Internships for Grad Students, Postdocs and Clinicians at NCBI
Come work at NCBI for 4-6 weeks!
Email [email protected] for more information!
My View of Data Transfer Principles
• Metadata Search
  • Rapid NoSQL (for now)
  • Integration
  • Non-ambiguous identifiers
• Transferring Small Amounts of Data
  • Data still gets transferred in the cloud
  • Underlying structure
  • Finding specific data from validated formats
• Democratization of Data
  • Rapid comparison by domain experts
• Reporting
  • Metrics to report data upload and [unique IP] download of datasets
  • Post-publication User Review
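As one illustration of the reporting metric above, here is a minimal sketch of counting unique-IP downloads per dataset. It assumes download events are available as (dataset ID, client IP) pairs; the function name, record layout, and sample values are hypothetical and not taken from any NCBI system.

```python
from collections import defaultdict

def unique_ip_downloads(log_records):
    """Count distinct client IPs per dataset from (dataset_id, ip) records."""
    ips_by_dataset = defaultdict(set)
    for dataset_id, ip in log_records:
        # A set keeps each IP once, so repeat downloads are not double counted.
        ips_by_dataset[dataset_id].add(ip)
    return {dataset_id: len(ips) for dataset_id, ips in ips_by_dataset.items()}

# Hypothetical download log using documentation-reserved IP addresses.
records = [
    ("dataset_A", "192.0.2.1"),
    ("dataset_A", "192.0.2.1"),  # repeat download from the same IP
    ("dataset_A", "192.0.2.7"),
    ("dataset_B", "198.51.100.4"),
]
counts = unique_ip_downloads(records)
```

Counting by unique IP rather than by raw request damps the effect of a single client fetching the same dataset many times, though shared or changing addresses still make it an approximation.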
Websearch!
EDirect (Search API) Cookbook
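EDirect is a command-line front end to the NCBI E-utilities web API. As a rough sketch of the kind of search it performs, the snippet below builds an ESearch request URL and reads the ID list out of a JSON response. The helper names, the query, and the sample response (with made-up IDs) are assumptions for illustration; no network call is made here.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for an Entrez database and query term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)

def parse_id_list(response_text):
    """Return the list of record IDs from an ESearch JSON response body."""
    return json.loads(response_text)["esearchresult"]["idlist"]

# Hypothetical query: ClinVar records for one gene.
url = build_esearch_url("clinvar", "BRCA1[gene]", retmax=5)

# Made-up response body with the same shape ESearch returns for retmode=json.
sample_response = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
ids = parse_id_list(sample_response)
```

Fetching `url` with any HTTP client would return the real response; the returned IDs can then be passed to other E-utilities calls to retrieve the records.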
New APIs!
Variable Storage (and collaboration)!
Federated Datasets