x team 2 - presentation
High-dimensional biological data
● High-throughput genotyping and phenotyping
● Finding biological meaning in big data with high N and/or P
The ability to harvest the wealth of information contained in biomedical Big Data will advance our understanding of human health and disease; however, lack of appropriate tools, poor data accessibility, and insufficient training, are major impediments to rapid translational impact. -NIH BD2K
Data integration
● Data fragmentation
  ○ individual vs population
  ○ multiple -omics
  ○ multiple sources
● Discovery and prediction
  ○ genome and functional annotation
Statistical learning methods
● Data quality
  ○ hidden sources of variability
  ○ limitations of short read sequencing
Data annotation
Genome assembly/error correction
Problem Solution
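To make the data integration idea concrete, here is a minimal sketch of joining records from multiple sources per individual. The sample IDs, the marker name `rs0001`, the gene name `geneX`, and the function `integrate` are all hypothetical and chosen only for illustration; real -omics integration involves far more than a keyed merge.

```python
from collections import defaultdict


def integrate(sources):
    """Merge per-sample records from several named sources into one record per sample."""
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for sample_id, fields in records.items():
            for key, value in fields.items():
                # Prefix each field with its source name so field names never collide.
                merged[sample_id][f"{source_name}.{key}"] = value
    return dict(merged)


# Hypothetical toy data: two -omics layers keyed by sample ID.
genotypes = {"S1": {"rs0001": "AA"}, "S2": {"rs0001": "AG"}}
expression = {"S1": {"geneX": 5.2}, "S3": {"geneX": 1.1}}

integrated = integrate({"geno": genotypes, "expr": expression})

# Keep only samples that appear in both sources.
complete = {
    sample: record
    for sample, record in integrated.items()
    if len({key.split(".")[0] for key in record}) == 2
}
```

Only sample `S1` is present in both toy sources, so it is the only entry in `complete`. Samples seen in a single source stay in `integrated`, which makes the fragmentation visible instead of silently dropping it.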
Success Stories
Domain Science - Data Science Methods
Metabolic pathway - Ingenuity Pathway Analysis (http://www.ingenuity.com/products/ipa)
Genomic data
- Quality control
  - FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
  - EasyQC for genome-wide association meta-analyses (http://www.nature.com/nprot/journal/v9/n5/full/nprot.2014.071.html)
- Batch effect
  - PEER (http://www.ncbi.nlm.nih.gov/pubmed/22343431)
  - SVA (http://www.ncbi.nlm.nih.gov/pubmed/22257669)
  - scLVM (Buettner et al., 2015)
- Data storage and sharing
  - NCBI (http://www.ncbi.nlm.nih.gov)
  - GitHub (https://github.com)
  - UCSC genome browser (http://genome.ucsc.edu/)
- Gene annotation
  - Gene Ontology (http://geneontology.org/page/documentation)
Proteomics - Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/home.do)
Disease Survivability - WEKA (Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten (2009); The WEKA Data Mining Software: An Update; SIGKDD Explorations, Volume 11, Issue 1.)
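As a rough illustration of what a batch effect correction does, the sketch below subtracts each batch's mean from one feature's measurements. This is not the method used by PEER, SVA, or scLVM, which estimate hidden factors from the data; it is a much simpler stand-in that assumes batch labels are already known. The function name `center_by_batch` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean from its measurements (single feature)."""
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    # One mean per known batch label.
    batch_mean = {batch: mean(vals) for batch, vals in by_batch.items()}
    return [value - batch_mean[batch] for value, batch in zip(values, batches)]


# Hypothetical toy data: batch B reads about 10 units higher than batch A.
expr = [10.0, 12.0, 20.0, 22.0]
batch = ["A", "A", "B", "B"]

corrected = center_by_batch(expr, batch)
```

After centering, the two batches sit on the same scale. Note that if batch is confounded with the biological variable of interest, this kind of centering removes the real signal along with the technical one.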
Interdisciplinary data science essentials
● Interdisciplinary research
● Statistics
● Domain science
● Computer science
● Scientific writing
● Collaboration
● Visualization of data
● Database
● Bioinformatics
Going Forward
● Create and maintain a HowTo website for Data Science computational tools and methods: http://data-science-for-biologists.wikia.com/wiki/Data_Science_for_Biologists_Wikia
● Collaborate via GitHub