Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud
Xu Xing's talk at ISCB-Asia, "EasyGenomics – Next Generation Bioinformatics on the Cloud", December 17th, 2012
![Page 1: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/1.jpg)
Contact [email protected]
http://www.easygenomics.com
Next Generation Bioinformatics on the Cloud
Xing Xu, Ph.D.
Director of Cloud Computing Product
![Page 2: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/2.jpg)
Topics for Today
Behind the cloud product
- BGI
- The team
The product: EasyGenomics
- Why are we building this product?
- What can this product do?
Future direction and open questions
![Page 3: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/3.jpg)
BGI
The world's largest genome sequencing center
- Started with the Human Genome Project in 1999 with only a few sequencers.
- Now more than 150 sequencers, with 6 TB/day sequencing throughput.
| Model | Installations |
|---|---|
| ABI 3730XL | 16 |
| Roche 454 | 1 |
| ABI SOLiD 4 | 27 |
| Solexa GA IIx | 6 |
| Illumina HiSeq 2000 | 135 |
![Page 4: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/4.jpg)
BGI
The world's largest genome sequencing center
The largest computing and storage center for genomics in China
- 20,000+ CPU cores
- 19 NVIDIA GPUs
- 220+ Tflops peak performance
- 17 PB data storage
- Storage and computation capability have increased 10,000-fold!
- Still increasing …
![Page 5: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/5.jpg)
BGI
The world's largest genome sequencing center
The largest computing and storage center for genomics in China
One of the world's leading research institutes in genomics
Since 2007:
- 253 papers in high-impact journals, including 47 in Nature and its sub-journals, 9 in Science, 2 in Cell, and 1 in NEJM, with 42 first and/or corresponding authors
- 369 patent applications
- 254 software authorships
![Page 6: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/6.jpg)
BGI
The world's largest genome sequencing center
The largest computing and storage center for genomics in China
One of the world's leading research institutes in genomics
BGI has the sequencing capacity, hardware resources, and software proficiency to be one of the strongest end-to-end service providers in the world for NGS sequencing, data analysis, and data interpretation.
![Page 7: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/7.jpg)
Team for the Cloud Platform
Run like a software company
Managers are from leading technology companies, such as HP, Microsoft, and Lenovo.
Team members are young, energetic, and ambitious.
Fully supported by BGI in-house algorithm development teams.
Product
Development
Testing
Operation
BGI Support
![Page 8: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/8.jpg)
Team for the Cloud Platform
Development Team
- Dev: Ming Jiang, Yongsheng Chen, Can Long, Jiasheng Wu, etc.
- Flex Lab: Yan Li, Shengchang Gu, etc.
- GPU Lab: Bingqiang Wang, etc.
- Pipeline: Liang Wang, etc.
Test & QA Team
- Xin Guan, Jingjuan Liu, etc.
PMO & IT Operation
- Wenjun Zeng, Litong Lai, Jing Tian, etc.
Product Team
- Xing Xu, Jing Guo, Fang Fang, etc.
Other BGI Teams
![Page 9: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/9.jpg)
Topics for Today
Behind the cloud product
- BGI
- The team
The product: EasyGenomics
- Why are we building this product?
- What can this product do?
Future direction and open questions
![Page 10: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/10.jpg)
Trend of Volume and Cost
![Page 11: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/11.jpg)
Geographical side of the problem
Sequencing happens EVERYWHERE.
Images from omicsmaps.com
BGI
![Page 12: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/12.jpg)
Difficulties of Analysis
| Stage | Task | Difficulties |
|---|---|---|
| Primary analysis | Base calling | Data throughput, data storage |
| Secondary analysis | Mapping | Computation intensive, data storage |
| Tertiary analysis | Variant calling | Complicated algorithms, computation intensive |
| Post-tertiary analysis | In-depth annotation | Lack of knowledge |
![Page 13: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/13.jpg)
Problems and Solutions
Problems:
• Big genomic data
• Geographical distribution
• Algorithm integration
• Computational demand
Solutions:
• Cloud
• High Speed Data Exchange
• Pipelines
• Distributed Workloads
![Page 14: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/14.jpg)
EasyGenomics™
EasyGenomics is a Software as a Service (SaaS) bioinformatics platform for research and applications.
- Algorithms, workflows, reports
- Computational resources
- Database, data management
- Web portal, simple UI
- High speed connection
![Page 15: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/15.jpg)
Key Features
- Bioinformatics Workflows
- Data Management
- High Speed Connection
![Page 16: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/16.jpg)
Bioinformatics Workflow
Four steps: Upload, Create a Sample, Perform Analyses, Download Results
Algorithms: Carefully chosen, tested and optimized
Workflows: Whole Genome Resequencing, Exome Resequencing, RNA-Seq, small RNA, ncRNA, and De novo Assembly
![Page 17: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/17.jpg)
Homepage
Four task portals
Status of recent works
Warning and Logging
Navigation Tabs
![Page 18: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/18.jpg)
Bioinformatics Workflow--- Pipelines
Exome Resequencing
RNA-Seq Transcriptome
![Page 19: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/19.jpg)
Bioinformatics Workflow---Comprehensive Reports
![Page 20: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/20.jpg)
Bioinformatics Workflow---Comprehensive Reports
![Page 21: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/21.jpg)
Data Management
- “Sample”, “Analysis”, “Project”
- Mimicking real research procedure
- Automatic management of underlying data structure
Raw Data
Sample A
Sample B
Analysis I
Analysis II
Analysis X
Project I
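The Sample / Analysis / Project hierarchy on this slide can be pictured with a small data model. This is only a sketch under stated assumptions: the class fields, workflow identifiers, parameter names, and file names below are hypothetical and are not taken from EasyGenomics itself, and the wiring of analyses to samples is a guess at what the diagram shows.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """A sample groups the raw data files uploaded for it."""
    name: str
    raw_files: List[str] = field(default_factory=list)  # hypothetical file names


@dataclass
class Analysis:
    """One run of a workflow on one sample with one parameter set."""
    name: str
    workflow: str  # hypothetical workflow identifier
    sample: Sample
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Project:
    """A project collects analyses, mirroring a real research project."""
    name: str
    analyses: List[Analysis] = field(default_factory=list)

    def samples(self) -> List[str]:
        """Distinct sample names used by the analyses, in first-use order."""
        seen: Dict[str, None] = {}
        for analysis in self.analyses:
            seen.setdefault(analysis.sample.name, None)
        return list(seen)


sample_a = Sample("Sample A", raw_files=["lane1.fq", "lane2.fq"])
sample_b = Sample("Sample B", raw_files=["lane3.fq"])
project = Project("Project I", analyses=[
    Analysis("Analysis I", "exome_resequencing", sample_a),
    # Same sample reused with a different (hypothetical) parameter
    Analysis("Analysis II", "exome_resequencing", sample_a, {"min_quality": "30"}),
    Analysis("Analysis X", "rna_seq", sample_b),
])
print(project.samples())
```

Keeping the sample separate from the analysis is what lets one sample be reused across several analyses without uploading the raw data again.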
![Page 22: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/22.jpg)
Create a Sample
Add read groups
![Page 23: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/23.jpg)
Sample Page
Individual report for each lane
Summarized report for all lanes
![Page 24: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/24.jpg)
Data management---Security
Access
Multi-tenancy
Isolation
Compliance
• Username/Password
• Biometric access
• HTTPS, Aspera fasp™
• Trusted database connection
• ACL, Data encryption
• Physical isolation
• Virtual isolation
• ISO 27000
![Page 25: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/25.jpg)
High Speed Data Exchange
Aspera’s patented fasp™ high-speed file transfer technology
10 to 100 times faster than FTP
![Page 26: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/26.jpg)
Transfer 24GB in 30 Seconds
Demonstrated 10 Gbps ultra-high-speed data exchange with UC Davis and NCBI in June.
![Page 27: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/27.jpg)
A 24 GB file was transferred from China to the US in 30 seconds (~8 Gbit/s).
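As a rough check on these figures, the average rate can be worked out from the file size and the elapsed time. The sketch below assumes decimal units (1 GB = 8 Gbit, 1 TB = 1000 GB); the function names are illustrative only. Under those assumptions, 24 GB in 30 seconds averages 6.4 Gbit/s, somewhat below the quoted ~8 Gbit/s, which may refer to a peak rate or a slightly different measurement.

```python
def average_throughput_gbit_s(size_gb: float, seconds: float) -> float:
    """Average rate in Gbit/s for moving size_gb decimal gigabytes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_gb * 8 / seconds


def transferable_tb_per_day(rate_gbit_s: float) -> float:
    """Decimal terabytes a sustained rate could move in 24 hours."""
    seconds_per_day = 24 * 60 * 60
    gigabytes = rate_gbit_s / 8 * seconds_per_day
    return gigabytes / 1000


rate = average_throughput_gbit_s(24, 30)
print(f"{rate:.1f} Gbit/s average")
print(f"{transferable_tb_per_day(rate):.1f} TB in 24 h at that rate")
print(f"{transferable_tb_per_day(10):.1f} TB in 24 h on a saturated 10 Gbit/s link")
```

The second helper applies the same arithmetic to a full day, which is the quantity the following slide plots.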
![Page 28: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/28.jpg)
Amount of Data that can be transferred in 24hr
![Page 29: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/29.jpg)
Easy-to-Use UI
Reusability
- Reuse the same sample for different analyses (different parameters)
- Reuse all parameter settings for different analyses
Simple UI and interactive features
- As easy as online shopping
- Shortcuts for predefined settings, while remaining fully customizable for advanced users
- Handle batch analyses in one setting
![Page 30: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/30.jpg)
Create an Analysis
Selected sample(s)
• One selected sample => Single Analysis
• Multiple selected samples => Batch Analyses
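The single versus batch rule can be sketched as a small function. This is a minimal illustration, not the product's API: the function name, the request fields, and the assumption that a batch creates one analysis per selected sample with a shared parameter set are all hypothetical.

```python
from typing import Dict, List


def create_analyses(samples: List[str], workflow: str,
                    parameters: Dict[str, str]) -> List[dict]:
    """Build one analysis request per selected sample, sharing one parameter set."""
    if not samples:
        raise ValueError("select at least one sample")
    # One selected sample gives a single analysis; several give a batch.
    mode = "single" if len(samples) == 1 else "batch"
    return [
        {
            "sample": sample,
            "workflow": workflow,
            "mode": mode,
            # Copy so that later edits to one request do not affect the others
            "parameters": dict(parameters),
        }
        for sample in samples
    ]


single = create_analyses(["Sample A"], "exome_resequencing", {"preset": "default"})
batch = create_analyses(["Sample A", "Sample B"], "exome_resequencing",
                        {"preset": "default"})
print(single[0]["mode"], len(batch), batch[0]["mode"])
```

Reusing one parameter dictionary for every request reflects the reusability point from the earlier slide: settings are entered once and applied to each sample.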
![Page 31: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/31.jpg)
Create an Analysis
Selectable modules
Predefined Settings
Shortcut
![Page 32: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/32.jpg)
Create an Analysis
![Page 33: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/33.jpg)
Create an Analysis
Customizable
![Page 34: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/34.jpg)
Create an Analysis
![Page 35: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/35.jpg)
Project Table
Add/Remove Project
Operation shortcuts
Project list table
Filter and search box
![Page 36: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/36.jpg)
Analysis Table
![Page 37: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/37.jpg)
Sample Table
![Page 38: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/38.jpg)
A typical use case
![Page 39: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/39.jpg)
Topics for Today
Behind the cloud product
- BGI
- The team
The product: EasyGenomics
- Why are we building this product?
- What can this product do?
Future direction and open questions
![Page 40: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/40.jpg)
Future directions
What is the market? Which direction to go?
- Cloud on public infrastructure vs. cloud on private infrastructure
- SaaS vs. PaaS
- Data analysis is only one step of the whole process.
- What will be the sustainable model for the cloud service?
![Page 41: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/41.jpg)
Market Position
Cloud Service Providers
Annotation Providers
Sequencing Service Providers
Instrument Manufacturers
Personal Genetic Testing Providers
Illumina
Software Providers
NOW
![Page 42: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/42.jpg)
Challenge and Solution
| | DNAnexus | BaseSpace (Illumina) | GenomeSpace | EasyGenomics | Ingenuity / NextBio |
|---|---|---|---|---|---|
| Cloud | Public | Public | Public | Private | Private |
| Reasoning | Great demand on space and computation resources | Great demand on space and computation resources | Great demand on space and computation resources | Security and privacy issues | Security and privacy issues |
| Positioning | Infrastructure (PaaS) | App store | Platform for accessing available tools | SaaS solution | Information. They work with the results from NGS, not the raw reads. |
| Advantage | Funding; advance in the field | Sequencing service; community of partners | Strong connection to academia | Sequencing service; development capability | Experience |
![Page 43: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/43.jpg)
Public vs Private Cloud
Public Cloud
Pros:
− “Limitless” resources
− Share data with a wide range of people
− Offers a nice platform
Cons:
− Security and reliability
− Short-term cost saving vs. long-term cost nightmare
Private Cloud
Pros:
− Flexibility
− Security and privacy control
− Long-term cost saving
Cons:
− Big initial investment
− Maintaining the infrastructure and software on the cloud
But the line between public and private cloud is blurring.
![Page 44: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/44.jpg)
A sustainable model for cloud service?
Key components of cost
- Storage
- Computational resource
- Data transfer
- Software usage
App store or cell phone plan
Long-term cost vs. short-term cost
![Page 45: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/45.jpg)
Data analysis is NOT ALL!
EPM
Project Management
Sample Center
Wet Lab Operation
Bioinformatics Data Analysis
EPM
Management System
Budgeting
Tasking
Receipt/Storage
Handover
Sample QC
Sample prep
Workflow
Sequencing
Data analysis
Data QC
Sales
Billing
Web-based Interface
Management Interfacing Query Statistics
![Page 46: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/46.jpg)
Roadmap of EasyGenomics
EG1.1 (Jun 2012)
• New result reports
• Fully integrated data exchange interface
EG1.2 (Aug 2012)
• New read filtering step, 20x speedup
EG1.3 (Sep 2012)
• Data import from BGI sequencing service
EG1.5 (est. Dec 2012)
• QC indicator, QC module
• New sample report
• Transcriptome workflows
• Reference management
EG2.0 (est. Apr 2013)
• iRODS data management
• Data sharing, collaboration
• Users' own applications
• Comparison and filtering tools
• Visualization
![Page 47: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/47.jpg)
www.EasyGenomics.com
Free beta trial is ongoing!
![Page 48: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/48.jpg)
Interpretation is the KEY
Analysis and Interpretation is the KEY
![Page 49: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/49.jpg)
Enabling Technology
Best Practice Award for IT Infrastructure
| Human Genome | SOAPdenovo | EasyGenomics™ (192 cores) |
|---|---|---|
| Genome coverage | 86% | 86% |
| Assembly time | 70 h | 55 h |
| No. of servers | 1 | 15 |
| Memory size | 500 GB x 1 | 24 GB x 15 |
| Mode | Centralized | Distributed |
Hadoop-based Flexible Computing
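The comparison above can be reduced to two derived numbers: the speedup in assembly time and the total memory each setup needs. The sketch below reads the slide as SOAPdenovo on one 500 GB server versus the Hadoop-based setup on 15 servers with 24 GB each; the dictionary keys and helper name are illustrative only.

```python
centralized = {"hours": 70, "servers": 1, "gb_per_server": 500}   # SOAPdenovo
distributed = {"hours": 55, "servers": 15, "gb_per_server": 24}   # Hadoop-based


def total_memory_gb(config: dict) -> int:
    """Aggregate memory across all servers in a configuration."""
    return config["servers"] * config["gb_per_server"]


speedup = centralized["hours"] / distributed["hours"]
time_saved = 1 - distributed["hours"] / centralized["hours"]

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.0%}")
print(f"memory: {total_memory_gb(centralized)} GB on one large server vs "
      f"{total_memory_gb(distributed)} GB spread over {distributed['servers']} servers")
```

On these numbers the distributed run is about 1.27 times faster at the same 86% coverage, and it replaces a single 500 GB machine with 360 GB spread across commodity-sized nodes.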
![Page 50: Xu Xing: EasyGenomics – Next Generation Bioinformatics on the Cloud](https://reader033.vdocuments.net/reader033/viewer/2022061300/54c8ae2c4a7959da358b456d/html5/thumbnails/50.jpg)
Enabling Technology
SOAP Hadoop (Gaea)
GPU