Sequence Services Phase 2--Eagle Genomics and Cycle Computing
DESCRIPTION
William Spooner (Eagle) and Carl Chesal (Cycle) introduce the proof of concept provided by this consortium for Phase 2 of the Pistoia Alliance Sequence Services project. The presentation was delivered at the Pistoia Alliance Conference in Boston, MA, on April 24, 2012.
TRANSCRIPT
![Page 1: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/1.jpg)
Sequence Services Phase 2
Pistoia Alliance AGM, Boston MA, April 24th 2012
![Page 2: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/2.jpg)
Nurture: Build trust, shared language
Collaborate: Enterprise, Academia, Government, Foundations
Open Innovation
Explore: Work together to find a common purpose
Exploit: Turn ideas into tangible benefits
2/ ElasticAP, Pistoia Alliance Conference, Boston MA, 24th April 2012
![Page 3: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/3.jpg)
The Requirements
Share
FUNCTIONAL
Login and workspace
Manage users
Manage data
Upload private data
Access public data
Export
Delete/archive
Manage applications
Upload scripts/pipelines
Analyse data
Monitor use/performance
NON-FUNCTIONAL
Charging Model
Service Support
Operational Requirements
Security Requirements
![Page 4: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/4.jpg)
The Partnership
| | Cycle Computing | Eagle Genomics |
|---|---|---|
| Established | 2005 | 2008 |
| Domain | High performance computing | Operational bioinformatics |
| Employees | 18 (16 engineers) | 12 (9 engineers), plus a pool of external consultants |
| Location | Across USA/Canada | Cambridge, UK |
| Sectors | Pharmaceutical, biotechnology, financial, computer gaming, engineering, academia | Pharmaceutical, biotechnology, agri-biotechnology, consumer goods, food, other life sciences |
| Customers | North America, Europe | North America, Europe, Asia |
| Partnerships | Schrodinger, VMWare, Canonical | Amazon Web Services, Cognizant, European Bioinformatics Institute, University of Manchester, John Innes Centre |
![Page 5: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/5.jpg)
The Platform
The platform for storage, analysis and sharing of life sciences data in the cloud
![Page 6: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/6.jpg)
The Proposal
[Workflow diagram. Steps shown between Start and Stop: Upload, Stored data, Analyses (pipeline process, manual process), Stored data, Share. Roles shown: Depositor, Collaborator, Bioinformatician, CIO, Biologist.]
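The upload, analyse and share steps on this slide can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Workspace` and `Dataset` classes, their method names and the owner-or-shared access rule are hypothetical and are not taken from ElasticAP.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One stored data item (hypothetical model, not the ElasticAP schema)."""
    name: str
    owner: str
    shared_with: set = field(default_factory=set)


class Workspace:
    """Hypothetical in-memory stand-in for one customer's stored data."""

    def __init__(self):
        self.datasets = {}

    def upload(self, owner, name):
        # Upload step: the uploader becomes the owner of the stored data.
        self.datasets[name] = Dataset(name, owner)
        return self.datasets[name]

    def can_read(self, user, name):
        # Assumed rule: readable by the owner and by anyone it was shared with.
        ds = self.datasets[name]
        return user == ds.owner or user in ds.shared_with

    def analyse(self, user, source_name, result_name):
        # Analysis step: a pipeline or manual process would run here;
        # this sketch only records that a result was stored.
        if not self.can_read(user, source_name):
            raise PermissionError(f"{user} cannot read {source_name}")
        return self.upload(user, result_name)

    def share(self, owner, name, collaborator):
        # Share step: only the owner may grant access.
        ds = self.datasets[name]
        if ds.owner != owner:
            raise PermissionError(f"{owner} does not own {name}")
        ds.shared_with.add(collaborator)
```

A depositor would call `upload`, share the data with a bioinformatician, who calls `analyse` and then shares the result with a collaborator.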
![Page 7: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/7.jpg)
The Architecture
[Architecture diagram. Components shown: Amazon EC2 Cloud; Gateway (Shibboleth); Web Server (CycleCloud); Web Server (SEEK); MySQL Assets DB; OpenAM IdP; Customer Single Sign On; SAML Token Exchange; SAML Authenticate; HTTPS Web connections; Encrypt/Decrypt; S3 Storage holding Data Files in Customer Sandboxes; EC2/AMIs in Customer Sandboxes with Condor, Ensembl and BioLinux. Users shown: Bioinformatician, Collaborator, Depositor.]
![Page 8: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/8.jpg)
The Present
Bioinformatician
Depositor
Collaborator
![Page 9: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/9.jpg)
The 1000 Genomes
• A Deep Catalogue of Human Variation
– Freely available on AWS
– 1,700 individuals
– 200 TB data
– 10,000s data files
– Almost no metadata!
• ElasticAP evaluating 1000 Genomes Project Pilot 2
– 20X resequencing
– 2 trios (6 individuals)
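To give a feel for what 20X resequencing of six individuals means in data volume, the standard relation coverage = reads × read length / genome size can be turned into a rough read count. This is a back-of-envelope sketch: the genome size of about 3.1 Gbp and the 100 bp read length are assumptions chosen for illustration, not figures from the slide.

```python
def reads_for_coverage(coverage, genome_size_bp, read_length_bp):
    """Estimate reads needed, from coverage = reads * read_length / genome_size."""
    total_bases = coverage * genome_size_bp
    # Integer ceiling division, so a partial read counts as a whole read.
    return (total_bases + read_length_bp - 1) // read_length_bp


HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate human genome size
READ_LENGTH_BP = 100             # assumption: illustrative read length
INDIVIDUALS = 6                  # 2 trios, as stated on the slide

reads_per_individual = reads_for_coverage(20, HUMAN_GENOME_BP, READ_LENGTH_BP)
total_reads = reads_per_individual * INDIVIDUALS
print(f"{reads_per_individual:,} reads per individual, {total_reads:,} in total")
```

Under these assumptions the estimate is 620 million reads per individual; a different read length scales the count inversely.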
![Page 10: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/10.jpg)
TRUP: Tumor RNA-seq Unified Pipeline
• Collaboration between
– Max Planck Institute for Molecular Genetics
– Bayer Pharma AG
• Identifies gene fusion events in tumor samples
• Involves both alignment and de novo sequencing steps
• Pipeline is being implemented on ElasticAP
– Using public GEO datasets for validation
![Page 11: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/11.jpg)
The PoC
FUNCTIONAL
Login and workspace
Load data
Manage public data
Load scripts and pipelines
Analyse data
Export data
Archive data
Manage applications
Manage users
Monitor use/performance
NON-FUNCTIONAL
Charging Model
Service Support
Operational Requirements
Security Requirements
KEY
Fully implemented
Partially implemented
To-do list
![Page 12: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/12.jpg)
The Prior Art
• Eagle have been building analysis pipelines and hosting secure cloud apps for years.
• Cycle have been developing HPC solutions and deploying them on the cloud for years.
• We built this as a platform we could use ourselves in order to carry on delivering what we already do.
• But now the results are interactive, and everyone can share and participate.
• The most common tasks won’t need to involve us at all.
![Page 13: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/13.jpg)
The Price
• AWS-style pay-as-you-go business model
– Free sign-up and account creation.
– Tiered applications by the hour.
– Discounts for up-front reservation fee.
– Offline data import/export also available.
– Flat-rate data by the gigabyte-month.
– Backup data by the gigabyte-month.
– Monthly billing.
– Support contracts available.
• Customisation and new pipelines at Eagle/Cycle standard consulting rates.
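The pay-as-you-go model above combines hourly application charges with per-gigabyte-month storage and backup charges, billed monthly. A minimal sketch of that arithmetic follows. Every rate, tier name and the discount figure are hypothetical placeholders, since the slide gives no prices, and the up-front reservation fee itself is assumed to be billed separately.

```python
from dataclasses import dataclass

# All values below are hypothetical placeholders, not ElasticAP prices.
HOURLY_RATE_BY_TIER = {"small": 0.50, "medium": 1.00, "large": 2.00}  # USD per application-hour
STORAGE_RATE = 0.10       # USD per gigabyte-month of stored data
BACKUP_RATE = 0.05        # USD per gigabyte-month of backup data
RESERVED_DISCOUNT = 0.25  # assumed fractional discount on the hourly rate when reserved


@dataclass
class MonthlyUsage:
    tier: str
    application_hours: float
    stored_gb: float
    backup_gb: float
    reserved: bool = False  # True if an up-front reservation fee was paid


def monthly_bill(usage: MonthlyUsage) -> float:
    """Return the monthly charge in USD for the given usage."""
    rate = HOURLY_RATE_BY_TIER[usage.tier]
    if usage.reserved:
        rate *= 1 - RESERVED_DISCOUNT
    compute = rate * usage.application_hours
    storage = STORAGE_RATE * usage.stored_gb
    backup = BACKUP_RATE * usage.backup_gb
    return round(compute + storage + backup, 2)
```

For example, `monthly_bill(MonthlyUsage("medium", 100, 500, 500))` adds 100 hours of a medium-tier application to 500 GB each of stored and backup data; an account with no usage is billed nothing, matching the free sign-up.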
![Page 14: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/14.jpg)
The Plan
• Early access to preferred partner customers in July
– Talk to us now if you’d like to be part of that.
• Full production in September with all partial/to-do items implemented.
• Increased number of public datasets.
• Increased range of applications and pipelines.
• User interface improvements based on feedback from early access period.
![Page 15: Sequence Services Phase 2--Eagle Genomics and Cycle Computing](https://reader034.vdocuments.net/reader034/viewer/2022052523/5550693fb4c90524138b462f/html5/thumbnails/15.jpg)
The Potential
• Available as customisation projects:
– Conversions to other clouds.
– Conversions to run on in-house infrastructure.
• Truly secure and scalable R&D collaboration environment.
– Applicable to all sciences, not just genomics.
Change the way you do science