winter school in mathematical & computational biology 1-5...
TRANSCRIPT
The dawn of cloud‐native bioinformatics
HEALTH AND BIOSECURITY
Denis Bauer | PhD, hon A/Prof July 2019, UQ Bioinf Winter school
Winter School in Mathematical & Computational Biology, 1-5 July 2019
Cloud‐native bioinformatics | Denis C. Bauer | @allPowerde
• Workflows need to be reproducible/compliant.
• Data sizes increase for robust insights.
• Algorithms getting more complex/interdependent.
#FutureOfBioinf is collaborative
Sustainable in the Cloud
GT‐Scan2
What’s the next big thing in cloud?
VariantSpark
How to use clusters in the cloud?
Cloud
What is the cloud?
Overview
CSIRO: We are innovators and builders
• Invented WiFi, used in five billion devices globally.
• Developed the vaccine for the Hendra Virus.
• Developed the Total Wellbeing & Low-Carb Diets.
Credit https://toolstotal.com/
‘Improve health care through digital technology and services.’
Transformational Bioinformatics
Bringing BigData technology to Life Science Research
CSIRO’s cloud-based solutions
Three focus areas of Genomic health
Finding Disease Genes
Correcting Genomes
Treating Individuals
By 2025 it is estimated that 50% of the world population will have been sequenced.
20 EB storage / year
Stephens et al. BigData: Astronomical or Genomical (2015)
Data acquisition of BigData disciplines in 2025
Genomics, YouTube, Astronomy
Frost & Sullivan
Data size is astronomical genomical
• 2 meters per genome.
• 100 trillion cells in our body.
• Information in us is larger than the Universe.
Public cloud providers
What is the Cloud
Cloud compute … is like an in‐room sound system
GPU
GT‐Scan2
What’s the next big thing in cloud?
VariantSpark
How to use clusters in the cloud?
Cloud
What is the cloud?
Overview
Finding the cure for ALS
• Unprecedented scale
• Groundbreaking research
• Latest technology
Finding the disease gene(s)
cases
controls
Gene1 Gene2
Complex diseases are driven by
multiple interacting genes with variable contribution
cases
controls
Need a more sophisticated ML approach, such as Random Forest
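A toy illustration of why comparing cases and controls one gene at a time can miss interacting genes. The cohort below is hypothetical: disease status is constructed as the exclusive-or of two binary variants, so neither gene shows a difference on its own while the pair separates cases from controls. It is a sketch of the motivation only, not of Random Forest or VariantSpark.

```python
from itertools import product

# Hypothetical toy cohort: 100 individuals, two binary variants each.
# Disease status is built as XOR of the two variants (a pure interaction).
genotypes = [g for g in product([0, 1], repeat=2) for _ in range(25)]
status = [g1 ^ g2 for g1, g2 in genotypes]


def case_rate(rows, labels, predicate):
    """Fraction of cases among the individuals selected by predicate."""
    selected = [label for row, label in zip(rows, labels) if predicate(row)]
    return sum(selected) / len(selected)


# Single-gene view: carriers versus non-carriers of each variant.
for gene in (0, 1):
    carriers = case_rate(genotypes, status, lambda row: row[gene] == 1)
    others = case_rate(genotypes, status, lambda row: row[gene] == 0)
    print(f"gene {gene + 1}: case rate {carriers:.2f} (carriers) vs {others:.2f} (non-carriers)")

# Joint view: individuals carrying exactly one of the two variants.
exactly_one = case_rate(genotypes, status, lambda row: row[0] != row[1])
neither_or_both = case_rate(genotypes, status, lambda row: row[0] == row[1])
print(f"pair: case rate {exactly_one:.2f} (exactly one variant) vs {neither_or_both:.2f} (neither or both)")
```

A method that evaluates features jointly, as tree-based models do, can pick up this kind of signal, whereas a per-gene test sees nothing.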
Machine learning on 1.7 Trillion datapoints
80 Million features
[Figure: 22,500 samples (individuals), each with a genomic profile and a disease status, analysed to identify disease genes]
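A quick arithmetic check of the scale on this slide, using the rounded figures quoted (22,500 samples and 80 million features). The one byte per data point used for the size estimate is an assumption for illustration, not a figure from the talk.

```python
samples = 22_500            # individuals, as quoted on the slide
features = 80_000_000       # genomic features, as quoted on the slide

datapoints = samples * features

# Assumption for illustration only: one byte of storage per data point.
bytes_per_datapoint = 1
terabytes = datapoints * bytes_per_datapoint / 1e12

print(f"{datapoints:,} data points")
print(f"about {terabytes:.1f} TB at {bytes_per_datapoint} byte per data point")
```

With these rounded inputs the product is about 1.8 trillion, the same order as the 1.7 trillion in the slide title; the small gap presumably comes from rounding of the inputs.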
Population‐scale genomic data analysis requires BigData solutions
                           Desktop compute | High‐performance compute cluster | Hadoop/Spark compute cluster
Focus                      small data      | Compute‐intensive                | Data‐intensive
Node‐bound                 Yes             | Yes                              | No
Parallelization            10 CPU          | 100+ CPU                         | 1000+ CPU
Parallelization procedure  bespoke         | bespoke                          | standardized
CSIRO solution
Powerful disease gene discovery
“Analyzes 3000 individuals with 80M features in 30 minutes”
“Requires 80% fewer samples to detect a statistically significant signal”
Faster Smarter
Used by
[Figure: speed (low to high) plotted against accuracy (low to high)]
Managed Spark Clusters with Databricks
International collaborations
https://docs.databricks.com/spark/latest/training/variant-spark.html
Managed Spark Clusters with AWS
Managed Spark Clusters with Terra
GT‐Scan2
What’s the next big thing in cloud?
VariantSpark
How to use clusters in the cloud?
Cloud
What is the cloud?
Overview
Credit: National Institute on Aging, National Institutes of Health
Finding the cure for ALS
* familial; + mouse model: Iannitti, et al. Mol Ther Nucleic Acids. 2018
Can the genome of neurons be modified to prevent the disease?
• Mutations in the SOD1 gene cause ALS*
• Silencing SOD1* slows neuronal death+
Genome Editing: Revolutionize medicine
Why is it difficult?
Finding the right grain of sand…
… that is unique on a beach.
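The "unique grain of sand" problem can be sketched as a search for target sites that occur only once in a sequence. The sketch below assumes the common CRISPR-Cas9 setting of a 20-nucleotide guide followed by an NGG motif, scans the forward strand only, and treats only exact repeats as non-unique. These simplifications and the function names are illustrative and are not taken from GT-Scan2.

```python
import re
from collections import Counter


def candidate_targets(sequence, guide_len=20):
    """Return (position, guide) pairs for guides followed by an NGG motif.

    Simplification: forward strand only, unambiguous bases only.
    """
    sequence = sequence.upper()
    # The lookahead makes matches zero-width, so overlapping sites are found.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


def unique_targets(sequence, guide_len=20):
    """Keep only candidates whose guide occurs exactly once in the sequence.

    Simplification: exact matches only; near-matches are not considered.
    """
    sequence = sequence.upper()
    kmer_counts = Counter(
        sequence[i:i + guide_len]
        for i in range(len(sequence) - guide_len + 1)
    )
    return [
        (pos, guide)
        for pos, guide in candidate_targets(sequence, guide_len)
        if kmer_counts[guide] == 1
    ]


if __name__ == "__main__":
    # Hypothetical short sequence with a shortened guide length for readability.
    demo = "ACGTAGGTTTTACGTAGG"
    print(candidate_targets(demo, guide_len=4))
    print(unique_targets(demo, guide_len=4))
```

Even this exact-match version has to count every guide-length substring of the sequence, which hints at why the search becomes demanding at genome scale.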
Recruiting instantaneous appropriately powered compute
        On‐prem server | Cloud server      | Auto‐scale cloud server | Serverless
Focus   Full control   | Lower maintenance | Flexible                | Agility
Flexibility
No Overhead
Scalability
Cost‐effective
by blacklane.com by madd.org
Analogy: own car | car rental | chauffeur | ride share
One of the first Serverless Applications in Research
Serverless saves us 95% of cloud costs
[Bar chart: monthly cost (log scale) by operation mode. Server: $714/month; serverless: $30/month]
Hypothesis-driven blue-yellow deployment
Evaluating Auto‐scaling for DynamoDB
Serverless saves us 99% of cloud costs
[Bar chart: monthly cost (log scale) by operation mode. Server: $714/month; serverless: $30/month; serverless optimized: $2.50/month]
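The savings quoted on the cost slides follow from the monthly figures in the charts. A small check, assuming the $714, $30 and $2.50 values map to server, serverless and optimized serverless respectively:

```python
# Monthly costs in USD as read from the charts; the mapping of values
# to operation modes is an assumption based on the slide titles.
monthly_cost_usd = {
    "server": 714.00,
    "serverless": 30.00,
    "serverless optimized": 2.50,
}


def saving_percent(baseline, new_cost):
    """Relative saving of new_cost against baseline, in percent."""
    return 100.0 * (baseline - new_cost) / baseline


baseline = monthly_cost_usd["server"]
for mode, cost in monthly_cost_usd.items():
    if mode != "server":
        print(f"{mode}: {saving_percent(baseline, cost):.1f}% cheaper than server")
```

Both results are consistent with the headline figures of 95% and 99%.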
GT‐Scan on Alibaba Cloud…
Thanks to Jason, Sabith and the Alibaba cloud serverless platform team
Thanks to Simon Waight
GT‐Scan on Azure Cloud…
Differences Cloud Providers
AWS            | Alibaba Cloud    | Azure                  | Differences
DynamoDB       | TableStore       | Cosmos DB              | TableStore: larger cell values
Lambda         | Function Compute | Azure Functions        | Lambda: more languages; Azure: no binaries
CloudWatch     | Log Service      | Azure Monitor          |
CloudFormation | FUN              | Azure Resource Manager | CloudFormation & ARM more mature
Code pipeline  |                  | Azure DevOps           | DevOps: more and well integrated functionality
Once you go serverless you never go back.
- Rapid prototyping and scaling to production.
- Allowing burstable workloads.
- Innovation becomes affordable.
AWS Summit Canberra ‘18
By 2025 serverless will be a USD 20 billion market.
Grand View Research, Inc.
CSIRO’s cloud-based solutions
Three focus areas of Genomic health
Finding Disease Genes
Correcting Genomes
Treating Individuals
Patients like me
Genomics affects the way we look.
https://www.bioinformatics.csiro.au/thumb/
https://www.bioinformatics.csiro.au/thumb/
How can you skill up?
https://www.alibabacloud.com/campaign/education
https://aws.amazon.com/education/awseducate/
https://azure.microsoft.com/en-au/free/students/
https://edu.google.com/programs/credits
Three things to remember
• ‘Datafication’ demands ‘wide’ ML for more disciplines (VariantSpark).
• Serverless architecture will be the future - enabling rapid prototyping and scalability.
• Bioinformatics has become a team sport: let’s build a healthier future together!
Let’s build a healthier world together
Denis Bauer, PhD
Rob Dunne, PhD
Piotr Szul
Collaborators
News
Software
Lynn Langit
Top 10 Australian IT stories of 2017
You?
We are hiring…
…email Denis
#InCoB 2019 | Jakarta
KEYNOTE
Suzanne Scott
Mumbai 2019
Aidan O’Brien
Laurence Wilson, PhD
Brendan Hosking
Aidan Tay
Daniel Reti
TBA
Digital Genome Engineering
Suzanne Scott
Yatish Jain
Natalie Twine, PhD
TBA
Arash Bayat, PhD
Genome Insight