The dawn of cloud‐native bioinformatics HEALTH AND BIOSECURITY Denis Bauer | PhD, hon A/Prof July 2019, UQ Bioinf Winter school Winter School in Mathematical & Computational Biology 1-5 July 2019


Page 1: Winter School in Mathematical & Computational Biology 1-5 ...bioinformatics.org.au/winterschool/wp-content/uploads/sites/15/201… · Population‐scale genomic data analysis requires

The dawn of cloud‐native bioinformatics

HEALTH AND BIOSECURITY

Denis Bauer |  PhD, hon A/Prof July 2019, UQ Bioinf Winter school

Winter School in Mathematical & Computational Biology 1-5 July 2019

Page 2:

Cloud‐native bioinformatics | Denis C. Bauer | @allPowerde

• Workflows need to be reproducible/compliant.
• Data sizes increase for robust insights.
• Algorithms are getting more complex/interdependent.

#FutureOfBioinf is collaborative

Sustainable in the Cloud

Page 3:

Overview

Cloud: What is the cloud?

VariantSpark: How to use clusters in the cloud?

GT‐Scan2: What’s the next big thing in cloud?

Page 4:

CSIRO: We are innovators and builders

• Invented WiFi, used in five billion devices globally.

• Developed the vaccine for the Hendra Virus.

• Developed the Total Wellbeing & Low-Carb Diets.

Credit https://toolstotal.com/

Page 5:

‘Improve health care through digital technology and services.’

Page 6:

Transformational Bioinformatics

Bringing BigData technology to Life Science Research

Page 7:

CSIRO’s cloud-based solutions

Three focus areas of Genomic health

Finding Disease Genes

Correcting Genomes

Treating Individuals

Page 8:

By 2025 it is estimated that 50% of the world population will have been sequenced.

20 EB Storage / year

Stephens et al. BigData: Astronomical or Genomical (2015)

Data acquisition of BigData disciplines in 2025

Genomics

YouTube

Astronomy

Twitter

Frost&Sullivan

Page 9:

Data size is not astronomical but genomical

• 2 meters per genome.
• 100 trillion cells in our body.
• Information in us is larger than the Universe.
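A quick back-of-the-envelope check of the first two figures, as a sketch only. The Earth-Sun distance is my own assumption for scale and is not on the slide; the slide's third bullet (information content) is not something this arithmetic tests.

```python
# Figures quoted on the slide.
METERS_PER_GENOME = 2.0   # ~2 m of DNA per cell
CELLS_IN_BODY = 100e12    # ~100 trillion cells

# Assumption (not from the slide): mean Earth-Sun distance in metres,
# used only to give the result a familiar scale.
ASTRONOMICAL_UNIT_M = 1.496e11

total_dna_m = METERS_PER_GENOME * CELLS_IN_BODY
total_dna_au = total_dna_m / ASTRONOMICAL_UNIT_M

print(f"Total DNA length per person: {total_dna_m:.1e} m")
print(f"That is roughly {total_dna_au:,.0f} Earth-Sun distances")
```

Taking the slide's numbers at face value, the DNA in one person stretches well beyond a thousand Earth-Sun distances.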

Page 10:

Public cloud providers

Page 11:

What is the Cloud

Cloud compute… is like an in‐room sound system

GPU

Page 12:

Overview

Cloud: What is the cloud?

VariantSpark: How to use clusters in the cloud?

GT‐Scan2: What’s the next big thing in cloud?

Page 13:

Finding the cure for ALS

• Unprecedented scale
• Groundbreaking research
• Latest Technology

Page 14:

Finding the disease gene(s)

Figure labels: cases, controls; Gene1, Gene2

Page 15:

Complex diseases are driven by multiple interacting genes with variable contribution

Figure labels: cases, controls

Need a more sophisticated ML approach, such as Random Forest
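Why interacting genes defeat a gene-by-gene comparison can be shown with a toy cohort. This is a sketch with entirely made-up data: two hypothetical genes, and disease occurring only when exactly one of them carries a variant. Tree-based methods such as random forest split on one feature conditional on earlier splits, which is why they can pick up this kind of pattern.

```python
from itertools import product

# Toy cohort (made up): each row is (gene1_variant, gene2_variant, is_case).
# Disease occurs only when exactly one of the two genes carries a variant.
cohort = [(g1, g2, g1 ^ g2) for g1, g2 in product([0, 1], repeat=2) for _ in range(25)]

def case_rate(rows):
    """Fraction of individuals in `rows` that are cases."""
    return sum(status for _, _, status in rows) / len(rows)

# Single-gene view: carriers and non-carriers show the same case rate.
for gene in (0, 1):
    carriers = [r for r in cohort if r[gene] == 1]
    non_carriers = [r for r in cohort if r[gene] == 0]
    print(f"gene{gene + 1}: carriers {case_rate(carriers):.2f}, "
          f"non-carriers {case_rate(non_carriers):.2f}")

# Joint view: the combination of both genes separates cases from controls.
joint_rates = {}
for g1, g2 in product([0, 1], repeat=2):
    rows = [r for r in cohort if r[0] == g1 and r[1] == g2]
    joint_rates[(g1, g2)] = case_rate(rows)
    print(f"gene1={g1}, gene2={g2}: case rate {joint_rates[(g1, g2)]:.2f}")
```

Looked at one gene at a time, neither gene carries any signal; only the joint genotype does.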

Page 16:

Machine learning on 1.7 Trillion datapoints

80 Million features (genomic profile)

22,500 samples (individuals)

Disease status

Disease genes
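The size of that matrix can be checked with simple arithmetic. This is a rough sketch: the rounded slide figures give about 1.8 trillion values, the same order as the quoted 1.7 trillion (the exact feature count is presumably a little under 80 million). The one-byte-per-value storage figure is my assumption, not from the slide.

```python
FEATURES = 80_000_000   # from the slide (rounded)
SAMPLES = 22_500        # from the slide

datapoints = FEATURES * SAMPLES
print(f"{datapoints:.2e} datapoints")

# Assumption: one byte per genotype value (e.g. 0, 1 or 2 alternate alleles).
BYTES_PER_VALUE = 1
terabytes = datapoints * BYTES_PER_VALUE / 1e12
print(f"about {terabytes:.1f} TB if stored uncompressed")
```

Even under this compact encoding the matrix does not fit in the memory of a single typical machine, which motivates the distributed approach.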

Page 17:

Population‐scale genomic data analysis requires BigData solutions

Platform | Desktop compute | High‐performance compute cluster | Hadoop/Spark compute cluster
Focus | small data | Compute‐intensive | Data‐intensive
Node‐bound | Yes | Yes | No
Parallelization | 10 CPU | 100+ CPU | 1000+ CPU
Parallelization procedure | bespoke | bespoke | standardized

CSIRO solution
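The "standardized" parallelization procedure of the Hadoop/Spark column can be illustrated with a map/reduce pattern. The sketch below uses only the Python standard library, with threads standing in for cluster nodes; it is not the Spark API and not VariantSpark code, and the genotype rows are made up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Made-up genotype rows: (variant_id, alternate_allele_count), one row per sample.
rows = [("v1", 0), ("v1", 2), ("v2", 1), ("v1", 1), ("v2", 0), ("v3", 2)]

def partition(data, n_parts):
    """Split the data into n_parts roughly equal chunks."""
    return [data[i::n_parts] for i in range(n_parts)]

def map_partition(chunk):
    """Map step: sum alternate-allele counts per variant within one chunk."""
    counts = Counter()
    for variant_id, alt_count in chunk:
        counts[variant_id] += alt_count
    return counts

def merge(left, right):
    """Reduce step: combine two partial results into a new Counter."""
    merged = Counter(left)
    merged.update(right)
    return merged

# Threads stand in for worker nodes; each one handles a single partition.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_partition, partition(rows, 3)))

totals = reduce(merge, partials, Counter())
print(dict(totals))
```

The same two functions work unchanged whether there are 3 partitions or 3000, which is the sense in which the procedure is standardized rather than bespoke.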

Page 18:

Powerful disease gene discovery

Faster: “Analyzes 3000 individuals with 80M features in 30 minutes”

Smarter: “Requires 80% fewer samples to detect a statistically significant signal”

Used by

Figure axes: Accuracy (low to high), Speed (low to high)

Page 19:

Managed Spark Clusters with Databricks

Page 20:

International collaborations

https://docs.databricks.com/spark/latest/training/variant-spark.html

Page 21:

Managed Spark Clusters with AWS

Page 22:

Managed Spark Clusters with Terra

Page 23:

Overview

Cloud: What is the cloud?

VariantSpark: How to use clusters in the cloud?

GT‐Scan2: What’s the next big thing in cloud?

Page 24:

Credit: National Institute on Aging, National Institutes of Health

Finding the cure for ALS

Can the genome of neurons be modified to prevent the disease?

• Mutations in the SOD1 gene cause ALS*
• Silencing SOD1* slows neuronal death+

* familial; + mouse model: Iannitti, et al. Mol Ther Nucleic Acids. 2018

Page 25:

Genome Editing: Revolutionize medicine

Page 26:

Why is it difficult?

Finding the right grain of sand…

… that is unique on a beach.

Page 27:

Page 28:

Instantly recruiting appropriately powered compute

Platform | On‐prem Server | Cloud Server | Auto‐scale Cloud server | Serverless
Focus | Full control | Lower maintenance | Flexible | Agility
Analogy | own car | car rental | chauffeur | ride share

Criteria compared: Flexibility, No Overhead, Scalability, Cost‐effective

Image credits: blacklane.com, madd.org

Page 29:

One of the first Serverless Applications in Research 
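To give a feel for what a serverless unit of work looks like, here is a minimal sketch of a stateless function in the style of an AWS Lambda Python handler, which receives an `event` and a `context`. It is not GT‐Scan2's actual code: the `find_targets` helper, the event shape, and the example sequence are hypothetical, and the search assumes SpCas9-style targets (20 nt followed by an NGG PAM) on the forward strand only.

```python
import json
import re

# Assumption: SpCas9-style targets, i.e. 20 nt followed by an NGG PAM.
# The lookahead lets overlapping candidates be reported.
TARGET_PATTERN = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")

def find_targets(sequence):
    """Return forward-strand candidate sites with start, protospacer and PAM."""
    sequence = sequence.upper()
    return [
        {"start": m.start(), "protospacer": m.group(1), "pam": m.group(2)}
        for m in TARGET_PATTERN.finditer(sequence)
    ]

def handler(event, context):
    """Stateless entry point: everything it needs arrives in `event`."""
    targets = find_targets(event.get("sequence", ""))
    return {"statusCode": 200, "body": json.dumps({"targets": targets})}

# Local invocation with a made-up sequence; no cloud account is needed.
example = {"sequence": "TTGACCTGAAGCTAGCTAGGATCCAGGTTT"}
response = handler(example, None)
print(response["body"])
```

Because the function keeps no state between calls, a platform can run zero copies when idle and many copies in parallel under load, billing only for the invocations made.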

Page 30:

Serverless saves us 95% of cloud costs

Figure: bar chart of monthly cost (log scale) by operation mode. Server: $714/month; serverless: $30/month.

Page 31:

Hypothesis-driven blue-yellow deployment

Page 32:

Evaluating Auto‐scaling for DynamoDB

Page 33:

Serverless saves us 99% of cloud costs

Figure: bar chart of monthly cost (log scale) by operation mode. Server: $714/month; serverless: $30/month; serverless optimized: $2.50/month.
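The headline percentages follow directly from the monthly costs quoted on the slides. A small sketch of the arithmetic, using only those three figures:

```python
# Monthly costs quoted on the slides (USD).
monthly_cost = {
    "server": 714.00,
    "serverless": 30.00,
    "serverless optimized": 2.50,
}

def saving_vs_server(mode):
    """Relative saving of `mode` compared with the always-on server."""
    return 1 - monthly_cost[mode] / monthly_cost["server"]

for mode in ("serverless", "serverless optimized"):
    print(f"{mode}: {saving_vs_server(mode):.1%} saved")
```

Both results sit just above the rounded "95%" and "99%" figures used in the slide titles.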

Page 34:

GT‐Scan on Alibaba Cloud…

Thanks to Jason, Sabith and the Alibaba cloud serverless platform team

Page 35:

Thanks to Simon Waight

GT‐Scan on Azure Cloud…

Page 36:

Differences between Cloud Providers

AWS | Alibaba Cloud | Azure | Notes
DynamoDB | TableStore | Cosmos DB | TableStore: larger cell values
Lambda | Function Compute | Azure Functions | Lambda: more languages; Azure: no binaries
CloudWatch | Log Service | Azure Monitor |
CloudFormation | FUN | Azure Resource Manager | CloudFormation & ARM: more mature
Code pipeline | | Azure DevOps | DevOps: more and well integrated functionality
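When porting an application between providers, the service equivalences on this slide can be kept as a small lookup table. The sketch below is only an illustration: the service names are as listed on the slide, while the role labels and the `translate` helper are my own shorthand.

```python
# Service names as listed on the slide; role labels are hypothetical shorthand.
# None marks a cell for which the slide lists no service.
EQUIVALENTS = {
    "NoSQL database": {"AWS": "DynamoDB", "Alibaba Cloud": "TableStore", "Azure": "Cosmos DB"},
    "functions": {"AWS": "Lambda", "Alibaba Cloud": "Function Compute", "Azure": "Azure Functions"},
    "monitoring": {"AWS": "CloudWatch", "Alibaba Cloud": "Log Service", "Azure": "Azure Monitor"},
    "infrastructure as code": {"AWS": "CloudFormation", "Alibaba Cloud": "FUN",
                               "Azure": "Azure Resource Manager"},
    "delivery pipeline": {"AWS": "Code pipeline", "Alibaba Cloud": None, "Azure": "Azure DevOps"},
}

def translate(service, source, target):
    """Return the `target` provider's counterpart of `service`, or None if the slide lists none."""
    for offerings in EQUIVALENTS.values():
        if offerings.get(source) == service:
            return offerings.get(target)
    raise KeyError(f"{service!r} is not listed for {source}")

print(translate("Lambda", "AWS", "Azure"))
print(translate("DynamoDB", "AWS", "Alibaba Cloud"))
```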

Page 37:

Once you go serverless you never go back.

- Rapid prototyping and scaling to production.
- Allowing burstable workloads.
- Innovation becomes affordable.

AWS Summit Canberra ‘18

By 2025 serverless will be a USD 20 billion market.

Grand View Research, Inc.

Page 38:

CSIRO’s cloud-based solutions

Three focus areas of Genomic health

Finding Disease Genes

Correcting Genomes

Treating Individuals

Page 39:

Patients like me

Page 40:

Genomics affects the way we look.

https://www.bioinformatics.csiro.au/thumb/

Page 41:

https://www.bioinformatics.csiro.au/thumb/

Page 42:

How can you skill up?

https://www.alibabacloud.com/campaign/education

https://aws.amazon.com/education/awseducate/

https://azure.microsoft.com/en-au/free/students/

https://edu.google.com/programs/credits

Page 43:

Three things to remember

• ‘Datafication’ demands ‘wide’ ML for more disciplines (VariantSpark).

• Serverless architecture will be the future - enabling rapid prototyping and scalability.

• Bioinformatics has become a team sport: let’s build a healthier future together!

Page 44:

Let’s build a healthier world together

Denis Bauer, PhD

Rob Dunne, PhD

Piotr Szul

Collaborators

News

Software

Lynn Langit

Top 10 Australian IT stories of 2017

You?

We are hiring…

…email Denis

#InCoB 2019  |  Jakarta

KEYNOTE

Suzanne Scott

Mumbai 2019

Aidan O’Brien

Laurence Wilson, PhD

Brendan Hosking

Aidan Tay

Daniel Reti

TBA

Digital Genome Engineering

Suzanne Scott

Yatish Jain

Natalie Twine, PhD

TBA

Arash Bayat, PhD

Genome Insight