talk given at "cloud computing for systems biology" workshop

The role of cloud computing in big biology, Deepak Singh

Upload: deepak-singh

Post on 06-May-2015


TRANSCRIPT

Page 1: Talk given at "Cloud Computing for Systems Biology" workshop

The role of cloud computing in big biology

Deepak Singh

Page 2: Talk given at "Cloud Computing for Systems Biology" workshop
Page 4: Talk given at "Cloud Computing for Systems Biology" workshop
Page 5: Talk given at "Cloud Computing for Systems Biology" workshop
Page 6: Talk given at "Cloud Computing for Systems Biology" workshop

life science industry

Page 7: Talk given at "Cloud Computing for Systems Biology" workshop

Credit: Bosco Ho

Page 8: Talk given at "Cloud Computing for Systems Biology" workshop
Page 9: Talk given at "Cloud Computing for Systems Biology" workshop

By ~Prescott under a CC-BY-NC license

Page 10: Talk given at "Cloud Computing for Systems Biology" workshop

context

Page 11: Talk given at "Cloud Computing for Systems Biology" workshop

analysis methods

Page 12: Talk given at "Cloud Computing for Systems Biology" workshop

technology

Page 13: Talk given at "Cloud Computing for Systems Biology" workshop

technology

?

??

?

Page 14: Talk given at "Cloud Computing for Systems Biology" workshop
Page 15: Talk given at "Cloud Computing for Systems Biology" workshop

back of the room

Page 16: Talk given at "Cloud Computing for Systems Biology" workshop
Page 17: Talk given at "Cloud Computing for Systems Biology" workshop
Page 18: Talk given at "Cloud Computing for Systems Biology" workshop
Page 19: Talk given at "Cloud Computing for Systems Biology" workshop

technology (the word repeated across the slide)

Page 20: Talk given at "Cloud Computing for Systems Biology" workshop

technology (the word repeated many times across the slide)

Page 21: Talk given at "Cloud Computing for Systems Biology" workshop
Page 23: Talk given at "Cloud Computing for Systems Biology" workshop

inherent characteristics

Page 24: Talk given at "Cloud Computing for Systems Biology" workshop

data driven

Page 25: Talk given at "Cloud Computing for Systems Biology" workshop

multi-dimensional

Page 26: Talk given at "Cloud Computing for Systems Biology" workshop

collaborative

Page 27: Talk given at "Cloud Computing for Systems Biology" workshop

distributed

Page 28: Talk given at "Cloud Computing for Systems Biology" workshop
Page 29: Talk given at "Cloud Computing for Systems Biology" workshop

<amazon web services>

Page 30: Talk given at "Cloud Computing for Systems Biology" workshop

the cloud

Page 31: Talk given at "Cloud Computing for Systems Biology" workshop

has_many :definitions

Page 32: Talk given at "Cloud Computing for Systems Biology" workshop

infrastructure as a service

Page 33: Talk given at "Cloud Computing for Systems Biology" workshop

precursors

Page 34: Talk given at "Cloud Computing for Systems Biology" workshop

virtualization

Page 35: Talk given at "Cloud Computing for Systems Biology" workshop

service oriented architecture

Page 36: Talk given at "Cloud Computing for Systems Biology" workshop

distributed computing

Page 37: Talk given at "Cloud Computing for Systems Biology" workshop
Page 38: Talk given at "Cloud Computing for Systems Biology" workshop

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling

Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export

Database: Amazon RDS and SimpleDB

Page 39: Talk given at "Cloud Computing for Systems Biology" workshop

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling

Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export

Content Delivery: Amazon CloudFront

Messaging: Amazon Simple Queue Service (SQS)

Payments: Amazon Flexible Payments Service (FPS)

On-Demand Workforce: Amazon Mechanical Turk

Parallel Processing: Amazon Elastic MapReduce

Database: Amazon RDS and SimpleDB

Page 40: Talk given at "Cloud Computing for Systems Biology" workshop

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling

Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export

Content Delivery: Amazon CloudFront

Messaging: Amazon Simple Queue Service (SQS)

Payments: Amazon Flexible Payments Service (FPS)

On-Demand Workforce: Amazon Mechanical Turk

Parallel Processing: Amazon Elastic MapReduce

Monitoring: Amazon CloudWatch

Management: AWS Management Console

Tools: AWS Toolkit for Eclipse

Isolated Networks: Amazon Virtual Private Cloud

Database: Amazon RDS and SimpleDB

Page 41: Talk given at "Cloud Computing for Systems Biology" workshop

Your Custom Applications and Services

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling

Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export

Content Delivery: Amazon CloudFront

Messaging: Amazon Simple Queue Service (SQS)

Payments: Amazon Flexible Payments Service (FPS)

On-Demand Workforce: Amazon Mechanical Turk

Parallel Processing: Amazon Elastic MapReduce

Monitoring: Amazon CloudWatch

Management: AWS Management Console

Tools: AWS Toolkit for Eclipse

Isolated Networks: Amazon Virtual Private Cloud

Database: Amazon RDS and SimpleDB

Page 42: Talk given at "Cloud Computing for Systems Biology" workshop

scalable

Page 43: Talk given at "Cloud Computing for Systems Biology" workshop

cost effective

scalable

Page 44: Talk given at "Cloud Computing for Systems Biology" workshop

cost effective

scalable

Pay as you go

Page 45: Talk given at "Cloud Computing for Systems Biology" workshop

cost effective

scalable

reliable

Page 46: Talk given at "Cloud Computing for Systems Biology" workshop

cost effective

scalable

reliable

secure

Page 47: Talk given at "Cloud Computing for Systems Biology" workshop

Amazon EC2

Page 48: Talk given at "Cloud Computing for Systems Biology" workshop

servers on demand

Page 49: Talk given at "Cloud Computing for Systems Biology" workshop

highly scalable

Page 50: Talk given at "Cloud Computing for Systems Biology" workshop
Page 51: Talk given at "Cloud Computing for Systems Biology" workshop

3000 CPUs for one firm’s risk management application

[Chart: Number of EC2 Instances per day over one week, starting Wednesday 4/22/2009, with levels marked at 3000 and 300. Annotation: 300 CPUs on weekends]

Page 52: Talk given at "Cloud Computing for Systems Biology" workshop

design for failure

Page 53: Talk given at "Cloud Computing for Systems Biology" workshop
Page 54: Talk given at "Cloud Computing for Systems Biology" workshop

“Everything fails, all the time”-- Werner Vogels

Page 55: Talk given at "Cloud Computing for Systems Biology" workshop

assume failure

Page 56: Talk given at "Cloud Computing for Systems Biology" workshop

design backwards

assume failure

Page 57: Talk given at "Cloud Computing for Systems Biology" workshop

nothing fails

design backwards

assume failure

Page 58: Talk given at "Cloud Computing for Systems Biology" workshop

highly available systems

Page 59: Talk given at "Cloud Computing for Systems Biology" workshop

elastic block store

Page 60: Talk given at "Cloud Computing for Systems Biology" workshop

elastic IP

Page 61: Talk given at "Cloud Computing for Systems Biology" workshop

SQS

Page 62: Talk given at "Cloud Computing for Systems Biology" workshop

US East Region

Availability Zone A

Availability Zone B

Availability Zone C

Availability Zone D
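One way to read the "assume failure" slides together with the availability zone layout is that callers should expect any single zone to be unreachable and be ready to move on to the next one. The following is a minimal sketch under that assumption. The zone labels, `ZoneDown`, `call_with_failover`, and `make_handler` are all hypothetical names made up for illustration; nothing here calls a real AWS API.

```python
class ZoneDown(Exception):
    """Raised when a (simulated) availability zone cannot serve a request."""


def call_with_failover(zones, request, handler):
    """Try each zone in order and return (zone, response) for the first success.

    zones   -- ordered list of zone labels (hypothetical)
    handler -- callable(zone, request) that may raise ZoneDown
    """
    errors = []
    for zone in zones:
        try:
            return zone, handler(zone, request)
        except ZoneDown as exc:
            errors.append((zone, str(exc)))  # record the failure, try the next zone
    raise RuntimeError(f"all zones failed: {errors}")


def make_handler(down):
    """Build a fake handler in which every zone listed in `down` always fails."""
    def handler(zone, request):
        if zone in down:
            raise ZoneDown(f"{zone} unavailable")
        return f"{request} served"
    return handler


zones = ["zone-a", "zone-b", "zone-c", "zone-d"]
served_by, response = call_with_failover(
    zones, "job-1", make_handler({"zone-a", "zone-b"})
)
print(served_by, response)  # zone-c job-1 served
```

The point of the sketch is only the shape of the control flow: failure is the expected case, so the happy path is written as "first zone that answers" rather than "the zone".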

Page 63: Talk given at "Cloud Computing for Systems Biology" workshop

data storage

Page 64: Talk given at "Cloud Computing for Systems Biology" workshop

one size does not fit all

Page 65: Talk given at "Cloud Computing for Systems Biology" workshop
Page 66: Talk given at "Cloud Computing for Systems Biology" workshop

Amazon S3

Page 67: Talk given at "Cloud Computing for Systems Biology" workshop

distributed object store

Page 68: Talk given at "Cloud Computing for Systems Biology" workshop

durable

Page 69: Talk given at "Cloud Computing for Systems Biology" workshop

available

Page 70: Talk given at "Cloud Computing for Systems Biology" workshop


Page 71: Talk given at "Cloud Computing for Systems Biology" workshop

scalable

Page 72: Talk given at "Cloud Computing for Systems Biology" workshop

fast

Page 73: Talk given at "Cloud Computing for Systems Biology" workshop

simple

Page 74: Talk given at "Cloud Computing for Systems Biology" workshop
Page 75: Talk given at "Cloud Computing for Systems Biology" workshop

structured data anyone?

Page 76: Talk given at "Cloud Computing for Systems Biology" workshop

Amazon SimpleDB

Page 77: Talk given at "Cloud Computing for Systems Biology" workshop

zero administration

Page 78: Talk given at "Cloud Computing for Systems Biology" workshop

highly available

Page 79: Talk given at "Cloud Computing for Systems Biology" workshop

schema less

Page 80: Talk given at "Cloud Computing for Systems Biology" workshop

key-value store
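To make "schema less" and "key-value store" concrete, here is a toy in-memory sketch of the idea: items are looked up by name, and each item carries whatever attributes it happens to have. `ItemStore` and its methods are hypothetical names for illustration only; this is not the SimpleDB API.

```python
from collections import defaultdict


class ItemStore:
    """Toy schema-less store: each item name maps to a free-form set of attributes."""

    def __init__(self):
        self._items = defaultdict(dict)

    def put(self, name, **attributes):
        # No schema: any item may carry any attributes.
        self._items[name].update(attributes)

    def get(self, name):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._items.get(name, {}))

    def select(self, **criteria):
        """Return the names of items whose attributes match every criterion."""
        return sorted(
            name
            for name, attrs in self._items.items()
            if all(attrs.get(key) == value for key, value in criteria.items())
        )


store = ItemStore()
store.put("sample-001", organism="E. coli", platform="illumina", reads=1200000)
store.put("sample-002", organism="E. coli", tissue="n/a")
store.put("sample-003", organism="H. sapiens", platform="illumina")

print(store.select(organism="E. coli"))   # ['sample-001', 'sample-002']
print(store.select(platform="illumina"))  # ['sample-001', 'sample-003']
```

Note that `sample-002` has no `platform` attribute at all, which is exactly what a fixed relational schema would not allow without a NULL column.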

Page 81: Talk given at "Cloud Computing for Systems Biology" workshop

Amazon Relational Database Service

Page 82: Talk given at "Cloud Computing for Systems Biology" workshop

single API call

Page 83: Talk given at "Cloud Computing for Systems Biology" workshop

MySQL database

Page 84: Talk given at "Cloud Computing for Systems Biology" workshop

automatic backup

Page 85: Talk given at "Cloud Computing for Systems Biology" workshop

scale up with API call

Page 86: Talk given at "Cloud Computing for Systems Biology" workshop

futures

Page 87: Talk given at "Cloud Computing for Systems Biology" workshop

futures

master-slave replication

data center failover

Page 88: Talk given at "Cloud Computing for Systems Biology" workshop
Page 89: Talk given at "Cloud Computing for Systems Biology" workshop

what do people do?

Page 90: Talk given at "Cloud Computing for Systems Biology" workshop

solve problems

Page 91: Talk given at "Cloud Computing for Systems Biology" workshop

> 1PB of data in S3

Page 92: Talk given at "Cloud Computing for Systems Biology" workshop
Page 93: Talk given at "Cloud Computing for Systems Biology" workshop

provide platforms & services

Page 94: Talk given at "Cloud Computing for Systems Biology" workshop
Page 95: Talk given at "Cloud Computing for Systems Biology" workshop
Page 96: Talk given at "Cloud Computing for Systems Biology" workshop

http://heroku.com

Platform as a Service

Page 97: Talk given at "Cloud Computing for Systems Biology" workshop

http://cyclecomputing.com

Computation as a Service

Page 98: Talk given at "Cloud Computing for Systems Biology" workshop

http://cyclecomputing.com

http://wiki.github.com/documentcloud/cloud-crowd

Computational Platforms

sudo gem install cloud-crowd

Page 100: Talk given at "Cloud Computing for Systems Biology" workshop

Image: Matt Wood

they do science

Page 101: Talk given at "Cloud Computing for Systems Biology" workshop
Page 102: Talk given at "Cloud Computing for Systems Biology" workshop

3.7 million classifications in just over three days
~15 million in less than a month
>2.6 million clicks in 100 hours

Page 103: Talk given at "Cloud Computing for Systems Biology" workshop

Image via image editor under a CC-BY License

Page 104: Talk given at "Cloud Computing for Systems Biology" workshop

Protein Docking @ Pfizer

http://bioteam.net

Page 106: Talk given at "Cloud Computing for Systems Biology" workshop
Page 107: Talk given at "Cloud Computing for Systems Biology" workshop

</amazon web services>

Page 108: Talk given at "Cloud Computing for Systems Biology" workshop

anecdote

Page 109: Talk given at "Cloud Computing for Systems Biology" workshop

collaborative project

Page 110: Talk given at "Cloud Computing for Systems Biology" workshop

800 GB

Page 111: Talk given at "Cloud Computing for Systems Biology" workshop

Image: Wikipedia Commons

Page 112: Talk given at "Cloud Computing for Systems Biology" workshop

weeks to get started

Page 113: Talk given at "Cloud Computing for Systems Biology" workshop

Image: Matt Wood

Page 114: Talk given at "Cloud Computing for Systems Biology" workshop

Image: Chris Dagdigian

Page 115: Talk given at "Cloud Computing for Systems Biology" workshop
Page 116: Talk given at "Cloud Computing for Systems Biology" workshop
Page 117: Talk given at "Cloud Computing for Systems Biology" workshop
Page 118: Talk given at "Cloud Computing for Systems Biology" workshop
Page 119: Talk given at "Cloud Computing for Systems Biology" workshop

gigabytes

Page 120: Talk given at "Cloud Computing for Systems Biology" workshop

terabytes

Page 121: Talk given at "Cloud Computing for Systems Biology" workshop

petabytes

Page 122: Talk given at "Cloud Computing for Systems Biology" workshop

really fast

Page 123: Talk given at "Cloud Computing for Systems Biology" workshop

constant flux

Page 124: Talk given at "Cloud Computing for Systems Biology" workshop

Image: Chris Dagdigian

Page 125: Talk given at "Cloud Computing for Systems Biology" workshop

data management is not data storage

Page 126: Talk given at "Cloud Computing for Systems Biology" workshop

masterclass

Big data & Biology: The implications of petascale science

Tuesday November 17, 1:30PM - 3:00PM

Room: PB253-254-257-258

Page 127: Talk given at "Cloud Computing for Systems Biology" workshop

“science data platform”

Page 128: Talk given at "Cloud Computing for Systems Biology" workshop

deliver data to applications

Page 129: Talk given at "Cloud Computing for Systems Biology" workshop

deliver data to people

Page 130: Talk given at "Cloud Computing for Systems Biology" workshop

typical informatics workflow

Page 131: Talk given at "Cloud Computing for Systems Biology" workshop
Page 132: Talk given at "Cloud Computing for Systems Biology" workshop
Page 133: Talk given at "Cloud Computing for Systems Biology" workshop
Page 134: Talk given at "Cloud Computing for Systems Biology" workshop
Page 136: Talk given at "Cloud Computing for Systems Biology" workshop

Via Argonne National Labs under a CC-BY-SA license

Page 137: Talk given at "Cloud Computing for Systems Biology" workshop

Via Argonne National Labs under a CC-BY-SA license

killer app

Page 138: Talk given at "Cloud Computing for Systems Biology" workshop

Data

Apps

Page 139: Talk given at "Cloud Computing for Systems Biology" workshop

Data Platform

App Platform


Page 142: Talk given at "Cloud Computing for Systems Biology" workshop

data services

Data Platform

Page 143: Talk given at "Cloud Computing for Systems Biology" workshop

application services

App Platform

Page 144: Talk given at "Cloud Computing for Systems Biology" workshop

Scalable Data Platform

Services

APIs

Getters Filters Savers

WORK
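The "Getters, Filters, Savers" layer on this slide suggests small composable pieces sitting between the data platform and the actual work. A minimal sketch of that composition follows, assuming a getter yields records, a filter is a predicate on one record, and a saver consumes whatever is left. All function names and the sample reads are hypothetical.

```python
def run_pipeline(getter, filters, saver):
    """Pull records from a getter, apply each filter in turn, hand the rest to a saver."""
    records = getter()
    for keep in filters:
        # filter() binds `keep` immediately, so each stage uses its own predicate.
        records = filter(keep, records)
    return saver(records)


def get_reads():
    """Getter: yields hypothetical sequencing reads."""
    yield {"id": "r1", "seq": "ACGTACGT", "quality": 38}
    yield {"id": "r2", "seq": "ACGNNCGT", "quality": 35}
    yield {"id": "r3", "seq": "TTGACCAA", "quality": 12}
    yield {"id": "r4", "seq": "GGCATTAC", "quality": 31}


def high_quality(read):
    """Filter: keep reads at or above an arbitrary quality threshold."""
    return read["quality"] >= 30


def no_ambiguous_bases(read):
    """Filter: drop reads containing the ambiguity code N."""
    return "N" not in read["seq"]


def save_ids(records):
    """Saver: here it only collects ids; a real saver would write to storage."""
    return [record["id"] for record in records]


kept = run_pipeline(get_reads, [high_quality, no_ambiguous_bases], save_ids)
print(kept)  # ['r1', 'r4']
```

Because each piece only agrees on the record shape, a getter, filter, or saver can be swapped without touching the others, which is one reading of "loosely coupled" in this context.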

Page 145: Talk given at "Cloud Computing for Systems Biology" workshop

must accommodate change

Page 146: Talk given at "Cloud Computing for Systems Biology" workshop

must scale

Page 147: Talk given at "Cloud Computing for Systems Biology" workshop

highly available

Page 148: Talk given at "Cloud Computing for Systems Biology" workshop

loosely coupled

Page 149: Talk given at "Cloud Computing for Systems Biology" workshop

dynamic

Page 150: Talk given at "Cloud Computing for Systems Biology" workshop

task-based resources

Page 151: Talk given at "Cloud Computing for Systems Biology" workshop

one project

one set of resources

Page 152: Talk given at "Cloud Computing for Systems Biology" workshop

no waiting

Page 153: Talk given at "Cloud Computing for Systems Biology" workshop

Protein Docking @ Pfizer

http://bioteam.net

Page 154: Talk given at "Cloud Computing for Systems Biology" workshop

distributed mindset

Page 155: Talk given at "Cloud Computing for Systems Biology" workshop

one approach

Page 156: Talk given at "Cloud Computing for Systems Biology" workshop

disk read/writes

slow & expensive

Page 157: Talk given at "Cloud Computing for Systems Biology" workshop

data processing

fast & cheap

Page 158: Talk given at "Cloud Computing for Systems Biology" workshop

distribute data

parallelize reads

Page 159: Talk given at "Cloud Computing for Systems Biology" workshop

map/reduce
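For readers new to the model, map/reduce can be sketched in a single process as three phases: map each record to key/value pairs, group the pairs by key, then reduce each group. The sketch below counts nucleotides across a few made-up sequences. `map_reduce`, `base_mapper`, and `sum_reducer` are hypothetical names, and a real framework such as Hadoop would spread the map and reduce phases across many machines rather than run them in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Single-process illustration of the map, shuffle, and reduce phases."""
    # Map: every record produces zero or more (key, value) pairs.
    pairs = chain.from_iterable(mapper(record) for record in records)

    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce: collapse each group of values to a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def base_mapper(sequence):
    """Emit (base, 1) for every base in a sequence."""
    for base in sequence:
        yield base, 1


def sum_reducer(key, values):
    """Add up the counts emitted for one base."""
    return sum(values)


sequences = ["ACGT", "AACC", "GGTA"]
counts = map_reduce(sequences, base_mapper, sum_reducer)
print(sorted(counts.items()))  # [('A', 4), ('C', 3), ('G', 3), ('T', 2)]
```

The mapper never looks at more than one record and the reducer never looks at more than one key, which is what lets the data be distributed and the reads parallelized.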

Page 160: Talk given at "Cloud Computing for Systems Biology" workshop
Page 161: Talk given at "Cloud Computing for Systems Biology" workshop

distributed data processing

at scale

Page 162: Talk given at "Cloud Computing for Systems Biology" workshop

abstracting away hadoop

Page 163: Talk given at "Cloud Computing for Systems Biology" workshop

apache hive

http://hadoop.apache.org/hive/

Page 164: Talk given at "Cloud Computing for Systems Biology" workshop

apache pig

http://hadoop.apache.org/pig/

Page 165: Talk given at "Cloud Computing for Systems Biology" workshop

cascading

http://www.cascading.org/

Page 166: Talk given at "Cloud Computing for Systems Biology" workshop
Page 167: Talk given at "Cloud Computing for Systems Biology" workshop

hosted hadoop service

Page 168: Talk given at "Cloud Computing for Systems Biology" workshop

hadoop easy & simple

Page 169: Talk given at "Cloud Computing for Systems Biology" workshop

[Diagram: Amazon Elastic MapReduce. Labels: Input Data, Deploy Application (Web Console, Command line tools), Input S3 bucket, Input dataset, Hadoop on Amazon EC2 Instances, output results, Output S3 bucket, Notify, Get Results, End]

Page 170: Talk given at "Cloud Computing for Systems Biology" workshop

developers

develop & distribute

Page 171: Talk given at "Cloud Computing for Systems Biology" workshop

scientists/analysts

consume

Page 172: Talk given at "Cloud Computing for Systems Biology" workshop

CloudBurst

- Catalog k-mers
- Collect seeds
- End-to-end alignment
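The three steps named on this slide follow the general seed-and-extend pattern for read mapping. The sketch below walks through them in a single process on toy data. It is an illustration of the pattern only, not CloudBurst's code: CloudBurst runs on Hadoop, while this version uses plain dictionaries, counts mismatches only (no gaps), and every function name, sequence, and parameter value is made up for the example.

```python
from collections import defaultdict


def catalog_kmers(reference, k):
    """Step 1: index every k-mer of the reference by its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def collect_seeds(read, index, k):
    """Step 2: find reference offsets implied by exact k-mer matches with the read."""
    candidates = set()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            candidates.add(pos - j)  # where the whole read would start
    return candidates


def align_end_to_end(read, reference, candidates, max_mismatches):
    """Step 3: compare the full read at each candidate offset, counting mismatches."""
    hits = []
    for start in sorted(candidates):
        if start < 0 or start + len(read) > len(reference):
            continue  # the read would hang off the end of the reference
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTTGCATGCCGATAGCTA"
read = "TGCATGCA"  # matches the reference at offset 4 with one mismatch
k = 4

index = catalog_kmers(reference, k)
seeds = collect_seeds(read, index, k)
hits = align_end_to_end(read, reference, seeds, max_mismatches=1)
print(sorted(seeds))  # [0, 4]
print(hits)           # [(4, 1)]
```

The seed at offset 0 comes from a k-mer that occurs in the read twice but lines up correctly only once, so the end-to-end check is what separates the true placement from the spurious one.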

Page 173: Talk given at "Cloud Computing for Systems Biology" workshop

Mike Schatz, University of Maryland

Page 174: Talk given at "Cloud Computing for Systems Biology" workshop
Page 175: Talk given at "Cloud Computing for Systems Biology" workshop

Scalable Data Platform

Services

APIs

Getters Filters Savers

WORK

Page 176: Talk given at "Cloud Computing for Systems Biology" workshop

IN CONCLUSION

Page 177: Talk given at "Cloud Computing for Systems Biology" workshop

large scale biology

Page 178: Talk given at "Cloud Computing for Systems Biology" workshop

complex multidimensional data

Page 179: Talk given at "Cloud Computing for Systems Biology" workshop

whole lot of data

Page 180: Talk given at "Cloud Computing for Systems Biology" workshop

distributed collaborations

Page 181: Talk given at "Cloud Computing for Systems Biology" workshop

new computing and data architectures

Page 182: Talk given at "Cloud Computing for Systems Biology" workshop

a solution: cloud services

Page 183: Talk given at "Cloud Computing for Systems Biology" workshop

distributed

Page 184: Talk given at "Cloud Computing for Systems Biology" workshop

scalable

Page 185: Talk given at "Cloud Computing for Systems Biology" workshop

economical

Page 186: Talk given at "Cloud Computing for Systems Biology" workshop

here today

Page 187: Talk given at "Cloud Computing for Systems Biology" workshop

[email protected]  Twitter: @mndoci
Presentation ideas from @mza, James Hamilton, and @lessig

Thank you!