SCALABLE, COLLABORATIVE, REPRODUCIBLE, AND EXTENSIBLE ANALYSIS OF TCGA DATA IN THE CLOUD Brandi Davis-Dusenbery, PhD AACR April 18, 2016


Page 1: Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data in the Cloud

SCALABLE, COLLABORATIVE, REPRODUCIBLE, AND EXTENSIBLE ANALYSIS OF TCGA DATA IN THE CLOUD

Brandi Davis-Dusenbery, PhD

AACR, April 18, 2016

Page 2

DISCLOSURE & FUNDING

This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.

I am an employee of Seven Bridges.

Page 3

GUIDING PRINCIPLES

Making data available isn’t enough to make it usable

The best science happens in teams

Reproducibility shouldn’t be hard

The impact of TCGA is extended by new data & tools

Page 4

MAKING DATA AVAILABLE ISN’T ENOUGH TO MAKE IT USABLE

Page 5

THE CGC ALLOWS YOU TO ACCESS MORE THAN 1 PB OF MULTIDIMENSIONAL -OMICS DATA.

Multiple samples per case: Primary Tumor, Solid Tissue Normal, Blood Derived Normal, Metastatic, …

Multiple analyses per sample: Genomic, Transcriptomic, Proteomic, Epigenomic, …

Open Data and Controlled Data

Page 6

EXPLORE THE DATASET…

Page 7

… AND THEN IMMEDIATELY RUN AN ANALYSIS.

Page 8

THE BEST SCIENCE HAPPENS IN TEAMS

Page 9

SECURE AND COMPLIANT PROJECT MEMBERSHIP

• Projects serve as isolated workspaces for your data and tools.

• Fine-grained permissions give you control over who can see and use your assets.

• Access to TCGA Controlled Data projects is limited to authorized users.

Page 10

RICH COMMUNICATION & EFFECTIVE COLLABORATION

Project descriptions, conversations, and real-time notifications keep everyone on the same page.

Page 11

REPRODUCIBILITY SHOULDN’T BE HARD

Page 12

EACH TASK IS REPRODUCIBLE & REMEMBERABLE

The inputs, outputs, and parameters, as well as the precise tool versions (including dependencies!), are always linked and available for reference days or months later.
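A record that links inputs, outputs, parameters, and tool version can be sketched in a few lines. This is only an illustration of the idea: `TaskRecord`, its field names, and the tool and file names are hypothetical and do not describe how the CGC stores tasks.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskRecord:
    """Hypothetical record of one analysis run (not a CGC data structure)."""
    tool: str                                        # tool name
    tool_version: str                                # exact version or image tag
    inputs: dict = field(default_factory=dict)       # input name -> file identifier
    parameters: dict = field(default_factory=dict)   # parameter name -> value
    outputs: dict = field(default_factory=dict)      # output name -> file identifier

    def fingerprint(self) -> str:
        """Hash everything that determines the result (outputs are excluded)."""
        determining = asdict(self)
        determining.pop("outputs")
        canonical = json.dumps(determining, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two runs that differ only in tool version get different fingerprints.
run_a = TaskRecord("aligner", "1.2.0", {"reads": "sample1.fastq"}, {"threads": 4})
run_b = TaskRecord("aligner", "1.2.1", {"reads": "sample1.fastq"}, {"threads": 4})
rerun_a = TaskRecord("aligner", "1.2.0", {"reads": "sample1.fastq"}, {"threads": 4},
                     outputs={"alignment": "sample1.bam"})
```

Because the fingerprint covers the tool version as well as the inputs and parameters, a rerun months later can be checked against the original run before results are compared.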

Page 13

… AND SELF-CONTAINED

• Even the most complex workflows are captured as small runnable text files.

• Easy to share and save.

Page 14

THE IMPACT OF TCGA IS EXTENDED BY NEW DATA & TOOLS

Page 15

FOUR WAYS TO ADD YOUR OWN DATA

• Graphical uploader

• Command Line uploader

• FTP / HTTP

• API

Page 16

EASILY ANNOTATE UPLOADED DATA TO MAKE IT EASIER TO FIND LATER

~40 properties in the visual interface; unlimited custom properties via the API.
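The value of annotation is that files can later be retrieved by property rather than by name. A minimal in-memory sketch of that idea follows; it does not call the CGC API, and the `annotate` and `find` helpers, the property names, and the file names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory catalogue: file name -> metadata properties.
Catalogue = Dict[str, Dict[str, str]]


def annotate(catalogue: Catalogue, name: str, **properties: str) -> None:
    """Attach (or update) metadata properties on a file entry."""
    catalogue.setdefault(name, {}).update(properties)


def find(catalogue: Catalogue, **criteria: str) -> List[str]:
    """Return the names of files whose metadata matches every criterion."""
    return sorted(
        name
        for name, meta in catalogue.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


catalogue: Catalogue = {}
annotate(catalogue, "a.bam", sample_type="Primary Tumor", batch="pilot")
annotate(catalogue, "b.bam", sample_type="Solid Tissue Normal", batch="pilot")
annotate(catalogue, "c.bam", sample_type="Primary Tumor", batch="main")

tumor_pilot = find(catalogue, sample_type="Primary Tumor", batch="pilot")
```

Here `batch` plays the role of a custom property: it is not part of any fixed schema, yet it can be combined with standard properties in a query.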

Page 17

AS THE AMOUNT OF DATA HAS GROWN, SO TOO HAS THE NUMBER OF TOOLS AVAILABLE TO ANALYZE IT

11,160 -omics data analysis tools* (each with many versions)

50+ used in a single TCGA marker paper

*omictools.com

Page 18

DOCKER + CWL MAKES IT EASY TO PUT THESE TOOLS ON THE CGC … AND OTHER PLACES

Page 19

DEFINE THE TOOL, INPUTS, OUTPUTS AND PARAMETERS
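As a rough illustration of what such a definition contains, the sketch below builds a Common Workflow Language tool description as a Python dictionary and serializes it to JSON, which CWL accepts alongside YAML. The slide does not show the description itself, so everything here is an assumption: the field names follow CWL v1.0 (the draft in use at the time of the talk may differ), and the wrapped command, Docker image, and input and output names are illustrative only.

```python
import json

# Hypothetical tool: count the lines of a text file that match a pattern.
tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    # The tool: the command to run and the container image that provides it.
    "baseCommand": ["grep", "-c"],
    "requirements": [
        {"class": "DockerRequirement", "dockerPull": "ubuntu:16.04"},
    ],
    # Inputs and parameters, with their order on the command line.
    "inputs": {
        "pattern": {"type": "string", "inputBinding": {"position": 1}},
        "text_file": {"type": "File", "inputBinding": {"position": 2}},
    },
    # Outputs: standard output is captured into a named file.
    "stdout": "match_count.txt",
    "outputs": {
        "match_count": {"type": "stdout"},
    },
}

# The whole description is a small text document that can be saved and shared.
cwl_text = json.dumps(tool, indent=2)
```

The container image pins the tool and its dependencies, while the description pins how the tool is invoked; together they are what lets the same tool run on the CGC and elsewhere.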

Page 20

ADD YOUR TOOL TO HUNDREDS OF EXISTING TOOLS TO CREATE A WORKFLOW
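A workflow of this kind is, in essence, a set of steps wired together by their inputs and outputs. The sketch below assumes CWL v1.0 field names and uses hypothetical step and file names (`existing_extract_tool.cwl` stands in for a tool already on the platform, `count_matches.cwl` for a newly added one). The `step_order` helper is likewise an illustration of how the wiring implies an execution order, not part of any CWL library.

```python
import json

workflow = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "inputs": {"pattern": "string", "archive": "File"},
    "outputs": {
        "match_count": {"type": "File", "outputSource": "count/match_count"},
    },
    "steps": {
        "extract": {  # hypothetical existing tool
            "run": "existing_extract_tool.cwl",
            "in": {"archive": "archive"},
            "out": ["text_file"],
        },
        "count": {  # hypothetical newly added tool
            "run": "count_matches.cwl",
            "in": {"pattern": "pattern", "text_file": "extract/text_file"},
            "out": ["match_count"],
        },
    },
}


def step_order(wf: dict) -> list:
    """Order steps so each runs after the steps whose outputs it consumes."""
    # A source written as "step/output" marks a dependency on that step.
    deps = {
        name: {src.split("/")[0] for src in step["in"].values() if "/" in src}
        for name, step in wf["steps"].items()
    }
    ordered = []
    while deps:
        ready = sorted(n for n, d in deps.items() if d <= set(ordered))
        if not ready:
            raise ValueError("cycle in workflow steps")
        ordered.extend(ready)
        for name in ready:
            del deps[name]
    return ordered


order = step_order(workflow)
workflow_text = json.dumps(workflow, indent=2)
```

Nothing in the description states the order of execution explicitly; it follows from which step consumes which output, which is why steps can be rearranged or added without rewriting the rest.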

Page 21

WWW.CANCERGENOMICSCLOUD.ORG

MORE THAN $1M IN COMPUTE AND STORAGE CREDITS AVAILABLE FOR YOU TO USE

The tiered model gives everyone access to up to $1,600 in credits (roughly enough for whole-exome analysis of all pancreatic carcinoma samples).

Request up to $10,000 in credits for large collaborative projects (graduate students and postdocs are particularly encouraged to submit a request).

Page 22

NEARLY 500 RESEARCHERS ARE USING THE CGC TODAY …

Early Adopter

Open Release

WWW.CANCERGENOMICSCLOUD.ORG

Page 23

… JOIN THEM

Booth 452 Networking event

WWW.CANCERGENOMICSCLOUD.ORG

Page 24

THANK YOU

This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.