CyVerse: Transforming Life Science Research via Cyberinfrastructure


Upload: matthew-vaughn

Post on 21-Jan-2018



CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE

Matthew Vaughn @mattdotvaughn

Director, Life Sciences Computing, TACC

Co-PI CyVerse, Araport, Jetstream Cloud

9/8/2016

OVERVIEW


• WHAT IS CYVERSE?

• HOW IS IT TRANSFORMATIONAL FOR LIFE SCIENCES RESEARCH?

• HOW DOES IT FIT INTO THE BIGGER SCHEME?

• WHAT DIRECTIONS AND CHALLENGES ARE IN ITS FUTURE?

CYVERSE IS A CYBERINFRASTRUCTURE


Vision: Transforming science through data-driven discovery

Mission: To design, develop, deploy, and expand a national cyberinfrastructure for life science research, and to train scientists in its use

SUPPORTED BY THE NSF BIO DIRECTORATE


• Division of Biological Infrastructure

• $100 Million, 10-year investment

• CyVerse resources are

– Freely available to the community

– Intended to spur national and international collaboration for research and education

iPlant 2008: Empowering a New Plant Biology

iPlant 2013: Cyberinfrastructure for Life Science

CyVerse 2016: Transforming Science Through Data-Driven Discovery

DBI-0735191, DBI-1265383


DISRUPTIVE MEASUREMENT TECHNOLOGIES

NOT JUST ONE DATA TSUNAMI BUT THOUSANDS OF THEM


EXPLOSION IN SOFTWARE AND SYSTEMS COMPLEXITY


INCREASED ADOPTION OF COMPUTATIONAL METHODS

RESEARCH TEAMS NEED THIS

Store, organize, share primary data

Do basic analysis

Store, organize, share data products

Generate and explore hypotheses

Share analysis code with the scientific public

Integrate results from new experiments

Publish data alongside plots, visualizations and analytical tools


BUT END UP DEALING WITH THIS

Data lifecycle management

Fine-grained permission management

Discoverability

Version control

Taming promising new analysis codes (usually based on immature technology)

Paying for storage, cycles, and consulting

Making their science reproducible


THE CYVERSE APPROACH


CYVERSE PRODUCT MATRIX


Atmosphere: User-provisioned, highly configurable cloud computing environment tailored for sciences

Discovery Environment: Web-accessible analysis workbench and gateway to national HPC infrastructure (XSEDE)

Bisque: Software for managing, analyzing and visualizing high-throughput imaging data

Data Store: Scalable data storage for managing and sharing data across CyVerse’s CI and external data resources

Science APIs: Automation interfaces to connect data and computation for rapid integration of external resources. Also used as a graduate teaching platform.

DNA Subway: Classroom-friendly bioinformatics teaching platform

Powered by CyVerse: Third-party applications built on CyVerse’s foundational services and

Welch et al. 2013

Bioinformatics Specialist

Computing Professional

Bench Scientist

EMPOWER USERS AT ALL LEVELS

Help them avoid data and operations silos


[Figure: layered architecture diagram. Layers: Science applications; Domain-specific services; Established software and CI; Physical resources. Components: Federated Storage, National CI, Virtualization, Job Scheduling, Single Sign-on. Axes: Ease of Use, Ease of Re-use.]

IMPACTS


• 500+ publications

• >2 PB user data stored

• 40k+ registered users

• Millions of compute hours annually

• Hundreds of trainees

CYVERSE IS A HUB IN A RICH & COLLABORATIVE ECOSYSTEM


• Using

• Collaborating

• Contributing

• Supporting

• Inventing

CURRENT INITIATIVES


Enabling Data-Driven Discovery. Providing Advanced Training to Researchers. Removing Barriers to Reproducible Science.

CyVerse Data Commons

Portable Science Lab

Intensive Engagement

CYVERSE DATA COMMONS


Make research data discoverable and reusable. Ensure it ends up stored in its natural repository.

CyVerse Data Store

Staging Area

Data Commons Portal

Natural Repositories

Publish in place simply by sharing

Curate, format, describe metadata

Published snapshot with DOI and open access

Facilitated deposit to NCBI SRA, GenBank, and more
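
The four stages named on this slide can be sketched as a small state machine. This is only an illustration, not CyVerse code or any CyVerse API: the `Stage`, `ACTIONS`, `NEXT_STAGE`, and `Dataset` names are hypothetical, and the strictly linear ordering of stages is an assumption read off the slide layout.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Stage names taken from the slide; the enum itself is hypothetical."""
    DATA_STORE = "CyVerse Data Store"
    STAGING_AREA = "Staging Area"
    DATA_COMMONS_PORTAL = "Data Commons Portal"
    NATURAL_REPOSITORY = "Natural Repositories"


# Slide caption associated with each stage.
ACTIONS = {
    Stage.DATA_STORE: "Publish in place simply by sharing",
    Stage.STAGING_AREA: "Curate, format, describe metadata",
    Stage.DATA_COMMONS_PORTAL: "Published snapshot with DOI and open access",
    Stage.NATURAL_REPOSITORY: "Facilitated deposit to NCBI SRA, GenBank, and more",
}

# Assumption: data moves through the stages in the order shown on the slide.
NEXT_STAGE = {
    Stage.DATA_STORE: Stage.STAGING_AREA,
    Stage.STAGING_AREA: Stage.DATA_COMMONS_PORTAL,
    Stage.DATA_COMMONS_PORTAL: Stage.NATURAL_REPOSITORY,
}


@dataclass
class Dataset:
    """A toy dataset record that tracks which stage it has reached."""
    name: str
    stage: Stage = Stage.DATA_STORE
    history: list = field(default_factory=lambda: [Stage.DATA_STORE])

    def advance(self) -> Stage:
        """Move to the next stage, or raise if already at the final one."""
        nxt = NEXT_STAGE.get(self.stage)
        if nxt is None:
            raise ValueError(f"{self.name} is already at the final stage")
        self.stage = nxt
        self.history.append(nxt)
        return nxt


if __name__ == "__main__":
    ds = Dataset("example-dataset")  # hypothetical dataset name
    print(f"{ds.stage.value}: {ACTIONS[ds.stage]}")
    while ds.stage in NEXT_STAGE:
        ds.advance()
        print(f"{ds.stage.value}: {ACTIONS[ds.stage]}")
```

Running the script prints each stage next to its slide caption, in the assumed order.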

PORTABLE SCIENCE LAB


Continue adoption of technologies to describe, encapsulate, and share research code and data.

Virtual machines, Linux containers, Web Service APIs, Workflow Standards

Integrated via Interactive, Narrative Notebooks

INTENSIVE ENGAGEMENT


Extended Collaborative Support

Consultation and Support Forums

Hands-on Training and Tutorials

Enhanced Support Tooling

Empower Researchers to Embrace and Extend CyVerse

SUMMARY


• CyVerse is a reference model for cyberinfrastructure that is already being extended to other disciplines

• CyVerse provides a vertically integrated, scalable data-to-discovery cyberinfrastructure that leverages existing federal and state investments to transform life science research

• CyVerse is driving technological and operational innovation via a web of interactions and collaborations with other projects, platforms, and infrastructures

KEY CHALLENGE - CYVERSE VALUE PROPOSITION


“Are you still going to be around in 3 years?”

“Why did my analysis fail? Don’t you have big computers?”

“Shouldn’t we just go to Amazon Web Services?”

“I don’t want my students spending time learning computing.”

“Why aren’t you working on X?”

DISCUSSION


@mattdotvaughn www.slideshare.net/mattdotvaughn