science for the future: strategies for moving and sharing data

Post on 10-May-2015


DESCRIPTION

A talk at the National User Facility Organization (NUFO) 2013 meeting at LBNL, where the theme this year is "the future of scientific data."

TRANSCRIPT

globus online

Science for the Future

Strategies for distributing and sharing data

www.globusonline.org

Ian Foster, foster@anl.gov

Big science data should be easy

[Diagram: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror]

… but it’s hard and frustrating!

[Diagram: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror, with error callouts]

• Quota exceeded!

• Expired credentials!

• Network failed. Retry.

• Permission denied!

Excerpts from ESnet reports

• “Transfers often take longer than expected based on available network capacities”

• “Lack of an easy to use interface to some of the high-performance tools”

• “Tools [are] too difficult to install and use”

• “Time and interruption to other work required to supervise large data transfers”

• “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”

We envisage a world where data …

… flows rapidly, reliably, and securely among experimental facilities, online and archival storage, computing facilities, and remote institutions

We envisage a world where data …

… is easily integrated into dynamic datasets that also include metadata and programs necessary to understand and regenerate it

We envisage a world where data …

… is readily discoverable and accessible to collaborators, regardless of their and the data’s location

We believe a new approach is needed to deliver data management infrastructure

• Frictionless

• Affordable

• Sustainable

Like … but for science!

Focusing on “frictionless”, we’ve started to do this with the Globus Online service …

Transfer and sharing of large data sets …

… with Dropbox-like characteristics …

… directly from your own storage systems

We started with reliable, secure, high-performance file transfer …

[Diagram: Data Source, Data Destination]

1. User initiates transfer request

2. Globus Online moves and syncs files

3. Globus Online notifies user
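The three steps above can be illustrated with a minimal local sketch. It is not the Globus Online API: `sync_directory`, `run_transfer`, and the `notify` callback are hypothetical names, and two local directories stand in for the source and destination endpoints. The sketch assumes that "sync" means copying only files that are missing or whose checksum differs at the destination.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_directory(source: Path, destination: Path) -> list[str]:
    """Copy files that are missing or different at the destination.

    Returns the relative paths that were actually copied.
    """
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        if dst_file.exists() and file_checksum(dst_file) == file_checksum(src_file):
            continue  # already in sync, nothing to move
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(relative.as_posix())
    return copied


def run_transfer(source: Path, destination: Path, notify) -> list[str]:
    """Step 1: accept the request. Step 2: move and sync. Step 3: notify."""
    copied = sync_directory(source, destination)
    notify(f"Transfer complete: {len(copied)} file(s) copied")
    return copied
```

Calling `run_transfer(Path("src"), Path("dst"), print)` twice in a row would copy everything the first time and nothing the second, which is the behavior a user expects from a sync-style transfer.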

… and then made it simple to share big data off existing storage systems

[Diagram: Data Source]

1. User A selects file(s) to share, selects user or group, and sets permissions

2. Globus Online tracks shared files; no need to move files to cloud storage!

3. User B logs in to Globus Online and accesses shared file
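One way to picture the sharing model is as a set of permission records kept alongside the storage system, so the files themselves never move. The sketch below is an assumption for illustration only: `SharedEndpoint`, `share`, and `is_allowed` are hypothetical names, not part of any Globus interface, and permissions are modeled as the letters `r` and `w`.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class SharedEndpoint:
    """Hypothetical record of sharing rules for one storage system."""

    # Each rule is (shared path, user or group name, set of permission letters).
    rules: list = field(default_factory=list)

    def share(self, path: str, principal: str, permissions: str) -> None:
        """Step 1: the owner picks a path, a user or group, and permissions."""
        self.rules.append((PurePosixPath(path), principal, set(permissions)))

    def is_allowed(self, path: str, principals: set, permission: str) -> bool:
        """Step 3: check a request against the tracked rules (step 2)."""
        target = PurePosixPath(path)
        for shared_path, principal, permissions in self.rules:
            # A rule covers the shared path itself and everything beneath it.
            covers = target == shared_path or shared_path in target.parents
            if covers and principal in principals and permission in permissions:
                return True
        return False
```

For example, after `endpoint.share("/data/run42", "group:beamline-team", "r")`, a member of that group can read `/data/run42/scan.h5` but cannot write to it, and nothing under `/data/other` is visible to them.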

Early adoption is encouraging


~18 PB and 1B files moved

10x (or better) performance vs. scp

99.9% availability

B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC

Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience

Exemplar: APS Beamline 2-BM

X-Ray imaging, tomography, ~few µm to 30nm resolution

Currently can generate >100TB per day

<1GB/s data rate; ~3-5GB/s in 5-10 years
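To relate the per-second rates to daily volumes, a back-of-envelope conversion helps. The sketch below assumes decimal units (1 TB = 1000 GB) and a rate sustained for a full 24 hours, which a real beamline would not necessarily do; `daily_volume_tb` is a name introduced here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(rate_gb_per_s: float) -> float:
    """Volume in TB produced in one day at a sustained rate given in GB/s."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000


for rate in (1, 3, 5):
    print(f"{rate} GB/s sustained -> {daily_volume_tb(rate):.1f} TB/day")
# 1 GB/s sustained -> 86.4 TB/day
# 3 GB/s sustained -> 259.2 TB/day
# 5 GB/s sustained -> 432.0 TB/day
```

Under these assumptions, the projected 3-5 GB/s would correspond to roughly 260 to 430 TB per day if sustained around the clock.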

Transforming data acquisition

Current

• Experimental parameters optimized manually

• Collected data combined with visual inspection to confirm optimal condition

• Data reconstructed and sent to users via external drive

• User team starts data reduction at home institution

Transforming data acquisition

Envisaged

• Experimental parameters optimized automatically

• Collected data available to optimization programs

• Data are automatically reconstructed, reduced, and shared with local and remote participants

• User team leaves the APS with reduced data


Globus Online as enabler

[Diagram: Facility data acquisition; Globus Online transfer service; Reduced data; Analysis/Sharing; Globus Online sharing service; Globus Online dataset service*]

* In development

Credit: Kerstin Kleese-van Dam

Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL

We believe a new approach is needed to deliver data management infrastructure

• Frictionless

• Affordable

• Sustainable

We’ve got a handle on “frictionless”

• Web interface, REST API, command line

• InCommon, OAuth, OpenID, X.509, …

• Credential management

• Group definition and management

• Transfer management and optimization

• Reliability via transfer retries

• Integration with ESnet “Science DMZs”

• One-click “Globus Connect” install

• 5-minute Globus Connect Multi User install
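The "reliability via transfer retries" item can be sketched as a retry loop with exponential backoff. This is a generic illustration, not a description of how Globus Online implements retries: `TransientTransferError`, `transfer_with_retries`, and the default of five attempts with a one-second base delay are all assumptions made here.

```python
import time


class TransientTransferError(Exception):
    """Hypothetical error for failures worth retrying, such as a network drop."""


def transfer_with_retries(attempt_transfer, max_attempts=5, base_delay=1.0,
                          sleep=time.sleep):
    """Call attempt_transfer until it succeeds or max_attempts is reached.

    The wait doubles after each failure: base_delay, 2x, 4x, and so on.
    The final failure is re-raised so the caller can report it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_transfer()
        except TransientTransferError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the sketch easy to test without real delays; errors such as "permission denied" are deliberately not caught, since retrying would not help.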

“Affordable” and “sustainable”?

Common expectation is either:

• High-priced commercial software (with generally higher levels of quality)

Or:

• Free, open source software (with generally lower levels of quality)

We aim to offer the best of all worlds!

We are a non-profit service provider to the non-profit research community

Our challenge: sustainability

Globus Online Provider Plans

Starting at $20k per year

• Managed endpoints with sharing

• Multiple GridFTP servers per endpoint

• Branded web sites

• Alternate identity provider

• Usage reporting

• Mass storage system (MSS) optimizations

• Operations monitoring and management

• Input into and access to product roadmap

Provider Plan not required to get started

Use Globus Connect Multi User to easily connect your resources with Globus

Go to: globusonline.org/gcmu

[Diagram: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror]

We hope you will join us

Providers are also using Globus Online as a platform

[Diagram: Globus Nexus (Identity, Group, Profile); Sharing Service; Transfer Service; Dataset Services; Globus Toolkit; Globus Online APIs; Globus Connect]

Early platform adopters

Our research is supported by:

U.S. Department of Energy

Questions

Contact: support@globusonline.org

Providers: globusonline.org/provider-plans

Researchers: globusonline.org/plus

www.globusonline.org
