science for the future: strategies for moving and sharing data
Post on 10-May-2015
globus online
Science for the Future
Strategies for distributing and sharing data
www.globusonline.org
Ian Foster, foster@anl.gov
Big science data should be easy
[Diagram: data flowing among Registry, Staging Store, Ingest Store, Analysis Store, Community Store, and Archive Mirror at two sites]
… but it’s hard and frustrating!
[Diagram: the same stores and registries, with transfers failing: “Quota exceeded!”, “Expired credentials!”, “Network failed. Retry.”, “Permission denied!”]
Excerpts from ESNet reports
• “Transfers often take longer than expected based on available network capacities”
• “Lack of an easy to use interface to some of the high-performance tools”
• “Tools [are] too difficult to install and use”
• “Time and interruption to other work required to supervise large data transfers”
• “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”
We envisage a world where data …
… flows rapidly, reliably, and securely among:
experimental facilities, online and archival storage, computing facilities, and remote institutions
We envisage a world where data …
… is easily integrated into dynamic datasets that also include metadata and programs necessary to understand and regenerate it
We envisage a world where data …
… is readily discoverable and accessible to collaborators, regardless of their and the data’s location
We believe a new approach is needed to deliver data management infrastructure
Frictionless
Affordable
Sustainable
Like … but for science!
Focusing on “frictionless”, we’ve started to do this with the Globus Online service …
Transfer and sharing of large data sets …
… with dropbox-like characteristics …
… directly from your own storage systems
We started with reliable, secure, high-performance file transfer …
Data Source
Data Destination
1. User initiates transfer request
2. Globus Online moves and syncs files
3. Globus Online notifies user
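The three steps above can be illustrated with a small local stand-in. This is a minimal sketch, not the Globus Online API: it syncs one local directory into another, and the names `checksum`, `sync_directory`, and `notify` are hypothetical. Using a SHA-256 comparison to decide which files need copying is an assumption made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_directory(source: Path, destination: Path, notify=print) -> list[str]:
    """Copy files that are missing or different at the destination.

    Illustrative only: a local stand-in for the three steps on the slide.
    """
    copied = []
    # Step 1: the "transfer request" is simply this function call.
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        # Step 2: move and sync, skipping files that already match.
        if dst_file.exists() and checksum(dst_file) == checksum(src_file):
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(relative.as_posix())
    # Step 3: notify the user of the outcome.
    notify(f"Transfer complete: {len(copied)} file(s) copied")
    return copied
```

Running `sync_directory` a second time on unchanged directories copies nothing, which is the "sync" behaviour the slide refers to.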
… and then made it simple to share big data off existing storage systems
Data Source
1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
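The key idea in these steps is that only the permissions are recorded by the service, while the files stay on the owner's storage. A minimal sketch of that idea follows. The `ShareRegistry` class and its methods are hypothetical and are not the Globus Online sharing interface; the rule that a grant on a directory covers everything beneath it is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ShareRegistry:
    """Tracks who may access which paths; the files themselves stay put."""

    # Maps (path, principal) -> set of permissions such as {"read", "write"}.
    grants: dict = field(default_factory=dict)

    def share(self, path: str, principal: str, permissions: set) -> None:
        """Record that a user or group may access a path (step 1)."""
        self.grants[(path, principal)] = set(permissions)

    def can_access(self, path: str, principal: str, permission: str = "read") -> bool:
        """Check a request against the recorded grants (steps 2 and 3)."""
        for (shared_path, who), perms in self.grants.items():
            # Assumption: a grant on a directory covers everything beneath it.
            prefix = shared_path.rstrip("/") + "/"
            under = path == shared_path or path.startswith(prefix)
            if who == principal and under and permission in perms:
                return True
        return False
```

For example, after `registry.share("/data/run42", "userB", {"read"})`, a read request from `userB` for a file under `/data/run42` is allowed, while a write request, or any request from another user, is refused.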
Early adoption is encouraging
~18 PB and 1B files moved
10x (or better) performance vs. scp
99.9% availability
B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC
Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
Exemplar: APS Beamline 2-BM
X-Ray imaging, tomography, ~few µm to 30nm resolution
Currently can generate >100TB per day
<1GB/s data rate; ~3-5GB/s in 5-10 years
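To relate the data rates quoted above to daily volumes, a simple unit conversion helps. The sketch below assumes decimal units (1 TB = 1000 GB) and a rate sustained over a full 24 hours; neither assumption is stated on the slide, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
GB_PER_TB = 1000  # decimal units, an assumption


def tb_per_day(rate_gb_per_s: float) -> float:
    """Daily volume in TB for a rate in GB/s sustained for 24 hours."""
    return rate_gb_per_s * SECONDS_PER_DAY / GB_PER_TB


for rate in (1, 3, 5):
    print(f"{rate} GB/s sustained is {tb_per_day(rate):.1f} TB per day")
```

Under these assumptions, a sustained 1 GB/s comes to 86.4 TB per day, and the 3-5 GB/s range projected for the coming years corresponds to roughly 259 to 432 TB per day.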
Transforming data acquisition
Current
• Experimental parameters optimized manually
• Collected data combined with visual inspection to confirm optimal condition
• Data reconstructed and sent to users via external drive
• User team starts data reduction at home institution
Transforming data acquisition
Envisaged
• Experimental parameters optimized automatically
• Collected data available to optimization programs
• Data are automatically reconstructed, reduced, and shared with local and remote participants
• User team leaves the APS with reduced data
Facility data acquisition
Globus Online as enabler
Globus Online transfer service
Reduced data
Analysis/Sharing
Globus Online sharing service
Globus Online dataset service*
* In development
Credit: Kerstin Kleese-van Dam
Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL
We believe a new approach is needed to deliver data management infrastructure
Frictionless
Affordable
Sustainable
We’ve got a handle on “frictionless”
• Web interface, REST API, command line
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Integration with ESNet “Science DMZs”
• One-click “Globus Connect” install
• 5-minute Globus Connect Multi User install
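The "reliability via transfer retries" item in the list above can be illustrated with a generic retry helper. This is a sketch under stated assumptions, not how Globus Online implements retries: the slide does not describe the retry policy, so the exponential backoff, the choice to retry only on `OSError` (which covers network errors such as `ConnectionError` and `TimeoutError`), and the name `with_retries` are all assumptions.

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. The last failure is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise
            # Exponential backoff before the next try (an assumed policy).
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap a single transfer step, for example `with_retries(lambda: copy_one_file(src, dst))`, where `copy_one_file` is a hypothetical function that raises `OSError` on a network failure. Passing `sleep` as a parameter keeps the helper easy to test without real delays.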
“Affordable” and “sustainable”?
Common expectation is either:
– High-priced commercial software (with generally higher levels of quality)
Or:
– Free, open source software (with generally lower levels of quality)
We aim to offer the best of all worlds!
We are a non-profit service provider to the non-profit research community
Our challenge: Sustainability
Globus Online Provider Plans
Starting at $20k per year
• Managed endpoints with sharing
• Multiple GridFTP servers per endpoint
• Branded web sites
• Alternate identity provider
• Usage reporting
• Mass storage system (MSS) optimizations
• Operations monitoring and management
• Input into and access to product roadmap
Provider Plan not required to get started
Use Globus Connect Multiuser to easily connect your resources with Globus
Go to: globusonline.org/gcmu
[Diagram: data flowing among Registry, Staging Store, Ingest Store, Analysis Store, Community Store, and Archive Mirror at two sites]
We hope you will join us
Providers are also using Globus Online as a platform
Globus Nexus (Identity, Group, Profile)
…
Sharing Service
Transfer Service
Dataset Services
Globus Toolkit
Globus Online APIs
Globus Connect
Early platform adopters
Our research is supported by:
U.S. DEPARTMENT OF ENERGY
Questions
Contact: support@globusonline.org
Providers: globusonline.org/provider-plans
Researchers: globusonline.org/plus
www.globusonline.org