![Page 1: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/1.jpg)
Globus Online
Science for the Future
Strategies for distributing and sharing data
www.globusonline.org
![Page 2: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/2.jpg)
Big science data should be easy
[Diagram: registries, a staging store, ingest stores, analysis stores, community stores, archives, and mirrors]
![Page 3: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/3.jpg)
… but it’s hard and frustrating!
[Diagram: registries, staging, ingest, analysis, and community stores, archives, and mirrors, annotated with errors: "Quota exceeded!", "Expired credentials!", "Network failed. Retry.", "Permission denied!"]
![Page 4: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/4.jpg)
Excerpts from ESnet reports
• “Transfers often take longer than expected based on available network capacities”
• “Lack of an easy to use interface to some of the high-performance tools”
• “Tools [are] too difficult to install and use”
• “Time and interruption to other work required to supervise large data transfers”
• “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”
![Page 5: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/5.jpg)
We envisage a world where data …
… flows rapidly, reliably, and securely among experimental facilities, online and archival storage, computing facilities, and remote institutions
![Page 6: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/6.jpg)
We envisage a world where data …
… is easily integrated into dynamic datasets that also include metadata and programs necessary to understand and regenerate it
![Page 7: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/7.jpg)
We envisage a world where data …
… is readily discoverable and accessible to collaborators, regardless of their and the data’s location
![Page 8: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/8.jpg)
We believe a new approach is needed to deliver data management infrastructure
Frictionless
Affordable
Sustainable
Like … but for science!
![Page 9: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/9.jpg)
Focusing on “frictionless”, we’ve started to do this with the Globus Online service …
Transfer and sharing of large data sets …
… with Dropbox-like characteristics …
… directly from your own storage systems
![Page 10: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/10.jpg)
We started with reliable, secure, high-performance file transfer …
[Diagram: Data Source → Data Destination]
1. User initiates transfer request
2. Globus Online moves and syncs files
3. Globus Online notifies user
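A minimal sketch of the "moves and syncs" step, assuming two local directories stand in for the source and destination endpoints. It copies only files that are missing at the destination or differ in size or checksum, then verifies each copy with SHA-256. The function names (`file_digest`, `needs_copy`, `sync_tree`) are hypothetical; this illustrates the idea of checksum-based sync and is not the Globus Online API or its actual algorithm.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_copy(src: Path, dst: Path) -> bool:
    """A file needs copying if it is missing or its size or checksum differs."""
    if not dst.exists():
        return True
    if src.stat().st_size != dst.stat().st_size:
        return True  # cheap check first; avoids hashing when sizes differ
    return file_digest(src) != file_digest(dst)


def sync_tree(src_root: Path, dst_root: Path) -> list[str]:
    """Copy new or changed files from src_root to dst_root.

    Returns the relative paths that were copied, in sorted order.
    """
    copied: list[str] = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            # Verify the copy end to end before counting it as done.
            if file_digest(src) != file_digest(dst):
                raise IOError(f"checksum mismatch after copying {rel}")
            copied.append(rel.as_posix())
    return copied
```

Running `sync_tree` a second time on unchanged data copies nothing, which is what makes a sync restartable. The returned list is the kind of summary a service could report when it notifies the user in step 3.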
![Page 11: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/11.jpg)
… and then made it simple to share big data off existing storage systems
[Diagram: Data Source]
1. User A selects file(s) to share, selects a user or group, and sets permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses the shared file
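The sharing flow above can be illustrated with a small access-control sketch, assuming a share is a grant of permissions on a path prefix to a user or a group. Only the grants are recorded; the files themselves stay on the source storage. `ShareRegistry`, `Grant`, and the `"user:"`/`"group:"` principal format are hypothetical illustrations, not Globus Online's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    path: str                    # shared file or folder on the source storage
    principal: str               # "user:<name>" or "group:<name>" (hypothetical format)
    permissions: frozenset[str]  # e.g. {"r"} or {"r", "w"}


@dataclass
class ShareRegistry:
    """Tracks who may access what; shared files never leave the source storage."""
    groups: dict[str, set[str]] = field(default_factory=dict)  # group -> member users
    grants: list[Grant] = field(default_factory=list)

    def share(self, path: str, principal: str, permissions: set[str]) -> None:
        """User A's step: grant a user or group permissions on a path."""
        self.grants.append(Grant(path.rstrip("/"), principal, frozenset(permissions)))

    def allowed(self, user: str, path: str, perm: str) -> bool:
        """User B's step: check whether `user` may use `perm` on `path`."""
        principals = {f"user:{user}"}
        principals |= {f"group:{g}" for g, members in self.groups.items() if user in members}
        for grant in self.grants:
            # The path must be the shared item itself or lie beneath the shared folder.
            under_share = path == grant.path or path.startswith(grant.path + "/")
            if under_share and grant.principal in principals and perm in grant.permissions:
                return True
        return False
```

For example, granting read access on `/data/run42/` to a group lets every member read files under that folder, but not write them, and grants nothing on a sibling folder such as `/data/run420/`.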
![Page 12: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/12.jpg)
Early adoption is encouraging
![Page 13: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/13.jpg)
Early adoption is encouraging
~18 PB and 1B files moved
10x (or better) performance vs. scp
99.9% availability
![Page 14: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/14.jpg)
![Page 15: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/15.jpg)
B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC
![Page 16: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/16.jpg)
Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
![Page 17: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/17.jpg)
Exemplar: APS Beamline 2-BM
X-ray imaging and tomography, with resolution from a few µm down to ~30 nm
Currently can generate >100 TB per day
Data rate <1 GB/s today; ~3-5 GB/s expected in 5-10 years
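To put the projected rates in perspective, here is a quick conversion from a sustained data rate to daily volume, assuming decimal units (1 TB = 1000 GB) and continuous acquisition. `duty_cycle` is a hypothetical parameter for the fraction of the day the instrument actually produces data.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def daily_volume_tb(rate_gb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Data produced per day, in TB, at a sustained rate given in GB/s.

    Uses decimal units (1 TB = 1000 GB). duty_cycle is the fraction of the
    day spent acquiring (hypothetical parameter; 1.0 means continuous).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return rate_gb_per_s * SECONDS_PER_DAY * duty_cycle / 1000


for rate in (3, 5):
    print(f"{rate} GB/s -> {daily_volume_tb(rate):.1f} TB/day")
# 3 GB/s -> 259.2 TB/day
# 5 GB/s -> 432.0 TB/day
```

At the projected 3-5 GB/s, continuous acquisition would produce roughly 259-432 TB per day; any idle time scales that down linearly through `duty_cycle`.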
![Page 18: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/18.jpg)
Transforming data acquisition
Current
• Experimental parameters optimized manually
• Collected data combined with visual inspection to confirm optimal condition
• Data reconstructed and sent to users via external drive
• User team starts data reduction at home institution
![Page 19: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/19.jpg)
Transforming data acquisition
Envisaged
• Experimental parameters optimized automatically
• Collected data available to optimization programs
• Data are automatically reconstructed, reduced, and shared with local and remote participants
• User team leaves the APS with reduced data
Current
• Experimental parameters optimized manually
• Collected data combined with visual inspection to confirm optimal condition
• Data reconstructed and sent to users via external drive
• User team starts data reduction at home institution
![Page 20: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/20.jpg)
Globus Online as enabler
[Diagram: facility data acquisition, Globus Online transfer service, reduced data, analysis/sharing, Globus Online sharing service, Globus Online dataset service*]
* In development
![Page 21: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/21.jpg)
Credit: Kerstin Kleese-van Dam
Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL
![Page 22: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/22.jpg)
We believe a new approach is needed to deliver data management infrastructure
Frictionless
Affordable
Sustainable
![Page 23: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/23.jpg)
We’ve got a handle on “frictionless”
• Web interface, REST API, command line
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Integration with ESnet “Science DMZs”
• One-click “Globus Connect” install
• 5-minute Globus Connect Multi User install
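Reliability via transfer retries generally means retrying transient failures (such as "Network failed. Retry.") with growing delays while failing fast on permanent ones (such as "Permission denied"). The following is a generic sketch of that pattern using exponential backoff with jitter. `with_retries` and `TransientError` are hypothetical names, and the sketch does not describe how Globus Online itself schedules retries.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying, e.g. a dropped connection."""


def with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying TransientError with exponential backoff and jitter.

    Any other exception (e.g. PermissionError) propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Delay doubles each attempt (1 s, 2 s, 4 s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Scale by a random factor in [0.5, 1.0] so many clients don't retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets the backoff schedule be tested, or driven by an event loop, without actually blocking.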
![Page 24: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/24.jpg)
“Affordable” and “sustainable”?
The common expectation is either:
– High-priced commercial software (with generally higher levels of quality), or
– Free, open source software (with generally lower levels of quality)
We aim to offer the best of both worlds!
![Page 25: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/25.jpg)
We are a non-profit service provider to the non-profit research community
![Page 26: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/26.jpg)
Our challenge: sustainability
We are a non-profit service provider to the non-profit research community
![Page 27: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/27.jpg)
Globus Online Provider Plans
Starting at $20k per year
• Managed endpoints with sharing
• Multiple GridFTP servers per endpoint
• Branded web sites
• Alternate identity provider
• Usage reporting
• Mass storage system (MSS) optimizations
• Operations monitoring and management
• Input into and access to product roadmap
![Page 28: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/28.jpg)
Provider Plan not required to get started
Use Globus Connect Multiuser to easily connect your resources with Globus
Go to: globusonline.org/gcmu
[Diagram: registries, staging, ingest, analysis, and community stores, archives, and mirrors]
![Page 29: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/29.jpg)
We hope you will join us
![Page 30: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/30.jpg)
Providers are also using Globus Online as a platform
[Diagram: platform components: Globus Nexus (Identity, Group, Profile), …, Sharing Service, Transfer Service, Dataset Services, Globus Toolkit, Globus Online APIs, Globus Connect]
![Page 31: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/31.jpg)
Early platform adopters
![Page 32: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/32.jpg)
Our research is supported by:
U.S. Department of Energy
![Page 33: Science for the Future: Strategies for Moving and Sharing Data](https://reader035.vdocuments.net/reader035/viewer/2022070315/554ea0c9b4c905977e8b45fb/html5/thumbnails/33.jpg)
Questions
Contact: [email protected]
Providers: globusonline.org/provider-plans
Researchers: globusonline.org/plus
www.globusonline.org