Research data management as a service
DESCRIPTION
This presentation is by Ian Foster, director of the Computation Institute at The University of Chicago. It was given at the Great Plains Network Annual Meeting on May 29, 2013. For more information on Globus Online, visit globusonline.org. "What would a Dropbox for science look like?" asks Foster. "It should be trivial to collect, move, sync, share, analyze, annotate, publish, search, backup, and archive Big Data. But in reality it's often very challenging." Globus Online, a software-as-a-service offering for data management, addresses these problems. This slideshow explains how Globus Online does that for universities and laboratories around the world.
TRANSCRIPT
computationinstitute.org www.globusonline.org
Research data management as a service
Ian Foster
High energy physics
Molecular biology
Cosmology
Genetics
Metagenomics
Linguistics
Economics
Climate change
Visual arts
What would a “Dropbox for science” look like?
[Figure: storage systems (Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, Mirror) with error pop-ups: “Quota exceeded!”, “Expired credentials!”, “Network failed. Retry.”, “Permission denied!”]
It should be trivial to Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, & Archive BIG DATA … but in reality it’s often very challenging
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
… for BIG DATA
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
• Collect • Move • Sync • Share: capabilities delivered using the Software-as-a-Service (SaaS) model
Data Source → Data Destination
1. User initiates transfer request
2. Globus Online moves/syncs files
3. Globus Online notifies user
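The three-step flow on this slide (request, move/sync, notify) can be illustrated with a small local sketch. This is not the Globus Online API: the function names `checksum`, `sync`, `run_transfer`, and `demo` are hypothetical, the "endpoints" are just local directories, and comparing SHA-256 checksums is only one plausible way to decide which files need to be re-sent.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync(source: Path, destination: Path) -> list[str]:
    """Copy files that are missing or differ at the destination.

    Returns the relative paths that were actually copied.
    """
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists() and checksum(dst) == checksum(src):
            continue  # destination copy is already up to date
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied


def run_transfer(source: Path, destination: Path, notify) -> list[str]:
    """Step 1: accept the request. Step 2: sync. Step 3: notify the user."""
    copied = sync(source, destination)
    notify(f"Transfer complete: {len(copied)} file(s) copied")
    return copied


def demo():
    """Run two transfers; the second only re-sends the file that changed."""
    messages = []
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "source"), Path(tmp, "destination")
        (src / "run1").mkdir(parents=True)
        dst.mkdir()
        (src / "run1" / "a.dat").write_text("alpha")
        (src / "b.dat").write_text("beta")
        first = run_transfer(src, dst, messages.append)
        (src / "b.dat").write_text("beta v2")
        second = run_transfer(src, dst, messages.append)
    return first, second, messages
```

In a hosted service the notification would presumably be an email or a status update rather than a callback; the callback here only stands in for that step.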
Data Source
1. User A selects file(s) to share; selects user/group, sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
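The key idea on this slide is that sharing is a matter of recording who may access what, while the files stay at the data source. A minimal sketch of that bookkeeping follows. It is an illustration only, not the Globus Online implementation: `Share`, `SharingRegistry`, the endpoint name `lab#data`, and the `"r"`/`"rw"` permission strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Share:
    """A record that a path on an endpoint is shared; the files stay in place."""
    endpoint: str
    path: str
    # Maps a user or group name to a permission string: "r" or "rw".
    permissions: dict[str, str] = field(default_factory=dict)


class SharingRegistry:
    """Tracks shares and group membership; stores no file data."""

    def __init__(self) -> None:
        self._shares = []
        self._groups = {}

    def add_group(self, group: str, members: set[str]) -> None:
        self._groups[group] = set(members)

    def share(self, endpoint: str, path: str, principal: str, permission: str) -> Share:
        """Step 1: User A shares a path with a user or group."""
        record = Share(endpoint, path.rstrip("/"), {principal: permission})
        self._shares.append(record)  # Step 2: the service tracks the share
        return record

    def can_access(self, user: str, endpoint: str, path: str, mode: str = "r") -> bool:
        """Step 3: check whether a user may read ("r") or write ("w") a path."""
        for record in self._shares:
            if record.endpoint != endpoint:
                continue
            inside = path == record.path or path.startswith(record.path + "/")
            if not inside:
                continue
            for principal, permission in record.permissions.items():
                is_member = user == principal or user in self._groups.get(principal, set())
                if is_member and mode in permission:
                    return True
        return False


def demo() -> SharingRegistry:
    registry = SharingRegistry()
    registry.add_group("tomo-team", {"userB"})
    registry.share("lab#data", "/scans/2013", "tomo-team", "r")
    return registry
```

With `registry = demo()`, the call `registry.can_access("userB", "lab#data", "/scans/2013/mydata42.h5")` succeeds because userB belongs to the group the path was shared with, while a write request or a request from a non-member is refused.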
Early adoption is encouraging
8,000 registered users; >100 daily
~16 PB moved; ~1B files
10x (or better) performance vs. scp
99.9% availability
Entirely hosted on Amazon
Globus Online already does a lot
Globus Toolkit
Sharing Service
Transfer Service
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
We are also adding capabilities
Globus Toolkit
Sharing Service
Transfer Service
Dataset Services
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
Expanding Globus Online services
• Ingest and publication
– Imagine a Dropbox that not only replicates, but also extracts metadata, catalogs, converts
• Cataloging
– Virtual views of data based on user-defined and/or automatically extracted metadata
• Computation
– Associate computational procedures, orchestrate application, catalog results, record provenance
Builds on catalog as a service
Approach
• Hosted user-defined catalogs
• Based on tag model <subject, name, value>
• Optional schema constraints
• Integrated with other Globus services
Three REST APIs
/query/ • Retrieve subjects
/tags/ • Create, delete, retrieve tags
/tagdef/ • Create, delete, retrieve tag definitions
Builds on USC Tagfiler project (C. Kesselman et al.)
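The <subject, name, value> tag model with optional schema constraints can be sketched as a small in-memory structure. This is a hedged illustration, not the catalog service or Tagfiler itself: the class `TagCatalog` and its methods are hypothetical and only loosely mirror the three operations named on the slide (tag definitions, tags, and queries). The sample tags for `mydata42` are taken from the tomography example in this deck.

```python
from collections import defaultdict


class TagCatalog:
    """In-memory catalog of <subject, name, value> tags."""

    def __init__(self) -> None:
        self._tagdefs = {}               # tag name -> required Python type, or None
        self._tags = defaultdict(dict)   # subject -> {tag name: value}

    def define_tag(self, name, value_type=None):
        """Like /tagdef/: declare a tag, optionally constraining its value type."""
        self._tagdefs[name] = value_type

    def set_tag(self, subject, name, value):
        """Like /tags/: attach a tag to a subject, enforcing the definition."""
        if name not in self._tagdefs:
            raise KeyError(f"undefined tag: {name}")
        expected = self._tagdefs[name]
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"tag {name!r} expects {expected.__name__}")
        self._tags[subject][name] = value

    def delete_tag(self, subject, name):
        """Like /tags/: remove a tag from a subject if present."""
        self._tags.get(subject, {}).pop(name, None)

    def get_tags(self, subject):
        """Like /tags/: return a copy of all tags on a subject."""
        return dict(self._tags.get(subject, {}))

    def query(self, criteria):
        """Like /query/: return subjects whose tags match every criterion."""
        return sorted(
            subject
            for subject, tags in self._tags.items()
            if all(tags.get(name) == value for name, value in criteria.items())
        )


def build_example_catalog() -> TagCatalog:
    catalog = TagCatalog()
    for name in ("owner", "type", "format", "beamline"):
        catalog.define_tag(name, str)
    for name, value in [("owner", "Francesco"), ("type", "3dtomo"),
                        ("format", "HDF5"), ("beamline", "2BM")]:
        catalog.set_tag("mydata42", name, value)
    return catalog
```

A query such as `build_example_catalog().query({"type": "3dtomo", "beamline": "2BM"})` returns the subjects carrying both tags, which is one way to think about the "virtual views" that a catalog makes possible.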
Tomography!
mydata42
owner: Francesco
type: 3dtomo
format: HDF5
beamline: 2BM
[Figure: dataset workflow, labeled “Organization” and “Orchestration”. Steps shown: define dataset, infer type, extract metadata; populate catalog(s); locate datasets, access files; transfer/schedule; analyze; catalog derived products; record provenance; annotate, share, browse, search]
Our challenge:
Sustainability
We are a non-profit service provider to the non-profit research community
Globus Online Provider Plans
Support ongoing operations
Offer value-added capabilities
Engage more closely with users
Provider Plans offer…
• Provider endpoints with sharing
• Multiple GridFTP servers per endpoint
• Branded web sites
• Alternate identity provider
• Usage reporting
• MSS optimizations
• Operations monitoring and management
• Input into and access to product roadmap
Starting at $20k per year
Thanks to great colleagues and collaborators
• Steve Tuecke, Rachana Ananthakrishnan, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, and many others at Argonne & UChicago
• Carl Kesselman, Karl Czajkowski, Rob Schuler, and others at USC/ISI
• Birali Runesha and others at UChicago Research Computing Center
Thank you to our sponsors!