Open Science Data Cloud (IEEE Cloud 2011)
OCC Open Science Data Cloud (www.opensciencedatacloud.org)
Robert Grossman
University of Chicago
Open Cloud Consortium
Open Data Group
July 5, 2011
I’ll describe a new project (the Open Science Data Cloud) and
three research questions generated by the project.
Open Science Data Cloud
The OCC is a not-for-profit supporting the scientific community by operating cloud infrastructure.
The OSDC is a hosted distributed facility managed by the OCC that:
• Manages & archives medium and large size datasets.
• Provides computational resources to analyze them.
• Provides networking to share the datasets with your colleagues and with the public.
Proof of Concept (2008 - 2010)
• 4 locations
• 10G networks
• 450+ nodes
• 3000 cores
• 2 PB
Phase 1 (2011 - 2014)
• 6+ locations
• 100G networks
• $1M - $2M hardware/year
• Sept, 2011
Phase 2 (2015 - 2020)
• Build a data center for science.
• Drive the 4th paradigm.
[Figure: data size (small; medium to large; very large) plotted against variety of analysis (low, medium, wide). Infrastructure categories: no infrastructure, general infrastructure, dedicated infrastructure. Labels: scientist with laptop; Open Science Data Cloud; high energy physics, astronomy.]
OSDC Perspective
• Take a long term point of view (think like an underfunded library, not a cloud service provider).
• Manage both the data and the analysis environment.
• Develop open architecture that interoperates with other private and public clouds.
• Operate vendor neutral infrastructure at the scale of a small data center.
Research Questions
1. Develop technology to encapsulate a scientist’s data and analysis tools and to export, save and move these between clouds.
2. Develop protocols, utilities, and applications so that new racks and containers can be added to data clouds with minimal human involvement.
3. Develop technology to support the long term, low cost preservation of data in clouds.
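As a small illustration of one ingredient that questions 1 and 3 would likely need, the sketch below records a checksum manifest for a directory holding data and analysis tools, so the bundle can be checked after it is moved between clouds or after a long period in storage. This is not the OSDC's method; the function names (sha256_of, build_manifest, verify_manifest) and the directory layout are hypothetical, and SHA-256 is an assumed choice of digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = sha256_of(path)
    return manifest


def verify_manifest(root, manifest):
    """Return the sorted paths from the manifest that are missing or have changed."""
    current = build_manifest(root)
    return sorted(p for p, digest in manifest.items() if current.get(p) != digest)
```

A manifest built before an export can be stored alongside the bundle and passed to verify_manifest at the destination; an empty list means every recorded file arrived unchanged. Files added after the manifest was built are not reported by this sketch.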