Open Science Data Cloud - CCA 11

Open Science Data Cloud. Robert Grossman (Open Cloud Consortium, University of Chicago, Open Data Group). April 13, 2011.

Upload: robert-grossman

Post on 19-May-2015

DESCRIPTION

This is a talk that I gave at Cloud Computing and Its Applications (CCA 11) on April 13, 2011 in Chicago.

TRANSCRIPT

Page 1: Open Science Data Cloud - CCA 11

Open Science Data Cloud

Robert Grossman
Open Cloud Consortium
University of Chicago
Open Data Group

April 13, 2011

Page 2: Open Science Data Cloud - CCA 11

• Astronomical data
• Biological data (Bionimbus)
• NSF-PIRE OSDC Data Challenge
• Earth science data (& disaster relief)

Open Science Data Cloud

Page 3: Open Science Data Cloud - CCA 11

Who are we?

Page 4: Open Science Data Cloud - CCA 11

www.opencloudconsortium.org

• U.S.-based not-for-profit corporation.
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
• Manages cloud computing testbeds: Open Cloud Testbed.
• Develops reference implementations, benchmarks and standards.

Page 5: Open Science Data Cloud - CCA 11

OCC Members

• Companies: Cisco, Citrix, Yahoo!, …
• Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago, …
• Federal agencies: NASA
• International Partners: AIST (Japan)
• Other: National Lambda Rail
• Beginning to add international partners in 2011.

Page 6: Open Science Data Cloud - CCA 11

Proof of Concept (2008 - 2010)
• 4 locations
• 10G networks
• 450+ nodes
• 3000 cores
• 2 PB

Phase 1 (2011 - 2014)
• 6-10 locations
• 100G networks
• $1M - $2M hardware per year

Phase 2 (2015 - 2020)
• Build a data center for science

Page 7: Open Science Data Cloud - CCA 11

Why Another Cloud Project?

Page 8: Open Science Data Cloud - CCA 11

[Figure: data size (small, medium to large, very large) plotted against variety of analysis (low, medium, wide).]
• Small data, no infrastructure: scientist with laptop
• Medium to large data, general infrastructure: Open Science Data Cloud
• Very large data, dedicated infrastructure: high energy physics, astronomy

Page 9: Open Science Data Cloud - CCA 11

[Figure: platforms positioned by cycles against persistent data (small, medium, large): single workstations, small to medium clusters, HPC, large & specialized clusters, databases, data clouds.]

Page 10: Open Science Data Cloud - CCA 11

What is the Open Science Data Cloud?

Page 11: Open Science Data Cloud - CCA 11

Hosted, managed, distributed facility to:
• Manage & archive your medium and large datasets
• Provide computational resources to analyze them
• Provide networking to share them with your colleagues and the public.

Page 12: Open Science Data Cloud - CCA 11

Long Term Goal

Build a (small) data center for science.

Page 13: Open Science Data Cloud - CCA 11

And preserve your data the same way that libraries preserve books & museums preserve art.

Page 14: Open Science Data Cloud - CCA 11

OSDC Perspective
• Take a long term point of view (think like a library not a cloud service provider)
• Operate infrastructure at the scale of a small data center
• Interoperate with public clouds
• Open, interoperable architecture
• Experiment at scale
• Vendor neutral

Page 15: Open Science Data Cloud - CCA 11

OSDC Projects

Page 16: Open Science Data Cloud - CCA 11

Project 1. Bionimbus

www.bionimbus.org

Page 17: Open Science Data Cloud - CCA 11

Case Study: Public Datasets in Bionimbus

Page 18: Open Science Data Cloud - CCA 11

What Could You Do With 1 PB of Genomics Data?

• The NIH in the U.S. currently makes approximately 2 PB of data available for download.

• Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.

• We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011.
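To put the figures on this slide in perspective, here is a minimal back-of-the-envelope sketch. It is not from the talk. It takes the node, core and storage counts quoted above and the "10G networks" figure from the proof-of-concept slide, and it assumes decimal units and a link that sustains its full rated speed. The function names are hypothetical.

```python
# Figures quoted on the slides; everything else below is an assumption.
BIONIMBUS_NODES = 212
BIONIMBUS_CORES = 1568
BIONIMBUS_STORAGE_PB = 0.9
LINK_GBPS = 10  # "10G networks" on the proof-of-concept slide

TB_PER_PB = 1000  # assumption: decimal (SI) units


def per_node_averages(nodes, cores, storage_pb):
    """Return (cores per node, TB of storage per node) as plain averages."""
    return cores / nodes, storage_pb * TB_PER_PB / nodes


def transfer_days(size_pb, link_gbps):
    """Days needed to move size_pb petabytes at a sustained link_gbps Gb/s."""
    bits = size_pb * 1e15 * 8           # petabytes -> bits
    seconds = bits / (link_gbps * 1e9)  # assumes the link is fully utilized
    return seconds / 86400


if __name__ == "__main__":
    cores_per_node, tb_per_node = per_node_averages(
        BIONIMBUS_NODES, BIONIMBUS_CORES, BIONIMBUS_STORAGE_PB
    )
    print(f"Average cores per node: {cores_per_node:.1f}")
    print(f"Average storage per node: {tb_per_node:.1f} TB")
    print(f"1 PB over {LINK_GBPS} Gb/s: {transfer_days(1, LINK_GBPS):.1f} days")
```

Under these assumptions, moving 1 PB over a single 10 Gb/s link takes a little over nine days. This suggests why hosting the data next to the compute is attractive. Real transfers would be slower, since links are rarely fully utilized.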

Page 19: Open Science Data Cloud - CCA 11

Case Study: modENCODE

• Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).

• Bionimbus VMs were used for some of the integrative analysis.

• Bionimbus is used as a backup for the modENCODE DCC.

Page 20: Open Science Data Cloud - CCA 11

Project 2. Matsu: An Elastic Cloud For Disaster Response

Daniel Mandl - NASA/GSFC, Lead

Page 21: Open Science Data Cloud - CCA 11

Provide Fire / Flood Data to Rescue Workers

Short Term Pilot for 2011
• Colored areas represent catchments where rainfall collects and drains to river basins
• River gauges displayed as small circles
• Detailed measurements are available on the display by clicking on the river gauge stations.

Note blue bars indicating a surge of rainfall upstream

Then a flood wave appears downstream at Rundu river gauge days later

Flood Dashboard

Zambezi basin consisting of upper, middle and lower catchments

Page 22: Open Science Data Cloud - CCA 11

Project 3: OSDC PIRE Project

Page 23: Open Science Data Cloud - CCA 11

OSDC PIRE Project Overview

• Research
  – Cloud middleware for data intensive computing
  – Wide area clouds
• Training and education workshops
  – Data intensive computing using the OSDC
  – Cloud computing for scientific computing
• Outreach
  – OSDC Data Challenge

Page 24: Open Science Data Cloud - CCA 11

Foreign Partners

• National Institute of Advanced Industrial Science and Technology (AIST), Japan

• Beijing Institute of Genomics (BIG)
• Edinburgh University
• Korea Institute of Science & Technology
• São Paulo State University
• Universidade Federal Fluminense, Brazil
• University of Amsterdam

Page 25: Open Science Data Cloud - CCA 11

OSDC Data Challenge

• Annual contest to select 3 to 4 datasets each year to add to the OSDC.

• Looking for the most interesting datasets to add.

Page 26: Open Science Data Cloud - CCA 11

Research Focus

• Cloud architectures for data intensive computing
• Wide area clouds
• Continuous learning
• Scanning queries

Page 27: Open Science Data Cloud - CCA 11

Ways to Participate

• Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE Foreign Partners

• Send one of your graduate students to hands-on Workshops, such as Introduction to Data Intensive Computing

• Submit your most impressive dataset to the OSDC Data Challenge

• Buy a container of computers and join the OSDC

Page 28: Open Science Data Cloud - CCA 11

Open Science Data Cloud Sustainability Model

Page 29: Open Science Data Cloud - CCA 11

Towards a Long Term, Sustainable Model

• Capital expenses: about $1M/year
• Operating expenses: about $1M/year
• Moore Foundation providing $1M/year for 2011 and 2012 to support the capital expenses.

Page 30: Open Science Data Cloud - CCA 11

Who do you most trust to manage your data for 100 years?

Companies may not be here tomorrow.

Think of a not-for-profit with that mission.

Government agencies have a role, but are not always easy to use.

Page 31: Open Science Data Cloud - CCA 11

Buy A Container and Join the OCC

• Use 2/3 of the container for your own purposes.
• Provide 1/3 of the container to the OCC for a shared replica space.

Page 32: Open Science Data Cloud - CCA 11

To Get Involved

Join the Open Cloud Consortium: www.opencloudconsortium.org

Page 33: Open Science Data Cloud - CCA 11

Questions?