Toni Saarinen, Tite4 and Tomi Ruuska, Tite4: Earth System Grid - ESG


Post on 20-Dec-2015


Toni Saarinen, Tite4

Tomi Ruuska, Tite4

Earth System Grid - ESG

ESG Overview

• Earth System Grid enables management, discovery, distributed access, processing and analysis of distributed terascale climate research data

• A “Collaboratory Pilot Project” funded by the DOE (Department of Energy) SciDAC program

• Build upon ESG-I, Globus Toolkit, DataGrid technologies

ESG Overview

• The main goal of ESG is to make climate data an easily accessible community resource.

• Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical.

• The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.

• Multiple interfaces to ESG will allow researchers to focus on science rather than issues of data transfer, format, and data set manipulation.

ESG Participants

• ANL Argonne National Laboratory (Argonne, IL)

• ISI Information Sciences Institute (Marina del Rey, CA)

• LANL Los Alamos National Laboratory (Los Alamos, NM)

• LBNL Lawrence Berkeley National Laboratory (Berkeley, CA)

• LLNL Lawrence Livermore National Laboratory (Livermore, CA)

• NCAR National Center for Atmospheric Research (Boulder, CO)

• NERSC National Energy Research Scientific Computing Center (Oakland, CA)

• ORNL Oak Ridge National Laboratory (Oak Ridge, TN)

• USC University of Southern California (Los Angeles, CA)

ESG History

• ESG-I: DOE NGI (Next Generation Internet) project
– Focus on high-performance data movement, Grid-enabled versions of LLNL tools
– Early successes include bandwidth challenge at SC’2001, significant technology output
– Experimental deployments only, at participating sites

• ESG-II: DOE SciDAC (Scientific Discovery through Advanced Computing) project
– “Smart servers” for server-side data reduction
– Integration with common “thin” clients, e.g. DODS and Data Portals
– Client software in the hands of environmental scientists
– Production deployments at participating instances

Climate GRID Example for Ocean Model

Temperature(i,j)

Latitude(i,j)

Longitude(i,j)

Lat_bounds(i,j,4)

Lon_bounds(i,j,4)
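The five arrays above can be pictured as a small data structure. The following is a minimal sketch, assuming that `Lat_bounds` and `Lon_bounds` hold the four corner coordinates of each cell and that corners are listed counter-clockwise from the south-west corner; the class name `CurvilinearGrid`, the helper `make_regular_grid`, and the corner order are illustrative assumptions, not something the slide specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CurvilinearGrid:
    """Fields indexed by logical cell indices (i, j), mirroring the slide."""
    temperature: List[List[float]]        # Temperature(i,j)
    latitude: List[List[float]]           # Latitude(i,j): assumed cell centre
    longitude: List[List[float]]          # Longitude(i,j): assumed cell centre
    lat_bounds: List[List[List[float]]]   # Lat_bounds(i,j,4): four corners per cell
    lon_bounds: List[List[List[float]]]   # Lon_bounds(i,j,4)


def make_regular_grid(ni, nj, lat0, lon0, step):
    """Build a toy grid whose cells happen to be regular squares.

    Storing latitude and longitude per cell (rather than per axis) also
    allows cells that are not aligned with the coordinate axes.
    """
    half = step / 2.0
    lat = [[lat0 + i * step for _ in range(nj)] for i in range(ni)]
    lon = [[lon0 + j * step for j in range(nj)] for _ in range(ni)]
    # Assumed corner order: SW, SE, NE, NW (counter-clockwise).
    lat_b = [[[lat[i][j] - half, lat[i][j] - half, lat[i][j] + half, lat[i][j] + half]
              for j in range(nj)] for i in range(ni)]
    lon_b = [[[lon[i][j] - half, lon[i][j] + half, lon[i][j] + half, lon[i][j] - half]
              for j in range(nj)] for i in range(ni)]
    temp = [[0.0 for _ in range(nj)] for _ in range(ni)]
    return CurvilinearGrid(temp, lat, lon, lat_b, lon_b)


grid = make_regular_grid(ni=2, nj=3, lat0=10.0, lon0=100.0, step=1.0)
```

Here `grid.lat_bounds[i][j]` is a list of four values, matching the trailing dimension of 4 on the slide.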

Geographical Overview

ESG-II Architecture

ESG Components

User authentication

Metadata Search

Replica Location and transfer

Data analysis and visualization

Demonstration Workflow:

• Globus Toolkit (ANL, ISI)
– GridFTP data transfer
– GRAM resource access
– Community Authorization Service (CAS)
– Replica Location Service (RLS)
– Metadata Catalog Service (MCS)

• Web interface (NCAR) and workflow manager

• Hierarchical Resource Manager (HRM) (LBNL)

• Storage Resource Manager

• Metadata (NCAR, LLNL, ISI)

• OpenDAP-G (NCAR, ANL)

• Live Access Server (NCAR)

The Globus Toolkit™

• An Open Source Project

• Security

• Directory, Metadata, and Replica Services

• Resource Management

• Data Access and Management

• Distributed Computation

• Open Grid Services Architecture (OGSA)
– Reliable, persistent web services

The Globus Toolkit™

• Globus middleware supports linkage of distributed data archives, supercomputers, workstations, local disk caches into data/computational grids.

• GridFTP: high-performance, secure, robust data transfer mechanism: protocol, server, client library.

• ESG is integrating OpenDAP (DODS protocol) with GridFTP protocol.

• Single sign-on using Grid Security Infrastructure

• Proxy certificates

• Community Authorization Service (CAS)

• Replica Location Service: manages copying and placement of files in a distributed environment.

• Logical vs. physical files
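The logical vs. physical file distinction can be illustrated with a toy catalog that maps one logical name to several physical copies. This is a minimal sketch under stated assumptions: `ReplicaCatalog`, its methods, and the `example.org` hosts are hypothetical and are not the Globus Replica Location Service API.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy mapping from logical file names to physical replica URLs."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record that a physical copy of the logical file exists at physical_url."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return all known physical locations (empty list if none)."""
        return list(self._replicas.get(logical_name, []))

    def pick(self, logical_name, preferred_host=None):
        """Choose one replica, preferring a given host when it holds a copy."""
        replicas = self.lookup(logical_name)
        if not replicas:
            raise KeyError(logical_name)
        for url in replicas:
            if preferred_host and urlparse(url).hostname == preferred_host:
                return url
        return replicas[0]


catalog = ReplicaCatalog()
# Hypothetical hosts and paths, for illustration only.
catalog.register("ocean_temp_1990.nc",
                 "gsiftp://storage-a.example.org/data/ocean_temp_1990.nc")
catalog.register("ocean_temp_1990.nc",
                 "gsiftp://storage-b.example.org/archive/ocean_temp_1990.nc")
chosen = catalog.pick("ocean_temp_1990.nc", preferred_host="storage-b.example.org")
```

The user only ever names the logical file; which physical copy gets transferred is decided at lookup time.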

Distributed Data Access Protocol

[Diagram: three ways an application reaches its data]

• Typical application: Application → netCDF lib → Data (local)

• Distributed application: Application → OpenDAP client → OpenDAP via http → OpenDAP server → Data (remote)

• ESG: Application → ESG client → ESG (Grid + DODS, OpenDAP via Grid) → ESG server → Big Data (remote)

• Grid + OpenDAP: transparency, performance, security, resource management, analysis functions

ESG Metadata Services

[Diagram: ESG metadata services architecture. Labels recoverable from the figure:]

• Layers: ESG clients API & user interfaces; high level metadata services; core metadata services; metadata holdings

• Client functions: search & discovery, administration, browsing & display, analysis & visualization, publishing

• Metadata services: extraction, display, browsing, query, annotation, validation, aggregation, discovery, metadata & data registration, metadata access (update, insert, delete, query), service translation library

• Holdings: data & metadata catalog, Dublin Core database, CF database, mirror, Dublin Core XML files, comments XML files

Resource Management

• Hierarchical Resource Manager
- queuing of file transfer requests
- reordering of requests to optimize Parallel FTP
- monitoring progress and error messages
- re-schedules failed transfers
- enforces local resource policy

• Storage Resource Management
- Manage space
- Manage files on behalf of a user
- Manage file sharing
- Get files from remote locations when necessary
- Manage multi-file requests
- Provide grid access to/from mass storage
- Transfer protocol negotiation
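The queuing, reordering, and re-scheduling behaviour listed for the Hierarchical Resource Manager can be sketched as a small retry queue. This is only an illustration under stated assumptions: `TransferQueue`, `flaky_transfer`, the retry limit, and the sort key are hypothetical and do not reflect the actual HRM implementation or its ordering policy.

```python
from collections import deque


class TransferQueue:
    """Toy queue: accepts requests, can reorder them, retries failures."""

    def __init__(self, transfer, max_attempts=3):
        self._transfer = transfer          # callable(request); raises on failure
        self._max_attempts = max_attempts
        self._pending = deque()            # items are (request, attempts_so_far)
        self.completed = []
        self.failed = []
        self.log = []                      # error messages, for monitoring

    def submit(self, request):
        self._pending.append((request, 0))

    def reorder(self, key):
        """Reorder pending requests; the key function is the caller's policy."""
        self._pending = deque(sorted(self._pending, key=lambda item: key(item[0])))

    def run(self):
        while self._pending:
            request, attempts = self._pending.popleft()
            try:
                self._transfer(request)
            except Exception as exc:
                attempts += 1
                self.log.append(f"{request}: attempt {attempts} failed: {exc}")
                if attempts < self._max_attempts:
                    self._pending.append((request, attempts))  # re-schedule
                else:
                    self.failed.append(request)
            else:
                self.completed.append(request)


_already_failed = set()


def flaky_transfer(request):
    """Stand-in for a real transfer: fails once for 'b.nc', then succeeds."""
    if request == "b.nc" and request not in _already_failed:
        _already_failed.add(request)
        raise IOError("simulated network error")


queue = TransferQueue(flaky_transfer)
for name in ["c.nc", "a.nc", "b.nc"]:
    queue.submit(name)
queue.reorder(key=lambda name: name)   # assumed policy: alphabetical order
queue.run()
```

After the run, the failed request has been retried at the end of the queue, and the single failure is recorded in `queue.log`.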

Live Access Server

• General purpose Web server for geo-science data sets

• Directs communications between a user and an application running under a Web server

• Converts requests into a series of commands which actually do the data access

ESG Data Portal

Goal: Make large ESG data sets easily accessible to scientists for production use

[Diagram: ESG Data Portal deployment across LBNL, LLNL, ISI, NCAR, ORNL, and ANL. Components shown: Tomcat servlet engine; MCS (Metadata Cataloguing Services) and MCS client; RLS (Replica Location Services) and RLS client; MyProxy server and MyProxy client; GRAM gatekeeper; CAS (Community Authorization Services) and CAS client; SRM (Storage Resource Management) in front of disk, MSS (Mass Storage System), and HPSS (High Performance Storage System); gridFTP servers; striped gridFTP client; CAS-enabled striped gridFTP servers; openDAPg servers; LAS (Live Access Server); SOAP and RMI connections.]

ESG: Strategies & Goals

• Move data a minimal amount, keep it close to computational point of origin when possible
– Data access protocols, distributed analysis

• When we must move data, do it fast and with a minimum amount of human intervention
– Storage Resource Management, fast networks

• Keep track of what we have, particularly what’s on deep storage
– Metadata and Replica Catalogs

• Harness a federation of sites
– Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
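The first strategy, moving data a minimal amount, comes down to reducing data where it is stored and shipping only the result. The sketch below illustrates the idea with a toy 10 x 10 field; the function names `subset` and `regional_mean` and the half-open index ranges are assumptions made for this example, not ESG interfaces.

```python
def subset(field, i_range, j_range):
    """Return the sub-array covering rows i0..i1-1 and columns j0..j1-1."""
    i0, i1 = i_range
    j0, j1 = j_range
    return [row[j0:j1] for row in field[i0:i1]]


def regional_mean(field, i_range, j_range):
    """Server-side reduction: average a region and return a single number."""
    cells = [value for row in subset(field, i_range, j_range) for value in row]
    return sum(cells) / len(cells)


# Toy "server-side" field of 100 values.
field = [[float(i * 10 + j) for j in range(10)] for i in range(10)]

# Option 1: move the whole field to the client (100 values cross the network).
values_if_moved = sum(len(row) for row in field)

# Option 2: subset on the server (6 values cross the network).
region = subset(field, (2, 4), (3, 6))

# Option 3: reduce on the server (1 value crosses the network).
mean = regional_mean(field, (2, 4), (3, 6))
```

The gap between the three options grows with the size of the dataset, which is the motivation for keeping analysis close to where the data lives.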

ESG Development in 2003

• Metadata Conventions and Services
– Application groups deciding on one (or more) metadata schemas
– Better MCS support for XML schema
– Distribution and federation of heterogeneous metadata catalogs

• Integration of DODS server and GridFTP data transport protocol

• Customization of Replica Location Service for ESG

• Storage Resource Manager (from LBNL) to optimize storage transfers

• Community authorization service to provide fine-grained access control