Data Access on the TeraGrid (Possibilities & Directions) Dan Fraser, ANL Ann Chervenak, ISI TeraGrid data workshop Jan ’07, San Diego

Overview

- Architecture ideas from LCG
  - Trending toward SOA (changeable parts)
  - Possible synergy with TeraGrid Data (end)
- Globus Data Directions
  - GridFTP and potential benefits for TG users
  - Reliable File Transfer (RFT)
  - Replica Location Service (RLS): distributed database that records locations of data copies
  - Data Replication Service (DRS): integrates RFT & RLS to replicate & register files
  - Data Access and Integration Service (DAIS): service to access relational and XML databases

Architecture ideas from LCG

- Storage systems: CASTOR (3 T1s), dCache (9 T1s & T2s), DPM (T2s only), MOPS (T2s only)
- GridFTP is the underlying transfer mechanism
- Reliable transfer: RFT (OSG), FTS (EGEE) [SRM-cp, Unicore, Oracle, IBM], tape
  - Reliability & retries
  - Single point of control for VOs & bulk transfers
- Advanced client tools: RLS, DRS, RFT (policies)
- Experimental tool kits: PhEDEx (CMS), Don Quixote (Atlas), FTD (Alice), DIRAC (LHCb)
  - Subscribe to datasets, DIY meta-scheduler
  - Pick best copy
- SRM interface (LCG requirement) + POSIX I/O
- Experimental framework: pure science codes
- Proposed DMIS interface

PhEDEx (Physics Experiment Data Export) (slide from LCG presentation)

- Large-scale dataset replica management system
- Managed data flow following a transfer topology (Tier0 → Tier1 → Tier2)
- Routed multi-hop transfers; routing agents determine the best route
- Reliable point-to-point transfers based on unreliable Grid transfer tools
- Set of quasi-independent, asynchronous software agents posting messages in a central blackboard
- Nodes subscribe for data allocated from other nodes
- Enables distribution management at dataset level rather than at file level
- Implements the experiment's policy on data placement
- Allows prioritization and scheduling
- In production (~3 years)
  - Managing transfers of several TB/day
  - ~100 TB known to PhEDEx, ~200 TB total replicated
  - Running at CERN, 7 Tier-1s, 10 Tier-2s
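The routing agents pick the best route through the transfer topology. The sketch below is not PhEDEx code. It treats route selection as a shortest-path search (Dijkstra's algorithm) over a hypothetical topology. Each link carries an assumed cost, such as expected transfer time, and the site names and costs are made up.

```python
import heapq


def best_route(links, src, dst):
    """Cheapest multi-hop route from src to dst as (total_cost, path), or None.

    links maps each node to {neighbour: link_cost}; costs must be
    non-negative for Dijkstra's algorithm to be correct.
    """
    queue = [(0.0, src, [src])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue  # already settled via a cheaper route
        done.add(node)
        for neighbour, link_cost in links.get(node, {}).items():
            if neighbour not in done:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None  # dst is unreachable


# Hypothetical Tier0 -> Tier1 -> Tier2 topology; costs are illustrative only.
TOPOLOGY = {
    "T0_CERN": {"T1_A": 1.0, "T1_B": 2.0},
    "T1_A": {"T1_B": 0.5, "T2_X": 3.0},
    "T1_B": {"T2_X": 1.0},
}

if __name__ == "__main__":
    print(best_route(TOPOLOGY, "T0_CERN", "T2_X"))
```

With these invented costs, the route T0_CERN → T1_A → T1_B → T2_X (total 2.5) beats both direct Tier1 → Tier2 links. This illustrates why routing over multiple hops can pay off.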

What can we learn?

- Different communities have different needs and use their own data-specific tools.
- One monolithic file system could be nice but is not necessary to get work done (yet).
- Common central tools help everyone (catalogues, metadata, replica management, easy reliable file access, workflows [scheduling]).
- Trend toward isolating "data specialized" code.
- Common interfaces allow teams to play nicely together.
- GSI is a big win, eventually (hidden by portals).
- … details to be filled in by people in this room

Overview

- Architecture Ideas from the LCG
  - Trending toward SOA (changeable parts)
  - Possible synergy with TeraGrid Data
- Globus Data Directions
  - GridFTP and potential benefits for TG users
  - Reliable File Transfer (RFT)
  - Replica Location Service (RLS): distributed database that records locations of data copies
  - Data Replication Service (DRS): integrates RFT & RLS to replicate & register files
  - Data Access and Integration Service (DAIS): service to access relational and XML databases

What is GridFTP?

- Client interfaces: globus-url-copy, C library, RFT (3rd party)
- GridFTP server: separate control and data channels, striping
- XIO drivers: TCP, UDT (UDP), parallel streams, GSI
- Data Storage Interfaces (DSI): POSIX, SRB
- Underneath: I/O and file systems
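The sketch below builds, but does not run, a globus-url-copy command line that requests several parallel streams. It illustrates the client side listed above.

It assumes the commonly documented options `-vb` (show transfer progress), `-p` (number of parallel data streams) and `-tcp-bs` (TCP buffer size). Check `globus-url-copy -help` on your installation. The helper name, host names and paths are hypothetical.

```python
import shlex


def build_transfer_command(source_url, dest_url, streams=4, tcp_buffer_bytes=None):
    """Assemble a globus-url-copy argument list (hypothetical helper; it only builds the list)."""
    argv = ["globus-url-copy", "-vb", "-p", str(streams)]  # -p: parallel data streams
    if tcp_buffer_bytes is not None:
        argv += ["-tcp-bs", str(tcp_buffer_bytes)]  # TCP buffer size in bytes
    argv += [source_url, dest_url]
    return argv


if __name__ == "__main__":
    cmd = build_transfer_command(
        "gsiftp://source.example.org/data/run42.dat",
        "file:///scratch/run42.dat",
        streams=8,
        tcp_buffer_bytes=2 * 1024 * 1024,
    )
    # To actually execute it: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Building an argument list rather than a shell string avoids quoting problems when paths contain spaces.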

Extensible IO (XIO) system

- Provides a framework that implements a Read/Write/Open/Close abstraction
- Drivers are written that implement the functionality (file, TCP, UDP, GSI, etc.)
- Different functionality is achieved by building protocol stacks
- GridFTP drivers will allow 3rd-party applications to easily access files stored under a GridFTP server
- Other drivers could be written to allow access to other data stores
- Changing drivers requires minimal change to the application code
  - Ported GridFTP to use UDT in less than a day, AFTER the UDT driver was written
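The driver-stack idea can be sketched in a few lines. Every driver exposes the same open/write/read/close calls and forwards them to the driver below it. XIO itself is a C library, and this is not its API: `MemoryTransport` and `ZlibDriver` are hypothetical stand-ins for a transport driver and a transform driver.

```python
import zlib


class MemoryTransport:
    """Bottom of the stack: keeps bytes in memory.

    Stands in for a real transport driver such as file, TCP or UDT.
    """

    def __init__(self):
        self.stored = b""
        self.pos = 0

    def open(self, name):
        self.pos = 0  # reopening rewinds to the start of the stored bytes

    def write(self, data):
        self.stored += data
        return len(data)

    def read(self):
        data = self.stored[self.pos:]
        self.pos = len(self.stored)
        return data

    def close(self):
        pass


class ZlibDriver:
    """Transform driver: compresses on write, decompresses on read, and
    passes every call down to the driver beneath it."""

    def __init__(self, below):
        self.below = below

    def open(self, name):
        self.compressor = zlib.compressobj()
        self.decompressor = zlib.decompressobj()
        self.wrote = False
        self.below.open(name)

    def write(self, data):
        self.below.write(self.compressor.compress(data))
        self.wrote = True
        return len(data)

    def read(self):
        return self.decompressor.decompress(self.below.read())

    def close(self):
        if self.wrote:
            self.below.write(self.compressor.flush())  # finish the compressed stream
        self.below.close()


def round_trip(stack, payload):
    """Application code: unchanged whatever drivers are stacked underneath."""
    stack.open("demo")
    stack.write(payload)
    stack.close()
    stack.open("demo")
    data = stack.read()
    stack.close()
    return data
```

Swapping `ZlibDriver(MemoryTransport())` for a bare `MemoryTransport()` changes the behaviour of the stack but not a line of `round_trip`. That is the point of the slide's "minimal change to the application code".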

Parallelism vs Striping

Memory to Memory Striping Performance

[Figure "Bandwidth vs Striping": bandwidth (Mbps, 0 to 30000) against degree of striping (0 to 70), one curve per number of streams: 1, 2, 4, 8, 16, 32]
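In GridFTP's extended block mode (Mode E), each data block carries its own byte offset and length. This is what lets parallel streams (several TCP connections) and striping (several servers) deliver blocks in any order.

The sketch below assumes a fixed block size and simple round-robin assignment of blocks to (stripe, stream) channels. Real servers schedule blocks dynamically, and the function names are hypothetical.

```python
def plan_blocks(file_size, block_size, stripes, streams_per_stripe):
    """Split a file into (offset, length) blocks and deal them round-robin
    across every (stripe, stream) pair."""
    channels = [(s, t) for s in range(stripes) for t in range(streams_per_stripe)]
    plan = {channel: [] for channel in channels}
    offset = 0
    index = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        plan[channels[index % len(channels)]].append((offset, length))
        offset += length
        index += 1
    return plan


def reassemble(file_size, received):
    """Write (offset, bytes) blocks into place; arrival order does not matter
    because every block carries its own offset."""
    out = bytearray(file_size)
    for offset, chunk in received:
        out[offset:offset + len(chunk)] = chunk
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 4  # 1024-byte stand-in for a file
    plan = plan_blocks(len(data), 100, stripes=2, streams_per_stripe=3)
    blocks = [(off, data[off:off + n]) for chan in plan.values() for off, n in chan]
    blocks.reverse()  # simulate out-of-order arrival
    assert reassemble(len(data), blocks) == data
    print({chan: len(b) for chan, b in plan.items()})
```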

Why people use GridFTP

- Security (GSI, and now SSH)
- Performance using parallel streams
- Performance using striping (and parallel streams)
- Partial file transfer
- Third-party control (reliable & restartable)
- Data extensibility
- Protocol extensibility

Top GridFTP Myths

- Hard to install
- Requires all of Globus
- Requires GSI

Moving Forward

- Client interfaces: globus-url-copy, C library, RFT (3rd party)
- GridFTP server: separate control and data channels, striping
  - Additions: small file optimization, virtual deployment, dynamic registration
- XIO drivers: TCP, UDT (UDP), parallel streams, GSI
  - Addition: SSH
- Data Storage Interfaces (DSI): POSIX, SRB
  - Addition: HPSS
- Underneath: I/O and file systems

Future GridFTP Directions

- Client/server side
  - Lots of Small Files optimizations (beta): transfer a sequence of small files as if they were one file
  - Dynamic Mover Registration Infrastructure (GFork)
    - Enhance reliability, especially during striping
    - Dynamically scale to meet ever-changing transfer demands
    - Enable users to configure fast transfers
  - Dynamic deployment via virtual machines (InfiniBand)
  - Managed Object Placement Service (MOPS)
- XIO side: enable transfers using SSH (beta)
- DSI side: HPSS (beta)

Help us prioritize for TeraGrid

Lots Of Small Files (LOSF)

- Pipelining
  - Many transfer requests outstanding at once
  - Client sends the second request before the first completes
  - Latency of each request is hidden in data transfer time
- Cached data channel connections
  - Reuse established data channels (Mode E)
  - No additional TCP or GSI connect overhead
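A back-of-the-envelope model shows why pipelining matters for small files. Without it, every file waits one full round trip before its data flows. With all requests outstanding, only the first round trip is exposed.

This is an idealized sketch. It assumes a fixed bandwidth and round-trip time, ignores server-side limits, and assumes connection setup is already avoided by the cached data channels. The numbers in the example are hypothetical.

```python
def serial_time(n_files, file_bytes, bandwidth_bps, rtt_s):
    """One request at a time: every file pays a full round trip before data flows."""
    return n_files * (rtt_s + file_bytes * 8 / bandwidth_bps)


def pipelined_time(n_files, file_bytes, bandwidth_bps, rtt_s):
    """All requests outstanding: only the first round trip is exposed, then
    files stream back to back over the cached data channel (idealized)."""
    return rtt_s + n_files * file_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical case: 1 GB as 10,000 files of 100 KB, 1 Gb/s link, 50 ms RTT.
    args = (10_000, 100_000, 1e9, 0.05)
    print(f"serial:    {serial_time(*args):.1f} s")
    print(f"pipelined: {pipelined_time(*args):.2f} s")
```

With these assumed numbers the model gives about 508 s for one-at-a-time requests versus about 8 s with pipelining. The round trip, not the bandwidth, dominates the serial case.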

Fast LOSF

1 GB of data partitioned into equal-sized files. With pipelining, performance does not degrade until the files shrink to 100 KB.

GridFTP Advanced Configurations

- GFork
  - Robust Unix fork/setuid model
  - Allows server state to be maintained across connections
- Dynamic backends
  - Stability in the event of backend failure
  - Growing resource pools for peak demands
- Frontend replication

GFork

[Diagram: clients open control channel connections to the GFork server (with its GridFTP plugin) on the server host. Each connection is handed over an inherited link to a forked GridFTP server instance, and the instances keep a state sharing link to the plugin.]

Dynamic Backends

[Diagram: on the frontend host, the GFork server with its GridFTP plugin forks a frontend instance, which looks up an available backend. On each backend host, a registration daemon registers with the plugin, and inetd forks a backend instance that the frontend reaches over a control connection.]

- Multiple backends (BEs) register with the plugin
- The plugin maintains the list of available BEs
- The frontend (FE) instance selects N BEs for use
- If any one BE fails, another can be used
- The BE pool can grow and shrink
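The bookkeeping described above is as follows: backends register, the frontend picks N of them, and a failed one is swapped for a spare. A hypothetical `BackendPool` class sketches it below. It is not GFork's plugin interface, which is a C plugin inside the server. It only models the list the plugin maintains.

```python
import random


class BackendPool:
    """Hypothetical model of the frontend plugin's list of registered backends (BEs)."""

    def __init__(self):
        self.available = set()

    def register(self, backend):
        """A BE's registration daemon announces it: the pool grows."""
        self.available.add(backend)

    def unregister(self, backend):
        """A BE shuts down or fails: the pool shrinks."""
        self.available.discard(backend)

    def select(self, n, rng=random):
        """The frontend instance picks n distinct BEs for one transfer."""
        if n > len(self.available):
            raise RuntimeError(f"only {len(self.available)} backends registered")
        return rng.sample(sorted(self.available), n)

    def replace_failed(self, selected, failed, rng=random):
        """Drop a failed BE from the pool and swap in one not already in use."""
        self.unregister(failed)
        spare = sorted(self.available - set(selected))
        if not spare:
            raise RuntimeError("no spare backend available")
        replacement = rng.choice(spare)
        return [replacement if b == failed else b for b in selected]


if __name__ == "__main__":
    pool = BackendPool()
    for host in ["be1", "be2", "be3", "be4"]:
        pool.register(host)
    chosen = pool.select(2)
    print("selected:", chosen)
    print("after failure:", pool.replace_failed(chosen, chosen[0]))
```

Passing an explicit `random.Random(seed)` as `rng` makes the selection reproducible, which helps when testing failover.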

Reliable File Transfer

- RFT accepts a SOAP description of the desired transfer
- It writes this to a database
- It then uses the Java GridFTP client library to initiate 3rd-party transfers on behalf of the requestor
- Restart markers are stored in the database to allow for restart in the event of an RFT failure
- Supports concurrency, i.e., multiple files in transit at the same time; this gives good performance on many small files
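The restart-marker idea can be sketched with SQLite standing in for RFT's database. A request is persisted before any data moves, and progress is committed as it happens. After a crash, a new run resumes from the last stored marker instead of from byte zero.

The table layout and function names are hypothetical. Data movement is simulated. In the real service the markers come from the GridFTP server during a third-party transfer.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the (hypothetical) request table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS transfers (
               id INTEGER PRIMARY KEY,
               source TEXT, dest TEXT, size INTEGER,
               restart_marker INTEGER NOT NULL DEFAULT 0,
               status TEXT NOT NULL DEFAULT 'pending')"""
    )
    return db


def submit(db, source, dest, size):
    """Persist the request first, so the client can disconnect and go away."""
    cur = db.execute(
        "INSERT INTO transfers (source, dest, size) VALUES (?, ?, ?)", (source, dest, size)
    )
    db.commit()
    return cur.lastrowid


def run(db, transfer_id, chunk, fail_after=None):
    """Move data from the stored restart marker onward; return bytes moved this run.

    fail_after simulates the service dying after that many chunks.
    """
    size, marker = db.execute(
        "SELECT size, restart_marker FROM transfers WHERE id = ?", (transfer_id,)
    ).fetchone()
    start, chunks = marker, 0
    while marker < size:
        if fail_after is not None and chunks == fail_after:
            raise RuntimeError("simulated RFT failure")
        marker = min(marker + chunk, size)  # stand-in for moving bytes with GridFTP
        db.execute(
            "UPDATE transfers SET restart_marker = ?, status = 'active' WHERE id = ?",
            (marker, transfer_id),
        )
        db.commit()  # once committed, the marker survives a crash
        chunks += 1
    db.execute("UPDATE transfers SET status = 'done' WHERE id = ?", (transfer_id,))
    db.commit()
    return marker - start
```

A 100-byte transfer that fails after three 10-byte chunks leaves a marker of 30. The next call to `run` then moves only the remaining 70 bytes.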

Reliable File Transfer: Comparison with globus-url-copy

- Supports all the same options (buffer size, etc.)
- Increased reliability because state is stored in a database
- Service interface
  - The client can submit the transfer request and then disconnect and go away
  - Think of this as a job scheduler for transfer jobs
- Two ways to check status
  - Subscribe for notifications
  - Poll for status

Globus Replica Management

- Current Globus tools
  - Replica Location Service (RLS): provides registration and discovery of data items
  - Data Replication Service (DRS): pull-based data replication from existing data items using RFT, and registration of files in RLS
- Long-term plan (CEDS): provide flexible, policy-driven replication services
  - Maintain a certain level of redundancy for all data items
  - Subscribe to data items with certain characteristics and automatically receive copies of new, matching data items
  - Keep replicas consistent with one another
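The planned subscription behaviour can be reduced to a matching step. When a new data item is published, find every site whose subscription matches the item's metadata. The sketch below assumes a subscription is a set of attribute/value pairs that must all match.

This is a hypothetical simplification; real policies could use richer queries. The site names and attributes are made up.

```python
def matching_sites(item_metadata, subscriptions):
    """Return the sites whose subscription matches a newly published data item.

    subscriptions maps site -> {attribute: required value}; a site matches
    when every attribute it asked for has that value in the item's metadata
    (so an empty subscription matches everything).
    """
    return sorted(
        site
        for site, wanted in subscriptions.items()
        if all(item_metadata.get(attr) == value for attr, value in wanted.items())
    )


if __name__ == "__main__":
    subscriptions = {
        "site_a": {"instrument": "A"},
        "site_b": {"run": "R5", "type": "raw"},
        "site_c": {"instrument": "B"},
    }
    new_item = {"instrument": "A", "run": "R5", "type": "raw"}
    print(matching_sites(new_item, subscriptions))  # sites that should receive a copy
```

Each matching site would then pull a copy, for example through the DRS workflow.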

Replication Scenario: The LIGO Project

- Laser Interferometer Gravitational Wave Observatory
- Data sets are first published at Caltech
  - Publication includes specification of metadata attributes
- Data sets may be replicated at up to 10 LIGO sites
  - Sites perform metadata queries to identify desired data
  - Sites pull copies of data from Caltech or other LIGO sites
- Customized data management system: the Lightweight Data Replicator System (LDR)
  - Uses existing Globus tools (GridFTP, RLS)

The Globus Replica Location Service

- A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
- RLS maintains mappings between logical identifiers and target names
- Must perform and scale well: support hundreds of millions of objects, hundreds of clients
- E.g., LIGO (Laser Interferometer Gravitational Wave Observatory) Project
  - RLS servers at 10 sites
  - Maintain associations between 11 million logical file names & 120 million physical file locations

RLS Framework

- Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
- Replica Location Index (RLI) nodes aggregate information about one or more LRCs
- LRCs use soft state update mechanisms to inform RLIs about their state: relaxed consistency of index
- Optional compression of state updates reduces communication, CPU and storage overheads

[Diagram: a layer of Replica Location Index (RLI) nodes above a layer of Local Replica Catalogs (LRCs)]
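The two-level lookup works like this. An RLI answers "which LRCs know this logical name?", and the chosen LRC answers "where are the copies?". The RLI's entries are only as fresh as the last soft state update it received.

The sketch below models that with an explicit timeout. It sends uncompressed name lists and omits the optional compression. Class and method names are hypothetical, not the RLS client API.

```python
class LocalReplicaCatalog:
    """Consistent logical-name -> target-name mappings for one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # logical file name -> set of target (physical) names

    def add(self, lfn, target):
        self.mappings.setdefault(lfn, set()).add(target)

    def targets(self, lfn):
        return sorted(self.mappings.get(lfn, ()))

    def soft_state_update(self):
        """The logical names this LRC currently holds (sent uncompressed here)."""
        return set(self.mappings)


class ReplicaLocationIndex:
    """Maps logical names to the LRCs that reported them. Entries expire unless
    refreshed by a later update, so the index is only loosely consistent."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # (lfn, lrc name) -> time of the update that reported it

    def receive(self, lrc, now):
        for lfn in lrc.soft_state_update():
            self.last_seen[(lfn, lrc.name)] = now

    def lookup(self, lfn, now):
        """Names of LRCs that reported lfn recently enough to be trusted."""
        return sorted(
            lrc_name
            for (name, lrc_name), seen in self.last_seen.items()
            if name == lfn and now - seen <= self.timeout_s
        )


if __name__ == "__main__":
    site_a, site_b = LocalReplicaCatalog("lrc-a"), LocalReplicaCatalog("lrc-b")
    site_a.add("lfn://run1", "gsiftp://a.example.org/run1")
    site_b.add("lfn://run1", "gsiftp://b.example.org/run1")
    rli = ReplicaLocationIndex(timeout_s=60)
    rli.receive(site_a, now=0)
    rli.receive(site_b, now=50)
    print(rli.lookup("lfn://run1", now=55))   # both LRCs still fresh
    print(rli.lookup("lfn://run1", now=100))  # lrc-a's entry has expired
```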

RLS Status

- Stable component
  - Greatly improved performance and scalability in the last 2 years
  - No major changes to existing RLS functionality or interfaces
  - New interface: WS-RF compatible web services interface (WS-RLS)
- The major difficulty for users has been installation and configuration of open source relational database backends
- New features
  - Support for an embedded database backend (SQLite)
  - Easier configuration of relational database backends
  - Pure Java client for RLS (available approx. March 2007)
- Planned features
  - Dynamic deployment of RLS services
  - Better support for RLS configuration management in VOs
  - Finer-grained authorization support for users

Motivation for Data Replication Service

- Data-intensive applications need higher-level data management services that integrate lower-level Grid functionality
  - Efficient data transfer (GridFTP, RFT)
  - Replica registration and discovery (RLS)
  - Eventually validation of replicas, consistency management, etc.
- Goal is to generalize the custom data management systems developed by several application communities
- Eventually plan to provide a suite of general, configurable, higher-level data management services
- Globus Data Replication Service (DRS) is the first of these services

The Data Replication Service

- Included in the GT4.0.2 release
- Design based on the publication component of the LIGO Lightweight Data Replicator system
  - Developed by Scott Koranda
- Client specifies (via the DRS interface) which files are required at the local site
- DRS uses:
  - Globus Delegation Service to delegate proxy credentials
  - RLS to discover where replicas exist in the Grid
  - Selection algorithm to choose among available source replicas (provides a callout; default is random selection)
  - Reliable File Transfer (RFT) service to copy data to the site, via the GridFTP data transport protocol
  - RLS to register new replicas
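The DRS steps above can be read as a small pipeline: discover replicas, select a source, transfer, register. The sketch below passes each step in as a function, so the selection callout can be swapped just as DRS allows. Its default is a random choice.

This is a hypothetical sketch, not the DRS interface. Credential delegation and resource lifecycle are left out, and the URLs are made up.

```python
import random


def replicate(lfns, rls_lookup, transfer, register, local_site, select=random.choice):
    """Pull-based replication of logical files to local_site (hypothetical sketch).

    rls_lookup(lfn) -> list of URLs of existing replicas (RLS discovery)
    select(sources) -> the chosen source; a callout, random by default
    transfer(src, dst) -> copies the data (RFT driving GridFTP in the real service)
    register(lfn, dst) -> records the new replica (RLS registration)
    """
    results = {}
    for lfn in lfns:
        known = rls_lookup(lfn)
        local = [s for s in known if s.startswith(local_site)]
        if local:
            results[lfn] = local[0]  # already replicated here; nothing to do
            continue
        if not known:
            results[lfn] = None  # no replica anywhere to pull from
            continue
        src = select(known)
        dst = f"{local_site}/{lfn}"
        transfer(src, dst)
        register(lfn, dst)
        results[lfn] = dst
    return results


if __name__ == "__main__":
    catalog = {"f1": ["gsiftp://site-a.example.org/f1", "gsiftp://site-b.example.org/f1"]}
    copied = []
    out = replicate(
        ["f1", "f2"],
        rls_lookup=lambda lfn: list(catalog.get(lfn, [])),
        transfer=lambda src, dst: copied.append((src, dst)),
        register=lambda lfn, dst: catalog.setdefault(lfn, []).append(dst),
        local_site="gsiftp://local.example.org",
    )
    print(out, copied)
```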

DRS Functionality

- Delegate credential via Delegation Service
- Create a Replicator resource via DRS
- Discover replicas of desired files in RLS, select among replicas
- Transfer data to local site with Reliable File Transfer Service using GridFTP servers
- Register new replicas in RLS catalogs
- Monitor Replicator resource and trigger events
- Inspect state of DRS resource and Resource Properties (RPs)
- Destroy Replicator resource

[Diagram: a client interacting with DRS (Replicator resource and its RPs), RFT (Transfer resource and its RPs), an RLS index and catalog, and two GridFTP servers]

Next Generation: Data Placement Services

- Center for Enabling Petascale Distributed Science (CEDS)
  - Recently funded by DOE SciDAC-2 as a Center for Enabling Technologies
  - Includes: USC Information Sciences Institute, Argonne National Laboratory, University of Wisconsin-Madison, Lawrence Berkeley National Laboratory, Fermi National Accelerator Laboratory
- Higher-level, policy-driven placement of data
- End-to-end provisioning of data resources to carry out placement decisions

Layered Architecture

Higher-Level Data Placement Services

- Decide where to place objects and replicas in the distributed Grid environment
- Policy-driven, based on the needs of the application
- Effectively creates a placement workflow that is passed to the Reliable Distribution Service Layer for execution
- Push- or pull-based service that places an explicit list of data items
- Metadata-based placement: decide where data objects are placed based on results of metadata queries for data with certain attributes
- N-Copies: maintain N copies of data items; the placement service checks existing replicas and creates/deletes replicas to maintain N copies
- Publication/Subscription: allows sites or clients to subscribe to topics of interest; data objects are placed as indicated by subscriptions
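The N-Copies policy comes down to comparing the current replica set with the target count. The sketch below turns that comparison into a small plan of sites to add and sites to remove, which a reliable distribution layer could then execute.

It is a hypothetical sketch. Which surplus copies to delete, and which candidate sites to prefer, are policy choices. Here the first N listed copies are kept and candidates are taken in the given order.

```python
def n_copies_plan(replicas, candidate_sites, n):
    """Return (sites_to_add, sites_to_remove) that bring a data item to n copies.

    replicas: sites that currently hold a copy.
    candidate_sites: sites allowed to receive a new copy, in preference order.
    """
    current = list(dict.fromkeys(replicas))  # drop duplicates, keep order
    if len(current) >= n:
        return [], current[n:]  # keep the first n copies, remove the surplus
    spare = [site for site in candidate_sites if site not in current]
    needed = n - len(current)
    if len(spare) < needed:
        raise ValueError(f"need {needed} more sites, only {len(spare)} available")
    return spare[:needed], []


if __name__ == "__main__":
    print(n_copies_plan(["A"], ["A", "B", "C", "D"], 3))      # too few copies
    print(n_copies_plan(["A", "B", "C", "D"], [], 2))         # too many copies
```

Running the planner again after the plan has been carried out should return two empty lists. That gives the feedback loop a simple convergence check.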

Reliable Distribution Layer

- Responsible for carrying out the distribution or placement "plan" generated by a higher-level service
- Extends the functionality of reliable file transfer services
- Needs to provide feedback to higher-level placement services on the outcome of the placement workflow
- Calls on lower-level services to coordinate (e.g., GridFTP data transport service)

OGSA-DAI in a nutshell

- An extensible framework for data access and integration
- Exposes heterogeneous data resources to a grid through web services
- Interact with data resources
  - Queries and updates
  - Data transformation / compression
  - Data delivery
- Application-specific functionality
- Supports relational, XML, text and binary files
- Supports various delivery options and transforms
- Supports secure conversation and message-level security using X.509 certificates
- A base for higher-level services
  - Federation, mining, visualisation, …

OGSA-DAI motivation

- Entering an age of data
  - Data explosion
    - CERN LHC will generate 1 GB/s = 10 PB/y
    - Pixar generates 100 TB/movie
  - Storage getting cheaper
- Data stored in many different ways
  - Relational databases
  - XML databases
  - Text and binary files
- Need ways to facilitate
  - Data discovery
  - Data access
  - Data integration
- Empower e-Business and e-Science
  - The grid is a vehicle for achieving this
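The LHC figure on this slide is easier to check with the arithmetic written out. This assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes). It also assumes about 10^7 seconds of data taking per year, a common rule of thumb for accelerator running time; that assumption does not come from the slide.

```latex
\[
1\ \tfrac{\text{GB}}{\text{s}} \times 10^{7}\ \tfrac{\text{s}}{\text{year}}
  = 10^{9}\ \tfrac{\text{B}}{\text{s}} \times 10^{7}\ \tfrac{\text{s}}{\text{year}}
  = 10^{16}\ \tfrac{\text{B}}{\text{year}}
  = 10\ \tfrac{\text{PB}}{\text{year}}
\]
```

Running continuously for a full calendar year (about 3.15 × 10^7 s) would instead give roughly 31.5 PB.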

Data services

[Diagram: three data services, "SQLOne", "XMLOne" and "FilesOne". Each exposes a data service resource that reaches the underlying relational database, XML database, or files through a data resource accessor.]
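The layering in the diagram can be sketched as follows: a data service routes a request to a named resource, and that resource's accessor knows how to talk to one kind of store. This is not the OGSA-DAI API, which is a Java framework built around requests and activities.

All class and method names below are hypothetical. SQLite, an in-memory XML document and a dictionary of file contents stand in for real data resources, and the resource names follow the diagram.

```python
import sqlite3
import xml.etree.ElementTree as ET


class RelationalAccessor:
    """Data resource accessor for a relational database; requests are SQL queries."""

    def __init__(self, connection):
        self.connection = connection

    def perform(self, request):
        return self.connection.execute(request).fetchall()


class XMLAccessor:
    """Data resource accessor for an XML document; requests are ElementPath expressions."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def perform(self, request):
        return [element.text for element in self.root.findall(request)]


class FileAccessor:
    """Data resource accessor for files; requests are file names (held in a dict here)."""

    def __init__(self, files):
        self.files = files

    def perform(self, request):
        return self.files[request]


class DataService:
    """Exposes several named data service resources behind one uniform call."""

    def __init__(self):
        self.resources = {}

    def add_resource(self, name, accessor):
        self.resources[name] = accessor

    def perform(self, resource_name, request):
        return self.resources[resource_name].perform(request)
```

A client calls `perform(resource_name, request)` the same way for every resource. Only the request language (SQL, ElementPath, file name) depends on which accessor sits behind the name.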

Architecture Ideas from LCG

- Storage systems: CASTOR (3 T1s), dCache (9 T1s & T2s), DPM (T2s only), MOPS (T2s only)
- GridFTP is the underlying transfer mechanism
- Reliable transfer: RFT (OSG), FTS (EGEE) [SRM-cp, Unicore, Oracle, IBM], tape
  - Reliability & retries
  - Single point of control for VOs & bulk transfers
- Advanced client tools: RLS, DRS, RFT (policies)
- Experimental tool kits: PhEDEx (CMS), Don Quixote (Atlas), FTD (Alice), DIRAC (LHCb)
  - Subscribe to datasets, DIY meta-scheduler
  - Pick best copy
- SRM interface (LCG requirement) + POSIX I/O
- Experimental framework: pure science codes
- Proposed DMIS interface

Translation to TeraGrid ??

- Storage: GPFS, HPSS, SRB, pNFS?
- Transfer: GridFTP, pFTP, SRB xfer, ??
- Mechanism: single point of control for VOs
- Advanced client tools: RLS, DRS, RFT (policies)
- Experimental tool kits: Atmospheric, Astronomy, Medicine, …
  - Subscribe to datasets, DIY meta-scheduler
  - Pick best copy
- Experimental framework: pure science codes
- Interfaces: TGCP interface | HPSS interface | (possible use of DMIS?)

For More Information

- GridFTP: http://gridftp.org
- RLS
  - “Performance and Scalability of a Replica Location Service,” High Performance Distributed Computing Conference, 2004. http://www.isi.edu/~annc/papers/chervenakhpdc13.pdf
  - Documentation: http://www.globus.org/toolkit/docs/4.0/data/rls
- DRS
  - “Wide Area Data Replication for Scientific Collaborations,” Grid Computing (Grid2005). http://www.isi.edu/~annc/papers/grid2005final.pdf
  - Documentation: http://www.globus.org/toolkit/docs/4.0/techpreview/datarep

Discussion? (over dinner?)