The Data Logistics Toolkit Martin Swany Professor, School of Informatics and Computing Executive Associate Director, Center for Research in Extreme Scale Computing (CREST) Indiana University


Post on 13-Dec-2015


The Data Logistics Toolkit

Martin Swany

Professor, School of Informatics and Computing

Executive Associate Director, Center for Research in Extreme Scale Computing (CREST)

Indiana University

The Data Logistics Toolkit

• Logistics - the management of the flow of resources from the point of origin to the point of consumption

• The DLT integrates local and distributed storage infrastructure, file transfer software, performance monitoring and tuning

• The DLT software distribution supports the creation of network-optimized data nodes

DLT Overview

• Set of packages with configuration scripts, etc.

• Allows the configuration of
– Data transfer node (DTN) with GridFTP
– IBP storage depot for content distribution
– Phoebus WAN accelerator
– On-ramp for Internet2 AL2S using XSP

• Includes Periscope/perfSONAR monitoring

• Automatic network tuning

DTN with AL2S On-Ramp

• Working with the Globus team at U. Chicago and Argonne

• Leveraging our eXtensible Session Protocol (XSP) to create end-to-end “sessions”
– user-network interface (UNI)

• XSP daemon acts as network controller
– signals AL2S/OESS, OSCARS, OpenFlow

• GridFTP XIO driver, updating to use the Globus Transfer Network Controller API

• Generic, transparent on-ramp to circuit networks like AL2S

WAN Acceleration

• A key reason the Science DMZ model “works” is the separation of lossy access networks from high-bandwidth, long-latency links

• Termination of TCP connections in “middleboxes” can increase throughput by reducing the RTT seen by each connection

• Protocol translation

• Storage in the network to buffer and burst
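
A rough way to see why terminating TCP at the edge helps is the Mathis et al. approximation for loss-limited TCP throughput, rate ≈ (MSS / RTT) · (C / √p). The sketch below is illustrative only. The 1460-byte MSS, the constant C ≈ 1.22, the 100 ms WAN RTT, and the assumption that the WAN segment is loss-free are assumptions chosen for the example, not DLT or Phoebus measurements.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Approximate loss-limited TCP throughput (Mathis et al. model), in bits/s."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

MSS = 1460         # bytes, assumed Ethernet-sized segments
LOSS = 1e-5        # 0.001% loss on the edge network
LAN_RTT = 0.004    # 4 ms edge RTT
WAN_RTT = 0.100    # 100 ms WAN RTT (assumed)
LINK_BPS = 10e9    # 10G path

# One end-to-end connection: edge loss is recovered at the full path RTT.
direct = min(LINK_BPS, mathis_throughput_bps(MSS, LAN_RTT + WAN_RTT, LOSS))

# Split connection: edge loss is recovered at the short LAN RTT, and the
# WAN segment (assumed loss-free) is limited only by the link rate.
edge = min(LINK_BPS, mathis_throughput_bps(MSS, LAN_RTT, LOSS))
split = min(edge, LINK_BPS)

print(f"direct: {direct / 1e6:.0f} Mbit/s")
print(f"split:  {split / 1e6:.0f} Mbit/s")
print(f"gain:   {split / direct:.1f}x")
```

Because the model is inversely proportional to RTT, the gain under these assumptions is simply the ratio of the two RTTs, as long as neither case is capped by the link rate.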

Distributed Storage for Content Distribution

• IBP provides a primitive, scalable, in-network storage service

• File-like abstractions can be built on top of this

• Uses a data structure known as an exNode (like a Unix inode) to track allocations

• These basic building blocks can be used to build various instances
– Parallel filesystem
– Distributed RAID-like storage
– Content distribution network
– BitTorrent-like peer-to-peer transfers
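
To make the exNode idea concrete, here is a minimal sketch of a structure that maps byte ranges of a logical file to allocations on storage depots. The class names, fields, capability strings, and depot host names are all hypothetical. They do not reflect the actual exNode schema or the IBP API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Extent:
    """One allocation holding a byte range of the logical file (hypothetical fields)."""
    offset: int       # starting byte in the logical file
    length: int       # number of bytes stored in this allocation
    depot: str        # host of the depot holding the allocation
    capability: str   # opaque handle used to read the allocation

@dataclass
class ExNode:
    """File-like metadata: a name, a size, and the extents that back it."""
    name: str
    size: int
    extents: list = field(default_factory=list)

    def replicas_for(self, offset):
        """Return every extent that covers the given byte offset."""
        return [e for e in self.extents if e.offset <= offset < e.offset + e.length]

    def is_complete(self):
        """True if the extents cover every byte of the file at least once."""
        covered = 0
        for e in sorted(self.extents, key=lambda e: e.offset):
            if e.offset > covered:
                return False  # gap before this extent
            covered = max(covered, e.offset + e.length)
        return covered >= self.size

scene = ExNode(name="scene.tif", size=300)
scene.extents += [
    Extent(0, 200, "depot-a.example.org", "cap-1"),
    Extent(100, 200, "depot-b.example.org", "cap-2"),
]
print(scene.is_complete())
print([e.depot for e in scene.replicas_for(150)])
```

In this sketch, non-overlapping extents on different depots behave like striping, while overlapping extents act as replicas that a client could read from whichever depot is closest.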

Architecture

• Unified Network Information Service (UNIS)
– Descendant of perfSONAR Lookup and Topology Services
– Network and service “graph”

• Intelligent Data Movement Service (IDMS)
– Data dispatcher
– Operates on UNIS data
– Spawn storage services dynamically in GENI

• Periscope/perfSONAR
– Monitoring for operational integrity and optimization, BLiPP

• Storage Services
– IBP, prototype based on Ceph

• Other services
– Data transfer (GridFTP), WAN acceleration
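
As an illustration of a dispatcher operating on service and measurement data, the sketch below picks storage depots for a new object from a small in-memory snapshot. The slides do not describe the IDMS placement policy or the UNIS record format, so the dictionary layout, the field names, and the "fastest depot with enough free space" rule are assumptions made for this example.

```python
# Hypothetical snapshot of depot records and measurements, keyed by depot name.
depots = {
    "depot-a": {"free_gb": 500, "mbps_to_client": 900},
    "depot-b": {"free_gb": 50, "mbps_to_client": 4000},
    "depot-c": {"free_gb": 800, "mbps_to_client": 2500},
}

def choose_depots(depots, size_gb, copies=2):
    """Pick the fastest depots (by measured throughput) that have room for the object."""
    eligible = [name for name, d in depots.items() if d["free_gb"] >= size_gb]
    ranked = sorted(eligible, key=lambda n: depots[n]["mbps_to_client"], reverse=True)
    return ranked[:copies]

print(choose_depots(depots, size_gb=100))
```

Here depot-b has the best measured throughput but is excluded because it lacks space, which shows why a dispatcher needs both the service graph and current measurements.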

Earth Observation Depot Network (EODN)

An open, community-specific content distribution network for remote sensing data

Landsat data

• Landsat 8 launched February 11th, 2013

• Covers the entire land surface of the Earth every 16 days

– 8 day offset from Landsat 7

– ~700 scenes each day

• Each scene contains a GeoTIFF product: high-resolution sensor images

– ~1GB compressed, 2GB uncompressed
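
The figures above imply a daily data volume that a distribution network has to absorb. The back-of-the-envelope calculation below uses only the approximate numbers from this slide and decimal units (1 GB = 1000 MB). It is an estimate, not a measured workload.

```python
SCENES_PER_DAY = 700     # approximate, from the slide
COMPRESSED_GB = 1.0      # approximate compressed scene size
UNCOMPRESSED_GB = 2.0    # approximate uncompressed scene size
REVISIT_DAYS = 16        # full land-surface coverage cycle

daily_gb = SCENES_PER_DAY * COMPRESSED_GB
daily_uncompressed_gb = SCENES_PER_DAY * UNCOMPRESSED_GB
cycle_tb = daily_gb * REVISIT_DAYS / 1000        # one full coverage cycle
sustained_mbps = daily_gb * 8 * 1000 / 86_400    # GB/day -> Mbit/s average

print(f"{daily_gb:.0f} GB/day compressed, {daily_uncompressed_gb:.0f} GB/day uncompressed")
print(f"{cycle_tb:.1f} TB compressed per {REVISIT_DAYS}-day cycle")
print(f"{sustained_mbps:.0f} Mbit/s sustained average to keep up with new scenes")
```

The average rate is modest. The harder part is delivering each new scene to many consumers quickly, which is the case for staging copies in depots close to users.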

• Traditionally used for environmental monitoring and land use and land cover change studies

EODN

[Figure: EODN workflow. Components shown: Landsat Ground Network, EODN Harvester, UNIS and DMS (discover / measure), EODN (DLT) depots at WISC, IU, NYSER, and MIZZ, RealEarth (UW-Madison), web GUI, and client. Numbered steps: (1) subscribe, (2) harvest, (3) stage sensing data, (4) publish, (5) fast download, (6) processing, (7) WMS upload.]

Cisco Appliance Platform

• In collaboration with Internet2, Cisco and Fusion-io

• Cisco C220 server
– 2x Intel® Xeon® E5-2680, 16 cores@4GHz, 64GB DDR3 RAM
– Fusion-io ioDrive2 1.2 TB

• CentOS 6.4 Linux with DLT RPMs and tuning for data transfer throughput
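
Tuning a host for transfer throughput usually starts with sizing TCP buffers to the bandwidth-delay product (BDP) of the path. The sketch below computes the BDP for a 10G path at a few round-trip times. The RTT values are assumed examples, and the slides do not list the settings the DLT RPMs actually apply.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return round(bandwidth_bps * rtt_s / 8)

for rtt_ms in (4, 50, 100):   # assumed example RTTs
    size = bdp_bytes(10e9, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> {size / 2**20:.1f} MiB")
```

On Linux, the maximum socket buffer sizes are governed by sysctls such as `net.core.rmem_max`, `net.core.wmem_max`, `net.ipv4.tcp_rmem`, and `net.ipv4.tcp_wmem`. A single stream cannot fill a long path if those limits sit below the BDP.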


Acknowledgements

• Staff Scientist Dr. Ezra Kissel leads the DLT development efforts, PI of the GENI IDMS effort

• CC-NIE integration project with U. Tennessee and Vanderbilt U.

• CC-NIE integration project with the Globus team at U. Chicago and Argonne Nat’l Lab

• EODN development with AmericaView, U. Wisconsin

Phoebus-SLaBS performance

GridFTP transfers over dedicated 10G path, increasing WAN latency, 4ms LAN RTT and .001% edge loss