Bringing cloud technology to distributed data infrastructures
EGI CF 2013
Martin Hellmich (presenter), Jedrzej Rybicki, Maciej Brzeźniak


A bit of context

Towards a pan-European Collaborative Data Infrastructure

Production Services:
- Safe Replication
- Data Staging
- Metadata
- AAI

Research & Development:
- Scalable Federation Architectures
- Data Preservation
- Data Access and Transfer
- Workflows

Three Projects

Cloud storage integration:
- iRODS managing an OpenStack Swift backend
- Extending DPM with S3 storage

In-storage processing:
- Call Hadoop jobs from iRODS

My Goal

- Show the projects
- Find interest in the communities (we are interdisciplinary)
- Start discussion about cloud integration
  - Backend or frontend?
  - Outsource or restructure?
  - Where are limitations?

The Cloud Integration Projects

iRODS-OpenStack:
- Expose existing S3/OpenStack storage (managed otherwise)
- iRODS frontend protocols
- Local storage as cache

DPM-S3:
- Add new storage to DPM
- Expose HTTP only (but grid-aware, X509, VOMS)
- Outsource storage and network traffic

iRODS-OpenStack Swift
Maciej Brzeźniak

Sidestep: iRODS compound resources

iRODS resources:
- Cache
- Archive
- Virtual

iRODS compound resources:
- Virtual resource
- Maps from PUT/GET to POSIX
- Provides a cache

iRODS managing an S3 backend

Ingredients:
- iRODS server
- S3 Driver (in C)
- iRODS-S3 Driver Glue
- Swift-to-S3 frontend

[Diagram: iRODS, Site Disks, OpenStack Swift/S3]

Achievements:
- Transparent cloud storage
- Cloud auth through central accounts
- Low overhead through iRODS
- Speedups with caching

Limitations:
- Filesize limit (2/5GB)
- Issue moving files inside the cloud

[Diagram: iRODS, Site Disks, S3/OpenStack]

DPM-S3
Martin Hellmich

DPM now uses dmlite

[Diagram: S3]

Sidestep: the S3 protocol

- HTTP + custom headers
- Access ID + Secret Key + HTTP Cmd + Time => Signature
- Can be:
  - Header: Authorization: AWS AWSAccessKeyId:Signature
  - In URL: ?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Signature=NpgCjnDzr%2BWFzoENXmpNDUsSn8%3D&Expires=

Extending DPM with S3 Storage

[Diagram: Site Disks, S3, Signed URL redirect]
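The signing step from the S3 protocol sidestep, and the signed URL redirect in the DPM diagram, can be sketched in a few lines of Python. This is a minimal sketch assuming the query-string form of AWS signature version 2 (HMAC-SHA1 over the verb, expiry time and resource path) and a path-style endpoint. The function names, the endpoint `https://s3.example.org`, the bucket and the secret key are hypothetical and not taken from dmlite-plugins-s3.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote, urlencode


def sign_s3_url(endpoint, bucket, key, access_id, secret_key, expires, method="GET"):
    """Return a query-string-authenticated URL (AWS signature version 2).

    `expires` is an absolute Unix timestamp after which the URL is rejected.
    """
    resource = "/" + bucket + "/" + quote(key)
    # String to sign: verb, Content-MD5 (empty), Content-Type (empty),
    # expiry time, canonicalized resource.
    string_to_sign = f"{method}\n\n\n{expires}\n{resource}"
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # urlencode percent-encodes '+', '/' and '=' in the Base64 signature.
    query = urlencode({
        "AWSAccessKeyId": access_id,
        "Signature": signature,
        "Expires": str(expires),
    })
    return f"{endpoint}{resource}?{query}"


def redirect_to_storage(signed_url):
    """Hypothetical head-node response: send the client straight to the storage."""
    return 302, {"Location": signed_url}


if __name__ == "__main__":
    url = sign_s3_url(
        "https://s3.example.org",   # assumed endpoint
        "example-bucket",           # assumed bucket
        "dir/file.dat",
        "AKIAIOSFODNN7EXAMPLE",     # access ID as shown on the slide
        "not-a-real-secret",        # placeholder secret key
        int(time.time()) + 300,     # valid for five minutes
    )
    print(redirect_to_storage(url))
```

Because the signature is carried in the URL itself, any standard HTTP client that follows redirects can fetch the object without knowing the secret key, and the data traffic bypasses the site.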
Ingredients:
- dmlite
- dmlite-plugins-s3
- Amazon S3
- OpenStack Swift S3 frontend
- Ceph/RadosGW

Achievements:
- Only nameserver traffic local
- Cloud storage managed with central account
- Grid-enabled HTTP
- Standard HTTP clients
- Filesize limit (or S3 client)

[Diagram: Site Disks, S3, Signed URL redirect]

In-Storage Processing
Jedrzej Rybicki & Benedikt von St. Vieth

Motivation

Example HPC workflow:

[Diagram 1: Site, High Performance Computing, Storage, preprocessing]
[Diagram 2: Site, High Performance Computing, Storage + preprocessing]

Sidestep: iRODS rules

Condition: $objPath like /x/y/z/*
Or $rescName == demoResc8
Rule: printHello { print_hello; }

- Act freely on certain triggers
- At least C and Python

In-Storage Processing: Achievements

- Everything is a file
- Easy job specification in Apache Pig
- Caching of results
- Predefined scripts or custom jobs?

Summary

- There are different ways to integrate cloud storage for different scenarios
- Storage-based computing can be made transparent

Thank you!

Project contacts:
- OpenStack/iRODS: Maciej Brzeźniak (PSNC)
- DPM-S3: Martin Hellmich (CERN)
- In-storage processing on iRODS: Jedrzej Rybicki / Benedikt von St. Vieth (JSC)

Any questions?
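As an illustration of the trigger idea from the iRODS rules sidestep, the condition and action shown on that slide can be emulated in plain Python. This is only a sketch of the matching logic: it does not use the iRODS rule language or any iRODS API, and `condition_matches`, `print_hello` and `on_event` are hypothetical names.

```python
from fnmatch import fnmatchcase


def condition_matches(obj_path, resc_name):
    """Emulate: $objPath like /x/y/z/* or $rescName == demoResc8."""
    return fnmatchcase(obj_path, "/x/y/z/*") or resc_name == "demoResc8"


def print_hello():
    """Stand-in for the action the rule body calls."""
    return "Hello"


def on_event(obj_path, resc_name):
    """Run the action only when the trigger condition holds."""
    if condition_matches(obj_path, resc_name):
        return print_hello()
    return None
```

In the in-storage processing setup, the action would be the step that launches a job rather than a greeting.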