
Page 1: HDFS and S3 plugins

IT-SDC : Support for Distributed Computing

Andrea Manzi, Martin Hellmich

13/12/2013

Page 2: Plugin functionalities

DPM Workshop, IT-SDC, 13/12/2013

Frontends: NFS, HTTP/DAV, XROOT, GridFTP, RFIO

Namespace Management   Pool Management   Pool Driver   I/O
Legacy DPM             Legacy DPM        Legacy DPM    Legacy DPM
MySQL                  MySQL             HDFS          HDFS
Oracle                 Oracle            S3
HDFS
Memcache

Page 3: HDFS plugin

A dmlite plugin implementing I/O, pool driver and namespace functionalities through Apache Hadoop HDFS, ensuring:

- Automatic data replication
- Fault tolerance for client reads (death of a Datanode or Namenode)
- Scalability

Page 4: Deployment with lcgdm-dav

DPM Head Node: lcgdm-dav + dmlite HDFS plugin

HDFS Namenode

HDFS Datanode(s): lcgdm-dav + dmlite HDFS plugin

Page 5: Some details

- The HDFS C API (libhdfs) does not implement functions to retrieve the available (LIVE) datanodes
  - Patch implemented and submitted to Hadoop
  - hadoop-libhdfs rpm available from our repo
- A first version of the Puppet installation is available
  - To be adapted to recent dav/dmlite module changes

Page 6: On-going issues

- Tested with the new dmlite-based GridFTP plugin
  - Same deployment model as http/dav frontend, or single node writing to HDFS
- But HDFS does not support multiple write streams or random writes
  - OSG developed in-memory stream reordering in GridFTP to work around this limitation (gridftp-hdfs DSI, also available in the Globus Toolkit)
  - Integration still to be tested and understood
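The stream-reordering idea can be illustrated with a minimal Python sketch. It is not the gridftp-hdfs implementation; the class name `ReorderingWriter` and its methods are hypothetical, and the sink is assumed to be any append-only, file-like object. Blocks that arrive out of order from parallel streams are held in memory until the block at the next expected offset shows up, so the sink only ever sees sequential writes.

```python
import io


class ReorderingWriter:
    """Hypothetical helper: buffer out-of-order blocks, emit them in offset order."""

    def __init__(self, sink):
        self.sink = sink        # append-only, file-like object (assumption)
        self.next_offset = 0    # next byte offset the sink expects
        self.pending = {}       # offset -> block waiting for its turn

    def write_at(self, offset, data):
        if offset < self.next_offset:
            raise ValueError("block overlaps data already written")
        self.pending[offset] = data
        # Flush every buffered block that is now contiguous with the sink.
        while self.next_offset in self.pending:
            block = self.pending.pop(self.next_offset)
            self.sink.write(block)
            self.next_offset += len(block)

    def close(self):
        # Any block still buffered means an earlier block never arrived.
        if self.pending:
            raise IOError("gap before offset %d" % min(self.pending))


# Usage: two streams deliver their blocks in the "wrong" order.
sink = io.BytesIO()
writer = ReorderingWriter(sink)
writer.write_at(6, b"world")    # held in memory, nothing written yet
writer.write_at(0, b"hello ")   # unblocks both blocks
writer.close()
```

The trade-off of this approach is memory: a slow stream that holds the lowest offset forces every later block to stay buffered, so a real implementation would presumably need to bound the buffer size.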

Page 7: On-going issues

- The SRM frontend does not speak dmlite
- SRM calls through the old dpm daemons do not properly handle new pool types (such as HDFS)
  - A patch to the dpm daemon is to be implemented

Page 8: Future steps

- Distribution: need to understand how to distribute the plugin
  - HDFS client available only in Fedora 20 and Rawhide: https://apps.fedoraproject.org/packages/libhdfs
- Support for security-enabled HDFS clusters (Kerberos)

Page 9: Performance

- Tests through lcgdm-dav:
  - HDFS namespace: stat/s at half the performance of the MySQL plugin namespace; to be optimized with Memcached in front
  - ROOT analysis with massive vector I/O and TTreeCache: comparable performance with standard disk pools

Page 10: S3 plugin

Page 11: Key Facts

- Data directly to the cloud
- HTTP/HTTPS only
- DPM provides the namespace

Page 12: Data in the Cloud

Diagram: GET (client to DPM), REDIRECT, GET (client to S3 provider), DATA

- No data passes through DPM
- Inherits all capabilities from the S3 provider
  - Amazon: range header, no multi-range, multi-stream download only, no 3rd party copy, http access only
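As an illustration of the GET / REDIRECT / GET flow, here is a minimal Python sketch of how a redirect target could be built. The slides do not say how the plugin signs its URLs, so this is an assumption: it uses Amazon S3 query-string authentication (Signature Version 2, HMAC-SHA1). The function names, the path-style endpoint and the 302 status code are all hypothetical choices for the example.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode


def presigned_get_url(bucket, key, access_key_id, secret_access_key, expires):
    """Build a time-limited S3 GET URL (query-string auth, Signature Version 2).

    `expires` is an absolute Unix timestamp after which the URL is rejected.
    """
    resource = "/%s/%s" % (bucket, quote(key))
    # StringToSign: verb, Content-MD5, Content-Type, Expires, resource.
    string_to_sign = "GET\n\n\n%d\n%s" % (expires, resource)
    digest = hmac.new(secret_access_key.encode(),
                      string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    query = urlencode({
        "AWSAccessKeyId": access_key_id,
        "Expires": expires,
        "Signature": signature,
    })
    # Path-style endpoint, assumed here for simplicity.
    return "https://s3.amazonaws.com%s?%s" % (resource, query)


def redirect_response(url):
    """Hypothetical head-node reply: send the client straight to the provider."""
    return 302, {"Location": url}


# Usage: the head node resolves the namespace entry, then redirects.
url = presigned_get_url("mybucket", "dir/file.root", "<ID>", "<SK>", 1386936000)
status, headers = redirect_response(url)
```

Because the client follows the redirect and fetches the object itself, the data never transits the DPM head node, and the secret key stays on the server side.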

Page 13: How to install an S3 pool

yum install dmlite-plugins-s3

dmlite-shell
> pooladd poolaws s3
> poolmodify poolaws bucketsalt xFVlsrg
> poolmodify poolaws s3accesskeyid <ID>
> poolmodify poolaws s3secretaccesskey <SK>

<create an s3 bucket on your storage>

Page 15: Thanks!

Questions?