
Hops – Distributed MetaData for Hadoop

Jim Dowling, Associate Prof @ KTH
Senior Researcher @ SICS, CEO @ Hops AB

BDOOP Meetup, Hadoop Summit, Dublin, 12th April 2016

www.hops.io @hopshadoop

MetaData Services in Hadoop


Metadata Totem Poles in Hadoop

Eventual Consistency


With Many Hadoop Clusters

[Diagram: Clusters 1…N, each with its own MetaDataService, feeding a central MetaData Service (Aggregator)]

MetaData consistency protocols have O(N) operational complexity.

Case Study: Access Control as a MetaData Service


Access Control in Relational Databases

# Multi-tenancy for alice and bob on db1 and db2
GRANT ALL PRIVILEGES ON db1.* TO 'alice'@'%';
GRANT ALL PRIVILEGES ON db2.* TO 'bob'@'%';

# More fine-grained privileges
GRANT SELECT ON db2.sensitiveTable TO 'alice'@'192.168.1.2';

Databases ensure that data and security policies stay consistent using foreign keys.

"drop table db2.sensitiveTable" => the associated privileges are deleted
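To see how a foreign key makes the policy follow the data, here is a minimal JDBC sketch against an invented schema (a catalog row per table, with privileges referencing it via ON DELETE CASCADE). It illustrates the cascade only; MySQL's real grant tables and Sentry's policy store are structured differently.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CascadingPrivileges {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; assumes a local MySQL server.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/db2", "root", "");
             Statement st = conn.createStatement()) {

            // Invented schema: one catalog row per table; privileges reference it.
            st.execute("CREATE TABLE catalog (table_name VARCHAR(64) PRIMARY KEY)");
            st.execute("CREATE TABLE privileges ("
                     + "  user_name  VARCHAR(64),"
                     + "  privilege  VARCHAR(16),"
                     + "  table_name VARCHAR(64),"
                     + "  FOREIGN KEY (table_name) REFERENCES catalog (table_name)"
                     + "    ON DELETE CASCADE)");
            st.execute("INSERT INTO catalog VALUES ('sensitiveTable')");
            st.execute("INSERT INTO privileges VALUES ('alice', 'SELECT', 'sensitiveTable')");

            // Removing the table's catalog row cascades: alice's privilege rows
            // are deleted in the same transaction, so policy and data never diverge.
            st.execute("DELETE FROM catalog WHERE table_name = 'sensitiveTable'");
        }
    }
}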


Access Control in Hadoop: Apache Sentry

How do you ensure the consistency of the policies and the data?

[Mujumdar’15]


Policy Editor for Sentry

Administrators manage privileges for users.


Talk Overview

• Our Story of Distributed Metadata for Hadoop
• Metadata at work in Hops: Multi-tenancy
• Metadata at work in Hops: HopsWorks

Bill Gates’ biggest product regret?*

Windows Future Storage (WinFS)*

*http://www.zdnet.com/article/bill-gates-biggest-microsoft-product-regret-winfs/


HDFS v2

[Diagram: HDFS Client, ActiveNameNode, StandbyNameNode, Journal Nodes, Zookeeper, SnapshotNode, DataNodes]

• Asynchronous Replication of the NN Log
• Agreement on the Active NameNode
• Faster Recovery - Cut the NN Log


Max Pause Times for NameNode Heap Sizes*

[Chart: Max Pause-Times (ms), log scale from 10 to 10000, vs. JVM Heap Size (GB) from 25 to 100, for Unoptimized and Optimized JVMs]

*OpenJDK or Oracle JVM


NameNode and Decreasing Memory Costs

[Chart: Size (GB) from 0 to 1000 vs. Year from 2015 to 2019, comparing the Projected Max NameNode JVM Heap Size with the Size of RAM in a COTS $7,000 Rack Server]


Externalizing the NameNode State

• Problem: the NameNode cannot scale up to exploit ever-cheaper RAM.
• Solution: move the metadata off the JVM heap.
• Move it where? To an in-memory storage system that can be efficiently queried and managed. Preferably open source.
• Our choice: MySQL Cluster (NDB). A sketch of what this looks like follows.
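As an illustration of externalized metadata, here is a minimal example using ClusterJ, MySQL Cluster's native Java connector. The inodes table, its columns, and the connect string are illustrative assumptions, not the actual HopsFS schema.

import com.mysql.clusterj.ClusterJHelper;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.SessionFactory;
import com.mysql.clusterj.annotation.Column;
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;
import java.util.Properties;

public class ExternalizedMetadata {
    // Maps to a hypothetical "inodes" table in NDB; the schema is an assumption.
    @PersistenceCapable(table = "inodes")
    public interface Inode {
        @PrimaryKey
        long getId();
        void setId(long id);

        @Column(name = "parent_id")
        long getParentId();
        void setParentId(long parentId);

        @Column(name = "name")
        String getName();
        void setName(String name);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("com.mysql.clusterj.connectstring", "localhost:1186");
        props.setProperty("com.mysql.clusterj.database", "hops");
        SessionFactory factory = ClusterJHelper.getSessionFactory(props);
        Session session = factory.getSession();

        // The metadata now lives in NDB's memory, not on the NameNode's JVM heap.
        Inode inode = session.newInstance(Inode.class);
        inode.setId(1L);
        inode.setParentId(0L);
        inode.setName("user");
        session.persist(inode);
        session.close();
    }
}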


HopsFS Architecture

[Diagram: HDFS and HopsFS Clients connect through a Load Balancer to the NameNodes (one of which is Leader); the NameNodes store metadata in NDB; DataNodes store the blocks]


Pluggable DBs: Data Abstraction Layer (DAL)

[Diagram: the NameNode (Apache v2) calls the DAL API (Apache v2), which is implemented by NDB-DAL-Impl (GPL v2) or another DB backend (other license); packaged as hops-2.5.0.jar and dal-ndb-2.5.0-7.4.7.jar]
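To make the licensing split concrete, here is a hypothetical sketch of a DAL-style boundary; the interface and method names are invented for illustration and are not the real DAL API. The NameNode compiles only against the Apache-licensed interface, and the GPL NDB implementation is bound at runtime by class name.

import java.util.Collection;

// Hypothetical Apache v2 side: the NameNode depends only on this interface.
interface InodeDataAccess {
    InodeRecord findByParentAndName(long parentId, String name);
    void commit(Collection<InodeRecord> added,
                Collection<InodeRecord> modified,
                Collection<InodeRecord> removed);
}

// Plain value object passed across the API boundary (illustrative).
record InodeRecord(long id, long parentId, String name) {}

// Loading the implementation reflectively keeps GPL code out of the Apache
// tree: swapping dal-ndb-*.jar for another backend needs no NameNode change.
class DalLoader {
    static InodeDataAccess load(String implClassName) throws Exception {
        return (InodeDataAccess) Class.forName(implClassName)
                .getDeclaredConstructor().newInstance();
    }
}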

The Global Lock in the NameNode


Apache NameNode Internals

Client: mkdir, getBlockLocations, createFile, …

[Diagram: client RPCs arrive at a Listener (NIO thread), are read by Reader1…ReaderN (ipc.server.read.threadpool.size, default 1) into the Call Queue, and are served by Handler1…HandlerM (dfs.namenode.service.handler.count, default 10) under the global FSNamesystem lock that protects the in-memory metadata and EditLog. Edits are flushed through the EditLog buffer to EditLog1-3 on the Journal Nodes, and the Responder (NIO thread) returns done RPCs (ackIds) to clients over the ConnectionList.]

HopsFS NameNode Internals

Client: mkdir, getBlockLocations, createFile, …

[Diagram: the same RPC pipeline (Listener, Readers, Call Queue, Handlers, Responder, with the same ipc.server.read.threadpool.size and dfs.namenode.service.handler.count settings), but the Handlers read and write metadata tables in NDB (inodes, block_infos, replicas, leases, …) through the DAL API and DAL-Impl, instead of taking a global lock and writing an EditLog. This is the HARD PART.]


Consistency: Transactions & Implicit Locking

• Serializable FS ops using implicit locking of subtrees.

[Hakimzadeh, Peiro, Dowling, ”Scaling HDFS with a Strongly Consistent Relational Model for Metadata”, DAIS 2014]


Preventing Deadlock and Starvation

• Acquire FS locks in an agreed order, following the FS hierarchy (see the sketch after the diagram below).
• Block-level operations follow the same agreed order.
• No cycles => freedom from deadlock.
• Pessimistic concurrency control ensures progress.

[Diagram: one Client's mv on /user/jim/myFile, another Client's read, and a DataNode's block_report all contend at the NameNode by locking the same path components in the same order]
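A minimal sketch of the ordering rule with invented classes (not HopsFS code): every operation acquires per-component locks from the root downwards, shared on ancestors and exclusive on the target. Since all operations contend in the same hierarchical order, no cycle of waiting threads, and hence no deadlock, can form.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: one lock per path component, always acquired root-first,
// so overlapping operations (mv, read, block_report) collide in the same order.
class PathLocks {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
    }

    // Locks "/user/jim/myFile" as "/", "/user", "/user/jim", "/user/jim/myFile";
    // ancestors are locked shared, the target exclusively for mutating ops.
    void lockPath(String path, boolean exclusiveTarget) {
        lockFor("/").readLock().lock();
        String[] parts = path.substring(1).split("/");
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            prefix.append('/').append(parts[i]);
            ReentrantReadWriteLock lock = lockFor(prefix.toString());
            if (exclusiveTarget && i == parts.length - 1) {
                lock.writeLock().lock();   // exclusive on the final component
            } else {
                lock.readLock().lock();    // shared on ancestors
            }
        }
        // Unlocking (in reverse order) is omitted to keep the sketch short.
    }
}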

Per-Transaction Cache

• Reusing the HDFS codebase resulted in too many round trips to the database per transaction.
• Cache intermediate transaction results at the NameNodes (a sketch follows).
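A minimal sketch of the idea with invented names (not the HopsFS implementation): within one transaction, the first read of a key is fetched from the database, and every later read is served from a NameNode-local snapshot, eliminating repeated round trips.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative per-transaction cache: at most one database fetch per key.
class TransactionContext {
    private final Map<String, Object> snapshot = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> T get(String key, Function<String, T> dbFetch) {
        // First call for this key hits the database; later calls are local.
        return (T) snapshot.computeIfAbsent(key, dbFetch);
    }

    // The snapshot is discarded when the transaction commits or aborts.
    void clear() {
        snapshot.clear();
    }
}

For example, an mkdir that resolves /user/jim would fetch the "user" inode once and reuse the cached copy for both the permission check and the update.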


Sometimes, Transactions Just Ain't Enough

• Large subtree operations (delete, mv, set-quota) can't always be executed in a single transaction.
• 4-phase protocol
  - Isolation and consistency
  - Aggressive batching
  - Transparent failure handling
  - Failed ops retried on a new NN
  - Lease timeout for failed clients

Leader Election using NDB

• A Leader NameNode coordinates replication and lease management.
• NDB acts as shared memory for Leader Election among the NNs.
• No more Zookeeper, yay! (A sketch of the idea follows.)

[Niazi, Berthou, Ismail, Dowling, "Leader Election in a NewSQL Database", DAIS 2015]
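A minimal sketch of lease-based election over a shared row, with an invented single-row schema, CREATE TABLE leader (id INT PRIMARY KEY, holder VARCHAR(64), expires BIGINT); the actual protocol in the DAIS 2015 paper is more involved. Each NameNode periodically tries to take or renew the lease inside a transaction; whoever holds an unexpired lease is the Leader.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative: a database row acts as shared memory for the election.
class LeaderElection {
    // Returns true if this NameNode is the Leader after the attempt.
    static boolean tryAcquire(Connection conn, String myId, long leaseMs) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT holder, expires FROM leader WHERE id = 1 FOR UPDATE")) {
            ResultSet rs = select.executeQuery();
            rs.next();
            long now = System.currentTimeMillis();
            // The lease is ours if we already hold it or the old lease expired.
            boolean mine = myId.equals(rs.getString("holder"))
                        || rs.getLong("expires") < now;
            if (mine) {
                try (PreparedStatement update = conn.prepareStatement(
                        "UPDATE leader SET holder = ?, expires = ? WHERE id = 1")) {
                    update.setString(1, myId);
                    update.setLong(2, now + leaseMs);
                    update.executeUpdate();
                }
            }
            conn.commit();   // row lock released; the other NNs see the new lease
            return mine;
        }
    }
}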

Path Component Caching

• A path of length N needs O(N) round trips to resolve.
• With our cache, a hit costs O(1) round trips (see the sketch after the diagram below).

[Diagram: without the cache, resolving /user/jim/myFile issues getInode(0, "user"), getInode(1, "jim"), getInode(2, "myFile") - one NDB round trip per component. With the cache, the NameNode answers getInodes("/user/jim/myFile") locally and issues a single batched validateInodes([(0, "user"), (1, "jim"), (2, "myFile")]) call to NDB]
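A minimal sketch of the pattern with invented names and stand-in database calls: resolve from the local cache, validate all components in one batched round trip, and fall back to per-component resolution on a miss or a failed validation.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative path-component cache: a hit turns O(N) getInode round trips
// into a single batched validateInodes round trip.
class PathCache {
    record CachedInode(long parentId, String name, long inodeId) {}

    interface Db {  // stand-ins for the NDB calls; invented signatures
        boolean validateInodes(List<CachedInode> components);        // 1 round trip
        List<CachedInode> resolveComponentByComponent(String path);  // O(N) round trips
    }

    private final Map<String, List<CachedInode>> cache = new HashMap<>();

    List<CachedInode> resolve(String path, Db db) {
        List<CachedInode> hit = cache.get(path);
        if (hit != null && db.validateInodes(hit)) {
            return hit;                    // O(1) round trips on a valid hit
        }
        List<CachedInode> resolved = db.resolveComponentByComponent(path);
        cache.put(path, resolved);         // warm the cache for the next lookup
        return resolved;
    }
}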

Scalable Block Reporting

• On 100PB+ clusters, internal maintenance protocol traffic makes up much of the network traffic.
• Block reporting:
  - The Leader load-balances block reports across the NameNodes.
  - NameNodes work-steal when exiting safe mode.

[Diagram: DataNodes send block reports to the NameNodes; the Leader balances the load, and NameNodes work-steal Blocks/SafeBlocks work via NDB]

HopsFS Performance


HopsFS Metadata Scaleout

Assuming 256MB Block Size, 100 GB JVM Heap for Apache Hadoop


HopsFS Throughput (Spotify Workload)

Experiments performed on AWS EC2 c3.8xlarge instances with enhanced networking

Hops-YARN


YARN Architecture

[Diagram: YARN Client, ResourceMgr, StandbyResourceMgr, Zookeeper Nodes, NodeManagers]

1. Master-Slave Replication of RM State
2. Agreement on the Active ResourceMgr


ResourceManager - Monolithic but Modular

[Diagram: inside the ResourceManager, the ClientService, AdminService, ApplicationMasterService, ResourceTrackerService, Scheduler, and Security modules all share the Cluster State; in Hops, the HopsResourceTracker and HopsScheduler persist the Cluster State to NDB. YARN Clients, App Masters, and NodeManagers connect from the outside.]


Hops-YARN Architecture

[Diagram: YARN Clients talk to the Scheduler; NodeManagers report to the Resource Trackers; all ResourceMgrs share state in NDB, with Leader Election replacing a failed Scheduler]

What do we do with all this Metadata?


Hops MetaData Tree


[Diagram: HopsFS and Hops-YARN metadata live in NDB, extended with Projects, DataSets, Hops Users, Provenance, Search, the History Service, and Extended Metadata]


Problem: Need a Cluster per Sensitive DataSet

[Diagram: Alice has access to both the NSA DataSet and the User DataSet]

• Alice can copy/cross-link between data sets.
• Alice has only one Kerberos Identity; Dynamic Roles are not supported in Hadoop.


Solution: Project-Specific UserIDs

[Diagram: Alice is a member of Project NSA as NSA__Alice and of Project Users as Users__Alice; HDFS enforces access control]
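A minimal sketch of the scheme using the standard Hadoop FileSystem API; the ProjectName__UserName convention is from the slide, but the /Projects layout, group naming, and helper class are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Illustrative: each project member gets a project-specific user (e.g.
// NSA__Alice), and each DataSet gets its own group, so files created in one
// project cannot be read from another project, even by the same person.
public class ProjectIds {
    static String projectUser(String project, String user) {
        return project + "__" + user;   // naming scheme from the slides
    }

    static void createDataSet(FileSystem fs, String project, String dataSet,
                              String owner) throws Exception {
        Path dir = new Path("/Projects/" + project + "/" + dataSet);
        fs.mkdirs(dir);
        // Owner is the project-specific user; group is the DataSet's group.
        // (setOwner requires HDFS superuser privileges.)
        fs.setOwner(dir, projectUser(project, owner), project + "__" + dataSet);
        // rwxrwx---: members of the DataSet group have access; others have none.
        fs.setPermission(dir, new FsPermission((short) 0770));
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        createDataSet(fs, "NSA", "Sensitive", "Alice");  // hypothetical names
    }
}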


Sharing DataSets with HopsWorks

[Diagram: Project NSA owns a DataSet; to share it, members of Project NSA (e.g., NSA__Alice) are added to the DataSet's group, while Users__Alice remains a member of Project Users]

HopsWorks enforces Dynamic Roles


Secure Impersonation

[Diagram: Alice@gmail.com authenticates to HopsWorks, which securely impersonates her project-specific users (NSA__Alice, Users__Alice) for Projects in HopsFS and Hops-YARN]


User

• Authentication Providers
  - JDBC Realm
  - 2-Factor Authentication
  - LDAP


Project

• Members
  - Roles: Owner, Data Scientist
• DataSets
  - Home project
  - Can be shared


Project Roles

• Data Owner Privileges
  - Import/Export data
  - Manage Membership
  - Share DataSets
• Data Scientist Privileges
  - Write code
  - Run code
  - Request access to DataSets

We delegate the administration of privileges to users.


Sharing DataSets between Projects

The same as Sharing Folders in Dropbox


Delegate Access Control to HDFS

• HDFS enforces access control
  - A UserID per Project
  - A GroupID per Project and DataSet
• Metadata integrity using Foreign Keys
  - Removing a project removes all of its users, groups, and (optionally) DataSets.


How ACME Inc. handles Free-Text Search

[Diagram: HDFS alongside an external index]

• In theory: a Unified Search and Update API.
• In practice: Inconsistent Metadata.


Free-Text Search with Consistent Metadata

[Diagram: the MetaData Designer and MetaData Entry write to the Distributed Database, which feeds ElasticSearch for free-text search]

The Distributed Database is the Single Source of Truth. Foreign keys ensure the integrity of the Metadata.
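One way to realize this, sketched under assumptions (the endpoint, index name, and document shape are invented, and this is not necessarily how HopsWorks does it): the database remains authoritative, and changed rows are pushed to ElasticSearch over its REST API, so a failed index update can simply be retried from the database.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative one-way sync: the database row is authoritative; ElasticSearch
// is just a rebuildable index over it.
class MetadataIndexer {
    private final HttpClient http = HttpClient.newHttpClient();

    void index(String inodeId, String nameJsonEscaped) throws Exception {
        String json = "{\"name\": \"" + nameJsonEscaped + "\"}";
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/metadata/_doc/" + inodeId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> resp = http.send(req, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() >= 300) {
            // The database still holds the truth, so the update can be retried.
            throw new IllegalStateException("Index update failed: " + resp.body());
        }
    }
}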


Global Search: Projects and DataSets


Project Search: Files, Directories


Design your own Extended Metadata

Analytics in HopsWorks


Batch Job Analytics

Interactive Analytics: Zeppelin

Other Features

• Audit Logs

• Erasure-Coded Replication

• Online upgrade of Hops (and NDB)

• Automated Installation with Karamel

• Tinker friendly – easy to extend metadata!


Conclusions

• Hops is a next-generation distribution of Hadoop.

• HopsWorks is a frontend to Hops that supports true multi-tenancy, free-text search, interactive analytics with Zeppelin/Flink/Spark, and batch jobs.

• Looking for contributors/committers
  - Pick-me-ups on GitHub

www.hops.io

The Team

Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Kamal Hakimzadeh, Ermias Gebremeskel, Theofilos Kakantousis, Johan Svedlund Nordström, Someya Sayeh, Vasileios Giannokostas, Antonios Kouzoupis, Misganu Dessalegn, Ahmad Al-Shishtawy, Ali Gholami.

Alumni: K. "Sri" Srijeyanthan, Steffen Grohsschmiedt, Alberto Lorente, Andre Moré, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Jude D'Souza, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.

Hops [Hadoop For Humans]

Join us! http://github.com/hopshadoop
