
Page 1: HDFS 2015: Past, Present, and Future

9/30/2015 Akira Ajisaka, NTT DATA Corporation

HDFS 2015: Past, Present, and Future

Apache: Big Data Europe 2015

Page 2: HDFS 2015: Past, Present, and Future

Self introduction

Akira Ajisaka (NTT DATA)

Apache Hadoop Committer

130+ commits in 2015

Working on usability

80+ documentation patches

"Open-Source Professional Services" team

Has deployed and supported Hadoop clusters totaling 10k+ nodes over 7 years

Together with NTT, ranked 6th in the world in Apache Hadoop contributions [1]

[1] The Activities of Apache Hadoop Community 2014 http://ajisakaa.blogspot.com/2015/02/the-activities-of-apache-hadoop.html

Page 3: HDFS 2015: Past, Present, and Future

About

Similar to the "YARN 2015" presentation by @tshooter

HDFS is being developed faster than YARN

So a summary of the new HDFS features is needed

[Chart: resolved HDFS and YARN issues in 2015 (cumulative), January through September 2015, y-axis 0 to 1400.]

Page 4: HDFS 2015: Past, Present, and Future

Agenda

Past

Present

Future

Page 5: HDFS 2015: Past, Present, and Future

Past

Page 6: HDFS 2015: Past, Present, and Future

2.X is the release branch

1.X and 0.23.X are no longer maintained

Past releases

[Release timeline, 2009 to 2015:
branch-1 (from branch-0.20): 0.20.1, 0.20.205, 1.0.0, 1.1.0, 1.2.1 (stable)
0.21.0, 0.22.0
branch-0.23: 0.23.0, 0.23.11 (final)
branch-2: 2.0.0-alpha, 2.1.0-beta, 2.2.0 (GA), 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
trunk
Features annotated along the way: new append, security, NameNode Federation, YARN, NameNode HA.]

Page 7: HDFS 2015: Past, Present, and Future

Hadoop 2.2 (2013-10-13)

NameNode High-Availability

No Single Point of Failure

Federation

Multiple NameNodes, multiple namespaces

Improve scalability

Snapshots

Read-only point-in-time copies (copy-on-write); a usage sketch follows this list

NFSv3 mount
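A minimal sketch of the snapshot workflow; the directory and snapshot names are just examples:

$ hdfs dfsadmin -allowSnapshot /user/tester      # mark the directory as snapshottable (admin)
$ hdfs dfs -createSnapshot /user/tester snap1    # take a read-only, point-in-time copy
$ hdfs dfs -ls /user/tester/.snapshot/snap1      # old data stays readable under the .snapshot path
$ hdfs dfs -deleteSnapshot /user/tester snap1    # drop the snapshot when no longer needed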

Page 8: HDFS 2015: Past, Present, and Future


Hadoop 2.3 (2014-02-20)

Heterogeneous Storages (Phase 1)

In-memory caching

Introduce memory-locality

Make efficient use of memory in DNs

[Diagram, step 1: the DFSClient asks the NameNode to cache a file; the file's blocks currently reside on the DataNode's DISK.]

Page 9: HDFS 2015: Past, Present, and Future


Hadoop 2.3 (2014-02-20)

Heterogeneous Storages (Phase 1)

In-memory caching

Introduce memory-locality

Make efficient use of memory in DNs

[Diagram, step 2: the NameNode asks the DataNode to cache the file's blocks; the blocks are loaded from DISK into memory.]

Page 10: HDFS 2015: Past, Present, and Future


Hadoop 2.3 (2014-02-20)

Heterogeneous Storages (Phase 1)

In-memory caching

Introduce memory-locality

Make efficient use of memory in DNs

[Diagram, step 3: if the blocks are cached locally, the DFSClient reads them directly from memory and skips checksum calculation.]
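Centralized cache management is driven by the cacheadmin CLI; a minimal sketch (pool name and path are examples):

$ hdfs cacheadmin -addPool testPool                                         # create a cache pool
$ hdfs cacheadmin -addDirective -path /user/tester/hot.txt -pool testPool   # ask the NN to cache this path
$ hdfs cacheadmin -listDirectives                                           # check what is cached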

Page 11: HDFS 2015: Past, Present, and Future

Hadoop 2.4 (2014-04-07)

Rolling Upgrades

No need to wait for hours

ACLs

More fine-grained permissions

Similar to POSIX ACL

-rw-rw-r-- 3 tester hadoop 129 2015-09-15 12:00 /user/tester/test.txt

$ hdfs dfs -setfacl -m group:hive:rw- /user/tester/test.txt

This gives write permission to the hive group.
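To verify the result, hdfs dfs -getfacl lists the ACL entries; for the file above the output should look roughly like this (the mask entry is derived from the group permissions):

$ hdfs dfs -getfacl /user/tester/test.txt
# file: /user/tester/test.txt
# owner: tester
# group: hadoop
user::rw-
group::rw-
group:hive:rw-
mask::rw-
other::r--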

Page 12: HDFS 2015: Past, Present, and Future

Hadoop 2.5 (2014-08-11)

Extended Attributes (XAttrs)

Similar to extended attributes in Linux

Currently used by transparent encryption

-rw-r--r-- 3 tester hadoop 129 2015-09-15 12:00 /user/tester/test.txt

Set XAttrs

$ hdfs dfs -setfattr -n user.locale -v jp /user/tester/test.txt

$ hdfs dfs -setfattr -n user.city -v tokyo /user/tester/test.txt

Get XAttrs

$ hdfs dfs -getfattr -d /user/tester/test.txt

# file: /user/tester/test.txt

user.locale="jp"

user.city="tokyo"
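Removing an XAttr uses the -x flag; a sketch continuing the example above:

$ hdfs dfs -setfattr -x user.locale /user/tester/test.txt
$ hdfs dfs -getfattr -d /user/tester/test.txt
# file: /user/tester/test.txt
user.city="tokyo"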

Page 13: HDFS 2015: Past, Present, and Future

Hadoop 2.6 (2014-11-18)

Hot swap volumes

Recover from disk failures without stopping DataNodes (a usage sketch appears at the end of this slide)

Integrate Apache HTrace (incubating)

Trace RPCs inside HDFS

Finding bottlenecks becomes easier

[Diagram: spans A (node 1, parent: root), B (node 2, parent: A), and C, D (node 3) all share trace id 12345 and are connected by RPCs, so parent-child relations are easy to find.]
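Hot swap sketch: edit dfs.datanode.data.dir in the DataNode's hdfs-site.xml to add or remove the volume, then ask the DataNode to reconfigure itself. The hostname is an example, and 50020 is the default DataNode IPC port in Hadoop 2.x:

$ hdfs dfsadmin -reconfig datanode dn1.example.com:50020 start    # apply the new volume list
$ hdfs dfsadmin -reconfig datanode dn1.example.com:50020 status   # check the reconfiguration result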

Page 14: HDFS 2015: Past, Present, and Future

Hadoop 2.6 (2014-11-18) (Cont.d)

Heterogeneous Storages (Phase 2)

Archival Storage

Memory as storage tier

Transparent Encryption

Page 15: HDFS 2015: Past, Present, and Future

Heterogeneous Storages

Problem

SSD is getting cheaper

Want to store hot data in SSD to achieve higher throughput

Solution: introduce storage types and block placement policies

Storage types: DISK (HDD), SSD, ARCHIVE, ...

Policies: ONE_SSD, HOT, WARM, COLD, ...

Example: file A -> ONE_SSD, file B -> HOT

[Diagram: file A (ONE_SSD) has one replica on SSD (DN1) and two on DISK (DN2, DN3); file B (HOT) has all three replicas on DISK across DN1-DN3.]

Hadoop 2.6

Page 16: HDFS 2015: Past, Present, and Future

How to use

Configure HDFS with the storage type of each volume (see the configuration below)

Set a block placement policy on an HDFS path

The policy can be changed after data has already been written

The Mover tool then moves blocks to satisfy the policy while respecting rack awareness

Hadoop 2.6

Heterogeneous Storages

<property>
  <name>dfs.datanode.data.dir</name>
  <value>[SSD]file:///data/ssd,[DISK]file:///data/hdd</value>
</property>

$ hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>
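A usage sketch with the 2.7-style CLI; the path is an example, and -listPolicies shows the exact policy names available on your version:

$ hdfs storagepolicies -listPolicies
$ hdfs storagepolicies -setStoragePolicy -path /data/hot -policy ONE_SSD
$ hdfs storagepolicies -getStoragePolicy -path /data/hot
$ hdfs mover -p /data/hot    # move existing blocks until they satisfy the policy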

Page 17: HDFS 2015: Past, Present, and Future

Archival Storage

DISK or ARCHIVE?

ARCHIVE is for cold data

eBay reduces cost/GB by 5x [1]

Use low-spec DNs for ARCHIVE

No need to split the cluster!

[1] Reduce Storage Costs by 5x Using the New HDFS Tiered Storage Feature: http://www.slideshare.net/Hadoop_Summit/reduce-storage-costs-by-5x-using-the-new-hdfs-tiered-storage-feature

                  Regular node    Archival node
Drives            12 HDDs         60 HDDs
CPU               32 cores        4 cores
Memory            128 GB          64 GB
Runs NodeManager  Yes             No

Hadoop 2.6
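Assuming some DataNode volumes are tagged [ARCHIVE] in dfs.datanode.data.dir, cold data can be pushed onto them with the COLD policy; the path is an example:

$ hdfs storagepolicies -setStoragePolicy -path /archive/2014 -policy COLD
$ hdfs mover -p /archive/2014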

Page 18: HDFS 2015: Past, Present, and Future

Transparent Encryption

Problem

Cannot guard data from OS-level attacks

Solution

Provide end-to-end encryption

Encrypt/decrypt data transparently

No need to rewrite user application

Hadoop 2.6

[Diagram: the DataTransferProtocol between client and DataNode can be encrypted, but without transparent encryption the data written to the DataNode's DISK is NOT encrypted.]
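A sketch of setting up an encryption zone, assuming a KMS is configured as the key provider; the key name and path are examples:

$ hadoop key create mykey                              # create a key in the KMS
$ hdfs dfs -mkdir /secure                              # the zone root must be an empty directory
$ hdfs crypto -createZone -keyName mykey -path /secure
$ hdfs crypto -listZones

Files written under /secure are then encrypted and decrypted transparently for authorized clients.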

Page 19: HDFS 2015: Past, Present, and Future

Transparent Encryption: How to encrypt data

DEK (Data Encryption Key)

A unique key for each file in EZ (Encryption Zone)

Stored in an Xattr of the file, encrypted (EDEK)

[Diagram, steps 1-3: the client creates a file in an EZ; the NameNode obtains an EDEK from the Key Management Server (KMS) and stores it in the file's metadata.]

KMS:

• Proxy to the underlying key provider

• ACLs on a per-key basis

• Bundled with the Hadoop package

Hadoop 2.6

Page 20: HDFS 2015: Past, Present, and Future

Transparent Encryption: How to encrypt data

DEK (Data Encryption Key)

A unique key for each file in EZ (Encryption Zone)

Stored in an Xattr of the file, encrypted (EDEK)

[Diagram, steps 4-5: the NameNode returns the EDEK to the client, and the client calls the KMS to decrypt the EDEK into a DEK.]

Hadoop 2.6

Page 21: HDFS 2015: Past, Present, and Future

Transparent Encryption: How to encrypt data

DEK (Data Encryption Key)

A unique key for each file in EZ (Encryption Zone)

Stored in an Xattr of the file, encrypted (EDEK)

[Diagram, step 6: the client encrypts the data with the DEK and writes the encrypted data to the DataNode.]

Hadoop 2.6

Page 22: HDFS 2015: Past, Present, and Future

Transparent Encryption: Very low overhead

Very low overhead

Simple benchmark with 3 slaves (m3.xlarge, 4 core Xeon E5-2670 v2)

Use AES-NI

Known issue

Encryption is sometimes done incorrectly (HADOOP-11343)

Recommend 2.7.1 or 2.6.1

Hadoop 2.6

               Encryption off    Encryption on
1 GB Teragen   17 sec            18 sec
1 GB Terasort  47 sec            49 sec

Page 23: HDFS 2015: Past, Present, and Future

Present

Page 24: HDFS 2015: Past, Present, and Future

Hadoop 2.7 (2015-04-21)

Quota per storage type

Truncate API

Files with variable-length blocks

Web UI for NFS gateway

NNTop: top-like tool for NameNode

Lists the top users for each operation

Exposed via NameNode metrics

fsck -blockId option

Prints the file that the given block belongs to (see the example after this list)

Inotify
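For example, given a block ID from a DataNode log (the ID below is made up), fsck reports the file the block belongs to and the health of its replicas:

$ hdfs fsck -blockId blk_1073741825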

Page 25: HDFS 2015: Past, Present, and Future

INotify for HDFS

Problem

Some components do caching

Hive caches path names

Impala caches block locations

When to invalidate cache?

Solution

Introduce a tool similar to Linux inotify

Clients can monitor these events without parsing the NameNode log or edit log

Hadoop 2.7

Page 26: HDFS 2015: Past, Present, and Future

INotify for HDFS: Technical Approach

The client polls the NameNode periodically

Not a push model

Known issue

Truncate is not notified (HDFS-8742)

Fixed in 2.8.0

[Diagram: the client caches the highest event number it has seen, polls the NameNode for any events after #XX, and the NameNode returns the events after #XX.]

Hadoop 2.7

Page 27: HDFS 2015: Past, Present, and Future

Future

Page 28: HDFS 2015: Past, Present, and Future

Many features are being developed

2.8 (not released)

Support OAuth2 in WebHDFS

RPC Congestion control

Feature branches

Erasure Coding (HDFS-7285)

Ozone: Object store (HDFS-7240)

BlockManager Scalability Improvements (HDFS-7836)

HTTP/2 support for DataTransferProtocol (HDFS-7966)

Implement an async pure C++ HDFS client (HDFS-8707)

Page 29: HDFS 2015: Past, Present, and Future

RPC Congestion Control

Problem

NameNode RPC queue is FIFO

A DDoS-like request flood can take down the entire cluster

Solution

Fair scheduling for RPC queue (2.6.0)

Retriable exceptions with exponential backoff (2.8.0)

Enabled by default in 2.8

// Don't do this!
while (true) {
    dfs.exists("/data");
}

Hadoop 2.8

Page 30: HDFS 2015: Past, Present, and Future

Erasure Coding

Problem

Want to reduce storage costs

Blocks are replicated to 3 DataNodes

3x storage overhead is costly

Solution

Use erasure coding

             3-replication    (6,3) Reed-Solomon
Tolerates    2 failures       3 failures
Disk usage   3x               1.5x (9 blocks stored per 6 data blocks)

Page 31: HDFS 2015: Past, Present, and Future

Erasure Coding: Write files using (6,3)-Reed-Solomon

Write data to 9 DNs in parallel

[Diagram: the ECClient stripes incoming data into 6 data blocks plus 3 parity blocks and writes them to DN1 through DN9 in parallel.]

Page 32: HDFS 2015: Past, Present, and Future

Erasure Coding: Read files

Read data from 6 DNs in parallel

[Diagram: the ECClient reads the 6 data blocks from DN1 through DN6 in parallel.]

Page 33: HDFS 2015: Past, Present, and Future

Erasure Coding: Read files when DN fails

Read data from (arbitrary) 6 DNs in parallel

[Diagram: when a DataNode fails, the ECClient reads any 6 of the remaining blocks in parallel and reconstructs the data.]

Page 34: HDFS 2015: Past, Present, and Future

Erasure Coding: Current status

Suitable for cold data

No data locality

Very low cost/GB with archival storage

Now preparing for merge

Follow-on work

Intel ISA-L support for faster encoding

Support append/truncate/hflush/hsync

More erasure coding schemes

Pipeline error handling

Support contiguous layout (HDFS EC Phase 2)
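For reference, the erasure coding admin CLI as it eventually shipped in Hadoop 3.0 (not available in the 2.x releases covered here) looks roughly like this; the path is an example:

$ hdfs ec -listPolicies
$ hdfs ec -setPolicy -path /colddata -policy RS-6-3-1024k
$ hdfs ec -getPolicy -path /colddata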

Page 35: HDFS 2015: Past, Present, and Future

Summary

Many features are still in development

I cannot predict when each feature will be available

If you want a feature, please join in and contribute to it to make development faster

There are many ways to contribute

Creating/Testing/Reviewing patches

Reporting bugs

Writing documents

Discussing architecture design

https://wiki.apache.org/hadoop/HowToContribute

Page 36: HDFS 2015: Past, Present, and Future

Page 37: HDFS 2015: Past, Present, and Future

References

Apache Hadoop Docs: http://hadoop.apache.org/docs/current/

In-memory caching (HDFS-4949)

In-memory Caching in HDFS: Lower Latency, Same Great Taste: http://www.slideshare.net/Hadoop_Summit/inmemory-caching-in-hdfs-lower-latency-same-great-taste-33921794

Heterogeneous Storages (HDFS-5682)

Reduce Storage Costs by 5x Using The New HDFS Tiered Storage Feature: http://www.slideshare.net/Hadoop_Summit/reduce-storage-costs-by-5x-using-the-new-hdfs-tiered-storage-feature

Transparent Encryption (HDFS-6134)

Transparent Encryption in HDFS: http://www.slideshare.net/Hadoop_Summit/transparent-encryption-in-hdfs

INotify (HDFS-6634)

Keep Me in the Loop: Introducing HDFS Inotify: http://www.slideshare.net/Hadoop_Summit/keep-me-in-the-loop-inotify-in-hdfs

Page 38: HDFS 2015: Past, Present, and Future

References

RPC congestion control (HADOOP-9640, HADOOP-10597, HDFS-8820)

Improving HDFS Availability with Hadoop RPC Quality of Service: http://www.slideshare.net/MingMa4/hadoop-rpcqoshadoopsummit2015

Erasure Coding (HDFS-7285)

HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency: http://www.slideshare.net/Hadoop_Summit/hdfs-erasure-code-storage-same-reliability-at-better-storage-efficiency