Setting High Availability in Hadoop Cluster

Posted on 16-Apr-2017

www.edureka.co/hadoop-admin

What will you learn today?

Hadoop: A synonym for Big Data

Hadoop High Availability

Hands-On: Achieving NameNode and YARN high availability

Hands-On: Securing HDFS through ACL

Hadoop as a Data Warehouse

What is Hadoop?

Apache Hadoop is an open-source, scalable and reliable solution for storing large data sets and processing them in a distributed fashion across clusters of computers, using a simple programming model.

A closer look at Apache Hadoop

Apache Hadoop includes the following modules:

Hadoop Distributed File System (HDFS): A distributed file system

Hadoop Common: The common utilities that support the other Hadoop modules

Hadoop YARN: A framework for job scheduling and cluster resource management

Hadoop MapReduce: A YARN-based system for parallel processing of large data sets

High Availability

Maintaining High Availability

In distributed computing, failure is the norm, which means Hadoop's master services (the HDFS NameNode and the YARN ResourceManager) must provide an acceptable level of availability.

Diagram: the client gets block locations from a single NameNode, which handles both the namespace (NS) and block management, and then reads data directly from the DataNodes.

NameNode: no horizontal scaling. NameNode: no high availability.
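The read path in the diagram can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not HDFS code: `ToyNameNode`, `ToyDataNode` and `read_file` are hypothetical names, blocks are plain strings, and there is no networking. It shows why a single NameNode is a single point of failure: the data still sits on the DataNodes, but without the NameNode the client cannot find out where the blocks are.

```python
from dataclasses import dataclass, field

# Toy model only: hypothetical classes, not the HDFS client or server APIs.


@dataclass
class ToyDataNode:
    """Stores block contents keyed by block ID."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class ToyNameNode:
    """Holds only metadata: the namespace and the block locations."""
    namespace: dict = field(default_factory=dict)        # path -> [block IDs]
    block_locations: dict = field(default_factory=dict)  # block ID -> [DataNode names]
    alive: bool = True

    def get_block_locations(self, path):
        if not self.alive:
            raise ConnectionError("NameNode unavailable")
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


def read_file(path, namenode, datanodes):
    """Ask the NameNode where the blocks are, then read them from DataNodes."""
    data = []
    for block_id, replicas in namenode.get_block_locations(path):
        for dn_name in replicas:  # use the first replica that has the block
            if block_id in datanodes[dn_name].blocks:
                data.append(datanodes[dn_name].blocks[block_id])
                break
        else:
            raise IOError(f"no replica available for {block_id}")
    return "".join(data)


dns = {
    "dn1": ToyDataNode("dn1", {"blk_1": "hello "}),
    "dn2": ToyDataNode("dn2", {"blk_1": "hello ", "blk_2": "world"}),
}
nn = ToyNameNode(
    namespace={"/logs/a.txt": ["blk_1", "blk_2"]},
    block_locations={"blk_1": ["dn1", "dn2"], "blk_2": ["dn2"]},
)
print(read_file("/logs/a.txt", nn, dns))  # hello world

nn.alive = False  # the single NameNode fails
try:
    read_file("/logs/a.txt", nn, dns)
except ConnectionError as err:
    print("read failed:", err)  # the blocks are intact but cannot be located
```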

NameNode: Single Point of Failure

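Diagram: the Secondary NameNode receives metadata from the NameNode, which remains a single point of failure.

Speech bubble (Secondary NameNode to NameNode): "You give me metadata every hour, I will make it secure."

Secondary NameNode:

"Not a hot standby" for the NameNode

Connects to the NameNode every hour (the default checkpoint interval, which is configurable)

Housekeeping: backup of NameNode metadata

Saved metadata can be used to rebuild a failed NameNode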
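The housekeeping the Secondary NameNode performs is checkpointing: it fetches the NameNode's namespace image (fsimage) and edit log, merges the edits into a new image, and sends that image back so the edit log does not grow without bound. The hourly interval corresponds to the `dfs.namenode.checkpoint.period` setting (3600 seconds by default). The sketch below models the merge with plain Python data structures. `ToyNameNodeMetadata`, `secondary_checkpoint`, `rebuild_namenode` and the `(op, path)` edit format are hypothetical simplifications, not HDFS's on-disk formats.

```python
from dataclasses import dataclass, field

# Toy model only: the namespace is a set of paths and each edit is a
# hypothetical (op, path) tuple, not HDFS's fsimage / edits file format.


def apply_edit(image, edit):
    """Apply one namespace edit to an in-memory image (a set of paths)."""
    op, path = edit
    if op in ("mkdir", "create"):
        image.add(path)
    elif op == "delete":
        # Remove the path and everything beneath it.
        image.difference_update(
            {p for p in image if p == path or p.startswith(path + "/")})
    else:
        raise ValueError(f"unknown op {op!r}")
    return image


@dataclass
class ToyNameNodeMetadata:
    fsimage: frozenset = frozenset()           # last checkpointed namespace
    edits: list = field(default_factory=list)  # edits logged since then

    def log(self, op, path):
        self.edits.append((op, path))


def secondary_checkpoint(nn):
    """Merge the NameNode's edit log into a new fsimage."""
    image = set(nn.fsimage)        # 1. fetch the last image
    pending = list(nn.edits)       # 2. fetch the edits made since then
    for edit in pending:           # 3. replay them on a copy
        apply_edit(image, edit)
    nn.fsimage = frozenset(image)  # 4. send the merged image back
    del nn.edits[:len(pending)]    # 5. merged edits no longer need keeping
    return nn.fsimage


def rebuild_namenode(fsimage, surviving_edits):
    """Rebuild a failed NameNode's namespace from saved metadata."""
    image = set(fsimage)
    for edit in surviving_edits:
        apply_edit(image, edit)
    return image


meta = ToyNameNodeMetadata()
meta.log("mkdir", "/user")
meta.log("create", "/user/a.txt")
meta.log("create", "/tmp/x")
meta.log("delete", "/tmp")
print(sorted(secondary_checkpoint(meta)))  # ['/user', '/user/a.txt']
print(meta.edits)                          # []
```

Edits logged after the last checkpoint exist only in the NameNode's own edit log. This is why the Secondary NameNode is "not a hot standby": rebuilding from its copy alone can miss recent changes, and nothing takes over automatically.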

Hadoop 2.0 Cluster Architecture: High Availability

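Diagram: a Hadoop 2.0 cluster with HDFS (NameNode High Availability) and YARN (Next Generation MapReduce).

HDFS: an Active NameNode and a Standby NameNode share edit logs. All namespace edits are logged to shared NFS storage with a single writer (fencing). The Standby NameNode reads the edit logs and applies them to its own namespace. A DataNode runs on each worker node.

YARN: a Resource Manager and a Node Manager on each worker node. The Node Managers host Containers and App Masters.

A Client is also shown.

The diagram also labels a Secondary Name Node. In an HA cluster, however, the Standby NameNode performs the namespace checkpoints itself, and the Apache documentation states that running a Secondary NameNode in an HA cluster is unnecessary and an error.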

http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
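The "single writer (fencing)" rule on the shared edit log can be illustrated with epoch numbers. Each NameNode that becomes active is given a higher epoch, and the shared log rejects writes carrying an older one, so a previously active NameNode that comes back cannot corrupt the log. This is a toy model. `SharedEditLog`, `ToyHANameNode` and the epoch scheme are illustrative assumptions. An NFS-based setup like the one in the diagram relies on fencing methods configured through `dfs.ha.fencing.methods` (for example `sshfence`), whereas the Quorum Journal Manager uses a similar epoch idea.

```python
# Toy model of a shared edit log with a single writer enforced by epochs.
# Class names and the epoch scheme are illustrative, not HDFS APIs.


class FencedError(Exception):
    """Raised when a stale (fenced) writer tries to append."""


class SharedEditLog:
    def __init__(self):
        self.entries = []       # (epoch, edit) pairs, in write order
        self.current_epoch = 0  # highest epoch handed out so far

    def new_epoch(self):
        self.current_epoch += 1
        return self.current_epoch

    def append(self, epoch, edit):
        # Single writer: only the holder of the newest epoch may write.
        if epoch != self.current_epoch:
            raise FencedError(
                f"epoch {epoch} is fenced (current {self.current_epoch})")
        self.entries.append((epoch, edit))


class ToyHANameNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.epoch = None       # set when this node becomes active
        self.namespace = set()
        self.applied = 0        # log entries already applied locally

    def tail_edits(self):
        """Standby: read new edits and apply them to its own namespace."""
        for _, path in self.log.entries[self.applied:]:
            self.namespace.add(path)
        self.applied = len(self.log.entries)

    def become_active(self):
        self.tail_edits()                  # catch up before taking writes
        self.epoch = self.log.new_epoch()  # fences every older epoch

    def write(self, path):
        self.log.append(self.epoch, path)  # raises FencedError if stale
        self.namespace.add(path)
        self.applied = len(self.log.entries)


log = SharedEditLog()
nn1, nn2 = ToyHANameNode("nn1", log), ToyHANameNode("nn2", log)
nn1.become_active()
nn1.write("/a")
nn2.tail_edits()     # the standby keeps up with the active
nn2.become_active()  # failover: nn2 gets epoch 2
nn2.write("/b")
try:
    nn1.write("/c")  # the old active comes back and tries to write
except FencedError as err:
    print(err)       # epoch 1 is fenced (current 2)
print(sorted(nn2.namespace))  # ['/a', '/b']
```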

HDFS HIGH AVAILABILITY

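Diagram: HDFS high availability.

An Active NN and a Standby NN share NameNode state through shared storage, with a single writer (fencing).

Each NN has a Failover Controller (Active and Standby) that monitors the NN's health and exchanges heartbeats with a ZooKeeper ensemble (ZK ZK ZK).

DataNodes (DN 1, DN 2 … DN n) send block reports to both the Active and the Standby NN, but accept update commands (Cmds) from only one NN, the Active.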
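For the hands-on part, this topology is expressed in configuration files. The sketch below builds a minimal set of NameNode HA properties and renders them in Hadoop's `<configuration>`/`<property>` XML layout using only Python's standard library. The property names are those documented for HDFS HA in Hadoop 2.x. The nameservice ID `mycluster`, the NameNode IDs `nn1`/`nn2`, host names, the SSH key path and the Hadoop 2.x default ports are placeholders, and `hdfs_ha_properties`/`to_hadoop_xml` are hypothetical helpers. The result is not a complete production configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: builds HDFS HA properties for a two-NameNode cluster.
# Host names, ports, the SSH key path and the nameservice ID are placeholders.


def hdfs_ha_properties(nameservice, namenodes, journalnodes, zk_quorum,
                       ssh_key="/home/hdfs/.ssh/id_rsa"):
    """Return (hdfs_site, core_site) property dicts for NameNode HA.

    namenodes: mapping of NameNode ID -> host name.
    journalnodes, zk_quorum: lists of "host:port" strings.
    """
    hdfs_site = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Shared edits on Journal Nodes (an NFS setup uses a file:// path).
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journalnodes) + "/" + nameservice,
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
        # Fencing preserves the single-writer guarantee during failover.
        "dfs.ha.fencing.methods": "sshfence",
        "dfs.ha.fencing.ssh.private-key-files": ssh_key,
        # Let the ZKFCs trigger failover automatically.
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, host in namenodes.items():
        hdfs_site[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
        hdfs_site[f"dfs.namenode.http-address.{nameservice}.{nn_id}"] = f"{host}:50070"
    core_site = {
        "fs.defaultFS": f"hdfs://{nameservice}",
        "ha.zookeeper.quorum": ",".join(zk_quorum),
    }
    return hdfs_site, core_site


def to_hadoop_xml(props):
    """Render a property dict in Hadoop's <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


hdfs_site, core_site = hdfs_ha_properties(
    "mycluster",
    {"nn1": "master1.example.com", "nn2": "master2.example.com"},
    ["jn1.example.com:8485", "jn2.example.com:8485", "jn3.example.com:8485"],
    ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"],
)
print(to_hadoop_xml(hdfs_site))
```

After the files are distributed, the HA guide initializes the failover state in ZooKeeper with `hdfs zkfc -formatZK`, and `hdfs haadmin -getServiceState nn1` reports whether `nn1` is currently active or standby.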

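Diagram: YARN ResourceManager high availability. An Active and a Passive (standby) Resource Manager, each with a ZKFC, keep RM state in ZooKeeper (ZooKeeperRMState).

1. The Active node stores all state in the ZooKeeper store (ZKStore).

2. The Active node fails.

3. ZKFC detects the failure.

4. Failover: the Standby node becomes active.

(In YARN, this failover logic is embedded in each Resource Manager as a ZooKeeper-based ActiveStandbyElector, so no separate ZKFC daemon runs as it does for HDFS.)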
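The ZooKeeper-backed state store and automatic failover in this diagram are enabled through yarn-site.xml. The sketch below builds those properties with Python's standard library. The property names follow the Hadoop 2.x ResourceManager HA documentation. The cluster ID `yarn-cluster`, the RM IDs `rm1`/`rm2` and the host names are placeholders, and `yarn_rm_ha_properties` and `to_hadoop_xml` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper for yarn-site.xml; IDs and host names are placeholders.


def yarn_rm_ha_properties(cluster_id, rm_hosts, zk_quorum):
    """Return yarn-site.xml properties for ResourceManager HA.

    rm_hosts: mapping of RM ID -> host name.
    zk_quorum: list of "host:port" strings for the ZooKeeper ensemble.
    """
    props = {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.cluster-id": cluster_id,
        "yarn.resourcemanager.ha.rm-ids": ",".join(rm_hosts),
        # Step 1 of the diagram: the active RM keeps its state in ZooKeeper.
        "yarn.resourcemanager.recovery.enabled": "true",
        "yarn.resourcemanager.store.class":
            "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
        # Hadoop 2.x property name; Hadoop 3.x uses hadoop.zk.address instead.
        "yarn.resourcemanager.zk-address": ",".join(zk_quorum),
    }
    for rm_id, host in rm_hosts.items():
        props[f"yarn.resourcemanager.hostname.{rm_id}"] = host
    return props


def to_hadoop_xml(props):
    """Render a property dict in Hadoop's <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


yarn_site = yarn_rm_ha_properties(
    "yarn-cluster",
    {"rm1": "master1.example.com", "rm2": "master2.example.com"},
    ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"],
)
print(to_hadoop_xml(yarn_site))
```

Once both ResourceManagers are running, `yarn rmadmin -getServiceState rm1` reports which of them is active.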

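Diagram: HDFS high availability using Journal Nodes and ZooKeeper.

Zookeeper Service: three ZooKeeper nodes. Shared Edits: three Journal Nodes.

The Active NameNode writes to the Shared Edits; the Standby NameNode reads from them.

Each NameNode has a ZookeeperFC (ZKFC) that monitors the liveness and health of its NameNode. The Active NameNode's ZKFC monitors and maintains the active lock in ZooKeeper; the Standby NameNode's ZKFC monitors the lock and tries to take it.

The DataNodes (three shown) serve both NameNodes.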
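With Journal Nodes, an edit counts as written only when a majority of them acknowledge it, so 2f + 1 Journal Nodes tolerate f failures (three nodes tolerate one). The ZKFCs, meanwhile, compete for a single lock in ZooKeeper, and the NameNode whose ZKFC holds it is the active one. The toy model below shows both ideas. `quorum_write`, `ActiveLock` and the dict-based Journal Nodes are hypothetical simplifications, not the QJM or ZooKeeper client APIs. In ZooKeeper the lock is an ephemeral znode that disappears when the holder's session expires, which is what `session_expired` imitates.

```python
# Toy model only: hypothetical stand-ins for Journal Nodes and the ZKFC lock.


def quorum_write(journal_nodes, edit):
    """Send an edit to every Journal Node; succeed only if a majority acks."""
    acks = 0
    for jn in journal_nodes:
        if jn["up"]:
            jn["log"].append(edit)
            acks += 1
    majority = len(journal_nodes) // 2 + 1
    if acks < majority:
        # In this toy, a failed write may remain on a minority of nodes;
        # real QJM reconciles such segments before a new writer proceeds.
        raise RuntimeError(f"only {acks}/{len(journal_nodes)} acks, need {majority}")
    return acks


class ActiveLock:
    """Stand-in for the ephemeral ZooKeeper lock the ZKFCs compete for."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, zkfc_name):
        if self.holder is None:
            self.holder = zkfc_name
        return self.holder == zkfc_name

    def session_expired(self, zkfc_name):
        # An ephemeral node vanishes when its owner's session ends.
        if self.holder == zkfc_name:
            self.holder = None


jns = [{"up": True, "log": []} for _ in range(3)]
print(quorum_write(jns, "mkdir /a"))  # 3
jns[2]["up"] = False                  # one Journal Node fails
print(quorum_write(jns, "mkdir /b"))  # 2, still a majority of 3

lock = ActiveLock()
print(lock.try_acquire("zkfc-nn1"), lock.try_acquire("zkfc-nn2"))  # True False
lock.session_expired("zkfc-nn1")      # the active side loses its session
print(lock.try_acquire("zkfc-nn2"))   # True: the standby takes the lock
```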

Hands-On: Achieving HDFS and YARN High Availability

Hands-On: Securing HDFS through ACL

What to do with Big Data?

Hadoop: The Perfect Data Warehouse

Diagram: Hadoop as a data warehouse, in layers:

Data sources: Logs, Transactions, Sensors, Free Text, Images/Videos

Storage: HDFS Files

Metadata: HCatalog

Query Engines: Hive (SQL), Impala (SQL), Others …

BI Tools: Tableau, Cognos, QlikView, Pentaho

What is a data warehouse good at?

Among others, a data warehouse is the foundation for a successful business intelligence program

The Data Warehouse Institute

www.tdwi.org

Thank You …

Questions/Queries/Feedback

Recording and presentation will be made available to you within 24 hours
