2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0

© Hortonworks Inc. 2013. Confidential and Proprietary.
Hadoop 2.2.0: Hadoop grows up
Adam Muise


DESCRIPTION

Our Hadoop 2.2.0 Overview for the Toronto Hadoop User Group. Go THUG life.

TRANSCRIPT

Page 1:

Hadoop 2.2.0: Hadoop grows up

Adam Muise

Page 2:

Rob Ford says…

…turn off your #*@!#%!!! mobile phones!

Page 3:

YARN: Yet Another Resource Negotiator

Page 4:

A new abstraction layer

HADOOP 1.0 (single-use system: batch apps):
– HDFS (redundant, reliable storage)
– MapReduce (cluster resource management & data processing)

HADOOP 2.0 (multi-purpose platform: batch, interactive, online, streaming, …):
– HDFS2 (redundant, reliable storage)
– YARN (cluster resource management)
– MapReduce (data processing)
– Others (data processing)

Page 5:

Concepts

• Application
  – An application is a job submitted to the framework
  – Example: a MapReduce job
• Container
  – Basic unit of allocation
  – Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
  – e.g. container_0 = 2 GB, 1 CPU; container_1 = 1 GB, 6 CPUs
  – Replaces the fixed map/reduce slots
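The container model above can be sketched in a few lines. This is an illustrative model, not YARN code; the resource names and node capacities are assumptions:

```python
# Illustrative model of YARN containers (not Hadoop code): resources are
# requested per container, instead of occupying fixed map/reduce slots.

def fits(node_free, container):
    """A container fits on a node if every requested resource is available."""
    return all(node_free.get(r, 0) >= need for r, need in container.items())

def allocate(node_free, container):
    """Return the node's remaining capacity after placing the container."""
    if not fits(node_free, container):
        raise ValueError("insufficient resources")
    return {r: node_free.get(r, 0) - container.get(r, 0) for r in node_free}

node = {"memory_gb": 8, "vcores": 8}
container_0 = {"memory_gb": 2, "vcores": 1}   # container_0 = 2GB, 1 CPU
container_1 = {"memory_gb": 1, "vcores": 6}   # container_1 = 1GB, 6 CPU

node = allocate(node, container_0)
node = allocate(node, container_1)
print(node)  # {'memory_gb': 5, 'vcores': 1}
```

Because requests are arbitrary resource vectors rather than slots, a memory-heavy and a CPU-heavy container can share one node without wasting either resource.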

Page 6:

YARN Architecture

• ResourceManager
  – Global resource scheduler
  – Hierarchical queues
• NodeManager
  – Per-machine agent
  – Manages the life-cycle of containers
  – Container resource monitoring
• ApplicationMaster
  – Per-application
  – Manages application scheduling and task execution
  – e.g. the MapReduce ApplicationMaster

Page 7:

YARN Architecture - Walkthrough

(Diagram: Client2 submits an application to the ResourceManager/Scheduler. Each application gets its own ApplicationMaster: AM 1 runs Containers 1.1, 1.2 and 1.3, and AM 2 runs Containers 2.1 through 2.4, spread across a grid of NodeManagers.)

Page 8:

YARN as OS for the Data Lake

(Diagram: the same NodeManager grid hosting mixed workloads side by side under one ResourceManager/Scheduler: Batch MapReduce tasks (map 1.1, map 1.2, reduce 1.1), Interactive SQL vertices (vertex 1.1.1, 1.1.2, 1.2.1, 1.2.2), and Real-Time containers (nimbus0, nimbus1, nimbus2).)

Page 9:

Multi-Tenant YARN

ResourceManager / Scheduler queue hierarchy (capacities at each level sum to 100%):

root
  Adhoc 10%
    Dev 10%
    Reserved 20%
    Prod 70%
  DW 60%
    Prod 80%
    Dev 20%
      P0 70%
      P1 30%
  Mrkting 30%
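The percentages compose multiplicatively down the hierarchy: a leaf queue's absolute share of the cluster is the product of the capacity fractions along its path from root. A small sketch, illustrative only, using queue names as on the slide:

```python
# Illustrative: absolute capacity of a queue = product of capacity
# fractions along its path from root (how hierarchical queue capacities
# compose in the Capacity Scheduler). Queue names follow the slide.
tree = {
    "root.Adhoc": 0.10, "root.DW": 0.60, "root.Mrkting": 0.30,
    "root.Adhoc.Dev": 0.10, "root.Adhoc.Reserved": 0.20, "root.Adhoc.Prod": 0.70,
    "root.DW.Prod": 0.80, "root.DW.Dev": 0.20,
    "root.DW.Dev.P0": 0.70, "root.DW.Dev.P1": 0.30,
}

def absolute_capacity(path):
    """root.DW.Dev.P0 -> 0.60 * 0.20 * 0.70 of the whole cluster."""
    parts = path.split(".")
    frac = 1.0
    for i in range(2, len(parts) + 1):
        frac *= tree.get(".".join(parts[:i]), 1.0)
    return frac

print(round(absolute_capacity("root.DW.Prod"), 4))    # 0.48
print(round(absolute_capacity("root.DW.Dev.P0"), 4))  # 0.084
```

So the DW production queue is guaranteed 48% of the cluster, while a deep leaf like P0 still has a precise, predictable share.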

Page 10:

Multi-Tenancy with the New Capacity Scheduler

•  Queues
  – Hierarchical queues
  – Economics as queue capacity
•  SLAs
  – Preemption
•  Resource isolation
  – Linux: cgroups
  – MS Windows: Job Control
  – Roadmap: virtualization (Xen, KVM)
•  Administration
  – Queue ACLs
  – Run-time re-configuration for queues
  – Charge-back

Capacity Scheduler hierarchical queues (example):

root
  Adhoc 10%
    Dev 10%
    Reserved 20%
    Prod 70%
  DW 70%
    Prod 80%
    Dev 20%
      P0 70%
      P1 30%
  Mrkting 20%

Page 11:

MapReduce v2: Changes to MapReduce on YARN

Page 12:

MapReduce v2 is a library now…
•  MapReduce runs on YARN like all other Hadoop 2.x applications
  – Gone are the map and reduce slots; allocation is handled by YARN containers now
  – Gone is the JobTracker, replaced by the YARN ApplicationMaster library
•  Multiple versions of MapReduce
  – The older mapred APIs work without modification or recompilation
  – The newer mapreduce APIs may need to be recompiled
•  Still has one master server component: the Job History Server
  – The Job History Server stores the execution history of jobs
  – Used to audit prior execution of jobs
  – Will also be used by the YARN framework to store charge-backs at that level

Page 13:

Shuffle in MapReduce v2
•  Faster shuffle
  – Better embedded server: Netty
•  Encrypted shuffle
  – Secures the shuffle phase as data moves across the cluster
  – Requires two-way HTTPS with certificates on both sides
  – Incurs significant CPU overhead; reserve one core for this work
  – Certs are stored on each node (provisioned with the cluster) and refreshed every 10 seconds
•  Pluggable shuffle/sort
  – Shuffle is the first phase in MapReduce that is guaranteed not to be data-local
  – Pluggable shuffle/sort allows intrepid application or hardware developers to intercept the network-heavy workload and optimize it
  – Typical implementations pair hardware components, like fast networks, with software components, like sorting algorithms
  – The API will change with future versions of Hadoop
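The pluggable and encrypted shuffle are wired up through configuration. A sketch of the relevant mapred-site.xml properties, using the Hadoop 2.x property names from the pluggable-shuffle documentation (the class values shown are the stock defaults; per the caveat above, treat exact names as subject to change between versions):

```xml
<!-- mapred-site.xml: pluggable shuffle/sort (default implementations shown) -->
<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>org.apache.hadoop.mapreduce.task.reduce.Shuffle</value>
</property>
<property>
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.MapTask$MapOutputBuffer</value>
</property>

<!-- encrypted shuffle: HTTPS between map output servers and reducers -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>
```

Swapping either plugin class is how a custom (e.g. hardware-accelerated) shuffle or sort implementation gets dropped in without changing job code.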

Page 14:

Efficiency Gains of MRv2

• Key optimizations
  – No hard segmentation of resources into map and reduce slots
  – The YARN scheduler is more efficient
  – The MRv2 framework is more efficient than MRv1; the shuffle phase in MRv2 is more performant thanks to Netty
• Yahoo has over 30,000 nodes running YARN across over 365 PB of data.
• They calculate running about 400,000 jobs per day for about 10 million hours of compute time.
• They have also estimated a 60% to 150% improvement in node usage per day.
• Yahoo retired a whole colo (a 10,000-node datacenter) because of the increased utilization.

Page 15:

HDFS v2 in a Nutshell

Page 16:

HA

Page 17:

HDFS Snapshots: Feature Overview

•  Admins can create point-in-time snapshots of HDFS
  – Of the entire file system (from the root)
  – Of a specific data set (a sub-tree of the file system)
•  Restore the state of the entire file system or a data set to a snapshot (like Apple Time Machine)
  – Protects against user errors
•  Snapshot diffs identify changes made to a data set
  – Keep track of how raw or derived/analytical data changes over time
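Restores work because each snapshot is exposed read-only under a ".snapshot" directory inside the snapshotted directory, so recovering a file is just a copy from that path. A sketch; the paths, snapshot names, and the shell commands in the comments are illustrative:

```python
# Illustrative: a file's contents as of a named snapshot live under the
# directory's read-only ".snapshot" subtree.
#
# Typical admin flow (shell, for reference; requires a cluster):
#   hdfs dfsadmin -allowSnapshot /data/weblogs
#   hdfs dfs -createSnapshot /data/weblogs snap-2013-11-20
#   hdfs snapshotDiff /data/weblogs snap-2013-11-19 snap-2013-11-20

def snapshot_path(directory, snapshot, relative_file):
    """Path of a file as captured in a named snapshot of `directory`."""
    return f"{directory.rstrip('/')}/.snapshot/{snapshot}/{relative_file}"

print(snapshot_path("/data/weblogs", "snap-2013-11-20", "part-00000"))
# /data/weblogs/.snapshot/snap-2013-11-20/part-00000
```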

Page 18:

NFS Gateway: Feature Overview

•  NFS v3 standard
•  Supports all HDFS commands
  – List files
  – Copy, move files
  – Create and delete directories
•  Ingest for large-scale analytical workloads
  – Load immutable files as a source for analytical processing
  – No random writes
•  Stream files into HDFS
  – Log ingest by applications writing directly to an HDFS client mount

Page 19:

Federation

Page 20:

Managing Namespaces

Page 21:

Performance

Page 22:

Other Features

Page 23:

Apache Tez: A New Hadoop Data Processing Framework

Page 24:

Moving Hadoop Beyond MapReduce

•  Low-level data-processing execution engine
•  Built on YARN
•  Enables pipelining of jobs
•  Removes task and job launch times
•  Does not write intermediate output to HDFS
  – Much lighter disk and network usage
•  New base for MapReduce, Hive, Pig, Cascading, etc.
•  Hive and Pig jobs no longer need to move to the end of the queue between steps in the pipeline
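The no-intermediate-HDFS point can be made concrete with a toy model (illustrative only, not the Tez API): a pipeline of N dependent stages costs N HDFS materializations as chained MapReduce jobs, but only one as a single DAG.

```python
# Toy cost model (not Tez code): count HDFS writes for an N-stage pipeline.

def hdfs_writes_as_mr_chain(num_stages):
    """Each chained MR job writes its full result to HDFS; the next re-reads it."""
    return num_stages

def hdfs_writes_as_tez_dag(num_stages):
    """In one DAG, intermediate data flows edge-to-edge; only the final
    vertex lands its output in HDFS."""
    return 1 if num_stages else 0

stages = ["join(a,b)", "group by", "join(a,c)", "order by"]  # hypothetical plan
print(hdfs_writes_as_mr_chain(len(stages)))  # 4
print(hdfs_writes_as_tez_dag(len(stages)))   # 1
```

The saved writes (and the matching re-reads) are where the "much lighter disk and network usage" comes from.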

Page 25:

Apache Tez as the new Primitive

HADOOP 1.0 (MapReduce as base):
– HDFS (redundant, reliable storage)
– MapReduce (cluster resource management & data processing)
– Pig (data flow), Hive (SQL), others (Cascading)

HADOOP 2.0 (Apache Tez as base):
– HDFS2 (redundant, reliable storage)
– YARN (cluster resource management)
– Tez (execution engine), with Pig (data flow), Hive (SQL) and others (Cascading) on top
– Alongside: batch MapReduce, real-time stream processing (Storm), online data processing (HBase, Accumulo)

Page 26:

Hive-on-MR vs. Hive-on-Tez

SELECT a.x, AVERAGE(b.y) AS avg
FROM a JOIN b ON (a.id = b.id)
GROUP BY a
UNION
SELECT x, AVERAGE(y) AS AVG
FROM c GROUP BY x
ORDER BY AVG;

(Diagram: Hive-on-MR runs this as a chain of MapReduce jobs: JOIN(a, b), SELECT b.id, GROUP BY a.state with COUNT(*) and AVERAGE(c.price), JOIN(a, c), SELECT c.price, SELECT a.state, writing to HDFS between every job. Hive-on-Tez runs the same plan as a single DAG with no intermediate HDFS writes. Tez avoids unneeded writes to HDFS.)

Page 27:

Apache Tez (“Speed”)
• Replaces MapReduce as the primitive for Pig, Hive, Cascading, etc.
  – Lower latency for interactive queries
  – Higher throughput for batch queries
  – 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft
• A YARN ApplicationMaster runs a DAG of Tez tasks
• Each task has a pluggable Input, Processor and Output
• Tez task = <Input, Processor, Output>

Page 28:

Tez: Building blocks for scalable data processing

Classical ‘Map’:  HDFS Input → Map Processor → Sorted Output
Classical ‘Reduce’:  Shuffle Input → Reduce Processor → HDFS Output
Intermediate ‘Reduce’ (for Map-Reduce-Reduce):  Shuffle Input → Reduce Processor → Sorted Output

Page 29:

Hive

Page 30:

SQL: Enhancing SQL Semantics

Hive SQL Datatypes            Hive SQL Semantics
INT                           SELECT, INSERT
TINYINT/SMALLINT/BIGINT       GROUP BY, ORDER BY, SORT BY
BOOLEAN                       JOIN on explicit join key
FLOAT                         Inner, outer, cross and semi joins
DOUBLE                        Sub-queries in FROM clause
STRING                        ROLLUP and CUBE
TIMESTAMP                     UNION
BINARY                        Windowing functions (OVER, RANK, etc.)
DECIMAL                       Custom Java UDFs
ARRAY, MAP, STRUCT, UNION     Standard aggregation (SUM, AVG, etc.)
DATE                          Advanced UDFs (ngram, XPath, URL)
VARCHAR                       Sub-queries in WHERE, HAVING
CHAR                          Expanded JOIN syntax
                              SQL-compliant security (GRANT, etc.)
                              INSERT/UPDATE/DELETE (ACID)

(Legend on slide: items shaded as available in Hive 0.12 vs. roadmap.)

SQL Compliance: Hive 0.12 provides a wide array of SQL datatypes and semantics, so your existing tools integrate more seamlessly with Hadoop.

Page 31:

SPEED: Increasing Hive Performance

Performance improvements included in Hive 0.12:
–  Base and advanced query optimization
–  Startup time improvement
–  Join optimizations

Interactive query times across ALL use cases:
•  Simple and advanced queries in seconds
•  Integrates seamlessly with existing tools
•  Currently a >100x improvement in just nine months

Page 32:

Apache Tez as the new Primitive

(Repeat of the stack comparison from Page 25: HADOOP 1.0 with MapReduce as base vs. HADOOP 2.0 with YARN and Tez as base.)

Page 33:

Hive-on-MR vs. Hive-on-Tez

(Repeat of the comparison from Page 26: the same UNION query as chained MapReduce jobs with HDFS writes between each stage, vs. a single Tez DAG. Tez avoids unneeded writes to HDFS.)

Page 34:

Tez on YARN

(Diagram: the NodeManager grid again, with Batch MapReduce tasks (map 1.1, map 1.2, reduce 1.1), Hive/Tez (SQL) vertices (vertex 1.1.1, 1.1.2, 1.2.1, 1.2.2), and Real-Time containers (nimbus0, nimbus1, nimbus2) scheduled side by side by the ResourceManager/Scheduler.)

Page 35:

Apache Falcon: Data Lifecycle Management for Hadoop

Page 36:

Data Lifecycle on Hadoop is Challenging

Data Management Needs       Tools
Data Processing             Oozie
Replication                 Sqoop
Retention                   DistCp
Scheduling                  Flume
Reprocessing                Map/Reduce
Multi-Cluster Management    Hive and Pig jobs

Problem: a patchwork of tools complicates data lifecycle management.
Result: long development cycles and quality challenges.

Page 37:

Falcon: One-stop Shop for Data Lifecycle

Apache Falcon provides and orchestrates:

Data Management Needs       Tools
Data Processing             Oozie
Replication                 Sqoop
Retention                   DistCp
Scheduling                  Flume
Reprocessing                Map/Reduce
Multi-Cluster Management    Hive and Pig jobs

Falcon provides a single interface to orchestrate the data lifecycle. Sophisticated DLM is easily added to Hadoop applications.

Page 38:

Falcon Core Capabilities
•  Core functionality
  – Pipeline processing
  – Replication
  – Retention
  – Late data handling
•  Automates
  – Scheduling and retry
  – Recording audit, lineage and metrics
•  Operations and management
  – Monitoring, management, metering
  – Alerts and notifications
  – Multi-cluster federation
•  CLI and REST API

Page 39:

Falcon At A Glance

>  Falcon offers a high-level abstraction of key services for Hadoop data management needs.
>  Complex data processing logic is handled by Falcon instead of hard-coded in data processing apps.
>  Falcon enables faster development of ETL, reporting and other data processing apps on Hadoop.

(Diagram: data processing applications sit on top of the Falcon data management framework, which provides data import and replication, scheduling and coordination, data lifecycle policies, multi-cluster management, and SLA management.)

Page 40:

Falcon Example: Replication

>  Falcon manages workflow and replication.
>  Enables business continuity without requiring full data representation.
>  Failover clusters can be smaller than primary clusters.

(Diagram: the primary cluster's staged, cleansed, conformed and access data sets, with the staged and processed data replicated to a smaller failover cluster.)

Page 41:

Falcon Example: Retention

>  Sophisticated retention policies expressed in one place.
>  Simplify data retention for audit, compliance, or for data re-processing.

Staged Data: retain 20 years
Cleansed Data: retain 3 years
Conformed Data: retain 3 years
Access Data: retain last copy only
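A retention policy like the ones above boils down to an eviction rule over dataset instances. A sketch, illustrative only and not Falcon code:

```python
# Illustrative Falcon-style retention: given dataset instances (one
# timestamp each), decide which instances to evict.
from datetime import datetime, timedelta

def to_evict(instances, now, retain_days=None, last_copy_only=False):
    if last_copy_only:
        return sorted(instances)[:-1]            # keep only the newest copy
    cutoff = now - timedelta(days=retain_days)
    return [ts for ts in instances if ts < cutoff]

now = datetime(2013, 11, 20, 12)
instances = [now - timedelta(hours=h) for h in (0, 24, 48)]
print(to_evict(instances, now, retain_days=1))        # only the 48h-old instance
print(to_evict(instances, now, last_copy_only=True))  # everything but the newest
```

Expressing this once per feed, instead of inside every application, is the point of the slide.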

Page 42:

Falcon Example: Late Data Handling

>  Processing waits until all required input data is available.
>  Checks for late data arrivals and retriggers processing as necessary.
>  Eliminates writing complex data handling rules within applications.

(Diagram: online transaction data (via Sqoop) and web log data (via FTP) are staged and combined into one dataset; the pipeline waits up to 4 hours for the FTP data to arrive.)
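The wait-then-retrigger behavior can be sketched as follows; this is an illustrative model, not Falcon code, and the feed names are made up:

```python
# Illustrative Falcon-style late-data handling: run when every required
# input is present, or once the wait budget (e.g. 4 hours) is spent.

def ready_to_run(available, required, waited_hours, max_wait_hours=4):
    missing = [f for f in required if f not in available]
    if not missing:
        return True, []
    if waited_hours >= max_wait_hours:
        return True, missing   # run anyway; late arrivals retrigger reprocessing
    return False, missing

required = ("sqoop:transactions", "ftp:weblogs")
print(ready_to_run({"sqoop:transactions"}, required, waited_hours=2))
print(ready_to_run({"sqoop:transactions", "ftp:weblogs"}, required, waited_hours=2))
```

Centralizing this check is what lets applications drop their own ad-hoc "is the FTP drop here yet?" logic.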

Page 43:

Examples

Page 44:

Example: Cluster Specification

<?xml version="1.0"?>
<!-- My Local Cluster specification -->
<cluster colo="my-local-cluster" description="" name="cluster-alpha">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn:50070" version="2.2.0"/>
    <interface type="write" endpoint="hdfs://nn:8020" version="2.2.0"/>
    <interface type="execute" endpoint="rm:8050" version="2.2.0"/>
    <interface type="workflow" endpoint="http://os:11000/oozie/" version="4.0.0"/>
    <interface type="messaging" endpoint="tcp://mq:61616?daemon=true" version="5.1.6"/>
  </interfaces>
  <locations>
    <location name="staging" path="/apps/falcon/cluster-alpha/staging"/>
    <location name="temp" path="/tmp"/>
    <location name="working" path="/apps/falcon/cluster-alpha/working"/>
  </locations>
</cluster>

(Diagram: the readonly and write interfaces point at the NameNode, the execute interface at the Resource Manager, and the workflow interface at the Oozie server.)

Page 45:

Example: Weblogs Replication and Retention

Page 46:

Example 1: Weblogs
•  Weblogs land hourly in my primary cluster
•  The HDFS location is /weblogs/{date}
•  I want to:
  – Evict weblogs from the primary cluster after 1 day

Page 47:

Feed Specification 1: Weblogs

<feed description="" name="feed-weblogs1" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <!-- cluster where the data is located -->
    <cluster name="cluster-primary" type="source">
      <validity start="2013-10-24T00:00Z" end="2014-12-31T00:00Z"/>
      <!-- retention policy: 1 day -->
      <retention limit="days(1)" action="delete"/>
    </cluster>
  </clusters>

  <!-- location of the data -->
  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
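The limit strings used in these feeds, such as hours(1), days(1) and months(3), follow a simple unit(n) pattern. A sketch of parsing them; the 30-day month is this sketch's approximation, not Falcon's own rule:

```python
# Illustrative parser for Falcon-style frequency/retention limits like
# "hours(1)", "days(2)", "months(3)". Months are approximated as 30 days
# here purely for the sketch.
import re
from datetime import timedelta

UNITS_IN_MINUTES = {"minutes": 1, "hours": 60, "days": 24 * 60, "months": 30 * 24 * 60}

def parse_limit(text):
    m = re.fullmatch(r"(minutes|hours|days|months)\((\d+)\)", text.strip())
    if not m:
        raise ValueError(f"bad limit: {text!r}")
    unit, n = m.group(1), int(m.group(2))
    return timedelta(minutes=UNITS_IN_MINUTES[unit] * n)

print(parse_limit("days(1)"))    # 1 day, 0:00:00
print(parse_limit("months(3)"))  # 90 days under the 30-day approximation
```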

Page 48:

Example 2: Weblogs
•  Weblogs land hourly in my primary cluster
•  The HDFS location is /weblogs/{date}
•  I want to:
  – Replicate weblogs to my secondary cluster
  – Evict weblogs from the primary cluster after 2 days
  – Evict weblogs from the secondary cluster after 1 week

Page 49:

Feed Specification 2: Weblogs

<feed description="" name="feed-weblogs2" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <!-- cluster where the data is located; retention policy: 2 days -->
    <cluster name="cluster-primary" type="source">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <!-- cluster where the data will be replicated; retention policy: 1 week -->
    <cluster name="cluster-secondary" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
    </cluster>
  </clusters>

  <!-- location of the data -->
  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>

Page 50:

Example 3: Weblogs
•  Weblogs land hourly in my primary cluster
•  The HDFS location is /weblogs/{date}
•  I want to:
  – Replicate weblogs to a discovery cluster
  – Replicate weblogs to a BCP cluster
  – Evict weblogs from the primary cluster after 2 days
  – Evict weblogs from the discovery cluster after 1 week
  – Evict weblogs from the BCP cluster after 3 months

Page 51:

Feed Specification 3: Weblogs

<feed description="" name="feed-weblogs" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="cluster-discovery" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
      <!-- cluster-specific location -->
      <locations>
        <location type="data" path="/projects/recommendations/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
      </locations>
    </cluster>
    <cluster name="cluster-bcp" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="months(3)" action="delete"/>
      <!-- cluster-specific location -->
      <locations>
        <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
      </locations>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>

Page 52:

Apache Knox: Secure Access to Hadoop

Page 53:

Connecting to the Cluster: Edge Nodes

•  What is an edge node?
  – A node in a DMZ that has access to the cluster; often the only way to reach it
  – Hadoop client APIs and MR/Pig/Hive jobs are executed from these edge nodes
  – Users SSH to the edge node, upload all job artifacts, and then execute API calls and commands from the shell

(Diagram: Hadoop user → SSH → edge node → cluster.)

• Challenges
  – SSH, edge node, and job maintenance nightmare
  – Difficult to integrate with applications

Page 54:

Connecting to the Cluster: REST API

•  Useful for connecting to Hadoop from outside the cluster
•  When more client language flexibility is required
  –  i.e. a Java binding is not an option

• Challenges
  – The client must have knowledge of the cluster topology
  – Requires opening ports outside the cluster (in some cases, on every host)

Service    API
WebHDFS    Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat    Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Oozie      Job submission and management, and Oozie administration.
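WebHDFS calls are plain HTTP against the NameNode; the URL pattern /webhdfs/v1/<path>?op=<OP> is the WebHDFS REST convention, while the host, port and user below are illustrative:

```python
# Sketch of WebHDFS request URL construction (no cluster required to see
# the shape). 50070 is the usual Hadoop 2.x NameNode HTTP port.
from urllib.parse import urlencode

def webhdfs_url(host, path, op, port=50070, params=None):
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

print(webhdfs_url("nn", "/weblogs", "LISTSTATUS", params={"user.name": "hdfs"}))
# http://nn:50070/webhdfs/v1/weblogs?op=LISTSTATUS&user.name=hdfs
```

Note how the client needs the NameNode's host and port, which is exactly the topology-knowledge problem listed above and the gap Knox fills.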

Page 55:

Apache Knox Gateway – Perimeter Security

Simplified Access
•  Single Hadoop access point
•  Rationalized REST API hierarchy
•  Consolidated API calls
•  Multi-cluster support
•  Client DSL

Centralized Security
•  Eliminates the SSH “edge node”
•  LDAP and Active Directory authentication
•  Central API management and audit
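The single-access-point idea can be sketched as a path-based routing table: clients hit one gateway URL and the gateway maps each service onto the right host and port inside the cluster. The topology name, service table and ports here are illustrative, not Knox's actual configuration:

```python
# Illustrative Knox-style routing (not Knox code): one external entry
# point, internal topology hidden behind it.

SERVICES = {  # service name -> (internal host, port, base path); assumed values
    "webhdfs": ("nn", 50070, "/webhdfs"),
    "templeton": ("webhcat", 50111, "/templeton"),
    "oozie": ("oozie", 11000, "/oozie"),
}

def route(gateway_path):
    """Map /gateway/<topology>/<service>/<rest> to the internal cluster URL."""
    _, _, topology, service, rest = gateway_path.split("/", 4)
    host, port, base = SERVICES[service]
    return f"http://{host}:{port}{base}/{rest}"

print(route("/gateway/cluster1/webhdfs/v1/weblogs?op=LISTSTATUS"))
# http://nn:50070/webhdfs/v1/weblogs?op=LISTSTATUS
```

The client only ever learns the gateway address; host names, ports and cluster layout stay behind the firewall, and responses are rewritten so follow-up URLs also point at the gateway.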

Page 56:

Knox Gateway Network Architecture

(Diagram: REST, JDBC and Ambari clients and browsers connect through a firewall to a Knox Gateway cluster, a stateless cluster of reverse-proxy instances (GW) deployed in a DMZ. Requests are streamed through the gateway to Hadoop services after authentication, and URLs are rewritten to refer to the gateway. Identity providers (Kerberos/enterprise identity, enterprise/cloud SSO) and Ambari/Hue servers integrate alongside. Behind a second firewall sit one or more secure Hadoop clusters, each with masters (JT, NN, WebHCat, Oozie, YARN, HBase, Hive) and workers (DN, TT).)

Page 57:

Wot no 2.2.0? Where can I get the Hadoop 2.2.0 fix?

Page 58:

Like the Truth, Hadoop 2.2.0 is out there…

Component        HDP 2.0   CDH4         CDH5 Beta   Intel IDH 3.0   MapR 3   IBM BigInsights 2.1
Hadoop Common    2.2.0     2.0.0        2.2.0       2.0.4           N/A      1.1.1
Hive + HCatalog  0.12      0.10 + 0.5   0.11        0.10 + 0.5      0.11     0.9 + 0.4
Pig              0.12      0.11         0.11        0.10            0.11     0.10
Mahout           0.8       0.7          0.8         0.8             0.8      N/A
Flume            1.4.0     1.4.0        1.4.0       1.3.0           1.4.0    1.3.0
Oozie            4.0.0     3.3.2        4.0.0       3.3.0           3.3.2    3.2.0
Sqoop            1.4.4     1.4.3        1.4.4       1.4.3           1.4.4    1.4.2
HBase            0.96.0    0.94.6       0.95.2      0.94.7          0.94.9   0.94.3

Page 59:

Thank You THUG Life