Managing Enterprise Hadoop Clusters with Apache Ambari

Jayush Luniya @ Hortonworks, Apache Ambari PMC
May 2016
© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Upload: jayush-luniya

Post on 13-Apr-2017


Page 1: Managing Enterprise Hadoop Clusters with Apache Ambari


Page 2: Managing Enterprise Hadoop Clusters with Apache Ambari

Agenda

• Ambari Overview
• Ambari Features
• Demo
• Q&A

Page 3: Managing Enterprise Hadoop Clusters with Apache Ambari

What’s Apache Ambari?

A 100% open-source platform for simplifying Hadoop cluster management and use.

Highly extensible.

Page 4: Managing Enterprise Hadoop Clusters with Apache Ambari

It’s a wild zoo out there! Gotta manage this efficiently.

Page 5: Managing Enterprise Hadoop Clusters with Apache Ambari

Apache Ambari Themes

Operate Hadoop at Scale
• Deliver the core operational capabilities to provision, manage and monitor Hadoop clusters at scale.

Integrate with the Enterprise
• Robust API for integration with existing enterprise systems, such as Microsoft SCOM and Teradata Viewpoint.

Extend for the Ecosystem
• Provide an extensible platform for Customers, Partners and the Community (Stacks, Views).

Page 6: Managing Enterprise Hadoop Clusters with Apache Ambari


Apache Ambari

Page 7: Managing Enterprise Hadoop Clusters with Apache Ambari


Open Source Activity

Page 8: Managing Enterprise Hadoop Clusters with Apache Ambari


Inception: AMBARI-1 (Sept, 2011)

Page 9: Managing Enterprise Hadoop Clusters with Apache Ambari

Fast forward 5 years to today…

• Latest JIRA: AMBARI-16131
• 150+ Contributors
• 60+ Committers
• 16131 JIRAs filed
• 14254 JIRAs fixed

At 1.5 days per JIRA, that is roughly 90 person-years!

Used by hundreds of companies
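The person-years figure can be reproduced with a short calculation. This is only a sketch: the slide does not say whether it counts filed or fixed JIRAs, nor how many working days make up a person-year, so the choice of fixed JIRAs and the value of 240 working days are assumptions.

```python
JIRAS_FIXED = 14254            # from the slide
DAYS_PER_JIRA = 1.5            # from the slide
WORKING_DAYS_PER_YEAR = 240    # assumption, not stated on the slide


def person_years(jiras, days_per_jira, working_days_per_year):
    """Convert a JIRA count into an effort estimate in person-years."""
    return jiras * days_per_jira / working_days_per_year


estimate = person_years(JIRAS_FIXED, DAYS_PER_JIRA, WORKING_DAYS_PER_YEAR)
print(f"about {estimate:.0f} person-years")
```

With these assumptions the result lands close to the slide's "~90"; a different working-day count shifts it by a few years either way.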

Page 10: Managing Enterprise Hadoop Clusters with Apache Ambari

Ambari – 3rd Biggest Project* @ Apache

* Based on total JIRAs filed on a project basis as of April 26, 2016

#2: Hadoop at ~32k, as it is split across multiple JIRA projects

[Chart: Apache projects ranked #1 to #5 by JIRAs filed]

Page 11: Managing Enterprise Hadoop Clusters with Apache Ambari

Timeline

Release | Date | JIRAs
Ambari 1.5.* | Apr 2014 | 1218
Ambari 1.6.* | May 2014 | 908
Ambari 1.7.* | Dec 2014 | 1620
Ambari 2.0.* | April 2015 | 1804
Ambari 2.1.* | July 2015 | 2674
Ambari 2.2.* | Dec 2015 | 1542

Current GA Version (2.2.2)

Resolution of 9k+ JIRAs

Features along the timeline, in slide order:
• Ambari Stacks
• Ambari Blueprints
• Ambari Views
• Alerts Framework
• Metrics System
• Rolling Upgrade
• Kerberos Automation
• Enhanced Dashboards
• Smart Configs
• Express Upgrade
• AMS Grafana

Page 12: Managing Enterprise Hadoop Clusters with Apache Ambari

Agenda

• Ambari Overview
• Ambari Features
• Demo
• Q&A

Page 13: Managing Enterprise Hadoop Clusters with Apache Ambari

Extensibility Features

Stacks
• To add new Services (ISV or otherwise) beyond the HDP stack
• To customize a Stack for customer-specific environments

Blueprints
• To use Ambari for automating cluster installations
• To share best practices on layout and cluster configuration

Views
• To extend and customize the Ambari Web UI
• Add new capabilities, customize existing capabilities

Page 14: Managing Enterprise Hadoop Clusters with Apache Ambari


Anatomy of Ambari Extension Points

Page 15: Managing Enterprise Hadoop Clusters with Apache Ambari


Ambari Stacks

Page 16: Managing Enterprise Hadoop Clusters with Apache Ambari

Stack Terminology

Term | Definition | Examples
STACK | Defines a set of Services, where to obtain the software packages and how to manage the lifecycle. | HDP-2.3, HDP-2.2
SERVICE | Defines the Components that make up the service. | HDFS, NAGIOS, YARN
COMPONENT | The building blocks of a Service, which adhere to a certain lifecycle. | NAMENODE, DATANODE, OOZIE_SERVER
CATEGORY | The category of Component. | MASTER, SLAVE, CLIENT
REPO | Repository metadata where the artifacts reside. | http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.3.0.0

Page 17: Managing Enterprise Hadoop Clusters with Apache Ambari

Ambari Stack

• Stacks define Services + Repo
  – What is a stack, and where to get the bits
• Each service has a definition
  – What components are part of the Service
• Each service has defined lifecycle commands
  – start, stop, status, install, configure
• Lifecycle is controlled via command scripts
• Ability to define “custom” commands

[Diagram: Ambari Server holding the Stack (Service Definitions in xml, Command Scripts in python), Ambari Agents, Repos]
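To make the lifecycle idea concrete, here is a minimal sketch of a command script that dispatches lifecycle and "custom" commands by name. It deliberately avoids Ambari's own Python library: `LifecycleScript`, `ExampleServer`, the `rebalance` command and the status strings are all hypothetical names used for illustration, not Ambari's API.

```python
class LifecycleScript:
    """Hypothetical stand-in for a command script base class (not Ambari's API)."""

    LIFECYCLE = ("install", "configure", "start", "stop", "status")
    CUSTOM_COMMANDS = ()

    def execute(self, command):
        # Only lifecycle commands and declared custom commands may be dispatched.
        if command not in self.LIFECYCLE + self.CUSTOM_COMMANDS:
            raise ValueError(f"unsupported command: {command}")
        return getattr(self, command)()


class ExampleServer(LifecycleScript):
    """Hypothetical component with one custom command."""

    CUSTOM_COMMANDS = ("rebalance",)

    def __init__(self):
        self.running = False
        self.log = []

    def install(self):
        self.log.append("install packages from the stack repo")

    def configure(self):
        self.log.append("write configuration files")

    def start(self):
        self.configure()          # configure before every start
        self.running = True
        self.log.append("start daemon")

    def stop(self):
        self.running = False
        self.log.append("stop daemon")

    def status(self):
        return "running" if self.running else "stopped"

    def rebalance(self):
        self.log.append("custom command: rebalance")


server = ExampleServer()
for command in ("install", "start", "rebalance"):
    server.execute(command)
print(server.execute("status"), server.log)
```

Keeping an explicit list of allowed commands means an agent can reject anything the service definition did not declare, rather than calling arbitrary methods.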

Page 18: Managing Enterprise Hadoop Clusters with Apache Ambari

Stacks Support Inheritance

HDP 2.1 Stack (inherits from the HDP 2.0 Stack)
• Overrides any Service definitions, commands and configurations
• Adds new Services specific to this Stack

HDP 2.0 Stack
• Defines a set of Service definitions
• Default service configurations and command scripts
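The override-and-add behaviour can be sketched as a dictionary merge. This is an illustration only: the data layout, the property values and the script file names below are hypothetical, and Ambari's real stack definitions are XML and Python files rather than Python dictionaries.

```python
import copy


def resolve_stack(parent, child):
    """Return the effective service definitions of a child stack.

    Services and sections missing from the child are inherited from the
    parent; anything the child defines overrides or extends the parent.
    """
    effective = copy.deepcopy(parent)
    for service, overrides in child.items():
        merged = effective.setdefault(service, {})
        for section, values in overrides.items():
            merged.setdefault(section, {}).update(values)
    return effective


# Hypothetical parent stack: default configurations and command scripts.
hdp_2_0 = {
    "HDFS": {
        "configs": {"dfs.replication": "3"},
        "commands": {"start": "hdfs_start.py"},
    },
}

# Hypothetical child stack: overrides one config and adds a new service.
hdp_2_1 = {
    "HDFS": {"configs": {"dfs.replication": "2"}},
    "TEZ": {"commands": {"install": "tez_client.py"}},
}

effective = resolve_stack(hdp_2_0, hdp_2_1)
print(sorted(effective))
```

The parent is deep-copied first so that resolving a child never mutates the definitions other stacks may still inherit from.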

Page 19: Managing Enterprise Hadoop Clusters with Apache Ambari


Blueprints

Page 20: Managing Enterprise Hadoop Clusters with Apache Ambari

Automated Cluster Deployment

• Deploy clusters of any scale with ease
• Two REST API calls are all it takes to provision a cluster

Who uses it?
• HDInsight (Microsoft Azure)
• Hortonworks QA

Page 21: Managing Enterprise Hadoop Clusters with Apache Ambari

Example: Create a 100-node Cluster

1. POST /api/v1/blueprints/my-blueprint

{
  "configurations" : [
    { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" } }
  ],
  "host_groups" : [
    {
      "name" : "master-host",
      "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" }, … ],
      "cardinality" : "1"
    },
    {
      "name" : "worker-host",
      "components" : [ { "name" : "DATANODE" }, { "name" : "NODEMANAGER" }, … ],
      "cardinality" : "1+"
    }
  ],
  "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.0" }
}

2. POST /api/v1/clusters/my-cluster

{
  "blueprint" : "my-blueprint",
  "host_groups" : [
    {
      "name" : "master-host",
      "hosts" : [ { "fqdn" : "master001.ambari.apache.org" } ]
    },
    {
      "name" : "worker-host",
      "hosts" : [
        { "fqdn" : "worker001.ambari.apache.org" },
        { "fqdn" : "worker002.ambari.apache.org" },
        …
        { "fqdn" : "worker099.ambari.apache.org" }
      ]
    }
  ]
}
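Writing out 99 worker entries by hand is tedious, so the second payload is a natural candidate for generation. The sketch below builds the host mapping for one master and 99 workers and prepares the second POST without sending it. The server address and the admin/admin credentials are placeholders, and the function names are hypothetical; the `X-Requested-By` header is included on the understanding that Ambari expects it on modifying requests.

```python
import base64
import json
import urllib.request

AMBARI = "http://ambari.example.com:8080"  # assumption: placeholder server address


def cluster_template(blueprint_name, masters, workers):
    """Build the host-group mapping sent in the second call."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": "master-host", "hosts": [{"fqdn": h} for h in masters]},
            {"name": "worker-host", "hosts": [{"fqdn": h} for h in workers]},
        ],
    }


def post_request(path, payload, user="admin", password="admin"):
    """Prepare (but do not send) a JSON POST with basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        AMBARI + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "X-Requested-By": "ambari",
            "Content-Type": "application/json",
        },
    )


# worker001 ... worker099, matching the host names on the slide
workers = [f"worker{i:03d}.ambari.apache.org" for i in range(1, 100)]
template = cluster_template("my-blueprint", ["master001.ambari.apache.org"], workers)
create_cluster = post_request("/api/v1/clusters/my-cluster", template)
# urllib.request.urlopen(create_cluster) would send it; omitted so the sketch runs offline.
```

The blueprint itself (the first call) can be posted the same way with `post_request("/api/v1/blueprints/my-blueprint", blueprint)`, where `blueprint` is the full first document; its component lists are elided on the slide, so it is not reproduced here.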

Page 22: Managing Enterprise Hadoop Clusters with Apache Ambari

Cluster Replication

• Export blueprint from an existing cluster
• Import blueprint to replicate the cluster

GET /api/v1/clusters/my-cluster?format=blueprint

{
  "configurations" : [
    {
      "cluster-env" : { "user_group" : "hadoop" },
      "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" }
    }
  ],
  "host_groups" : [
    {
      "name" : "master-host",
      "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" }, … ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.0" }
}

Page 23: Managing Enterprise Hadoop Clusters with Apache Ambari

Blueprint Features

Ambari 2.0:
• High availability (HA) cluster deployments
• Adding hosts using blueprints (AMBARI-8458)

Ambari 2.1:
• Advanced cluster creation options (AMBARI-10750)

Ambari 2.2:
• Kerberized cluster deployments (AMBARI-13431)
• Stack advisor recommendations (AMBARI-13487)

Page 24: Managing Enterprise Hadoop Clusters with Apache Ambari


Stack Upgrades

Page 25: Managing Enterprise Hadoop Clusters with Apache Ambari

Stack Upgrades

• Rolling vs Express Upgrade modes
• Side-by-Side Bits and Configs

Bits:
/usr/hdp/2.2.0.0-2041
/usr/hdp/2.2.4.2-2
/usr/hdp/2.3.0.0-3000

Configs:
/etc/hive/conf/ (initial)
/etc/hive/conf/v0 (HDP 2.2.4.2)
/etc/hive/conf/v1 (HDP 2.3)

2.2.0.0 to 2.2.4.2: minor jump
2.2.4.2 to 2.3.0.0: major jump
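The minor/major distinction can be read off the side-by-side install directories. The sketch below assumes, based only on the slide's two examples, that a jump is "minor" when the first two version components match and "major" otherwise; the function names are hypothetical.

```python
def parse_version(path_or_version):
    """'/usr/hdp/2.2.4.2-2' or '2.2.4.2' -> (2, 2, 4, 2); the build suffix is dropped."""
    version = path_or_version.rstrip("/").split("/")[-1].split("-")[0]
    return tuple(int(part) for part in version.split("."))


def jump_kind(source, target):
    """Classify an upgrade between two versions (assumed rule, see lead-in)."""
    src, dst = parse_version(source), parse_version(target)
    if dst <= src:
        raise ValueError("target must be newer than source")
    return "minor jump" if src[:2] == dst[:2] else "major jump"


installed = ["/usr/hdp/2.2.0.0-2041", "/usr/hdp/2.2.4.2-2", "/usr/hdp/2.3.0.0-3000"]
for older, newer in zip(installed, installed[1:]):
    print(older, "to", newer, ":", jump_kind(older, newer))
```

Comparing tuples of integers rather than strings avoids ordering mistakes such as treating "2.10" as older than "2.9".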

Page 26: Managing Enterprise Hadoop Clusters with Apache Ambari

Express vs Rolling Upgrade

Rolling Upgrade
• Services are up the entire time
• Upgrade one component at a time
• Robust and fault-tolerant
• Service checks performed frequently during the upgrade

Express Upgrade
• All services are brought down, upgraded and restarted
• Faster upgrade mode
• Planned service downtime
• Service checks performed relatively less frequently during the upgrade

Page 27: Managing Enterprise Hadoop Clusters with Apache Ambari

Stack Upgrade – Install Version

• Install new version in parallel on all agents
• No downtime

Page 28: Managing Enterprise Hadoop Clusters with Apache Ambari


Stack Upgrade – Orchestration

Not necessarily “one-click” but fully guided

Page 29: Managing Enterprise Hadoop Clusters with Apache Ambari

Stack Upgrade – Upgrade Catalog

• Upgrades are driven by upgrade catalogs defined in stack definitions
• Defines upgrade groups and upgrade order
• Provides ability to modify configurations
  – Set, move, delete, transform
• Upgrade steps can be marked as skippable and retryable
• Supports executing custom scripts during upgrade
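As an illustration of the four configuration operations, the sketch below applies set, move, delete and transform steps to a flat dictionary of properties. The operation format, the function name and the property names are hypothetical; Ambari's actual upgrade catalogs are defined in the stack definitions, not in this form.

```python
def apply_config_ops(config, ops):
    """Apply set/move/delete/transform operations to a copy of a config dict."""
    result = dict(config)
    for op in ops:
        kind, key = op["op"], op["key"]
        if kind == "set":
            result[key] = op["value"]
        elif kind == "move":
            if key in result:
                result[op["to"]] = result.pop(key)   # rename, keeping the value
        elif kind == "delete":
            result.pop(key, None)
        elif kind == "transform":
            if key in result:
                result[key] = op["fn"](result[key])  # rewrite the value in place
        else:
            raise ValueError(f"unknown operation: {kind}")
    return result


# Hypothetical properties before the upgrade.
old = {
    "example.heap.mb": "1024",
    "example.legacy.flag": "true",
    "example.log.dir": "/var/log/example/",
}
ops = [
    {"op": "set", "key": "example.engine", "value": "v2"},
    {"op": "move", "key": "example.heap.mb", "to": "example.server.heap.mb"},
    {"op": "delete", "key": "example.legacy.flag"},
    {"op": "transform", "key": "example.log.dir", "fn": lambda v: v.rstrip("/")},
]
new = apply_config_ops(old, ops)
print(new)
```

Working on a copy keeps the pre-upgrade configuration intact, which matters if a downgrade needs to restore it.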

Page 30: Managing Enterprise Hadoop Clusters with Apache Ambari


Stack Upgrade – Upgrade Catalog

Page 31: Managing Enterprise Hadoop Clusters with Apache Ambari

Stack Downgrade

• Can trigger downgrade at any stage of the stack upgrade
• Cannot downgrade once stack upgrade has been finalized

Page 32: Managing Enterprise Hadoop Clusters with Apache Ambari


Smart Configurations

Page 33: Managing Enterprise Hadoop Clusters with Apache Ambari

Hadoop Configuration Challenges

• Too many configurations
  – Which ones are important?
• Too easy to mess up
  – What are valid/reasonable values?
  – What are the units?
  – Ok, what about dependencies?
• Gets harder with combinations of services, host assignments, enabled features, CPU/RAM/disks, etc.
  – Any recommendations? What am I doing wrong?

Smart Configurations

Page 34: Managing Enterprise Hadoop Clusters with Apache Ambari

Ambari Smart Configs UI

Customizable layout
- Tabs
- Sections
- Sub-sections
- Simple grid layout
(Advanced Tab contains remaining configurations)

New Widgets
- Sliders
  - Recommended
  - Minimum
  - Maximum
  - Increment Step
- Combos
  - Enumerated values
- Toggles
  - Binary options
- Spinners
  - Splits value into multiple controls. Time in milliseconds split into days, hours, minutes.
- Lists
  - Enumerated values
  - Single select
  - Multi select

Implemented
- HDFS
- YARN
- MapReduce
- Hive
- HBase
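The spinner idea, one stored millisecond value shown as separate day, hour and minute controls, comes down to integer division. A minimal sketch follows; the function names are hypothetical, and dropping any sub-minute remainder is an assumption made here because the slide lists only days, hours and minutes.

```python
MS_PER_MINUTE = 60_000


def split_millis(total_ms):
    """Split a duration in milliseconds into (days, hours, minutes)."""
    minutes_total = total_ms // MS_PER_MINUTE      # sub-minute remainder is dropped
    days, remainder = divmod(minutes_total, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


def join_millis(days, hours, minutes):
    """Combine the three spinner values back into milliseconds."""
    return ((days * 24 + hours) * 60 + minutes) * MS_PER_MINUTE


print(split_millis(93_600_000))   # 26 hours
```

Storing a single number and splitting only for display keeps the underlying configuration property unchanged.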

Page 35: Managing Enterprise Hadoop Clusters with Apache Ambari

Stack Driven Layouts

Stack has theme.json file

• Layout
  – Tabs
  – Sections
  – Sub-sections
• Placement
  – Configs placement in sub-sections
• Widgets
  – Widget type
  – Optional Units
    Bytes (B, KB, MB, GB, TB, PB)
    Time (Millis, Seconds, Minutes, Hours, Days, Months, Years)

{
  "name": "default",
  "description": "Default theme for HBASE service",
  "configuration": {
    "layouts": [
      {
        "name": "default",
        "tabs": [
          {
            "name": "settings",
            "display-name": "Settings",
            "layout": { "tab-columns": "3", "tab-rows": "3", "sections": [ ... ] }
          }
        ]
      }
    ],
    "placement": { "configuration-layout": "default", "configs": [...] },
    "widgets": [
      {
        "config": "hbase-env/hbase_master_heapsize",
        "widget": { "type": "slider", "units": [ { "unit-name": "GB" } ] }
      },
      ...
    ]
  }
}

Page 36: Managing Enterprise Hadoop Clusters with Apache Ambari

Config Metadata and Dependencies

Extended Metadata
• Defined in property_value_attributes
• Hold non-UI metadata about value range, increment, unit, etc.

Dependencies
• Models bi-directional relationship between configs
• Depends On (property_depends_on)
  – Answers “which configs do I depend on?”
• Depended By (dependencies)
  – Answers “which configs are dependent on me?”
• Ambari automatically updates dependencies

{
  "StackConfigurations": {
    "final": "false",
    "property_depends_on": [
      { "type": "yarn-site", "name": "yarn.nodemanager.resource.memory-mb" }
    ],
    "property_description": "The minimum allocation for every",
    "property_display_name": "Minimum Container Size (Memory)",
    "property_name": "yarn.scheduler.minimum-allocation-mb",
    "property_type": [],
    "property_value": "512",
    "property_value_attributes": {
      "type": "int",
      "maximum": "5120",
      "minimum": "0",
      "unit": "MB",
      "increment_step": "256"
    },
    "type": "yarn-site.xml"
  },
  "dependencies": [
    {
      "StackConfigurationDependency": {
        "dependency_name": "hive.tez.container.size",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    },
    {
      "StackConfigurationDependency": {
        "dependency_name": "mapreduce.map.memory.mb",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    },
    {
      "StackConfigurationDependency": {
        "dependency_name": "mapreduce.reduce.memory.mb",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    }
    …
  ]
}
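Two things this metadata makes possible can be sketched in a few lines: checking a value against its range and increment, and deriving the "depended by" direction from the "depends on" direction. The function names are hypothetical, and treating the increment as a hard validation rule is an assumption made for illustration; the attribute values are the ones from the example above.

```python
def validate(value, attrs):
    """Check an integer value against minimum/maximum/increment_step metadata."""
    number = int(value)
    low, high = int(attrs["minimum"]), int(attrs["maximum"])
    step = int(attrs.get("increment_step", "1"))
    unit = attrs.get("unit", "")
    problems = []
    if not low <= number <= high:
        problems.append(f"{number} {unit} is outside [{low}, {high}]")
    if (number - low) % step != 0:
        problems.append(f"{number} {unit} is not {low} plus a multiple of {step}")
    return problems


def depended_by(depends_on):
    """Invert a 'depends on' mapping into a 'depended by' mapping."""
    reverse = {}
    for prop, parents in depends_on.items():
        for parent in parents:
            reverse.setdefault(parent, []).append(prop)
    return reverse


attrs = {"type": "int", "maximum": "5120", "minimum": "0",
         "unit": "MB", "increment_step": "256"}
print(validate("512", attrs))    # the default value from the example
print(validate("6000", attrs))   # out of range and off-step

depends_on = {
    "yarn.scheduler.minimum-allocation-mb": ["yarn.nodemanager.resource.memory-mb"],
}
print(depended_by(depends_on))
```

Keeping only one direction as the source of truth and deriving the other avoids the two views drifting apart.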

Page 37: Managing Enterprise Hadoop Clusters with Apache Ambari


Metrics

Page 38: Managing Enterprise Hadoop Clusters with Apache Ambari

Ambari Metrics Service (AMS) - Goals

• Ability to collect metrics from Hadoop and other Stack services
• Ability to collect system level metrics
• Ability to retain metrics at a high precision for a configurable time period
• Ability to automatically purge metrics after retention period
• Provide integration point for metrics collection and retention by external system
• Trigger alerts based on metrics in Ambari
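The retention and purge goals can be pictured with a toy in-memory store. This is only a sketch of the idea; it says nothing about how AMS actually stores or expires data, and `MetricStore` and the metric name are hypothetical.

```python
import time


class MetricStore:
    """Toy metric store with a configurable retention period (illustration only)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.points = []  # list of (timestamp, name, value)

    def record(self, name, value, timestamp=None):
        when = time.time() if timestamp is None else timestamp
        self.points.append((when, name, value))

    def purge(self, now=None):
        """Drop points older than the retention period; return how many were dropped."""
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        before = len(self.points)
        self.points = [point for point in self.points if point[0] >= cutoff]
        return before - len(self.points)


store = MetricStore(retention_seconds=3600)
store.record("cpu_user", 12.5, timestamp=0)
store.record("cpu_user", 14.0, timestamp=1000)
store.record("cpu_user", 11.0, timestamp=5000)
removed = store.purge(now=5000)   # cutoff is 1400, so the first two points go
print(removed, store.points)
```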

Page 39: Managing Enterprise Hadoop Clusters with Apache Ambari


Ambari Metrics System - Architecture

Page 40: Managing Enterprise Hadoop Clusters with Apache Ambari

AMS Grafana

• Ambari 2.2.2
• Powerful dashboard builder integrated with AMS
• Pre-built Grafana dashboards for host-level and service-level metrics
• User can build and save custom dashboards

Page 41: Managing Enterprise Hadoop Clusters with Apache Ambari


AMS Grafana

Page 42: Managing Enterprise Hadoop Clusters with Apache Ambari


Alerts

Page 43: Managing Enterprise Hadoop Clusters with Apache Ambari

Alert – Types

Type | Description | Status | Thresholds Configurable?
PORT | Watches a port based on a configuration property such as the URI. | OK, WARN, CRIT | Yes (seconds)
WEB | Watches an HTTP or HTTPS endpoint and determines connectivity and HTTP status code. | OK, WARN, CRIT | No
AGGREGATE | Aggregate of status for another alert definition. | OK, WARN, CRIT | Yes (percentage)
METRIC | Watches a metric or series of metrics in JMX and compares a mathematical result against a threshold. | OK, WARN, CRIT | Yes (variable)
SCRIPT | Uses a custom script to handle checking. | OK or CRIT | No
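To show how threshold-based statuses might be derived, the sketch below implements a PORT-style check (thresholds in seconds) and an AGGREGATE-style check (thresholds in percent). The function names, the default threshold values and the rule that any non-OK status counts toward the aggregate are assumptions for illustration, not Ambari's implementation.

```python
import socket
import time


def classify(value, warn, crit):
    """Map a measured value onto OK / WARN / CRIT using two thresholds."""
    if value >= crit:
        return "CRIT"
    if value >= warn:
        return "WARN"
    return "OK"


def check_port(host, port, warn_seconds=1.5, crit_seconds=5.0):
    """PORT-style check: time a TCP connect; failure or timeout is CRIT."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=crit_seconds):
            pass
    except OSError:
        return "CRIT"
    return classify(time.monotonic() - start, warn_seconds, crit_seconds)


def aggregate(statuses, warn_percent=10.0, crit_percent=30.0):
    """AGGREGATE-style check: percentage of instances that are not OK."""
    if not statuses:
        return "OK"
    not_ok = sum(1 for status in statuses if status != "OK")
    return classify(100.0 * not_ok / len(statuses), warn_percent, crit_percent)


print(aggregate(["OK"] * 8 + ["CRIT"] * 2))   # 20% not OK
```

A call such as `check_port("namenode.example.com", 8020)` would probe a real endpoint; the host name here is a placeholder.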

Page 44: Managing Enterprise Hadoop Clusters with Apache Ambari

UI – Current Alerts

Configured by default; managed via the web client

Page 45: Managing Enterprise Hadoop Clusters with Apache Ambari

UI – Host Alerts

• Automatically refreshes
• Query alert history

Page 46: Managing Enterprise Hadoop Clusters with Apache Ambari

UI – Customization & Instances

Status text, thresholds, and interval

Page 47: Managing Enterprise Hadoop Clusters with Apache Ambari


Views

Page 48: Managing Enterprise Hadoop Clusters with Apache Ambari

Ambari Views

View Framework
• Provides various applications accessible from the Ambari Web UI: interact with the cluster via a browser from a single place for all users (cluster operators, data analysts, developers, etc.)

Easy to develop
• No need to understand Ambari core code: view development is just like creating any other web application

Easy to deploy
• Packaged as a single jar file
• Auto create / auto configure

Page 49: Managing Enterprise Hadoop Clusters with Apache Ambari


CS Queue Manager for Cluster Operators

Capacity Scheduler Queue Manager

Page 50: Managing Enterprise Hadoop Clusters with Apache Ambari


HDFS File Browser for General Users

HDFS File Browser

Page 51: Managing Enterprise Hadoop Clusters with Apache Ambari

Job Analysis for Developers

• Troubleshoot Tez Jobs
• Troubleshoot / Improve Hive queries

Page 52: Managing Enterprise Hadoop Clusters with Apache Ambari

Query Editors for Data Analysts

• Create, edit, execute, and analyze Hive queries
• Create, edit, and execute Pig scripts

Page 53: Managing Enterprise Hadoop Clusters with Apache Ambari

Ambari Server in Views-Only mode

[Diagram: one Ambari Server provides Management and Views for a cluster managed by Ambari; a second Ambari Server in “Views-only” mode (aka “Stand-alone” mode) provides Views for a cluster not managed by Ambari]

• Use Views on existing clusters not managed by Ambari
• Can use Views against multiple clusters

Page 54: Managing Enterprise Hadoop Clusters with Apache Ambari


Kerberos Automation

Page 55: Managing Enterprise Hadoop Clusters with Apache Ambari

Kerberos Automation

• Ambari 2.0
• Ambari manages Kerberos principals and keytabs
• Works with existing MIT KDC or Active Directory
• Once Kerberized, seamlessly handles:
  – Adding new hosts
  – Adding new components to existing hosts
  – Adding new services
  – Moving components to different hosts

Page 56: Managing Enterprise Hadoop Clusters with Apache Ambari

Agenda

• Ambari Overview
• Ambari Features
• Demo
• Q&A

Page 57: Managing Enterprise Hadoop Clusters with Apache Ambari

Agenda

• Ambari Overview
• Ambari Features
• Demo
• Q&A

Page 58: Managing Enterprise Hadoop Clusters with Apache Ambari

Thank You!

Try Ambari
• Follow the Ambari Quick Start Guide: https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide

Learn more
• Visit the project website: http://ambari.apache.org/

Get Involved
• User Mailing List: [email protected]
• Developer Mailing List: [email protected]
• Use JIRA to file bugs and improvement requests: https://issues.apache.org/jira/browse/AMBARI/

Jayush Luniya @ Hortonworks (Apache Ambari PMC)

Page 59: Managing Enterprise Hadoop Clusters with Apache Ambari

Future Roadmap

• AMS Grafana Integration
• Ambari Management Packs
• Ambari Logsearch
• Patch Upgrades
• Multi Service Versions
• Multi Service Instances

Page 60: Managing Enterprise Hadoop Clusters with Apache Ambari

Q&A

Stats

• Largest production clusters managed by Ambari: ~1600 nodes, ~800 nodes
• Largest test cluster for Ambari scale testing: ~400 nodes
• Largest test cluster where rolling upgrade was performed: ~400 nodes, ~40 hours