Transcript
Page 1: Accumulo Summit 2014: Accumulo on YARN

© Hortonworks Inc. 2014

Apache Accumulo on YARN with Apache Slider

Billie Rinaldi
Sr. Member of Technical Staff, Hortonworks, Inc.

June 12, 2014

Apache, Accumulo, Slider, Ambari, Hadoop, Yarn, Apache Accumulo, Apache Slider, Apache Ambari,

and the Accumulo logo are trademarks of the Apache Software Foundation.

Page 2: Accumulo Summit 2014: Accumulo on YARN

Topics

• What is YARN?
• Why would you want to run Accumulo on YARN?
• What is Slider and why is it needed?
• How is Accumulo deployed & managed with Slider?

Page 3: Accumulo Summit 2014: Accumulo on YARN

Getting more from Hadoop

HADOOP 1.0 (Primarily Batch)
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)

HADOOP 2.0 (Batch, Interactive, Online, Streaming, …)
• HDFS2 (redundant, reliable storage)
• YARN (cluster resource management)
• MapReduce (data processing)
• App X (data processing)
• App Y (data processing)

Failure handling and resource management are no longer just for MapReduce … and this separation enables much more flexibility

Page 4: Accumulo Summit 2014: Accumulo on YARN

App on YARN Use Cases

• Small app clusters in a large YARN cluster
• Dynamic clusters
• Self-healing clusters
• Elastic clusters
• Transient clusters for workflows
• Custom versions & configurations
• More efficient utilization/sharing

Page 5: Accumulo Summit 2014: Accumulo on YARN

YARN Structure

[Diagram: four HDFS nodes; one hosts the YARN Resource Manager (“The RM”) and the other three each host a YARN Node Manager]

• Servers run YARN Node Managers
• NMs heartbeat to Resource Manager
• RM schedules work over cluster
• RM allocates containers to apps
• NMs start containers
• NMs report container health

Page 6: Accumulo Summit 2014: Accumulo on YARN

Client Creates App Master

[Diagram: a Client, the YARN Resource Manager (“The RM”), three HDFS nodes each with a YARN Node Manager, and a newly created Application Master]

Page 7: Accumulo Summit 2014: Accumulo on YARN

AM Asks for Containers

[Diagram: the YARN Resource Manager (“The RM”), three HDFS nodes each with a YARN Node Manager, the Application Master, and three Containers]

Page 8: Accumulo Summit 2014: Accumulo on YARN

YARN Notifies AM of Failures

[Diagram: the YARN Resource Manager (“The RM”), three HDFS nodes each with a YARN Node Manager, three Containers, and the Application Master]
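The last three slides (the client creates an AM, the AM asks for containers, YARN notifies the AM of failures) can be pictured with a toy model. This is only a sketch in plain Python, not YARN or Slider code: the class `ToyAppMaster` and its methods are hypothetical names invented for illustration, and the component names are borrowed from later slides.

```python
class ToyAppMaster:
    """Toy model only: tracks desired vs. running containers per component."""

    def __init__(self, desired):
        self.desired = dict(desired)   # component name -> desired instance count
        self.running = {}              # container id -> component name
        self._next_id = 0

    def pending_requests(self):
        """Return one entry per container that still has to be requested."""
        counts = {}
        for component in self.running.values():
            counts[component] = counts.get(component, 0) + 1
        requests = []
        for component, want in sorted(self.desired.items()):
            requests.extend([component] * max(0, want - counts.get(component, 0)))
        return requests

    def on_container_allocated(self, component):
        """Record a container that the (imaginary) RM handed out."""
        self._next_id += 1
        container_id = f"container_{self._next_id}"
        self.running[container_id] = component
        return container_id

    def on_container_failed(self, container_id):
        """Record a failure notification; the gap shows up in pending_requests()."""
        self.running.pop(container_id, None)


am = ToyAppMaster({"ACCUMULO_MASTER": 1, "ACCUMULO_TSERVER": 2})
for component in am.pending_requests():
    am.on_container_allocated(component)

am.on_container_failed("container_2")   # one tablet server container is lost
print(am.pending_requests())            # the AM would now ask for a replacement
```

The point of the sketch is the division of labour: the platform reports the failure, and the application master decides what to request in response.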

Page 9: Accumulo Summit 2014: Accumulo on YARN

Issues to Consider

• Do I need to re-write parts of my application?
• How do I package my application for YARN?
• How do I configure my application?
• How do I debug my application?
• Can I still manage my application?
• Can I monitor my application?
• Can I manage inter-/intra-application dependencies?
• How will the external clients communicate?
• What does it take to secure the application?

Page 10: Accumulo Summit 2014: Accumulo on YARN

Apache Slider

Apache Slider is a project in incubation at the Apache Software Foundation with the goal of making it possible and easy to deploy existing applications onto a YARN cluster.

• History
  – HBase on YARN (HOYA)
  – AccumuloProvider/HBaseProvider on YARN
  – Agent Provider + App Packages for Accumulo/HBase/Storm/…

• Goals for long-lived applications
  – Execute management operations (Start/Stop, Reconfigure, Scale up/down, Rolling-restart, Decommission/Recommission, Upgrade)
  – Detect and remedy failures
  – Manage logs
  – Monitor (Ganglia, JMX)

Page 11: Accumulo Summit 2014: Accumulo on YARN

Components of Slider

[Diagram: the Slider App Package and Slider CLI; the YARN Resource Manager (“The RM”); two HDFS nodes whose YARN Node Managers each host an Agent with a Component Instance; the App Master / Agent Provider; and the Registry]

• AppMaster

• AgentProvider

• Agent

• Component Instance

• AppPackage

• CLI

• Registry

Page 12: Accumulo Summit 2014: Accumulo on YARN

Application by Slider

Similar to any YARN application:
1. CLI starts an instance of the AM
2. AM requests containers
3. Containers activate with an Agent
4. Agent gets application definition
5. Agent registers with AM
6. AM issues commands
7. Agent reports back status, configuration, etc.
8. AM publishes endpoints, configurations

[Diagram: the Slider App Package, Slider CLI, YARN Resource Manager (“The RM”), two YARN Node Managers each hosting an Agent with a Component Instance, the App Master / Agent Provider, and the Application Registry, with arrows numbered 1 to 8 matching the steps above]

AM commands: install, start, stop, status, …
CLI commands: create, freeze, thaw, flex, destroy

Page 13: Accumulo Summit 2014: Accumulo on YARN

Accumulo Slider App Package

Page 14: Accumulo Summit 2014: Accumulo on YARN

Slider Metainfo

<metainfo>
  <services>
    <service>
      <name>ACCUMULO</name>
      <version>1.5.1</version>
      <exportGroups>
        <exportGroup>
          <name>QuickLinks</name>
          <exports>
            <export>
              <name>org.apache.slider.monitor</name>
              <value>http://${ACCUMULO_MONITOR_HOST}:${site.accumulo-site.monitor.port.client}</value>
            </export>
          </exports>
        </exportGroup>
      </exportGroups>
      <commandOrders>
        <commandOrder>
          <command>ACCUMULO_TSERVER-START</command>
          <requires>ACCUMULO_MASTER-STARTED</requires>
        </commandOrder>
      </commandOrders>
      <components>
        <component>
          <name>ACCUMULO_MASTER</name>
          <category>MASTER</category>
          <minInstanceCount>1</minInstanceCount>
          <commandScript>
            <script>scripts/accumulo_master.py</script>
          </commandScript>
        </component>
      </components>
    </service>
  </services>
</metainfo>

Callouts:
• Application Info (name, version)
• URIs can be published (exportGroups)
• Commands have dependencies (commandOrders)
• Component information (components)
• Commands are implemented as scripts (commandScript)
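To see how the pieces of the metainfo relate, the sketch below reads a trimmed copy of the XML from this slide with Python's standard `xml.etree.ElementTree` and pulls out the command ordering and the per-component scripts. It is illustrative only: the `summarize` helper is a hypothetical name, and this is not how Slider itself loads the file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the metainfo shown on the slide (exportGroups omitted).
METAINFO = """<metainfo>
  <services>
    <service>
      <name>ACCUMULO</name>
      <version>1.5.1</version>
      <commandOrders>
        <commandOrder>
          <command>ACCUMULO_TSERVER-START</command>
          <requires>ACCUMULO_MASTER-STARTED</requires>
        </commandOrder>
      </commandOrders>
      <components>
        <component>
          <name>ACCUMULO_MASTER</name>
          <category>MASTER</category>
          <minInstanceCount>1</minInstanceCount>
          <commandScript>
            <script>scripts/accumulo_master.py</script>
          </commandScript>
        </component>
      </components>
    </service>
  </services>
</metainfo>"""


def summarize(xml_text):
    """Return (name, version, command orders, component scripts)."""
    root = ET.fromstring(xml_text)
    service = root.find("./services/service")
    # command -> state that must be reached before the command may run
    orders = {
        order.findtext("command"): order.findtext("requires")
        for order in service.iterfind("./commandOrders/commandOrder")
    }
    # component name -> script that implements its commands
    scripts = {
        comp.findtext("name"): comp.findtext("commandScript/script")
        for comp in service.iterfind("./components/component")
    }
    return service.findtext("name"), service.findtext("version"), orders, scripts


name, version, orders, scripts = summarize(METAINFO)
print(name, version)
print(orders)
print(scripts)
```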

Page 15: Accumulo Summit 2014: Accumulo on YARN

Slider App Resource Spec

{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": { },
  "global": { },
  "components": {
    "ACCUMULO_MASTER": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "1"
    },
    "slider-appmaster": { },
    "ACCUMULO_TSERVER": {
      "yarn.role.priority": "2",
      "yarn.component.instances": "1"
    },
    "ACCUMULO_MONITOR": {
      "yarn.role.priority": "3",
      "yarn.component.instances": "1"
    },
    "ACCUMULO_GC": {
      "yarn.role.priority": "4",
      "yarn.component.instances": "1"
    },
    "ACCUMULO_TRACER": {
      "yarn.role.priority": "5",
      "yarn.component.instances": "1"
    }
  }
}

Callouts:
• YARN resource requirements
• Unique priorities

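The "Unique priorities" callout is easy to check mechanically. The sketch below, in plain Python with only the standard library, loads a resources spec like the one on this slide and reports any `yarn.role.priority` value that is used by more than one component. The helper names are hypothetical and this is not a Slider tool.

```python
import json
from collections import Counter

# Same component entries as the slide, written compactly.
RESOURCES = json.loads("""
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {},
  "global": {},
  "components": {
    "ACCUMULO_MASTER":  {"yarn.role.priority": "1", "yarn.component.instances": "1"},
    "slider-appmaster": {},
    "ACCUMULO_TSERVER": {"yarn.role.priority": "2", "yarn.component.instances": "1"},
    "ACCUMULO_MONITOR": {"yarn.role.priority": "3", "yarn.component.instances": "1"},
    "ACCUMULO_GC":      {"yarn.role.priority": "4", "yarn.component.instances": "1"},
    "ACCUMULO_TRACER":  {"yarn.role.priority": "5", "yarn.component.instances": "1"}
  }
}
""")


def role_priorities(resources):
    """Map component name -> priority, skipping components that declare none."""
    return {
        name: int(options["yarn.role.priority"])
        for name, options in resources["components"].items()
        if "yarn.role.priority" in options
    }


def duplicated_priorities(resources):
    """Return the sorted list of priority values used more than once."""
    counts = Counter(role_priorities(resources).values())
    return sorted(priority for priority, n in counts.items() if n > 1)


print(role_priorities(RESOURCES))
print(duplicated_priorities(RESOURCES))
```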
Page 16: Accumulo Summit 2014: Accumulo on YARN

Slider AppConfig Spec

{
  "application.def": "/slider/accumulo_v151.zip",
  "java_home": "/usr/jdk64/jdk1.7.0_45",

  "site.global.app_log_dir": "${AGENT_LOG_ROOT}/app/log",
  "site.global.app_pid_dir": "${AGENT_WORK_ROOT}/app/run",

  "site.global.tserver_heapsize": "128m",
  "site.global.hadoop_prefix": "/usr/lib/hadoop",
  "site.global.zookeeper_home": "/usr/lib/zookeeper",
  "site.global.accumulo_instance_name": "instancename",
  "site.global.accumulo_root_password": "secret",

  "site.accumulo-site.instance.dfs.dir": "/apps/accumulo/data",
  "site.accumulo-site.master.port.client": "0",
  "site.accumulo-site.trace.port.client": "0",
  "site.accumulo-site.tserver.port.client": "0",
  "site.accumulo-site.gc.port.client": "0",
  "site.accumulo-site.monitor.port.log4j": "0",
  "site.accumulo-site.monitor.port.client": "${ACCUMULO_MONITOR.ALLOCATED_PORT}",

  "site.accumulo-site.instance.zookeeper.host": "${ZK_HOST}"
}

Callouts:
• Configurations needed by Slider (e.g. application.def, java_home)
• Named variables (e.g. ${AGENT_LOG_ROOT}, ${AGENT_WORK_ROOT})
• Variables for the application scripts (site.global.*)
• Site variables for application (site.accumulo-site.*)
• Allocate and advertise (e.g. ${ACCUMULO_MONITOR.ALLOCATED_PORT})
• Named variables for cluster details (e.g. ${ZK_HOST})

(a representative sampling of various types of configuration parameters)
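The kinds of entries called out on this slide can be told apart by key prefix and by the `${...}` placeholders in the values. The sketch below is an assumption-laden illustration in plain Python: the `classify` and `placeholders` helpers and the three category labels are made up for this example, and it does not reproduce how Slider resolves variables.

```python
import re

# A few entries copied from the slide.
APP_CONFIG = {
    "application.def": "/slider/accumulo_v151.zip",
    "site.global.app_log_dir": "${AGENT_LOG_ROOT}/app/log",
    "site.global.tserver_heapsize": "128m",
    "site.accumulo-site.monitor.port.client": "${ACCUMULO_MONITOR.ALLOCATED_PORT}",
    "site.accumulo-site.instance.zookeeper.host": "${ZK_HOST}",
}

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def classify(key):
    """Label a key by prefix (labels are illustrative, not Slider terminology)."""
    if key.startswith("site.global."):
        return "script variable"
    if key.startswith("site."):
        return "site property"
    return "slider setting"


def placeholders(config):
    """Return the sorted names of all ${...} variables used in the values."""
    found = set()
    for value in config.values():
        found.update(PLACEHOLDER.findall(value))
    return sorted(found)


for key in APP_CONFIG:
    print(f"{classify(key):16} {key}")
print(placeholders(APP_CONFIG))
```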

Page 17: Accumulo Summit 2014: Accumulo on YARN

Slider Install

• Set up Local Install
  mvn clean package -DskipTests (builds tarball)
  Get slider-0.31.0-incubating-SNAPSHOT-all.tar.gz from slider-assembly/target/
  Untar tarball in desired directory
  Edit conf/slider-client.xml:
    yarn.application.classpath
    slider.zookeeper.quorum
    yarn.resourcemanager.address
    yarn.resourcemanager.scheduler.address
    fs.defaultFS

• Set up HDFS
  /slider/accumulo_v151.zip
  /slider/agent
  /slider/agent/conf
  /slider/agent/conf/agent.ini
  /slider/agent/slider-agent.tar.gz
  Plus any additional directories needed by the app
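A quick way to confirm that the five properties listed for conf/slider-client.xml have been filled in is to parse the file and look for them. The sketch below assumes the file uses the usual Hadoop `<configuration>`/`<property>` layout; the `missing_properties` helper and the sample host names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Property names taken from the slide.
REQUIRED = [
    "yarn.application.classpath",
    "slider.zookeeper.quorum",
    "yarn.resourcemanager.address",
    "yarn.resourcemanager.scheduler.address",
    "fs.defaultFS",
]


def missing_properties(xml_text, required=REQUIRED):
    """Return the required property names that are absent or have an empty value."""
    root = ET.fromstring(xml_text)
    present = {
        prop.findtext("name")
        for prop in root.iter("property")
        if (prop.findtext("value") or "").strip()
    }
    return [name for name in required if name not in present]


# Made-up sample content; in practice read the text of conf/slider-client.xml.
SAMPLE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://namenode.example.com:8020</value></property>
  <property><name>slider.zookeeper.quorum</name><value>zk1.example.com:2181</value></property>
</configuration>"""

print(missing_properties(SAMPLE))
```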

Page 18: Accumulo Summit 2014: Accumulo on YARN

Slider Execution

• Create an Accumulo instance
  bin/slider create name --image hdfs://c6401.ambari.apache.org:8020/slider/agent/slider-agent.tar.gz --template appConfig.json --resources resources.json

• Modify an existing instance
  bin/slider freeze name
  bin/slider thaw name
  bin/slider destroy name
  bin/slider flex name --component ACCUMULO_TSERVER 2
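When these commands are driven from a script, it helps to build the argument lists in one place. The sketch below only assembles the command lines shown on this slide and prints one of them; it uses no flags beyond those on the slide, the helper names are hypothetical, and actually running the commands (for example with `subprocess.run`) is left to the reader.

```python
import shlex


def slider_create(name, image, template, resources, slider="bin/slider"):
    """Argument list for creating an instance, as on the slide."""
    return [slider, "create", name,
            "--image", image,
            "--template", template,
            "--resources", resources]


def slider_simple(action, name, slider="bin/slider"):
    """Argument list for the actions that take only an instance name."""
    if action not in {"freeze", "thaw", "destroy"}:
        raise ValueError(f"unsupported action: {action}")
    return [slider, action, name]


def slider_flex(name, component, count, slider="bin/slider"):
    """Argument list for changing the instance count of one component."""
    return [slider, "flex", name, "--component", component, str(count)]


cmd = slider_flex("name", "ACCUMULO_TSERVER", 2)
print(shlex.join(cmd))
```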

Page 19: Accumulo Summit 2014: Accumulo on YARN

Managing a YARN Application

Goal is to have Slider integrate with any application management framework, e.g. Ambari.

Apache Ambari is an open source framework for provisioning, managing and monitoring Apache Hadoop clusters
• Ambari Views allows development of custom user interfaces
• Slider App View will deploy, monitor, manage YARN apps using Slider, embedded in Ambari (currently, Tech Preview)

[Diagram: Ambari Server, Ambari Web FE, View UI, View BE, Slider CLI, and three HDFS nodes each with a YARN Node Manager]

Page 22: Accumulo Summit 2014: Accumulo on YARN

What’s Next in Slider

• Lock-in Application Specification
• Integration with the YARN Registry
• Inter/Intra-Application Dependencies
• Robust failure handling
• Improved debugging
• Security
• More applications!

Page 23: Accumulo Summit 2014: Accumulo on YARN

YARN-896: Long-Lived Apps

• Container reconnect on AM restart – mostly complete
• Token renewal on long-lived apps – patch available
• Containers: signaling, >1 process sequence
• AM/RM managed gang scheduling
• Anti-affinity hint in container requests
• ZK Service Registry
• Logging

Page 24: Accumulo Summit 2014: Accumulo on YARN

Slider is Seeking Contributors

• Bring Your Favorite Applications to YARN
  – Create packages, give feedback, create patches, …

• Useful Links
  – Source: https://git-wip-us.apache.org/repos/asf/incubator-slider.git
  – Website: http://slider.incubator.apache.org
  – Mailing List: [email protected]
  – JIRA: https://issues.apache.org/jira/browse/SLIDER

• Current and Upcoming Releases
  – Slider 0.30-incubating (May)
  – Slider 0.40-incubating (planned)

Page 25: Accumulo Summit 2014: Accumulo on YARN

[email protected]@[email protected] #accumulo

Page 26: Accumulo Summit 2014: Accumulo on YARN

AM Restart – leading edge

• NodeMap: model of YARN cluster
• ComponentHistory: persistent history of component placements
• Specification: resources.json &c
• Container Queues: requested, starting, releasing
• Component Map: container ID -> component instance
• Event History: application history

Legend: Persisted in HDFS / Rebuilt / Transient

ctx.setKeepContainersAcrossApplicationAttempts(true)

Page 27: Accumulo Summit 2014: Accumulo on YARN

Application Registry

• A common problem (not specific to Slider): https://issues.apache.org/jira/browse/YARN-913

• Current
  – Apache Curator based
  – Register URLs pointing to actual data
  – AM doubles up as a webserver for published data

• Future
  – Registry should be stand-alone
  – Slider is a consumer as well as publisher
  – Slider focuses on declarative solution for Applications to publish data
  – Allows integration of Applications independent of how they are hosted

