Accumulo Summit 2014: Accumulo on YARN


Author: accumulo-summit

Posted on 10-May-2015


DESCRIPTION

Speaker: Billie Rinaldi. In their OSDI 2006 paper, Google notes that "Bigtable depends on a cluster management system for scheduling jobs, managing resources on shared machines, dealing with machine failures, and monitoring machine status." Until recently, no such system existed for Apache Accumulo to rely upon. Apache Hadoop 2 introduced the YARN resource management system to the Hadoop ecosystem. This talk will describe the benefits YARN can provide for Accumulo installations and how the Slider project (proposed for the Apache Incubator) makes it easier to deploy long-running applications on YARN. It will cover the details of the Accumulo app package for Slider, how to use Slider to deploy an Accumulo instance, and how instances can be actively managed by other applications such as Apache Ambari.

TRANSCRIPT

1. Apache Accumulo on YARN with Apache Slider
Billie Rinaldi, Sr. Member of Technical Staff, Hortonworks, Inc.
June 12, 2014
© Hortonworks Inc. 2014. Apache, Accumulo, Slider, Ambari, Hadoop, YARN, Apache Accumulo, Apache Slider, Apache Ambari, and the Accumulo logo are trademarks of the Apache Software Foundation.

2. Topics
- What is YARN?
- Why would you want to run Accumulo on YARN?
- What is Slider and why is it needed?
- How is Accumulo deployed and managed with Slider?

3. Getting more from Hadoop
- Hadoop 1.0: HDFS (redundant, reliable storage) plus MapReduce handling both cluster resource management and data processing; primarily batch.
- Hadoop 2.0: HDFS2 (redundant, reliable storage), YARN (cluster resource management), and MapReduce, App X, App Y (data processing); batch, interactive, online, streaming.
- Failure handling and resource management are no longer just for MapReduce, and this separation enables much more flexibility.

4. App on YARN Use Cases
- Small app clusters in a large YARN cluster
- Dynamic clusters
- Self-healing clusters
- Elastic clusters
- Transient clusters for workflows
- Custom versions & configurations
- More efficient utilization/sharing

5. YARN Structure
(diagram: one YARN Resource Manager and several YARN Node Managers, each colocated with HDFS)
- Servers run YARN Node Managers
- NMs heartbeat to the Resource Manager
- RM schedules work over the cluster
- RM allocates containers to apps
- NMs start containers
- NMs report container health

6. Client Creates App Master
(diagram: the client asks the Resource Manager to launch an Application Master in a container on one of the Node Managers)

7. AM Asks for Containers
(diagram: the AM requests containers; the RM allocates them across the Node Managers)

8. YARN Notifies AM of Failures
(diagram: a container fails; the RM notifies the Application Master)

9. Issues to Consider
- Do I need to rewrite parts of my application?
- How do I package my application for YARN?
- How do I configure my application?
- How do I debug my application?
- Can I still manage my application?
- Can I monitor my application?
- Can I manage inter-/intra-application dependencies?
- How will the external clients communicate?
- What does it take to secure the application?

10. Apache Slider
- Apache Slider is a project in incubation at the Apache Software Foundation with the goal of making it possible and easy to deploy existing applications onto a YARN cluster.
- History: HBase on YARN (HOYA); AccumuloProvider/HBaseProvider on YARN; Agent Provider + App Packages for Accumulo/HBase/Storm/
- Goals for long-lived applications:
  - Execute management operations (start/stop, reconfigure, scale up/down, rolling restart, decommission/recommission, upgrade)
  - Detect and remedy failures
  - Manage logs
  - Monitor (Ganglia, JMX)

11. Components of Slider
- AppMaster / Agent Provider
- Agent
- Component Instance
- App Package
- CLI
- Registry
(diagram: the Slider CLI submits an App Package; the AM's agent provider manages Agents running component instances on the Node Managers, plus a registry)

12. Application by Slider
Similar to any YARN application:
1. CLI starts an instance of the AM
2. AM requests containers
3. Containers activate with an Agent
4. Agent gets the application definition
5. Agent registers with the AM
6. AM issues commands
7. Agent reports back status, configuration, etc.
8. AM publishes endpoints, configurations
- CLI commands: create, freeze, thaw, flex, destroy
- AM commands: install, start, stop, status

13. Accumulo Slider App Package
(diagram of the app package contents)
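The eight steps on slide 12 can be sketched as a toy command loop. Everything below is invented for illustration (the class names, methods, and return values are not Slider APIs); it is a minimal model of the register/command/report/publish cycle, not the real agent protocol.

```python
# Toy model of the slide-12 flow: agent registers with the AM (step 5),
# the AM issues commands (step 6), the agent reports status back (step 7),
# and the AM publishes endpoints for clients (step 8). All names invented.
class AppMaster:
    def __init__(self, app_definition):
        self.app_definition = app_definition  # step 4: what agents download
        self.registered = {}                  # agent label -> last status
        self.published = {}                   # endpoints/configs for clients

    def register(self, label):
        # Step 5: agent registers; reply with the commands to run (step 6).
        self.registered[label] = "REGISTERED"
        return ["install", "start"]

    def report(self, label, status):
        # Step 7: agent reports status back to the AM.
        self.registered[label] = status

    def publish(self, key, value):
        # Step 8: AM publishes endpoints and configurations.
        self.published[key] = value

class Agent:
    def __init__(self, label, am):
        self.label, self.am, self.state = label, am, "INIT"

    def run(self):
        for command in self.am.register(self.label):   # steps 5-6
            self.state = {"install": "INSTALLED", "start": "STARTED"}[command]
            self.am.report(self.label, self.state)     # step 7

am = AppMaster(app_definition="accumulo_v151.zip")
Agent("ACCUMULO_TSERVER_0", am).run()
am.publish("org.apache.slider.monitor", "http://host:50095")
```

The point of the shape: the AM never execs Accumulo processes itself; it only exchanges commands and status with agents, which is what lets one AM drive many component types from scripts in the app package.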
14. Slider Metainfo
(excerpt of the app package's metainfo, flattened in the transcript; the recoverable fields:)
- Application info: ACCUMULO, version 1.5.1
- QuickLinks: org.apache.slider.monitor -> http://${ACCUMULO_MONITOR_HOST}:${site.accumulo-site.monitor.port.client}
- Command dependency: ACCUMULO_TSERVER-START requires ACCUMULO_MASTER-STARTED
- Component info: ACCUMULO_MASTER (category MASTER, count 1)
Slide callouts:
- Application info
- Commands have dependencies
- URIs can be published
- Component information
- Commands are implemented as scripts

15. Slider App Resource Spec
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": { },
  "global": { },
  "components": {
    "ACCUMULO_MASTER": { "yarn.role.priority": "1", "yarn.component.instances": "1" },
    "slider-appmaster": { },
    "ACCUMULO_TSERVER": { "yarn.role.priority": "2", "yarn.component.instances": "1" },
    "ACCUMULO_MONITOR": { "yarn.role.priority": "3", "yarn.component.instances": "1" },
    "ACCUMULO_GC": { "yarn.role.priority": "4", "yarn.component.instances": "1" },
    "ACCUMULO_TRACER": { "yarn.role.priority": "5", "yarn.component.instances": "1" }
  }
}
Slide callouts: YARN resource requirements; each component gets a unique priority.
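The "unique priorities" callout on the resource spec can be checked mechanically. The helper below is hypothetical, not part of Slider; it sketches the two constraints slide 15 highlights: every component carries a unique yarn.role.priority, and yarn.component.instances is a non-negative integer.

```python
import json

# Hypothetical validator for a Slider resources.json (names invented):
# enforce unique yarn.role.priority values and sane instance counts.
def check_resources(spec_text):
    spec = json.loads(spec_text)
    seen = {}  # priority -> component name
    for name, props in spec.get("components", {}).items():
        if name == "slider-appmaster":
            continue  # the AM entry carries no role priority
        prio = props["yarn.role.priority"]
        if prio in seen:
            raise ValueError(f"{name} reuses priority {prio} ({seen[prio]})")
        seen[prio] = name
        if int(props["yarn.component.instances"]) < 0:
            raise ValueError(f"{name}: negative instance count")
    return seen

resources = """{
  "components": {
    "ACCUMULO_MASTER":  {"yarn.role.priority": "1", "yarn.component.instances": "1"},
    "ACCUMULO_TSERVER": {"yarn.role.priority": "2", "yarn.component.instances": "1"},
    "slider-appmaster": {}
  }
}"""
print(check_resources(resources))  # {'1': 'ACCUMULO_MASTER', '2': 'ACCUMULO_TSERVER'}
```

Priorities have to be unique because the AM uses the priority on a YARN container allocation to recognize which component role the granted container belongs to.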
16. Slider AppConfig Spec
{
  "application.def": "/slider/accumulo_v151.zip",
  "java_home": "/usr/jdk64/jdk1.7.0_45",
  "site.global.app_log_dir": "${AGENT_LOG_ROOT}/app/log",
  "site.global.app_pid_dir": "${AGENT_WORK_ROOT}/app/run",
  "site.global.tserver_heapsize": "128m",
  "site.global.hadoop_prefix": "/usr/lib/hadoop",
  "site.global.zookeeper_home": "/usr/lib/zookeeper",
  "site.global.accumulo_instance_name": "instancename",
  "site.global.accumulo_root_password": "secret",
  "site.accumulo-site.instance.dfs.dir": "/apps/accumulo/data",
  "site.accumulo-site.master.port.client": "0",
  "site.accumulo-site.trace.port.client": "0",
  "site.accumulo-site.tserver.port.client": "0",
  "site.accumulo-site.gc.port.client": "0",
  "site.accumulo-site.monitor.port.log4j": "0",
  "site.accumulo-site.monitor.port.client": "${ACCUMULO_MONITOR.ALLOCATED_PORT}",
  "site.accumulo-site.instance.zookeeper.host": "${ZK_HOST}"
}
Slide callouts: configurations needed by Slider; named variables; site variables for the application; named variables for cluster details; allocate and advertise; variables for the application scripts. (A representative sampling of the various types of configuration parameters.)

17. Slider Install
Set up local install:
- mvn clean package -DskipTests (builds the tarball)
- Get slider-0.31.0-incubating-SNAPSHOT-all.tar.gz from slider-assembly/target/
- Untar the tarball in the desired directory
- Edit conf/slider-client.xml: yarn.application.classpath, slider.zookeeper.quorum, yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, fs.defaultFS
Set up HDFS:
- /slider/accumulo_v151.zip
- /slider/agent
- /slider/agent/conf
- /slider/agent/conf/agent.ini
- /slider/agent/slider-agent.tar.gz
- Plus any additional directories needed by the app
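The port "0" entries and ${ACCUMULO_MONITOR.ALLOCATED_PORT} in the appConfig above show the allocate-and-advertise pattern: a component binds port 0 so the OS picks any free port, and the chosen port is then advertised so clients can find it. A minimal sketch of the underlying mechanism with plain Python sockets (not Slider code; the dictionary key is just the placeholder name reused for illustration):

```python
import socket

# Bind to port 0 and let the OS choose a free ephemeral port; then
# report which port was actually chosen so it can be advertised.
def allocate_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))    # port 0 means "pick any free port"
        return s.getsockname()[1]   # the port the OS actually chose

# The allocated value is what Slider would substitute for the
# ${ACCUMULO_MONITOR.ALLOCATED_PORT} placeholder in the appConfig.
advertised = {"ACCUMULO_MONITOR.ALLOCATED_PORT": allocate_port()}
```

This is why the appConfig can run several Accumulo instances on shared nodes without port collisions: nothing is pinned to a fixed port except what gets advertised after the fact.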
18. Slider Execution
Create an Accumulo instance:
  bin/slider create name --image hdfs://c6401.ambari.apache.org:8020/slider/agent/slider-agent.tar.gz --template appConfig.json --resources resources.json
Modify an existing instance:
  bin/slider freeze name
  bin/slider thaw name
  bin/slider destroy name
  bin/slider flex name --component ACCUMULO_TSERVER 2

19. Managing a YARN Application
- The goal is to have Slider integrate with any application management framework, e.g. Ambari.
- Apache Ambari is an open source framework for provisioning, managing and monitoring Apache Hadoop clusters.
- Ambari Views allows development of custom user interfaces.
- The Slider App View will deploy, monitor, and manage YARN apps using Slider, embedded in Ambari (currently a Tech Preview).
(diagram: Ambari Server with Ambari Web FE, View UI, and View BE driving the Slider CLI against the YARN Node Managers)

20. (screenshot slide, no text)

21. (screenshot slide, no text)

22. What's Next in Slider
- Lock in the application specification
- Integration with the YARN registry
- Inter-/intra-application dependencies
- Robust failure handling
- Improved debugging
- Security
- More applications!

23. YARN-896: Long-Lived Apps
- Container reconnect on AM restart (mostly complete)
- Token renewal on long-lived apps (patch available)
- Containers: signaling, >1 process sequence
- AM/RM managed gang scheduling
- Anti-affinity hint in container requests
- ZK Service Registry
- Logging

24. Slider is Seeking Contributors
Bring your favorite applications to YARN: create packages, give feedback, create patches.
Useful links:
- Source: https://git-wip-us.apache.org/repos/asf/incubator-slider.git
- Website: http://slider.incubator.apache.org
- Mailing list: [email protected]
- JIRA: https://issues.apache.org/jira/browse/SLIDER
Current and upcoming releases:
- Slider 0.30-incubating (May)
- Slider 0.40-incubating (planned)
25. Questions?
[email protected]
[email protected]
[email protected]
IRC #accumulo

26. AM Restart: leading edge
AM state and how it survives restart (slide legend: persisted in HDFS / rebuilt / transient):
- NodeMap: model of the YARN cluster
- ComponentHistory: persistent history of component placements
- Specification: resources.json &c
- Container queues: requested, starting, releasing
- Component map: container ID -> component instance
- Event history: application history
- ctx.setKeepContainersAcrossApplicationAttempts(true) keeps containers alive across AM attempts

27. Application Registry
- A common problem (not specific to Slider): https://issues.apache.org/jira/browse/YARN-913
- Current:
  - Based on Apache Curator
  - Registers URLs pointing to the actual data
  - The AM doubles up as a webserver for published data
- Future:
  - The registry should be stand-alone
  - Slider is a consumer as well as a publisher
  - Slider focuses on a declarative solution for applications to publish data
  - Allows integration of applications independent of how they are hosted
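The AM-restart bookkeeping on slide 26 comes down to persisting enough state that a restarted AM can rebuild maps like container ID -> component instance instead of forgetting its live containers. A toy sketch of that round trip, using a local JSON file as a stand-in for HDFS (function names and the file layout are invented for illustration):

```python
import json
import os
import tempfile

# Persist the component map so a restarted AM can rebuild it. In Slider
# this state lives in HDFS; a temp file stands in for it here.
def persist(component_map, path):
    with open(path, "w") as f:
        json.dump(component_map, f)

def rebuild(path):
    # A restarted AM reloads the map and reattaches to the containers
    # YARN kept alive (setKeepContainersAcrossApplicationAttempts(true)).
    with open(path) as f:
        return json.load(f)

state = {"container_01": "ACCUMULO_MASTER", "container_02": "ACCUMULO_TSERVER"}
path = os.path.join(tempfile.mkdtemp(), "component_map.json")
persist(state, path)
recovered = rebuild(path)
```

Transient pieces of state (the request/start/release queues) can simply be dropped and re-derived, which is why only placement history and the specification need durable storage.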