2015 zdata inc. - apache ambari overview

Apache Ambari Overview

Upload: zdata-inc

Post on 15-Jul-2015


Topics Covered

• Overview
  • What is Ambari?
  • Provisioning
  • Managing
  • Monitoring
• Technical Layout
  • Terminology
  • Stacks
  • Blueprints
  • API Reference
• Building Custom Services

What is Apache Ambari?

• “the seat that one sits upon an elephant”
• Provisioner, manager, and monitor of Apache Hadoop clusters
• 100% open source
• Driven by web app or RESTful APIs
• Step-by-step wizards for installing / provisioning a cluster
• Can be used to automate a cluster install
• Distribution agnostic
• “server-agent” type architecture
• Central place for managing everything in the Hadoop ecosystem
• Built-in, extensible, pre-configured metrics collection and system alerting for monitoring

Provisioning

• Manually provisioning a cluster doesn’t scale; Ambari does.
• Takes care of software dependencies
• Installs user and service accounts
• Scales from hundreds to a couple of thousand nodes
• Simple step-by-step installation wizard to guide you through cluster setup
• Choose which services should be on which host(s)
• Customize specific service settings or use defaults
• Steps to install:
  • Install the Ambari Server
  • Install the Ambari Agent
  • Choose services and assign them to hosts
  • Install and sit back
• Note: Ambari’s definition of provisioning is scoped to the Hadoop ecosystem; it is not general provisioning (Salt, Chef, Puppet, etc.)

(Figure: steps through the wizard in the GUI)

Managing

• Add and remove hosts
• Add, remove, or modify services & components
• Decommission or recommission nodes
• Move the Hadoop NameNode or Secondary NameNode
• Rolling restarts of hosts
• Restart the entire cluster or specific services
• Roll back to previous configurations
• View history of past configuration changes
• Define host groups for better management
• Search for specific hosts by name, IP address, hardware specs, etc.
• More management capabilities specific to each service
  • View MapReduce job history
  • Job logs
  • Jobs currently running

Figure 2. Service Management Options Example

Figure 3. Host management

Managing Cont’d.

• Supports a wide array of user authentication methods
  • Single user (default)
  • LDAP
  • Active Directory
• Kerberos support
• Built-in user access control
  • Control what users can view and interact with in the GUI

Monitoring

• Uses existing open-source projects
• Pre-configured from installation
• Ganglia
  • Monitoring, trending patterns, metrics collection
  • Used by the web interface for metric views & customizable widgets
  • Lots of heat maps
• Nagios
  • Used for health checking and alerting
  • Email alerting
  • Customizable for new services

Monitoring Cont’d.

Figure 4. Monitoring Widgets in web app

Monitoring Cont’d.

Figure 5. Monitoring Widgets in web app

Stacks

• Stacks are a set of services, repo information, and meta information
• Separate from Ambari (not part of the Ambari framework): anyone can create and use a new stack
• Supports versioning
• Supports inheritance: a new stack can inherit from an older stack
  • The new stack only contains the new changes / services
• By default, Ambari comes with the HDP stack (Hortonworks)
• Services in stacks define lifecycle commands (start, stop, status, install, configure)
• Lifecycle commands are controlled via command scripts
• Ability to define “custom” commands
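The inheritance idea in the list above can be sketched in a few lines of Python. This is only an illustration of "the new stack contains only what changed": the dictionary layout, the "extends" key, and the version numbers are hypothetical and are not Ambari's on-disk stack format.

```python
def resolve_services(stacks, version):
    """Merge service definitions from the oldest ancestor down to `version`."""
    chain = []
    while version is not None:
        chain.append(stacks[version])
        version = stacks[version].get("extends")
    merged = {}
    # Apply the oldest stack first so newer stacks override it.
    for stack in reversed(chain):
        merged.update(stack["services"])
    return merged


# Hypothetical stack versions: 1.1 only lists what it changes or adds.
stacks = {
    "1.0": {"extends": None,
            "services": {"HDFS": "1.0 definition", "YARN": "1.0 definition"}},
    "1.1": {"extends": "1.0",
            "services": {"YARN": "1.1 definition", "GREENPLUM": "1.1 definition"}},
}

effective = resolve_services(stacks, "1.1")
print(effective)
```

Here the child stack inherits HDFS unchanged, overrides YARN, and adds a new service.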

Figure 6. Inside a Stack

Stack Details

• Agents download stack definitions and command scripts
• Agent executes commands locally
• If the stack definition changes, agents pull down the latest stack definition
• Services are made up of components, for example in HDFS:
  • NameNode
  • SNameNode (Secondary NameNode)
  • DataNode
  • HDFS Client
• 3 types of components:
  • MASTER, SLAVE, CLIENT

Figure 7. Layout of a Stack and its services

Blueprints

• Stacks are just a definition of what’s available
• Blueprints are a specific cluster definition
  • Maps what is installed in the cluster
  • Maps which hosts have what service components
• Stacks + Hosts = Blueprint
• Can be exported from an existing cluster and reused
• Used for installation and automation with the API
• Contains the specifics
• Blueprints are in JSON file format
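To make the "blueprints are JSON" point concrete, here is a minimal Python sketch that assembles a blueprint-shaped document. The top-level keys ("Blueprints", "host_groups", "components") follow the blueprint layout as commonly documented for Ambari, but treat them as an assumption to verify against your Ambari version; the stack version, group names, and component names are illustrative only.

```python
import json


def build_blueprint(stack_name, stack_version, host_groups):
    """Return a blueprint-shaped dict.

    host_groups maps a group name to (component names, cardinality).
    """
    return {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": [
            {
                "name": group,
                "cardinality": cardinality,
                "components": [{"name": c} for c in components],
            }
            for group, (components, cardinality) in host_groups.items()
        ],
    }


# Illustrative values, not taken from the slides.
blueprint = build_blueprint(
    "HDP",
    "2.2",
    {
        "master": (["NAMENODE", "SECONDARY_NAMENODE"], "1"),
        "workers": (["DATANODE", "HDFS_CLIENT"], "1+"),
    },
)
blueprint_json = json.dumps(blueprint, indent=2)
print(blueprint_json)
```

The resulting JSON text is the kind of file that could be saved and reused across cluster installs.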

Ambari API

• API: anything the web UI can do, and more
• Used for automation and integration
• Examples of API uses:
  • Get access to monitoring & metrics information
  • Get resource usage of specific services
  • Create, delete, and update services
  • Start and stop services
  • Delete an entire cluster
  • Query the cluster with parameters

curl -u username:password -H 'X-Requested-By: ambari' -X POST \
     -d @ambari-blueprint.json \
     http://{your.ambari.server}/api/v1/clusters/{cluster-name}
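For automation from Python rather than the shell, the same request can be assembled with the standard library. This is a sketch under assumptions: the host name, port, cluster name, and credentials are placeholders, and the function only builds the request so it can be inspected before anything is sent.

```python
import base64
import urllib.request


def build_ambari_post(server, cluster_name, username, password, body):
    """Build (but do not send) the POST request from the curl example."""
    url = "http://{0}/api/v1/clusters/{1}".format(server, cluster_name)
    credentials = "{0}:{1}".format(username, password).encode("utf-8")
    headers = {
        # Equivalent of curl's -u option (HTTP basic authentication).
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        # Header required by the curl example.
        "X-Requested-By": "ambari",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder server, cluster, and credentials.
request = build_ambari_post(
    "ambari.example.com:8080", "mycluster", "admin", "admin", b"{}"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` would actually send it; in practice the body would be the bytes of the JSON file rather than `b"{}"`.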

Building a Custom Service

• Choose to define a separate stack, inherit from another stack, or just put the new service definition in an existing stack (easiest for development)
• Define a metainfo.xml with the following:
  • Service name
  • Display name
  • Comment
  • Version
  • Components
    • Component category
    • Cardinality
    • Command script
    • Timeouts

<service>
  <name>GREENPLUM</name>
  <displayName>Greenplum</displayName>
  <comment>Pivotal Greenplum Database</comment>
  <version>0.1</version>
  <components>
    <component>
      <name>GREENPLUM_MASTER</name>
      <displayName>Greenplum Master</displayName>
      <category>MASTER</category>
      <cardinality>1</cardinality>
      <commandScript>
        <script>scripts/master.py</script>
        <scriptType>PYTHON</scriptType>
        <timeout>4800</timeout>
      </commandScript>
    </component>
    <component>
      <name>GREENPLUM_SLAVE</name>
      <displayName>Greenplum Segment</displayName>
      <category>SLAVE</category>
      <cardinality>1+</cardinality>
      <commandScript>
        <script>scripts/segment.py</script>
        <scriptType>PYTHON</scriptType>
        <timeout>600</timeout>
      </commandScript>
      ………………..
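The cardinality values in the example ("1" for the master, "1+" for segments) can be read as a constraint on how many hosts run a component. A minimal sketch of that interpretation follows; the function name is hypothetical, and the "N-M" range form is an assumption beyond what the example shows.

```python
def cardinality_allows(cardinality, host_count):
    """Return True if host_count satisfies a cardinality string.

    Handles the two forms in the example ("1" and "1+") plus an "N-M"
    range, which is an assumption about the wider format.
    """
    if cardinality.endswith("+"):
        # "1+" means at least one host.
        return host_count >= int(cardinality[:-1])
    if "-" in cardinality:
        low, high = cardinality.split("-", 1)
        return int(low) <= host_count <= int(high)
    # A bare number means exactly that many hosts.
    return host_count == int(cardinality)


print(cardinality_allows("1", 1), cardinality_allows("1+", 12))
```

Under this reading, a Greenplum master must land on exactly one host, while segments may be placed on any number of hosts from one upward.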

Building a Custom Service Cont’d.

• Create XML configuration files
• Define properties that command scripts can use and that users can edit through the GUI or a blueprint

<configuration>
  <property>
    <name>gp.installer.zip.file.location</name>
    <value></value>
    <description>The absolute file path of where the Greenplum installer zip file is on the master host.</description>
  </property>
  <property>
    <name>gp.installation.path</name>
    <value>/usr/local</value>
    <description>The absolute path to the install location. You must have write permissions to the location you specify.</description>
  </property>
  <property>
    <name>gp.admin.user</name>
    <value>gpadmin</value>
    <description>The Greenplum system user used to administer the Greenplum Database. The user will be created on all Greenplum hosts.</description>
  </property>
  <property>
    <name>gp.admin.password</name>
    <value></value>
    <description>The password for gp.admin.user.</description>
  </property>
  <property>
    <name>use.mirrors</name>
    <value>false</value>
    <description>Create segment mirrors</description>
  </property>
</configuration>
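As a quick way to check such a file during development, the properties can be loaded into a dictionary with Python's standard XML parser. This sketch works on a shortened copy of the configuration above and is independent of how Ambari itself hands these values to command scripts.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the configuration above (descriptions omitted).
SAMPLE = """
<configuration>
  <property>
    <name>gp.installation.path</name>
    <value>/usr/local</value>
  </property>
  <property>
    <name>gp.admin.user</name>
    <value>gpadmin</value>
  </property>
  <property>
    <name>gp.admin.password</name>
    <value></value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Map each <property> name to its value ('' when the value is empty)."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value") or ""
        for prop in root.findall("property")
    }


properties = load_properties(SAMPLE)
print(properties)
```

Empty values such as the password show up as empty strings, which makes it easy to spot settings the user still has to supply.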

Building a Custom Service Cont’d.

import sys

from resource_management import *


class Slave(Script):
    # One method per lifecycle command; each receives the execution environment.

    def install(self, env):
        print('Install the Sample Srv Slave')

    def stop(self, env):
        print('Stop the Sample Srv Slave')

    def start(self, env):
        print('Start the Sample Srv Slave')

    def status(self, env):
        print('Status of the Sample Srv Slave')

    def configure(self, env):
        print('Configure the Sample Srv Slave')


if __name__ == "__main__":
    # Hand control to the Script base class, which runs the requested command.
    Slave().execute()

• Write the command scripts in Python, inheriting from the “Script” class
• Override all lifecycle commands
  • Install
  • Stop
  • Start
  • Status
  • Configure
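The pattern above, one method per lifecycle command on a shared base class, can be tried outside Ambari with a small stand-in. MiniScript below is hypothetical and is not the resource_management Script class; how the real class receives its command from the agent is not shown in the slides. The sketch only illustrates dispatching a command name to the matching method.

```python
class MiniScript(object):
    """Hypothetical stand-in for the Script base class (not Ambari code)."""

    def execute(self, command, env=None):
        # Look up the method named after the lifecycle command and call it.
        handler = getattr(self, command, None)
        if command.startswith("_") or command == "execute" or not callable(handler):
            raise ValueError("Unsupported command: {0}".format(command))
        return handler(env)


class SampleSlave(MiniScript):
    def install(self, env):
        return "Install the Sample Srv Slave"

    def start(self, env):
        return "Start the Sample Srv Slave"


slave = SampleSlave()
print(slave.execute("install"))
```

A command with no matching method raises an error here, which is one reason to override every lifecycle command in a real service.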

Summary

• Ambari is THE provisioner, manager, and monitor for Apache Hadoop clusters

• Great for automation, integration, and extensibility

• Easy step-by-step installation wizards
• Simple managing and monitoring
• Very powerful API
• Stacks separated from the Ambari framework
• Services can be built for anything
