Apache Apex as YARN Application
Chinmay Kolhatkar ([email protected])
Mar 22, 2016
Apache Apex Meetup




Agenda

• Directed Acyclic Graph

• Apex as a YARN Application

• Application Components of Apex

• Lifecycle of Apex as a YARN Application


Directed Acyclic Graph (DAG)

• Defines the compute stages of a streaming application

• Defines how tuples flow between operators over streams

[Figure: example DAG with four compute stages: Compute1, Compute2, Compute3 and Compute4]
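The "acyclic" requirement can be checked mechanically. The Python sketch below is an illustration only, not Apex code: it orders the stages with Kahn's algorithm and rejects any graph that contains a cycle. The edges between Compute1 to Compute4 are assumed, since the wiring in the figure is not recoverable, and the name `topological_order` is hypothetical.

```python
from collections import deque


def topological_order(edges):
    """Return the stages in an order that respects every edge; raise on a cycle."""
    nodes = set()
    successors = {}
    indegree = {}
    for src, dst in edges:
        nodes.update((src, dst))
        successors.setdefault(src, []).append(dst)
        indegree[dst] = indegree.get(dst, 0) + 1

    # Start with the stages that have no upstream stage.
    ready = deque(sorted(n for n in nodes if indegree.get(n, 0) == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors.get(node, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


# Assumed wiring of the four stages (hypothetical, for illustration only).
EDGES = [("Compute1", "Compute2"), ("Compute1", "Compute3"),
         ("Compute2", "Compute4"), ("Compute3", "Compute4")]

print(topological_order(EDGES))
```

A stream that fed Compute4 back into Compute1 would leave no stage without an upstream stage, and the function would raise instead of returning an order.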

DAG Components

• Tuple
  ● Atomic data that flows over a stream

• Operator
  ● Basic compute unit per tuple

• Stream
  ● Connector abstraction between operators
  ● Tuples flow over this
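A toy model can make the three terms concrete. The Python sketch below is not the Apex API (which is Java); the class names `Stream` and `Operator` and the synchronous call chain are simplifications assumed for illustration. It shows tuples entering one operator and reaching the next over a stream.

```python
class Stream:
    """Connector between operators: delivers each tuple to every attached sink."""

    def __init__(self, name):
        self.name = name
        self.sinks = []

    def connect(self, operator):
        self.sinks.append(operator)

    def put(self, tup):
        for sink in self.sinks:
            sink.process(tup)


class Operator:
    """Basic compute unit: applies a function to each incoming tuple."""

    def __init__(self, name, fn, output=None):
        self.name = name
        self.fn = fn
        self.output = output  # optional downstream Stream

    def process(self, tup):
        result = self.fn(tup)
        if self.output is not None:
            self.output.put(result)


collected = []
stream = Stream("Operator1-to-Operator2")
operator2 = Operator("Operator2", collected.append)      # sink: stores what it receives
stream.connect(operator2)
operator1 = Operator("Operator1", lambda t: t * 10, output=stream)

for tup in (1, 2, 3):  # three tuples flow through the DAG one at a time
    operator1.process(tup)

print(collected)
```

In this sketch a tuple is handed to the next operator by a direct method call; in a deployed application the stream may cross process and machine boundaries.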

[Figure: Operator1 connected to Operator2 by a stream carrying tuple 1, tuple 2 and tuple 3]

DAG Types

• Logical Plan
  ● Logical representation of computation
  ● Defines operators, streams and dataflow

• Physical Plan
  ● Deployable plan on cluster
  ● Contains partition information of operators
  ● Has ready-to-deploy serialized operator instances

[Figure: Logical DAG with operators O1, O2, O3, O4 and O5]

[Figure: Physical DAG in which O1 and O2 are each split into three partitions (O1P1 to O1P3, O2P1 to O2P3), followed by a node labeled U and operators O3, O4 and O5]
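As a rough illustration of how a logical plan might expand into a physical plan, the Python sketch below partitions O1 and O2 three ways and merges their output through a node named U, following the labels in the figure. The linear O1 to O5 wiring, the one-to-one pairing of equal partition counts, the reading of U as a unifier, and the name `to_physical_plan` are all assumptions made for this example; it is not Apex's planner.

```python
def to_physical_plan(operators, streams, partitions):
    """Expand a logical plan into physical operator instances and streams.

    operators:  list of logical operator names
    streams:    list of (source, sink) logical connections
    partitions: dict mapping an operator name to its partition count (default 1)
    """
    instances = {}
    for op in operators:
        count = partitions.get(op, 1)
        if count == 1:
            instances[op] = [op]
        else:
            instances[op] = [f"{op}P{i}" for i in range(1, count + 1)]

    physical_streams = []
    for src, dst in streams:
        src_parts, dst_parts = instances[src], instances[dst]
        if len(src_parts) == len(dst_parts):
            # Same partition count: wire partition i to partition i.
            physical_streams += list(zip(src_parts, dst_parts))
        elif len(dst_parts) == 1:
            # Many-to-one: merge through a unifier placed before the sink.
            unifier = f"U({dst})"
            physical_streams += [(p, unifier) for p in src_parts]
            physical_streams.append((unifier, dst_parts[0]))
        else:
            # Fallback for this sketch: connect every source to every sink.
            physical_streams += [(s, d) for s in src_parts for d in dst_parts]
    return instances, physical_streams


OPERATORS = ["O1", "O2", "O3", "O4", "O5"]
STREAMS = [("O1", "O2"), ("O2", "O3"), ("O3", "O4"), ("O4", "O5")]  # assumed wiring

instances, physical_streams = to_physical_plan(OPERATORS, STREAMS, {"O1": 3, "O2": 3})
print(instances["O1"])
print(physical_streams)
```

The logical plan stays at five operators while the physical plan grows with the partition counts, which is why partition information belongs to the physical plan only.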

Apex as YARN application

[Figure: a generic YARN application shown next to an Apex application. In YARN, a YarnClient talks to the ResourceManager (AsM + Scheduler) over ClientRMProtocol, the AppMaster talks to the ResourceManager over AMRMProtocol, and YarnContainers on each node's NodeManager (NM) are managed over ContainerManagerProtocol. In Apex, DTCLI/StrAMClient is the YarnClient, StrAM is the AppMaster, and StrAMChild containers host the operators (O1 and O2 in one container, O3 in another).]

Application Components of Apex - StrAMClient

• Part of dtcli client interface
• Invoked by “launch” command of dtcli
• Tasks:
  ● Copy the required application package files into HDFS
  ● Validate the logical plan
  ● Serialize the logical plan to HDFS
  ● Launch the Application Master, i.e. StrAM
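The four client tasks can be sketched in order. This Python example is a stand-in only: a plain dict plays the role of HDFS, JSON replaces whatever serialization the real client uses, and the directory layout, the `launch` and `validate` functions and the returned request are hypothetical.

```python
import json


def validate(plan):
    """Reject plans whose streams reference operators that were never declared."""
    declared = set(plan["operators"])
    for src, dst in plan["streams"]:
        missing = {src, dst} - declared
        if missing:
            raise ValueError(
                f"stream {src}->{dst} uses undeclared operators: {sorted(missing)}")


def launch(app_id, package_files, plan, hdfs):
    """Mimic the four client-side tasks; `hdfs` is a dict standing in for HDFS."""
    base = f"/apps/{app_id}"                       # hypothetical directory layout
    for name, content in package_files.items():    # 1. copy package files
        hdfs[f"{base}/lib/{name}"] = content
    validate(plan)                                  # 2. validate the logical plan
    plan_path = f"{base}/logical-plan.json"
    hdfs[plan_path] = json.dumps(plan)              # 3. serialize the logical plan
    # 4. hand the application master the location of the plan
    return {"command": "start-app-master", "plan_path": plan_path}


hdfs = {}
plan = {"operators": ["O1", "O2"], "streams": [["O1", "O2"]]}
request = launch("app_0001", {"app.jar": b"jar-bytes"}, plan, hdfs)
print(sorted(hdfs))
print(request)
```

Because the plan is validated on the client, an invalid plan fails before any application master is started.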


Application Components of Apex - StrAM

• Streaming Application Master
• Started by StrAMClient on a YarnContainer
• Tasks:
  ● Convert logical plan to physical plan
  ● Serialize operators to HDFS
  ● Request resources from ResourceManager
  ● Start StrAMChild in YarnContainer(s)
  ● Monitor StrAMChild using ContainerManager protocol
  ● Generate application statistics
  ● Host results on WebService (dtManage)
  ● Fault tolerance
  ● Checkpointing/committing application states
  ● Support security
  ● Shut down application


Application Components of Apex - StrAMChild

• Deployed on YarnContainer
• Started by NodeManager as instructed by StrAM
• Instance of StreamingContainer
• Contains Operators (compute-related)
• Contains BufferServer (stream-related)
• Tasks:
  ● Regularly send heartbeat to StrAM
  ● Execute commands from StrAM
  ● Shutdown or kill self if instructed
  ● Manage lifecycle of an Operator
  ● Network communication using BufferServer
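Heartbeats let the application master notice a container that has stopped responding. The Python sketch below shows one way such tracking could work; the `HeartbeatMonitor` class, the container names and the 30-second timeout are assumptions for illustration, not values or types taken from Apex.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per container and reports the ones that went silent."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, container_id, now):
        # Called whenever a container reports in; `now` is a timestamp in seconds.
        self.last_seen[container_id] = now

    def silent_containers(self, now):
        # A container is considered silent once its last heartbeat is older than the timeout.
        return sorted(cid for cid, seen in self.last_seen.items()
                      if now - seen > self.timeout)


monitor = HeartbeatMonitor(timeout_seconds=30)   # illustrative timeout
monitor.heartbeat("container_1", now=0)
monitor.heartbeat("container_2", now=0)
monitor.heartbeat("container_1", now=25)          # container_2 stops reporting
print(monitor.silent_containers(now=40))
```

Timestamps are passed in explicitly so the example is deterministic; a running service would read a clock instead.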


Lifecycle of Apex/YARN Application - Start

[Figure: cluster diagram with DTCLI/StrAMClient (YarnClient), ResourceManager (AsM + Scheduler), three nodes each running a NodeManager (NM), HDFS, StrAM (AppMaster) and three StrAMChild containers hosting operators O1 to O4. The client reaches the ResourceManager over ClientRMProtocol, StrAM reaches it over AMRMProtocol, and containers are managed over ContainerManagerProtocol.]

1) Access cluster information
2) Copy files into HDFS
3) Submit application to RM
4) StrAM registers with RM
5) StrAM sends heartbeats regularly
6) StrAM requests containers with specifications
7) StrAMChild reads serialized operator from HDFS
8) StrAMChild starts operator lifecycle

Lifecycle of Apex/YARN Application - Running

[Figure: cluster diagram with DTCLI/StrAMClient (YarnClient), ResourceManager (AsM + Scheduler), NodeManagers, HDFS, StrAM (AppMaster) and StrAMChild containers hosting operators O1 to O4]

1) StrAMChild sends heartbeats
2) StrAMChild sends operator data
3) StrAM sends regular heartbeats to RM
4) Query status of application

Lifecycle of Apex/YARN Application - Shutdown

[Figure: cluster diagram with DTCLI/StrAMClient (YarnClient), ResourceManager (AsM + Scheduler), NodeManagers, HDFS, StrAM (AppMaster) and StrAMChild containers hosting operators O1 to O4]

1) Connect to WebService (REST API)
2) Send command to StrAM
3) Send shutdown signal to StrAMChild
4) StrAMChild finishes operator lifecycle
5) Check if all containers are freed
6) StrAM unregisters itself
7) StrAM exits
8) Check if application has shut down
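The ordering in steps 3 to 7 of the shutdown sequence matters: the application master only unregisters once every container has stopped. A minimal Python sketch of that ordering follows; the `Child` and `AppMaster` classes and the log messages are hypothetical and do not mirror Apex's classes.

```python
class Child:
    """Stand-in for a StrAMChild container."""

    def __init__(self, name):
        self.name = name
        self.running = True

    def shutdown(self, log):
        # Steps 3 and 4: receive the signal, finish the operator lifecycle, stop.
        log.append(f"{self.name}: operators torn down")
        self.running = False


class AppMaster:
    """Stand-in for StrAM."""

    def __init__(self, children):
        self.children = children
        self.registered = True

    def shutdown(self):
        log = []
        for child in self.children:
            child.shutdown(log)
        # Step 5: refuse to continue while any container is still running.
        if any(c.running for c in self.children):
            raise RuntimeError("containers still running")
        self.registered = False                 # step 6: unregister
        log.append("app master unregistered")
        log.append("app master exited")         # step 7: exit
        return log


children = [Child("StrAMChild-1"), Child("StrAMChild-2")]
master = AppMaster(children)
events = master.shutdown()
print(events)
```

A kill, by contrast, skips the operator teardown entirely: the ResourceManager removes the containers without giving them a chance to finish.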

Lifecycle of Apex/YARN Application - Kill

[Figure: cluster diagram with DTCLI/StrAMClient (YarnClient), ResourceManager (AsM + Scheduler), NodeManagers, HDFS, StrAM (AppMaster) and StrAMChild containers hosting operators O1 to O4]

1) Send kill-app command to YARN
2) RM kills all containers

Summary – Apex platform

• Enables YARN to be used for Streaming Applications

• Takes care of YARN-specific work

• Users can focus on the business logic defined in operators


Resources

• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
  o @ApacheApex; Follow - https://twitter.com/apacheapex
  o @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
  o https://www.datatorrent.com/product/startup-accelerator/

We Are Hiring

[email protected]

• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders