Next-Gen Decision Making in 2ms with Apache Apex

Next-Gen Decision Making in <2ms

Upload: datatorrent

Post on 16-Apr-2017


Category: Technology



TRANSCRIPT

Page 1: Next-Gen Decision Making in 2ms with Apache Apex

Next-Gen Decision Making in <2ms

Page 2: Next-Gen Decision Making in 2ms with Apache Apex

Page 3: Next-Gen Decision Making in 2ms with Apache Apex

VS.

Page 4: Next-Gen Decision Making in 2ms with Apache Apex

Page 5: Next-Gen Decision Making in 2ms with Apache Apex

Page 6: Next-Gen Decision Making in 2ms with Apache Apex

Page 7: Next-Gen Decision Making in 2ms with Apache Apex

X (predictor): Spend amount

Y (response): Likelihood of millionaire

Simple / Velocity / Advanced
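The slide names only a predictor and a response; it does not say which model relates them. As a rough illustration, the sketch below assumes a single-predictor logistic curve. The function name `likelihood` and both coefficients are hypothetical and are not taken from the deck.

```python
import math

# Hypothetical coefficients chosen only for illustration; the slides give no fitted model.
INTERCEPT = -6.0
SLOPE_PER_DOLLAR = 0.001


def likelihood(spend_amount: float) -> float:
    """Map the predictor X (spend amount) to a probability Y with a logistic curve."""
    z = INTERCEPT + SLOPE_PER_DOLLAR * spend_amount
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for spend in (100.0, 6000.0, 20000.0):
        print(f"spend={spend:>8.0f}  likelihood={likelihood(spend):.4f}")
```

With these made-up coefficients the curve crosses 0.5 at a spend of 6,000; a real model would be fitted to data and would likely use many more features.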

Page 8: Next-Gen Decision Making in 2ms with Apache Apex

Page 9: Next-Gen Decision Making in 2ms with Apache Apex

Page 10: Next-Gen Decision Making in 2ms with Apache Apex

Page 11: Next-Gen Decision Making in 2ms with Apache Apex

Page 12: Next-Gen Decision Making in 2ms with Apache Apex

Hard Metrics    Goal
Latency         < 40 ms; ideally < 16 ms
Throughput      2,000 events / second
Durability      No loss; every message gets exactly one response
Availability    99.5% uptime (downtime of 1.83 days / year); ideally 99.999% uptime (downtime of 5.26 minutes / year)
Scalability     Can add resources and still meet latency requirements
Integration     Transparently connected to existing systems – hardware, messaging, HDFS

Soft Metrics    Goal
Open Source     All components licensed as open source
Extensibility   Rules can be updated; model is regularly refreshed
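The downtime figures in the availability goal follow directly from the uptime percentages. A minimal sketch of that arithmetic, assuming an average year of 365.25 days (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, including leap years


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Minutes per year a service may be down at the given uptime percentage."""
    return (1.0 - uptime_percent / 100.0) * MINUTES_PER_YEAR


if __name__ == "__main__":
    days = downtime_minutes_per_year(99.5) / (24 * 60)
    print(f"99.5%   uptime -> {days:.2f} days of downtime per year")
    print(f"99.999% uptime -> {downtime_minutes_per_year(99.999):.2f} minutes of downtime per year")
```

Under that assumption the two goals come out at about 1.83 days and 5.26 minutes, matching the slide.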

Page 13: Next-Gen Decision Making in 2ms with Apache Apex

Page 14: Next-Gen Decision Making in 2ms with Apache Apex

Onyx

Page 15: Next-Gen Decision Making in 2ms with Apache Apex

Enterprise Readiness

Roadmap

Performance

Community

Page 16: Next-Gen Decision Making in 2ms with Apache Apex

Page 17: Next-Gen Decision Making in 2ms with Apache Apex

Page 18: Next-Gen Decision Making in 2ms with Apache Apex

Page 19: Next-Gen Decision Making in 2ms with Apache Apex

Page 20: Next-Gen Decision Making in 2ms with Apache Apex

Page 21: Next-Gen Decision Making in 2ms with Apache Apex

Page 22: Next-Gen Decision Making in 2ms with Apache Apex

YARN

Page 23: Next-Gen Decision Making in 2ms with Apache Apex

Page 24: Next-Gen Decision Making in 2ms with Apache Apex

Page 25: Next-Gen Decision Making in 2ms with Apache Apex

Page 26: Next-Gen Decision Making in 2ms with Apache Apex

Page 27: Next-Gen Decision Making in 2ms with Apache Apex

Performance

• Avg. 0.25 ms at 70k records/sec with 600 GB RAM

Thread local on ~54M events; percentiles in ms

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
70k/sec      54,126,122   0.19       1     1     1     2       2       5       6
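The deck does not say how the latency percentiles were computed. The sketch below shows one common approach, the nearest-rank method, applied to a list of per-event latencies in milliseconds. The function names and the sample data in the usage note are illustrative.

```python
import math
from typing import Dict, List, Sequence

# 90%, 95%, 99%, 99.9%, then four, five, and six nines, as in the slide's columns.
LEVELS = [90, 95, 99, 99.9, 99.99, 99.999, 99.9999]


def percentile(sorted_samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not sorted_samples:
        raise ValueError("no samples")
    # Round before ceil so floating-point noise cannot push the rank up by one.
    rank = math.ceil(round(pct * len(sorted_samples) / 100.0, 9))
    return sorted_samples[max(rank, 1) - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, object]:
    """Count, mean, and tail percentiles for a batch of latency measurements."""
    ordered = sorted(latencies_ms)
    return {
        "count": len(ordered),
        "avg_ms": sum(ordered) / len(ordered),
        "percentiles_ms": {level: percentile(ordered, level) for level in LEVELS},
    }
```

For example, `summarize([1] * 990 + [2] * 9 + [6])` reports 1 ms at the 99th percentile, 2 ms at 99.9%, and 6 ms at four nines. A few slow events dominate the far tail even when the average stays low.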

Page 28: Next-Gen Decision Making in 2ms with Apache Apex

Durability

• Two physically independent pipelines on the same cluster processing identical data
• For the same tuple, we find the best-case time between two pipelines
  – 39 records out of 5.2M exceeded 16 ms
  – 173 out of 5.2M exceeded 16 ms in one pipeline but succeeded in the other
• 99.99925% success rate – “Five Nines”
• Average latency of 0.0981 ms
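The best-case rule and the quoted success rate can be sketched as follows. The tuple ids and latencies are made-up sample data, the function names are illustrative, and 5.2M is treated as exactly 5,200,000 because the slide gives only the rounded figure.

```python
from typing import Dict, List

SLA_MS = 16.0  # latency threshold used on the slide


def best_case_latencies(pipeline_a: Dict[int, float], pipeline_b: Dict[int, float]) -> Dict[int, float]:
    """For each tuple id seen by both pipelines, keep the faster response time."""
    shared_ids = pipeline_a.keys() & pipeline_b.keys()
    return {tid: min(pipeline_a[tid], pipeline_b[tid]) for tid in shared_ids}


def missed_sla(best_case: Dict[int, float]) -> List[int]:
    """Tuple ids whose best-case latency still exceeds the threshold."""
    return sorted(tid for tid, ms in best_case.items() if ms > SLA_MS)


def success_rate(num_missed: int, num_total: int) -> float:
    return 1.0 - num_missed / num_total


if __name__ == "__main__":
    a = {1: 0.1, 2: 20.0, 3: 0.2, 4: 30.0}  # hypothetical latencies in ms
    b = {1: 0.3, 2: 0.1, 3: 0.2, 4: 18.0}
    print(missed_sla(best_case_latencies(a, b)))  # only tuple 4 is slow in both pipelines
    print(f"{success_rate(39, 5_200_000):.5%}")   # the slide's 39 misses out of ~5.2M
```

Tuple 2 is slow in one pipeline but is rescued by the other, which is the effect the 173-record line describes. Only tuples that are slow in both pipelines count against the success rate.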

Page 29: Next-Gen Decision Making in 2ms with Apache Apex

Page 30: Next-Gen Decision Making in 2ms with Apache Apex

Page 31: Next-Gen Decision Making in 2ms with Apache Apex

Appendix

Page 32: Next-Gen Decision Making in 2ms with Apache Apex

Streaming Technologies Evaluated

• Spark Streaming
• Samza
• Storm
• Feedzai
• Infosphere Streams
• Flink
• Ignite
• VoltDB
• Cassandra
• Apex

• Of all evaluated technologies, Apache Apex is the only technology that is ready to bring the decision making solution to production based on:
  – Maturity
  – Fault-tolerance
  – Enterprise-readiness
  – Performance

• Focus on open source
• Drive Roadmap
• Competitive Advantage for C1

Page 33: Next-Gen Decision Making in 2ms with Apache Apex

Stream Processing – Apache Storm

• An open-source, distributed, real-time computation system
  – Logical operators (spouts and bolts) form statically parallelizable topologies
  – Very high throughput of messages with very low latency
  – Can provide <10 ms end-to-end latency under normal operation
• Basic abstractions provide an at-least-once processing guarantee

Limitations
• Nimbus is a single point of failure
  – Rectified by Hortonworks, but not yet available to the public (no timeline for release)
• Upstream bolt/spout failure triggers re-compute on entire tree
  – Can only create parallel independent streams by having separate redundant topologies
• Bolts/spouts share a JVM
  – Hard to debug
• Failed tuples cannot be replayed quicker than 1 s
• No dynamic topologies
• Cannot add or remove applications without service interruption

Page 34: Next-Gen Decision Making in 2ms with Apache Apex

Stream Processing – Apache Flink

• An open-source, distributed, real-time computation system
  – Logical operators are compiled into a DAG of tasks executed by Task Managers
  – Supports streaming, micro-batch, batch compute
  – Supports aggregate operations on streams (reduce, join, groupBy)
  – Capable of <10 ms end-to-end latency with streaming under normal operation
• Can provide exactly-once processing guarantees

Limitations
• Failures trigger reset of ALL operators to last checkpoint
  – Depends on upstream message broker to track state
• Operators share a JVM
  – Failure in one brings down all tasks sharing that JVM
  – Hard to debug
• No dynamic topologies
• Young community, young product

Page 35: Next-Gen Decision Making in 2ms with Apache Apex

Stream Processing – Apache Apex

• An open-source, distributed, real-time computation system on YARN
• Apex is the core system powering DataTorrent, released under ASF
• Demonstrated high throughput with low latency running a next-generation C1 model (avg. 0.25 ms, max 2 ms, at 70k records/sec) with 600 GB RAM
• True YARN application developed from principles of Hadoop and YARN at Yahoo!
• Mature product (derived from proven solutions in Yahoo! Finance and Hadoop)
  – Built by team under Phu Hoang (CEO of DataTorrent, Head of Engineering at Yahoo) who built Hadoop
  – Amol (CTO of DataTorrent) led the team that built YARN
• DataTorrent (Apex) is executing on production clusters at Fortune 100 companies.

Page 36: Next-Gen Decision Making in 2ms with Apache Apex

Stream Processing – Apache Apex

Maturity
• Designed to process and manage global data for Yahoo! Finance
  – Primary focus is on stability, fault-tolerance and data management
  – Only OSS streaming technology considered that was designed explicitly for the financial world
    • Data or computation could never be lost or replicated
    • Architecture had to never go down
    • Goal was to make it rock-solid and enterprise-ready before worrying about performance
• Data flow across countries – perfect for use cases that require cross-cluster interaction

Enterprise Readiness
• Advanced support for:
  – Encryption, authentication, compression, administration, and monitoring
  – Deployment at scale in the cloud and on-prem – AWS, Google Cloud, Azure
• Integrates with a huge set of existing tools:
  – HDFS, Kafka, Cassandra, MongoDB, Redis, ElasticSearch, CouchDB, Splunk, etc.

Page 37: Next-Gen Decision Making in 2ms with Apache Apex

Apex Platform – Summary

• Apex Architecture
  – Networks of physically independent, parallelizable operators that scale dynamically
  – Dynamic topology modification and deployment
  – Self-healing, fault tolerant, & recoverable
    • Durable messaging queues between operators, check-pointed in memory and on disk
    • Resource manager is a replicated YARN process, monitors and restarts downed operators
  – No single point of failure, highly modular design
  – Can specify locality of execution (avoids network and inter-process latency)
• Guarantees at-least-once, at-most-once, or exactly-once processing

[Figure: Directed Acyclic Graph (DAG). Operators are connected by streams that carry tuples; labeled streams are Filtered Stream, Enriched Stream, and Output Stream.]
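The DAG figure shows tuples flowing through operators that produce filtered, enriched, and output streams. The sketch below imitates that shape with plain Python generators. It does not use the Apex API, and the record fields (`account`, `amount`, `segment`) and the lookup table are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def filter_operator(stream: Iterable[Record], keep: Callable[[Record], bool]) -> Iterator[Record]:
    """Pass through only the tuples that satisfy the predicate (the filtered stream)."""
    for record in stream:
        if keep(record):
            yield record


def enrich_operator(stream: Iterable[Record], segments: Dict[str, str]) -> Iterator[Record]:
    """Attach a looked-up attribute to each tuple (the enriched stream)."""
    for record in stream:
        yield {**record, "segment": segments.get(str(record["account"]), "unknown")}


def output_operator(stream: Iterable[Record]) -> List[Record]:
    """Terminal operator: collect the output stream."""
    return list(stream)


def run_dag(source: Iterable[Record]) -> List[Record]:
    """Wire the operators in a straight line: source -> filter -> enrich -> output."""
    segments = {"acct-1": "gold"}  # hypothetical enrichment table
    filtered = filter_operator(source, lambda r: float(r["amount"]) > 0)
    enriched = enrich_operator(filtered, segments)
    return output_operator(enriched)
```

Each operator sees only the stream handed to it by its predecessor. A real deployment would run the operators in separate containers or threads rather than in one Python process.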

Page 38: Next-Gen Decision Making in 2ms with Apache Apex


Apex Platform – Overview

Page 39: Next-Gen Decision Making in 2ms with Apache Apex


Apex Platform – Malhar

Page 40: Next-Gen Decision Making in 2ms with Apache Apex

Apex Platform – Cluster View

[Figure: cluster view. A Hadoop edge node runs the DT RTS Management Server, which is reached through the CLI and a REST API. One Hadoop node hosts a YARN container running the RTS App Master. The other Hadoop nodes host YARN containers that act as streaming containers, each running operators (Op1, Op2, Op3) on threads Thread1 through Thread-N. A legend marks the components that are part of the Community Edition.]

Page 41: Next-Gen Decision Making in 2ms with Apache Apex

Apex Platform – Operators

• Operators can be dynamically scaled
• Flexible stream configuration
  – Parallel Redis / HDHT DAGs
  – Separate visualization DAG
• Parallel partitioning
  – Durability of data
  – Scalability
  – Organization for in-memory store
• Unifiers
  – Combine statistics from physical partitions
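Partitioning and unifying can be illustrated with a small sketch. Each physical partition keeps partial statistics over its share of the tuples, and a unifier merges those partials into one result. This is plain Python, not the Apex operator API; the round-robin split and the `PartialStats` fields are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PartialStats:
    """Statistics that can be merged without revisiting the raw tuples."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)


def partition_round_robin(values: List[float], num_partitions: int) -> List[List[float]]:
    """Spread tuples across physical partitions (one possible scheme)."""
    partitions: List[List[float]] = [[] for _ in range(num_partitions)]
    for index, value in enumerate(values):
        partitions[index % num_partitions].append(value)
    return partitions


def compute_partial(values: Iterable[float]) -> PartialStats:
    """Work done independently inside one partition."""
    stats = PartialStats()
    for value in values:
        stats.add(value)
    return stats


def unify(partials: Iterable[PartialStats]) -> PartialStats:
    """Unifier: combine the per-partition statistics into a single result."""
    merged = PartialStats()
    for partial in partials:
        merged.count += partial.count
        merged.total += partial.total
        merged.maximum = max(merged.maximum, partial.maximum)
    return merged
```

The design point is that count, sum, and maximum are mergeable, so the unifier never needs the original tuples. A statistic such as an exact median would need a different strategy.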

Page 42: Next-Gen Decision Making in 2ms with Apache Apex


Dynamic Topology Modification

• Can redeploy new operators and models at run-time!

• Can reconfigure settings on the fly

Page 43: Next-Gen Decision Making in 2ms with Apache Apex

Apex Platform – Failure Recovery

• Physical independence of partitions is critical
• Redundant STRAMs
• Configurable window size and heartbeat for low-latency recovery
• Downstream failures do not affect upstream components
  – Snapshotting only depends on previous operator, not all previous operators
  – Can deploy parallel DAGs with same point of origin (simpler from a hardware and deployment perspective)
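The claim that recovery depends only on the previous operator can be pictured with a simplified model. A stateful operator is snapshotted periodically, and its immediate upstream neighbor buffers whatever it has emitted since that snapshot. After a failure, a replacement operator restores the snapshot and replays the buffer. This sketch assumes that model rather than reproducing Apex's actual mechanism, and all class names are hypothetical.

```python
import copy
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (key, amount), illustrative tuple shape


class RunningTotal:
    """Stateful operator that keeps a running sum per key."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}

    def process(self, key: str, amount: int) -> None:
        self.totals[key] = self.totals.get(key, 0) + amount

    def snapshot(self) -> Dict[str, int]:
        return copy.deepcopy(self.totals)

    def restore(self, saved: Dict[str, int]) -> None:
        self.totals = copy.deepcopy(saved)


class UpstreamBuffer:
    """Immediate upstream neighbor: remembers tuples sent since the last checkpoint."""

    def __init__(self) -> None:
        self.pending: List[Event] = []

    def emit(self, event: Event, downstream: RunningTotal) -> None:
        self.pending.append(event)
        downstream.process(*event)

    def on_checkpoint(self) -> None:
        self.pending.clear()  # tuples before the checkpoint are now covered by the snapshot

    def replay(self, downstream: RunningTotal) -> None:
        for event in self.pending:
            downstream.process(*event)


def demo() -> Tuple[Dict[str, int], Dict[str, int]]:
    operator, upstream = RunningTotal(), UpstreamBuffer()
    upstream.emit(("a", 10), operator)
    upstream.emit(("b", 5), operator)
    checkpoint = operator.snapshot()
    upstream.on_checkpoint()
    upstream.emit(("a", 7), operator)

    # Simulated failure: a fresh instance restores the snapshot and replays the buffer.
    recovered = RunningTotal()
    recovered.restore(checkpoint)
    upstream.replay(recovered)
    return operator.totals, recovered.totals
```

In `demo()` the recovered operator ends with the same totals as the original, and nothing further upstream than the immediate buffer was involved.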

Page 44: Next-Gen Decision Making in 2ms with Apache Apex


Apex Platform – Windowing

• Sliding window and tumbling window

• Window based on checkpoint

• No artificial latency

• Used for stats measurement
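The difference between tumbling and sliding windows can be shown with a short sketch. It uses count-based windows over a list for simplicity, whereas a streaming engine would normally window by time; the function names are illustrative.

```python
from typing import List, Sequence


def tumbling_windows(events: Sequence[int], size: int) -> List[List[int]]:
    """Non-overlapping windows: every event lands in exactly one window."""
    return [list(events[i:i + size]) for i in range(0, len(events), size)]


def sliding_windows(events: Sequence[int], size: int, step: int) -> List[List[int]]:
    """Overlapping windows: each window starts `step` events after the previous one."""
    return [list(events[i:i + size]) for i in range(0, len(events) - size + 1, step)]


if __name__ == "__main__":
    print(tumbling_windows([1, 2, 3, 4, 5, 6], size=2))
    print(sliding_windows([1, 2, 3, 4, 5], size=3, step=1))
```

A statistic such as an average latency would be computed once per window. With sliding windows the same event contributes to several consecutive results.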

Page 45: Next-Gen Decision Making in 2ms with Apache Apex

Enterprise Readiness

• Apex
  – Great UI to monitor, debug, and control system performance
  – Fault-tolerance and recovery out of the box; no additional setup or improvement needed
    • YARN is still a single point of failure; a name node failure can still impact the system
  – Built-in support for dynamic and automatic scaling to handle larger throughputs
  – Native integration with Hadoop, YARN, and Kafka – next-gen standard at C1
  – Mature product
    • Apex is derived from the principles of Hadoop and YARN over the course of many years
    • Built and planned by chief Hadoop architects
  – Proven performance in production at Fortune 100 companies

Page 46: Next-Gen Decision Making in 2ms with Apache Apex

Enterprise Readiness

• Storm
  – Widely used but abandoned by creators at Twitter for Heron in production
    • Storm debuggability: topology components are bundled in one process
    • Resource demands
      – Need dedicated hardware
      – Can’t scale on demand or share usage
    • Topology creation/tear-down is expensive; topologies can’t share cluster resources
      – Have to manually isolate & de-commission machines
  – Performance in failure scenarios is insufficient for this use case
• Flink
  – Operational performance has not been proven
    • Only one company (ResearchGate) officially uses Flink in production
  – Architecture shares fundamental limitations of Storm with regard to dynamically scaling operators & topologies and debuggability
  – Performance in failure scenarios is insufficient for this use case

Page 47: Next-Gen Decision Making in 2ms with Apache Apex

Performance

• Storm
  – Meets latency and throughput requirements only when no failures occur.
  – Resilience to failures only possible by running fully independent clusters
  – Difficult to debug and operationalize complex systems (due to shared JVM and poor resource management)
• Flink
  – Broader toolset than Storm or Apex – ML, batch processing, and SQL-like queries
  – Meets latency and throughput requirements only when no failures occur.
  – Failures reset ALL operators back to the source – resilience only possible across clusters
  – Difficult to debug and operationalize complex systems (due to shared JVM)
• Apex
  – Supports redundant parallel pipelines within the same cluster
  – Outstanding latency and throughput even in failure scenarios
  – Self-healing independent operators (simple to isolate failures)
  – Only framework to provide fine-grained control over data and compute locality

Page 48: Next-Gen Decision Making in 2ms with Apache Apex

Roadmap – Storm

• Commercial support from Hortonworks but limited code contributions
• Twitter - Storm’s largest user - has completely abandoned Storm for Heron
• Business Continuity
  – Enhance Storm’s enterprise readiness with high availability (HA) and failover to standby clusters
  – Eliminate Nimbus as a single point of failure
• Operations
  – Apache Ambari support for Nimbus HA node setup
  – Elastic topologies via YARN and Apache Slider.
  – Incremental improvements to Storm UI to easily deploy, manage and monitor topologies.
• Enterprise readiness
  – Declarative writing of spouts, bolts, and data-sources into topologies

Page 49: Next-Gen Decision Making in 2ms with Apache Apex


Roadmap – Flink

• Fine-grained fault tolerance (avoid rollback to data source) – Q2 2015

• SQL on Flink – Q3/Q4 2015

• Integrate with distributed memory storage – No ECD

• Use off-heap memory – Q1 2015

• Integration with Samoa, Tez, Mahout DSL – No ECD

Page 50: Next-Gen Decision Making in 2ms with Apache Apex

Roadmap – Apex

• Roadmap for next 6 months
• Support creation of reusable pluggable modules (topologies)
• Add additional operators to connect to existing technology
  – Databases
  – Messaging
  – Modeling systems
• Add additional SQL-like operations
  – Join
  – Filter
  – GroupBy
  – Caching
• Add ability to create cycles in graph
  – Allows re-use of data for ML algorithms (similar to Spark’s caching)

Page 51: Next-Gen Decision Making in 2ms with Apache Apex

Road Map Comparison

• Storm
  – Roadmap is intended to bring Storm to enterprise readiness
    • Storm is not enterprise ready today according to Hortonworks
• Flink
  – Roadmap brings Flink up to par with Spark and Apex, does not create new capabilities relative to either
  – Spark is more mature for batch-processing and micro-batch and Apex is more mature from a streaming standpoint.
• Apex
  – No need to improve core architecture, focus is instead on adding functionality
    • Better support for ML
    • Better support for wide variety of business use cases
    • Better integration with existing tools
  – Stated commitment to letting the community dictate direction. From incubator proposal:
    • “DataTorrent plans to develop new functionality in an open, community-driven way”

Page 52: Next-Gen Decision Making in 2ms with Apache Apex

Community

• Vendor and community involvement drive roadmap and project growth
• Storm
  – Limited improvements to core components of Storm in recent months
  – Limited number of focused and active committers
  – Actively promoted and supported in public by Hortonworks
• Flink
  – Some adoption in Europe, growing response in U.S.
  – 11 active committers, 10 are from Data Artisans (company behind Flink)
  – Community is very young, but there is substantial interest
• Apex
  – Wide support network around Apex due to its evolution from Hadoop and YARN
  – Young but actively growing community: http://incubator.apache.org/projects/apex.html
  – Opportunity for C1 to drive growth and define the direction of this product

Page 53: Next-Gen Decision Making in 2ms with Apache Apex

Streaming Solutions Comparison

• Apex
  – Ideal for this use case, meets all performance requirements and is ready for out-of-the-box enterprise deployment
  – Committer status from C1 allows us to collaboratively drive roadmap and product evolution to fit our business need.
• Storm
  – Great for many streaming use cases but not the right fit for this effort
  – Performance in failure scenarios does not meet our requirements
  – Community involvement is waning and there is a limited road map for substantial product growth
• Flink
  – Poised to compete with Spark in the future based on community activity and roadmap
  – Not ready for enterprise deployment:
    • Technical limitations around fault-tolerance and failure recovery
    • Lack of broad community involvement
    • Roadmap only brings it up to par with existing frameworks

Page 54: Next-Gen Decision Making in 2ms with Apache Apex

New Capabilities Provided by Proposed Architecture

• Millisecond Level Streaming Solution
• Fault Tolerant & Highly Available
• Parallel Model Scoring for Arbitrary Number of Models
• Quick Model Generation & Execution
• Dynamic Scalability based on Latency or Throughput
• Live Model Refresh
• A/B Testing of Models in Production
• System is Self Healing upon failure of components (**)

Page 55: Next-Gen Decision Making in 2ms with Apache Apex

Decisioning System Architecture - Strengths

• Internal
  – Capital One software, running on Capital One hardware, designed by Capital One
• Open source
  – Internally maintainable code
• Living Model
  – Can be re-trained on current data & updated in minutes, not years
  – Offline models can be expanded, re-developed, and deployed to production at will
• Extensible
  – Modular architecture with swappable components
• A/B Model Testing in Production
• Dynamic Deployment / Refresh of Models

Page 56: Next-Gen Decision Making in 2ms with Apache Apex

Hardware

MDC Hardware Specifications
• Server Quantity – 15
• Server Model – Supermicro
• CPU – Intel Xeon E5-2695v2 2.4 GHz, 12 cores
• Memory – 256 GB
• HDD – (5) 4 TB Seagate SATA
• Network Switch – Cisco Nexus 6001 10GB
• NIC – 2-port SFP+ 10GbE

MDC Software Specifications
• Hadoop – v2.6.0
• YARN – v2.6.0
• Apache Apex – v3.0
• Linux OS – RHEL v6.7
• Linux OS Kernel – 2.6.32-573.7.1.el6.x86_64

Page 57: Next-Gen Decision Making in 2ms with Apache Apex

Performance Comparison - Redis vs. Apex-HDHT

Apex-HDHT, thread local, on ~2M events (percentiles in ms)

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
70k/sec      1,807,283    0.253      1     1     1     2       2       2       2

Apex-HDHT, thread local, on ~54M events (percentiles in ms)

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
70k/sec      54,126,122   0.19       1     1     1     2       2       5       6

Apex-HDHT, no locality, on ~2M events (percentiles in ms)

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
40k/sec      2,214,777    51.651     98    126   381   489     494     495     495

Redis, thread local, on ~2M events (percentiles in ms)

Throughput   Count        Avg (ms)   90%   95%   99%   99.9%   4 9’s   5 9’s   6 9’s
8.5k/sec     2,018,057    13.654     16    18    20    21      22      22      22
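To put the Redis and Apex-HDHT thread-local runs side by side, the ratios can be computed directly from the reported figures. This small sketch uses only the throughput and average-latency values from the ~2M-event runs; the dictionary layout and function name are illustrative.

```python
from typing import Dict

# Figures copied from the ~2M-event thread-local runs.
RUNS: Dict[str, Dict[str, float]] = {
    "apex_hdht_thread_local": {"throughput_per_sec": 70_000, "avg_ms": 0.253},
    "redis_thread_local": {"throughput_per_sec": 8_500, "avg_ms": 13.654},
}


def ratio(numerator: float, denominator: float) -> float:
    return numerator / denominator


throughput_gain = ratio(
    RUNS["apex_hdht_thread_local"]["throughput_per_sec"],
    RUNS["redis_thread_local"]["throughput_per_sec"],
)
latency_gain = ratio(
    RUNS["redis_thread_local"]["avg_ms"],
    RUNS["apex_hdht_thread_local"]["avg_ms"],
)

if __name__ == "__main__":
    print(f"throughput: {throughput_gain:.1f}x higher with Apex-HDHT")
    print(f"average latency: {latency_gain:.0f}x lower with Apex-HDHT")
```

On these numbers Apex-HDHT sustains roughly 8 times the throughput at roughly 54 times lower average latency. The two stores were measured at different throughputs, so the comparison is indicative rather than controlled.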