Building Robust, Adaptive Streaming Apps with Spark Streaming

Post on 16-Apr-2017

TRANSCRIPT

  • Building Robust, Adaptive Streaming Apps with Spark Streaming

    Tathagata "TD" Das (@tathadas)

    Spark Summit East 2016

  • Who am I?

    Project Management Committee (PMC) member of Spark

    Started Spark Streaming in grad school - AMPLab, UC Berkeley

    Current technical lead of Spark Streaming

    Software engineer at Databricks

  • Streaming Apps in the Real World

    Building a high-volume stream processing system in production has many challenges

    Fast and Scalable: Spark Streaming is fast, distributed and scalable by design! Running in production at many places with large clusters and high data volumes

    Easy to program: Spark Streaming makes it easy to express complex streaming business logic. Interoperates with Spark RDDs, Spark SQL DataFrames/Datasets and MLlib

    Fault-tolerant: Spark Streaming is fully fault-tolerant and can provide end-to-end semantic guarantees. See my previous Spark Summit talk for more details

    Adaptive: Focus of this talk

  • Adaptive Streaming Apps

    Processing conditions can change dynamically:

      Sudden surges in data rates
      Diurnal variations in processing load
      Unexpected slowdowns in downstream data stores

    Streaming apps should be able to adapt accordingly

  • Backpressure: make apps robust against data surges

    Elastic Scaling: make apps scale with load variations

  • Backpressure: make apps robust against data surges

  • Motivation

    Stability condition for any streaming app: receive data only as fast as the system can process it

    Stability condition for Spark Streaming's micro-batch model: finish processing the previous batch before the next one arrives

  • Stable micro-batch operation

    Spark Streaming runs micro-batches at fixed batch intervals

    [Figure: timeline from 0s to 8s with ticks every 2s; batch interval marked; each batch takes 1s to process; batch processing time is stable]

  • Unstable micro-batch operation

    Spark Streaming runs micro-batches at fixed batch intervals

    [Figure: timeline from 0s to 8s with ticks every 2s; each batch takes 2.1s to process; batch processing time > batch interval, so scheduling delay builds up]

    Batches continuously get delayed and backlogged => unstable
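
    The two timelines above can be reproduced with a small simulation. This is only a sketch: it assumes a 2s batch interval (inferred from the 2s axis ticks, not stated on the slide), and the function name scheduling_delays is hypothetical, not part of Spark.

```python
def scheduling_delays(batch_interval, processing_time, num_batches):
    """Return the scheduling delay (in seconds) of each micro-batch.

    Batch i is generated at i * batch_interval, but it can only start
    running once the previous batch has finished processing.
    """
    delays = []
    previous_finish = 0.0
    for i in range(num_batches):
        generated_at = i * batch_interval
        start = max(generated_at, previous_finish)
        # Round to hide floating-point noise in the printed values.
        delays.append(round(start - generated_at, 6))
        previous_finish = start + processing_time
    return delays


# Assumed batch interval of 2s, with the processing times from the slides.
stable = scheduling_delays(batch_interval=2.0, processing_time=1.0, num_batches=5)
unstable = scheduling_delays(batch_interval=2.0, processing_time=2.1, num_batches=5)
print("stable:  ", stable)
print("unstable:", unstable)
```

    In the stable case every delay is zero; in the unstable case each batch starts 0.1s later than the one before it, so the backlog grows without bound.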

  • Backpressure: Feedback Loop

    Backpressure introduces a feedback loop to dynamically adapt the system and avoid instability

  • Backpressure: Dynamic rate limiting

    Batch processing times and scheduling delays are used to continuously estimate current processing rates

    Max stable processing rate is estimated with PID controller theory, a well-known feedback mechanism used in industrial control systems

    Accordingly, the system dynamically adapts the limits on the data ingestion rates

    [Figure: feedback loop in which batch processing times and scheduling delays feed a PID controller, which sets rate limits on ingestion]
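
    To make the feedback loop concrete, here is a minimal sketch of a PID-style rate estimator. It is an illustration under assumptions, not Spark's actual implementation: the class name, method name, gain values (kp, ki, kd) and minimum rate are all hypothetical choices made for this example.

```python
class PidRateEstimator:
    """Toy PID-style estimator of a stable ingestion rate (records/sec)."""

    def __init__(self, batch_interval_s, kp=1.0, ki=0.2, kd=0.0, min_rate=100.0):
        self.batch_interval_s = batch_interval_s
        self.kp, self.ki, self.kd = kp, ki, kd  # illustrative gains
        self.min_rate = min_rate
        self.latest_rate = None   # last rate limit we issued
        self.latest_error = 0.0
        self.latest_time = None

    def update(self, time_s, num_records, processing_time_s, scheduling_delay_s):
        """Feed in one completed batch and return the new rate limit."""
        processing_rate = num_records / processing_time_s
        if self.latest_rate is None:
            # First batch: no history yet, so start from the observed rate.
            new_rate, error = processing_rate, 0.0
        else:
            # Proportional term: how far the allowed rate is above what
            # the system actually managed to process.
            error = self.latest_rate - processing_rate
            # Integral term: backlog implied by the scheduling delay,
            # spread over one batch interval.
            backlog_error = scheduling_delay_s * processing_rate / self.batch_interval_s
            # Derivative term: how quickly the error is changing.
            d_error = (error - self.latest_error) / (time_s - self.latest_time)
            new_rate = max(
                self.latest_rate
                - self.kp * error
                - self.ki * backlog_error
                - self.kd * d_error,
                self.min_rate,
            )
        self.latest_rate, self.latest_error, self.latest_time = new_rate, error, time_s
        return new_rate


estimator = PidRateEstimator(batch_interval_s=2.0)
# Healthy batch: 20,000 records processed in 1s.
first = estimator.update(time_s=2.0, num_records=20000,
                         processing_time_s=1.0, scheduling_delay_s=0.0)
# Slow batch: processing takes 4s and a 1s scheduling delay has built up.
second = estimator.update(time_s=4.0, num_records=40000,
                          processing_time_s=4.0, scheduling_delay_s=1.0)
print(first, second)
```

    When the second batch is slow and delayed, the estimator returns a lower rate limit than before, which is the behaviour the slides describe: ingestion is throttled until processing catches up.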

  • Backpressure: Dynamic rate limiting

    If HDFS ingestion slows down, processing times increase

    Spark Streaming lowers rate limits to slow down receiving from Kafka

    Data stays buffered in Kafka, ensuring Spark Streaming stays stable

  • Backpressure: Configuration

    Available since Spark 1.5

    Enabled through SparkConf: set spark.streaming.backpressure.enabled = true
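
    One way to pass this setting is as a --conf argument to spark-submit. The sketch below renders a dictionary of settings into such arguments; the helper to_spark_submit_args is a hypothetical name introduced for this example, and only the configuration key itself comes from the slide.

```python
def to_spark_submit_args(conf):
    """Render a dict of Spark settings as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        if isinstance(value, bool):
            # Spark expects lowercase "true"/"false" for boolean settings.
            value = str(value).lower()
        args += ["--conf", f"{key}={value}"]
    return args


streaming_conf = {"spark.streaming.backpressure.enabled": True}
print(" ".join(to_spark_submit_args(streaming_conf)))
```

    The same key/value pair can equally be set on a SparkConf object in application code before the streaming context is created.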

  • Elastic Scaling: make apps scale with load variations

  • Elastic Scaling (aka Dynamic Allocation)

    Scaling the number of Spark executors according to the load

    Spark already supports Dynamic Allocation for batch jobs:

      Scale down if executors are idle
      Scale up if tasks are queueing up

    Streaming micro-batch jobs need a different scaling policy: no executor is idle for a long time!

  • Scaling policy with Streaming

    Load is very low: lots of idle time between batches

    Cluster resources are wasted

    Stable but inefficient

  • Scaling policy with Streaming

    Scale down the cluster

    Scaling down increases batch processing times: stable operation with fewer resources wasted

    Stable but inefficient => stable and efficient

  • Scaling policy with Streaming

    Scale up the cluster

    Scaling up reduces batch processing times: an unstable system becomes stable again

    Unstable => stable
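
    The scaling policy in the last three slides can be sketched as a decision on the ratio of batch processing time to batch interval. This is an illustration only: the function name and the two threshold values are hypothetical and are not taken from the talk.

```python
def scaling_decision(batch_processing_times, batch_interval,
                     up_ratio=0.9, down_ratio=0.3):
    """Decide how to resize the cluster from recent batch processing times.

    up_ratio and down_ratio are illustrative thresholds on the ratio
    (average processing time) / (batch interval).
    """
    average = sum(batch_processing_times) / len(batch_processing_times)
    ratio = average / batch_interval
    if ratio >= up_ratio:
        # Batches barely finish (or do not finish) in time: add executors.
        return "scale up"
    if ratio <= down_ratio:
        # Lots of idle time between batches: release executors.
        return "scale down"
    return "no change"


print(scaling_decision([0.4, 0.5, 0.6], batch_interval=2.0))  # mostly idle
print(scaling_decision([2.1, 2.1], batch_interval=2.0))       # falling behind
print(scaling_decision([1.0, 1.2], batch_interval=2.0))       # comfortable
```

    Unlike the batch-job policy, this never looks at whether an executor is idle at a given instant; it only looks at how much of each batch interval is spent processing.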

  • Elastic Scaling

    If Kafka gets data faster than backpressure allows, Spark Streaming scales up the cluster to increase the processing rate

    Data buffered in Kafka starts draining, allowing the app to adapt to any data rate

  • Elastic Scaling: Configuration

    Will be available in Spark 2.0

    Enabled through SparkConf: set spark.streaming.dynamicAllocation.enabled = true

    More parameters will be in the online programming guide

  • Elastic Scaling: Configuration

    Make sure there is enough parallelism to take advantage of max cluster size:

      # of partitions in reduce, join, etc.
      # of Kafka partitions
      # of receivers

    Gives the usual fault-tolerance guarantees with files, Kafka Direct, Kinesis, and receiver-based sources with WAL enabled
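
    A quick sanity check for the parallelism point might look like the sketch below. It assumes each partition becomes one task and each executor core runs one task at a time; the function name, the dictionary keys and the numbers are hypothetical. Receivers are left out because they occupy cores rather than produce per-batch tasks.

```python
def stages_limiting_scale_up(task_counts, max_executors, cores_per_executor):
    """Return the stages that cannot keep a maximum-size cluster busy.

    task_counts maps a stage name to its number of tasks per batch
    (for example, the number of Kafka or reduce partitions).
    """
    max_slots = max_executors * cores_per_executor
    return {name: tasks for name, tasks in task_counts.items()
            if tasks < max_slots}


# Hypothetical app: 8 Kafka partitions feeding a 64-partition reduce.
task_counts = {"kafka_partitions": 8, "reduce_partitions": 64}
limited = stages_limiting_scale_up(task_counts, max_executors=10, cores_per_executor=4)
print(limited)
```

    Here the reading stage has only 8 tasks for 40 available cores, so scaling up to the maximum cluster size would not speed that stage up.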

  • Backpressure, Elastic Scaling

  • Awesome Adaptive Apps

  • Demo

    How the app behaves when the data rate suddenly increases 20x

  • Demo

    Processing time increases with data rate until it equals the batch interval

    Backpressure limits the ingestion rate to below 20k recs/sec to keep the app stable

  • Demo

    Elastic Scaling detects heavy load and increases cluster size

    Processing times drop as more resources become available

  • Demo

    Backpressure relaxes limits to allow a higher ingestion rate

    But still less than 20x, as the cluster is fully utilized

  • Takeaways

    Backpressure: makes apps robust to sudden changes

    Elastic Scaling: makes apps adapt to slower changes

    Backpressure + Elastic Scaling = Awesome Adaptive Apps

    Follow me @tathadas