Massively-Parallel Stream Processing Under QoS Constraints with Nephele

Björn Lohrmann, Daniel Warneke, and Odej Kao

Technische Universität Berlin

Uploaded by technische-universitaet-berlin on 13-Jan-2015


DESCRIPTION

Today, a growing number of commodity devices, like mobile phones or smart meters, are equipped with rich sensors and capable of producing continuous data streams. The sheer number of these devices and the resulting overall data volumes of the streams raise new challenges with respect to the scalability of existing stream processing systems. At the same time, massively-parallel data processing systems like MapReduce have proven that they scale to large numbers of nodes and efficiently organize data transfers between them. Many of these systems also provide streaming capabilities. However, unlike traditional stream processors, these systems have so far disregarded the QoS requirements of prospective stream processing applications. We intend to address this gap. First, we analyze common design principles of today's parallel data processing frameworks and identify those principles that provide degrees of freedom in trading off the QoS goals of latency and throughput. Second, we propose a scheme which allows these frameworks to detect violations of user-defined latency constraints and to optimize the job execution without manual interaction, in order to meet these constraints while keeping the throughput as high as possible. As a proof of concept, we implemented our approach for our parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online. For a multimedia streaming application, we demonstrate an improvement in processing latency by a factor of at least 15 while preserving high data throughput when needed. The Stratosphere Streaming Distribution, which implements the researched techniques, is available as open source on GitHub: https://github.com/bjoernlohrmann/stratosphere

More about me: http://www.cit.tu-berlin.de/menue/personen/lohrmann_bjoern/parameter/en/

TRANSCRIPT

Page 1: Massively-Parallel Stream Processing Under QoS Constraints with Nephele

Massively-Parallel Stream Processing Under QoS Constraints with Nephele

Björn Lohrmann, Daniel Warneke, and Odej Kao

Technische Universität Berlin

Page 2

Background

22.06.2012 | 2 | Massively-Parallel Stream Processing Under QoS Constraints with Nephele

Nephele is part of the Stratosphere platform for massively-parallel data processing

[Figure: Stratosphere stack with the layers PACTs (example program with the operators in, in, map, red, match, out), Compiler, Nephele runtime, and Cloud/Cluster]

Open Source, downloadable at stratosphere.eu

Page 3

Background

Nephele and PACTs currently focus on batch-job workloads

-to-

What about streaming workloads?

Possible with Nephele, but (as of now) not with PACTs

Streaming workloads may have different goals:

Meet pipeline latency and throughput requirements

Maximize/minimize other custom metrics

Page 4

Motivation

Live processing of streamed data is an important issue

Proliferation of mobile devices capable of producing streamed data (video, audio, other sensors)

Large-scale deployments of sensors in science and industry

Examples: smart grids, traffic monitoring, astronomy

Why not adapt today's massively-parallel frameworks?

Page 5

Goals

Identify major aspects of massively-parallel frameworks that affect QoS goals

Find general strategies to deal with QoS goals

Implement and evaluate them using the Nephele execution engine

Page 6

Agenda

1. Highlight common massively-parallel framework design principles

2. Explain implications for streamed workloads

3. Meeting latency requirements in Nephele

4. Experimental Results

Page 7

Framework Design Principles

[Figure: pipelined tasks (Task n, Task n+1) spread across Compute Nodes X, Y, and Z; labeled elements: input buffer queue, thread/process, output buffer, data item]
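The figure's central mechanism, data items collected in a per-channel output buffer before being shipped to the next task, can be sketched as follows. This is a minimal illustration under stated assumptions, not Nephele's code: the class name `OutputBuffer` is hypothetical, and a Python `queue.Queue` stands in for the network or in-memory channel between two tasks.

```python
import queue
from typing import List


class OutputBuffer:
    """Hypothetical per-channel output buffer: serialized data items are
    collected and handed to the channel only once the buffer is full."""

    def __init__(self, capacity_bytes: int, channel: "queue.Queue[bytes]") -> None:
        self.capacity = capacity_bytes
        self.channel = channel
        self._items: List[bytes] = []
        self._size = 0

    def write(self, item: bytes) -> None:
        # Every item waits here until enough bytes have accumulated.
        self._items.append(item)
        self._size += len(item)
        if self._size >= self.capacity:
            self.flush()

    def flush(self) -> None:
        # Ship all buffered items to the consumer as one buffer.
        if self._items:
            self.channel.put(b"".join(self._items))
            self._items.clear()
            self._size = 0


if __name__ == "__main__":
    channel: "queue.Queue[bytes]" = queue.Queue()
    buf = OutputBuffer(capacity_bytes=8, channel=channel)
    for item in (b"abc", b"def", b"gh"):
        buf.write(item)
    print(channel.get_nowait())  # b'abcdefgh'
```

The first item written into an empty buffer waits the longest, and that wait grows with `capacity_bytes`; this is the source of the latency/throughput trade-off on the next slide.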

Page 8

Implications for Streaming Applications

Large buffer = high throughput (tp), high latency

Small buffer = low tp, low latency

A trade-off needs to be found to meet latency goals
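The trade-off can be made concrete with a simple back-of-the-envelope model. This is an illustrative sketch, not a model from the talk: it assumes a buffer is shipped only when full, data arrives at a constant rate, and each shipped buffer costs a fixed overhead. The function names and all numbers (32 KB/s channel rate, 100 MB/s link, 100 µs per-buffer overhead) are hypothetical.

```python
def output_buffer_latency_ms(buffer_bytes: int, rate_bytes_per_s: float) -> float:
    """Worst-case wait (ms) of a data item in an output buffer that is shipped
    only when full, assuming data arrives at a constant rate."""
    return 1000.0 * buffer_bytes / rate_bytes_per_s


def effective_throughput(buffer_bytes: int, link_bytes_per_s: float,
                         per_buffer_overhead_s: float) -> float:
    """Bytes/s delivered if shipping each buffer costs a fixed overhead
    on top of the raw transfer time."""
    transfer_s = buffer_bytes / link_bytes_per_s
    return buffer_bytes / (transfer_s + per_buffer_overhead_s)


if __name__ == "__main__":
    rate = 32 * 1024  # hypothetical channel data rate: 32 KB/s
    for size in (200, 32 * 1024):
        latency = output_buffer_latency_ms(size, rate)
        tp = effective_throughput(size, link_bytes_per_s=100e6, per_buffer_overhead_s=1e-4)
        print(f"{size:>6} B buffer: wait {latency:8.1f} ms, throughput {tp / 1e6:6.1f} MB/s")
```

Under these assumptions the small buffer cuts the waiting time by two orders of magnitude but pays the fixed overhead far more often, which lowers the achievable throughput.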

Thread/Process Model

1 Task = 1 Thread model is flexible, but has overhead:

Thread scheduling, synchronization, communication

Serialization may be necessary (bad for tp & latency)

N Tasks = 1 Thread model can sometimes provide better tp and latency
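The difference between the two thread models can be sketched with two hypothetical stand-in tasks. This is an assumption-laden illustration, not Nephele's runtime: it uses in-process Python threads, so it shows only the structural difference (a synchronized queue hand-over versus a direct call) and omits serialization between processes. The names `task_n`, `task_n_plus_1`, `run_one_thread_per_task`, and `run_chained` are hypothetical.

```python
import queue
import threading
from typing import List


def task_n(x: int) -> int:          # hypothetical stand-in for Task n
    return x * 2


def task_n_plus_1(x: int) -> int:   # hypothetical stand-in for Task n+1
    return x + 1


def run_one_thread_per_task(items: List[int]) -> List[int]:
    """1 task = 1 thread: items cross a thread-safe queue between the tasks,
    which costs scheduling and synchronization on every hand-over."""
    handover: queue.Queue = queue.Queue()
    results: List[int] = []
    done = object()  # sentinel marking the end of the stream

    def first() -> None:
        for item in items:
            handover.put(task_n(item))
        handover.put(done)

    def second() -> None:
        while (item := handover.get()) is not done:
            results.append(task_n_plus_1(item))

    threads = [threading.Thread(target=first), threading.Thread(target=second)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_chained(items: List[int]) -> List[int]:
    """N tasks = 1 thread: Task n+1 is invoked directly on Task n's output,
    so no queue, lock, or thread switch sits between them."""
    return [task_n_plus_1(task_n(item)) for item in items]
```

Both functions compute the same result; the chained variant simply removes the hand-over machinery, which is what makes it attractive when the combined tasks fit on one core.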

Page 9

Meeting Latency Requirements

QoS goal: meet latency constraint X, then maximize throughput

Based on these observations, we designed two strategies:

1. Adaptive Output Buffer Sizing

2. Dynamic Task Chaining

Both strategies

work autonomously (only the latency constraint is required)

are applied on-demand at runtime

are applicable in systems with similar design principles

Page 10

Adaptive Output Buffer Sizing

Only applied when the latency constraint is violated

For each channel

Determine output buffer latency (obl)

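If obl > threshold, decrease buffer size:

$$\mathit{size} := \max\left(200,\ \mathit{size} \cdot r^{\mathit{obl}}\right), \quad r = 0.98$$

If obl < threshold, increase buffer size again:

$$\mathit{size} := \min\left(500 \cdot 10^{3},\ \mathit{size} \cdot r^{\mathit{obl}}\right), \quad r = 1.1$$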
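The two update rules can be sketched per channel as below. This is a minimal sketch, not Nephele's implementation: it assumes buffer sizes are in bytes and obl is in milliseconds (the slide gives no units), and the `Channel` class, the `threshold` value, and the function names are hypothetical. The bounds and factors are taken from the formulas above.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_SIZE = 200           # lower bound in the decrease rule
MAX_SIZE = 500 * 10**3   # upper bound in the increase rule
R_DECREASE = 0.98
R_INCREASE = 1.1


@dataclass
class Channel:
    buffer_size: float  # current output buffer size (assumed: bytes)
    obl: float          # measured output buffer latency (assumed: ms)


def adapt_buffer_size(channel: Channel, threshold: float) -> float:
    """Apply one step of the update rule to a single channel."""
    if channel.obl > threshold:
        # Shrink faster the longer items wait in the buffer.
        channel.buffer_size = max(MIN_SIZE, channel.buffer_size * R_DECREASE ** channel.obl)
    elif channel.obl < threshold:
        # Grow again to win back throughput, capped at MAX_SIZE.
        channel.buffer_size = min(MAX_SIZE, channel.buffer_size * R_INCREASE ** channel.obl)
    return channel.buffer_size


def adapt_all(channels: Iterable[Channel], threshold: float, constraint_violated: bool) -> None:
    """Only touch buffer sizes while the overall latency constraint is violated."""
    if not constraint_violated:
        return
    for channel in channels:
        adapt_buffer_size(channel, threshold)
```

Because obl appears in the exponent, a channel whose items wait much longer than the threshold is shrunk aggressively in a single step, while the `max`/`min` bounds keep sizes within a sane range.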

Page 11

Task Chaining

Conditions:

Pipeline of unchained tasks

Sum of CPU utilizations is < 90% of the capacity of one core

Only apply to the longest chainable pipeline of tasks


[Figure: Task n and Task n+1 on a compute node, before and after chaining]

Again, only applied when the overall latency constraint is violated
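Selecting which tasks to chain under the conditions above can be sketched as a sliding window over a pipeline. This sketch assumes a linear pipeline of tasks on one compute node and CPU utilization expressed as a fraction of one core; the `Task` class and `longest_chainable` function are hypothetical. A chain needs at least two tasks, and an already-chained task breaks the run.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    name: str
    cpu_util: float       # fraction of one core, e.g. 0.25 (assumed unit)
    chained: bool = False


def longest_chainable(pipeline: List[Task], cpu_limit: float = 0.9) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices (end exclusive) of the longest run of
    consecutive unchained tasks whose summed CPU utilization stays below
    cpu_limit, or None if no run of at least two tasks qualifies."""
    best: Optional[Tuple[int, int]] = None
    start, total = 0, 0.0
    for end, task in enumerate(pipeline):
        if task.chained:
            start, total = end + 1, 0.0   # already-chained task breaks the run
            continue
        total += task.cpu_util
        while total >= cpu_limit and start <= end:
            total -= pipeline[start].cpu_util  # drop tasks from the front
            start += 1
        length = end + 1 - start
        if length >= 2 and (best is None or length > best[1] - best[0]):
            best = (start, end + 1)
    return best
```

Since utilizations are non-negative, shrinking the window from the front is enough to restore the CPU condition, so each task is visited at most twice.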

Page 12

Complete System Overview

[Figure: Job Manager (JM) and Task Managers (TM); the TMs send periodical measurements (latency, throughput) to the JM, which responds with buffer size updates and chain commands; annotation: 300ms]
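The JM-side decision in this loop can be sketched as follows. This is a hedged sketch under assumptions not stated on the slide: each TM report carries the latest latency per pipeline element (task or channel), pipeline latency is the sum of those element latencies along one path, and optimization commands go out only when that sum exceeds the constraint. The `Report` class, the element names, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Report:
    """Hypothetical periodical measurement sent by one Task Manager (TM)."""
    tm_id: str
    latency_ms: Dict[str, float]  # latest latency per pipeline element


def pipeline_latency_ms(reports: List[Report], pipeline: List[str]) -> float:
    """Sum the latest per-element latencies along one task/channel pipeline."""
    latest: Dict[str, float] = {}
    for report in reports:
        latest.update(report.latency_ms)
    return sum(latest[element] for element in pipeline)


def plan_actions(reports: List[Report], pipeline: List[str], constraint_ms: float) -> List[str]:
    """JM-side check: only react if the pipeline violates the latency constraint."""
    if pipeline_latency_ms(reports, pipeline) <= constraint_ms:
        return []
    return ["send buffer size updates", "send chain commands"]
```

Keeping the decision at the JM matches the centralized design in the figure; it is also why distributing latency monitoring appears under future work.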

Page 13

Sample Application: Video Livestreaming

[Figure: video livestreaming job deployed on Node 1 to Node n, with the tasks Decoder, Merger, Overlay, Encoder, Partitioner, and RTP Server]

Page 14

Latency w/o Optimizations

Setup:

10 nodes, 80 cores

32 KB output buffer size

320 video streams

Results:

Latency oscillates around 4 s

Large buffers cause

Page 15

Latency w/ Adaptive Buffer Sizing

Final Latency:

improvement)

Page 16

Latency w/ ABS+TC (Adaptive Buffer Sizing + Task Chaining)

Final Latency:

improvement)

Page 17

Conclusion and Future Work

Massively-parallel frameworks can be adapted to do latency-constrained stream processing

Prototype implementation on Nephele showed a latency improvement of up to 94% on the video livestreaming job
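Reading "94% latency improvement" as a 94% reduction of end-to-end latency (an interpretation, not stated explicitly on the slide), the corresponding speed-up factor follows directly, with $\ell$ denoting the end-to-end latency. It is consistent with the "factor of at least 15" given in the description:

```latex
\frac{\ell_{\mathrm{before}}}{\ell_{\mathrm{after}}}
  = \frac{1}{1 - 0.94}
  = \frac{1}{0.06}
  \approx 16.7
```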

Future Work

Distribute latency monitoring (better scalability)

Adapt the PACT layer of Stratosphere to provide streaming capabilities and latency awareness