haren: a framework for ad-hoc thread scheduling policies ...2019/06/26  · •haren is an...

25
Haren: A Framework for Ad-Hoc Thread Scheduling Policies for Data Streaming Applications Dimitris Palyvos-Giannas, Vincenzo Gulisano, Marina Papatriantafilou 13 th International Conference on Distributed and Event-Based Systems June 24-28, 2019, Darmstadt

Upload: others

Post on 16-Aug-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Haren: A Framework for Ad-Hoc Thread Scheduling Policies for Data Streaming Applications

Dimitris Palyvos-Giannas, Vincenzo Gulisano, Marina Papatriantafilou

13th International Conference on Distributed and Event-Based Systems

June 24-28, 2019, Darmstadt

Page 2: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

2

Evaluation & Conclusions

Haren Implementation

Haren Framework Overview

Stream Processing & Scheduling

Page 3: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Stream Processing Engine (SPE)

Stream Processing Basics

Data

Source

Tuple

Operator

Performance Metrics

Throughput

Latency

CPU Utilization

Memory Utilization

Stream

3

Page 4: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Resource Scheduling

A

CB

E F

D

Input

SPE Instance (Process) 1

A B C D

SPE Instance (Process) 2

E F

4

Page 5: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Thread Scheduling inside SPE Instances

• Operators are executed by CPU threads.

• Usually one dedicated thread per operator.

• What if #CPUs < #Operators?

• Operating System scheduler allocates CPU…

• …but it has no knowledge of streaming goals!

• Alternative: Application-level thread scheduling

• Can optimize for specific performance goals!

• For a (short) time interval, two questions:

1. How to assign operators to threads (inter-thread)?

2. What is the priority of operators for each thread (intra-thread)?

SPE Instance (Process) 1

A B C D

CPU

Thread

CPU

Thread

SPE Instance (Process) 1

A

B

CDCPU

Thread

CPU

Thread

CPU

Thread

CPU

Thread

5

Two scheduling functions – almost any policy!

Page 6: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Custom Thread Scheduling

But there are obstacles…

• Low-level programming details.

• Difficult to program.

• Difficult to ensure efficiency and correctness.

• Schedulers programmed to specific SPE.

• Reinventing the wheel - cannot reuse code.

• Difficult to port scheduling policies to other SPEs.

Which operator to execute?

A B

It depends! Custom scheduling = more choice!

Minimize queue sizes Minimize latency

6

Most SPEs avoid custom

thread scheduling!

Page 7: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Haren: A Scheduling Framework for Streaming

Compact interface that allows implementing arbitrary scheduling policies.

Configuration of both inter-thread and intra-thread rules.

Parallelization of scheduling computation when possible.

Haren hides the complexity of custom scheduling: User only programs high-level scheduling logic!

7

Reusable scheduling policies in different SPEs!

Page 8: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

8

Evaluation & Conclusions

Haren Implementation

Haren Framework Overview

Stream Processing & Scheduling

Page 9: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Haren Overview

• Orchestrates operator execution using a group of

Processing Threads (PTs).

• Remains SPE-agnostic by using the features

abstraction.

• Retrieves necessary features of the operators from

the SPE through a well-defined interface.

• A feature is any value that characterizes an operator,

its streams or its tuples.

• Example features: cost, input stream size, …

• Maintains a table of operator features.

• Applies high-level user-defined scheduling

functions to the features to take scheduling

decisions.

Features

Op

era

tors

SPE Instance

getFeatures()

Processing Threads

Haren

run(operator)

Scheduling Functions,

Configuration Params

9

Page 10: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Feature Categories

Independent (e.g., cost)

Can only change upon execution of Op

Op

Dependent (e.g., input stream size)

Can change upon execution of some other operator ≠ Op

Op

Static (e.g., operator type)

Remain constant

Dynamic (e.g., cost, input stream size)

Can change over time

10

Page 11: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

g

g

g

Intra-thread scheduling functionOperator Features → Priority

Inter & Intra Thread Scheduling Functions

1. Inter-thread: How to assign operators to threads?

2. Intra-thread: How to compute the priority of operators in each thread?

C

E

D

F

AB

A

BF

DC

E

AD C

BF

E

f

Inter-thread scheduling functionOperator Features → Thread ID

11

Operators

Page 12: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Haren Processing Thread (PT) Behavior

• PTs execute operators most of the time (TE).

• Dynamic nature of stream processing ➞ features & priorities change over time.

• PTs periodically switch to scheduling, updating features and scheduling decisions (TS).

• Fine-grained control over scheduling overhead by tuning the scheduling period P.

time

Scheduling Period P

TE – Execution Task TS – Scheduling Task

12

Page 13: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

13

Evaluation & Conclusions

Haren Implementation

Haren Framework Overview

Stream Processing & Scheduling

Page 14: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Execution Task TE

while running:

while elapsed_time < scheduling_period:

starting from the beginning of assigned

pick first operator that can run

(has input > 0 and output capacity > 0)

if found operator that can run:

process max b tuples

if no operator can run:

back-off (sleep)

goto scheduling task TS

A DE

Assigned Array

Processing Thread

Main Loop State

14

(Computed in previous TS)

Page 15: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Scheduling Task TS

• Decides schedule for the next TE.

• Computes a new assigned array for each PT where operators sorted on priority.

• Most steps are executed in parallel by all PTs.

• Few sequential steps for PTs to synchronize and agree on scheduling decisions.

15

Execution Task TE

Scheduling Task TS

Execution Task TE

Dependent Features Update

Operator Assignment (f)

Priority Computation (g)

Operator Sorting

Independent Features Update

Processing Threads (PTs)

time

Page 16: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

TS: Independent Features Update

Execution Task TE

Scheduling Task TS

Execution Task TE

Dependent Features Update

Operator Assignment (f)

Priority Computation (g)

Operator Sorting

Independent Features Update

Processing Threads (PTs)

Features

Op

era

tors

F1 F2 F3 F4 F5 F6 F7 F8 F9

A

B

C

D

E

for op in Executed

update independent features of op

mark op & dependent operators

Marked Table (Bool)

(needed for next step)Feature Table

PT#1

PT#3

Processing Thread

16

PT#1

PT#2

Concurrent

Page 17: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

TS: Dependent Features Update

Execution Task TE

Scheduling Task TS

Execution Task TE

Dependent Features Update

Operator Assignment (f)

Priority Computation (g)

Operator Sorting

Independent Features Update

Processing Threads (PTs)

Features

Op

era

tors

F1 F2 F3 F4 F5 F6 F7 F8 F9

A

B

C

D

E

for op in All_Operators

if op is marked

update dependent features of op

Marked TableFeature Table

Thread t*

17

Sequential

Haren only updates features that:

Have (potentially) changed. Are used by the scheduling functions f, g.

Page 18: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Features

Op

era

tors

F1 F2 F3 F4 F5 F6 F7 F8 F9

A

B

C

D

E

Feature Table

PT1

PT2

PT3

PT4

TS: Operator Assignment

Execution Task TE

Scheduling Task TS

Execution Task TE

Dependent Features Update

Operator Assignment (f)

Priority Computation (g)

Operator Sorting

Independent Features Update

Processing Threads (PTs)

for op in All_Operators

threadID = f(op)

append op to assigned[threadID]

Assigned

Operators per PT

Thread t*

18

f

Sequential

Page 19: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

TS: Priority Computation & Sorting

Execution Task TE

Scheduling Task TS

Execution Task TE

Dependent Features Update

Operator Assignment (f)

Priority Computation (g)

Operator Sorting

Independent Features Update

Processing Threads (PTs)

E C A B D

for op in assigned

priority[op] = g(op)

sort assigned on priority

Assigned (after priority computation)

Processing Thread

D E B C A

Assigned (after sort)

19

E C A B D

Assigned (after previous step)

g

Concurrent

Page 20: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

20

Evaluation & Conclusions

Haren Implementation

Haren Framework Overview

Stream Processing & Scheduling

Page 21: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Evaluation

1. Performance comparison of dedicated threads (OS) vs Haren policies.

2. Scheduling overhead evaluation.

3. Multi-Class scheduling.

Evaluation setup:

• Queries → chains of operators.

• Varying cost and selectivity for each query.

• Varying parallelism (#queries).

• Odroid-XU4 devices.• Samsung Exynos5422 Cortex-A15 2Ghz and Cortex-A7 Octa core CPU, 2 GB RAM

• Resource constrained → Custom scheduling even more important.

• Java Haren implementation.

• Integrated with the lightweight Liebre SPE (https://github.com/vincenzo-gulisano/Liebre)

21

Page 22: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Evaluation 1: Performance Comparison

500

1000

5 q

ueri

es

Throughput (t/s)

0.05

0.10

Mean Latency (s)

0.0

0.5

Max Latency (s)

0

250

Queued (tuples)

11

12Max Memory (MB)

25

50

75CPU (%)

500

1000

10

qu

eries

0

25

0

200

0.0

2.5

× 105

25

50

50

100

500

1000

15

qu

eries

0

50

0

250

0

1× 10

6

50

100

60

80

500 1000 1500

base cost (B)

500

1000

20 q

ue

rie

s

500 1000 1500

base cost (B)

0

50

500 1000 1500

base cost (B)

0

250

500 1000 1500

base cost (B)

0

1

× 106

500 1000 1500

base cost (B)

50

100

500 1000 1500

base cost (B)

80

90

OS HR FCFS Chain22

500

10005 q

ueri

es

Throughput (t/s)

0.05

0.10

Mean Latency (s)

0.0

0.5

Max Latency (s)

0

250

Queued (tuples)

11

12Max Memory (MB)

25

50

75CPU (%)

500

1000

10

qu

eries

0

25

0

200

0.0

2.5

× 105

25

50

50

100

500

1000

15

qu

eries

0

50

0

250

0

1× 10

6

50

100

60

80

500 1000 1500

base cost (B)

500

1000

20 q

ue

rie

s

500 1000 1500

base cost (B)

0

50

500 1000 1500

base cost (B)

0

250

500 1000 1500

base cost (B)

0

1

× 106

500 1000 1500

base cost (B)

50

100

500 1000 1500

base cost (B)

80

90

OS HR FCFS Chain

Operating System Scheduling (Dedicated Threads)

Unaware of streaming goals Highest Rate

Optimize mean latencyFirst-Come-First-Serve

Optimize maximum latency

Chain

Optimize total queue sizes

Haren

CPU,

Memory

and more

in the paper…

Page 23: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Evaluation 2: Scheduling Overheads

0

2

%

parallelism = 5 parallelism = 10

Priority Sort Update Coord Total

0

2

%

parallelism = 15

Priority Sort Update Coord Total

parallelism = 20

HR FCFS Chain

Coord (Sequential Part)

Execution Task TE

Scheduling Task TS

Execution Task TE

Dependent Features Update

Operator Assignment (f)

Priority Computation (g)

Operator Sorting

Independent Features Update

Processing Threads (PTs)

23

Page 24: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Evaluation 3: Multi-Class Scheduling

24

3 High Priority Queries

10 Low Priority Queries

Haren Scheduling Policy1. Prioritize High Queries over Low Queries

2. Optimize Max Latency for High Queries

3. Optimize Mean Latency for Low queries

Throughput (t/s)

Mean Latency (s)

Max Latency (s)

Page 25: Haren: A Framework for Ad-Hoc Thread Scheduling Policies ...2019/06/26  · •Haren is an all-purpose framework for scheduling in streaming. •Easy definition of ad-hoc thread scheduling

Conclusions

• Haren is an all-purpose framework for scheduling in streaming.

• Easy definition of ad-hoc thread scheduling policies.

• Expressive and efficient, can outperform dedicated threads approach.

• Parallelizes scheduling computations.

Dimitris Palyvos-Giannas

[email protected]

25