apache flink training - advanced windowing

20
1 Apache Flink® Training Flink v1.3 9.9.2017 DataStream API Advanced Windowing

Upload: dataartisans

Post on 17-Mar-2018

1.336 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Apache Flink Training - Advanced Windowing

1

Apache Flink® Training

Flink v1.3 – 9.9.2017

DataStream API

Advanced Windowing

Page 2: Apache Flink Training - Advanced Windowing

Windows

The lower-level details

2

Page 3: Apache Flink Training - Advanced Windowing

Specifying Windowing

3

stream

.keyBy() / keyed vs non-keyed windows

.window() / “Assigner”

.trigger() / each Assigner has a default Trigger

.evictor() / default: no Evictor

.allowedLateness() / default: zero

.reduce|apply|process()

Page 4: Apache Flink Training - Advanced Windowing

Triggers

Determines when a window is ready to have its

process or window function called, e.g.,

• by counting

• by comparing timestamps to the current watermark

Each WindowAssigner comes with a default

Trigger

4

Page 5: Apache Flink Training - Advanced Windowing

Trigger interface

onElement()

onEventTime()

onProcessingTime()

onMerge()• merges the states of two triggers when their windows merge

clear()• use this to clear any managed state, etc

5

Page 6: Apache Flink Training - Advanced Windowing

Trigger interface

onElement()

onEventTime()

onProcessingTime()

onMerge()• merges the states of two triggers when their windows merge

clear()• use this to clear any managed state

6

return a TriggerResult

Page 7: Apache Flink Training - Advanced Windowing

TriggerResult

CONTINUE

FIRE

PURGE

FIRE_AND_PURGE

Purging only removes the window’s contents

7

Page 8: Apache Flink Training - Advanced Windowing

8

public class CountTrigger<W extends Window> extends Trigger<Object, W> {

private final long maxCount;

private final ReducingStateDescriptor<Long> stateDesc =

new ReducingStateDescriptor<>("count", new Sum(),

LongSerializer.INSTANCE);

private CountTrigger(long maxCount) {

this.maxCount = maxCount;

}

. . .

}

Page 9: Apache Flink Training - Advanced Windowing

9

@Override

public TriggerResult onElement(Object element,

long timestamp,

W window,

TriggerContext ctx) throws Exception {

ReducingState<Long> count = ctx.getPartitionedState(stateDesc);

count.add(1L);

if (count.get() >= maxCount) {

count.clear();

return TriggerResult.FIRE;

}

return TriggerResult.CONTINUE;

}

Page 10: Apache Flink Training - Advanced Windowing

10

@Overridepublic TriggerResult onEventTime(long time, W window, TriggerContext ctx) {

return TriggerResult.CONTINUE;}

@Overridepublic TriggerResult onProcessingTime(long time, W window,

TriggerContext ctx) throws Exception {return TriggerResult.CONTINUE;

}

@Overridepublic void clear(W window, TriggerContext ctx) throws Exception {

ctx.getPartitionedState(stateDesc).clear();}

@Overridepublic boolean canMerge() {

return true;}

@Overridepublic void onMerge(W window, OnMergeContext ctx) throws Exception {

ctx.mergePartitionedState(stateDesc);}

Page 11: Apache Flink Training - Advanced Windowing

Evictors

Control which elements are passed to the window function (by removing elements from a window)

Always called after the trigger fires, but beforeand/or after the window function• evictBefore()

• evictAfter()

11

Page 12: Apache Flink Training - Advanced Windowing

Late elements

When allowedLateness > 0

Late elements can trigger late firings

With merging windows, late firings can lead to

late merging

12

Page 13: Apache Flink Training - Advanced Windowing

ProcessWindowFunction

New and improved version of WindowFunction

that has more information about the window-

firing context

Has API for accessing both per-window state

and global state (both per key)

1

3

Page 14: Apache Flink Training - Advanced Windowing

ProcessWindowFunction (API)

1

4

abstract class ProcessWindowFunction<IN, OUT, KEY, W extends Window>extends AbstractRichFunction {

/** Called for each window firing. */ public abstract void process(

KEY key,Context context,Iterable<IN> elements,Collector<OUT> out);

/** Called when a window is being purged. */public void clear(Context context) throws Exception {}

}

Page 15: Apache Flink Training - Advanced Windowing

Context (API)

1

5

public abstract class Context implements java.io.Serializable {public abstract W window();

public abstract long currentProcessingTime();public abstract long currentWatermark();

public abstract KeyedStateStore windowState();public abstract KeyedStateStore globalState();

}

Page 16: Apache Flink Training - Advanced Windowing

Per-Window vs Global state

Both per-window and global state are still tied to

a key

Per-window state is additionally scoped to the

key of the processed window

Global state is the same for all windows

1

6

Page 17: Apache Flink Training - Advanced Windowing

Example – Differential Window

1

7

private static class DifferentialWindowFunctionextends ProcessWindowFunction<Long, Tuple2<Long, Long>, String, TimeWindow> {

private final static ValueStateDescriptor<Long> previousFiringState =new ValueStateDescriptor<>("previous-firing", LongSerializer.INSTANCE);

private final static ReducingStateDescriptor<Long> firingCounterState =new ReducingStateDescriptor<>(

"firing-counter", new Sum(), LongSerializer.INSTANCE);

@Overridepublic void process(…) { … }

}

Page 18: Apache Flink Training - Advanced Windowing

Example – Differential Window

1

8

@Overridepublic void process(

String key,Context context, Iterable<Long> values, Collector<Tuple2<Long, Long>> out) {

ValueState<Long> previousFiring =context.windowState().getState(previousFiringState);

ReducingState<Long> firingCounter =context.windowState().getState(firingCounterState);

Long output = Iterables.getOnlyElement(values);if (firingCounter.get() == null) {// first firingout.collect(Tuple2.of(0L, output));

} else {// subsequent firingout.collect(Tuple2.of(firingCounter.get(), output - previousFiring.value()));

} firingCounter.add(1L);previousFiring.update(output);

}

Page 19: Apache Flink Training - Advanced Windowing

Example – Differential Window

1

9

@Overridepublic void clear(Context context) {

ValueState<Long> previousFiring =context.windowState().getState(previousFiringState);

ReducingState<Long> firingCounter =context.windowState().getState(firingCounterState);

previousFiring.clear();firingCounter.clear();

}

Page 20: Apache Flink Training - Advanced Windowing

Window anti-patterns

Using an Evictor prevents pre-aggregation

A sliding window 1 day long with a slide of 1

minute will make 24x60 copies of every

element – so don’t do that!

20