complex event processing: use cases & flinkcep library (flink.tw meetup 2016/07/19)


Complex Event Processing:Use Cases & FlinkCEP Library

Gordon Tai - @tzulitai

July 19, 2016 @ Flink.tw Meetup

00 This Talk is About ...

● How FlinkCEP got me interested in Flink
● CEP use cases & applications
  ○ Use case study #1: tracking an order process
  ○ Use case study #2: advertisement targeting
● A look at the API


00 Me & Flink

● 戴資力 (Gordon)
● Data Engineer @ VMFive
● Java, Scala
● Using Flink as a user on VMFive’s Adtech platform
● Enjoy working on distributed computing systems
● Works on Flink during free time
● Contributor: Flink Kinesis Consumer connector

Tale of a Data Engineer trying to figure out how to build up a streaming analytics pipeline ...

1. First lesson: non-trivial streaming applications are never stateless

2. Second lesson: stateful streaming topologies are a pain


1. Exactly-once state updates on failures for correctness
2. Idempotency w.r.t. external state stores
3. Out-of-order events
4. Aggregating on time windows
5. Rapid application development

Applications I was working on: streaming aggregation for reporting & conversion patterns for alerting


01 Complex Event Processing

● Generate derived events when a specified pattern on raw events occurs in a data stream
  ○ if A and then B → infer complex event C
● Goal: identify meaningful event patterns and respond to them as quickly as possible
● Demanding on the stream processor to provide robust state handling & out-of-order event support while keeping low latency with high throughput
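The "if A and then B → infer C" idea can be illustrated outside Flink with a toy match over an ordered event list (a sketch only, not the FlinkCEP implementation; all names are hypothetical):

```scala
// Toy sketch (not FlinkCEP): infer a complex event C whenever an
// event of kind A is later followed by an event of kind B.
case class Event(kind: String, tStamp: Long)
case class ComplexEvent(kind: String, tStamp: Long)

def inferC(events: Seq[Event]): Seq[ComplexEvent] =
  for {
    (a, i) <- events.zipWithIndex if a.kind == "A"
    b      <- events.drop(i + 1).find(_.kind == "B").toSeq
  } yield ComplexEvent("C", b.tStamp)

// inferC(Seq(Event("A", 1), Event("X", 2), Event("B", 3)))
// matches A followed by B and yields a derived "C" event
```

A real CEP engine does the same matching incrementally over an unbounded stream, with state, timeouts, and event-time handling, which is exactly what the toy version ignores.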

02 Apache Flink CEP Library

● Built upon Flink’s DataStream API
● Allows users to define patterns, inject them on event streams, and generate new event streams based on the pattern
● Exploits Flink’s exactly-once semantics for definite correctness

Use Case Study #1: eCommerce Order Process Tracking

** Note: the illustrations & content in this section are from data Artisans’ presentation: Streaming Analytics & CEP - Two Sides of the Same Coin?

03 Order Tracking Data Model

● Order(orderId, tStamp, “received”) extends Event
● Shipment(orderId, tStamp, “shipped”) extends Event
● Delivery(orderId, tStamp, “delivered”) extends Event
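The data model above maps naturally onto Scala case classes. A minimal sketch — the common base trait and the field types are assumptions, not taken verbatim from the slides:

```scala
// Sketch of the talk’s event model; the `status` field is what the
// later `where(_.status == "shipped")` clause inspects.
sealed trait Event {
  def orderId: Long
  def tStamp: Long
  def status: String
}

case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event
```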


04 Real-Time Warnings for SLAs

New inferred events:

● ProcessSucc(orderId, tStamp, duration)
● ProcessWarn(orderId, tStamp)
● DeliverySucc(orderId, tStamp, duration)
● DeliveryWarn(orderId, tStamp)


05 Glimpse at the FlinkCEP API

val processingPattern = Pattern
  .begin[Event]("orderReceived").subtype(classOf[Order])
  .followedBy("orderShipped").where(_.status == "shipped")
  .within(Time.hours(1))

val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler
      ProcessWarn(pP("orderReceived").orderId, timestamp)
  } {
    fP => // Select function
      ProcessSucc(
        fP("orderReceived").orderId,
        fP("orderShipped").tStamp,
        fP("orderShipped").tStamp - fP("orderReceived").tStamp)
  }


06 Glimpse at the FlinkCEP API

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

val input: DataStream[Event] = env.addSource(new FlinkKafkaConsumer09(...))

val processingPattern = Pattern.begin(...)...

val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select(...)

procResult.addSink(new RedisSink(...))
// .addSink(new FlinkKafkaProducer09(...))
// .addSink(new ElasticsearchSink(...))
// .map(new MapFunction{...})
// ... anything you’d like to continue to do with the inferred event stream

env.execute()


07 Glimpse at the FlinkCEP API

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val input: DataStream[Event] = env
  .addSource(new FlinkKafkaConsumer09(...))
  .assignTimestampsAndWatermarks(new CustomExtractor)

val processingPattern = Pattern.begin(...)...

val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select(...)

procResult.addSink(new RedisSink(...))
// .addSink(new FlinkKafkaProducer09(...))
// .addSink(new ElasticsearchSink(...))
// .map(new MapFunction{...})

env.execute()


08 Combining Stream SQL & CEP

● Further reading: Streaming Analytics & CEP - Two Sides of the Same Coin?


Use Case Study #2: Ad Targeting based on User Attribution

** Note: the content in this section is heavily based on my experience at VMFive

09 Ad Targeting 101

● What an ad server does, in a nutshell → determine an appropriate advertisement, chosen from an advertisement campaign pool, for each incoming ad request

[Diagram: (1) the client requests an advertisement from the AdServer; (2) the AdServer returns appropriate advertisement info from the Campaign Pool]

● “appropriate”: fulfills the targeting rules of each campaign
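"Appropriate" here is just a predicate over the request: a campaign matches when all of its targeting rules hold. A minimal sketch (all names and types are hypothetical, not VMFive's actual implementation):

```scala
// Hypothetical sketch: a campaign is “appropriate” for a request
// when every one of its targeting rules holds.
case class AdRequest(city: String, deviceType: String)
case class Campaign(id: Long, rules: Seq[AdRequest => Boolean])

def selectCampaigns(pool: Seq[Campaign], req: AdRequest): Seq[Campaign] =
  pool.filter(_.rules.forall(rule => rule(req)))

// e.g. a campaign targeting phone users in Taipei:
val taipeiPhones = Campaign(1L, Seq(_.city == "Taipei", _.deviceType == "phone"))
```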



11 Ad Targeting Rule Types

● Fundamental campaign targeting rule types:
  ○ Target users’ current location, ex. users in Taipei
  ○ Target specific user device type, ex. tablet or phone
  ○ ...
  → Do not require event aggregation; the rules can be matched simply based on info at request time

● Advanced campaign targeting rule types:
  ○ Target users’ past location trace, ex. in Taipei for the past 7 days
  ○ Target users entering / departing countries
  ○ Target users with specific attribution, ex. viewed
  ○ ...
  → Require aggregation of historical events; aggregating at request time will be far too slow; requires inferring complex events from patterns in the raw event stream → CEP to the rescue!
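An advanced rule such as "in Taipei for the past 7 days" cannot be answered from the request alone; it is a predicate over the user's historical event stream. A toy sketch of the check itself, outside Flink (all names hypothetical):

```scala
// Hypothetical sketch of the historical check behind
// “in Taipei for the past 7 days”.
case class LocationEvent(uid: String, city: String, tStamp: Long)

val sevenDaysMs: Long = 7L * 24 * 60 * 60 * 1000

// True when the user has location events in the 7-day window
// and all of them are in `city`.
def stayedIn(events: Seq[LocationEvent], uid: String, city: String, now: Long): Boolean = {
  val window = events.filter(e => e.uid == uid && e.tStamp >= now - sevenDaysMs)
  window.nonEmpty && window.forall(_.city == city)
}
```

Evaluating this at request time means scanning a week of events per ad request — the slowness noted above; the CEP approach instead precomputes the answer incrementally as events arrive.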

12 Basic Ad Targeting Architecture

[Diagram: a Web Service talks to the AdServer, which contains the Campaign Pool / Targeting Cache, the Ad Targeter, and the Event Logger; ad campaigns are registered into the Campaign Pool, and events flow on to a Data Warehouse. Step (1): initial connection]

[Same diagram, step (2): fetch ad from the Campaign Pool / Targeting Cache]

[Same diagram, step (3): event tracking — the Event Logger writes raw logs to the Data Warehouse and publishes to an Event Bus Service, which feeds batch / streaming reporting & analytics services]

13 Advanced Ad Targeting Architecture

[Diagram: the basic architecture extended with a Rules Service and a CEP component attached to the Event Bus Service]

13 Advanced Ad Targeting Architecture

[Diagram detail: the Rules Service holds CEP-Rule Templates (Entry/Depart, User Attribution, ...) and a Rule Fulfillment Cache (Redis). Step (1): inject a rule to start matching on the event stream; step (2): return a Rule ID; step (3): submit the CEP topology]

13 Advanced Ad Targeting Architecture

[Diagram detail, continued. Step (4): when a CEP pattern is fulfilled, write UID → RuleID to the cache; step (5): look up whether a UID has fulfilled a RuleID]
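Steps (4) and (5) reduce to a set-membership write and lookup keyed by (UID, RuleID). A sketch with an in-memory set standing in for the Redis cache (names hypothetical):

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for the Redis rule-fulfillment cache.
object RuleFulfillmentCache {
  private val fulfilled = mutable.Set.empty[(String, String)]

  // Step (4): a CEP pattern for `ruleId` has fired for `uid`.
  def markFulfilled(uid: String, ruleId: String): Unit =
    fulfilled += ((uid, ruleId))

  // Step (5): the Ad Targeter checks fulfillment at request time.
  def hasFulfilled(uid: String, ruleId: String): Boolean =
    fulfilled.contains((uid, ruleId))
}
```

In the real architecture a Redis structure (e.g. a set per UID, or a key per UID → RuleID pair with a TTL) plays this role, so the request-time check stays a constant-time lookup.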

13 Advanced Ad Targeting Architecture

[Full diagram. Step (1): register a rule for a campaign; step (2): look up whether a user fulfils a rule]

14 Some Discussion

● Why a fixed pool of CEP-Rule Templates?
  ○ Prevents rogue rules from matching, ex. rules that would consume too many resources
  ○ It’s a lot less work and complication ;)

● Would be very nice to have a freestyle rule service
  ○ Pattern matching across different event streams of an organization
  ○ For BI, there will be arbitrarily complex events / patterns analysts want to monitor

● Further study for a similar use case: King’s RBEA
  ○ RBEA: Rule-Based Event Aggregator
  ○ https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  ○ http://data-artisans.com/rbea-scalable-real-time-analytics-at-king/


Closing

● Complex Event Processing is an emerging way to draw insights from data streams, and demands exactly-once semantics from the underlying stream processor for correctness
● FlinkCEP builds on the DataStream API to make this possible and easy

