Complex Event Processing: Use Cases & FlinkCEP Library (Flink.tw Meetup 2016/07/19)


Page 1

Complex Event Processing: Use Cases & FlinkCEP Library

Gordon Tai - @tzulitai

July 19, 2016 @ Flink.tw Meetup

Page 2

00 This Talk Is About ...

● How FlinkCEP got me interested in Flink
● CEP use cases & applications
  ○ Use case study #1: tracking an order process
  ○ Use case study #2: advertisement targeting
● A look at the API

Page 3

00 Me & Flink

● 戴資力 (Gordon)
● Data Engineer @ VMFive
● Java, Scala
● Using Flink as a user on VMFive's Adtech platform
● Enjoys working on distributed computing systems
● Works on Flink during free time
● Contributor: Flink Kinesis Consumer connector

Page 4

Tale of a Data Engineer trying to figure out how to build up a streaming analytics pipeline ...

1. First lesson: non-trivial streaming applications are never stateless

2. Second lesson: stateful streaming topologies are a pain

Page 5

1. Exactly-once state updates across failures, for correctness
2. Idempotence with respect to external state stores
3. Out-of-order events
4. Aggregating on time windows
5. Rapid application development

Applications I was working on: streaming aggregation for reporting & conversion patterns for alerting
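Points 3 and 4 interact: if events are bucketed by arrival order, a late event lands in the wrong window. A minimal plain-Scala sketch (no Flink involved; `Click`, `windowStart` and `countPerWindow` are hypothetical names, and timestamps are assumed to be epoch millis) that assigns each event to a tumbling window by its own timestamp. Because of that, the counts do not depend on the order in which the events arrive:

```scala
object EventTimeWindows {
  // A raw event that carries its own event-time timestamp (epoch millis, assumed)
  final case class Click(userId: String, tStamp: Long)

  // Start of the tumbling window of length sizeMs that contains timestamp ts
  def windowStart(ts: Long, sizeMs: Long): Long = ts - Math.floorMod(ts, sizeMs)

  // Count events per (userId, window start). The key depends only on event time,
  // so any arrival order of the same events yields the same counts
  def countPerWindow(events: Seq[Click], sizeMs: Long): Map[(String, Long), Int] =
    events
      .groupBy(e => (e.userId, windowStart(e.tStamp, sizeMs)))
      .map { case (key, es) => key -> es.size }
}
```

This sketch leaves out the hard parts of a real stream. It does not decide when a window is complete enough to emit, and it does not keep the counts consistent across failures (points 1 and 2).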

Page 7

01 Complex Event Processing

● Generate derived events when a specified pattern of raw events occurs in a data stream
  ○ If A and then B → infer complex event C
● Goal: identify meaningful event patterns and respond to them as quickly as possible
● Demanding on the stream processor: it must provide robust state handling & support for out-of-order events while keeping latency low and throughput high
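The "if A and then B → infer C" idea can be sketched without any framework. Below is a simplified plain-Scala version that assumes events are already in timestamp order and uses hypothetical `Raw` and `Complex` types. For each key, it remembers a pending A and emits a derived event when a later B arrives. Unrelated events in between are allowed, which is similar in spirit to `followedBy` in the FlinkCEP example. The sketch has no time bound and no fault tolerance:

```scala
import scala.collection.mutable

object FollowedBy {
  final case class Raw(key: String, kind: String, tStamp: Long)
  final case class Complex(key: String, aTime: Long, bTime: Long)

  // Per key: remember the first pending "A"; a later "B" for the same key completes
  // the match and produces the derived event C. Other events in between are skipped
  def detect(events: Seq[Raw]): Seq[Complex] = {
    val pendingA = mutable.Map.empty[String, Long] // key -> timestamp of the pending A
    val out = Seq.newBuilder[Complex]
    for (e <- events) e.kind match {
      case "A" => if (!pendingA.contains(e.key)) pendingA(e.key) = e.tStamp
      case "B" => pendingA.remove(e.key).foreach(a => out += Complex(e.key, a, e.tStamp))
      case _   => () // unrelated events do not break a partial match
    }
    out.result()
  }
}
```

The `pendingA` map is exactly the per-key state that the last bullet asks the stream processor to keep correct across failures and out-of-order input.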

Page 8

02 Apache Flink CEP Library

● Built upon Flink's DataStream API
● Allows users to define patterns, apply them to event streams, and generate new event streams from the matches
● Relies on Flink's exactly-once semantics to produce correct results, even across failures

Page 9

Use case study #1: eCommerce Order Process Tracking

** Note: the illustrations & content in this section are from Data Artisans' presentation: Streaming Analytics & CEP - Two Sides of the Same Coin?

Page 10

03 Order Tracking Data Model

● Order(orderId, tStamp, "received") extends Event
● Shipment(orderId, tStamp, "shipped") extends Event
● Delivery(orderId, tStamp, "delivered") extends Event
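Here is a sketch of this data model as Scala types. The field types are assumptions: the slide gives only the field names. `tStamp` is taken to be an epoch-millisecond `Long`, and the third field (the status string) is fixed per subtype rather than passed in. The pattern code on the following slides filters on `_.status` and reads `orderId` and `tStamp`, so the common trait exposes all three:

```scala
object OrderModel {
  // Common shape of all order-tracking events
  sealed trait Event {
    def orderId: String
    def tStamp: Long   // event time, assumed to be epoch millis
    def status: String
  }

  // Each subtype fixes its status, matching the slide's third field
  final case class Order(orderId: String, tStamp: Long) extends Event { val status: String = "received" }
  final case class Shipment(orderId: String, tStamp: Long) extends Event { val status: String = "shipped" }
  final case class Delivery(orderId: String, tStamp: Long) extends Event { val status: String = "delivered" }
}
```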

Page 11

04 Real-Time Warnings for SLAs

New inferred events:

● ProcessSucc(orderId, tStamp, duration)
● ProcessWarn(orderId, tStamp)
● DeliverySucc(orderId, tStamp, duration)
● DeliveryWarn(orderId, tStamp)
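Below is a sketch of these inferred event types, plus the rule that chooses between them for a single order. Two things are assumed for illustration: an SLA of `slaMs` milliseconds between receiving and shipping, and a warning stamped with the SLA deadline. `processOutcome` is a hypothetical helper. The `Either[ProcessWarn, ProcessSucc]` result mirrors the type the FlinkCEP example produces:

```scala
object SlaEvents {
  // Inferred events, as listed on the slide
  final case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: String, tStamp: Long)
  final case class DeliverySucc(orderId: String, tStamp: Long, duration: Long)
  final case class DeliveryWarn(orderId: String, tStamp: Long)

  // Processing outcome for one order: success if it was shipped no later than slaMs
  // after being received, otherwise a warning stamped with the SLA deadline
  def processOutcome(orderId: String, receivedAt: Long, shippedAt: Option[Long],
                     slaMs: Long): Either[ProcessWarn, ProcessSucc] =
    shippedAt.filter(s => s >= receivedAt && s - receivedAt <= slaMs) match {
      case Some(s) => Right(ProcessSucc(orderId, s, s - receivedAt))
      case None    => Left(ProcessWarn(orderId, receivedAt + slaMs))
    }
}
```

`DeliverySucc` and `DeliveryWarn` would follow from the same rule applied to Shipment → Delivery, with its own deadline.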

Page 12

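05 Glimpse at the FlinkCEP API

    import org.apache.flink.cep.scala.CEP
    import org.apache.flink.cep.scala.pattern.Pattern
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    // An order is processed on time if it is received and then shipped within 1 hour
    val processingPattern = Pattern
      .begin[Event]("orderReceived").subtype(classOf[Order])
      .followedBy("orderShipped").where(_.status == "shipped")
      .within(Time.hours(1))

    // Match the pattern separately for each order
    val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)

    val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
      processingPatternStream.select {
        (pP, timestamp) => // Timeout handler: partial match, no shipment within 1 hour
          ProcessWarn(pP("orderReceived").orderId, timestamp)
      } {
        fP => // Select function: complete match
          ProcessSucc(
            fP("orderReceived").orderId,
            fP("orderShipped").tStamp,
            fP("orderShipped").tStamp - fP("orderReceived").tStamp)
      }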
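To make the semantics of this pattern concrete without a Flink cluster, here is a simplified plain-Scala emulation. It is not FlinkCEP's implementation, and all names are hypothetical. It assumes events arrive sorted by `tStamp`. An `Order` opens a partial match for its `orderId`, a `"shipped"` event completes it, and partial matches whose window has passed are reported as timeouts. Flink advances time via watermarks or processing time. This sketch, by contrast, only detects a timeout when a later event (or the end of input) shows that the window is over:

```scala
import scala.collection.mutable

object ProcessingPatternSim {
  sealed trait Event { def orderId: String; def tStamp: Long; def status: String }
  final case class Order(orderId: String, tStamp: Long) extends Event { val status: String = "received" }
  final case class Shipment(orderId: String, tStamp: Long) extends Event { val status: String = "shipped" }

  final case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: String, tStamp: Long)

  // Emulates begin(Order).followedBy(status == "shipped").within(windowMs), keyed by
  // orderId, over events sorted by tStamp. Timeouts -> Left, full matches -> Right
  def run(events: Seq[Event], windowMs: Long): Seq[Either[ProcessWarn, ProcessSucc]] = {
    val pending = mutable.LinkedHashMap.empty[String, Long] // orderId -> received tStamp
    val out = Seq.newBuilder[Either[ProcessWarn, ProcessSucc]]

    // Report and drop every partial match whose window ended before `now`
    def expire(now: Long): Unit = {
      val expired = pending.filter { case (_, start) => now - start > windowMs }
      for ((id, start) <- expired) {
        out += Left(ProcessWarn(id, start + windowMs))
        pending.remove(id)
      }
    }

    for (e <- events) {
      expire(e.tStamp)
      e match {
        case o: Order => if (!pending.contains(o.orderId)) pending(o.orderId) = o.tStamp
        case s if s.status == "shipped" =>
          pending.remove(s.orderId).foreach { start =>
            out += Right(ProcessSucc(s.orderId, s.tStamp, s.tStamp - start))
          }
        case _ => ()
      }
    }
    // End of input: whatever is still open can never complete
    for ((id, start) <- pending) out += Left(ProcessWarn(id, start + windowMs))
    out.result()
  }
}
```

For example, if order o1 ships after 30 minutes and o2 ships after two hours, the result is a `ProcessSucc` for o1 followed by a `ProcessWarn` for o2.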

Page 13

06 Glimpse at the FlinkCEP API

    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    val input: DataStream[Event] = env.addSource(new FlinkKafkaConsumer09(...))

    val processingPattern = Pattern.begin(...)...

    val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)

    val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select(...)

    procResult.addSink(new RedisSink(...))
    // .addSink(new FlinkKafkaProducer09(...))
    // .addSink(new ElasticsearchSink(...))
    // .map(new MapFunction{...})
    // … anything you'd like to continue to do with the inferred event stream

    env.execute()

Page 14

07 Glimpse at the FlinkCEP API

    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val input: DataStream[Event] = env
      .addSource(new FlinkKafkaConsumer09(...))
      .assignTimestampsAndWatermarks(new CustomExtractor)

    val processingPattern = Pattern.begin(...)...

    val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)

    val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select(...)

    procResult.addSink(new RedisSink(...))
    // .addSink(new FlinkKafkaProducer09(...))
    // .addSink(new ElasticsearchSink(...))
    // .map(new MapFunction{...})

    env.execute()
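With event time, the pattern should be evaluated in timestamp order rather than arrival order. That means buffering events until the watermark indicates that no earlier event is still expected. Below is a simplified plain-Scala illustration of that buffering. It is not Flink code: `Ev`, `reorder` and the fixed `maxDelayMs` bound are assumptions that stand in for what a timestamp/watermark extractor such as `CustomExtractor` might provide:

```scala
import scala.collection.mutable

object WatermarkSketch {
  final case class Ev(id: String, tStamp: Long)

  // Bounded-out-of-orderness watermark: after each arrival the watermark is
  // (max timestamp seen so far - maxDelayMs). Events wait in a min-heap on tStamp and
  // are released in timestamp order once the watermark has passed them
  def reorder(arrivals: Seq[Ev], maxDelayMs: Long): Seq[Ev] = {
    // PriorityQueue is a max-heap, so reverse the ordering to get the earliest first
    val buffer = mutable.PriorityQueue.empty[Ev](Ordering.by[Ev, Long](_.tStamp).reverse)
    val out = Seq.newBuilder[Ev]
    var maxSeen = Long.MinValue
    for (e <- arrivals) {
      buffer.enqueue(e)
      maxSeen = math.max(maxSeen, e.tStamp)
      val watermark = maxSeen - maxDelayMs
      while (buffer.nonEmpty && buffer.head.tStamp <= watermark) out += buffer.dequeue()
    }
    while (buffer.nonEmpty) out += buffer.dequeue() // end of input: flush the rest
    out.result()
  }
}
```

An event that arrives more than `maxDelayMs` late is emitted immediately here, and therefore out of order. A real pipeline has to decide how to treat such late events.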

Page 15

08 Combining Stream SQL & CEP

● Further reading: Streaming Analytics & CEP - Two Sides of the Same Coin?

Page 16

Use case study #2: Ad Targeting Based on User Attribution

** Note: the content in this section is heavily based on my experience at VMFive

Page 17

09 Ad Targeting 101

● What an ad server does, in a nutshell → determine an appropriate advertisement, chosen from an advertisement campaign pool, for each incoming ad request

[Diagram: (1) request advertisement → AdServer; (2) AdServer returns appropriate advertisement info from the Campaign Pool]

● “appropriate”: fulfills the targeting rules of its campaign
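Below is a minimal sketch of this selection step. It assumes that targeting rules are predicates over request-time information and that the ad server simply takes the first eligible campaign. A real ad server would also rank and pace campaigns. `AdRequest`, `Campaign` and `selectAd` are hypothetical:

```scala
object AdSelection {
  // Information known at request time (fields are illustrative)
  final case class AdRequest(userId: String, city: String, deviceType: String)

  // A campaign and its targeting rules, each a predicate over the request
  final case class Campaign(id: String, rules: Seq[AdRequest => Boolean])

  // "Appropriate" = every targeting rule of the campaign holds; take the first such campaign
  def selectAd(pool: Seq[Campaign], req: AdRequest): Option[Campaign] =
    pool.find(c => c.rules.forall(rule => rule(req)))
}
```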

Page 18

10 Ad Targeting Rule Types

● Fundamental campaign targeting rule types:
  ○ Target users' current location, e.g. users in Taipei
  ○ Target specific user device types, e.g. tablet or phone
  ○ ...

● Advanced campaign targeting rule types:
  ○ Target users' past location trace, e.g. in Taipei for the past 7 days
  ○ Target users entering / departing countries
  ○ Target users with specific attribution, e.g. viewed
  ○ ...

Page 19

11 Ad Targeting Rule Types


For the fundamental rule types:
● Do not require event aggregation
● The rules can be matched simply based on info available at request time

For the advanced rule types:
● Require aggregation of historical events
● Aggregating at request time would be far too slow
● Require inferring complex events from patterns in the raw event stream → CEP to the rescue!
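The entering / departing rule shows why. The answer depends on a user's previous location, not just on the current request. Below is a plain-Scala sketch of deriving such events from a location stream. It assumes hypothetical `Location` events that carry a country code and arrive in timestamp order per user. The per-user `lastCountry` map is the state a streaming job would have to keep:

```scala
import scala.collection.mutable

object CountryCrossing {
  final case class Location(userId: String, country: String, tStamp: Long)
  final case class Crossing(userId: String, from: String, to: String, tStamp: Long)

  // Per user, compare each location with the previous one; a change of country is
  // inferred as departing `from` and entering `to`
  def detect(events: Seq[Location]): Seq[Crossing] = {
    val lastCountry = mutable.Map.empty[String, String] // userId -> last seen country
    events.flatMap { e =>
      val prev = lastCountry.put(e.userId, e.country) // previous value, if any
      prev.filter(_ != e.country).map(p => Crossing(e.userId, p, e.country, e.tStamp))
    }
  }
}
```

Once such derived events are maintained continuously, answering the targeting rule at request time becomes a lookup instead of an aggregation.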

Page 20

12 Basic Ad Targeting Architecture

[Diagram: Web Service, AdServer, Ad Targeter, Campaign Pool / Targeting Cache, Event Logger, Data Warehouse; ad campaigns are registered into the Campaign Pool. Step shown: (1) initial connection]

Page 21

12 Basic Ad Targeting Architecture

[Diagram: same components as the previous slide. Step shown: (2) fetch ad]

Page 22

12 Basic Ad Targeting Architecture

[Diagram: the previous components plus Raw Logs, an Event Bus Service, and Reporting & analytics services (Batch, Streaming, ...). Step shown: (3) event tracking]

Page 23

13 Advanced Ad Targeting Architecture

[Diagram: the basic architecture (Web Service, AdServer, Ad Targeter, Campaign Pool / Targeting Cache, Event Logger, Data Warehouse, Raw Logs, Event Bus Service, Batch / Streaming Reporting & analytics services) plus a Rules Service and a CEP component]

Page 24

13 Advanced Ad Targeting Architecture

[Diagram: Data Warehouse, Raw Logs, Event Bus Service (Batch, Streaming, ...), Rules Service, CEP, CEP-Rule Templates (Entry / Depart, User Attribution, ...), Rule Fulfillment Cache (Redis). Steps shown: (1) inject a rule to start matching on the event stream; (2) return the Rule ID; (3) submit the CEP topology]

Page 25

13 Advanced Ad Targeting Architecture

[Diagram: same components as the previous slide. Steps shown: (4) when a CEP pattern is fulfilled, write to the cache: UID → RuleID; (5) look up whether a UID has fulfilled a RuleID]
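Steps (1), (2), (4) and (5) can be sketched with in-memory stand-ins. The classes are hypothetical: a `Map` replaces Redis, and the CEP job itself (step 3) is omitted. With Redis, one natural layout would be a set per UID, written with `SADD` and checked with `SISMEMBER`. That layout keeps the request-time lookup to a single key read:

```scala
import scala.collection.mutable

object RuleFulfillment {
  // Steps (1)-(2): instantiate a CEP-rule template with parameters, hand back a Rule ID.
  // Step (3), submitting the actual CEP job, is out of scope for this sketch
  final class RulesService {
    private var nextId = 0
    private val rules = mutable.Map.empty[String, (String, Map[String, String])]

    def inject(template: String, params: Map[String, String]): String = {
      nextId += 1
      val ruleId = s"rule-$nextId"
      rules(ruleId) = (template, params)
      ruleId
    }

    def lookup(ruleId: String): Option[(String, Map[String, String])] = rules.get(ruleId)
  }

  // Stand-in for the Redis rule-fulfillment cache: UID -> set of fulfilled Rule IDs
  final class FulfillmentCache {
    private val fulfilled = mutable.Map.empty[String, Set[String]]

    // Step (4): the CEP job reports that user `uid` fulfilled `ruleId`
    def markFulfilled(uid: String, ruleId: String): Unit =
      fulfilled(uid) = fulfilled.getOrElse(uid, Set.empty[String]) + ruleId

    // Step (5): request-time lookup, a single key read
    def hasFulfilled(uid: String, ruleId: String): Boolean =
      fulfilled.get(uid).exists(_.contains(ruleId))
  }
}
```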

Page 26

13 Advanced Ad Targeting Architecture

[Diagram: full architecture (Web Service, AdServer, Ad Targeter, Campaign Pool / Targeting Cache, Event Logger, Data Warehouse, Raw Logs, Event Bus Service, Batch / Streaming Reporting & analytics services, Rules Service, CEP); ad campaigns are registered into the Campaign Pool. Steps shown: (1) register a rule for the campaign; (2) look up whether the user fulfils a rule]

Page 27

14 Some Discussion

● Why a fixed pool of CEP-Rule Templates?
  ○ Prevents rogue rules from being matched, e.g. rules that would consume too many resources
  ○ It's a lot less work and complication ;)

● It would be very nice to have a freestyle rule service
  ○ Pattern matching across the different event streams of an organization
  ○ For BI, there will be arbitrary complex events / patterns that analysts want to monitor

● Further study for a similar use case: King's RBEA
  ○ RBEA: Rule-Based Event Aggregator
  ○ https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  ○ http://data-artisans.com/rbea-scalable-real-time-analytics-at-king/
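The "prevent rogue rules" argument can be made concrete. With a closed set of templates, each parameter can be bounded before a CEP job is ever submitted. The sketch below uses two hypothetical templates modelled on the advanced rule types (entering / departing a country, staying in a city for N days). It assumes a 30-day lookback limit and ISO 3166 alpha-2 country codes:

```scala
object RuleTemplates {
  // A closed set of templates: users pick one and fill in parameters, nothing else
  sealed trait RuleTemplate
  final case class EntryDepart(country: String) extends RuleTemplate      // entering / departing a country
  final case class StayedIn(city: String, days: Int) extends RuleTemplate // e.g. in Taipei for the past 7 days

  val MaxLookbackDays = 30 // assumed resource limit on how much history a rule may need

  // Reject parameter values before any CEP job is submitted
  def validate(t: RuleTemplate): Either[String, RuleTemplate] = t match {
    case EntryDepart(c) if c.length == 2                  => Right(t) // alpha-2 country code assumed
    case StayedIn(_, d) if d >= 1 && d <= MaxLookbackDays => Right(t)
    case other                                            => Left(s"rejected: $other")
  }
}
```

A freestyle rule service would lose exactly this property: an arbitrary pattern cannot be bounded this simply.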

Page 28

Closing

Page 29

XX Closing

● Complex Event Processing is an emerging way to draw insights from data streams, and it requires exactly-once semantics from the underlying stream processor for correctness
● FlinkCEP builds on the DataStream API to make this possible and easy
