Speeding up Big Data with Event Processing


Upload: alexandre-de-castro-alves

Post on 11-May-2015


Page 1: Speeding up big data with event processing

Speeding-up Big Data with Event Processing

Alexandre de Castro Alves

Thursday, July 18, 13

Page 2: Speeding up big data with event processing

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Disclaimers

• The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: Speeding up big data with event processing

Agenda

• CEP
  • Drivers
  • Formal description
• Big Data
  • Scenarios
  • Architecture
  • Integration with CEP
• Fast Data
  • Architecture
  • Integration with CEP
• Predictive Analytics
  • Data Mining
  • Online data mining
• Scenarios

Page 4: Speeding up big data with event processing

Event-Driven Applications

Industries:
• Financial Services
• Transportation & Logistics
• Public Sector & Military
• Manufacturing
• Utilities & Insurance
• Telecommunications & Services

Example applications:
• Algorithmic trading
• Asset management
• Distributed order orchestration
• ‘Negative Working Capital’ inventory management
• Grid Infrastructure Management
• Responses to calamities (earthquake, flooding)
• Proximity/Location Tracking
• Intrusion detection systems
• Military asset allocation

Page 5: Speeding up big data with event processing

Business Drivers & Enablers

• Exploding volume of digital event data:
  • The cost of sensors and computing power has dropped, network capacity has increased
• Accelerating business process:
  • “the pace of business has increased, the world is changing faster, and competition is getting tougher” (Roy Schulte, VP Gartner Analyst)
• "Event-driven systems are intrinsically smart because they are context-aware and run when they detect changes in the business world rather than occurring on a simple schedule or requiring someone to tell them when to run." (K. Mani Chandy, Simon Ramo Professor at the California Institute of Technology in Pasadena)

Page 6: Speeding up big data with event processing

Event processing: Taxonomy

• Event passing
  • Events are exchanged, but not processed
  • Simple pub-sub applications
  • Example: JMS
• Event mediation (brokering)
  • Events are filtered, routed, and enriched
  • However, not stateful
  • Example: ESB
• Complex Event Processing
  • Events are aggregated and new complex events are created
  • Extremely stateful

Page 7: Speeding up big data with event processing

Inverted Database

RDBMS (queries are issued against stored data):
• Data is ‘static’
• Queries are ‘dynamic’

CEP (events flow through registered queries):
• Data (event) is ‘dynamic’
• Queries are ‘static’
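The inversion can be illustrated with a small Python sketch. It is not taken from the slides: `StandingQuery` and `on_event` are hypothetical names chosen for the example and do not belong to any CEP product's API.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]

# RDBMS style: the data is stored first, and queries come and go.
table: List[Event] = [{"price": 4.0}, {"price": 12.0}]
ad_hoc_result = [row for row in table if row["price"] > 10.0]


# CEP style: the query is registered first, and events flow through it.
class StandingQuery:
    """Hypothetical standing query: a predicate evaluated on every event."""

    def __init__(self, predicate: Callable[[Event], bool]) -> None:
        self.predicate = predicate
        self.matches: List[Event] = []

    def on_event(self, event: Event) -> None:
        # The query is evaluated as each event arrives; nothing is stored first.
        if self.predicate(event):
            self.matches.append(event)


query = StandingQuery(lambda e: e["price"] > 10.0)
for incoming in [{"price": 4.0}, {"price": 12.0}]:
    query.on_event(incoming)
```

Both styles select the same event here; the difference is which side is long-lived, the data or the query.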

Page 8: Speeding up big data with event processing

EPTS and Standards

• Event Processing Technical Society (EPTS)
  • Defines glossary
    • http://www.ep-ts.com/component/option,com_docman/task,cat_view/gid,16/Itemid,84/
  • Steering committee:
    • Opher Etzion (IBM), Louis Lovas (Apama), David Luckham (Stanford), Alan Lundberg (TIBCO), John Morrell (SAP Coral8), Roy Schulte (Gartner), Richard Tibbetts (Streambase), Alexandre Alves (Oracle)
  • Participation at DEBS
• ANSI SQL Standards Proposal for CQL Pattern Matching
  • Oracle, IBM, Stanford University
• OpenSource Adoption of CQL (Swiss University)

Page 9: Speeding up big data with event processing

CEP Models

Page 10: Speeding up big data with event processing

CEP Languages

Language styles: inference rules, ECA, state-oriented, script-oriented, agent-oriented, SQL-idioms

Products shown on the chart: TIBCO, Apama, RuleCore, AgentLogic, Streambase, IBM (AptSoft), Oracle CEP

Source: EPTS/DEBS Tutorial 2009

Page 11: Speeding up big data with event processing

Application Model

[Figure labels: Event Sources, Event Sinks, STREAM, RELATION, Contextual Data, NOT JEE!]

Page 12: Speeding up big data with event processing

Application Model

• Event Processing Network (EPN)
  • Non-rooted directed graph describing event flow from event sources to event sinks
  • References to contextual static data (e.g. table, cache, HDFS)
• Intermediate nodes:
  • Process events (CQL processor, Java Event-Beans)
  • Stage or route processing (channels)
• Edge nodes:
  • Adapters (e.g. JMS, HTTP pub/sub JSON)

[Figure labels: Event Sources, Event Sinks, Contextual Data, NOT JEE!]
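A toy model of an EPN as a directed graph, written in Python purely for illustration. The `EPN` class, its methods, and the node names are assumptions made for this sketch; the real product assembles the network through its own configuration, which the slides do not show.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], List[Event]]


class EPN:
    """Toy event processing network: named nodes joined by directed edges."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Handler] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)

    def add_node(self, name: str, handler: Handler) -> None:
        self.nodes[name] = handler

    def connect(self, source: str, target: str) -> None:
        self.edges[source].append(target)

    def push(self, node: str, event: Event) -> None:
        # A handler may drop the event (empty list) or forward one or more events.
        for out in self.nodes[node](event):
            for target in self.edges[node]:
                self.push(target, out)


received: List[Event] = []


def sink(event: Event) -> List[Event]:
    received.append(event)  # event sink: end of the flow
    return []


epn = EPN()
epn.add_node("adapter", lambda e: [e])    # edge node (event source side)
epn.add_node("channel", lambda e: [e])    # stages/routes events
epn.add_node("processor", lambda e: [e] if e["price"] > 10.0 else [])
epn.add_node("sink", sink)
epn.connect("adapter", "channel")
epn.connect("channel", "processor")
epn.connect("processor", "sink")

epn.push("adapter", {"symbol": "ORCL", "price": 12.0})
epn.push("adapter", {"symbol": "ORCL", "price": 9.0})
```

Only the first event reaches the sink, because the processor node filters out prices of 10.0 or less.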

Page 13: Speeding up big data with event processing

Application Model

• Event models:
  • STREAM (append-only, unbounded)
  • RELATION (insert/delete, bounded)
• Event formats:
  • Java Class
  • Map (key-value pairs)
  • XML
• Timing models:
  • system timestamped
  • application timestamped

[Figure labels: Event Source, Data Source, Adapter, Processor (Query, Rule), Cache, Listener (POJO), Listener (ALSB)]

Page 14: Speeding up big data with event processing

Application Model

• EVENT
  • Defined by a schema: event-type
  • Tuple of event properties

StockEventType (event properties):
  symbol   string
  lastBid  float
  lastAsk  float
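As a rough analogue, the StockEventType schema could be written as a Python dataclass. The class name `StockEvent`, the snake_case field names, and the sample values are illustrative assumptions; the slides list "Java Class", "Map" and "XML" as the actual event formats.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class StockEvent:
    """Event type: a fixed schema of named, typed properties."""

    symbol: str
    last_bid: float
    last_ask: float


# Sample values are made up for the example.
event = StockEvent(symbol="ORCL", last_bid=30.1, last_ask=30.2)

# An event instance is just a tuple of its property values.
as_tuple = astuple(event)
```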

Page 15: Speeding up big data with event processing

Application Model

• STREAM
  • Time-ordered sequence of events
  • APPEND-only
    • One cannot remove events, just add them to the sequence
  • Unbounded
    • There is no end to the sequence
  • {event1, event2, event3, event4, …, eventN}

Page 18: Speeding up big data with event processing

Application Model

• STREAM
  • Examples:
    • {{1s, event1}, {2s, event2}, {4s, event3}} is a STREAM: the events are in time order
    • {{1s, event1}, {4s, event2}, {2s, event3}} is an EVENT CLOUD: the events are not in time order
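The distinction between the two examples can be checked mechanically. The sketch below is illustrative only; `is_time_ordered` is a hypothetical helper, and it assumes that equal timestamps are allowed in a stream.

```python
from typing import List, Tuple

TimedEvent = Tuple[int, str]  # (timestamp in seconds, event name)


def is_time_ordered(events: List[TimedEvent]) -> bool:
    """True when timestamps never decrease from one event to the next."""
    return all(a[0] <= b[0] for a, b in zip(events, events[1:]))


first = [(1, "event1"), (2, "event2"), (4, "event3")]
second = [(1, "event1"), (4, "event2"), (2, "event3")]

first_is_stream = is_time_ordered(first)
second_is_stream = is_time_ordered(second)
```

The first example passes the check; the second fails it because the event stamped 2s arrives after the one stamped 4s.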

Page 19: Speeding up big data with event processing

Application Model

• RELATION
  • Bag of events at some instantaneous time T
  • Allows for INSERT, DELETE, and UPDATE
  • Example:
    • At T=1: {{event1}, {event2}, {event3}}
    • At T=2: {{event1}, {event3}, {event4}}
      • No changes to event1 and event3
      • Event2 was deleted
      • Event4 was inserted
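Since a relation is a bag, the change between two instants can be computed with multiset arithmetic. This is a minimal Python sketch; `relation_diff` is a hypothetical helper name, and `collections.Counter` stands in for the bag.

```python
from collections import Counter
from typing import List, Tuple


def relation_diff(before: List[str], after: List[str]) -> Tuple[List[str], List[str]]:
    """Return (inserted, deleted) between two snapshots of a relation."""
    b, a = Counter(before), Counter(after)
    inserted = sorted((a - b).elements())  # in `after` but not in `before`
    deleted = sorted((b - a).elements())   # in `before` but not in `after`
    return inserted, deleted


at_t1 = ["event1", "event2", "event3"]
at_t2 = ["event1", "event3", "event4"]
inserted, deleted = relation_diff(at_t1, at_t2)
```

For the example in the slide this yields event4 as inserted and event2 as deleted, with event1 and event3 untouched.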

Page 20: Speeding up big data with event processing

Event Processing Language: CQL

• High-level descriptive language for EP, dynamically changeable
• Continuous and incremental
  • Driven by time and events, incremental calculations
• Leverages SQL principles/implementation, and extends them with a formal STREAM calculus
• Based on the STREAM project at Stanford

[Figure labels: continuous, continuous; Stream-Relational Algebra; Define Window of Events; Control Rate of Event Output]

Page 21: Speeding up big data with event processing

Stream-relation Window Operator

SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE]

Time (in secs) | Input event | Output event
00 | ∅ | {AVG(price) = 0.0}
01 | {symbol = “aaa”, price = 4.0} | {AVG(price) = 4.0}
10 | {symbol = “bbb”, price = 2.0} | {AVG(price) = 3.0}
59 | {symbol = “aaa”, price = 5.0} | {AVG(price) = 3.6}
61 | ∅ | {AVG(price) = 3.5}
70 | ∅ | {AVG(price) = 5.0}
80 | {symbol = “aaa”, price = 6.0} | {AVG(price) = 5.5}
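The table can be reproduced with a small simulation of a 1-minute range window. This Python sketch is an approximation written for illustration, not the CQL engine's implementation: `RangeWindowAvg` and `tick` are hypothetical names, an event is assumed to leave the window once its age reaches 60 seconds, and an empty window is assumed to report 0.0 as in the first row.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RangeWindowAvg:
    """Average of the prices seen within the last `range_secs` seconds."""

    def __init__(self, range_secs: int = 60) -> None:
        self.range_secs = range_secs
        self.window: Deque[Tuple[int, float]] = deque()

    def tick(self, now: int, price: Optional[float] = None) -> float:
        if price is not None:
            self.window.append((now, price))
        # Expire events whose age has reached the window range.
        while self.window and now - self.window[0][0] >= self.range_secs:
            self.window.popleft()
        if not self.window:
            return 0.0  # assumption: mirrors the 0.0 shown for an empty window
        return sum(p for _, p in self.window) / len(self.window)


# (time in seconds, price or None when only time advances)
timeline = [(0, None), (1, 4.0), (10, 2.0), (59, 5.0), (61, None), (70, None), (80, 6.0)]
avg = RangeWindowAvg()
outputs = [avg.tick(t, p) for t, p in timeline]
```

Note that the output changes at 61s and 70s even though no event arrives: the passage of time alone expires the events stamped 1s and 10s. At 59s the exact average is 11/3, about 3.67, which the table shows as 3.6.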

Page 22: Speeding up big data with event processing

Stream-relation Window Operator

• Window variations:
  • Sliding
  • Jumping (batching)
  • Partitioned
  • User-defined windows
  • Time-based
  • Tuple-based
  • Value windows
  • CurrentHour (left edge is fixed, and right edge moves)

Page 23: Speeding up big data with event processing

Relation-stream operators

Page 24: Speeding up big data with event processing

Relation-stream operators

ISTREAM (SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE])

Time | Input event | WINDOW output | ISTREAM output
00 | ∅ | +{AVG(price) = 0.0} | +{AVG(price) = 0.0}
01 | +{price = 4.0} | -{AVG(price) = 0.0}, +{AVG(price) = 4.0} | +{AVG(price) = 4.0}
10 | +{price = 2.0} | -{AVG(price) = 4.0}, +{AVG(price) = 3.0} | +{AVG(price) = 3.0}
59 | +{price = 5.0} | -{AVG(price) = 3.0}, +{AVG(price) = 3.6} | +{AVG(price) = 3.6}
61 | ∅ | -{AVG(price) = 3.6}, +{AVG(price) = 3.5} | +{AVG(price) = 3.5}
70 | ∅ | -{AVG(price) = 3.5}, +{AVG(price) = 5.0} | +{AVG(price) = 5.0}
80 | +{price = 6.0} | -{AVG(price) = 5.0}, +{AVG(price) = 5.5} | +{AVG(price) = 5.5}

DSTREAM (SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE])

Time | Input event | WINDOW output | DSTREAM output
00 | ∅ | +{AVG(price) = 0.0} | ∅
01 | +{price = 4.0} | -{AVG(price) = 0.0}, +{AVG(price) = 4.0} | +{AVG(price) = 0.0}
10 | +{price = 2.0} | -{AVG(price) = 4.0}, +{AVG(price) = 3.0} | +{AVG(price) = 4.0}
59 | +{price = 5.0} | -{AVG(price) = 3.0}, +{AVG(price) = 3.6} | +{AVG(price) = 3.0}
61 | ∅ | -{AVG(price) = 3.6}, +{AVG(price) = 3.5} | +{AVG(price) = 3.6}
70 | ∅ | -{AVG(price) = 3.5}, +{AVG(price) = 5.0} | +{AVG(price) = 3.5}
80 | +{price = 6.0} | -{AVG(price) = 5.0}, +{AVG(price) = 5.5} | +{AVG(price) = 5.0}
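ISTREAM and DSTREAM can be read as the rows inserted into, and deleted from, the relation between consecutive instants. The Python sketch below illustrates that reading under stated assumptions: `istream_dstream` is a hypothetical helper, the relation starts empty, and the averages are the rounded values taken from the WINDOW output column above.

```python
from collections import Counter
from typing import List, Tuple


def istream_dstream(states: List[List[float]]) -> List[Tuple[List[float], List[float]]]:
    """For each successive relation state, return (inserted rows, deleted rows)."""
    previous: Counter = Counter()  # assumption: the relation starts empty
    changes = []
    for rows in states:
        current = Counter(rows)
        inserted = sorted((current - previous).elements())  # ISTREAM
        deleted = sorted((previous - current).elements())   # DSTREAM
        changes.append((inserted, deleted))
        previous = current
    return changes


# One-row relation holding AVG(price) at times 00, 01, 10, 59, 61, 70, 80.
avg_states = [[0.0], [4.0], [3.0], [3.6], [3.5], [5.0], [5.5]]
changes = istream_dstream(avg_states)
istream = [ins for ins, _ in changes]
dstream = [dele for _, dele in changes]
```

Each new average shows up in `istream` at the instant it appears, and in `dstream` one step later, when it is replaced.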

Page 25: Speeding up big data with event processing

Pattern Matching

• Detect complex relationships amongst events

• State-machine model

• ANSI standards proposal
  • IBM, Oracle, Streambase

• Starting to see adoption by other vendors/users (e.g. MySQL) [1]

Page 26: Speeding up big data with event processing

Pattern Matching

SELECT M.up
FROM ticker
MATCH_RECOGNIZE (
  MEASURES B.price as up, A.price as down
  PATTERN (A B)
  DEFINE A as price < 10.0, B as price >= 10.0
) as M

Input event | Output event
+{symbol = ‘ORCL’, price = 9.0} | ∅
+{symbol = ‘ORCL’, price = 9.5} | ∅
+{symbol = ‘ORCL’, price = 12.0} | +{M.up = 12.0}

[Figure: state machine with states A and B. price=9.0 and price=9.5 match A; price=12.0 matches B and produces up=12.0]
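The PATTERN (A B) query behaves like a two-state machine: remember whether the previous event matched A, and emit when the current one matches B. The Python sketch below is a simplified illustration of that idea only; `match_a_then_b` is a hypothetical function, and real MATCH_RECOGNIZE semantics (partitioning, overlapping matches, skip strategies) are richer than this.

```python
from typing import List, Optional


def match_a_then_b(prices: List[float], threshold: float = 10.0) -> List[Optional[float]]:
    """Per input price, return B.price ("up") when A is directly followed by B."""
    outputs: List[Optional[float]] = []
    previous_was_a = False
    for price in prices:
        if previous_was_a and price >= threshold:
            outputs.append(price)  # B matched right after A: emit the match
        else:
            outputs.append(None)   # no output for this event
        previous_was_a = price < threshold  # A: price < 10.0
    return outputs


result = match_a_then_b([9.0, 9.5, 12.0])
```

For the three ORCL events this produces no output twice and then 12.0, in line with the table above.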

Page 27: Speeding up big data with event processing

Pattern Matching

Page 28: Speeding up big data with event processing

Event Processing Ecosystem

[Figure labels: Events in and out over JMS and HTTP PUB/SUB; IDE, OEP Server, Visualizer Web Console / BAM (deploy, manage); Contextual Data: RDBMS, Cache, Hadoop, NoSqlDb]

Page 29: Speeding up big data with event processing

Summary

• Event Processing Network defines the assembly

• CQL defines the processing

• STREAM vs RELATION

• RELATION can be any relational source:
  • tables, caches, Hadoop HDFS files, etc.

Page 31: Speeding up big data with event processing

Big Data Scenarios

• MEDIA/ENTERTAINMENT: Viewers / advertising effectiveness; Cross Sell
• COMMUNICATIONS: Location-based advertising
• EDUCATION & RESEARCH: Experiment sensor analysis
• Retail / CPG: Sentiment analysis; Hot products; Optimized Marketing
• HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
• LIFE SCIENCES: Clinical trials; Genomics
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
• OIL & GAS: Drilling exploration sensor analysis
• FINANCIAL SERVICES: Risk & portfolio analysis; New products
• AUTOMOTIVE: Auto sensors reporting location, problems
• Games: Adjust to player behavior; In-Game Ads
• LAW ENFORCEMENT & DEFENSE: Threat analysis - social media monitoring, photo analysis
• TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
• UTILITIES: Smart Meter analysis for network capacity,
• ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Web-site optimization

Page 32: Speeding up big data with event processing

What’s Big Data?

[Figure labels: VOLUME, VELOCITY, VARIETY (binary data, Web, SMS), VALUE]

Page 33: Speeding up big data with event processing

Big Data Architecture (Map-Reduce)

• Data: big, immutable (append-only, avoids corruption)
• Batch layer (e.g. Hadoop)
  • Batch views: query = function(data)
• Map-Reduce flow:
  • batch input
  • map: (key1, value1), (key2, value2), (key1, value3), (key2, value4), (key1, value5)
  • reduce: key1, {value1, value3, value5}; key2, {value2, value4}
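The map and reduce steps from the diagram can be sketched in plain Python. This is an in-memory illustration, not Hadoop code: `map_phase`, `shuffle` and `reduce_phase` are hypothetical helper names, the mapper simply passes the diagram's key/value pairs through, and the reducer (counting values per key) is an arbitrary choice for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def map_phase(records: Iterable[Pair], mapper: Callable[[Pair], List[Pair]]) -> List[Pair]:
    """Apply the mapper to every input record and collect the emitted pairs."""
    return [pair for record in records for pair in mapper(record)]


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[str]]:
    """Group the mapped values by key, as done between map and reduce."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[str]], reducer: Callable[[str, List[str]], int]) -> Dict[str, int]:
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in grouped.items()}


batch_input = [("key1", "value1"), ("key2", "value2"), ("key1", "value3"),
               ("key2", "value4"), ("key1", "value5")]
mapped = map_phase(batch_input, lambda record: [record])  # identity mapper
grouped = shuffle(mapped)
counts = reduce_phase(grouped, lambda key, values: len(values))
```

The grouping step yields exactly the two reduce inputs shown in the diagram; the whole computation is a function of the complete data set, which is what makes it a batch view.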

Page 34: Speeding up big data with event processing

When is CEP needed?

• If Big Data is about VVV (volume, variety, velocity), then stream processing is needed when at least 2 of the 3 V’s are present.
  • If there is high volume and low latency is needed (velocity), then stream processing must be done.
  • If there is NOT a lot of volume, but the data is semi-structured (variety), as is the case with social feeds, and low latency is needed, then stream processing must still be applied.
  • If volume is low and there is no need to process it fast, then use regular messaging technology, such as JMS.
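The rule of thumb above can be written down as a small decision function. This Python sketch only encodes the cases the slide states; `recommend` is a hypothetical name, and combinations the slide does not address are reported as such rather than guessed.

```python
def recommend(high_volume: bool, semi_structured: bool, low_latency: bool) -> str:
    """Apply the slide's rule: stream processing when at least 2 of the 3 V's hold."""
    v_count = sum([high_volume, semi_structured, low_latency])
    if v_count >= 2:
        return "stream processing"
    if not high_volume and not low_latency:
        return "messaging (e.g. JMS)"
    return "not covered by the slide"


high_volume_fast = recommend(high_volume=True, semi_structured=False, low_latency=True)
social_feed = recommend(high_volume=False, semi_structured=True, low_latency=True)
small_and_slow = recommend(high_volume=False, semi_structured=False, low_latency=False)
volume_only = recommend(high_volume=True, semi_structured=False, low_latency=False)
```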

Page 35: Speeding up big data with event processing

CEP with Big Data

Page 37: Speeding up big data with event processing

Big Data Architecture Limitations

• Same Map-Reduce batch architecture: big, immutable data (append-only, avoids corruption); batch layer (e.g. Hadoop); batch views, query = function(data)
• Batch output: deep, but not real-time

Page 38: Speeding up big data with event processing

Fast Data Architecture

• Data: big, immutable (append-only, avoids corruption)
• Batch layer (e.g. Hadoop)
  • Batch views: query = function(data)
• Indexing layer (e.g. ElephantDB, Cassandra, NoSqlDB)
  • Indexed batch views: query = function(data)
• Fast layer (e.g. CEP, Storm)
  • Real-time views: query = function(data) + inc-update
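A minimal Python sketch of how a query might combine a batch view with an incrementally updated real-time view. Everything here is an assumption made for illustration: the `FastDataViews` class, the per-symbol count as the "query", and the choice to clear the real-time view whenever the batch layer catches up.

```python
from collections import Counter
from typing import List


class FastDataViews:
    """Toy batch layer plus fast layer over an append-only master data set."""

    def __init__(self) -> None:
        self.master: List[str] = []                # big, immutable, append-only
        self.batch_view: Counter = Counter()       # recomputed from all data
        self.realtime_view: Counter = Counter()    # incrementally updated

    def on_event(self, symbol: str) -> None:
        self.master.append(symbol)
        self.realtime_view[symbol] += 1            # inc-update in the fast layer

    def run_batch(self) -> None:
        self.batch_view = Counter(self.master)     # query = function(data)
        self.realtime_view.clear()                 # batch view now covers everything

    def query(self, symbol: str) -> int:
        # Deep (batch) result plus what arrived since the last batch run.
        return self.batch_view[symbol] + self.realtime_view[symbol]


views = FastDataViews()
views.on_event("ORCL")
views.on_event("ORCL")
views.run_batch()
views.on_event("ORCL")  # arrives after the batch run
total = views.query("ORCL")
```

The batch view alone would answer 2 here; adding the real-time view gives the up-to-date answer without waiting for the next batch run.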

Page 39: Speeding up big data with event processing

Fast Data with CEP

• Integration with other Big Data technologies:
  • HBase
  • Hive
  • Avro (Flume)
• Incremental merge of Hadoop jobs with OEP queries
  • Saves the developer from having to create their own Hadoop job

Page 41: Speeding up big data with event processing

Data Mining

• Identify patterns and relationships in the real world

• Develop descriptive models of datasets

• Use models to evaluate future options, risks and decisions

Page 42: Speeding up big data with event processing

Data Mining

[Figure labels: World (population), Data-Set (sample), Model (statistical summaries, regressions, machine-learning); Data, Model, Prediction]

(1) TRAIN
(2) SCORE
(3) RE-TRAIN

Page 43: Speeding up big data with event processing

Online Data Mining

[Figure labels: continuous, continuous; Event; Model; Model Repository; Export model; Rebuild model; Score events]

Example: predict if price of next event will be above 0.8 using model

Page 44: Speeding up big data with event processing

Challenges (Right Model, Right Cost)

[Figure labels: Data, Induction, Model, Deduction, Data; k-Nearest-Neighbors, Decision trees, Neural nets/SVM; Increased Compression; Computational Cost]

Page 45: Speeding up big data with event processing

Challenges

• All models are wrong, some are useful (George Box)
• Central Limit Theorem
  • Means of random samples of the same population will be approximately normally distributed (even if the data is not normally distributed)
  • However, all bets are off if the samples are not from the same population
    • Consider a regression function of weight -> height
    • Will not work if the model is built using samples from a city bus and scored on a bus containing only basketball players
• What confidence level to use?
  • Scientific papers demand a 95% confidence level. What about streaming use-cases? 95% seems too high...
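The Central Limit Theorem point can be illustrated with a short simulation. This Python sketch is only a demonstration under chosen assumptions: an exponential population (strongly skewed, mean 1.0), samples of size 50, 2000 repetitions, a fixed seed, and a hand-written `skewness` helper as a rough measure of asymmetry.

```python
import random
import statistics
from typing import List


def skewness(values: List[float]) -> float:
    """Population skewness: 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(42)  # fixed seed so the run is repeatable


def draw() -> float:
    return rng.expovariate(1.0)  # skewed population with mean 1.0


raw = [draw() for _ in range(2000)]
sample_means = [statistics.fmean([draw() for _ in range(50)]) for _ in range(2000)]

raw_skew = skewness(raw)
means_skew = skewness(sample_means)
mean_of_means = statistics.fmean(sample_means)
```

The raw draws are heavily skewed, while the sample means cluster symmetrically around the population mean. The guarantee holds only because every sample comes from the same population, which is exactly the caveat raised by the bus example.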

Page 46: Speeding up big data with event processing

Material

[email protected]

• http://www.oracle.com/technetwork/middleware/complex-event-processing/overview/index.html
• http://adcalves.wordpress.com
• http://www.packtpub.com/getting-started-with-oracle-event-processing-11g/book
