Data Stream Processing - Uni Konstanz

Post on 03-Feb-2022

Page 1: Data Stream Processing - Uni Konstanz

Data Stream Processing

Weiwei SUN
University of Konstanz

Page 2: Data Stream Processing - Uni Konstanz

Data Stream Processing

STREAM: Stanford Stream Data Manager
  – Semantics and query language
  – Query execution and optimization

YFilter: XML stream filtering engine

Page 3: Data Stream Processing - Uni Konstanz

STREAM: Stanford Stream Data Manager

STREAM is a general-purpose DSMS (Data Stream Management System) prototype.

The motivation of this talk is to introduce the problems, solutions, and challenges of data stream processing.

Page 4: Data Stream Processing - Uni Konstanz

Data Streams

Continuous, unbounded, rapid, time-varying streams of data elements

Occur in a variety of modern applications
  – Network monitoring and traffic engineering
  – Sensor networks, RFID tags
  – Telecom call records
  – Financial applications
  – Web logs and click-streams
  – Manufacturing processes

DSMS = Data Stream Management System

Page 5: Data Stream Processing - Uni Konstanz

Using Traditional Database

[Figure: traditional database architecture. Labels: User/Application, Query, Result, Loader, Table R, Table S.]

Page 6: Data Stream Processing - Uni Konstanz

New Approach for Data Streams

[Figure: stream processing architecture. Labels: User/Application, Register Continuous Query, Stream Query Processor, Result, Input streams.]

Page 7: Data Stream Processing - Uni Konstanz

DBMS versus DSMS

DBMS                                   DSMS
Persistent relations                   Transient streams (and persistent relations)
One-time queries                       Continuous queries
Random access                          Sequential access
Access plan determined by query        Unpredictable data characteristics
processor and physical DB design       and arrival patterns

Page 8: Data Stream Processing - Uni Konstanz

The (Simplified) Big Picture

[Figure: DSMS overview. Labels: Input streams, Register Query, DSMS, Scratch Store, Streamed Result, Stored Result, Archive, Stored Relations.]

Page 9: Data Stream Processing - Uni Konstanz

A (Simplified) System Architecture of Network Monitoring

[Figure: DSMS for network monitoring. Labels: Network measurements, Packet traces, Register Monitoring Queries, DSMS, Scratch Store, Intrusion Warnings, Online Performance Metrics, Archive, Lookup Tables.]

Page 10: Data Stream Processing - Uni Konstanz

Using Conventional DBMS

Data streams as relation inserts, continuous queries as triggers or materialized views

Problems with this approach
  – Inserts are typically batched, high overhead
  – Expressiveness: simple conditions (triggers), no built-in notion of sequence (views)
  – No notion of approximation
  – Current systems don't scale to large numbers of triggers
  – Views don't provide streamed results

Page 11: Data Stream Processing - Uni Konstanz

The STREAM System

Data streams and stored relations
Declarative language for registering continuous queries
Flexible query plans and execution strategies
Textual, graphical, and application interfaces
Relational, centralized (for now)

Page 14: Data Stream Processing - Uni Konstanz

STREAM System Challenges

Must cope with:
  – Stream rates that may be high, variable, bursty
  – Stream data that may be unpredictable, variable
  – Continuous query loads that may be high, variable

Consequences:
  – Overload
  – Changing conditions

Page 15: Data Stream Processing - Uni Konstanz

STREAM System Features

Aggressive sharing of state and computation
Careful resource allocation and use
Continuous self-monitoring and reoptimization
Graceful approximation as necessary

Page 16: Data Stream Processing - Uni Konstanz

We will mainly talk about

Query language
  – Semantics of CQL

Query plans and execution issues
  – Operator, Queue, and State
  – State sharing
  – Stream constraints
  – Operator scheduling optimization

Page 17: Data Stream Processing - Uni Konstanz

Query Language

CQL: Continuous Query Language

Page 18: Data Stream Processing - Uni Konstanz

Aside on Semantics

The semantics of SQL queries is (relatively) easy to understand
  – Even lots of SQL queries running together

The semantics of a single trigger is (relatively) easy to understand
  – But lots of triggers together can be complex

The semantics of even a single continuous query may not be obvious
  – But lots running together is no harder

Page 19: Data Stream Processing - Uni Konstanz

A Nonobvious Continuous Query

Stream of stock quotes: Stocks(ticker, price)

Monitor last 10 minutes of quotes:
  Select * From Stocks [Range 10 minutes]

Is the result a relation, a stream, or something else?

If a relation, what exactly does it contain?

If a stream, how does the query differ from:
  Select * From Stocks [Range 1 minute]
or
  Select * From Stocks [∞]

Page 20: Data Stream Processing - Uni Konstanz

Another Nonobvious CQ

Stream of ordered items, table of item prices

Prices for five most recent ordered items:
  Select P.price
  From Items I [Rows 5], PriceTable P
  Where I.itemID = P.itemID

Is the result a stream or a relation?
What if an item price changes?

Page 21: Data Stream Processing - Uni Konstanz

Continuous Query Language (CQL)

Start with SQL, then add...
  – Streams as a new data type
  – Continuous instead of one-time semantics
  – Windows on streams (stream-to-relation)
  – Sampling on streams (approximate results)
  – Relation-to-stream operators: Istream, Dstream, Rstream

Page 22: Data Stream Processing - Uni Konstanz

Relations and Streams

Assume a global, discrete, ordered time domain

Relation
  – Maps time T to a set of tuples R
  – Differs from the traditional relation, which does not vary with time

Stream
  – Set of pairs <tuple, timestamp>
  – Unbounded, transient

Page 23: Data Stream Processing - Uni Konstanz

Conversions

A relation-to-relation operator takes one or more relations as input and produces a relation as output.

A stream-to-relation operator takes a stream as input and produces a relation as output.

A relation-to-stream operator takes a relation as input and produces a stream as output.

[Figure: conversions between Streams and Relations. Streams to Relations: window specification. Relations to Streams: special operators Istream, Dstream, Rstream. Relations to Relations: any relational query language.]

Page 24: Data Stream Processing - Uni Konstanz

The Relation-to-Relation Operators in CQL

CQL uses SQL constructs to express its relation-to-relation operators, and much of the data manipulation in a typical CQL query is performed using these constructs, exploiting the rich expressive power of SQL.

Page 25: Data Stream Processing - Uni Konstanz

The Stream-to-Relation Operators in CQL

The stream-to-relation operators in CQL are based on the concept of a sliding window over a stream:

Tuple-based sliding window
  – Items [Rows 100]

Time-based sliding window
  – Items [Range 5 Minutes]

Partitioned sliding window
  – Fulfillments [Partition By clerk Rows 5]
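The three window types can be sketched in Python. This is a minimal illustration, not STREAM's implementation: the function names, the representation of a stream as a timestamp-ordered list of `(tuple, timestamp)` pairs, and the sample data are all assumptions made for the example. The time-based window is taken to include both endpoints, and `n` is assumed to be at least 1.

```python
from collections import defaultdict

# Assumption: a stream is a list of (tuple, timestamp) pairs in timestamp order.

def rows_window(stream, now, n):
    """[Rows n]: the n most recent tuples with timestamp <= now (n >= 1)."""
    seen = [t for t, ts in stream if ts <= now]
    return seen[-n:]

def range_window(stream, now, span):
    """[Range span]: tuples with timestamp in [now - span, now]."""
    return [t for t, ts in stream if now - span <= ts <= now]

def partitioned_rows_window(stream, now, key, n):
    """[Partition By key Rows n]: the n most recent tuples per key value."""
    groups = defaultdict(list)
    for t, ts in stream:
        if ts <= now:
            groups[key(t)].append(t)
    return [t for g in groups.values() for t in g[-n:]]

# Hypothetical sample data: four fulfillments at times 1..4.
fulfillments = [
    ({"orderID": 1, "clerk": "Sue"}, 1),
    ({"orderID": 2, "clerk": "Bob"}, 2),
    ({"orderID": 3, "clerk": "Sue"}, 3),
    ({"orderID": 4, "clerk": "Sue"}, 4),
]

last_two = rows_window(fulfillments, now=4, n=2)
last_unit = range_window(fulfillments, now=4, span=1)
per_clerk = partitioned_rows_window(
    fulfillments, now=4, key=lambda t: t["clerk"], n=2
)
```

Each function returns the relation that the window yields at time `now`, so re-evaluating it as `now` advances gives the time-varying relation that the rest of the query operates on.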

Page 26: Data Stream Processing - Uni Konstanz

Three Relation-to-Stream Operators in CQL

Istream, Dstream, Rstream
  – Istream(R) (insert stream) contains all (r, T) where r ∈ R at time T but r ∉ R at time T-1
  – Dstream(R) (delete stream) contains all (r, T) where r ∈ R at time T-1 but r ∉ R at time T
  – Rstream(R) (relation stream) contains all (r, T) where r ∈ R at time T
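The three definitions translate almost directly into code. The sketch below assumes a time-varying relation is given as a list of Python sets, one per discrete time step starting at 0, and that the relation is empty before time 0. The function names mirror the operators but are otherwise hypothetical.

```python
# Assumption: relation[T] is the set of tuples in R at discrete time T = 0, 1, 2, ...
# and R is empty before time 0. Tuples must be sortable for deterministic output.

def istream(relation):
    """All (r, T) with r in R(T) but not in R(T-1)."""
    out, prev = [], set()
    for T, cur in enumerate(relation):
        out += [(r, T) for r in sorted(cur - prev)]
        prev = cur
    return out

def dstream(relation):
    """All (r, T) with r in R(T-1) but not in R(T)."""
    out, prev = [], set()
    for T, cur in enumerate(relation):
        out += [(r, T) for r in sorted(prev - cur)]
        prev = cur
    return out

def rstream(relation):
    """All (r, T) with r in R(T)."""
    return [(r, T) for T, cur in enumerate(relation) for r in sorted(cur)]

# Hypothetical relation: "a" present at times 0-1, "b" present at times 1-2.
R = [{"a"}, {"a", "b"}, {"b"}]
```

For this `R`, the insert stream reports "a" at time 0 and "b" at time 1, the delete stream reports "a" at time 2, and the relation stream repeats every current tuple at every time step.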

Page 27: Data Stream Processing - Uni Konstanz

Abstract Semantics

Take any relational query language

Can reference streams in place of relations
  – But must convert to relations using any window specification language (default window = [∞])

Can convert relations to streams
  – For streamed results
  – For windows over relations (note: converts back to relation)

Page 28: Data Stream Processing - Uni Konstanz

Query Result at Time T

Use all relations at time T
Use all streams up to T, converted to relations
Compute relational result
Convert result to streams if desired

Page 29: Data Stream Processing - Uni Konstanz

CQL Example Query 1

Two streams, contrived for ease of examples:
  Orders (orderID, customer, cost)
  Fulfillments (orderID, clerk)

Page 30: Data Stream Processing - Uni Konstanz

CQL Example Query 1

Two streams, contrived for ease of examples:
  Orders (orderID, customer, cost)
  Fulfillments (orderID, clerk)

Total cost of orders fulfilled over the last day by clerk "Sue" for customer "Joe":

  Select Sum(O.cost)
  From Orders O, Fulfillments F [Range 1 Day]
  Where O.orderID = F.orderID And F.clerk = "Sue" And O.customer = "Joe"

Page 35: Data Stream Processing - Uni Konstanz

CQL Example Query 1

Syntactic shortcuts and defaults for convenience. With the defaults made explicit:

  Select IStream(Sum(O.cost))
  From Orders O [∞], Fulfillments F [Range 1 Day]
  Where O.orderID = F.orderID And F.clerk = "Sue" And O.customer = "Joe"

At time T:
  – Entire stream O and tuples of last day of F as relations
  – Evaluate query, update result relation at T
  – Streamed result: new element <Sum(O.cost), T> whenever Sum(O.cost) changes from T-1
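The evaluation steps at time T can be sketched as follows. This is a naive re-evaluation at every time step, meant only to illustrate the semantics, not how STREAM executes the query. The helper names, the hour-based timestamps (`day=24`), and the sample data are assumptions. One simplification is labeled in the code: an empty join yields 0 here, whereas SQL's Sum over no rows would be NULL.

```python
def query1_sum(orders, fulfillments, now, day=24):
    """Sum(O.cost) at time `now` with Orders [∞] and Fulfillments [Range 1 Day]."""
    o_rel = [o for o, ts in orders if ts <= now]                      # unbounded window
    f_rel = [f for f, ts in fulfillments if now - day <= ts <= now]   # last day only
    # Simplification: Python's sum() gives 0 for an empty join (SQL would give NULL).
    return sum(
        o["cost"]
        for o in o_rel
        for f in f_rel
        if o["orderID"] == f["orderID"]
        and f["clerk"] == "Sue"
        and o["customer"] == "Joe"
    )

def streamed_result(orders, fulfillments, times, day=24):
    """Istream over the one-row result: emit (sum, T) whenever the sum changes."""
    out, prev = [], None
    for T in times:
        cur = query1_sum(orders, fulfillments, T, day)
        if cur != prev:
            out.append((cur, T))
        prev = cur
    return out

# Hypothetical data; timestamps are hours.
orders = [
    ({"orderID": 1, "customer": "Joe", "cost": 10}, 0),
    ({"orderID": 2, "customer": "Joe", "cost": 5}, 1),
    ({"orderID": 3, "customer": "Ann", "cost": 7}, 1),
]
fulfillments = [
    ({"orderID": 1, "clerk": "Sue"}, 2),
    ({"orderID": 2, "clerk": "Sue"}, 3),
    ({"orderID": 3, "clerk": "Sue"}, 3),
]

result = streamed_result(orders, fulfillments, range(30))
```

The sum rises as fulfillments arrive and falls again as they slide out of the one-day window, and only those change points appear in the streamed result.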

Page 36: Data Stream Processing - Uni Konstanz

CQL Example Query 2

Using a 10% sample of the Fulfillments stream, take the 5 most recent fulfillments for each clerk and return the maximum cost

  Select F.clerk, Max(O.cost)
  From Orders O, Fulfillments F [Partition By clerk Rows 5] 10% Sample
  Where O.orderID = F.orderID
  Group By F.clerk

Page 41: Data Stream Processing - Uni Konstanz

CQL Example 3: Result Type

Simpler version of Example Query 2:
  Select Istream(F.clerk, Max(O.cost))
  From O, F [Rows 100]
  Where O.orderID = F.orderID
  Group By F.clerk

Streamed result: emits a <clerk, max> stream element whenever max changes for a clerk (or a new clerk appears)

Page 42: Data Stream Processing - Uni Konstanz

CQL Example 3: Result Type

Simpler version of Example Query 2:
  Select RStream(F.clerk, Max(O.cost))
  From O, F [Rows 100]
  Where O.orderID = F.orderID
  Group By F.clerk

Result is the current relation, re-emitted as a stream at each time T and updated as stream elements arrive

Page 43: Data Stream Processing - Uni Konstanz

CQL Example Query 4

Relation CurPrice(stock, price)

  Select stock, Avg(price)
  From Istream(CurPrice) [Range 1 Day]
  Group By stock

Average price over the last day for each stock
  – Istream provides history of CurPrice
  – Window on history (back to relation), group and aggregate

Page 44: Data Stream Processing - Uni Konstanz

Any questions?

Page 45: Data Stream Processing - Uni Konstanz

Query plans and execution issues

Page 46: Data Stream Processing - Uni Konstanz

Query Execution

When a continuous query is registered, generate a query plan
  – New plan merged with existing plans
  – Users can also create & manipulate plans directly

Plans composed of three main components:
  – Operators
  – Queues
  – Synopses/States (windows, operators requiring history)

Global scheduler for plan execution

Page 47: Data Stream Processing - Uni Konstanz

Operators used in STREAM query plans

Page 48: Data Stream Processing - Uni Konstanz

Queue

A queue in a query plan connects its "producing" plan operator OP to its "consuming" operator OC.

The elements that OP produces are inserted into the queue and buffered there until they are processed by OC.

Elements in a queue are ordered by increasing timestamp
  – To maintain the semantics of sliding windows
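A queue with this ordering guarantee can be sketched as a FIFO buffer that rejects out-of-order elements. The class name `PlanQueue` and its methods are hypothetical, and the choice to allow equal timestamps (non-decreasing order) is an assumption of this sketch.

```python
from collections import deque

class PlanQueue:
    """FIFO buffer between a producing and a consuming operator (hypothetical)."""

    def __init__(self):
        self._items = deque()
        self._last_ts = None

    def enqueue(self, element, ts):
        # Assumption: equal timestamps are allowed, decreasing ones are not.
        # Downstream window operators rely on this order to expire tuples.
        if self._last_ts is not None and ts < self._last_ts:
            raise ValueError("timestamp order violated")
        self._items.append((element, ts))
        self._last_ts = ts

    def dequeue(self):
        """Return the oldest (element, ts) pair, or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)

q = PlanQueue()
q.enqueue("a", 1)
q.enqueue("b", 2)
```

The producer calls `enqueue` and the consumer calls `dequeue`; the length of the queue is the memory that operator scheduling tries to keep small.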

Page 49: Data Stream Processing - Uni Konstanz

Synopsis/State

Logically, a synopsis belongs to a specific plan operator, storing state that may be required for future evaluation of that operator.

For example, to perform a windowed join of two streams, the join operator must be able to probe all tuples in the current window on each input stream. Thus, the join operator maintains one synopsis (e.g., a hash table) for each of its inputs. On the other hand, operators such as selection and duplicate-preserving union do not require any synopses.

[Figure labels: State1, State2]
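The windowed join described above can be sketched with one hash-table synopsis per input. The class `BinaryJoin` and its methods are hypothetical names. The sketch assumes an equi-join on a single attribute and only reports results for insertions, whereas a full operator would also report the joined tuples that disappear when an input tuple expires.

```python
from collections import defaultdict

class BinaryJoin:
    """Equi-join on one attribute, one hash-table synopsis per input (sketch)."""

    def __init__(self, key):
        self.key = key
        # synopsis[0] indexes the left input's window, synopsis[1] the right's.
        self.synopsis = (defaultdict(list), defaultdict(list))

    def insert(self, side, tup):
        """A tuple enters the window on input `side` (0 = left, 1 = right).

        Store it in its own synopsis, probe the opposite synopsis, and
        return the new (left, right) result pairs.
        """
        k = tup[self.key]
        self.synopsis[side][k].append(tup)
        matches = self.synopsis[1 - side][k]
        return [(tup, m) if side == 0 else (m, tup) for m in matches]

    def delete(self, side, tup):
        """A tuple expires from the window on input `side`."""
        self.synopsis[side][tup[self.key]].remove(tup)

join = BinaryJoin("A")
left1 = {"A": 1, "x": "l1"}
right1 = {"A": 1, "y": "r1"}
first = join.insert(0, left1)     # nothing on the right yet
second = join.insert(1, right1)   # probes the left synopsis and finds left1
```

A selection operator, by contrast, would be a stateless function of the current tuple, which is why it needs no synopsis.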

Page 50: Data Stream Processing - Uni Konstanz

A simple query plan illustrating operators, queues, and synopses

Query:
  Select *
  From S1 [Rows 1000], S2 [Range 2 Minutes]
  Where S1.A = S2.A And S1.A > 10

q3 holds elements representing the relation "S1 [Rows 1000]"

q4 holds elements for "S2 [Range 2 Minutes]"

Page 51: Data Stream Processing - Uni Konstanz

A simple query plan illustrating operators, queues, and synopses

Query:
  Select *
  From S1 [Rows 1000], S2 [Range 2 Minutes]
  Where S1.A = S2.A And S1.A > 10

q5 holds elements of the joined relation "S1 [Rows 1000] ⋈ S2 [Range 2 Minutes]"

Page 52: Data Stream Processing - Uni Konstanz

A simple query plan illustrating operators, queues, and synopses

The select operator can be pushed down into one or both branches below the binary-join operator, and also below the seq-window operator on S2.

However, tuple-based windows do not commute with filter conditions, and therefore the select operator cannot be pushed below the seq-window operator on S1.
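A small counterexample shows why the order matters for tuple-based windows but not for time-based ones. The helper names and the sample values are made up for illustration; a window of 3 rows stands in for [Rows 1000] and a span of 2 time units for [Range 2 Minutes].

```python
def rows_window(values, n):
    """Last n elements in arrival order (tuple-based window, n >= 1)."""
    return values[-n:]

def range_window(stream, now, span):
    """Elements of a (value, timestamp) stream with timestamp in [now - span, now]."""
    return [(v, ts) for v, ts in stream if now - span <= ts <= now]

# Hypothetical values of attribute A in arrival order, with timestamps 1..5.
values = [50, 5, 20, 3, 7]
stream = [(v, ts) for ts, v in enumerate(values, start=1)]

# Tuple-based window: filtering first lets older tuples back into the window.
window_then_filter = [v for v in rows_window(values, 3) if v > 10]
filter_then_window = rows_window([v for v in values if v > 10], 3)

# Time-based window: membership depends only on the timestamp, so order is irrelevant.
range_then_filter = [(v, ts) for v, ts in range_window(stream, 5, 2) if v > 10]
filter_then_range = range_window([(v, ts) for v, ts in stream if v > 10], 5, 2)
```

Filtering before the row window changes which tuples count as "the last 3", so the value 50 reappears in the result. The time-based window gives the same answer either way.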

Page 53: Data Stream Processing - Uni Konstanz

A simple query plan illustrating operators, queues, and synopses

Each seq-window operator maintains a synopsis so that it can generate "−" elements when tuples expire from the sliding window.

The binary-join operator maintains a synopsis materializing each of its relational inputs for use in performing joins with tuples on the opposite input.

Page 54: Data Stream Processing - Uni Konstanz

A simple query plan illustrating operators, queues, and synopses

The contents of synopsis1 and synopsis3 are similar (as are the contents of synopsis2 and synopsis4)
  – both maintain a materialization of the same window
  – but at slightly different positions of stream S1.

Page 55: Data Stream Processing - Uni Konstanz

A simple query plan illustrating operators, queues, and synopses

Query:
  Select *
  From S1 [Rows 1000], S2 [Range 2 Minutes]
  Where S1.A = S2.A And S1.A > 10

Query Plan Execution

Page 56: Data Stream Processing - Uni Konstanz

Any questions?

Page 57: Data Stream Processing - Uni Konstanz

Memory Overhead in Query Processing

Queues + State
Continuous queries keep state indefinitely
Online requirements suggest using memory rather than disk
Goal: minimize memory use while providing timely, accurate answers

Page 58: Data Stream Processing - Uni Konstanz

Reducing Memory Overhead

1) Enable state sharing within and across queries
2) Exploit constraints on streams to reduce state
3) Specialized operator scheduling to reduce queue sizes

Page 59: Data Stream Processing - Uni Konstanz

State sharing in one query plan

Multiple synopses within a single query plan may materialize nearly identical relations

  Select *
  From S1 [Rows 1000], S2 [Range 2 Minutes]
  Where S1.A = S2.A And S1.A > 10

The contents of synopsis1 and synopsis3 are similar (as are the contents of synopsis2 and synopsis4)

Page 60: Data Stream Processing - Uni Konstanz

State sharing in one query plan

Use lightweight stubs to replace the synopses
  – Implement the same interfaces as non-shared synopses

A single store to hold the actual tuples

Page 61: Data Stream Processing - Uni Konstanz

State sharing in multiple plans

Q1:
  Select *
  From S1 [Rows 1000], S2 [Range 2 Minutes]
  Where S1.A = S2.A And S1.A > 10

Q2:
  Select A, Max(B)
  From S1 [Rows 200]
  Group By A

Clearly the store must contain the union of its corresponding stubs:
  – A tuple is inserted into the store as soon as it is inserted by any one of the stubs
  – A tuple is removed only when it has been removed from all of the stubs.
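The two rules for the shared store amount to reference counting. The sketch below is one way to realize them, not necessarily how STREAM does it: `SharedStore` and `Stub` are hypothetical names, tuples are assumed to be hashable and distinct, and small window sizes (3 and 1) stand in for [Rows 1000] and [Rows 200].

```python
class SharedStore:
    """Holds each tuple once, with a count of the stubs that still contain it."""

    def __init__(self):
        self._refs = {}

    def add(self, tup):
        # Inserted as soon as any stub inserts it; later stubs only bump the count.
        self._refs[tup] = self._refs.get(tup, 0) + 1

    def release(self, tup):
        # Removed only when the last stub holding it lets go.
        self._refs[tup] -= 1
        if self._refs[tup] == 0:
            del self._refs[tup]

    def __contains__(self, tup):
        return tup in self._refs

    def __len__(self):
        return len(self._refs)


class Stub:
    """Presents a [Rows n] window whose tuples live in the shared store (sketch)."""

    def __init__(self, store, n):
        self.store, self.n, self.members = store, n, []

    def insert(self, tup):
        self.members.append(tup)
        self.store.add(tup)
        if len(self.members) > self.n:
            self.store.release(self.members.pop(0))


store = SharedStore()
wide = Stub(store, 3)     # stands in for S1 [Rows 1000] in Q1
narrow = Stub(store, 1)   # stands in for S1 [Rows 200] in Q2
for i in range(5):
    t = (i,)              # assumption: each tuple is distinct and hashable
    wide.insert(t)
    narrow.insert(t)
```

After the loop the store holds exactly the tuples of the wider window, since the narrower window is a subset of it.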

Page 62: Data Stream Processing - Uni Konstanz

Any questions?

Page 63: Data Stream Processing - Uni Konstanz

Stream Constraints

For many queries, large or unbounded state is required for arbitrary streams

  Orders (orderID, customer, cost)
  Fulfillments (orderID, portion, clerk)

  Select Sum(O.cost)
  From Orders O, Fulfillments F [Range 1 Day]
  Where O.orderID = F.orderID And F.clerk = "Sue" And O.customer = "Joe"

If there are no constraints, we have to keep all Orders tuples.

Page 64: Data Stream Processing - Uni Konstanz

k-constraints

But streams may exhibit constraints that reduce, bound, or even eliminate state
  – Clustered
  – Ordered
  – Stream-based referential integrity

Relaxed version: k-constraints

Page 65: Data Stream Processing - Uni Konstanz

6565

Clustered-arrival k-constraint

A clustered-arrival k-constraint on a stream attribute S.A defines a bound k on the distance between any two elements that have the same value of S.A.

Select Sum(O.cost)
From Orders O, Fulfillments F [Range 1 Day]
Where O.orderID = F.orderID And F.clerk = “Sue” And O.customer = “Joe”

If Fulfillments is k-clustered on orderID, we can infer when to discard an Orders tuple.

Once more than k Fulfillments tuples with orderID <> oID1 have arrived after the first Fulfillments tuple with orderID = oID1, we can discard the Orders tuple with orderID = oID1.

For the special case of k = 0 for this constraint, the Fulfillments stream is strictly clustered.

Orders (orderID, customer, cost)
Fulfillments (orderID, portion, clerk)
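
The discard rule for a clustered-arrival k-constraint can be sketched in a few lines of Python. This is a minimal illustration, not the STREAM implementation: the function name `discardable_order_ids` and the representation of the Fulfillments stream as a plain list of orderID values are assumptions made for the example.

```python
def discardable_order_ids(fulfillment_ids, k):
    """Yield (position, order_id) as soon as Orders tuples with order_id can be dropped.

    Assumes the Fulfillments stream is k-clustered on orderID: once more than k
    tuples with a different id have arrived since the first tuple with a given
    id, no further tuple with that id will arrive.
    """
    # order_id -> number of tuples with a *different* id seen since its first arrival
    others_since_first = {}
    for pos, oid in enumerate(fulfillment_ids):
        expired = []
        for seen in others_since_first:
            if seen != oid:
                others_since_first[seen] += 1
                if others_since_first[seen] > k:
                    expired.append(seen)
        others_since_first.setdefault(oid, 0)
        for seen in expired:
            del others_since_first[seen]
            yield pos, seen
```

With the hypothetical stream `[1, 1, 2, 1, 2, 3, 3, 4]` and k = 1, the function reports that orders with id 1 can be dropped at position 4 and orders with id 2 at position 5. With k = 0 (strict clustering) an id becomes discardable as soon as the first tuple with a different id arrives.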


Ordered-arrival k-constraint

An ordered-arrival k-constraint on a stream attribute S.A defines a bound k on the amount of reordering in values of S.A. Specifically, given any tuple s in stream S, for all tuples s' that arrive at least k + 1 elements after s, it must be true that s'.A >= s.A (or s'.A <= s.A).

Select Sum(O.cost)
From Orders O, Fulfillments F [Range 1 Day]
Where O.orderID = F.orderID And F.clerk = “Sue” And O.customer = “Joe”

If Fulfillments is k-ordered on orderID, we can infer when to discard Orders tuples.

Once k further Fulfillments tuples have arrived after a tuple with orderID = oID1, every later Fulfillments tuple has orderID >= oID1 (or orderID <= oID1), so we can discard the Orders tuples with orderID < oID1 (or orderID > oID1).
– We may discard a batch of Orders tuples at one time

For the special case of k = 0 for this constraint, the Fulfillments stream is strictly ordered.
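
A small sketch of how the ordered-arrival constraint translates into a discard bound, for the ascending case. The function name `discard_bounds` and the list-of-ids stream representation are assumptions for illustration only.

```python
def discard_bounds(fulfillment_ids, k):
    """For each position, return the bound b such that Orders tuples with
    orderID < b can be discarded after reading that Fulfillments tuple.

    Assumes the stream is k-ordered (ascending) on orderID: a tuple that arrives
    at least k + 1 positions after s has an orderID >= s.orderID.
    Returns None while no bound is known yet.
    """
    bounds = []
    bound = None
    for pos in range(len(fulfillment_ids)):
        if pos >= k:
            # k tuples have now arrived after the tuple at pos - k,
            # so every later tuple is >= that tuple's orderID.
            value = fulfillment_ids[pos - k]
            bound = value if bound is None else max(bound, value)
        bounds.append(bound)
    return bounds
```

For the hypothetical 2-ordered stream `[3, 1, 2, 5, 4, 6]` the bounds are `[None, None, 3, 3, 3, 5]`: after the third tuple, all Orders tuples with orderID below 3 can be dropped in one batch.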


Referential integrity k-constraint

A referential integrity k-constraint on a many-one join between streams defines a bound k on the delay between the arrival of a tuple on the “many” stream and the arrival of its joining “one” tuple on the other stream.

Select Sum(O.cost)
From Orders O, Fulfillments F [Range 1 Day]
Where O.orderID = F.orderID And F.clerk = “Sue” And O.customer = “Joe”

If Fulfillments satisfies a referential integrity k-constraint on orderID, we can infer when to discard an Orders tuple.

After we get an Orders tuple with orderID = oID1, all Fulfillments tuples with the same orderID arrive within the next k tuples on the Fulfillments stream, so we can discard the Orders tuple with orderID = oID1 after reading k more Fulfillments tuples.

For the special case of k = 0 for this constraint, termed strict referential integrity, the corresponding Fulfillments tuples always arrive before the Orders tuple (though this is not logical in this example).


Query execution plans reduce or eliminate state based on k-constraints
– The smaller the value of k for each constraint, the more state that can be discarded.


Exploiting Constraints

Stream data may be unpredictable and variable, so …

Continuously monitor streams to identify k-constraints relevant to queries

If constraints are violated, get approximate results

Details in: “Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams”, TODS 2004


Any questions?


Operator Scheduling

Many possible scheduling objectives: minimize computation, memory use, latency, inaccuracy, starvation, …


Operator Scheduling

Many possible scheduling objectives: minimize computation, memory use, latency, inaccuracy, starvation, …

If the operator sequence is not fixed
– Optimize the sequence, reorder the sequence
– Pipelined Filters, “A-Greedy”

If the operator sequence is fixed
– Optimize the operator scheduling at run time to minimize the memory use
– “Chain”


Pipelined Filters

[Figure: pipeline of Filter1, Filter2, Filter3; labels: Packets, Bad packets]

Commutative filters over a stream

Example: Track HTTP packets with destination address matching a prefix in given table and content matching

Simple to complex filters
– Boolean predicates
– Table lookups
– Pattern matching
– User-defined functions


Pipelined Filters: Problem Definition

Commutative filters: F1, F2, …, Fn

Plan: Tuples → Fπ(1) → Fπ(2) → … → Fπ(n)

Goal: Minimize expected cost to process a tuple
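
To make the goal concrete, the expected per-tuple cost of a plan can be estimated from a sample of tuples. The sketch below is illustrative only: the function name `expected_cost`, the cost values, and the sample are hypothetical, and a tuple is assumed to stop incurring cost as soon as one filter drops it.

```python
from itertools import permutations

def expected_cost(order, costs, sample):
    """Average cost of pushing each sampled tuple through the filters in `order`.

    costs[i]     : cost of evaluating filter i on one tuple
    sample[t][i] : True if filter i drops sampled tuple t
    """
    total = 0.0
    for drops in sample:
        for i in order:
            total += costs[i]          # filter i is evaluated on this tuple
            if drops[i]:
                break                  # tuple dropped, later filters are skipped
    return total / len(sample)

# Hypothetical costs and drop outcomes for three filters over four tuples.
costs = [1.0, 2.0, 4.0]
sample = [
    (True, True, False),
    (False, True, False),
    (False, False, True),
    (False, False, False),
]
best = min(permutations(range(3)), key=lambda p: expected_cost(p, costs, sample))
```

On this sample the identity plan `(0, 1, 2)` costs 4.5 per tuple, while exhaustive search finds `(1, 2, 0)` at 4.25. Exhaustive search is only feasible for a handful of filters, which is why the following slides turn to a greedy ordering.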


Pipelined Filters: Example

[Figure: input tuples pass through filters F1, F2, F3, F4; the surviving tuples are the output tuples]

Informal Goal: If a tuple will be dropped, then drop it as cheaply as possible


Why is Our Problem Hard?

High drop-rate first
Low cost first
High drop-rate/cost first
Filter drop-rates and costs can change over time
Filters can be correlated

E.g., Protocol = HTTP and DestPort = 80


Metrics for an Adaptive Algorithm

Speed of adaptivity
– Detecting changes and finding new plan

Run-time overhead
– Re-optimization, collecting statistics, plan switching

Convergence properties
– Plan properties under stable statistics

[Figure: StreaMon components: Profiler, Re-optimizer, Executor]


Pipelined Filters: Stable Statistics

Assume statistics are not changing
– Order filters by decreasing drop-rate/cost
– With correlations, the problem is NP-hard

Greedy algorithm: Use conditional statistics

1. Fπ(1) has maximum drop-rate/cost
2. Fπ(2) has maximum drop-rate/cost ratio for tuples not dropped by Fπ(1)
3. And so on…
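
The greedy steps can be sketched over a sample of tuples as follows. This is an illustrative reading of the slide, not the authors' code: `greedy_order`, the unit costs, and the sample are hypothetical, and ties are broken by lowest filter index as an arbitrary choice.

```python
def greedy_order(costs, sample):
    """Order filters greedily by conditional drop-rate / cost.

    sample[t][i] is 1 if filter i drops sampled tuple t, else 0.
    At each step the drop-rate is measured only over the sampled tuples
    that survive all filters chosen so far.
    """
    remaining = set(range(len(costs)))
    alive = list(sample)
    order = []
    while remaining:
        def ratio(i):
            if not alive:
                return 0.0
            drop_rate = sum(row[i] for row in alive) / len(alive)
            return drop_rate / costs[i]
        best = max(sorted(remaining), key=ratio)
        order.append(best)
        remaining.remove(best)
        alive = [row for row in alive if not row[best]]
    return order

# Hypothetical sample in which filters 0 and 1 are strongly correlated.
sample = [
    (1, 1, 0), (1, 1, 0), (1, 1, 0), (1, 0, 0),
    (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 0, 0),
]
order = greedy_order([1.0, 1.0, 1.0], sample)
```

Here the result is `[0, 2, 1]`. Ranking by unconditional drop-rate would place filter 1 second, but every tuple it drops has already been dropped by filter 0, so conditionally it is useless in that position.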


Adaptive Version of Greedy

Challenge:
– Online algorithm
– Fast adaptivity to Greedy ordering
– Low run-time overhead

A-Greedy: Adaptive Greedy


A-Greedy

Profiler: Maintains conditional filter drop-rates and costs over recent tuples

Executor: Processes tuples with current Greedy ordering

Re-optimizer: Ensures that filter ordering is Greedy for current statistics

[Figure: the three components connected by arrows labeled “Which statistics are required”, “Estimated statistics”, and “Changes in filter ordering”; annotation: “Combined in part for efficiency”]


Main innovation: A-Greedy’s Profiler

Responsible for maintaining current statistics
– Filter costs
– Conditional filter drop-rates: exponential!

Profile Window: Sampled statistics from which required conditional drop-rates can be estimated


Profile Window

[Figure: tuples flow through filters F1, F2, F3, F4; for sampled tuples the profile window stores one 0/1 entry per filter, e.g. rows 0 1 1 0, 0 0 1 1, 1 0 0 1]
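
A profile window of this kind can be sketched as a bounded buffer of 0/1 rows. Everything named here is an assumption for illustration: the class `ProfileWindow`, its parameters, the convention that 1 means "this filter would drop the tuple", and the choice to sample tuples at random and evaluate every filter on each sampled tuple.

```python
import random
from collections import deque

class ProfileWindow:
    """Sliding window of per-filter drop bits for sampled tuples."""

    def __init__(self, filters, size, sample_prob, rng=None):
        self.filters = filters              # predicates: True means the tuple passes
        self.rows = deque(maxlen=size)      # oldest rows fall out automatically
        self.sample_prob = sample_prob
        self.rng = rng or random.Random()

    def observe(self, tup):
        """With probability sample_prob, run all filters on tup and record the bits."""
        if self.rng.random() < self.sample_prob:
            self.rows.append(tuple(0 if f(tup) else 1 for f in self.filters))

    def drop_counts(self):
        """Number of rows in the window that each filter would drop."""
        return [sum(row[i] for row in self.rows) for i in range(len(self.filters))]

# Two hypothetical filters over integers, window of 4 rows, every tuple sampled.
window = ProfileWindow([lambda x: x % 2 == 0, lambda x: x < 5], size=4, sample_prob=1.0)
for value in range(10):
    window.observe(value)
```

After the loop the window holds the rows for 6, 7, 8 and 9, so `window.drop_counts()` is `[2, 4]`. Conditional drop-rates for any filter prefix can be estimated from the stored rows without keeping exponentially many counters.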


Greedy Ordering Using Profile Window

[Figure: profile window over F1 F2 F3 F4 and the corresponding matrix view of counts; the columns are shown in the orders F1 F2 F3 F4, F3 F1 F2 F4, and F3 F2 F4 F1. Panel labels: Matrix View, Greedy Ordering]


Conclusions of A-Greedy

Fast adaptivity to Greedy ordering
– Running cost of A-Greedy itself is low

It can get the best plan in almost all cases.
– Running cost of filters is almost the best

Low run-time overhead

Details in: “Adaptive Ordering of Pipelined Stream Filters”, SIGMOD 2004


Any questions?


Operator Scheduling at Run Time

Problem: The operator sequence is fixed.

Goal: Optimize the operator scheduling at run time to minimize memory usage


A simple example

Two operators, O1 followed by O2
– O1 takes one time unit to process a batch of n elements, and it produces 0.2n output elements per input batch.
– O2 takes one time unit to operate on 0.2n elements, and it sends its output out of the system.

Consider the following arrival pattern: n elements arrive at every time instant from t = 0 to t = 6, then no elements arrive from time t = 7 through t = 13.

[Figure: operator chain O1 → O2]


FIFO scheduling & Greedy scheduling

FIFO scheduling: When batches of n elements have been accumulated, they are passed through both operators in two consecutive time units, during which no other element is processed.

Greedy scheduling: At any time instant, if there is a batch of n elements buffered before O1, it is processed in one time unit. Otherwise, if there are at least 0.2n elements buffered before O2, then 0.2n elements are processed using one time unit. This strategy is "greedy" since it gives preference to the operator that has the greatest rate of reduction in total queue size per unit time.


FIFO scheduling vs Greedy scheduling

Time          FIFO Op.   FIFO Mem.   Greedy Op.   Greedy Mem.
0             -          1.0         -            1.0
1             O1         1.2         O1           1.2
2             O2         2.0         O1           1.4
3             O1         2.2         O1           1.6
4             O2         3.0         O1           1.8
5             O1         3.2         O1           2.0
6             O2         4.0         O1           2.2
7             O1         3.2         O1           1.4
8             O2         3.0         O2           1.2
9             O1         2.2         O2           1.0
10            O2         2.0         O2           0.8
11            O1         1.2         O2           0.6
12            O2         1.0         O2           0.4
13            O1         0.2         O2           0.2
14            O2         0.0         O2           0.0
Avg. (0~13)              2.1                      1.2
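
The numbers in this comparison can be reproduced with a short simulation. The sketch below rests on several assumptions that the slides do not state explicitly: memory is measured in units of n right after arrivals at each instant, the operator chosen at instant t runs during [t, t+1), every operator needs one time unit per batch, and the Greedy priority is the queue reduction per time unit. The name `simulate` and its parameters are hypothetical.

```python
from fractions import Fraction

def simulate(policy, out_sizes, arrival_times, horizon):
    """Return total queued data (in units of n) at instants 0 .. horizon-1.

    out_sizes[i] is operator i's output per batch in units of n (0 means the
    output leaves the system); its input batch is 1 for the first operator
    and out_sizes[i-1] otherwise. policy is "fifo" or "greedy".
    """
    n_ops = len(out_sizes)
    out_sizes = [Fraction(s) for s in out_sizes]
    in_sizes = [Fraction(1)] + out_sizes[:-1]
    # Greedy: prefer the operator with the largest reduction in queue size.
    priority = sorted(range(n_ops), key=lambda i: in_sizes[i] - out_sizes[i],
                      reverse=True)
    queues = [Fraction(0)] * n_ops      # data waiting in front of each operator
    fifo_next = 0                       # FIFO: next stage for the batch in flight
    memory = []
    for t in range(horizon):
        if t in arrival_times:
            queues[0] += 1
        memory.append(sum(queues))
        candidates = [fifo_next] if policy == "fifo" else priority
        run = next((i for i in candidates if queues[i] >= in_sizes[i]), None)
        if run is None:
            continue                    # nothing runnable in this time unit
        queues[run] -= in_sizes[run]
        if run + 1 < n_ops:
            queues[run + 1] += out_sizes[run]
        if policy == "fifo":
            fifo_next = (run + 1) % n_ops
    return memory

fifo = simulate("fifo", ["0.2", 0], range(7), 14)
greedy = simulate("greedy", ["0.2", 0], range(7), 14)
fifo_avg = float(sum(fifo) / len(fifo))
greedy_avg = float(sum(greedy) / len(greedy))
```

Under these assumptions `fifo_avg` is 2.1 and `greedy_avg` is 1.2, matching the averages over t = 0 to 13. Exact fractions are used instead of floats so that comparisons such as "at least 0.2n buffered" are not affected by rounding.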


Is Greedy scheduling good enough?

In the above example, Greedy scheduling seems to be a good approach.

Is it good enough?

[Figure: operator chain O1 → O2]


Another example

O1 produces 0.9n elements per n input elements in one time unit

O2 processes 0.9n elements in one time unit without changing the input size

O3 processes 0.9n elements in one time unit and sends its output out of the system

Priority: O3 > O1 > O2

[Figure: operator chain O1 → O2 → O3]


FIFO scheduling vs Greedy scheduling

In this case, FIFO scheduling is better than Greedy scheduling

Time          FIFO Op.   FIFO Mem.   Greedy Op.   Greedy Mem.
0             -          1.0         -            1.0
1             O1         1.9         O1           1.9
2             O2         2.9         O1           2.8
3             O3         3.0         O1           3.7
4             O1         3.9         O1           4.6
5             O2         4.9         O1           5.5
6             O3         5.0         O1           6.4
7             O1         4.9         O1           6.3
8             O2         4.9         O2           6.3
9             O3         4.0         O3           5.4
10            O1         3.9         O2           5.4
11            O2         3.9         O3           4.5
…             …          …           …            …
21            O3         0.0         O3           0.0
Avg. (0~20)              2.9                      3.6


Greedy scheduling

Under the greedy strategy, although O3 has the highest priority, sometimes it is "blocked" from running because it is preceded by O2, the operator with the lowest priority.

If O1, O2 and O3 are viewed as a single block, then together they reduce n elements to zero elements over three units of time, for an average reduction of 0.33n elements per unit time, which is better than the reduction rate of 0.1n elements per unit time that O1 provides.
– Since the greedy algorithm considers individual operators only, it does not take advantage of this fact.


Chain Scheduling

The chain scheduling algorithm forms blocks (“chains”) of operators as follows:

Start by marking the first operator in the plan as the “current” operator.

Next, find the block of consecutive operators starting at the "current" operator that maximizes the reduction in total queue size per unit time.

Mark the first operator following this block as the "current" operator and repeat the previous step until all operators have been assigned to chains.

Chains are scheduled according to the greedy algorithm, but within a chain, execution proceeds in FIFO order.
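
The chain-forming steps can be sketched as below. This is an illustrative reading of the slide rather than the published algorithm's code: `build_chains` and its parameters are hypothetical names, data sizes are expressed in units of n, and when two blocks tie on reduction rate the shorter one is kept, which is an arbitrary choice.

```python
from fractions import Fraction

def build_chains(out_sizes, times):
    """Partition a fixed operator sequence into chains.

    out_sizes[i] : data size (in units of n) after operator i, for one input batch
    times[i]     : time units operator i needs for that batch
    Returns a list of (operator_indices, reduction_rate) pairs.
    """
    sizes = [Fraction(1)] + [Fraction(s) for s in out_sizes]
    chains = []
    start = 0
    while start < len(out_sizes):
        best_end, best_rate = start, None
        elapsed = 0
        for end in range(start, len(out_sizes)):
            elapsed += times[end]
            # Reduction in queue size per unit time for the block start..end.
            rate = (sizes[start] - sizes[end + 1]) / elapsed
            if best_rate is None or rate > best_rate:
                best_end, best_rate = end, rate
        chains.append((list(range(start, best_end + 1)), best_rate))
        start = best_end + 1
    return chains

three_ops = build_chains(["0.9", "0.9", 0], [1, 1, 1])
two_ops = build_chains(["0.2", 0], [1, 1])
```

For the three-operator example the result is a single chain `[0, 1, 2]` with rate 1/3 (about 0.33n per time unit), compared with 1/10 for O1 alone. For the two-operator example each operator forms its own chain, with rates 4/5 and 1/5, so Chain behaves like Greedy there.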


Implied assumption

We assume that the selectivities and per-tuple processing times are known for each operator. We use these to construct the progress chart as explained above.


Gather statistics

Selectivities and processing times could be learned during query execution by gathering statistics over a period of time. If we expect these values to change over time, we could use the following strategy:
1. Divide time into fixed windows and collect statistics independently in each window;
2. Use the statistics from the ith window to compute the progress chart for the (i + 1)st window.


Multi-stream queries

The query plan is a tree instead of a queue


Any questions?


Omitted topics

Coping with Overload
– “Load-shedding” ≈ discarding tuples
– What is the definition of “best”?


Omitted topics

Coping with Changing Conditions

Continuous queries are long-running; conditions may change
– Data characteristics, arrival characteristics, query load, available resources, system conditions, …
– Solution: self-monitoring and adaptivity

Other results:
– Adaptive operator reordering
– Adaptive caching


References

http://www-db.stanford.edu/stream/
STREAM: The Stanford Data Stream Management System, to appear in a book on data stream management
STREAM: The Stanford Stream Data Manager, IEEE Data Engineering Bulletin 2003
Models and Issues in Data Stream Systems, PODS 2002
The CQL Continuous Query Language: Semantic Foundations and Query Execution, to appear in VLDB Journal
Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams, TODS 2004
Adaptive Ordering of Pipelined Stream Filters, SIGMOD 2004
Operator Scheduling in Data Stream Systems, to appear in VLDB Journal


Thanks