debs grand challenge: rdf stream processing …...debs grand challenge: rdf stream processing with...

33
DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh Nguyen Duc Manfred Hauswirth The 9th ACM International Conference on Distributed Event-Based Systems Oslo, June 30, 2015

Upload: others

Post on 22-May-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

DEBS Grand Challenge:RDF Stream Processing with CQELS

Framework for Real-time Analysis

Danh Le-Phuoc Minh Dao-Tran Anh Le TuanManh Nguyen Duc Manfred Hauswirth

The 9th ACM International Conference onDistributed Event-Based Systems

Oslo, June 30, 2015

Page 2: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Why RDF Stream Processing?

Page 3: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Interoperability

Page 4: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Interoperability

Page 5: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Interoperability

Page 6: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Interoperability

Page 7: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

RDF

SubjectPredicate

Object

07290D3599E7A0D62097A346EFCC1FB5,

2013-01-01 00:00:00,2013-01-01 00:02:00,

-73.956528,40.716976,-73.962440,40.715008, 3.50,0.00

: trip1 : taxi "07290D3599E7A0D62097A346EFCC1FB5".: trip1 :pickup datetime "2013-01-01 00:00:00".: trip1 :dropoff datetime "2013-01-01 00:02:00".: trip1 :pickLon -73.956528.: trip1 :pickLat 40.716976.: trip1 :dropLon -73.962440.: trip1 :dropLat 40.715008.: trip1 : fare 3.5.: trip1 : tip 0.0.

Page 8: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

RDF

SubjectPredicate

Object

07290D3599E7A0D62097A346EFCC1FB5,

2013-01-01 00:00:00,2013-01-01 00:02:00,

-73.956528,40.716976,-73.962440,40.715008, 3.50,0.00

: trip1 : taxi "07290D3599E7A0D62097A346EFCC1FB5".: trip1 :pickup datetime "2013-01-01 00:00:00".: trip1 :dropoff datetime "2013-01-01 00:02:00".: trip1 :pickLon -73.956528.: trip1 :pickLat 40.716976.: trip1 :dropLon -73.962440.: trip1 :dropLat 40.715008.: trip1 : fare 3.5.: trip1 : tip 0.0.

Page 9: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

RDF

SubjectPredicate

Object

07290D3599E7A0D62097A346EFCC1FB5,

2013-01-01 00:00:00,2013-01-01 00:02:00,

-73.956528,40.716976,-73.962440,40.715008, 3.50,0.00

: trip1 : taxi "07290D3599E7A0D62097A346EFCC1FB5".: trip1 :pickup datetime "2013-01-01 00:00:00".: trip1 :dropoff datetime "2013-01-01 00:02:00".: trip1 :pickLon -73.956528.: trip1 :pickLat 40.716976.: trip1 :dropLon -73.962440.: trip1 :dropLat 40.715008.: trip1 : fare 3.5.: trip1 : tip 0.0.

Page 10: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

RDF

SubjectPredicate

Object

07290D3599E7A0D62097A346EFCC1FB5,

2013-01-01 00:00:00,2013-01-01 00:02:00,

-73.956528,40.716976,-73.962440,40.715008, 3.50,0.00

: trip1 : taxi "07290D3599E7A0D62097A346EFCC1FB5".: trip1 :pickup datetime "2013-01-01 00:00:00".: trip1 :dropoff datetime "2013-01-01 00:02:00".: trip1 :pickLon -73.956528.: trip1 :pickLat 40.716976.: trip1 :dropLon -73.962440.: trip1 :dropLat 40.715008.: trip1 : fare 3.5.: trip1 : tip 0.0.

Page 11: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

SPARQLSELECT

(ROUND((41.474937-?pLat)/0.005986) AS ?pE)(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)(COUNT(?trip) AS ?freq)

FROM

<http://example/taxi.rdf>

WHERE

{?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}GROUP BY ?pE ?pS ?dE ?dSHAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&

?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)ORDER BY ?freqLIMIT 10

{{?pE 7→ 127, ?pS 7→ 213, ?dE 7→ 127, ?dS 7→ 212, ?freq 7→ 1}} .

Page 12: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

SPARQLSELECT

(ROUND((41.474937-?pLat)/0.005986) AS ?pE)(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)(COUNT(?trip) AS ?freq)

FROM <http://example/taxi.rdf>WHERE

{?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}GROUP BY ?pE ?pS ?dE ?dSHAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&

?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)ORDER BY ?freqLIMIT 10

{{?pE 7→ 127, ?pS 7→ 213, ?dE 7→ 127, ?dS 7→ 212, ?freq 7→ 1}} .

Page 13: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

SPARQLSELECT

(ROUND((41.474937-?pLat)/0.005986) AS ?pE)(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)(COUNT(?trip) AS ?freq)

FROM <http://example/taxi.rdf>WHERE {

?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}

GROUP BY ?pE ?pS ?dE ?dSHAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&

?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)ORDER BY ?freqLIMIT 10

{{?pE 7→ 127, ?pS 7→ 213, ?dE 7→ 127, ?dS 7→ 212, ?freq 7→ 1}} .

Page 14: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

SPARQLSELECT (ROUND((41.474937-?pLat)/0.005986) AS ?pE)

(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)

(COUNT(?trip) AS ?freq)

FROM <http://example/taxi.rdf>WHERE {

?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}

GROUP BY ?pE ?pS ?dE ?dSHAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&

?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)ORDER BY ?freqLIMIT 10

{{?pE 7→ 127, ?pS 7→ 213, ?dE 7→ 127, ?dS 7→ 212, ?freq 7→ 1}} .

Page 15: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

SPARQLSELECT (ROUND((41.474937-?pLat)/0.005986) AS ?pE)

(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)

(COUNT(?trip) AS ?freq)

FROM <http://example/taxi.rdf>WHERE {

?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}

GROUP BY ?pE ?pS ?dE ?dS

HAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)

ORDER BY ?freqLIMIT 10

{{?pE 7→ 127, ?pS 7→ 213, ?dE 7→ 127, ?dS 7→ 212, ?freq 7→ 1}} .

Page 16: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

SPARQLSELECT (ROUND((41.474937-?pLat)/0.005986) AS ?pE)

(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)(COUNT(?trip) AS ?freq)

FROM <http://example/taxi.rdf>WHERE {

?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}GROUP BY ?pE ?pS ?dE ?dSHAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&

?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)

ORDER BY ?freqLIMIT 10

{{?pE 7→ 127, ?pS 7→ 213, ?dE 7→ 127, ?dS 7→ 212, ?freq 7→ 1}} .

Page 17: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

SPARQLSELECT (ROUND((41.474937-?pLat)/0.005986) AS ?pE)

(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)(COUNT(?trip) AS ?freq)

FROM <http://example/taxi.rdf>WHERE {

?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}GROUP BY ?pE ?pS ?dE ?dSHAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&

?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)ORDER BY ?freqLIMIT 10

{{?pE 7→ 127, ?pS 7→ 213, ?dE 7→ 127, ?dS 7→ 212, ?freq 7→ 1}} .

Page 18: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

SPARQLSELECT (ROUND((41.474937-?pLat)/0.005986) AS ?pE)

(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)(COUNT(?trip) AS ?freq)

FROM <http://example/taxi.rdf>WHERE {

?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}GROUP BY ?pE ?pS ?dE ?dSHAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&

?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)ORDER BY ?freqLIMIT 10

{{?pE 7→ 127, ?pS 7→ 213, ?dE 7→ 127, ?dS 7→ 212, ?freq 7→ 1}} .

Page 19: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

RSP Queries in CQELS

SELECT (ROUND((41.474937-?pLat)/0.005986) AS ?pE)(ROUND((74.913585+?pLon)/0.004491556) AS ?pS)(ROUND((41.474937-?dLat)/0.005986) AS ?dE)(ROUND((74.913585+?dLon)/0.004491556) AS ?dS)(COUNT(?trip) AS ?freq)

WHERE {STREAM <Taxi> [RANGE 30 minutes] {?trip :pickLon ?pLon. ?trip :pickLat ?pLat.?trip :dropLon ?dLon. ?trip :dropLat ?dLat.

}}GROUP BY ?pE ?pS ?dE ?dSHAVING (?pE>0 && ?pE<301 && ?pS>0 && ?pS<301 &&

?dE>0 && ?dE<301 && ?dS>0 && ?dS<301)ORDER BY ?freqLIMIT 10

Page 20: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Query Plan of Q2

Top-10./

MEDIAN

COUNTMINUS

[[P2]]

[[P3]]

[[P1]]

· · ·STaxi

range

15m

range 30m

range30m

Page 21: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

CQELS Solutions for the Grand Challenge

High Performance Data Structures Incremental Algorithms

Page 22: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

CQELS Solutions for the Grand Challenge

High Performance Data Structures

Incremental Algorithms

Page 23: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

CQELS Solutions for the Grand Challenge

High Performance Data Structures Incremental Algorithms

Page 24: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Tree-based Data Structures

a1 {?c 7→ c1, ?d 7→ d1︸ ︷︷ ︸W3

}

a2 {?b 7→ b1, ?c 7→ c1}

a3 {?b 7→ b2, ?c 7→ c2︸ ︷︷ ︸W2

}

a4 {?a 7→ a1, ?b 7→ b1}a5 {?a 7→ a2, ?b 7→ b2}

a6 {?a 7→ a3, ?b 7→ b1︸ ︷︷ ︸W1

}./

12

a7

./11

a8

./13

a9 {?a 7→ a2, ?b 7→ b2, ?c 7→ c2}

./21

a10 {?a 7→ a1, ?b 7→ b1, ?c 7→ c1, ?d 7→ d1}

./22

a11 {?a 7→ a3, ?b 7→ b1, ?c 7→ c1, ?d 7→ d1} a1

a2

a3

a4

a5

a6

a7

a8

a9

a10

a11

Page 25: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Input Buffers

a1 {?c 7→ c1, ?d 7→ d1︸ ︷︷ ︸W3

}

a2 {?b 7→ b1, ?c 7→ c1}

a3 {?b 7→ b2, ?c 7→ c2︸ ︷︷ ︸W2

}

a4 {?a 7→ a1, ?b 7→ b1}a5 {?a 7→ a2, ?b 7→ b2}

a6 {?a 7→ a3, ?b 7→ b1︸ ︷︷ ︸W1

}./

12

a7

./11

a8

./13

a9 {?a 7→ a2, ?b 7→ b2, ?c 7→ c2}

./21

a10 {?a 7→ a1, ?b 7→ b1, ?c 7→ c1, ?d 7→ d1}

./22

a11 {?a 7→ a3, ?b 7→ b1, ?c 7→ c1, ?d 7→ d1} a1

a2

a3

a4

a5

a6

a7

a8

a9

a10

a11

Page 26: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Indexed Buffers

v1 4

v2 2

v3 1

v4 1

••••

key entry

......

v1

v2

v1

v3

v1

v4

v1

v2

Page 27: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

One-way Index

v1 4

v2 2

v3 1

v4 1

••••

key entry

......

v1

v2

v1

v3

v1

v4

v1

v2

Page 28: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Ring Index

v1 4

v2 2

v3 1

v4 1

••••

key entry

......

v1

v2

v1

v3

v1

v4

v1

v2

Page 29: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Deployment and Tests

CSV Input

Feeder CQELS

Listener Q1

Listener Q2

CSV Output

CSV Output

query registration

read CSV lines

RDF streamwrite CSV output

Page 30: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Compare CQELS to a Base-line System

Avg. delay time (ms) Exe. time (s)

Indv. Mixed Indv. Mixed

Q1 Q2 Q1 Q2 Q1 Q2 Q1 & Q2

C 0.022 0.068 0.023 0.075 65.77 69.02 76.64B 5 194 174 220 5000 22000 28000

Delay time (per output):right after reading the input → right before writing the output

Execution time:first input line read → final output streamed out

Exp: vary window sizes → no change in average delay time!

Page 31: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Compare CQELS to a Base-line System

Avg. delay time (ms) Exe. time (s)

Indv. Mixed Indv. Mixed

Q1 Q2 Q1 Q2 Q1 Q2 Q1 & Q2

C 0.022 0.068 0.023 0.075 65.77 69.02 76.64B 5 194 174 220 5000 22000 28000

Delay time (per output):right after reading the input → right before writing the output

Execution time:first input line read → final output streamed out

Exp: vary window sizes

→ no change in average delay time!

Page 32: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Compare CQELS to a Base-line System

Avg. delay time (ms) Exe. time (s)

Indv. Mixed Indv. Mixed

Q1 Q2 Q1 Q2 Q1 Q2 Q1 & Q2

C 0.022 0.068 0.023 0.075 65.77 69.02 76.64B 5 194 174 220 5000 22000 28000

Delay time (per output):right after reading the input → right before writing the output

Execution time:first input line read → final output streamed out

Exp: vary window sizes → no change in average delay time!

Page 33: DEBS Grand Challenge: RDF Stream Processing …...DEBS Grand Challenge: RDF Stream Processing with CQELS Framework for Real-time Analysis Danh Le-Phuoc Minh Dao-Tran Anh Le Tuan Manh

Conclusions

RDF Stream Processing for the DEBS 2015 Grand Challenge

Techniques implemented in CQELSI High performance data structuresI Incremental algorithms (details in paper)

Experimental resultsI Comparison to a base-line systemI Experiment on varying window sizes