Upload: ldbc-council

Post on 11-Jan-2017

Page 1: 8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF Query Performance in Oracle Database

Confidential – Oracle Restricted


Balancing Act to improve RDF Query Performance in Oracle Database

Eugene I. Chong


Agenda

• RDF Query Processing Issues
• RDF Order-By and Filter Processing
• RDF In-Memory Processing
• RDF In-Memory Virtual Columns
• Conclusion


Oracle RDF

• RDF_LINK$ table (triples)
  – normalized
  – subject, predicate, object IDs

• RDF_VALUE$ table (ID to value mapping)
  – value, type, etc.

• Issues
  – frequent joins with RDF_VALUE$ table to present results, process filters and order-by queries
  – complete de-normalization incurs large storage requirements
  – self-joins: large intermediate join results
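The join pattern behind these issues can be sketched with a toy schema. This is a minimal illustration only: SQLite stands in for Oracle, and the table names `rdf_link` and `rdf_value`, their columns, and the sample triples are simplified assumptions, not the actual RDF_LINK$ / RDF_VALUE$ definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for RDF_VALUE$ and RDF_LINK$
    CREATE TABLE rdf_value (value_id INTEGER PRIMARY KEY,
                            value_name TEXT, value_type TEXT);
    CREATE TABLE rdf_link  (s_id INTEGER, p_id INTEGER, o_id INTEGER);
""")
conn.executemany("INSERT INTO rdf_value VALUES (?, ?, ?)", [
    (1, "http://ex.org/alice", "URI"),
    (2, "http://ex.org/knows", "URI"),
    (3, "http://ex.org/bob",   "URI"),
    (4, "http://ex.org/name",  "URI"),
    (5, "Bob",                 "PL"),
])
conn.executemany("INSERT INTO rdf_link VALUES (?, ?, ?)", [
    (1, 2, 3),   # alice knows bob
    (3, 4, 5),   # bob name "Bob"
])

# Pattern: ?x knows ?y . ?y name ?n
# Two triple patterns mean one self-join on the link table; every
# projected variable then needs one more join to map its ID to a value.
rows = conn.execute("""
    SELECT vx.value_name, vn.value_name
    FROM rdf_link t1
    JOIN rdf_link  t2 ON t2.s_id = t1.o_id
    JOIN rdf_value vx ON vx.value_id = t1.s_id
    JOIN rdf_value vn ON vn.value_id = t2.o_id
    WHERE t1.p_id = 2 AND t2.p_id = 4
""").fetchall()
print(rows)
```

Even this two-pattern query touches the value table twice, which is the cost the rest of the deck tries to reduce.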


Oracle RDF Filters and Order-By Processing

• SPARQL order-by semantics
  – order: no values, blank nodes, IRIs, literals
  – case statement: value type, numeric value, date value, string value
  – ORDER BY CASE WHEN (V4.VALUE_TYPE IS NULL) THEN 0
                  WHEN (V4.VALUE_TYPE IN ('BLN','BN')) THEN 1
                  WHEN (V4.VALUE_TYPE IN ('URI','UR')) THEN 2
                  WHEN (V4.VALUE_TYPE IN ('PL', 'PLL', 'CPLL', 'PL@', 'PLL@', 'CPLL@', 'TL', 'TLL', 'CTLL', 'LIT'))
                    THEN (CASE WHEN (V4.LANGUAGE_TYPE IS NOT NULL) THEN 5 ……..


Oracle RDF Filters and Order-By Processing

  – literal type - numeric: TO_NUMBER( )
  – literal type - date/time: TO_TIMESTAMP_TZ ( ), DECODE( )
  – use function calls to generate SQL for order-by
  – case statements executed for every row at runtime
  – same problem for filters

• Solution
  – materialize value type and values in RDF_VALUE$ table
  – stored as ORDER_TYPE, ORDER_NUM, ORDER_DATE
  – filled in at load time
  – generate SQL: ORDER BY order_type, order_num, order_date, value_name
  – filter clause: WHERE order_num < to_number(89)
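A minimal sketch of the order-column idea, under stated assumptions: SQLite stands in for Oracle, the helper `order_columns` and the `datatype` argument are hypothetical, and the ranks 3, 4 and 5 used for literals are placeholders because the slide elides the real values. Only ranks 0, 1 and 2 come from the CASE expression shown earlier.

```python
import sqlite3

def order_columns(value_name, value_type, datatype=None):
    """Compute (order_type, order_num, order_date) once, at load time."""
    if value_type is None:
        return (0, None, None)                  # no value
    if value_type in ("BLN", "BN"):
        return (1, None, None)                  # blank node
    if value_type in ("URI", "UR"):
        return (2, None, None)                  # IRI
    if datatype == "numeric":
        return (3, float(value_name), None)     # assumed rank for numbers
    if datatype == "dateTime":
        return (4, None, value_name)            # assumed rank; ISO-8601 text
    return (5, None, None)                      # assumed rank for other literals

raw_values = [
    ("http://ex.org/b",      "URI", None),
    ("_:b1",                 "BN",  None),
    ("100",                  "TL",  "numeric"),
    ("9",                    "TL",  "numeric"),
    ("apple",                "PL",  None),
    ("2017-01-11T00:00:00Z", "TL",  "dateTime"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE rdf_value (value_name TEXT, value_type TEXT,
                order_type INTEGER, order_num REAL, order_date TEXT)""")
for name, vtype, dtype in raw_values:
    conn.execute("INSERT INTO rdf_value VALUES (?, ?, ?, ?, ?)",
                 (name, vtype, *order_columns(name, vtype, dtype)))

# The generated SQL sorts and filters on plain columns: no per-row CASE
# expression and no per-row conversion function at query time.
ordered = [r[0] for r in conn.execute(
    "SELECT value_name FROM rdf_value "
    "ORDER BY order_type, order_num, order_date, value_name")]
filtered = [r[0] for r in conn.execute(
    "SELECT value_name FROM rdf_value WHERE order_num < 89")]
print(ordered)
print(filtered)
```

Note that "9" sorts before "100" because the comparison runs on the stored number rather than on the string form.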


Oracle RDF Order-By and Filter Performance using BSBM Benchmark Queries (in secs)

[Figure: bar chart of query time in seconds (axis 0 to 90) for BSBM queries Q1–Q12 and BI1–BI8. Series: Without Order Columns, With Order Columns.]


Oracle RDF In-Memory Processing

• Utilize Oracle IMC
  – load frequently accessed columns in memory
    • RDF_LINK$ table: subject, predicate, object IDs
    • RDF_VALUE$: id, value
  – fast full scan of the table: good for hash join

• Experiment
  – 32GB memory, 2TB disk space
  – LUBM benchmark queries (8,763,829 rows including entailment)
  – varying the size of the memory: 6G(100%), 4G(56%), 2G(27%), 1G(12%)
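Why a fast full scan helps a hash join can be seen from the generic algorithm itself: one pass over the smaller input builds the hash table, and one pass over the larger input probes it. The sketch below is the textbook build/probe scheme over plain Python lists laid out column by column. It does not model Oracle IMC internals, and the column names and sample data are assumptions.

```python
# Column-wise stand-ins for the in-memory columns named on the slide.
link = {
    "s_id": [1, 3, 1],
    "p_id": [2, 4, 4],
    "o_id": [3, 5, 6],
}
value_ids = [1, 2, 3, 4, 5, 6]
value_names = ["ex:alice", "ex:knows", "ex:bob", "ex:name", "Bob", "Alice"]

def hash_join(link, value_ids, value_names):
    """Join link.o_id to value.id and return (s_id, object value) pairs."""
    # Build phase: one full scan of the value columns.
    build = dict(zip(value_ids, value_names))
    # Probe phase: one full scan of the link columns, O(1) lookup per row.
    joined = []
    for s, o in zip(link["s_id"], link["o_id"]):
        name = build.get(o)
        if name is not None:
            joined.append((s, name))
    return joined

joined = hash_join(link, value_ids, value_names)
print(joined)
```

Both phases are sequential scans over only the columns the join needs, which is the access pattern a column store serves well.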


Oracle RDF In-Memory Query Times (in sec) for LUBM Benchmark Queries

• 100% : 4x – 6x gain

[Figure: bar chart of query time in seconds (axis 0 to 40) for LUBM queries Q1–Q14. Series: No IM, IM (100%).]

• 56%

[Figure: bar chart of query time in seconds (axis 0 to 40) for LUBM queries Q1–Q14. Series: No IM, IM (56%).]


Oracle RDF In-Memory Query Times (in sec) for LUBM Benchmark Queries

• 27%

[Figure: bar chart of query time in seconds (axis 0 to 40) for LUBM queries Q1–Q14. Series: No IM, IM (27%).]

• 12%

[Figure: bar chart of query time in seconds (axis 0 to 40) for LUBM queries Q1–Q14. Series: No IM, IM (12%).]


Oracle RDF In-Memory Full Scan Performance (in sec)

• Fetching 3 IDs from RDF_LINK$ table
• 100% - 190x gain

[Figure: bar chart of full scan time in seconds (axis 0 to 2.5) at IM (100%), IM (56%), IM (27%), IM (12%). Series: No IM, IM.]


Oracle RDF In-Memory Virtual Columns

• In-memory complete de-normalization without incurring disk storage requirements
  – define virtual columns in RDF_LINK$ table for values, types, etc.: VALUE_NAME_S, VALUE_NAME_P, VALUE_NAME_O, etc.
  – useful for fully populated data in memory: virtual model

Virtual column in-memory performance (in min) – fetching 3 IDs & 3 VCs

[Figure: bar chart of time in minutes (axis 0 to 25) at IM (100%), IM (56%), IM (27%), IM (12%). Series: No IM, IM-No VC, IM-VC.]


Oracle RDF In-Memory Virtual Columns

  – remove joins with RDF_VALUE$ table
  – queries are processed on RDF_LINK$ table only
  – compression, smart scans (in-memory storage index), dictionary code for values, SIMD vector processing
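The effect of the virtual columns can be sketched as follows. This is an illustration under assumptions, not Oracle's mechanism: the stored link rows hold only IDs, the value columns are computed when the in-memory copy is populated, and queries then read that copy without touching the value table. The function names and sample data are hypothetical; the column names VALUE_NAME_S, VALUE_NAME_P and VALUE_NAME_O follow the slide.

```python
# Stand-ins for the stored tables: only IDs are kept in the link rows.
value_table = {1: "ex:alice", 2: "ex:knows", 3: "ex:bob"}
link_rows = [(1, 2, 3), (3, 2, 1)]

def populate_in_memory(link_rows, value_table):
    """Materialize the virtual value columns in memory only."""
    cols = {"s_id": [], "p_id": [], "o_id": [],
            "value_name_s": [], "value_name_p": [], "value_name_o": []}
    for s, p, o in link_rows:
        cols["s_id"].append(s)
        cols["p_id"].append(p)
        cols["o_id"].append(o)
        # Computed at population time; never written back to the stored rows.
        cols["value_name_s"].append(value_table[s])
        cols["value_name_p"].append(value_table[p])
        cols["value_name_o"].append(value_table[o])
    return cols

def who_knows(cols, target):
    """Answer '?x knows <target>' from the link columns alone, with no join."""
    return [s for s, p, o in zip(cols["value_name_s"],
                                 cols["value_name_p"],
                                 cols["value_name_o"])
            if p == "ex:knows" and o == target]

im_link = populate_in_memory(link_rows, value_table)
print(who_knows(im_link, "ex:bob"))
```

The stored `link_rows` stay normalized, so the de-normalized form costs memory but no extra disk space.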


Oracle RDF In-Memory Virtual Column Performance using LUBM Benchmark Queries (in secs)

• Up to 8x gain
• As the number of joins increases, a bigger gain is achievable

[Figure: bar chart of query time in seconds (axis 0 to 100) for Q9 (3 joins), Q2 (2 joins), Q6 (2 joins), Q13 (1 join), Q14 (1 join). Series: No IMVC, IMVC.]


Oracle RDF In-Memory Virtual Columns

• Can apply to data mart/data warehousing star/snowflake schema
  – remove joins with dimension tables

• Can apply to any applications where joined tables have one-to-one mapping on their join keys


Conclusion

• Significant performance improvement
  – use order columns in place of complex logic in the query for RDF filter and order-by processing
  – improve hash joins by in-memory processing of frequently accessed columns
  – remove costly joins using in-memory virtual columns by complete de-normalization for fully populated data


Your Questions
