ICDE 2009 - Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting
DESCRIPTION
Data provenance is information that describes how a given data item was produced. The provenance includes source and intermediate data as well as the transformations involved in producing the concrete data item. In the context of relational databases, the source and intermediate data items are relations, tuples and attribute values. The transformations are SQL queries and/or functions on the relational data items. Existing approaches capture provenance information by extending the underlying data model. This has the intrinsic disadvantage that the provenance must be stored and accessed using a different model than the actual data. In this paper, we present an alternative approach that uses query rewriting to annotate result tuples with provenance information. The rewritten query and its result use the same model and can, thus, be queried, stored and optimized using standard relational database techniques. In the paper we formalize the query rewriting procedures, prove their correctness, and evaluate a first implementation of the ideas using PostgreSQL. As the experiments indicate, our approach efficiently provides provenance information, inducing only a small overhead on normal operations.
TRANSCRIPT
![Page 1: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/1.jpg)
Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting
Boris Glavic
Database Technology Group
Department of Informatics, University of Zurich
Gustavo Alonso
Systems Group
Department of Computer Science, ETH Zurich
![Page 2: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/2.jpg)
Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion
![Page 3: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/3.jpg)
1. Introduction
Query Transformation
Data items: Result relation
Data items: Base relations
Relational Provenance
![Page 4: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/4.jpg)
1. Introduction
Query
Which input data item(s) influenced which output data item(s)?
Granularity: Tuple, Attribute Value, ...
Contribution semantics: Influence (Why), Copy (Where), ...
![Page 5: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/5.jpg)
1. Introduction
The problem of computing this type of provenance has been solved before. See e.g. [Cui, Widom ICDE '00]
but...
Non-relational representation of provenance data
Separation of provenance and "normal" data
Non-relational computation of provenance data
![Page 6: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/6.jpg)
1. Introduction
Perm: Provenance Extension of the Relational Model
Provenance Management System
"Pure" relational representation of provenance
Query result tuples and provenance tuples are represented as a single relation
![Page 7: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/7.jpg)
1. Introduction
Benefits: Provenance can be...
... stored in a standard DBMS
... queried using SQL
... directly interpreted by a user
Direct association between provenance and "normal" data
![Page 8: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/8.jpg)
1. Introduction
Provenance computation -> use query rewriting
Given a query q, generate a query q+
q+ computes the provenance of all result tuples from q
![Page 9: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/9.jpg)
1. Introduction
Benefits:
The rewritten query is expressed in relational algebra
It can be optimized and executed by an RDBMS
E.g. it can be stored as a view
or used as a subquery
![Page 10: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/10.jpg)
Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Results
6. Conclusion
![Page 11: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/11.jpg)
2. The Perm Approach

sales

| sName | itemID |
| --- | --- |
| Migros | 1 |
| Migros | 2 |
| Migros | 2 |
| Coop | 3 |
| Coop | 3 |

items

| id | price |
| --- | --- |
| 1 | 100 |
| 2 | 10 |
| 3 | 25 |
![Page 12: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/12.jpg)
2. The Perm Approach
Compute the sum of sales for each shop
SELECT sName, sum(price)
FROM sales, items
WHERE itemId = id
GROUP BY sName;
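The query above can be tried out on the example relations. The following is a minimal sketch that assumes SQLite (through Python's `sqlite3` module) as a stand-in for the DBMS; the column types and the added `ORDER BY` are assumptions made only to get a deterministic output.

```python
import sqlite3

# In-memory database holding the two example relations from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sName TEXT, itemId INTEGER);
    CREATE TABLE items (id INTEGER, price INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 1), ('Migros', 2), ('Migros', 2), ('Coop', 3), ('Coop', 3);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# The query from the slide; ORDER BY is added only for a stable row order.
query = """
    SELECT sName, sum(price)
    FROM sales, items
    WHERE itemId = id
    GROUP BY sName
    ORDER BY sName
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each shop ends up with exactly one result tuple, so the result alone no longer shows which sales and items tuples contributed to a sum.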
![Page 13: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/13.jpg)
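2. The Perm Approach

sales

| sName | itemID |
| --- | --- |
| Migros | 1 |
| Migros | 2 |
| Migros | 2 |
| Coop | 3 |
| Coop | 3 |

items

| id | price |
| --- | --- |
| 1 | 100 |
| 2 | 10 |
| 3 | 25 |

result

| name | sum(price) |
| --- | --- |
| Migros | 120 |
| Coop | 50 |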
![Page 14: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/14.jpg)
![Page 15: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/15.jpg)
![Page 16: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/16.jpg)
2. The Perm Approach
Desired result format:
Original Attributes | Relation 1 Attributes | ... | Relation n Attributes
![Page 17: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/17.jpg)
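2. The Perm Approach

| name | sum(price) | P(sName) | P(itemId) | P(id) | P(price) |
| --- | --- | --- | --- | --- | --- |
| Migros | 120 | Migros | 1 | 1 | 100 |
| Migros | 120 | Migros | 2 | 2 | 10 |
| Migros | 120 | Migros | 2 | 2 | 10 |
| Coop | 50 | Coop | 3 | 3 | 25 |
| Coop | 50 | Coop | 3 | 3 | 25 |

The columns name and sum(price) hold the original result; P(sName) and P(itemId) come from sales; P(id) and P(price) come from items.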
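A relation of this shape can be produced with plain SQL. The sketch below is a hand-written query, not output of the Perm system: it assumes SQLite via Python's `sqlite3`, and the names `total`, `pSName`, `pItemId`, `pId` and `pPrice` are hypothetical stand-ins for sum(price) and the P(...) columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sName TEXT, itemId INTEGER);
    CREATE TABLE items (id INTEGER, price INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 1), ('Migros', 2), ('Migros', 2), ('Coop', 3), ('Coop', 3);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# Hand-written provenance query (an assumption, not generated by Perm):
# the aggregated result is joined back to the contributing base tuples,
# whose attributes are carried along under renamed "p..." columns.
provenance_query = """
    SELECT agg.sName AS name, agg.total,
           prov.pSName, prov.pItemId, prov.pId, prov.pPrice
    FROM (SELECT sName, sum(price) AS total
          FROM sales, items
          WHERE itemId = id
          GROUP BY sName) AS agg
    LEFT OUTER JOIN
         (SELECT sName AS pSName, itemId AS pItemId,
                 id AS pId, price AS pPrice
          FROM sales, items
          WHERE itemId = id) AS prov
    ON agg.sName = prov.pSName
    ORDER BY agg.sName DESC, prov.pItemId
"""
rows = conn.execute(provenance_query).fetchall()
for row in rows:
    print(row)
```

Every original result tuple is repeated once per contributing combination of base tuples, which is what lets result and provenance live in a single relation.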
![Page 18: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/18.jpg)
Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Results
6. Conclusion
![Page 19: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/19.jpg)
3. Query Rewriting for Provenance Computation
Rewrite method basics
Use the algebra representation of the query
Replace every algebra operator with an algebra statement that propagates provenance alongside the original results
-> need a rewrite rule for each relational algebra operator
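The idea of one rewrite rule per operator can be sketched as a recursive function over an operator tree. This is an illustrative toy, not the Perm implementation: the classes `Base`, `Select` and `Project`, the `p_<relation>_<attribute>` naming scheme, and the three simplified rules are all assumptions made for this example, and SQLite is used only to execute the generated SQL.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Base:          # base relation with a known attribute list
    name: str
    attrs: list


@dataclass
class Select:        # selection with an SQL condition string
    cond: str
    child: object


@dataclass
class Project:       # projection onto a list of attributes
    attrs: list
    child: object


def rewrite(op):
    """Return (sql, provenance_attrs) for the provenance-propagating query."""
    if isinstance(op, Base):
        # Assumed rule: duplicate every attribute under a provenance name.
        prov = [f"p_{op.name}_{a}" for a in op.attrs]
        cols = list(op.attrs) + [f"{a} AS {p}" for a, p in zip(op.attrs, prov)]
        return f"SELECT {', '.join(cols)} FROM {op.name}", prov
    if isinstance(op, Select):
        # Assumed rule: apply the same condition to the rewritten input.
        child_sql, prov = rewrite(op.child)
        return f"SELECT * FROM ({child_sql}) AS t WHERE {op.cond}", prov
    if isinstance(op, Project):
        # Assumed rule: keep the projected attributes plus all provenance.
        child_sql, prov = rewrite(op.child)
        cols = list(op.attrs) + prov
        return f"SELECT {', '.join(cols)} FROM ({child_sql}) AS t", prov
    raise TypeError(f"no rewrite rule for {type(op).__name__}")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 'Jan', 100), ('Migros', 'Feb', 10), ('Migros', 'Mar', 10),
        ('Coop', 'Jan', 25), ('Coop', 'Feb', 25);
""")

# q = project shop from (select revenue > 20 from sales)
q = Project(["shop"], Select("revenue > 20",
                             Base("sales", ["shop", "month", "revenue"])))
sql, prov_attrs = rewrite(q)
rows = sorted(conn.execute(sql).fetchall())
print(sql)
print(rows)
```

Because the rewritten statement is again ordinary SQL, the rules compose: each operator only needs to know the provenance attributes produced by its input.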
![Page 20: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/20.jpg)
3. Query Rewriting for Provenance Computation
Rewrite process
op3
op1
op2
![Page 21: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/21.jpg)
3. Query Rewriting for Provenance Computation
Rewrite process
[Diagram: applying a rewrite rule to op1 turns the operator tree op3, op1, op2 into the tree op3, op1a, op1b, op1c, op2]
![Page 22: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/22.jpg)
3. Query Rewriting for Provenance Computation
Rewrite process
op3
op1b
op2
op1a
op1c
Apply Rewrite rules
![Page 23: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/23.jpg)
3. Query Rewriting for Provenance Computation
Rewrite rules notations:
$T^+$: rewritten statement (query)
$P(T^+)$: provenance attributes
![Page 24: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/24.jpg)
3. Query Rewriting for Provenance Computation
Rewrite rules example:
Original query q:
SELECT agg, G
FROM T
GROUP BY G
Rewritten query q+:
SELECT agg, G, P(T)
FROM
  (SELECT agg, G FROM T GROUP BY G) AS agg
  LEFT OUTER JOIN
  (SELECT G AS G', P(T) FROM T) AS prov
  ON (G = G')
![Page 25: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/25.jpg)
3. Query Rewriting for Provenance Computation
Rewrite rules example:
SELECT sum(revenue) AS sum, shop
FROM sales
GROUP BY shop

sales

| shop | month | revenue |
| --- | --- | --- |
| Migros | Jan | 100 |
| Migros | Feb | 10 |
| Migros | Mar | 10 |
| Coop | Jan | 25 |
| Coop | Feb | 25 |

result

| sum | shop |
| --- | --- |
| 120 | Migros |
| 50 | Coop |
![Page 26: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/26.jpg)
3. Query Rewriting for Provenance Computation
Rewritten query q+:
SELECT sum, shop, pShop, pMonth, pRevenue
FROM
  (SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop) AS agg
  LEFT OUTER JOIN
  (SELECT shop AS shop', pShop, pMonth, pRevenue FROM sales) AS prov
  ON (shop = shop')

| sum | shop | pShop | pMonth | pRevenue |
| --- | --- | --- | --- | --- |
| 120 | Migros | Migros | Jan | 100 |
| 120 | Migros | Migros | Feb | 10 |
| 120 | Migros | Migros | Mar | 10 |
| 50 | Coop | Coop | Jan | 25 |
| 50 | Coop | Coop | Feb | 25 |
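The rewritten aggregation query can be executed on an ordinary DBMS. The sketch below assumes SQLite via Python's `sqlite3`. On the slide, pShop, pMonth and pRevenue stand for the provenance copies of the sales attributes; here they are spelled out as explicit renamings, and `sum_revenue` and `shop2` are hypothetical replacements for the slide's `sum` and `shop'`, chosen to keep the identifiers unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 'Jan', 100), ('Migros', 'Feb', 10), ('Migros', 'Mar', 10),
        ('Coop', 'Jan', 25), ('Coop', 'Feb', 25);
""")

# Aggregation rewrite: the original aggregation is left-outer-joined with
# the input relation on the group-by attribute, so every input tuple of a
# group is attached to that group's result tuple.
rewritten = """
    SELECT agg.sum_revenue, agg.shop, prov.pShop, prov.pMonth, prov.pRevenue
    FROM (SELECT sum(revenue) AS sum_revenue, shop
          FROM sales
          GROUP BY shop) AS agg
    LEFT OUTER JOIN
         (SELECT shop AS shop2, shop AS pShop,
                 month AS pMonth, revenue AS pRevenue
          FROM sales) AS prov
    ON (agg.shop = prov.shop2)
"""
rows = sorted(conn.execute(rewritten).fetchall())
for row in rows:
    print(row)

# Dropping the provenance columns gives back the original query result.
original = {(total, shop) for total, shop, *_ in rows}
```

Each aggregate value appears once per input tuple of its group, so the original result can still be recovered by projecting away the provenance columns and removing duplicates.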
![Page 27: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/27.jpg)
![Page 28: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/28.jpg)
![Page 29: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/29.jpg)
Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Results
6. Conclusion
![Page 30: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/30.jpg)
4. Perm Implementation
Extension of the PostgreSQL DBMS
Implemented inside of PostgreSQL -> does not affect client applications
Extended SQL language
Perm module: implements algebraic rewrite rules as query rewrites
![Page 31: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/31.jpg)
4. Perm Implementation
SQL-PLE: SQL extension
SELECT PROVENANCE ...
Nice benefits:
CREATE VIEW x AS SELECT PROVENANCE ...
SELECT PROVENANCE ... INTO x ...
SELECT ... FROM (SELECT PROVENANCE ...
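The three usage patterns (view, stored table, subquery) can be illustrated without Perm itself. `SELECT PROVENANCE` is specific to Perm and is not available in SQLite, so this sketch substitutes a hand-written rewritten query; `CREATE TABLE ... AS SELECT` is used as an assumed stand-in for `SELECT ... INTO`, and the names `sales_prov`, `sales_prov_stored`, `sum_revenue` and `shop2` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 'Jan', 100), ('Migros', 'Feb', 10), ('Migros', 'Mar', 10),
        ('Coop', 'Jan', 25), ('Coop', 'Feb', 25);

    -- Lazy: provenance is recomputed whenever the view is queried.
    CREATE VIEW sales_prov AS
        SELECT agg.sum_revenue AS sum_revenue, agg.shop AS shop,
               prov.pShop AS pShop, prov.pMonth AS pMonth,
               prov.pRevenue AS pRevenue
        FROM (SELECT sum(revenue) AS sum_revenue, shop
              FROM sales GROUP BY shop) AS agg
        LEFT OUTER JOIN
             (SELECT shop AS shop2, shop AS pShop,
                     month AS pMonth, revenue AS pRevenue
              FROM sales) AS prov
        ON agg.shop = prov.shop2;

    -- Eager: provenance is materialized once and stored as a table.
    CREATE TABLE sales_prov_stored AS SELECT * FROM sales_prov;
""")

# Subquery use: which months contributed to a shop total above 100?
months = conn.execute("""
    SELECT p.pMonth
    FROM (SELECT * FROM sales_prov) AS p
    WHERE p.sum_revenue > 100
    ORDER BY p.pMonth
""").fetchall()

stored_count = conn.execute(
    "SELECT count(*) FROM sales_prov_stored").fetchone()[0]
print(months, stored_count)
```

The view corresponds to lazy provenance computation and the stored table to eager computation; in both cases the provenance is queried with ordinary SQL.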
![Page 32: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/32.jpg)
4. Perm Implementation
Perm Architecture
[Diagram: query processing components Parser & Analyser, Rewriter, Perm Module, Planner, Executor. Labels in the figure: SELECT PROVENANCE ...., Q =..., Q' =..., Q'+ =..., MergeJoin (...]
![Page 33: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/33.jpg)
Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion
![Page 34: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/34.jpg)
5. Experimental Results
TPC-H benchmark
![Page 35: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/35.jpg)
Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion
![Page 36: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/36.jpg)
6. Conclusion
Benefits
Compute provenance for SQL
Full SQL query power for provenance data
Lazy or eager computation
Reuse existing database technology
Supports external provenance
![Page 37: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/37.jpg)
6. Conclusion
Future work
Physical operators for more efficient provenance computation
Storage compression
Include transformation provenance
Support different contribution semantics
Support various granularities
![Page 38: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting](https://reader033.vdocuments.net/reader033/viewer/2022052412/55842487d8b42a785e8b4795/html5/thumbnails/38.jpg)
Questions