Core Schema Mappings
Gianni Mecca*, Paolo Papotti°, Salvatore Raunich*
* Università della Basilicata, Italy
° Università Roma Tre, Italy
SIGMOD 2009, Providence, July 2nd
I just want to move data
1. Given a minimal abstract specification
Source
  IBDBook [0..*] (title)
  LOC [0..*] (title, publisher)
  IBLBook [0..*] (title, pubId)
  IBLPublisher [0..*] (id, name)
Target
  Book [0..*] (title, pubId)
  Publisher [0..*] (id, name)
Sigmod 2009 Core Schema Mappings 2
I just want to move data
1. Given a minimal abstract specification
2. With source semantics preserved in the target instance
[Clio Vldb02, Vldb06, Vldb08] [HepTox Vldb05] [ADO.net Sigmod07] [MapForce09, StylusStudio09]
Source: IBDBook
  title
  The Hobbit
  The Da Vinci Code
  The Lord of the Rings
Source: LOC
  title                    publisher
  The Lord of the Rings    Houghton
  The Catcher in the Rye   Lb Books
Source: IBLBook
  title                    pubId
  The Hobbit               245
  The Catcher in the Rye   901
Source: IBLPublisher
  id    name
  245   Ballantine
  901   Lb Books
Target: Book
  title                    pubId
  The Hobbit               NULL
  The Da Vinci Code        NULL
  The Lord of the Rings    NULL
  The Lord of the Rings    I1
  The Catcher in the Rye   I2
  The Hobbit               245
  The Catcher in the Rye   901
Target: Publisher
  id    name
  I1    Houghton
  I2    Lb Books
  245   Ballantine
  901   Lb Books
Mapping generation
• A first generation schema mapping tool generates
m1: ∀ t1: IBDBook(t1) → ∃N: Book(t1, N)
m2: LOC(t2, p) → Book(t2, N') ∧ Publisher(N', p)
m3: IBLBook(t3, i) → Book(t3, i)
m4: IBLPublisher(i', p') → Publisher(i', p')
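To make the four tgds concrete, here is a minimal Python sketch that applies m1 to m4 to the example source instance. The names `Null`, `SOURCE`, and `canonical_solution` are illustrative assumptions, not part of +Spicy, and the null labels (N1.., I1..) are only a naming choice; the tools discussed in the talk generate SQL scripts rather than Python.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Null:
    """A labeled null standing for an existentially quantified value."""
    label: str


# Source instance of the running example (relation name -> tuples).
SOURCE = {
    "IBDBook": [("The Hobbit",), ("The Da Vinci Code",), ("The Lord of the Rings",)],
    "LOC": [("The Lord of the Rings", "Houghton"), ("The Catcher in the Rye", "Lb Books")],
    "IBLBook": [("The Hobbit", 245), ("The Catcher in the Rye", 901)],
    "IBLPublisher": [(245, "Ballantine"), (901, "Lb Books")],
}


def canonical_solution(src):
    """Fire each tgd once per matching source tuple, inventing a fresh null
    for every existential variable (hypothetical helper, for illustration)."""
    n_ids, i_ids = count(1), count(1)
    book, publisher = [], []
    for (t1,) in src["IBDBook"]:          # m1: IBDBook(t1) -> Book(t1, N)
        book.append((t1, Null(f"N{next(n_ids)}")))
    for t2, p in src["LOC"]:              # m2: LOC(t2, p) -> Book(t2, N'), Publisher(N', p)
        shared = Null(f"I{next(i_ids)}")  # the same null joins the two target tuples
        book.append((t2, shared))
        publisher.append((shared, p))
    for t3, i in src["IBLBook"]:          # m3: IBLBook(t3, i) -> Book(t3, i)
        book.append((t3, i))
    for i, p in src["IBLPublisher"]:      # m4: IBLPublisher(i', p') -> Publisher(i', p')
        publisher.append((i, p))
    return {"Book": book, "Publisher": publisher}


target = canonical_solution(SOURCE)
```

The result contains one Book tuple per source book tuple, including the redundant ones that carry only a null as pubId.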
I just want to move data
1. Given a minimal abstract specification
2. With source semantics preserved in the target instance
3. With no redundancy in the target instance
Post processing step [TODS05, JACM08]: canonical → core
Canonical
Target: Book
  title                    pubId
  The Hobbit               NULL
  The Da Vinci Code        NULL
  The Lord of the Rings    NULL
  The Lord of the Rings    I1
  The Catcher in the Rye   I2
  The Hobbit               245
  The Catcher in the Rye   901
Target: Publisher
  id    name
  I1    Houghton
  I2    Lb Books
  245   Ballantine
  901   Lb Books
Core
Target: Book
  title                    pubId
  The Da Vinci Code        NULL
  The Lord of the Rings    I1
  The Hobbit               245
  The Catcher in the Rye   901
Target: Publisher
  id    name
  I1    Houghton
  245   Ballantine
  901   Lb Books
I just want to move data
Post processing step [TODS05, JACM08]
Before:
  title        pubId
  The Hobbit   NULL
  The Hobbit   245
null → 245
After:
  title        pubId
  The Hobbit   245
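The "null → 245" step can be illustrated with a brute-force Python sketch: it searches for a mapping of nulls that sends the instance into a proper subset of itself and repeats until no such mapping exists. This is an exponential toy procedure written for this example only, not the post-processing algorithms of [TODS05, JACM08]; `Null`, `candidates`, `core`, and the null labels are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """Labeled null; constants are plain str/int values."""
    label: str


def nulls_of(facts):
    found = {v for _, row in facts for v in row if isinstance(v, Null)}
    return sorted(found, key=lambda n: n.label)


def candidates(null, facts):
    """Values that occur in every (relation, column) position where `null` occurs."""
    allowed = None
    for rel, row in facts:
        for col, value in enumerate(row):
            if value == null:
                column = {r[col] for other, r in facts if other == rel}
                allowed = column if allowed is None else allowed & column
    return sorted(allowed, key=repr)


def core(facts):
    """Shrink the instance while some null mapping folds it into a proper subset."""
    facts = frozenset(facts)
    while True:
        nulls = nulls_of(facts)
        for choice in product(*(candidates(n, facts) for n in nulls)):
            h = dict(zip(nulls, choice))
            image = frozenset((rel, tuple(h.get(v, v) for v in row)) for rel, row in facts)
            if image < facts:      # h maps the instance onto a strictly smaller one
                facts = image
                break
        else:
            return facts           # no shrinking mapping exists: this is the core


N1, N2, N3, I1, I2 = (Null(x) for x in ("N1", "N2", "N3", "I1", "I2"))
CANONICAL = {
    ("Book", ("The Hobbit", N1)), ("Book", ("The Da Vinci Code", N2)),
    ("Book", ("The Lord of the Rings", N3)), ("Book", ("The Lord of the Rings", I1)),
    ("Book", ("The Catcher in the Rye", I2)), ("Book", ("The Hobbit", 245)),
    ("Book", ("The Catcher in the Rye", 901)),
    ("Publisher", (I1, "Houghton")), ("Publisher", (I2, "Lb Books")),
    ("Publisher", (245, "Ballantine")), ("Publisher", (901, "Lb Books")),
}
result = core(CANONICAL)
```

Note that I2 can only be folded into 901 because both of its tuples, in Book and in Publisher, have a counterpart with 901; this is the kind of recursive search that makes post-processing expensive on large instances.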
Theoretically, the problem is solved
In practice, for a simple schema mapping with 5000 source tuples
• canonical solution = 1 sec
• core computation = 8 hours
What's the problem?
Canonical solution computation
• Semantics of data exchange is defined using the chase [ICDT03]: chase(I,M) = canonical univ sol (I,M)
• chase = SQL scripts
  • speed
  • flexibility and reuse
Core computation (on top of a relational db)
• Post processing looks recursively for homomorphisms between instances
• custom engine
What's the solution?
Core schema mappings
1. Find homomorphisms between formulas
2. Correlate mappings to avoid homomorphism between facts
  • negation
  • sophisticated skolemization for nulls
Core solution computation
• Core computation as result of the (standard) chase
• chase(I,M) = core(I,M)
• Enables a scalable solution
Outline
• How difficult is my mapping?
  – Subsumptions
  – Coverages
  – Self-joins
• How does it scale?
  – Complexity
  – SQL experiments
  – Modularity
Subsumptions
Source: IBDBook
  title
  The Hobbit
  The Da Vinci Code
  The Lord of the Rings
Source: IBLBook
  title                    pubId
  The Hobbit               245
  The Catcher in the Rye   901
m1: IBDBook(t1) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N') ∧ Publisher(N', p)
m3: IBLBook(t3, i) → Book(t3, i)
Target: Book
  title                    pubId
  The Hobbit               N1
  The Da Vinci Code        N2
  The Lord of the Rings    N3
  The Hobbit               245
  The Catcher in the Rye   901
N1 to 245?
Subsumptions
• Our algorithm identifies homomorphisms in the target side of the tgds
m1: IBDBook(t1) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N') ∧ Publisher(N', p)
m3: IBLBook(t3, i) → Book(t3, i)
m3 subsumes m1 (homomorphism: t1 → t3, N → i)
1. generate target tuples for m3, the "more informative" mapping
2. for m1 generate only those tuples that add new content to the target
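The homomorphism check between tgd conclusions can be sketched in Python as follows. The condition used here (universal variables may only map to universal variables, existential ones to anything) is a simplified reading of subsumption assumed for illustration; the function name and the tuple encoding of atoms are likewise hypothetical.

```python
def formula_homomorphism(src_atoms, src_universal, dst_atoms, dst_universal):
    """Return a variable mapping that sends every atom of src_atoms onto an atom
    of dst_atoms, or None. Universal variables must map to universal variables
    (simplifying assumption for this sketch)."""
    def extend(h, atoms):
        if not atoms:
            return h
        (rel, args), rest = atoms[0], atoms[1:]
        for rel2, args2 in dst_atoms:
            if rel2 != rel or len(args2) != len(args):
                continue
            h2, ok = dict(h), True
            for a, b in zip(args, args2):
                if a in src_universal and b not in dst_universal:
                    ok = False
                    break
                if h2.setdefault(a, b) != b:   # a is already mapped elsewhere
                    ok = False
                    break
            if ok:
                found = extend(h2, rest)
                if found is not None:
                    return found
        return None

    return extend({}, list(src_atoms))


# Conclusions of the tgds on the slide, with their universal variables.
m1 = ([("Book", ("t1", "N"))], {"t1"})
m2 = ([("Book", ("t2", "N'")), ("Publisher", ("N'", "p"))], {"t2", "p"})
m3 = ([("Book", ("t3", "i"))], {"t3", "i"})

h_m1_m3 = formula_homomorphism(*m1, *m3)   # m3 subsumes m1
h_m1_m2 = formula_homomorphism(*m1, *m2)   # m2 subsumes m1
h_m3_m1 = formula_homomorphism(*m3, *m1)   # no mapping: i is universal, N is not
```

Under these assumptions the first two calls return the mappings shown on the slides, while the reverse direction fails, which matches the idea that m3 is the "more informative" mapping.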
Subsumptions
m3 subsumes m1, so m1 is rewritten with a negated premise:
m'1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) → Book(t1, N)
Subsumptions
also m2 subsumes m1 (homomorphism: t1 → t2, N → N')
m'1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ ¬(LOC(t1, p)) → Book(t1, N)
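As a rough illustration of what the rewritten tgd m'1 does on the example instance, the following Python sketch fires m1 only for titles that appear neither in IBLBook nor in LOC. The function name and the string labels for nulls are assumptions; in the talk the rewritten tgds are executed as SQL scripts.

```python
SOURCE = {
    "IBDBook": [("The Hobbit",), ("The Da Vinci Code",), ("The Lord of the Rings",)],
    "LOC": [("The Lord of the Rings", "Houghton"), ("The Catcher in the Rye", "Lb Books")],
    "IBLBook": [("The Hobbit", 245), ("The Catcher in the Rye", 901)],
}


def apply_m1_rewritten(src):
    """m'1: IBDBook(t1) AND NOT IBLBook(t1, i) AND NOT LOC(t1, p) -> Book(t1, N)."""
    in_iblbook = {title for title, _ in src["IBLBook"]}
    in_loc = {title for title, _ in src["LOC"]}
    new_titles = [t1 for (t1,) in src["IBDBook"]
                  if t1 not in in_iblbook and t1 not in in_loc]
    # One fresh null (here just a string label) per generated tuple.
    return [(t1, f"N{k}") for k, t1 in enumerate(new_titles, start=1)]


book_from_m1 = apply_m1_rewritten(SOURCE)
```

Only "The Da Vinci Code" survives the two negations, so the redundant null tuples for "The Hobbit" and "The Lord of the Rings" are never generated in the first place.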
Coverages
Source: IBDBook
  title
  The Hobbit
  The Da Vinci Code
  The Lord of the Rings
Source: LOC
  title                    publisher
  The Lord of the Rings    Houghton
  The Catcher in the Rye   Lb Books
Source: IBLBook
  title                    pubId
  The Hobbit               245
  The Catcher in the Rye   901
Source: IBLPublisher
  id    name
  245   Ballantine
  901   Lb Books
m'1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ ¬(LOC(t1, p)) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N') ∧ Publisher(N', p)
m3: IBLBook(t3, i) → Book(t3, i)
m4: IBLPublisher(i', p') → Publisher(i', p')
Coverages
Target instance generated by m'1, m2, m3, m4
Target: Book
  title                    pubId
  The Da Vinci Code        NULL
  The Lord of the Rings    I1
  The Catcher in the Rye   I2
  The Hobbit               245
  The Catcher in the Rye   901
Target: Publisher
  id    name
  I1    Houghton
  I2    Lb Books
  245   Ballantine
  901   Lb Books
Coverages
m3 and m4 together cover m2: the tuples Book(The Catcher in the Rye, I2) and Publisher(I2, Lb Books) fold into the tuples with id 901, so m2 is rewritten as m'2
Target: Book
  title                    pubId
  The Da Vinci Code        NULL
  The Lord of the Rings    I1
  The Hobbit               245
  The Catcher in the Rye   901
Target: Publisher
  id    name
  I1    Houghton
  245   Ballantine
  901   Lb Books
m'1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ ¬(LOC(t1, p)) → Book(t1, N)
m'2: LOC(t2, p) ∧ ¬(IBLBook(t2, i) ∧ IBLPublisher(i, p)) → Book(t2, N') ∧ Publisher(N', p)
m3: IBLBook(t3, i) → Book(t3, i)
m4: IBLPublisher(i', p') → Publisher(i', p')
more differences, new joins
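The rewritten tgd m'2 can be sketched in Python as a difference against a join: a LOC tuple fires only if no IBLBook tuple joined with IBLPublisher yields the same (title, publisher name) pair. The function name and the string labels for nulls are illustrative assumptions, not the SQL that +Spicy generates.

```python
SOURCE = {
    "LOC": [("The Lord of the Rings", "Houghton"), ("The Catcher in the Rye", "Lb Books")],
    "IBLBook": [("The Hobbit", 245), ("The Catcher in the Rye", 901)],
    "IBLPublisher": [(245, "Ballantine"), (901, "Lb Books")],
}


def apply_m2_rewritten(src):
    """m'2: LOC(t2, p) AND NOT (IBLBook(t2, i) AND IBLPublisher(i, p))
            -> Book(t2, N'), Publisher(N', p)."""
    # The new join: (title, publisher name) pairs already produced by m3 and m4.
    covered = {(title, name)
               for title, pub_id in src["IBLBook"]
               for other_id, name in src["IBLPublisher"]
               if pub_id == other_id}
    book, publisher = [], []
    for t2, p in src["LOC"]:
        if (t2, p) in covered:      # the difference: skip covered tuples
            continue
        null = f"I{len(book) + 1}"  # one shared null per firing of the tgd
        book.append((t2, null))
        publisher.append((null, p))
    return book, publisher


book_from_m2, publisher_from_m2 = apply_m2_rewritten(SOURCE)
```

On the example, "The Catcher in the Rye" is covered through publisher 901, so only "The Lord of the Rings" receives an invented publisher id.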
Self-joins
• techniques discussed so far are of little help: the chase will either generate three tuples or none
• duplicate symbols: many different ways of satisfying these constraints
m1: R(a, b, c, d) → S(x5, b, x1, x2, a) ∧ S(x5, c, x3, x4, a) ∧ S(d, c, x3, x4, b)
m2: R(a, b, c, d) → S(d, a, a, x1, b) ∧ S(x5, a, a, x1, a) ∧ S(x5, c, x2, x3, x4)
Example from [TODS05]
Self-joins: two-phase
m'1: R(a, b, c, d) → S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
m'2: R(e, f, g, h) → S4(h, e, e, y1, f) ∧ S5(y5, e, e, y1, e) ∧ S6(y5, g, y2, y3, y4)
first exchange: R → S1, S2, ...
second exchange: S1, S2, ... → S
Expansions
S1(x5, b, x1, x2, a) … → S(…)
S2(x5, c, x3, x4, a) … → S(…) ∧ S(…)
S3(d, c, x3, x4, b) … → S(…)
S4(h, e, e, y1, f) … → S(…)
S5(y5, e, e, y1, e) … → S(…)
S6(y5, g, y2, y3, y4) … → S(…)
…
Self-joins
1. we rewrite the tgds using distinct symbols and variables and do a first exchange
2. the second exchange considers all possible combinations and copies the data into the original relation, avoiding redundancy
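Step 1 can be illustrated with a small Python sketch that gives every occurrence of S in a tgd conclusion its own relation symbol (S1, S2, ...). The encoding of tgds as (premise, conclusion) pairs and the function name are assumptions for illustration; the input tgds below are written with variables that are already distinct, so only the symbol renaming is shown.

```python
def rename_self_joins(tgds, relation="S"):
    """Replace each occurrence of `relation` in the conclusions with a fresh
    symbol S1, S2, ... so that no conclusion contains a self-join."""
    counter = 0
    rewritten = {}
    for name, (premise, conclusion) in tgds.items():
        atoms = []
        for rel, args in conclusion:
            if rel == relation:
                counter += 1
                rel = f"{relation}{counter}"
            atoms.append((rel, args))
        rewritten[name + "'"] = (premise, atoms)
    return rewritten


TGDS = {
    "m1": (("R", ("a", "b", "c", "d")),
           [("S", ("x5", "b", "x1", "x2", "a")),
            ("S", ("x5", "c", "x3", "x4", "a")),
            ("S", ("d", "c", "x3", "x4", "b"))]),
    "m2": (("R", ("e", "f", "g", "h")),
           [("S", ("h", "e", "e", "y1", "f")),
            ("S", ("y5", "e", "e", "y1", "e")),
            ("S", ("y5", "g", "y2", "y3", "y4"))]),
}

first_exchange_tgds = rename_self_joins(TGDS)
```

The second exchange, which combines S1 to S6 back into S through expansions, is the hard part and is not covered by this sketch.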
Self-joins: expansions
m'1: R(a, b, c, d) → S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
m'2: R(e, f, g, h) → S4(h, e, e, y1, f) ∧ S5(y5, e, e, y1, e) ∧ S6(y5, g, y2, y3, y4)
Source tuple: R(n, n, n, k)
Base view
  S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
  S(N5, n, N1, N2, n), S(N5, n, N3, N4, n), S(k, n, N3, N4, n)
Expansion 1
  S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b) ∧ (S1(x5, b, x1, x2, a) ∧ b = c)
  S(N5, n, N1, N2, n), S(k, n, N3, N4, n)
Expansion 2
  S4(h, e, e, y1, f) ∧ S4(h', e', e', y1, f') ∧ h = h' ∧ (S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b) ∧ e = b ∧ f = a ∧ e' = c ∧ f' = a ∧ h' = d ∧ f' = b)
  S(k, n, n, N1, n)
Self-joins: two-phase
• use expansions as premises of the second exchange
• rewrite these new tgds using subsumptions to avoid redundancy – favor more compact and more informative
m'1: R(a, b, c, d) → S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
m'2: R(e, f, g, h) → S4(h, e, e, y1, f) ∧ S5(y5, e, e, y1, e) ∧ S6(y5, g, y2, y3, y4)
first exchange: R → S1, S2, ...
second exchange: S1, S2, ... → S
Expansions
e12: S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b) ∧ (S1(x5, b, x1, x2, a) ∧ b = c) → S(…) ∧ S(…)
e13: … → S(…)
…
Complexity
n: number of tgds
d: maximum number of different tgds that write into a table
k: maximum number of atoms in a tgd conclusion
expression complexity (not data complexity!)
Scenario       Complexity   Frequency
Subsumptions   O(n^2)       Very high
Coverages      O(d^k)       Low
Self-joins     O(2^n)       Very low
Real cases     O(n)
In realistic cases the complexities are lower
Experiments settings
• Algorithms implemented in +Spicy
• Scripts in SQL (and XQuery)
• PostgreSQL 8.3 on an Intel Core Duo 2.4 GHz / 4 GB RAM / Linux
• Scenarios from the literature, mostly from STBenchmark [Vldb08 – www.stbenchmark.org]
• Each SQL test run with 10k, 100k, 250k, 500k, and 1M tuples in the source instance
• Time limit = 1 hour
  – custom engine exceeded the time limit in all scenarios
Experiments results
[Figure: two plots of times (sec) against #tuples in the source (10K, 100K, 250K, 500K, 1M). Panel "Subsumption and coverages": scenarios S1, S2, C1, C2, time axis from 0 to 120 sec. Panel "Self joins": scenarios SJ1, SJ2, SJ3, time axis from 0 to 3500 sec.]
Scalability experiments with up to 100 tables (82 tgds, 51 subsumptions, 12 coverages): rewriting algorithm ran in 6 secs
[Figure: scalability chart for 25, 50, 75, and 100 tables, showing # of TGDs, # of subsumptions, # of coverages, script generation time (s), and execution time (s).]
Modular solution
canonical solution → subsumption-free solution → coverage-free solution → core (self-join-coverage-free solution)
The algorithms can produce approximations of the core when the full computation is expensive: a "reduced" rewriting with respect to a subset of the homomorphisms
Repeatability & Workability Evaluation
We participated in the ACM SIGMOD 2009 Repeatability & Workability Evaluation (cf., http://homepages.cwi.nl/~manegold/SIGMOD-2009-RWE/)
The reviewers were able to repeat all the experiments presented in our paper, yielding results that in most cases match the ones published in our paper, except from insignificant and to be expected variation due to randomness and/or hardware/software differences.
In addition, workability experiments confirmed - with few exceptions - the soundness of our results beyond the parameter setting and/or data sets presented in our paper.
The detailed reports will shortly be made publicly available by ACM SIGMOD.
Bridging the gap
+Spicy is the first mapping tool which generates core solutions efficiently
• Standard solution for mapping generation
• Novel algorithms for (natural) mapping rewriting
• Execution times orders of magnitude faster than post processing
Two main results
• Bridge the gap between the practice of mapping generation [Popa et al. Vldb02] and the theory of data exchange [Fagin et al. TODS05, Gottlob&Nash JACM08]
• Enable schema mappings for more practical applications
Thank you
see you in Lyon for the demo:
Mecca, Papotti, Raunich, Buoncristiano, "Concise and Expressive Mappings with +Spicy", VLDB 2009