core schema mappings

Core Schema Mappings Gianni Mecca* Paolo Papotti ° Salvatore Raunich* * Università della Basilicata, Italy °Università Roma Tre, Italy Sigmod - Providence - 2009, July 2nd


Page 1: Core Schema Mappings

Core Schema Mappings

Gianni Mecca*, Paolo Papotti°, Salvatore Raunich*

* Università della Basilicata, Italy
° Università Roma Tre, Italy

Sigmod - Providence - 2009, July 2nd

Page 2: Core Schema Mappings

I just want to move data

1. Given a minimal abstract specification

Sigmod 2009 Core Schema Mappings 2

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name); all relations have cardinality 0..*]

Page 3: Core Schema Mappings

I just want to move data

1. Given a minimal abstract specification

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

Page 4: Core Schema Mappings

I just want to move data

1. Given a minimal abstract specification
2. With source semantics preserved in the target instance

[Clio Vldb02, Vldb06, Vldb08] [HepTox Vldb05] [ADO.net Sigmod07] [MapForce09, StylusStudio09]

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

Source: IBDBook (title)
The Hobbit
The Da Vinci Code
The Lord of the Rings

Source: LOC (title | publisher)
The Lord of the Rings | Houghton
The Catcher in the Rye | Lb Books

Source: IBLBook (title | pubId)
The Hobbit | 245
The Catcher in the Rye | 901

Source: IBLPublisher (id | name)
245 | Ballantine
901 | Lb Books

Target: Book (title | pubId)
The Hobbit | NULL
The Da Vinci Code | NULL
The Lord of the Rings | NULL
The Lord of the Rings | I1
The Catcher in the Rye | I2
The Hobbit | 245
The Catcher in the Rye | 901

Target: Publisher (id | name)
I1 | Houghton
I2 | Lb Books
245 | Ballantine
901 | Lb Books

Page 5: Core Schema Mappings

Mapping generation

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

• A first generation schema mapping tool generates

m1: ∀ t1: IBDBook(t1) → ∃N: Book(t1, N)

Page 6: Core Schema Mappings

Mapping generation

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

• A first generation schema mapping tool generates

m1: ∀ t1: IBDBook(t1) → ∃N: Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)

Page 7: Core Schema Mappings

Mapping generation

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

• A first generation schema mapping tool generates

m1: ∀ t1: IBDBook(t1) → ∃N: Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)
m4: IBLPublisher(i′, p′) → Publisher(i′, p′)
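The four mappings can be read as a recipe for filling the target. The following is a minimal Python sketch of that reading, not the tool's actual output: it applies m1 to m4 once to the example source instance and invents a fresh labeled null for every existential variable. The function name `canonical_solution` and the null labels `N1`, `N2`, ... are assumptions made for illustration.

```python
from itertools import count

# Source instance of the running example.
IBD_BOOK = [("The Hobbit",), ("The Da Vinci Code",), ("The Lord of the Rings",)]
LOC = [("The Lord of the Rings", "Houghton"), ("The Catcher in the Rye", "Lb Books")]
IBL_BOOK = [("The Hobbit", 245), ("The Catcher in the Rye", 901)]
IBL_PUBLISHER = [(245, "Ballantine"), (901, "Lb Books")]


def canonical_solution():
    """Apply m1..m4 once per matching source tuple (hypothetical helper)."""
    fresh = count(1)

    def new_null():
        # One fresh labeled null per existential variable and per firing.
        return f"N{next(fresh)}"

    book, publisher = set(), set()
    for (t1,) in IBD_BOOK:            # m1: IBDBook(t1) -> Book(t1, N)
        book.add((t1, new_null()))
    for (t2, p) in LOC:               # m2: LOC(t2, p) -> Book(t2, N') and Publisher(N', p)
        n = new_null()
        book.add((t2, n))
        publisher.add((n, p))
    for (t3, i) in IBL_BOOK:          # m3: IBLBook(t3, i) -> Book(t3, i)
        book.add((t3, i))
    for (i, p) in IBL_PUBLISHER:      # m4: IBLPublisher(i', p') -> Publisher(i', p')
        publisher.add((i, p))
    return book, publisher
```

Running it yields seven Book tuples and four Publisher tuples, the same counts as the target instance shown in the deck, including the redundant tuples that carry nulls.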

Page 8: Core Schema Mappings

I just want to move data

1. Given a minimal abstract specification
2. With source semantics preserved in the target instance

[Schema diagram, source instance, and target instance (Book, Publisher) repeated from Page 4]

Page 9: Core Schema Mappings

I just want to move data

1. Given a minimal abstract specification
2. With source semantics preserved in the target instance
3. With no redundancy in the target instance

Post processing step [TODS05, JACM08]: canonical solution → core

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

Canonical solution

Target: Book (title | pubId)
The Hobbit | NULL
The Da Vinci Code | NULL
The Lord of the Rings | NULL
The Lord of the Rings | I1
The Catcher in the Rye | I2
The Hobbit | 245
The Catcher in the Rye | 901

Target: Publisher (id | name)
I1 | Houghton
I2 | Lb Books
245 | Ballantine
901 | Lb Books

Core

Target: Book (title | pubId)
The Da Vinci Code | NULL
The Lord of the Rings | I1
The Hobbit | 245
The Catcher in the Rye | 901

Target: Publisher (id | name)
I1 | Houghton
245 | Ballantine
901 | Lb Books

Page 10: Core Schema Mappings

I just want to move data

1. Given a minimal abstract specification
2. With source semantics preserved in the target instance
3. With no redundancy in the target instance

Post processing step [TODS05, JACM08]

[Schema diagram, canonical solution, and core repeated from Page 9]

Example: Book(The Hobbit, NULL) and Book(The Hobbit, 245) collapse into Book(The Hobbit, 245) through the homomorphism null → 245
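To make the post-processing step concrete, here is a brute-force Python sketch that shrinks an instance to its core by repeatedly looking for a mapping of nulls whose image is a proper sub-instance. It is only an illustration on a small fragment of the example, assuming a hypothetical `Null` class and `core` function; it is not the algorithm of [TODS05] or [JACM08], and its exhaustive search is exponential in the number of nulls.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Null:
    """A labeled null (hypothetical representation for this sketch)."""
    label: str


def apply(h, facts):
    # Replace every null by its image under h; constants stay fixed.
    return {(rel, tuple(h.get(v, v) for v in args)) for rel, args in facts}


def core(facts):
    """Shrink facts until no mapping of nulls sends it into a proper subset."""
    facts = set(facts)
    while True:
        nulls = sorted({v for _, args in facts for v in args if isinstance(v, Null)},
                       key=lambda n: n.label)
        values = list({v for _, args in facts for v in args})
        for image in product(values, repeat=len(nulls)):
            reduced = apply(dict(zip(nulls, image)), facts)
            if reduced < facts:       # homomorphism into a proper sub-instance
                facts = reduced
                break
        else:
            return facts              # no such homomorphism: this is the core


def example_facts():
    n1, n2, n3 = Null("N1"), Null("N2"), Null("N3")
    return {
        ("Book", ("The Hobbit", n1)),
        ("Book", ("The Da Vinci Code", n2)),
        ("Book", ("The Hobbit", 245)),
        ("Book", ("The Catcher in the Rye", n3)),
        ("Book", ("The Catcher in the Rye", 901)),
        ("Publisher", (n3, "Lb Books")),
        ("Publisher", (901, "Lb Books")),
    }
```

On `example_facts()` the search folds N1 into 245 and N3 into 901, while the Da Vinci Code tuple keeps its null because nothing more informative exists for that title.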

Page 11: Core Schema Mappings

Theoretically, the problem is solved

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

Page 12: Core Schema Mappings

Theoretically, the problem is solved

In practice, for a simple schema mapping with 5000 source tuples:
• canonical solution = 1 sec
• core computation = 8 hours

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

Page 13: Core Schema Mappings

What’s the problem?

Canonical solution computation
• Semantics of data exchange is defined using the chase [ICDT03]: chase(I, M) = canonical universal solution (I, M)
• chase = SQL scripts
  • speed
  • flexibility and reuse

Core computation (on top of a relational db)
• Post processing looks recursively for homomorphisms between instances
• custom engine

Page 14: Core Schema Mappings

What’s the solution?

Core solution computation


Page 15: Core Schema Mappings

What’s the solution?

Core schema mappings
1. Find homomorphisms between formulas
2. Correlate mappings to avoid homomorphism between facts
• negation
• sophisticated skolemization for nulls

Core solution computation
• Core computation as result of the (standard) chase
• chase(I, M) = core(I, M)
• Enables a scalable solution

Page 16: Core Schema Mappings

Outline

• How difficult is my mapping?
  – Subsumptions
  – Coverages
  – Self-joins
• How does it scale?
  – Complexity
  – SQL experiments
  – Modularity

Page 17: Core Schema Mappings

Subsumptions

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId); target Book(title, pubId), Publisher(id, name)]

Source: IBDBook (title)
The Hobbit
The Da Vinci Code
The Lord of the Rings

Source: IBLBook (title | pubId)
The Hobbit | 245
The Catcher in the Rye | 901

m1: IBDBook(t1) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)

Target: Book (title | pubId)
The Hobbit | N1
The Da Vinci Code | N2
The Lord of the Rings | N3
The Hobbit | 245
The Catcher in the Rye | 901

N1 to 245?

Page 18: Core Schema Mappings

Subsumptions

[Schema diagram and source instance (IBDBook, IBLBook) repeated from Page 17]

m1: IBDBook(t1) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)

Page 19: Core Schema Mappings

Subsumptions

[Schema diagram and source instance (IBDBook, IBLBook) repeated from Page 17]

m1: IBDBook(t1) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)

Target: Book (title | pubId)
The Hobbit | 245
The Catcher in the Rye | 901

Page 20: Core Schema Mappings

Subsumptions

[Schema diagram, source instance, mappings m1 to m3, and target instance repeated from Page 17]

N1 to 245?

Page 21: Core Schema Mappings

Subsumptions

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId); target Book(title, pubId), Publisher(id, name)]

• Our algorithm identifies homomorphisms in the target side of the tgds

m1: IBDBook(t1) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)

homomorphism: t1 → t3, N → i

m3 subsumes m1
1. generate target tuples for m3, the “more informative” mapping
2. for m1 generate only those tuples that add new content to the target
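The homomorphism t1 → t3, N → i can be found mechanically by matching the atoms of one tgd conclusion against those of another. The sketch below is a small backtracking search written for this example; the function name `find_homomorphism` and the encoding of atoms as `(relation, args)` pairs are assumptions, as is the rule used here that universal variables may only map to universal variables while existential variables may map to anything.

```python
def find_homomorphism(src_atoms, src_universal, dst_atoms, dst_universal):
    """Return a variable mapping sending every src atom onto a dst atom, or None."""

    def extend(h, atoms):
        if not atoms:
            return h
        (rel, args), rest = atoms[0], atoms[1:]
        for dst_rel, dst_args in dst_atoms:
            if dst_rel != rel or len(dst_args) != len(args):
                continue
            candidate = dict(h)
            ok = True
            for a, d in zip(args, dst_args):
                # Assumed rule: a universal variable must land on a universal one.
                if a in src_universal and d not in dst_universal:
                    ok = False
                    break
                # Each variable gets one image only.
                if candidate.setdefault(a, d) != d:
                    ok = False
                    break
            if ok:
                result = extend(candidate, rest)
                if result is not None:
                    return result
        return None

    return extend({}, list(src_atoms))


# Conclusions of the three tgds, with their universally quantified variables.
M1 = ([("Book", ("t1", "N"))], {"t1"})
M2 = ([("Book", ("t2", "N'")), ("Publisher", ("N'", "p"))], {"t2", "p"})
M3 = ([("Book", ("t3", "i"))], {"t3", "i"})
```

Calling `find_homomorphism(*M1, *M3)` returns the mapping from the slide, and the reverse direction returns `None`, which is what makes m3 the more informative of the two.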

Page 22: Core Schema Mappings

Subsumptions

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId); target Book(title, pubId), Publisher(id, name)]

m1: IBDBook(t1) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)

m3 subsumes m1
1. generate target tuples for m3, the “more informative” mapping
2. for m1 generate only those tuples that add new content to the target

m′1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) → Book(t1, N)

Page 23: Core Schema Mappings

Subsumptions

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId); target Book(title, pubId), Publisher(id, name)]

m1: IBDBook(t1) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)

also m2 subsumes m1

homomorphism: t1 → t2, N → N′

m′1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ ¬(LOC(t1, p)) → Book(t1, N)
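Since the deck stresses that the chase can be run as SQL scripts, the rewritten mapping can be sketched as an INSERT with NOT EXISTS clauses. The example below uses Python's standard `sqlite3` module purely for illustration (the experiments in the deck use PostgreSQL); the table layouts, the function name `run_rewritten_mappings`, and the string `N_m1(title)` standing in for a Skolem-generated null are all assumptions.

```python
import sqlite3


def run_rewritten_mappings():
    """Run m3 and the rewritten m1 as SQL; returns the Book rows sorted by title."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE IBDBook(title TEXT);
        CREATE TABLE LOC(title TEXT, publisher TEXT);
        CREATE TABLE IBLBook(title TEXT, pubId TEXT);
        CREATE TABLE Book(title TEXT, pubId TEXT);

        INSERT INTO IBDBook VALUES
            ('The Hobbit'), ('The Da Vinci Code'), ('The Lord of the Rings');
        INSERT INTO LOC VALUES
            ('The Lord of the Rings', 'Houghton'), ('The Catcher in the Rye', 'Lb Books');
        INSERT INTO IBLBook VALUES
            ('The Hobbit', '245'), ('The Catcher in the Rye', '901');

        -- m3: copy every IBLBook tuple
        INSERT INTO Book SELECT title, pubId FROM IBLBook;

        -- rewritten m1: only titles that neither IBLBook nor LOC can provide
        INSERT INTO Book
        SELECT b.title, 'N_m1(' || b.title || ')'
        FROM IBDBook b
        WHERE NOT EXISTS (SELECT 1 FROM IBLBook i WHERE i.title = b.title)
          AND NOT EXISTS (SELECT 1 FROM LOC l WHERE l.title = b.title);
    """)
    rows = conn.execute("SELECT title, pubId FROM Book ORDER BY title").fetchall()
    conn.close()
    return rows
```

Only The Da Vinci Code receives a null from the rewritten m1: The Hobbit is already covered by IBLBook and The Lord of the Rings would be produced by m2, which this sketch leaves out because it also writes into Publisher.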

Page 24: Core Schema Mappings

Coverages

[Schema diagram: source IBDBook(title), LOC(title, publisher), IBLBook(title, pubId), IBLPublisher(id, name); target Book(title, pubId), Publisher(id, name)]

Source: IBDBook (title)
The Hobbit
The Da Vinci Code
The Lord of the Rings

Source: LOC (title | publisher)
The Lord of the Rings | Houghton
The Catcher in the Rye | Lb Books

Source: IBLBook (title | pubId)
The Hobbit | 245
The Catcher in the Rye | 901

Source: IBLPublisher (id | name)
245 | Ballantine
901 | Lb Books

m′1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ ¬(LOC(t1, p)) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)
m4: IBLPublisher(i′, p′) → Publisher(i′, p′)

Page 25: Core Schema Mappings

Coverages

[Schema diagram and source instance (IBDBook, LOC, IBLBook, IBLPublisher) repeated from Page 24]

Target: Book (title | pubId)
The Da Vinci Code | NULL
The Lord of the Rings | I1
The Catcher in the Rye | I2
The Hobbit | 245
The Catcher in the Rye | 901

Target: Publisher (id | name)
I1 | Houghton
I2 | Lb Books
245 | Ballantine
901 | Lb Books

m′1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ ¬(LOC(t1, p)) → Book(t1, N)
m2: LOC(t2, p) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)
m4: IBLPublisher(i′, p′) → Publisher(i′, p′)

Page 26: Core Schema Mappings

Coverages

[Schema diagram, source instance, target instance, and mappings repeated from Page 25]

Page 27: Core Schema Mappings

Coverages

[Schema diagram and source instance (IBDBook, LOC, IBLBook, IBLPublisher) repeated from Page 24]

Target: Book (title | pubId)
The Da Vinci Code | NULL
The Lord of the Rings | I1
The Hobbit | 245
The Catcher in the Rye | 901

Target: Publisher (id | name)
I1 | Houghton
245 | Ballantine
901 | Lb Books

m′1: IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ ¬(LOC(t1, p)) → Book(t1, N)
m′2: LOC(t2, p) ∧ ¬(IBLBook(t2, i) ∧ IBLPublisher(i, p)) → Book(t2, N′) ∧ Publisher(N′, p)
m3: IBLBook(t3, i) → Book(t3, i)
m4: IBLPublisher(i′, p′) → Publisher(i′, p′)

more differences, new joins
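The coverage rewriting can be illustrated with plain Python sets: m′2 fires for a LOC tuple only when no join of IBLBook and IBLPublisher already supplies the same title and publisher name. This is a sketch under stated assumptions, not the tool's generated script; the function name `chase_with_coverage_rewriting` and the Skolem-style label `N(m2,title,publisher)` are invented for the example.

```python
def chase_with_coverage_rewriting(loc, ibl_book, ibl_publisher):
    """Apply m3, m4 and the rewritten m2 (hypothetical helper)."""
    book = set(ibl_book)             # m3: IBLBook(t3, i) -> Book(t3, i)
    publisher = set(ibl_publisher)   # m4: IBLPublisher(i', p') -> Publisher(i', p')

    # (title, publisher name) pairs already derivable by joining the IBL relations.
    covered = {(t, p)
               for (t, i) in ibl_book
               for (j, p) in ibl_publisher
               if i == j}

    for (t, p) in loc:               # rewritten m2: skip LOC tuples that are covered
        if (t, p) not in covered:
            null = f"N(m2,{t},{p})"  # Skolem-style null: same inputs, same null
            book.add((t, null))
            publisher.add((null, p))
    return book, publisher


LOC = [("The Lord of the Rings", "Houghton"), ("The Catcher in the Rye", "Lb Books")]
IBL_BOOK = [("The Hobbit", 245), ("The Catcher in the Rye", 901)]
IBL_PUBLISHER = [(245, "Ballantine"), (901, "Lb Books")]
```

With the example data, The Catcher in the Rye is covered by the join on pubId 901, so only The Lord of the Rings receives a null, which agrees with the I1 rows in the target tables above.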

Page 28: Core Schema Mappings

Self-joins

• techniques discussed so far are of little help: the chase will either generate three tuples or none
• duplicate symbols: many different ways of satisfying these constraints

m1: R(a, b, c, d) → S(x5, b, x1, x2, a) ∧ S(x5, c, x3, x4, a) ∧ S(d, c, x3, x4, b)
m2: R(a, b, c, d) → S(d, a, a, x1, b) ∧ S(x5, a, a, x1, a) ∧ S(x5, c, x2, x3, x4)

Example from [TODS05]

Page 29: Core Schema Mappings

Self-joins: two-phase

m′1: R(a, b, c, d) → S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
m′2: R(e, f, g, h) → S4(h, e, e, y1, f) ∧ S5(y5, e, e, y1, e) ∧ S6(y5, g, y2, y3, y4)

[Diagram: R → (first exchange) → S1, S2, … → (second exchange) → S]

Expansions
S1(x5, b, x1, x2, a) … → S(…)
S2(x5, c, x3, x4, a) … → S(…) S(…)
S3(d, c, x3, x4, b) … → S(…)
S4(h, e, e, y1, f) … → S(…)
S5(y5, e, e, y1, e) … → S(…)
S6(y5, g, y2, y3, y4) … → S(…)
…

1. We rewrite the tgds using distinct symbols and variables and do a first exchange
2. The second exchange considers all possible combinations and copies the data into the original relation, avoiding redundancy

Page 30: Core Schema Mappings

Self-joins

m′1: R(a, b, c, d) → S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
m′2: R(e, f, g, h) → S4(h, e, e, y1, f) ∧ S5(y5, e, e, y1, e) ∧ S6(y5, g, y2, y3, y4)

Source tuple: R(n, n, n, k)

Base view: S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
Tuples: S(N5, n, N1, N2, n), S(N5, n, N3, N4, n), S(k, n, N3, N4, n)

Page 31: Core Schema Mappings

Self-joins: expansions

m′1: R(a, b, c, d) → S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
m′2: R(e, f, g, h) → S4(h, e, e, y1, f) ∧ S5(y5, e, e, y1, e) ∧ S6(y5, g, y2, y3, y4)

Source tuple: R(n, n, n, k)

Base view: S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
Tuples: S(N5, n, N1, N2, n), S(N5, n, N3, N4, n), S(k, n, N3, N4, n)

Expansion 1: S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b) ∧ (S1(x5, b, x1, x2, a) ∧ b = c)
Tuples: S(N5, n, N1, N2, n), S(k, n, N3, N4, n)
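The idea behind Expansion 1 can be checked by hand on R(n, n, n, k): when b = c, the first conclusion atom of m′1 becomes redundant with respect to the second. The Python sketch below chases m′1 on that tuple and then folds two nulls; the helper names `chase_m1` and `fold`, and the choice of which nulls to merge, are assumptions made for illustration. Lowercase strings stand for constants and `N1` to `N5` for labeled nulls.

```python
def chase_m1(r_tuple):
    """Fire the conclusion of m1 on one R tuple, using fixed null labels."""
    a, b, c, d = r_tuple
    return {
        ("N5", b, "N1", "N2", a),   # S(x5, b, x1, x2, a)
        ("N5", c, "N3", "N4", a),   # S(x5, c, x3, x4, a)
        (d, c, "N3", "N4", b),      # S(d, c, x3, x4, b)
    }


def fold(facts, h):
    """Apply a mapping of nulls to every tuple; unmapped values stay as they are."""
    return {tuple(h.get(v, v) for v in fact) for fact in facts}


facts = chase_m1(("n", "n", "n", "k"))
# With b = c the first tuple differs from the second only in N1, N2 versus N3, N4.
folded = fold(facts, {"N1": "N3", "N2": "N4"})
```

The folded instance is a proper subset of the chase result with two tuples, the same count as listed for Expansion 1; the exact null labels differ from those printed on the slide.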

Page 32: Core Schema Mappings

Self-joins: expansions

m′1: R(a, b, c, d) → S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
m′2: R(e, f, g, h) → S4(h, e, e, y1, f) ∧ S5(y5, e, e, y1, e) ∧ S6(y5, g, y2, y3, y4)

Source tuple: R(n, n, n, k)

Base view: S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
Tuples: S(N5, n, N1, N2, n), S(N5, n, N3, N4, n), S(k, n, N3, N4, n)

Expansion 1: S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b) ∧ (S1(x5, b, x1, x2, a) ∧ b = c)
Tuples: S(N5, n, N1, N2, n), S(k, n, N3, N4, n)

Expansion 2: S4(h, e, e, y1, f) ∧ S4(h′, e′, e′, y1, f′) ∧ h = h′ ∧ (S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b) ∧ e = b ∧ f = a ∧ e′ = c ∧ f′ = a ∧ h′ = d ∧ f′ = b)
Tuples: S(k, n, n, N1, n)

Page 33: Core Schema Mappings

Self-joins: two-phase

• use expansions as premises of the second exchange
• rewrite these new tgds using subsumptions to avoid redundancy
  – favor more compact and more informative expansions

m′1: R(a, b, c, d) → S1(x5, b, x1, x2, a) ∧ S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b)
m′2: R(e, f, g, h) → S4(h, e, e, y1, f) ∧ S5(y5, e, e, y1, e) ∧ S6(y5, g, y2, y3, y4)

[Diagram: R → (first exchange) → S1, S2, … → (second exchange) → S]

Expansions
e12: S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b) ∧ (S1(x5, b, x1, x2, a) ∧ b = c) → S(…) S(…)
e13: … → S(…)
…

Page 34: Core Schema Mappings

Outline

• How difficult is my mapping?
  – Subsumptions
  – Coverages
  – Self-joins
• How does it scale?
  – Complexity
  – SQL experiments
  – Modularity

Page 35: Core Schema Mappings

Complexity

n: number of tgds
d: maximum number of different tgds that write into a table
k: maximum number of atoms in a tgd conclusion

Expression complexity (not data complexity!)

Scenario | Complexity | Frequency
Subsumptions | O(n^2) | Very high
Coverages | O(d^k) | Low
Self-joins | O(2^n) | Very low

Real cases: O(n). In realistic cases the complexities are lower.

Page 36: Core Schema Mappings

Experiments settings

• Algorithms implemented in +Spicy
• Scripts in SQL (and XQuery)
• PostgreSQL 8.3 on an Intel Core Duo 2.4 GHz / 4 GB RAM / Linux
• Scenarios from the literature, mostly from STBenchmark [Vldb08 – www.stbenchmark.org]
• Each SQL test run with 10k, 100k, 250k, 500k, and 1M tuples in the source instance
• Time limit = 1 hour
  – custom engine exceeded the time limit in all scenarios

Page 37: Core Schema Mappings

Experiments results

[Two charts of execution time in seconds against the number of tuples in the source (10K, 100K, 250K, 500K, 1M). Left panel, “Subsumption and coverages”: scenarios S1, S2, C1, C2, time axis from 0 to 120 sec. Right panel, “Self joins”: scenarios SJ1, SJ2, SJ3, time axis from 0 to 3500 sec.]

Page 38: Core Schema Mappings

Experiments results

[Two charts of execution time in seconds against the number of tuples in the source (10K, 100K, 250K, 500K, 1M). Left panel, “Subsumption and coverages”: scenarios S1, S2, C1, C2, time axis from 0 to 120 sec. Right panel, “Self joins”: scenarios SJ1, SJ2, SJ3, time axis from 0 to 3500 sec.]

Scalability experiments with up to 100 tables (82 tgds, 51 subsumptions, 12 coverages): rewriting algorithm ran in 6 secs

[Chart for 25, 50, 75, and 100 tables. Series: # of TGDs, # of Subsumptions, # of Coverages, Script Gen Time (s), Execution Time (s).]

Page 39: Core Schema Mappings

Modular solution

canonical solution

subsumption-free solution

coverage-free solution

core (self-join-coverage-free solution)

The algorithms can produce approximations of the core for expensive computations: “reduced” rewriting with respect to a subset of homomorphisms

Page 40: Core Schema Mappings

Repeatability & Workability Evaluation

We participated in the ACM SIGMOD 2009 Repeatability & Workability Evaluation (cf. http://homepages.cwi.nl/~manegold/SIGMOD-2009-RWE/).

The reviewers were able to repeat all the experiments presented in our paper, yielding results that in most cases match the ones published in our paper, except for insignificant and to be expected variation due to randomness and/or hardware/software differences.

In addition, workability experiments confirmed, with few exceptions, the soundness of our results beyond the parameter setting and/or data sets presented in our paper.

The detailed reports will shortly be made publicly available by ACM SIGMOD.

Page 41: Core Schema Mappings

Bridging the gap

+Spicy is the first mapping tool which generates core solutions efficiently
• Standard solution for mapping generation
• Novel algorithms for (natural) mapping rewriting
• Execution times orders of magnitude faster than post processing

Two main results
• Bridge the gap between the practice of mapping generation [Popa et al. Vldb02] and the theory of data exchange [Fagin et al. TODS05, Gottlob & Nash JACM08]
• Enable schema mappings for more practical applications

Page 42: Core Schema Mappings

Thank you

See you in Lyon for the demo: Mecca, Papotti, Raunich, Buoncristiano, “Concise and Expressive Mappings with +Spicy”, VLDB 2009