ConQuer: Efficient Management of Inconsistent Databases

Presented by: Ariel Fuxman (Univ. of Toronto)
Joint work with: Renée J. Miller (Univ. of Toronto), Diego Fuxman (Univ. Nacional del Sur)


TRANSCRIPT

Page 1: ConQuer: Efficient Management of Inconsistent Databases

ConQuer: Efficient Management of Inconsistent Databases

Presented by: Ariel Fuxman (Univ. of Toronto)

Joint work with:
Renée J. Miller (Univ. of Toronto)
Diego Fuxman (Univ. Nacional del Sur)

Page 2: ConQuer: Efficient Management of Inconsistent Databases

Ariel Fuxman, Diego Fuxman, Renée J. Miller

ConQuer

A system designed to answer SQL queries over inconsistent databases

INCONSISTENT DATABASE

Name    Income
Peter   40K
Peter   200K
Paul    400K
Mary    110K
Mary    130K

name should be the key
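The slide's notion of inconsistency (several tuples sharing a value of the intended key) can be checked mechanically. The following is a minimal sketch, not part of ConQuer: the function name `key_violations` and the representation of incomes as integers in thousands are assumptions made for illustration.

```python
from collections import defaultdict

# Rows copied from the slide's example table: (name, income in thousands).
customers = [
    ("Peter", 40), ("Peter", 200),
    ("Paul", 400),
    ("Mary", 110), ("Mary", 130),
]

def key_violations(rows):
    """Return the key values that appear with more than one distinct income.

    `key_violations` is a hypothetical helper, not a ConQuer API.
    """
    incomes_by_name = defaultdict(set)
    for name, income in rows:
        incomes_by_name[name].add(income)
    return {
        name: sorted(incomes)
        for name, incomes in incomes_by_name.items()
        if len(incomes) > 1
    }

violations = key_violations(customers)
print(violations)
```

Under this sketch, Peter and Mary violate the key while Paul does not, which matches the table above.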

Page 3: ConQuer: Efficient Management of Inconsistent Databases

One Application

Customer Relationship Management (CRM)

Sources: Sales, Shipping, Customer Support, Web Forms, Demographic Data
Target: Integrated Customer Database

Speaker note (db2admin, 5/2/2005): As a motivation, let's focus on a domain known in IT as CRM (Customer Relationship Management). One of the goals of CRM is to integrate customer information from such disparate sources as ..... This domain is of interest to us because customer data is notoriously dirty and inconsistent.

Page 4: ConQuer: Efficient Management of Inconsistent Databases

Disagreement Between Sources

sales
name    address              …    income
Peter   276 College Street   …    40K
Paul    100 Bloor Street     …    400K
Mary    20 Union Street      …    110K

web
name    address              …    income
Peter   276 College Street   …    200K
Paul    100 Bloor Street     …    400K
Mary    20 Union Street      …    130K

Which tuple for Peter should we delete?
• Removing both tuples loses consistent information
• Deciding the correct income may require human intervention

Page 5: ConQuer: Efficient Management of Inconsistent Databases

Inconsistent Integrated Database

Sales
name    …    income
Peter   …    40K
Paul    …    400K
Mary    …    110K

Web
name    …    income
Peter   …    200K
Paul    …    400K
Mary    …    130K

Transfer all conflicting tuples to the integrated database

Integrated Database (INCONSISTENT DATABASE)
name    …    income
Peter   …    40K
Peter   …    200K
Paul    …    400K
Mary    …    110K
Mary    …    130K
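The integration step on this slide amounts to a set union of the source tables: tuples on which the sources agree collapse into one, and conflicting tuples are all kept. A minimal sketch follows, assuming each source is a list of (name, income) pairs with incomes in thousands; `integrate` is a hypothetical name.

```python
# Source tables from the slide, reduced to (name, income in thousands).
sales = [("Peter", 40), ("Paul", 400), ("Mary", 110)]
web = [("Peter", 200), ("Paul", 400), ("Mary", 130)]

def integrate(*sources):
    """Union all source tuples: exact duplicates collapse, conflicts are kept.

    Hypothetical helper for illustration; no conflict is resolved here.
    """
    seen = set()
    merged = []
    for source in sources:
        for row in source:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged

integrated = integrate(sales, web)
for name, income in sorted(integrated):
    print(name, f"{income}K")
```

Paul appears once because both sources agree on his income, while Peter and Mary each keep two tuples, so the result violates the key on name.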

Page 6: ConQuer: Efficient Management of Inconsistent Databases

Query Answering

q = “Get customers who make more than 100K”

name    income
Peter   40K      sales
Peter   200K     web
Paul    400K     sales/web
Mary    110K     sales
Mary    130K     web

Answer: Peter, Paul, Mary

Offering a Platinum credit card…

Peter should NOT be offered a Platinum card!!

Page 7: ConQuer: Efficient Management of Inconsistent Databases

Semantics of Query Answering

Get customers who possibly make more than 100K
• Peter, Paul, Mary

Get customers who certainly make more than 100K
• Paul, Mary: CONSISTENT ANSWER [Arenas et al. 99]

custid   income
Peter    40K      sales
Peter    200K     web
Paul     400K     sales/web
Mary     110K     sales
Mary     130K     web
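For this particular single-table selection query, the two semantics can be computed directly by grouping on the key: a customer is a possible answer if some tuple for that custid qualifies, and a certain answer if every tuple does. This shortcut is a sketch that holds for this simple query under a key constraint; it is not the general method, and `possible_and_certain` is a hypothetical name.

```python
from collections import defaultdict

# (custid, income in thousands), as on the slide.
rows = [
    ("Peter", 40), ("Peter", 200),
    ("Paul", 400),
    ("Mary", 110), ("Mary", 130),
]

def possible_and_certain(rows, threshold=100):
    """Return (possible, certain) answers to 'income > threshold'.

    Assumes a single relation whose key is custid, so each repair keeps
    exactly one tuple per custid.
    """
    incomes = defaultdict(list)
    for custid, income in rows:
        incomes[custid].append(income)
    possible = {c for c, vals in incomes.items() if any(v > threshold for v in vals)}
    certain = {c for c, vals in incomes.items() if all(v > threshold for v in vals)}
    return possible, certain

possible, certain = possible_and_certain(rows)
print(sorted(possible))
print(sorted(certain))
```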

Page 8: ConQuer: Efficient Management of Inconsistent Databases

Repairs

Inconsistent database (Key: custid)

custid   income
Peter    40K      sales
Peter    200K     web
Paul     400K     sales/web
Mary     110K     sales
Mary     130K     web

Repairs

Peter    40K
Paul     400K
Mary     110K

Peter    40K
Paul     400K
Mary     130K

Peter    200K
Paul     400K
Mary     110K

Peter    200K
Paul     400K
Mary     130K
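The four repairs on this slide are exactly the ways of keeping one tuple per custid. A minimal sketch of that enumeration follows, assuming a single relation with a key constraint and no other constraints; `repairs` is a hypothetical helper, and this naive enumeration is shown only to make the definition concrete.

```python
from collections import defaultdict
from itertools import product

# (custid, income in thousands), as on the slide.
rows = [
    ("Peter", 40), ("Peter", 200),
    ("Paul", 400),
    ("Mary", 110), ("Mary", 130),
]

def repairs(rows):
    """Enumerate every way of keeping exactly one tuple per key value."""
    groups = defaultdict(list)
    for custid, income in rows:
        groups[custid].append((custid, income))
    # The Cartesian product picks one tuple from each key group.
    return [list(choice) for choice in product(*groups.values())]

all_repairs = repairs(rows)
for repair in all_repairs:
    print(repair)
```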

Page 9: ConQuer: Efficient Management of Inconsistent Databases

Consistent Query Answers

CONSISTENT ANSWERS: answers obtained no matter which repair we choose

q = “Get customers who make more than 100K”

Repairs                               Result of q
Peter 40K,  Paul 400K, Mary 110K      Paul, Mary
Peter 40K,  Paul 400K, Mary 130K      Paul, Mary
Peter 200K, Paul 400K, Mary 110K      Peter, Paul, Mary
Peter 200K, Paul 400K, Mary 130K      Peter, Paul, Mary

CONSISTENT ANSWER = {Paul, Mary}
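The definition on this slide can be executed literally: run q on every repair and intersect the results. The sketch below does so for the example data; it is the naive approach, written only to illustrate the semantics, and the names `repairs`, `q`, and `consistent_answers` are assumptions.

```python
from collections import defaultdict
from itertools import product

rows = [
    ("Peter", 40), ("Peter", 200),
    ("Paul", 400),
    ("Mary", 110), ("Mary", 130),
]

def repairs(rows):
    """One tuple per custid, in every combination."""
    groups = defaultdict(list)
    for custid, income in rows:
        groups[custid].append((custid, income))
    return [list(choice) for choice in product(*groups.values())]

def q(db, threshold=100):
    """'Get customers who make more than 100K' over one consistent database."""
    return {custid for custid, income in db if income > threshold}

def consistent_answers(rows, query):
    """Answers returned by the query on every repair (set intersection)."""
    results = [query(repair) for repair in repairs(rows)]
    return set.intersection(*results)

consistent = consistent_answers(rows, q)
print(sorted(consistent))
```

Peter drops out because the two repairs that keep his 40K tuple do not return him.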

Page 10: ConQuer: Efficient Management of Inconsistent Databases

Problem

Potentially HUGE number of repairs!
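To see why the number can be huge: with a single key constraint, each key value contributes an independent choice, so the number of repairs is the product of the sizes of the key groups. A small sketch under that assumption follows (distinct tuples per key, one relation). The 50-customer scenario is hypothetical and not taken from the slides.

```python
from collections import Counter
from math import prod

def count_repairs(keys):
    """Number of repairs = product over key values of the tuples sharing that key.

    `keys` holds the key value of each tuple; tuples are assumed distinct.
    """
    return prod(Counter(keys).values())

# The slide's database: Peter x2, Paul x1, Mary x2.
slide_count = count_repairs(["Peter", "Peter", "Paul", "Mary", "Mary"])
print(slide_count)

# Hypothetical scenario: 50 customers, each with two conflicting tuples.
hypothetical_keys = [f"cust{i}" for i in range(50) for _ in range(2)]
big_count = count_repairs(hypothetical_keys)
print(big_count)
```

Even two conflicting tuples per customer double the count with every additional customer, so enumerating repairs quickly becomes impractical.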

Page 11: ConQuer: Efficient Management of Inconsistent Databases

ConQuer

ConQuer is a system designed to compute consistent answers efficiently
• avoids explicit construction of repairs
• reuses commercial database technology

Speaker note (db2admin): A naïve algorithm: enumerate all repairs. But the number of repairs can be huge!

Page 12: ConQuer: Efficient Management of Inconsistent Databases

ConQuer’s Solution

Inputs: Query q, Keys
ConQuer’s Rewriting Algorithm [ICDT 05] [SIGMOD 05] produces: Rewritten Q*
Commercial database engine evaluates Q* over the inconsistent database
Output: Consistent answer to q
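To make the idea of a rewriting concrete, here is a hand-written rewritten query for the running example, evaluated by an ordinary SQL engine without building any repair. This is a sketch under stated assumptions: the SQL below was written by hand for this one query and is not output of ConQuer's algorithm, the table name `customer` is invented, and SQLite (via Python's standard `sqlite3` module) merely stands in for the commercial engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (custid TEXT, income INTEGER)")  # no key enforced
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Peter", 40), ("Peter", 200), ("Paul", 400), ("Mary", 110), ("Mary", 130)],
)

# Original query q: returns the possible answers on the inconsistent database.
Q = "SELECT DISTINCT custid FROM customer WHERE income > 100 ORDER BY custid"

# Hand-written rewriting: keep a customer only if no tuple with the same
# key fails the selection condition.
Q_STAR = """
SELECT DISTINCT c.custid
FROM customer c
WHERE c.income > 100
  AND NOT EXISTS (
        SELECT 1 FROM customer c2
        WHERE c2.custid = c.custid AND c2.income <= 100)
ORDER BY c.custid
"""

naive = [row[0] for row in conn.execute(Q)]
consistent = [row[0] for row in conn.execute(Q_STAR)]
print(naive)
print(consistent)
conn.close()
```

The rewritten query is plain SQL, so the engine's optimizer and indexes apply to it as to any other query.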

Page 13: ConQuer: Efficient Management of Inconsistent Databases

Contributions

Rewriting algorithm
• From a large class of SPJ SQL queries
• Into SQL queries
Rewriting for queries with grouping and aggregation
Optimized rewriting
• Exploits precomputed information, if available
Experimental evaluation
• Large databases
• TPC-H queries

Speaker note (db2admin): These restrictions are not arbitrary, since it is known that there are some SPJ queries for which there is no SQL rewriting.

Page 14: ConQuer: Efficient Management of Inconsistent Databases

Demo

Present a case study of an inconsistent database about airports and cities

Explain the automatically generated rewritings

Deal with Select-Project-Join queries with grouping and aggregation

Page 15: ConQuer: Efficient Management of Inconsistent Databases

ConQuer papers

A. Fuxman, E. Fazli, and R. J. Miller. ConQuer: Efficient Management of Inconsistent Databases, SIGMOD 2005.

A. Fuxman and R. J. Miller. First-Order Query Rewriting for Inconsistent Databases, ICDT 2005.