Entity Resolution with Evolving Rules
DESCRIPTION
Entity Resolution with Evolving Rules. Youzhong Ma, 2010-9-25, Lab of WAMDM. Outline: Motivations, ER related concepts, ER properties, Conclusions. Entity Resolution background. Naïve ER Approach vs. New Approach.
![Page 1: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/1.jpg)
Entity Resolution with Evolving Rules
Youzhong Ma 2010-9-25 Lab of WAMDM
![Page 2: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/2.jpg)
Outline
Motivations
ER Related concepts
ER properties
Conclusions
![Page 3: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/3.jpg)
Entity Resolution background
![Page 4: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/4.jpg)
Entity Resolution background
![Page 5: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/5.jpg)
Naïve ER Approach Vs. New Approach
![Page 6: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/6.jpg)
Outline
Motivations
ER Related concepts
ER properties
Conclusions
![Page 7: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/7.jpg)
ER Related concepts
Suppose market A merges with market B.
They have to combine their customer records.
The same person may appear in both markets' customer databases, but with some attribute values that differ.
How should this be dealt with?
![Page 8: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/8.jpg)
ER Rule
Boolean functions: determine whether two records represent the same entity (true or false).
Distance functions: measure how different (or similar) two records are.
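The two kinds of rule can be sketched in Python as follows. This is only an illustration: the record fields (`name`, `zip`), the sample values, and the choice of `difflib.SequenceMatcher` as the similarity measure are assumptions, not something the slides specify.

```python
from difflib import SequenceMatcher

# Hypothetical customer records; the field names are assumptions.
a = {"name": "John Smith", "zip": "54321"}
b = {"name": "Jon Smith", "zip": "54321"}

def boolean_rule(r, s):
    """Boolean match function: True if r and s are judged to be the same entity."""
    return r["name"] == s["name"] and r["zip"] == s["zip"]

def distance_rule(r, s):
    """Distance function: 0.0 for identical names, approaching 1.0 as names diverge."""
    return 1.0 - SequenceMatcher(None, r["name"], s["name"]).ratio()
```

A Boolean rule gives a hard yes/no answer for a pair, whereas a distance rule leaves the decision to a threshold or clustering step applied afterwards.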
![Page 9: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/9.jpg)
ER Example
![Page 10: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/10.jpg)
ER procedure
Original record set: S = {r1,r2,r3,r4}
ER input: Pi = {{r1},{r2},{r3},{r4}}
B1: Pname, E1 = {{r1,r2,r3},{r4}} (6 comps)
B2: Pname ∧ Pzip, E2 = {{r1,r2},{r3},{r4}}
Naïve approach: 6 comps
Evolving rule approach: 3 comps
The evolving rule approach only works if the ER algorithm satisfies certain properties and B2 is stricter than B1.
So one contribution of this paper is to explore under what conditions, and for what ER algorithms, incremental approaches are feasible.
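The comparison counts above can be reproduced with a small sketch. It assumes a simple ER algorithm that compares every pair and clusters by transitive closure; the record values and the function names (`resolve`, `resolve_from`) are hypothetical, chosen only so that the partitions match the slide.

```python
from itertools import combinations

# Hypothetical records chosen to reproduce the partitions on the slide.
RECORDS = {
    "r1": {"name": "John", "zip": "54321"},
    "r2": {"name": "John", "zip": "54321"},
    "r3": {"name": "John", "zip": "11111"},
    "r4": {"name": "Bob", "zip": None},
}

def p_name(a, b):
    return a["name"] == b["name"]

def p_name_and_zip(a, b):
    return p_name(a, b) and a["zip"] is not None and a["zip"] == b["zip"]

def resolve(ids, rule):
    """Cluster ids by the transitive closure of rule; return (partition, comparisons)."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    comparisons = 0
    for x, y in combinations(ids, 2):
        comparisons += 1
        if rule(RECORDS[x], RECORDS[y]):
            parent[find(x)] = find(y)
    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in clusters.values()}, comparisons

def resolve_from(previous, rule):
    """Evolving-rule shortcut: re-resolve each cluster of an earlier result separately."""
    partition, total = set(), 0
    for cluster in previous:
        part, n = resolve(sorted(cluster), rule)
        partition |= part
        total += n
    return partition, total

e1, c1 = resolve(sorted(RECORDS), p_name)                  # B1 from scratch
naive_e2, naive_c = resolve(sorted(RECORDS), p_name_and_zip)  # B2 from scratch
evolved_e2, evolved_c = resolve_from(e1, p_name_and_zip)   # B2 starting from E1
```

Under these assumptions both routes yield the same E2, but the naïve route compares all pairs of S while the evolving route only compares pairs inside the clusters of E1.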
![Page 11: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/11.jpg)
Original record set: S = {r1,r2,r3,r4}
ER input: Pi = {{r1},{r2},{r3},{r4}}
B1: Pname ∧ Pzip, E1 = {{r1,r2},{r3},{r4}} (6 comps)
B2: Pname ∧ Pphone, E2 = {{r1},{r2,r3},{r4}} (3 comps)
Pname: Ename = {{r1,r2,r3},{r4}}
Pzip: Ezip = {{r1,r2},{r3},{r4}}
Materialization!
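A minimal sketch of the materialization idea, assuming the per-conjunct results (Ename, Ezip) are stored and that the new rule Pname ∧ Pphone is then resolved inside the clusters of Ename. The record values and the helper names (`same`, `both`, `resolve_within`) are hypothetical.

```python
from itertools import combinations

# Hypothetical records chosen to reproduce the partitions on the slide.
RECORDS = {
    "r1": {"name": "John", "zip": "54321", "phone": "123-4567"},
    "r2": {"name": "John", "zip": "54321", "phone": "987-6543"},
    "r3": {"name": "John", "zip": "11111", "phone": "987-6543"},
    "r4": {"name": "Bob", "zip": None, "phone": "121-1212"},
}

def same(field):
    """Build a predicate that matches two records on one non-null field."""
    return lambda a, b: a[field] is not None and a[field] == b[field]

def both(p, q):
    """Conjunction of two predicates."""
    return lambda a, b: p(a, b) and q(a, b)

def resolve_within(clusters, rule):
    """Split each input cluster by the transitive closure of rule, counting comparisons."""
    result, comparisons = set(), 0
    for cluster in clusters:
        groups = [{i} for i in sorted(cluster)]
        for x, y in combinations(sorted(cluster), 2):
            comparisons += 1
            if rule(RECORDS[x], RECORDS[y]):
                gx = next(g for g in groups if x in g)
                gy = next(g for g in groups if y in g)
                if gx is not gy:
                    gx |= gy
                    groups = [g for g in groups if g is not gy]
        result |= {frozenset(g) for g in groups}
    return result, comparisons

everything = [set(RECORDS)]
# Materialize one partition per conjunct of the old rule.
e_name, _ = resolve_within(everything, same("name"))
e_zip, _ = resolve_within(everything, same("zip"))
# The new rule shares the Pname conjunct, so start from the stored Ename.
e2, comps = resolve_within(e_name, both(same("name"), same("phone")))
```

The stored Ename is reusable here because the new rule is stricter than Pname, so no pair outside an Ename cluster needs to be compared.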
![Page 12: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/12.jpg)
Outline
Motivations
ER Related concepts
ER properties
Conclusions
![Page 13: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/13.jpg)
Two important properties for ER algorithms that enable efficient rule evolution for match-based clustering
Rule Monotonicity (RM)
Context Free (CF)
![Page 14: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/14.jpg)
Pname ∧ Pzip ≤ Pname
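Reading B1 ≤ B2 as "every pair matched by B1 is also matched by B2", the relation can be spot-checked on a sample of records. The sketch below is an empirical check on hypothetical data, not a proof that one rule is stricter than another in general; the function name `is_stricter_on` is an assumption.

```python
from itertools import combinations

# Hypothetical sample records.
RECORDS = [
    {"name": "John", "zip": "54321"},
    {"name": "John", "zip": "54321"},
    {"name": "John", "zip": "11111"},
    {"name": "Bob", "zip": None},
]

def p_name(a, b):
    return a["name"] == b["name"]

def p_name_and_zip(a, b):
    return p_name(a, b) and a["zip"] is not None and a["zip"] == b["zip"]

def is_stricter_on(records, b1, b2):
    """True if every pair matched by b1 is also matched by b2 (on these records only)."""
    return all(b2(a, b) for a, b in combinations(records, 2) if b1(a, b))
```

For conjunctive rules the relation also follows syntactically: adding a conjunct can only remove matches, which is why Pname ∧ Pzip ≤ Pname.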
![Page 15: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/15.jpg)
Rule Monotonicity (RM)
B2: Pname, E2 = {{r1,r2,r3},{r4}}
B1: Pname ∧ Pzip, E1 = {{r1,r2},{r3},{r4}}
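In this example the result of the stricter rule, E1, refines the result of the looser rule, E2: every cluster of E1 sits inside a cluster of E2. A minimal sketch of that refinement check, with a hypothetical helper name `refines`:

```python
def refines(fine, coarse):
    """True if every cluster of `fine` lies inside some cluster of `coarse`."""
    return all(any(c <= d for d in coarse) for c in fine)

# Partitions copied from the slide.
E1 = [{"r1", "r2"}, {"r3"}, {"r4"}]   # result of Pname ∧ Pzip
E2 = [{"r1", "r2", "r3"}, {"r4"}]     # result of Pname
```

This refinement relationship is what allows E2 to serve as the starting point when resolving with the stricter rule.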
![Page 16: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/16.jpg)
Context Free (CF)
![Page 17: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/17.jpg)
Existing properties in the literature
General Incremental vs. Context Free
Order Independent vs. Rule Monotonicity
An ER algorithm is order independent if the ER result is the same regardless of the order in which the records are processed.
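Order independence can be tested by brute force on a small input: run the algorithm on every ordering of the records and check that only one distinct result appears. The sketch below assumes a simple one-record-at-a-time clustering algorithm and hypothetical record values; it illustrates the definition rather than any specific algorithm from the paper.

```python
from itertools import permutations

# Hypothetical records.
RECORDS = {
    "r1": {"name": "John"},
    "r2": {"name": "John"},
    "r3": {"name": "John"},
    "r4": {"name": "Bob"},
}

def same_name(a, b):
    return a["name"] == b["name"]

def cluster_in_order(order, rule):
    """Process records one at a time, merging each with every cluster it matches."""
    clusters = []
    for rid in order:
        merged = {rid}
        rest = []
        for c in clusters:
            if any(rule(RECORDS[rid], RECORDS[o]) for o in c):
                merged |= c
            else:
                rest.append(c)
        clusters = rest + [merged]
    return frozenset(frozenset(c) for c in clusters)

# Collect the distinct results over all 24 processing orders.
results = {cluster_in_order(p, same_name) for p in permutations(RECORDS)}
order_independent = len(results) == 1
```

The exhaustive check grows factorially with the number of records, so it is only practical as a sanity test on tiny inputs.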
![Page 18: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/18.jpg)
![Page 19: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/19.jpg)
![Page 20: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/20.jpg)
![Page 21: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/21.jpg)
Experiments
![Page 22: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/22.jpg)
Outline
Motivations
ER Related concepts
ER properties
Conclusions
![Page 23: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/23.jpg)
Conclusions
Proposes a new ER approach with evolving rules
Exploits the properties (RM, CF) of ER algorithms that enable efficient rule evolution
Provides guidance to designers of ER algorithms
![Page 24: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/24.jpg)
Some problems
How are the comparison rules generated?
How can ER algorithms that satisfy the RM and CF properties be designed?
How can the ER algorithms be implemented in the MapReduce framework?
![Page 25: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/25.jpg)
Sincere thanks to everyone in the Web Group
![Page 26: Entity Resolution with Evolving Rules](https://reader035.vdocuments.net/reader035/viewer/2022062314/5681456c550346895db23c85/html5/thumbnails/26.jpg)