Post on 19-Dec-2015
Data-Extraction Ontology Generation by Example
Yuanqiu (Joe) Zhou
Data Extraction Group
Brigham Young University
Sponsored by NSF
Motivation
Semi-structured Web data must be extracted before it can be queried or otherwise manipulated.
In contrast to other wrapper-generation techniques, BYU's ontology-based data-extraction technique is resilient to changes in the layout of the source pages.
A by-example approach makes it possible for ordinary users to generate ontologies easily.
Web-based System GUI
Canon PowerShot S40
4.0; 1600 x 1200, 1024 x 768, 640 x 480
Architecture
Diagram components: Data Frame Library, User-Defined Form, System GUI, Sample Pages, Ontology Generator, Extraction Ontology, Extraction Engine, Test Pages, Populated Database.
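The architecture suggests a two-stage pipeline: the Ontology Generator combines the user-defined form, the sample pages, and the Data Frame Library into an extraction ontology, and the Extraction Engine applies that ontology to test pages to populate a database. The Python sketch below illustrates only that data flow, under strong simplifying assumptions: the "ontology" is a hypothetical mapping from each form field to a single regular expression, the library is a plain list of regexes, and `ontology_generator` and `extraction_engine` are illustrative names, not the system's API.

```python
import re


def ontology_generator(form_fields, sample_values, data_frame_library):
    """Hypothetical Ontology Generator: for each form field, pick the first
    library regex that fully matches every value the user entered for it."""
    ontology = {}
    for name in form_fields:
        for regex in data_frame_library:
            if all(re.fullmatch(regex, v) for v in sample_values[name]):
                ontology[name] = regex
                break
    return ontology


def extraction_engine(ontology, test_pages):
    """Hypothetical Extraction Engine: apply each field's regex to each test
    page and return one row per page (a stand-in for the populated database)."""
    rows = []
    for page in test_pages:
        row = {}
        for name, regex in ontology.items():
            m = re.search(regex, page)
            row[name] = m.group(0) if m else None
        rows.append(row)
    return rows


library = [r"Canon|Nikon|Sony|Olympus|Minolta", r"\d+\.\d+"]
ontology = ontology_generator(["Brand", "CCDResolution"],
                              {"Brand": ["Canon"], "CCDResolution": ["4.0"]},
                              library)
rows = extraction_engine(ontology, ["Nikon Coolpix 995, 3.34 Megapixel CCD"])
```

A real generator would also produce keywords, context expressions, and constraints; this sketch keeps one regex per field to make the flow from sample pages to populated rows visible.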
Extraction Ontology
Object and Relationship Sets and Constraints
Extraction Patterns
Keywords
Context Expressions
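One way to hold these four kinds of components in memory is sketched below. The class and attribute names are hypothetical, relationship sets are kept as constraint strings for brevity, and the `lexical` helper assumes the usual OSM distinction between lexical object sets (those with extractable values) and nonlexical ones such as `Zoom`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectSet:
    """One object set of a (hypothetical) in-memory extraction ontology."""
    name: str
    extraction_patterns: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    context_expressions: List[str] = field(default_factory=list)


@dataclass
class ExtractionOntology:
    object_sets: Dict[str, ObjectSet] = field(default_factory=dict)
    # relationship sets with participation constraints, stored as text
    relationship_sets: List[str] = field(default_factory=list)

    def add(self, obj: ObjectSet) -> None:
        self.object_sets[obj.name] = obj

    def lexical(self) -> List[str]:
        """Names of object sets that have at least one extraction pattern."""
        return [n for n, o in self.object_sets.items() if o.extraction_patterns]


onto = ExtractionOntology(relationship_sets=["Zoom [0:1] OpticalZoom [1:*]"])
onto.add(ObjectSet("Zoom"))
onto.add(ObjectSet("OpticalZoom",
                   extraction_patterns=[r"\b\d(\.\d)?"],
                   keywords=[r"\boptical\b"],
                   context_expressions=[r"\b\d(\.\d)?(x)\b"]))
```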
Ontology Generation: Object and Relationship Sets and Constraints
Figure: form Base with fields A, B, C, D1 D2, and E1 E2.
Base [0:1] A [1:*]
Base [0:2] B [1:*]
Base [0:*] C [1:*]
Base [0:2] D1 [1:*] D2 [1:*]
Base [0:*] E1 [1:*] E2 [1:*]
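These constraints follow a regular pattern: each form field becomes a relationship set between Base and the field's object set(s), the maximum on the Base side reflects how many values the field allows, and every value object set gets `[1:*]`. The sketch below generates that text from a hypothetical field list of `(names, max_per_base)` pairs; the mapping from field kinds to maxima is inferred from these five examples only.

```python
def relationship_constraints(base, fields):
    """Return one constraint line per form field.

    fields: list of (names, max_per_base) pairs, where names is a tuple of one
    or more object-set names (several names = one n-ary relationship set) and
    max_per_base is an int, or None for an unbounded field ("*").
    """
    lines = []
    for names, max_per_base in fields:
        upper = "*" if max_per_base is None else str(max_per_base)
        # every value object set participates at least once: [1:*]
        value_sides = " ".join(f"{name} [1:*]" for name in names)
        lines.append(f"{base} [0:{upper}] {value_sides}")
    return lines


form = [(("A",), 1), (("B",), 2), (("C",), None),
        (("D1", "D2"), 2), (("E1", "E2"), None)]
for line in relationship_constraints("Base", form):
    print(line)
```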
Ontology Generation: Object and Relationship Sets and Constraints
Figure: nested forms. Base has fields A, B, …; A has a nested field F; B appears as B1 and B2 (B1, B2 : B); B1 has a nested field G, and B2 has nested fields H and I.
A [0:1] F [1:*]
B1 [0:1] G [1:*]
B2 [0:1] H [1:*] I [1:*]
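A nested form appears to repeat the same rule one level down: the field that owns the nested form plays the role Base played before. The hypothetical `Field` tree and recursive `nested_constraints` below sketch that depth-first walk. The meaning of `B1, B2 : B` is not modeled, and the walk also emits a line for each top-level field (elided as `…` on the slide) using an assumed maximum of 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Field:
    """Hypothetical form field; subfields form a nested form under it."""
    names: Tuple[str, ...]
    max_per_parent: Optional[int] = 1          # None means "*"
    subfields: List["Field"] = field(default_factory=list)


def nested_constraints(parent: str, fields: List[Field]) -> List[str]:
    """Depth-first walk emitting 'Parent [0:max] X [1:*] ...' lines."""
    lines = []
    for f in fields:
        upper = "*" if f.max_per_parent is None else str(f.max_per_parent)
        lines.append(f"{parent} [0:{upper}] " + " ".join(f"{n} [1:*]" for n in f.names))
        # the nested form hangs off the first (usually the only) name of the field
        lines.extend(nested_constraints(f.names[0], f.subfields))
    return lines


form = [
    Field(("A",), subfields=[Field(("F",))]),
    Field(("B1",), subfields=[Field(("G",))]),
    Field(("B2",), subfields=[Field(("H", "I"))]),
]
lines = nested_constraints("Base", form)
```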
Sample Web Page and User-Created Form
Figure: a sample Web page describing the Canon PowerShot G2 (values 4.0, 2272 x 1704, 3, and 2) and the form a user builds from it: Digital Camera with fields Brand, Model, CCD Resolution, Image Resolution, and Zoom, where Zoom groups Optical Zoom and Digital Zoom.
Object and Relationship Sets and Constraints
DigitalCamera [-> object]
DigitalCamera [0:1] Brand [1:*]
DigitalCamera [0:1] Model [1:*]
DigitalCamera [0:1] CCDResolution [1:*]
DigitalCamera [0:1] ImageResolution [1:*]
DigitalCamera [0:1] Zoom [1:*]
Zoom [0:1] DigitalZoom [1:*]
Zoom [0:1] OpticalZoom [1:*]
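Each relationship line reads as object set, participation constraint, object set, participation constraint. For example, `DigitalCamera [0:1] Brand [1:*]` says a camera has at most one brand, while each brand belongs to one or more cameras. A small parser for this notation might look like the sketch below; the regular expressions and the output shape are assumptions, not the system's internal format.

```python
import re

# "Name [min:max]" where max may be "*"
PART = re.compile(r"(\w+)\s*\[(\d+):(\d+|\*)\]")
# "Name [-> object]" marks the primary object set
PRIMARY = re.compile(r"(\w+)\s*\[->\s*object\]")


def parse_constraint(line):
    """Parse one line of the constraint notation (sketch)."""
    line = line.strip().rstrip(";")
    m = PRIMARY.fullmatch(line)
    if m:
        return {"primary": m.group(1)}
    # each part: (object set, minimum, maximum or None for "*")
    parts = [(name, int(lo), None if hi == "*" else int(hi))
             for name, lo, hi in PART.findall(line)]
    return {"relationship": parts}


def at_most_one(parsed):
    """True if each instance of the first object set has at most one partner."""
    return parsed["relationship"][0][2] == 1
```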
Ontology Generation: Extraction Patterns
Data Frame Library:
  Lexicons
  Synonym dictionaries or thesauri
  Regular expressions
Matching extraction patterns:
  Only one (bingo!)
  More than one (use extraction-pattern filters)
  None (create one)
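The three matching cases can be sketched as a single selection function. Here the Data Frame Library is reduced to a hypothetical dictionary of named regular expressions. The filter for the several-matches case (prefer the candidate that fires least often on the sample page) and the fallback for the no-match case (an alternation of the escaped sample values) are illustrative stand-ins, not the system's actual filters.

```python
import re


def choose_pattern(sample_values, library, sample_page):
    """Return (case, regex) for one form field.

    library: hypothetical Data Frame Library as {name: regex}.
    """
    candidates = [rx for rx in library.values()
                  if all(re.fullmatch(rx, v) for v in sample_values)]
    if len(candidates) == 1:
        return "only one", candidates[0]
    if candidates:
        # stand-in filter: keep the candidate that fires least often on the page
        best = min(candidates, key=lambda rx: len(re.findall(rx, sample_page)))
        return "more than one", best
    # no match: create a pattern from the literal sample values
    return "none", "|".join(rf"\b{re.escape(v)}\b" for v in sample_values)


LIBRARY = {"brand": r"Canon|Nikon|Sony",
           "decimal": r"\d+\.\d+",
           "number": r"\d+(\.\d+)?"}
PAGE = "Canon PowerShot G2 4.0 Megapixel 3x optical"
```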
Ontology Generation: Keywords
Features a high-quality 4.0 Megapixel Resolution CCD
The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD
3 effective megapixel
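Candidate keywords can be suggested by looking for words that recur near the user's example values across sample pages. The sketch below counts, per sample, the words within a small window of the value and keeps those seen in at least two samples. The window size, support threshold, and stop-word list are assumptions.

```python
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "new", "with"}  # assumed list


def candidate_keywords(samples, window=3, min_support=2):
    """samples: list of (text, value) pairs, value appearing as a token in text."""
    support = Counter()
    for text, value in samples:
        tokens = text.split()
        i = tokens.index(value)  # position of the user-supplied value
        nearby = set()
        for tok in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            word = tok.strip(".,;:()").lower()
            if word.isalpha() and word not in STOPWORDS:
                nearby.add(word)
        support.update(nearby)  # count each word at most once per sample
    return sorted((w for w, n in support.items() if n >= min_support),
                  key=lambda w: (-support[w], w))


samples = [
    ("Features a high-quality 4.0 Megapixel Resolution CCD", "4.0"),
    ("The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD", "3.34"),
    ("3 effective megapixel", "3"),
]
print(candidate_keywords(samples))
```

On these three phrases the sketch proposes `megapixel` (seen in all three) and `ccd` (seen in two).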
Ontology Generation: Context Expressions
3.5x optical zoom (2.5x digital)
a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom
optical 3X /digital 6X zoom
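A context expression lets the engine recognize a value by its surroundings (here, a number followed by `x`), while keywords decide which object set the value belongs to. The sketch below uses the ontology's context expression `\b\d(\.\d)?(x)\b` with `re.IGNORECASE` added, an assumption needed to accept `3X`, and assigns each match to whichever keyword, `optical` or `digital`, is textually closest. The nearest-keyword rule is a hypothetical heuristic.

```python
import re

# Context expression from the ontology, case-insensitive (assumption) so "3X" matches.
CONTEXT = re.compile(r"\b(\d(?:\.\d)?)x\b", re.IGNORECASE)
# Keywords that decide which object set a matched value belongs to.
KEYWORDS = {
    "OpticalZoom": re.compile(r"\boptical\b", re.IGNORECASE),
    "DigitalZoom": re.compile(r"\bdigital\b", re.IGNORECASE),
}


def gap(a, b):
    """Number of characters between two non-overlapping spans."""
    return b[0] - a[1] if b[0] >= a[1] else a[0] - b[1]


def zoom_values(text):
    result = {}
    for m in CONTEXT.finditer(text):
        best = None  # (distance, object set)
        for obj, kw in KEYWORDS.items():
            for k in kw.finditer(text):
                d = gap(m.span(), k.span())
                if best is None or d < best[0]:
                    best = (d, obj)
        if best is not None:
            result.setdefault(best[1], []).append(float(m.group(1)))
    return result
```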
Extraction Ontology
DigitalCamera [-> object];
DigitalCamera [0:1] Brand [1:*];
DigitalCamera [0:1] ImageResolution [1:*];
DigitalCamera [0:1] Zoom [1:*];
DigitalCamera [0:1] CCDResolution [1:*];
Zoom [0:1] OpticalZoom [1:*];
Brand matches [10]
  constant { extract "\bNikon\b"; },
    { extract "\bCanon\b"; },
    { extract "\bOlympus\b"; },
    { extract "\bMinolta\b"; },
    { extract "\bSony\b"; };
end;
CCDResolution matches [20]
  constant { extract "\b\d(\.\d{1,2})?\b"; };
  keyword "\bMegapixel\b", "\bCCD\b", "\bCCD Resolution\b";
end;
OpticalZoom matches [10]
  constant { extract "\b\d(\.\d)?";
    context "\b\d(\.\d)?(x)\b"; };
  keyword "\boptical\b";
end;
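To see how rules like these might be applied, the sketch below transcribes the Brand and CCD resolution patterns into Python regexes and accepts a CCD-resolution candidate only when one of its keywords lies within a few characters of it. The `max_gap` threshold and the proximity test are assumptions, and the `matches [n]` values are not modeled.

```python
import re

# Rules transcribed from the extraction ontology (Brand and CCDResolution only).
RULES = {
    "Brand": {
        "extract": [r"\bNikon\b", r"\bCanon\b", r"\bOlympus\b", r"\bMinolta\b", r"\bSony\b"],
        "keyword": [],
    },
    "CCDResolution": {
        "extract": [r"\b\d(\.\d{1,2})?\b"],
        "keyword": [r"\bMegapixel\b", r"\bCCD\b", r"\bCCD Resolution\b"],
    },
}


def apply_rules(sentence, rules=RULES, max_gap=20):
    """Return {object set: [matched strings]} for one sentence (sketch)."""
    found = {}
    for obj, rule in rules.items():
        kw_spans = [k.span() for kw in rule["keyword"]
                    for k in re.finditer(kw, sentence, re.IGNORECASE)]
        for rx in rule["extract"]:
            for m in re.finditer(rx, sentence):
                # no keywords required, or some keyword within max_gap characters
                near = (not rule["keyword"]) or any(
                    max(ks - m.end(), m.start() - ke, 0) <= max_gap
                    for ks, ke in kw_spans)
                if near:
                    found.setdefault(obj, []).append(m.group(0))
    return found
```

In this sketch the keyword requirement is what rejects a stray number such as the `2` in "ships with 2 batteries".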
Measurements
How much of the ontology was generated, relative to how much could have been generated?
How many of the generated components should not have been generated?
How do the precision and recall of data extracted with a system-generated ontology compare with those of an expert-generated ontology?
How many sample pages are necessary for acceptable system performance?
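These questions reduce to precision and recall over sets: precision is the fraction of produced items (extracted values or generated components) that are correct, and recall is the fraction of correct items that were produced. A minimal sketch, assuming items can be compared as hashable values; the component names in the example are hypothetical.

```python
def precision_recall(produced, correct):
    """precision = hits / len(produced); recall = hits / len(correct)."""
    produced, correct = set(produced), set(correct)
    hits = len(produced & correct)
    precision = hits / len(produced) if produced else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical example: generated vs. expert relationship sets
generated = {"DigitalCamera-Brand", "DigitalCamera-Zoom",
             "Zoom-OpticalZoom", "DigitalCamera-Price"}
expert = {"DigitalCamera-Brand", "DigitalCamera-Zoom", "Zoom-OpticalZoom",
          "Zoom-DigitalZoom", "DigitalCamera-Model"}
p, r = precision_recall(generated, expert)  # 3 of 4 correct, 3 of 5 found
```

One reading of the first two questions is recall and precision of the generated components against an expert ontology; the third applies the same function to extracted values.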
Contributions
Proposes a by-example approach to semi-automatically generate data-extraction ontologies
Constructs a Web-based tool to generate data-extraction ontologies