ODE: Ontology-Assisted Data Extraction
Weifeng Su, Jiying Wang, Frederick H. Lochovsky
Summarized by Joseph Park
Overview
• “Web databases…compose what is referred to as the deep Web”
• The goal of data extraction:
– (1) Query result section identification - decides what section in a dynamically generated query result page contains the data that need to be extracted.
– (2) Record segmentation - segments the query result section into records and extracts them.
– (3) Data value alignment - aligns the data values from multiple records that belong to the same attribute so that they can be arranged into a table.
– (4) Label assignment - assigns a suitable, meaningful label (i.e., an attribute name) to each column in an aligned table.
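The last three steps above can be illustrated with a toy sketch. This is not ODE's method: the pipe-delimited input, the `TOY_ONTOLOGY` patterns, and all function names are hypothetical, and step 1 (query result section identification) is assumed to have already produced the `section` string.

```python
import re

# Hypothetical toy "ontology": attribute name -> pattern its values tend to match.
# These attributes and patterns are illustrative assumptions, not taken from ODE.
TOY_ONTOLOGY = {
    "price": re.compile(r"^\$\d+(\.\d{2})?$"),
    "year": re.compile(r"^(19|20)\d{2}$"),
}

def segment_records(result_section):
    """Step 2: split the result section into records (one per non-empty line here)."""
    lines = result_section.strip().splitlines()
    return [line.split("|") for line in lines if line.strip()]

def align_values(records):
    """Step 3: put the i-th value of each record in column i, padding with None."""
    width = max(len(r) for r in records)
    padded = [[v.strip() for v in r] + [None] * (width - len(r)) for r in records]
    return [list(col) for col in zip(*padded)]

def assign_label(column):
    """Step 4: pick the attribute whose pattern matches the most values in the column."""
    best, best_hits = "unknown", 0
    for name, pattern in TOY_ONTOLOGY.items():
        hits = sum(1 for v in column if v is not None and pattern.match(v))
        if hits > best_hits:
            best, best_hits = name, hits
    return best

# Assumed output of step 1: the part of the page that holds the query results.
section = """
Data Mining | 2005 | $45.00
Web Databases | 2007 | $30.50
"""

records = segment_records(section)
columns = align_values(records)
labels = [assign_label(c) for c in columns]
print(list(zip(labels, columns)))
```

Real result pages are HTML rather than delimited text, so segmentation and alignment are much harder than shown here; the sketch only makes the input and output of each step concrete.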
Problems
• Automatically extract data from query results
• Limitations of other systems:
– Incapable of processing either zero or few query results.
– Vulnerable to optional and disjunctive attributes.
– Incapable of processing nested data structures.
– No label assignment.
Approach
• ODE – Ontology-assisted data extraction
• PADE wrapper
• Query result annotation
• Attribute matching
• Ontology construction
Approach continued
• Query result section identification
• Record segmentation
• Data value alignment and label assignment
– MaxEnt model is used
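The slides do not say which features or training procedure the MaxEnt model uses. As a generic illustration only: a maximum entropy classifier is equivalent to multinomial logistic regression, so a minimal sketch of labeling data values might look like the following. The features, labels, training data, and function names are all hypothetical.

```python
import math

LABELS = ["title", "year", "price"]  # hypothetical attribute labels

def features(value):
    """Hypothetical binary features of a data value (illustrative assumptions)."""
    return {
        "has_dollar": 1.0 if "$" in value else 0.0,
        "is_4_digits": 1.0 if value.isdigit() and len(value) == 4 else 0.0,
        "has_alpha": 1.0 if any(c.isalpha() for c in value) else 0.0,
    }

def probabilities(weights, feats):
    """Softmax over per-label weighted feature sums: p(label | value)."""
    scores = {y: sum(weights[y][f] * v for f, v in feats.items()) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def train(examples, epochs=100, rate=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    names = list(features("").keys())
    weights = {y: {f: 0.0 for f in names} for y in LABELS}
    for _ in range(epochs):
        for value, gold in examples:
            feats = features(value)
            probs = probabilities(weights, feats)
            for y in LABELS:
                # Gradient: observed feature count minus expected feature count.
                error = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[y][f] += rate * error * v
    return weights

def predict(weights, value):
    probs = probabilities(weights, features(value))
    return max(probs, key=probs.get)

# Hypothetical labeled values, standing in for previously annotated query results.
TRAINING = [
    ("Data Mining", "title"), ("2005", "year"), ("$45.00", "price"),
    ("Web Databases", "title"), ("2007", "year"), ("$30.50", "price"),
]
weights = train(TRAINING)
print(predict(weights, "$12.99"), predict(weights, "1999"))
```

In this toy setup each feature fires for exactly one label, so the model is trivially separable; a realistic system would need richer features and regularization.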
Experimental Results
Extraction performed using DeLa
Conclusion
• Can only label attributes that appear in query result pages
• References a few DEG papers
– DKE99, Tisp, TANGO
• Could take advantage of MaxEnt for pre-labeling data
• Need to look into DeLa for data extraction