ODE: Ontology-Assisted Data Extraction Weifeng Su, Jiying Wang, Frederick H. Lochovsky Summarized by Joseph Park

Post on 18-Jan-2016

ODE: Ontology-Assisted Data Extraction

Weifeng Su, Jiying Wang, Frederick H. Lochovsky

Summarized by Joseph Park

Overview

• “Web databases…compose what is referred to as the deep Web”

• The goal of data extraction:

– (1) Query result section identification - decides what section in a dynamically generated query result page contains the data that need to be extracted.

– (2) Record segmentation - segments the query result section into records and extracts them.

– (3) Data value alignment - aligns the data values from multiple records that belong to the same attribute so that they can be arranged into a table.

– (4) Label assignment - assigns a suitable, meaningful label (i.e., an attribute name) to each column in an aligned table.
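To make steps (3) and (4) concrete, here is a minimal Python sketch of aligning values from already-segmented records into labeled columns. It is only an illustration under stated assumptions: the `TOY_ONTOLOGY` patterns, the `align_and_label` function, the fallback label "Title", and the sample records are all hypothetical and are not taken from the ODE paper, which builds its ontology from the domain rather than from hand-written patterns.

```python
import re

# Hypothetical toy "ontology": attribute label -> pattern its values tend to match.
# (Assumption for illustration; not the ontology used by ODE.)
TOY_ONTOLOGY = {
    "Price": re.compile(r"^\$\d+(\.\d{2})?$"),
    "Year": re.compile(r"^(19|20)\d{2}$"),
}


def align_and_label(records, ontology=TOY_ONTOLOGY, default="Title"):
    """Place each value of each record into the column of its matching label.

    Values that match no ontology pattern fall back to the `default` label.
    Attributes absent from a record stay None, so optional attributes
    do not shift the remaining columns.
    """
    labels = list(ontology) + [default]
    table = []
    for record in records:
        row = {label: None for label in labels}
        for value in record:
            label = next(
                (name for name, pattern in ontology.items() if pattern.match(value)),
                default,
            )
            if row[label] is None:  # keep the first value seen for each label
                row[label] = value
        table.append(row)
    return labels, table


# Two segmented records; the second one lacks a price (an optional attribute).
records = [
    ["Data on the Web", "$45.00", "1999"],
    ["Deep Web Mining", "2004"],
]
labels, table = align_and_label(records)
for row in table:
    print([row[label] for label in labels])
```

Because the second record has no price, its "Price" cell stays empty while its year still lands in the "Year" column, which is the kind of optional-attribute case that purely positional alignment gets wrong.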

Problems

• Automatically extract data from query results

• Limitations of other systems:

– Incapable of processing either zero or few query results.

– Vulnerable to optional and disjunctive attributes.

– Incapable of processing nested data structures.

– No label assignment.

Approach

• ODE – Ontology-assisted data extraction

• PADE wrapper

• Query result annotation

• Attribute matching

• Ontology construction

Approach continued

• Query result section identification

• Record segmentation

• Data value alignment and label assignment

– MaxEnt model is used
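As a rough illustration of what a MaxEnt model does when labeling data values, the sketch below trains a tiny maximum entropy classifier (equivalent to multinomial logistic regression) in plain Python. Everything specific here is an assumption made for the example: the `features` function, the three labels, the training values, and the learning-rate settings are hypothetical and do not reproduce the feature set or training data used by ODE.

```python
import math


def features(value):
    """Hypothetical binary features describing a data value (illustration only)."""
    return [
        1.0,                                                   # bias term
        1.0 if value.startswith("$") else 0.0,                 # looks like a price
        1.0 if value.isdigit() and len(value) == 4 else 0.0,   # looks like a year
        1.0 if any(c.isalpha() for c in value) else 0.0,       # contains letters
    ]


def predict_proba(weights, x):
    """Softmax over per-label linear scores, the standard MaxEnt form."""
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in weights]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(examples, labels, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the conditional log-likelihood."""
    n_feats = len(features(examples[0][0]))
    weights = [[0.0] * n_feats for _ in labels]
    for _ in range(epochs):
        for value, label in examples:
            x = features(value)
            probs = predict_proba(weights, x)
            for k, name in enumerate(labels):
                # Gradient: observed indicator minus predicted probability.
                err = (1.0 if name == label else 0.0) - probs[k]
                for j in range(n_feats):
                    weights[k][j] += lr * err * x[j]
    return weights


def classify(weights, labels, value):
    """Return the most probable label for a data value."""
    probs = predict_proba(weights, features(value))
    return labels[probs.index(max(probs))]


LABELS = ["Title", "Price", "Year"]
TRAINING = [
    ("Data on the Web", "Title"), ("Deep Web Mining", "Title"),
    ("$45.00", "Price"), ("$12.99", "Price"),
    ("1999", "Year"), ("2004", "Year"),
]
weights = train(TRAINING, LABELS)
print(classify(weights, LABELS, "$30.50"))
```

The model assigns each value a probability for every label rather than a hard rule match, so evidence from several weak features can be combined when no single pattern is decisive.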

Experimental Results

Extraction performed using DeLa

Conclusion

• Can only label attributes that appear in query result pages

• References a few DEG papers– DKE99, Tisp, TANGO

• Could take advantage of MaxEnt for pre-labeling data

• Need to look into DeLa for data extraction