Deployment and Evaluation Issues in Ontology-Based Information Extraction


Upload: hachi

Post on 10-Jan-2016



TRANSCRIPT

  • Deployment and Evaluation Issues in Ontology-Based Information Extraction
    Ex: Progress and Perspectives

  • Agenda
    WIE: motivation and use cases
    Extraction ontologies: structure and content
    IE workflow
    Authoring extraction ontologies:
      by hand
      with use of domain ontologies
      with use of other business metamodels
    Results so far

  • (Web) Information Extraction
    purpose:
      extract objects from documents, or
      semantically annotate existing documents

    conference
      name: 11th International Conference on Business Information Systems
      place: Innsbruck, Austria 5-7 May 2008

  • IE Use Cases
    Extraction of objects of one or more known, well-defined classes
    Annotating document collections of any size

    Sources: structured, semi-structured, free text
    Extraction should improve if:
      documents contain some formatting (e.g. HTML)
      this formatting is similar within or across documents
      some examples are provided

  • Domains considered so far
    Online products
    Contact information
    Seminars, events
    Weather forecasts
    Free-text business statements
    Football

  • Sources of Knowledge for IE
    ontology
      the only mandatory source
      may include hand-crafted patterns for typical attribute values
    sample instances
      possibly coupled with referring documents
      used to get typical content and context of extractable items
    common formatting structure
      of instances presented in a single document, or
      among documents from the same source

  • The (Simple) Extraction Process

  • Extraction Ontologies
    contain:
      the semantic structure of the extracted data
      additional IE hooks, i.e. learned or ad hoc patterns for the typical content of extracted values and their context
    their semantic structure describes the presentation of objects rather than real-world objects
      so we speak about presentation ontologies

  • Presentation Ontologies
    contain concepts that are to be populated with many instances
      => can be viewed as information ontologies
    class attributes can be represented as a set of variables
      => can be used as data structures
    can contain additional higher-level restrictions
      => can be looked upon as knowledge ontologies

  • Nature of Presentation Ontologies
    presentation ontologies are of a slightly different nature than other models
    they usually contain:
      a single core class
      its attributes
      additional constraints
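An IE hook of the kind described above pairs a pattern for the typical content of a value with a pattern for its typical context. The following is a minimal sketch, assuming a seminar start time as the extracted value. The regular expressions, the cue words, and the function name `extract_stime` are illustrative assumptions, not patterns taken from the presented system.

```python
import re

# Content pattern (assumed format): what a start-time value typically looks like.
TIME_VALUE = r"(?P<stime>\d{1,2}:\d{2}\s*(?:am|pm)?)"

# Context pattern (assumed cue words): text that often precedes a start time.
STIME_WITH_CONTEXT = re.compile(
    r"\b(?:time|starts?(?:\s+at)?|from)\s*:?\s*" + TIME_VALUE,
    re.IGNORECASE,
)


def extract_stime(text):
    """Return all start-time candidates that appear after a known context cue."""
    return [m.group("stime").strip() for m in STIME_WITH_CONTEXT.finditer(text)]


if __name__ == "__main__":
    print(extract_stime("Time: 3:30 pm, Room 5"))
    print(extract_stime("The talk starts at 14:00 in the main hall."))
```

A bare time such as "14:00" with no cue word in front of it is deliberately ignored here, which shows how context can narrow down an otherwise ambiguous content pattern.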
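The structure listed above (a single core class, its attributes, additional constraints) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `Attribute` and `PresentationOntology`, the cardinality values, and the `stime_before_etime` constraint are hypothetical; the attribute names are borrowed from the seminar results later in the deck.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Attribute:
    name: str
    datatype: str = "string"
    min_card: int = 0   # hypothetical cardinality bounds
    max_card: int = 1


@dataclass
class PresentationOntology:
    core_class: str
    attributes: List[Attribute] = field(default_factory=list)
    # Constraint name -> predicate over an extracted instance.
    constraints: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)

    def violations(self, instance: dict) -> List[str]:
        """List the cardinality and constraint problems of one instance.

        An instance maps attribute names to lists of extracted values.
        """
        problems = []
        for attr in self.attributes:
            count = len(instance.get(attr.name, []))
            if not (attr.min_card <= count <= attr.max_card):
                problems.append(f"cardinality:{attr.name}")
        for name, check in self.constraints.items():
            if not check(instance):
                problems.append(f"constraint:{name}")
        return problems


def stime_before_etime(instance: dict) -> bool:
    # Zero-padded HH:MM strings compare correctly as text.
    if not instance.get("stime") or not instance.get("etime"):
        return True
    return instance["stime"][0] < instance["etime"][0]


seminar = PresentationOntology(
    core_class="Seminar",
    attributes=[
        Attribute("speaker", min_card=1, max_card=3),
        Attribute("stime", "time", min_card=1, max_card=1),
        Attribute("etime", "time"),
        Attribute("location"),
    ],
    constraints={"stime_before_etime": stime_before_etime},
)
```

For example, `seminar.violations({"speaker": ["A. Smith"], "stime": ["14:00"], "etime": ["13:00"]})` reports only the violated time constraint.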

  • Example Presentation Ontology

  • Source of Presentation Ontologies
    typically designed by a human for a specific extraction task
    single-purpose hand-crafting from scratch is tedious and can introduce inconsistencies
    it should be possible to craft extraction models by reusing existing metamodels, so that:
      the semantics of the extracted data are consistent with existing knowledge models
      the need for initial domain analysis and for expert knowledge lessens (and so do the costs)

  • High-level scheme of EO-based IE

  • Hand-crafted Extraction Ontologies

  • Building upon Existing Models
    crafting a presentation ontology from a preexisting knowledge model requires a transformation process
    the transformation differs between source models, but there are some general steps:
      choose / find the core class C
      create its attributes in the presentation ontology
      formulate ontological constraints over the attributes
      create additional WIE hooks to form a complete extraction ontology
    as the expressiveness of the source models is usually very high, the transformation cannot be carried out deterministically

  • Reuse of Domain Ontologies
    the transformation of a domain ontology mainly amounts to the general steps mentioned before
    so far, we have been able to formalize a few general heuristic rules that can help an expert
      a) to choose the core class of an incipient presentation ontology
      b) to populate it with attributes

  • Transformation Rule a1)
    A class C that has individuals directly asserted in the domain ontology should probably not become the core class in the presentation ontology.

  • Transformation Rule a2)
    If some property does not have an inverse property explicitly declared, a class C in the domain of this property is more likely to become the core class than any class that figures in its range.

  • Transformation Rule a3)
    If a class C has a minimum cardinality restriction on property D whose range is class C1, such that C1 does not have any restrictions on the inverse property of D, then C1 should not become the core class.

  • Transformation Rule a4)
    If there is a chain of object properties, then the classes at the ends of such a chain are more likely to form the core class. If a class C is at the end of several such chains, it is even more suitable for becoming the core class.
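Rules a1) to a4) can be read as evidence for or against each candidate core class. The sketch below turns them into a simple score to rank candidates for an expert. It is an illustration only: the unit weights, the toy ontology, the `ObjectProperty` model, and the reading of a4) as "two-step chains, both end classes gain a point" are assumptions, not part of the presented method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectProperty:
    name: str
    domain_cls: str
    range_cls: str
    inverse: Optional[str] = None  # name of an explicitly declared inverse, if any
    min_card: int = 0              # minimum cardinality restriction on the domain class


def core_class_scores(classes, properties, individuals):
    """Score classes with rules a1-a4; a higher score means a likelier core class."""
    by_name = {p.name: p for p in properties}
    scores = defaultdict(int, {c: 0 for c in classes})

    for c in classes:
        if individuals.get(c, 0) > 0:          # a1: asserted individuals count against C
            scores[c] -= 1

    for p in properties:
        if p.inverse is None:                  # a2: no declared inverse favours the domain
            scores[p.domain_cls] += 1
        inv = by_name.get(p.inverse) if p.inverse else None
        if p.min_card >= 1 and (inv is None or inv.min_card == 0):
            scores[p.range_cls] -= 1           # a3: unrestricted range class is penalised

    for p in properties:                       # a4: ends of two-step property chains
        for q in properties:
            if p is not q and p.range_cls == q.domain_cls:
                scores[p.domain_cls] += 1
                scores[q.range_cls] += 1

    return dict(scores)


# Hypothetical toy domain ontology.
classes = ["Seminar", "Speaker", "Room", "Building"]
properties = [
    ObjectProperty("hasSpeaker", "Seminar", "Speaker", min_card=1),
    ObjectProperty("heldIn", "Seminar", "Room"),
    ObjectProperty("partOf", "Room", "Building"),
]
individuals = {"Building": 3}  # number of individuals asserted per class

scores = core_class_scores(classes, properties, individuals)
best = max(scores, key=scores.get)
```

In this toy case `Seminar` ranks first. The ranking is only a suggestion; as the slides note, the transformation is not deterministic and the final choice remains with the expert.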

  • Transformation Rule b1)
    A datatype property may directly yield an attribute. Furthermore, a datatype property together with a chain of object properties (typically part-of properties) may yield an attribute too.

  • Transformation Rule b2)
    A set of mutually disjoint subclasses may yield an attribute even without a property counterpart in the source ontology.

  • Transformation Rule b3)
    A set of mutually disjoint subclasses of some class, together with a chain of object properties, may yield an attribute.

  • Transformation Rule b4)
    An object property of a class C whose object has some individuals asserted in the ontology may yield an attribute.
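Rules b1) to b4) can be sketched as a procedure that proposes candidate attributes once a core class has been chosen. The input encoding (plain dictionaries and tuples), the dotted path names, and the `Type` suffix for attributes derived from disjoint subclasses are assumptions made for this illustration only.

```python
from collections import deque


def candidate_attributes(core, datatype_props, object_props,
                         disjoint_subclasses, individuals):
    """Propose (rule, attribute_name) pairs for the core class.

    datatype_props:      class -> list of datatype property names
    object_props:        list of (name, domain class, range class)
    disjoint_subclasses: class -> list of mutually disjoint subclasses
    individuals:         class -> list of asserted individuals
    """
    # Breadth-first walk along object properties, remembering the chain used.
    paths = {core: []}
    queue = deque([core])
    while queue:
        cls = queue.popleft()
        for name, domain, rng in object_props:
            if domain == cls and rng not in paths:
                paths[rng] = paths[cls] + [name]
                queue.append(rng)

    found = []
    for cls, chain in paths.items():
        prefix = ".".join(chain)
        for prop in datatype_props.get(cls, []):
            # b1: datatype property, directly or via a chain of object properties
            found.append(("b1", f"{prefix}.{prop}" if prefix else prop))
        if cls in disjoint_subclasses:
            # b2: disjoint subclasses of the core class; b3: reached via a chain
            rule = "b3" if chain else "b2"
            found.append((rule, f"{prefix}.{cls}Type" if prefix else f"{cls}Type"))

    for name, domain, rng in object_props:
        # b4: object property of the core class whose range has individuals
        if domain == core and individuals.get(rng):
            found.append(("b4", name))
    return found


# Hypothetical toy input.
attrs = candidate_attributes(
    core="Seminar",
    datatype_props={"Seminar": ["title"], "Room": ["number"]},
    object_props=[("heldIn", "Seminar", "Room"), ("partOf", "Room", "Building")],
    disjoint_subclasses={"Seminar": ["Lecture", "Workshop"],
                         "Building": ["Campus", "OffSite"]},
    individuals={"Room": ["R101"]},
)
```

The result is a list of proposals for an expert to accept or reject, in line with the slides' remark that only some rule results are actually chosen.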

  • Test Results
    the numbers in the table show only the rules whose results were actually chosen
    these tests verify that the transformation is possible

  • Test Results
    there is a correlation between the use of rules b4) and a1), because both are based on the presence of instances

  • Test Results
    if an ontology is in the form of a taxonomy, it can still be useful via rule b3)

  • Using other Business Metamodels
    What metamodels?
      theoretically, any
      in practice it will be the most common ones:
        UML
        BPM
        relational database models
        ...

  • UML
    industrial standard
    integrates models used in software engineering
    contains several diagram groups:
      structural diagrams
      behavioral diagrams
      other

  • UML Structural Diagrams
    describe static structural constructs
    the concept of a class is very similar to that in ontologies
    class and object diagrams can be used similarly

  • Other UML Uses
    Structure diagrams may help significantly, mainly with populating the ontology with attributes and with identifying part-of relations
    Behavioral diagrams can yield attributes and general relations, but such use is rare
    UML supplements can only provide some technical details, like attribute datatypes or sample values
    Other UML diagrams can be used only very vaguely

  • Relational Model
    based on predicate logic and set theory, it has many things in common with other means of specifying a domain
    an entity (i.e. a table) can directly yield a class, and its field references can be mapped to class properties
    explicit specification of primary and secondary keys makes it easy to recognize an inverse property
    supporting tables (for m:n relations) should not yield a core class

  • Business Process Model
    describes a collection of activities needed to produce a specific output
    every process depicts a change of state of an entity and should yield a possible value of an attribute, or the attribute itself
    the event element and the choice element express a relation to some other entity
    a set of processes should describe an entity, which can then possibly lead to a class
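The relational mapping described above can be sketched as follows. The `Table` model, the treatment of non-reference columns as plain attributes, and the test for a supporting m:n table (at least two references, and a primary key made up entirely of reference columns) are assumptions for illustration; the slides do not specify how supporting tables are recognized.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    name: str
    columns: List[str]
    primary_key: List[str]
    # column -> referenced table (reference columns of this table)
    foreign_keys: Dict[str, str] = field(default_factory=dict)


def is_link_table(table: Table) -> bool:
    """Assumed heuristic for a supporting m:n table."""
    return (
        len(table.foreign_keys) >= 2
        and bool(table.primary_key)
        and all(col in table.foreign_keys for col in table.primary_key)
    )


def to_classes(tables: List[Table]) -> Dict[str, dict]:
    """Map each table to a class description."""
    classes = {}
    for t in tables:
        classes[t.name] = {
            # assumption: columns that are not references become attributes
            "attributes": [c for c in t.columns if c not in t.foreign_keys],
            # references become class properties pointing at the referenced class
            "properties": dict(t.foreign_keys),
            # supporting tables should not yield a core class
            "core_candidate": not is_link_table(t),
        }
    return classes


# Hypothetical schema.
schema = [
    Table("seminar", ["id", "title", "room_id"], ["id"], {"room_id": "room"}),
    Table("room", ["id", "number"], ["id"]),
    Table("speaker", ["id", "name"], ["id"]),
    Table("seminar_speaker", ["seminar_id", "speaker_id"],
          ["seminar_id", "speaker_id"],
          {"seminar_id": "seminar", "speaker_id": "speaker"}),
]
mapped = to_classes(schema)
```

Here `seminar_speaker` is mapped but flagged as unsuitable for the core class, while `seminar` keeps `title` as an attribute and `room_id` as a property pointing at `room`.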

  • Results - Seminars

                         P      R      F     GOLD   AUTO   AMAT    GMAT
    etime-strict       98.46  88.89  93.43    216    195   192     192
    etime-loose        99.49  89.12  94.02    216    195     2       0.5
    location-strict    58.78  74.15  65.58    325    410   241     241
    location-loose     79.72  85.61  82.56    325    410    85.84   37.23
    speaker-strict     71.11  68.73  69.90    371    360   256     255
    speaker-loose      76.58  73.83  75.18    371    360    19.69   18.9
    stime-strict       95.75  88.07  91.75    486    447   428     428
    stime-loose        95.75  88.07  91.75    486    447     0       0
    avg-strict         79.11  79.83  79.47   1398   1412  1117    1116
    avg-loose          86.72  83.88  85.28   1398   1412   107.5    56.6

  • Results - Weather

                           P      R      F
    temperature-strict   97.12  82.16  89.32
    temperature-loose    99.71  91.17  95.34
    location-strict      92.56  81.12  86.42
    location-loose       94.34  86.12  90.09
    condition-strict     81.63  71.81  75.52
    condition-loose      86.25  80.15  83.79
    time-strict          93.14  87.05  90.84
    time-loose           96.72  91.15  93.51

  • Results - Football

                        P        R        F
    action-strict     88.45%   79.00%   83.59%
    action-loose      92.87%   93.51%   93.19%
    country-strict    89.55%   91.50%   90.52%
    country-loose     94.51%   93.68%   94.09%
    event-strict      86.15%   79.18%   82.59%
    event-loose       91.76%   83.17%   87.36%
    name-strict       72.15%   69.70%   70.91%
    name-loose        78.18%   72.34%   75.20%
    time-strict       94.63%   92.43%   93.52%
    time-loose        96.63%   94.13%   95.37%
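The slides do not define the columns of the seminar table. The figures are, however, consistent with the usual reading P = AMAT / AUTO, R = GMAT / GOLD and F = 2PR / (P + R), where the AMAT and GMAT entries of a loose row appear to be partial-match credit added on top of the strict counts. The sketch below reproduces the `etime` rows under that assumption; the function name `prf` and this interpretation of the columns are inferred from the numbers, not stated in the source.

```python
def prf(gold, auto, amat, gmat):
    """Precision, recall and F-measure in percent, rounded to two decimals.

    gold: number of gold-standard items
    auto: number of automatically extracted items
    amat: matched extracted items (assumed numerator of precision)
    gmat: matched gold items (assumed numerator of recall)
    """
    p = amat / auto
    r = gmat / gold
    f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return tuple(round(100 * x, 2) for x in (p, r, f))


# etime-strict row: GOLD=216, AUTO=195, AMAT=192, GMAT=192
etime_strict = prf(216, 195, 192, 192)

# etime-loose row: strict matches plus the partial credit listed in the loose row
etime_loose = prf(216, 195, 192 + 2, 192 + 0.5)

print(etime_strict)  # (98.46, 88.89, 93.43)
print(etime_loose)   # (99.49, 89.12, 94.02)
```

Both results agree with the P, R and F values reported for `etime` in the table.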

  • Thank you for your time
