towards purposeful reuse of semantic datasets through goal-driven summarization
Post on 05-Dec-2014
K-DRIVE
Towards Purposeful Reuse of Semantic Datasets via Goal-Driven Data Summarization
Panos Alexopoulos, Jose Manuel Gomez Perez
6th International Conference on Advances in Semantic Processing
Porto, Portugal, October 3rd, 2013
2
The Linked Data Use Challenge
Introduction
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
3
Motivating Scenario
Introduction
●Assume that some entity (an individual or an organization) wants to reuse public semantic datasets from the Web to:
● Enrich its own data with them.
● Use the data to provide added-value services to its users or clients.
●These organizations can be:
● Technology providers (e.g. iSOCO)
● Information providers (e.g. publishers, media, etc.)
● Knowledge-driven and knowledge-intensive organisations
4
Why Data Reuse?
Data Enrichment
●The problem with semantic data is the large amount of time and effort required to construct and maintain it.
●Reusing existing public semantic data can (partially) alleviate this problem:
● The volume and diversity of such data are increasing rapidly.
● Its maintenance and evolution are the responsibility of its publishers, which reduces the effort and cost of this task on the organization's side.
5
Example
Data Enrichment
●A news organization wants to create and maintain a knowledge base about European Football.
●This knowledge changes quickly, so the organization needs to constantly monitor the changes and update the data.
●Much of this information is already available as public semantic data (e.g. DBPedia).
●Thus, it could be better for the organization to reuse this public data instead of creating it from scratch.
6
Barriers to Data Reuse
Data Enrichment
●Knowledge engineers find it difficult to decide whether a given dataset is actually suitable for their needs.
● Semantic datasets typically cover diverse domains.
● They do not follow a unified way of organizing knowledge.
● They differ in a number of features, including size, coverage, granularity and descriptiveness.
●This makes the following tasks difficult:
● Assessing whether a dataset satisfies particular requirements
● Comparing different datasets to select which one is more suitable for a given purpose.
7
Our Approach
Data Reuse
●We propose giving data consumers the ability to derive semantic data summaries.
●Existing summarization approaches treat the summarization task in an application- and user-independent way.
●By contrast, we are interested in facilitating the generation of requirements-oriented and goal-driven summaries that may be significantly more helpful to users.
8
Problem Description
Goal-Driven Semantic Data Summarization
●Key question: “Given an application scenario where semantic data is required, how suitable is a given existing dataset for the purposes of this scenario?”
●To answer this, users normally need to be able to:
1. Explicitly express the requirements that a dataset needs to satisfy for a given task or goal.
2. Automatically measure/assess the extent to which a dataset satisfies each of these requirements and compile a summary report.
9
Approach
Goal-Driven Semantic Data Summarization
●To implement these two capabilities we follow a checklist-based approach.
●Checklists are essentially lists of action items, arranged systematically, that let users record the completion of each item.
●They are widely applied across multiple industries, like healthcare or aviation, to ensure reliable and consistent execution of complex operations.
●In our case we apply checklists to define and execute custom dataset summarization tasks in the form of lists of goal-specific requirements and associated summarization processes.
10
Summarization Task Representation
Goal-Driven Semantic Data Summarization
●To represent custom summarization tasks according to the checklist paradigm we have adopted the Minim model.
●This defines the following information:
●The Goals that the dataset summarization task is designed to serve.
●The Requirements against which the summarization task evaluates the dataset.
●The Data Analysis Operations that the summarization task employs to assess whether its requirements are satisfied.
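The three kinds of information listed above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and field names (`Requirement`, `SummarizationTask`, `operation`, `run`) are hypothetical and are not the Minim model's own vocabulary, and the dataset is assumed to be a plain list of (subject, predicate, object) tuples.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumption: a dataset is a plain list of (subject, predicate, object) tuples.
Triple = Tuple[str, str, str]


@dataclass
class Requirement:
    """A requirement plus the data analysis operation that assesses it (hypothetical names)."""
    description: str
    # The operation receives the dataset and returns (satisfied, explanation).
    operation: Callable[[List[Triple]], Tuple[bool, str]]


@dataclass
class SummarizationTask:
    """A goal together with the checklist of requirements that serves it."""
    goal: str
    requirements: List[Requirement] = field(default_factory=list)

    def run(self, dataset: List[Triple]):
        """Evaluate every requirement; return one (description, satisfied, explanation) row each."""
        return [(r.description, *r.operation(dataset)) for r in self.requirements]


def has_any_triples(dataset):
    """Toy operation: the dataset must contain at least one triple."""
    n = len(dataset)
    return n > 0, f"{n} triples found"


task = SummarizationTask(
    goal="Semantic annotation of football texts",
    requirements=[Requirement("Dataset is non-empty", has_any_triples)],
)
report = task.run([("ex:TeamX", "rdf:type", "ex:Team")])
print(report)
```

Keeping the operation as a callable attached to each requirement means the same task definition can be run unchanged against several datasets.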
11
Example Goals
Goal-Driven Semantic Data Summarization
●Decide if a dataset is appropriate for a Semantic Annotation scenario.
●Decide if a dataset is appropriate for a Question Answering scenario.
●Determine which of two or more similar datasets best represents a given corpus.
●Detect arising inconsistencies or other quality problems.
●…
12
Example Requirements
Goal-Driven Semantic Data Summarization
●Evaluate the dataset’s coverage of a particular domain/topic: Aims to measure the extent to which a dataset describes a given domain or topic.
●Evaluate the dataset’s labeling adequacy and richness: Aims to measure the extent to which the dataset’s elements (concepts, instances, relations etc.) are accompanied by representative and comprehensible labels, in one or more languages.
●Evaluate Connectivity: This requirement checks the existence of paths between concepts or entities, i.e. whether it is possible to go from a given concept to another on the graph and in what ways.
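As a rough illustration of the connectivity requirement, the sketch below runs a breadth-first search over a dataset held as (subject, predicate, object) tuples and returns the hops that connect two nodes. The function name `find_path`, the directed reading of triples, and the `ex:` identifiers are assumptions made for this example, not part of the presented framework.

```python
from collections import deque


def find_path(triples, start, goal):
    """Return the list of (subject, predicate, object) hops from start to goal, or None.

    Assumption: triples are followed in the subject -> object direction only.
    """
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for p, o in outgoing.get(node, []):
            if o not in visited:
                visited.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


# Made-up toy data, for illustration only.
toy = [
    ("ex:PlayerA", "ex:playsFor", "ex:TeamX"),
    ("ex:TeamX", "ex:competesIn", "ex:LeagueY"),
]
forward = find_path(toy, "ex:PlayerA", "ex:LeagueY")
backward = find_path(toy, "ex:LeagueY", "ex:PlayerA")
print(forward)
print(backward)
```

Returning the hops rather than a bare boolean answers both parts of the requirement: whether a path exists and in what way the two nodes are connected.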
13
Example Data Operations
Goal-Driven Semantic Data Summarization
●Check the existence of a particular element (concept, relation, attribute, instance, axiom) in the dataset.
●Check the dataset’s consistency (e.g. by running a reasoner).
●Measure the number of ambiguous entities in the dataset.
●Measure the number of labeled entities.
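A few of the operations above can be sketched over a dataset held as (subject, predicate, object) tuples. The sketch makes several assumptions that the slides do not state: an "entity" is any distinct subject, a label is a triple whose predicate is the string `rdfs:label`, and an entity counts as "ambiguous" when it shares its label with another entity. All function names are hypothetical.

```python
LABEL = "rdfs:label"  # assumption: labels are stored under this predicate string


def element_exists(triples, element):
    """Check whether an identifier appears anywhere in the dataset."""
    return any(element in triple for triple in triples)


def count_labeled_entities(triples):
    """Count distinct subjects that have at least one label."""
    return len({s for s, p, _ in triples if p == LABEL})


def ambiguous_labels(triples):
    """Map each label shared by more than one entity to the entities that carry it."""
    by_label = {}
    for s, p, o in triples:
        if p == LABEL:
            by_label.setdefault(o, set()).add(s)
    return {label: ents for label, ents in by_label.items() if len(ents) > 1}


# Made-up toy data, for illustration only.
data = [
    ("ex:TeamX", "rdf:type", "ex:Team"),
    ("ex:TeamX", LABEL, "Athletic"),
    ("ex:TeamZ", LABEL, "Athletic"),
    ("ex:PlayerA", "ex:playsFor", "ex:TeamX"),
]
print(element_exists(data, "ex:Team"))
print(count_labeled_entities(data))
print(ambiguous_labels(data))
```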
14
Application Example
Goal-Driven Semantic Data Summarization
●We applied our framework to assess how suitable public datasets are for reuse in semantically annotating texts that describe football matches from the Spanish League.
●For that, we wanted the dataset to be reused to:
● Contain information about all the current teams of the Spanish football league.
● Have at least one associated label for each of its entities.
● Relate teams to the players that currently play in them.
15
Defined Summarization Task
Goal-Driven Semantic Data Summarization
16
Resulting Summary
Goal-Driven Semantic Data Summarization
●We executed this task against DBPedia and Freebase, automatically producing a summary report.
●The system provides a yes/no answer as to whether each dataset satisfies each requirement, but also additional information on why this may or may not be the case.
●This is important because:
● A requirement might not be satisfied because of a high threshold.
● A requirement might seem to be satisfied, yet that might not actually be true.
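The point about thresholds can be illustrated with a minimal sketch of one report row that carries the yes/no verdict together with the observed value behind it. The function name `check_threshold`, the dictionary layout, and the numbers are hypothetical; they are not the output format of the system described here.

```python
def check_threshold(requirement, observed, threshold):
    """Build one report row: a yes/no verdict plus the figures that explain it."""
    return {
        "requirement": requirement,
        "satisfied": observed >= threshold,
        "observed": observed,
        "threshold": threshold,
        "explanation": f"observed {observed:.2f} vs. required {threshold:.2f}",
    }


# Made-up figures: 98% of entities labeled, but the requirement demands 100%.
row = check_threshold("All entities have a label", 0.98, 1.0)
print(row["satisfied"], "-", row["explanation"])
```

With only the verdict, a reader would see a failure; with the explanation, the reader can judge whether the threshold was simply too strict for the goal at hand.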
17
Summary Generation Tool
Ongoing Work
●We are currently developing a summarization tool that enables the definition, manipulation and execution of summarization tasks, as well as the dashboard-like visualization of their output.
18
Questions?
Thank you!
Dr. Panos Alexopoulos
Semantic Applications Research Manager
palexopoulos@isoco.com
(t) +34 913 349 797