Transcript
Page 1: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

K-DRIVE

Towards Purposeful Reuse of Semantic Datasets via Goal-Driven Data Summarization

Panos Alexopoulos, Jose Manuel Gomez Perez

6th International Conference on Advances in Semantic Processing

Porto, Portugal, October 3rd, 2013

Page 2: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

The Linked Data Use Challenge

Introduction

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 3: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Motivating Scenario

Introduction

●Assume that some entity (individual or organization) wants to reuse public semantic datasets from the Web to:

●Enrich its own data with them.

●Use the data to provide added-value services to its users/clients.

●These entities can be:

●Technology providers (e.g. iSOCO)

●Information providers (e.g. publishers, media, etc.)

●Knowledge-driven and knowledge-intensive organisations

Page 4: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Why Data Reuse?

Data Enrichment

●The problem with semantic data is the high amount of time and effort required to construct and maintain it.

●The reuse of existing public semantic data can (partially) alleviate this problem:

●Their volume and diversity are increasing at high rates.

●Their maintenance and evolution are the responsibility of their publishers, which reduces the effort and cost required on the organization's side.

Page 5: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Example

Data Enrichment

●A news organization wants to create and maintain a knowledge base about European Football.

●This knowledge changes quickly, so the organization needs to constantly monitor the changes and update its data.

●Much of this information is already available as public semantic data (e.g. DBPedia).

●Thus, it could be better for the organization to reuse this public data instead of creating it from scratch.

Page 6: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Barriers to Data Reuse

Data Enrichment

●It is difficult for knowledge engineers to decide whether a given dataset is actually suitable for their needs:

● Semantic datasets typically cover diverse domains

● They do not follow a unified way of organizing the knowledge

● They differ in a number of features, including size, coverage, granularity and descriptiveness

●This makes the following tasks difficult:

● Assessing whether a dataset satisfies particular requirements

● Comparing different datasets to select which one is more suitable for a given purpose.

Page 7: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Our Approach

Data Reuse

●We propose giving data consumers the ability to derive semantic data summaries.

●Existing summarization approaches treat the summarization task in an application- and user-independent way.

●By contrast, we are interested in facilitating the generation of requirements-oriented and goal-driven summaries that may be significantly more helpful to users.

Page 8: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Problem Description

Goal-Driven Semantic Data Summarization

●Key question: “Given an application scenario where semantic data is required, how suitable is a given existing dataset for the purposes of this scenario?”

●To answer this, users normally need to be able to:

1. Explicitly express the requirements that a dataset needs to satisfy for a given task or goal.

2. Automatically measure/assess the extent to which a dataset satisfies each of these requirements and compile a summary report.

Page 9: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Approach

Goal-Driven Semantic Data Summarization

●To implement these two capabilities we follow a checklist-based approach.

●Checklists are, in practice, lists of action items arranged systematically so that users can record the completion of each item.

●They are widely applied across multiple industries, like healthcare or aviation, to ensure reliable and consistent execution of complex operations.

●In our case we apply checklists to define and execute custom dataset summarization tasks in the form of lists of goal-specific requirements and associated summarization processes.

Page 10: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Summarization Task Representation

Goal-Driven Semantic Data Summarization

●To represent custom summarization tasks according to the checklist paradigm we have adopted the Minim model.

●This defines the following information:

●The Goals the dataset summarization task is designed to serve

●The Requirements against which the summarization task evaluates the dataset.

●The Data Analysis Operations that the summarization task employs in order to assess the satisfaction of its requirements
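
The three kinds of information listed above can be pictured as a small data structure. The following is only a minimal sketch, not the Minim vocabulary itself: the class names `Requirement` and `SummarizationTask`, the `ex:` identifiers and the toy dataset are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Requirement:
    """One checklist item plus the data analysis operation that assesses it."""
    description: str
    operation: Callable[[Set[Triple]], bool]


@dataclass
class SummarizationTask:
    """A goal together with the requirements that serve it."""
    goal: str
    requirements: List[Requirement] = field(default_factory=list)

    def run(self, dataset: Set[Triple]) -> List[Tuple[str, bool]]:
        # Evaluate every requirement and record whether it is satisfied.
        return [(r.description, r.operation(dataset)) for r in self.requirements]


# Toy dataset; all identifiers are made up for illustration.
dataset = {
    ("ex:TeamA", "rdf:type", "ex:FootballTeam"),
    ("ex:TeamA", "rdfs:label", "Team A"),
}

task = SummarizationTask(
    goal="Decide if the dataset suits a semantic annotation scenario",
    requirements=[
        Requirement(
            "Dataset contains at least one football team",
            lambda d: any(p == "rdf:type" and o == "ex:FootballTeam" for _, p, o in d),
        ),
        Requirement(
            "Dataset contains at least one player",
            lambda d: any(p == "rdf:type" and o == "ex:Player" for _, p, o in d),
        ),
    ],
)

results = task.run(dataset)
for description, satisfied in results:
    print(f"{'yes' if satisfied else 'no':3} | {description}")
```

Keeping each operation as a plain callable means the same task definition could be run unchanged against several candidate datasets.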

Page 11: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Example Goals

Goal-Driven Semantic Data Summarization

●Decide if a dataset is appropriate for a Semantic Annotation scenario.

●Decide if a dataset is appropriate for a Question Answering scenario.

●Determine which of two or more similar datasets best represents a given corpus.

●Detect arising inconsistencies or other quality problems.

●…

Page 12: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Example Requirements

Goal-Driven Semantic Data Summarization

●Evaluate the dataset’s coverage of a particular domain/topic: Aims to measure the extent to which a dataset describes a given domain or topic.

●Evaluate the dataset’s labeling adequacy and richness: Aims to measure the extent to which the dataset’s elements (concepts, instances, relations etc.) are accompanied by representative and comprehensible labels, in one or more languages.

●Evaluate Connectivity: This requirement checks the existence of paths between concepts or entities, i.e. whether it is possible to go from a given concept to another on the graph and in what ways.
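
As an illustration of the connectivity requirement, the sketch below searches a toy graph of triples for a path between two entities using breadth-first search. It is one possible way to run such a check, not necessarily the one used in the presented framework; the function name `find_path`, the `ex:` identifiers and the decision to follow edges only from subject to object are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def find_path(triples: Set[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Return one shortest node path from start to goal, or None if none exists.

    Edges are followed in the subject -> object direction only.
    """
    neighbours: Dict[str, List[str]] = {}
    for s, _, o in sorted(triples):
        neighbours.setdefault(s, []).append(o)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Toy graph; all identifiers are made up for illustration.
triples = {
    ("ex:PlayerX", "ex:playsFor", "ex:TeamA"),
    ("ex:TeamA", "ex:competesIn", "ex:LeagueL"),
    ("ex:LeagueL", "ex:country", "ex:CountryC"),
}

path = find_path(triples, "ex:PlayerX", "ex:CountryC")
no_path = find_path(triples, "ex:CountryC", "ex:PlayerX")
print(path)
print(no_path)
```

Returning the path itself, rather than only a boolean, also answers the "in what ways" part of the requirement for the shortest route.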

Page 13: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Example Data Operations

Goal-Driven Semantic Data Summarization

●Check the existence of a particular element (concept, relation, attribute, instance, axiom) in the dataset.

●Check the dataset’s consistency (e.g. by running a reasoner).

●Measure the number of ambiguous entities in the dataset.

●Measure the number of labeled entities.
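
Some of these operations can be sketched over a plain set of triples. The code below is illustrative only: the helper names are hypothetical, subjects are treated as the dataset's entities, and "ambiguous" is assumed here to mean that an entity shares its label (ignoring case) with at least one other entity, which the slides do not define. The consistency check is omitted because it would need a reasoner.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
LABEL = "rdfs:label"


def element_exists(triples: Set[Triple], element: str) -> bool:
    """True if the element occurs as subject, predicate or object."""
    return any(element in t for t in triples)


def entities(triples: Set[Triple]) -> Set[str]:
    """Subjects are treated as the dataset's entities (a simplification)."""
    return {s for s, _, _ in triples}


def labeled_entities(triples: Set[Triple]) -> Set[str]:
    """Entities that have at least one label."""
    return {s for s, p, _ in triples if p == LABEL}


def ambiguous_entities(triples: Set[Triple]) -> Set[str]:
    """Entities whose label is shared with at least one other entity."""
    by_label: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == LABEL:
            by_label[o.lower()].add(s)
    return {e for group in by_label.values() if len(group) > 1 for e in group}


# Toy dataset; all identifiers are made up for illustration.
triples = {
    ("ex:TeamA", "rdf:type", "ex:FootballTeam"),
    ("ex:TeamA", "rdfs:label", "Athletic"),
    ("ex:TeamB", "rdf:type", "ex:FootballTeam"),
    ("ex:TeamB", "rdfs:label", "athletic"),
    ("ex:PlayerX", "rdf:type", "ex:Player"),
}

print("has ex:Player:", element_exists(triples, "ex:Player"))
print("labeled:", len(labeled_entities(triples)), "of", len(entities(triples)))
print("ambiguous:", sorted(ambiguous_entities(triples)))
```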

Page 14: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Application Example

Goal-Driven Semantic Data Summarization

●We applied our framework to assess the suitability of public datasets for reuse in semantically annotating texts that describe football matches from the Spanish League.

●For that, we wanted the reused dataset to:

●Contain information about all the current teams of the Spanish football league.

●Have at least one associated label for each of its entities.

●Relate teams to the players that currently play in them.
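
The three requirements of this example can be sketched as checks over a toy dataset. Everything in the sketch is a placeholder: `REQUIRED_TEAMS` stands in for the real list of current league teams, and the predicate `ex:currentTeam` and the helper names are hypothetical rather than taken from DBPedia, Freebase or the presented framework.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Placeholder for the current teams of the league.
REQUIRED_TEAMS = {"ex:TeamA", "ex:TeamB", "ex:TeamC"}

# Toy dataset; all identifiers are made up for illustration.
dataset = {
    ("ex:TeamA", "rdf:type", "ex:FootballTeam"),
    ("ex:TeamA", "rdfs:label", "Team A"),
    ("ex:TeamB", "rdf:type", "ex:FootballTeam"),
    ("ex:TeamB", "rdfs:label", "Team B"),
    ("ex:Player1", "rdf:type", "ex:Player"),
    ("ex:Player1", "rdfs:label", "Player One"),
    ("ex:Player1", "ex:currentTeam", "ex:TeamA"),
}


def teams_in(triples: Set[Triple]) -> Set[str]:
    """Entities typed as football teams."""
    return {s for s, p, o in triples if p == "rdf:type" and o == "ex:FootballTeam"}


def missing_teams(triples: Set[Triple], required: Set[str]) -> Set[str]:
    """Required teams that the dataset does not describe."""
    return required - teams_in(triples)


def unlabeled_entities(triples: Set[Triple]) -> Set[str]:
    """Entities (subjects) without any rdfs:label."""
    subjects = {s for s, _, _ in triples}
    labeled = {s for s, p, _ in triples if p == "rdfs:label"}
    return subjects - labeled


def teams_without_players(triples: Set[Triple]) -> Set[str]:
    """Teams that no player is currently linked to."""
    with_players = {o for _, p, o in triples if p == "ex:currentTeam"}
    return teams_in(triples) - with_players


checklist = {
    "all current teams present": not missing_teams(dataset, REQUIRED_TEAMS),
    "every entity has a label": not unlabeled_entities(dataset),
    "teams linked to current players": not teams_without_players(dataset),
}

for requirement, satisfied in checklist.items():
    print(f"{'yes' if satisfied else 'no':3} | {requirement}")
```

Each helper returns the offending items rather than a bare boolean, so a failed requirement can be traced to the teams or entities that caused it.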

Page 15: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Defined Summarization Task

Goal-Driven Semantic Data Summarization

Page 16: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Resulting Summary

Goal-Driven Semantic Data Summarization

●We executed this task against DBPedia and Freebase, automatically producing the following summary report

●The system provides not only a yes/no answer as to whether each dataset satisfies each requirement, but also additional information on why this may or may not be the case.

●This is important because:

● A requirement might not be satisfied because of a high threshold.

● A requirement might seem to be satisfied, yet that might not actually be true.
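
The role of the threshold can be illustrated with a small sketch of a report entry that carries the measured value and an explanation next to the yes/no verdict. The class `RequirementResult`, the helper `label_coverage` and the numbers are invented for illustration and do not reproduce the actual summary report.

```python
from dataclasses import dataclass


@dataclass
class RequirementResult:
    requirement: str
    measured: float   # fraction of items that pass, between 0 and 1
    threshold: float  # minimum fraction needed for a "yes"
    detail: str       # explanation shown next to the verdict

    @property
    def satisfied(self) -> bool:
        return self.measured >= self.threshold

    def line(self) -> str:
        verdict = "yes" if self.satisfied else "no"
        return (
            f"{self.requirement}: {verdict} "
            f"(measured {self.measured:.0%}, required {self.threshold:.0%}; {self.detail})"
        )


def label_coverage(labeled: int, total: int, threshold: float) -> RequirementResult:
    """Build a report entry for the 'every entity has a label' requirement."""
    measured = labeled / total if total else 0.0
    detail = f"{total - labeled} of {total} entities lack a label"
    return RequirementResult("Every entity has a label", measured, threshold, detail)


# Invented counts: the same measurement under two different thresholds.
strict = label_coverage(labeled=19, total=20, threshold=1.0)
relaxed = label_coverage(labeled=19, total=20, threshold=0.9)

print(strict.line())
print(relaxed.line())
```

With the strict threshold the verdict is "no" even though only one entity lacks a label, which is the kind of case where the accompanying explanation matters more than the verdict alone.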

Page 17: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Summary Generation Tool

Ongoing Work

●We are currently developing a summarization tool that enables the definition, manipulation and execution of summarization tasks, as well as the dashboard-like visualization of their output.

Page 18: Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization

Questions?

Thank you!

iSOCO Madrid

Av. del Partenón, 16-18, 1º7ª

Campo de las Naciones

28042 Madrid

España

(t) +34 913 349 797

iSOCO Pamplona

Parque Tomás Caballero, 2, 6º4ª

31006 Pamplona

España

(t) +34 948 102 408

iSOCO Valencia

C/ Prof. Beltrán Báguena, 4

Oficina 107

46009 Valencia

España

(t) +34 963 467 143

iSOCO Barcelona

Av. Torre Blanca, 57

Edificio ESADE CREAPOLIS

Oficina 3C 15

08172 Sant Cugat del Vallès

Barcelona, España

(t) +34 935 677 200

iSOCO Colombia

Complejo Ruta N

Calle 67, 52-20

Piso 3, Torre A

Medellín

Colombia

(t) +57 516 7770 ext. 1132

Key Vendor Virtual Assistant 2013

Want to innovate?

Dr. Panos Alexopoulos

Semantic Applications Research Manager

(t) +34 913 349 797

