A Multidimensional Semantic Space for Data Model Independent Queries over RDF Data

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

A Multidimensional Semantic Space for Data Model Independent Queries over RDF Data

André Freitas, João Gabriel Oliveira, Edward Curry, Seán O’Riain

Uploaded by andre-freitas on 09-May-2015.


DESCRIPTION

IEEE International Conference on Semantic Computing (ICSC 2011). A Multidimensional Semantic Space for Data Model Independent Queries over RDF Data. André Freitas, João Gabriel Oliveira, Edward Curry, Seán O’Riain. http://andrefreitas.org/papers/preprint_multidimensional_ieee_icsc_2011.pdf

Abstract: The vision of creating a Linked Data Web brings with it the challenge of allowing queries across highly heterogeneous and distributed datasets. In order to query Linked Data on the Web today, end-users need to be aware of which datasets potentially contain the data and also which data model describes these datasets. Allowing users to expressively query relationships in RDF while abstracting them from the underlying data model is a fundamental problem for Web-scale Linked Data consumption. This article introduces a multidimensional semantic space model which enables data model independent natural language queries over RDF data. The core of the approach is the use of a distributional semantic model to provide the level of semantic interpretation required to build a data model independent approach. The final multidimensional semantic space proved to be flexible and precise under real-world query conditions, achieving mean reciprocal rank = 0.516, avg. precision = 0.482 and avg. recall = 0.491.

TRANSCRIPT



Outline

Problem Space & Motivation
Description of the Approach
Evaluation
Conclusion & Future Work


Linked Data

Uses the Web infrastructure and standards to expose and interlink datasets.

Linked Data vision: the Web as a single dataspace, a Web of interlinked datasets.


Linked Data: Adoption


Queries over Linked Data

Linked Data brings a fundamental challenge for data consumption: how to query heterogeneous and distributed datasets?

At Web scale it is infeasible for end-users to be aware of the location and structure of datasets.

Demand for new query mechanisms for Linked Data (data model independence).


Query/Search Spectrum

Adapted from Kauffman et al. (2009)


Fundamental Problem

From which university did the wife of Barack Obama graduate?

Popescu (2003): Semantic tractability problem.

Semantic Matching Problem

From which university did the wife of Barack Obama graduate?

Entity identification
Entity search
Approximate semantic matching
Structural matching
T-Space


Strategy

Best-effort query model (ranked results).
Use of a distributional semantic model.
Two-phase search process combining entity search with spreading activation search.
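The two-phase search can be illustrated with a small sketch. This is not the Treo implementation: the toy graph `TOY_GRAPH`, the substring-based `entity_search`, the lookup-table `toy_relatedness` (standing in for the distributional relatedness measure) and the 0.5 threshold are all hypothetical. Phase 2 is a simplified, threshold-based form of spreading activation: from the pivot entity, a node is expanded only along edges whose predicate is sufficiently related to the next query term.

```python
from typing import Callable

# Hypothetical toy RDF graph: subject -> list of (predicate, object) edges.
TOY_GRAPH: dict[str, list[tuple[str, str]]] = {
    "Barack_Obama": [("spouse", "Michelle_Obama"), ("birthPlace", "Honolulu")],
    "Michelle_Obama": [("almaMater", "Princeton_University"),
                       ("almaMater", "Harvard_Law_School")],
}

# Hypothetical relatedness scores standing in for the distributional model.
TOY_SCORES = {("wife", "spouse"): 0.8, ("graduate", "almaMater"): 0.7}


def toy_relatedness(query_term: str, predicate: str) -> float:
    """Return a relatedness score in [0, 1]; exact matches score 1."""
    if query_term == predicate:
        return 1.0
    return TOY_SCORES.get((query_term, predicate), 0.0)


def entity_search(graph, pivot_term: str) -> list[str]:
    """Phase 1: naive entity search by label containment (finds pivot entities)."""
    term = pivot_term.lower()
    return [node for node in graph if term in node.replace("_", " ").lower()]


def spreading_activation(graph, start: str, query_terms: list[str],
                         relatedness: Callable[[str, str], float],
                         threshold: float = 0.5) -> list[list[tuple[str, str, str]]]:
    """Phase 2: follow only edges whose predicate is related to the next
    query term; return every path (list of triples) that consumes all terms."""
    paths = []
    # Each frontier item: (current node, index of next query term, triples so far).
    frontier = [(start, 0, [])]
    while frontier:
        node, i, path = frontier.pop()
        if i == len(query_terms):
            paths.append(path)
            continue
        for predicate, obj in graph.get(node, []):
            if relatedness(query_terms[i], predicate) >= threshold:
                frontier.append((obj, i + 1, path + [(node, predicate, obj)]))
    return paths


# "From which university did the wife of Barack Obama graduate?"
pivots = entity_search(TOY_GRAPH, "Barack Obama")
answers = sorted({path[-1][2]
                  for pivot in pivots
                  for path in spreading_activation(TOY_GRAPH, pivot, ["wife", "graduate"],
                                                   toy_relatedness)})
print(answers)  # ['Harvard_Law_School', 'Princeton_University']
```

The `birthPlace` edge is never followed because its predicate is unrelated to "wife"; pruning by relatedness is what keeps the search focused as the graph grows.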


Proposed Approach

Query Approach Rationale

Final Query-Data Matching


Semantic Relatedness

Computation of a measure of “semantic proximity” between two terms.

Allows approximate semantic matching between query terms and dataset terms.

Most existing approaches use WordNet-based solutions for approximate semantic matching. Distributional semantic approaches address the limitations of these WordNet-based solutions.
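To make "semantic proximity" concrete, here is a minimal sketch of approximate matching: terms are represented as sparse vectors, compared by cosine similarity, and dataset terms are ranked against a query term. Cosine similarity is a common choice for vector-based relatedness; the concept vectors below (`WIFE`, `DATASET_TERMS`) are invented for illustration and would in practice come from the distributional model.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(dim, 0.0) for dim, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_dataset_terms(query_vec: dict[str, float],
                       dataset_vecs: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank dataset terms by semantic relatedness to a query term (highest first)."""
    scored = [(term, cosine(query_vec, vec)) for term, vec in dataset_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical concept vectors; real ones would come from a distributional model.
WIFE = {"Marriage": 0.9, "Family": 0.6}
DATASET_TERMS = {
    "spouse": {"Marriage": 0.8, "Family": 0.5},
    "birthPlace": {"Geography": 0.8, "Family": 0.1},
    "almaMater": {"University": 0.9, "Education": 0.7},
}

for term, score in rank_dataset_terms(WIFE, DATASET_TERMS):
    print(f"{term}: {score:.3f}")
```

The point of the example is that "wife" and "spouse" share no characters yet score close to 1, which string matching could not capture.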


Distributional Semantics

Assumption: the context surrounding a given word in a text provides important information about its meaning.

Meaning is mediated by word distribution in a corpus.

Simplified semantic model, illustrated with an example context for the word “opera”:

Opera is an art form in which singers and musicians perform a dramatic work combining text (called a libretto) and musical score. Opera is part of the Western classical music tradition. Opera incorporates many of the elements of spoken theatre, such as acting, scenery, and costumes and sometimes includes dance. The performance is typically given in an opera house, accompanied by an orchestra or smaller musical ensemble.
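The distributional assumption can be illustrated by building co-occurrence vectors from a small corpus: each word is described by counts of the words that appear within a fixed window around it. This is a generic sketch, not the model used in the talk (which relies on ESA); the window size of 2 and the regex tokenizer are arbitrary assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_vectors(tokens: list[str], window: int = 2) -> dict[str, Counter]:
    """Map each word to counts of words seen within `window` positions of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not part of its own context
                vectors[word][tokens[j]] += 1
    return vectors


CORPUS = ("Opera is an art form in which singers and musicians perform a dramatic work "
          "combining text (called a libretto) and musical score. "
          "Opera is part of the Western classical music tradition.")

vectors = cooccurrence_vectors(tokenize(CORPUS))
print(vectors["opera"].most_common(3))
```

With a large corpus, words used in similar contexts end up with similar vectors, which is what makes the comparison in the previous example meaningful.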


Explicit Semantic Analysis (ESA)

Based on Wikipedia: a term is represented by an interpretation vector whose dimensions are Wikipedia article titles (concepts).
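A toy version of the ESA idea, as a sketch under stated assumptions: each word's interpretation vector holds its TF-IDF weight in every "article", a text's vector is the sum of its words' vectors, and relatedness is the cosine between vectors. The three-article mini "Wikipedia" in `ARTICLES` is invented, and real ESA uses the full Wikipedia corpus with additional pruning and normalization not shown here.

```python
import math
import re
from collections import Counter

# Hypothetical mini "Wikipedia": article title (concept) -> article text.
ARTICLES = {
    "Marriage": "marriage is a union of a wife and husband; each is the spouse of the other",
    "University": "a student will graduate from a university with a degree; "
                  "the university is the alma mater",
    "Opera": "opera is an art form where singers perform with an orchestra",
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def build_esa_index(articles: dict[str, str]) -> dict[str, dict[str, float]]:
    """Map each word to its TF-IDF weight in each article (its interpretation vector)."""
    tfs = {title: Counter(tokenize(text)) for title, text in articles.items()}
    n_docs = len(tfs)
    doc_freq = Counter(word for counts in tfs.values() for word in counts)
    index: dict[str, dict[str, float]] = {}
    for title, counts in tfs.items():
        for word, tf in counts.items():
            idf = math.log(n_docs / doc_freq[word])  # 0 for words found in every article
            index.setdefault(word, {})[title] = tf * idf
    return index


def interpretation_vector(text: str, index: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum the concept vectors of the words in `text`; unknown words are ignored."""
    vec: Counter = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


ESA_INDEX = build_esa_index(ARTICLES)
for a, b in [("wife", "spouse"), ("wife", "graduate")]:
    score = cosine(interpretation_vector(a, ESA_INDEX), interpretation_vector(b, ESA_INDEX))
    print(a, b, round(score, 3))
```

"wife" and "spouse" never co-occur as a pair in a query, but both map onto the Marriage concept, so they come out as related; "graduate" maps onto a different concept.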


Building the T-Space (Steps)

Building the distributional semantic model using ESA.
Construction of the instance spaces (TF/IDF).
Construction of the class spaces (ESA).
Construction of the relation spaces (ESA).
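The instance-space step (TF/IDF) can be sketched as a small vector-space index over entity labels, queried with the pivot term of a natural language query. The class `InstanceSpace`, the example labels and the scoring details (smoothed IDF, cosine ranking) are assumptions for illustration, not the paper's exact weighting scheme.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def _cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class InstanceSpace:
    """Hypothetical TF/IDF vector space over entity labels (one document per entity)."""

    def __init__(self, labels: dict[str, str]) -> None:
        tfs = {uri: Counter(tokenize(label)) for uri, label in labels.items()}
        n = len(tfs)
        df = Counter(term for tf in tfs.values() for term in tf)
        # Smoothed IDF: terms shared by many labels get lower (but non-zero) weight.
        self.idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
        self.vectors = {uri: self._weigh(tf) for uri, tf in tfs.items()}

    def _weigh(self, tf: Counter) -> dict[str, float]:
        # Terms never seen in any label get weight 0 and do not affect ranking.
        return {term: count * self.idf.get(term, 0.0) for term, count in tf.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Return up to k entities with non-zero cosine similarity, best first."""
        q = self._weigh(Counter(tokenize(query)))
        scored = [(uri, _cosine(q, vec)) for uri, vec in self.vectors.items()]
        scored = [pair for pair in scored if pair[1] > 0.0]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Hypothetical entity identifiers and labels.
LABELS = {
    "Barack_Obama": "Barack Obama",
    "Michelle_Obama": "Michelle Obama",
    "Barack_Obama_Sr.": "Barack Obama Sr.",
    "Princeton_University": "Princeton University",
}

space = InstanceSpace(LABELS)
for uri, score in space.search("Barack Obama"):
    print(f"{uri}: {score:.3f}")
```

Keyword-style TF/IDF suits instances because entity names tend to appear nearly verbatim in queries, whereas class and relation names need the looser ESA-based matching.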

Building the T-Space

Instances
Classes
Properties
Relations



Instances



Classes

Universal ESA Space



Relations

Universal ESA Space


Querying the T-Space


Treo (Irish): Route, path


Evaluation


Quality of Results

Query set: QALD DBpedia training set, 50 natural language queries, DBpedia 3.6.

Query set                                Avg. Precision   Avg. Recall   MRR     % of queries answered
Full DBpedia query set (50 queries)      0.482            0.491         0.516   58%
Partial DBpedia query set (38 queries)   0.634            0.645         0.679   76%
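For reference, the reported metrics can be computed per query and averaged as in the sketch below. The definitions are the standard ones (reciprocal rank of the first correct answer; set-based precision and recall). The slides do not say how queries with no returned answers were scored, so treating them as 0 here is an assumption, and the three example runs are invented.

```python
def reciprocal_rank(ranked: list[str], gold: set[str]) -> float:
    """1/position of the first correct answer, or 0 if none is correct."""
    for position, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / position
    return 0.0


def precision_recall(returned: list[str], gold: set[str]) -> tuple[float, float]:
    """Set-based precision and recall; an empty answer list scores 0 precision."""
    hits = len(set(returned) & gold)
    precision = hits / len(set(returned)) if returned else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def evaluate(runs: list[tuple[list[str], set[str]]]) -> dict[str, float]:
    """Average MRR, precision and recall over (ranked answers, gold answers) pairs."""
    n = len(runs)
    pr = [precision_recall(ranked, gold) for ranked, gold in runs]
    return {
        "mrr": sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / n,
        "avg_precision": sum(p for p, _ in pr) / n,
        "avg_recall": sum(r for _, r in pr) / n,
    }


# Invented runs: (system's ranked answers, gold-standard answers).
RUNS = [
    (["Princeton_University", "Harvard_Law_School"], {"Princeton_University", "Harvard_Law_School"}),
    (["Honolulu", "Chicago"], {"Chicago"}),
    ([], {"Hawaii"}),
]
metrics = evaluate(RUNS)
print(metrics)
```

MRR rewards placing a correct answer near the top of the ranked list, which matches the best-effort, ranked-results query model; precision and recall judge the whole answer set.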


Error Distribution


Conclusion & Future Work

The T-Space semantic model shows a promising direction for providing data model independent queries over RDF data.

Improvement of semantic tractability. The distributional semantic model supports flexible matching between query terms and dataset terms in a best-effort scenario.

Further improvements are needed:
QA features (e.g., answer type detection, operators).
User feedback mechanisms (disambiguation).
Entity recognition for complex classes.