Keyword-Based Navigation and Search over the Linked Data Web

Luca Matteis¹, Aidan Hogan², Roberto Navigli¹

¹ Sapienza University of Rome

² University of Chile

Post on 28-Jul-2015

General idea

• Browse the live linked data web using keywords

• Predicate resolution along the navigation to increase matches

• Results are streamed back to users as quickly as possible

• We measure how fast relevant triples are found at each step of the navigation

Navigation

• Navigation starts from a list of starting URIs

• Users/agents provide keywords to search against and guide the navigation

• Navigation is structured using a streaming pipeline
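
The navigation described above can be pictured as a chain of generators, one per keyword, where each stage hands matching triples downstream as soon as it finds them. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the `GRAPH` dictionary is a toy stand-in for dereferencing URIs on the live Web, and the names `dereference`, `hop`, `objects` and `navigate` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Toy stand-in for the live Web (assumption): URI -> triples obtained by dereferencing it.
GRAPH: Dict[str, List[Triple]] = {
    "ex:film": [("ex:film", "director", "ex:alice"),
                ("ex:film", "title", "A Film")],
    "ex:alice": [("ex:alice", "known for", "ex:film2"),
                 ("ex:alice", "name", "Alice")],
    "ex:film2": [("ex:film2", "actor", "ex:bob")],
}


def dereference(uri: str) -> List[Triple]:
    """Hypothetical lookup; a real system would issue an HTTP GET here."""
    return GRAPH.get(uri, [])


def hop(uris: Iterable[str], keyword: str) -> Iterator[Triple]:
    """One pipeline stage: yield triples whose predicate contains the keyword."""
    for uri in uris:
        for triple in dereference(uri):
            if keyword.lower() in triple[1].lower():
                yield triple  # passed downstream immediately, not batched


def objects(triples: Iterable[Triple]) -> Iterator[str]:
    """Objects of one hop become the URIs to visit in the next hop."""
    for _, _, obj in triples:
        yield obj


def navigate(start_uris: Iterable[str], keywords: List[str]) -> Iterator[Triple]:
    """Chain one lazy stage per keyword, starting from the given URIs."""
    if not keywords:
        return iter(())
    stream = hop(start_uris, keywords[0])
    for keyword in keywords[1:]:
        stream = hop(objects(stream), keyword)
    return stream
```

Because every stage is lazy, the first matching triple of the last hop can be produced before earlier hops have finished visiting all of their URIs.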

Search

• Search occurs at each element of the pipeline

• Several RDF keyword search algorithms can be used

• Predicate resolution is used to increase number of matches
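
To illustrate why predicate resolution increases matches: a predicate URI can be opaque, so a keyword only matches once the predicate has been resolved to a human-readable label. The sketch below is an assumption-laden illustration rather than the method used in the talk. The `PREDICATE_LABELS` table simulates labels that would be obtained by dereferencing each predicate URI, and the `example.org` URIs and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical labels, as if fetched by dereferencing each predicate URI.
PREDICATE_LABELS: Dict[str, str] = {
    "http://example.org/onto/mov100": "movement",
    "http://example.org/my/lab": "label",
}


def resolve_predicate(predicate_uri: str) -> str:
    """Return a readable label, falling back to the URI's last segment."""
    label = PREDICATE_LABELS.get(predicate_uri)
    if label is not None:
        return label
    tail = predicate_uri.rstrip("/#").rsplit("/", 1)[-1]
    return tail.rsplit("#", 1)[-1]


def search(triples: Iterable[Triple], keyword: str, resolve: bool = True) -> List[Triple]:
    """Keep triples whose predicate text (label or raw URI) contains the keyword."""
    keyword = keyword.lower()
    matches = []
    for s, p, o in triples:
        text = resolve_predicate(p) if resolve else p
        if keyword in text.lower():
            matches.append((s, p, o))
    return matches
```

With `resolve=False` the keyword "movement" finds nothing for a predicate such as `.../mov100`; with resolution switched on, the same triple matches through its label.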

SWGET comparison

• SWGET is an implementation of the NautiLOD navigational language

• It allows triples to be filtered (through SPARQL) at each step of the navigation

• We show that our pipeline streaming approach results in faster response times

Results

• Total response time is under 10 seconds (varies based on the number of keywords)

• Navigation hop time averages ~5 seconds

Discussion

• The results suggest that keyword-based navigation is achievable, although somewhat slow.

• Experiments were run on the live Linked Data Web! Servers optimized for concurrency and high throughput (e.g. Triple Pattern Fragments) might yield faster response times.

Final remarks

• Our approach incentivizes publishers to enrich their structured data (using predicates with meaningful descriptions)

• Concurrent resolution of many URIs at runtime to find answers to queries is becoming more and more viable; increasing bandwidth will make this even more practical

• Upfront querying may not be the only way we query the Web of Linked Data
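
The concurrent resolution of many URIs mentioned above can be sketched with a thread pool that yields triples in whatever order the lookups complete. This is only an illustration under assumptions: `fake_fetch` is a placeholder for a real HTTP dereference, and `resolve_concurrently` and its parameters are hypothetical names, not part of the system presented here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def fake_fetch(uri: str) -> List[Triple]:
    """Placeholder for an HTTP dereference; returns one made-up triple per URI."""
    return [(uri, "name", f"label of {uri}")]


def resolve_concurrently(
    uris: Iterable[str],
    fetch: Callable[[str], List[Triple]] = fake_fetch,
    max_workers: int = 8,
) -> Iterator[Triple]:
    """Dereference URIs in parallel and yield triples as each lookup finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, uri) for uri in uris]
        for future in as_completed(futures):
            try:
                triples = future.result()
            except Exception:
                continue  # a failed or unreachable server should not stop the stream
            yield from triples
```

Yielding in completion order rather than submission order means one slow server delays only its own triples, which fits the goal of returning results to users as quickly as possible.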

Use case

• Typing "dir" brings up suggestions: codirector (8), redirection (4), director (1), nadir (1), …

• Selecting "director": 1 triple found (view)

• Typing "know" brings up suggestions: known for (17), knows (6), knowledge of (5), …

• Selecting "known for": 17 triples found (view)

• Typing "act" brings up suggestions: actor (56), abstract (48), …

• Selecting "actor": 56 triples found (view)
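
The suggestion lists in the use case look like substring matches over predicate labels, ranked by how many triples each label would return ("dir" matches "nadir" and "codirector", not only labels that start with "dir"). The sketch below reproduces that behaviour under this assumption; the `suggest` function and the alphabetical tie-breaking rule are hypothetical and are not taken from the slides.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suggest(typed: str, predicate_labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank labels containing the typed text by the number of triples using them.

    `predicate_labels` holds one entry per triple seen at the current hop,
    so repeated labels are counted.
    """
    typed = typed.lower()
    counts = Counter(label for label in predicate_labels if typed in label.lower())
    # Most frequent first; ties broken alphabetically (an arbitrary choice).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, given eight "codirector" triples, four "redirection" triples and one each of "director" and "nadir", `suggest("dir", labels)` lists them in that order with their counts.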

Users don't have to input URIs (as they do when writing SPARQL)

Nor do they have to know the exact structure of the underlying dataset (they simply type keywords)

SELECT * {
  <http://viaf.org/viaf/177603646> onto:mov100 ?movement .
  ?movement my:lab ?label .
}

http://viaf.org/viaf/177603646 / movement / name

Query federation is built-in (we're simply following links)

http://viaf.org/viaf/177603646 / movement / same as / movement of / born < 1960 / same as freebase / name

(the path spans VIAF, DBpedia and Freebase)
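
A path expression like the ones above could be turned into a start URI plus a list of keyword steps before navigation begins. The following sketch assumes a syntax inferred from the slides only: steps are separated by " / " with surrounding spaces, so that the slashes inside the URI are left alone, and a step such as "born < 1960" carries a comparison filter. The `Step` type and `parse_path` function are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    keyword: str
    operator: Optional[str] = None
    value: Optional[str] = None


# keyword, comparison operator, value (e.g. "born < 1960")
_FILTER = re.compile(r"^(.*?)\s*(<=|>=|<|>|=)\s*(.+)$")


def parse_path(expression: str) -> Tuple[str, List[Step]]:
    """Split a path expression into its start URI and keyword steps."""
    parts = [part.strip() for part in expression.split(" / ")]
    start_uri, raw_steps = parts[0], parts[1:]
    steps: List[Step] = []
    for raw in raw_steps:
        match = _FILTER.match(raw)
        if match:
            steps.append(Step(match.group(1), match.group(2), match.group(3)))
        else:
            steps.append(Step(raw))
    return start_uri, steps
```

Each parsed step would then drive one hop of the navigation, with filter steps additionally comparing the object value of a matched triple.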

Future work

• Develop a functioning app (browser extension or add-on to Tabulator)

• Use third-party services to assist the navigation by matching synonyms or translations (BabelNet, WordNet)

• Use other third-party services to assist in the disambiguation of words using the context of the data acquired along the navigation (Babelfy)

• Better methods for effectively crawling Linked Datasets at runtime (that don't strain servers and provide quick response times)

Thanks!

@lmatteis http://lucaa.org