Keyword-Based Navigation and Search over the Linked Data Web
Luca Matteis¹, Aidan Hogan², Roberto Navigli¹
¹ Sapienza University of Rome
² University of Chile
General idea
• Browse the live linked data web using keywords
• Predicate resolution along the navigation to increase matches
• Results are streamed back to users as quickly as possible
• We measure how fast relevant triples are found at each step of the navigation
Navigation
• Navigation starts from a list of starting URIs
• Users/agents provide keywords to search against and guide the navigation
• Navigation is structured using a streaming pipeline
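The streaming pipeline described above can be sketched with Python generators. This is only an illustration under stated assumptions, not the authors' implementation: the `fetch` callable, the toy `GRAPH` data, and the substring match on the predicate are all hypothetical, and a real version would dereference URIs over HTTP.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Toy stand-in for the live Web: URI -> triples returned when dereferenced.
GRAPH = {
    "ex:film": [("ex:film", "director", "ex:alice"), ("ex:film", "title", "Example")],
    "ex:alice": [("ex:alice", "known for", "ex:other"), ("ex:alice", "name", "Alice")],
}

def fetch(uri: str) -> List[Triple]:
    """Hypothetical dereferencing step; here just a dictionary lookup."""
    return GRAPH.get(uri, [])

def stage(uris: Iterable[str], keyword: str,
          fetch: Callable[[str], List[Triple]]) -> Iterator[Triple]:
    """One pipeline element: yield each matching triple as soon as it is seen."""
    for uri in uris:
        for triple in fetch(uri):
            if keyword.lower() in triple[1].lower():
                yield triple

def navigate(start_uris: Iterable[str], keywords: Iterable[str],
             fetch: Callable[[str], List[Triple]] = fetch) -> Iterator[Triple]:
    """Chain one stage per keyword; objects found at one hop feed the next."""
    uris: Iterable[str] = start_uris
    triples: Iterator[Triple] = iter(())
    for keyword in keywords:
        triples = stage(uris, keyword, fetch)
        # Lazily pass the objects of matched triples on as the next hop's URIs.
        uris = (obj for _, _, obj in triples)
    return triples
```

Because every stage is a generator, a triple matched at the last hop can be returned to the caller before earlier hops have finished processing all of their URIs.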
Search
• Search occurs at each element of the pipeline
• Several RDF keyword search algorithms can be used
• Predicate resolution is used to increase the number of matches
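To make predicate resolution concrete: when a predicate URI carries no readable name, its description can be looked up and the keyword matched against the label found there. The sketch below is a minimal illustration. `PREDICATE_LABELS` is an assumed stand-in for dereferencing the predicate URI and reading a label, the `ex:` names are made up, and the substring test is a placeholder for whichever RDF keyword search algorithm is plugged in.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Assumed stand-in for dereferencing a predicate URI and reading its label.
PREDICATE_LABELS: Dict[str, str] = {
    "ex:p100": "movement",
    "ex:p200": "name",
}

def local_name(uri: str) -> str:
    """Last segment of a URI or prefixed name, e.g. 'ex:p100' -> 'p100'."""
    for sep in ("#", "/", ":"):
        if sep in uri:
            uri = uri.rsplit(sep, 1)[1]
    return uri

def match(keyword: str, triples: List[Triple], resolve: bool) -> List[Triple]:
    """Keep triples whose predicate matches the keyword.

    Without resolution only the URI's local name is searched; with
    resolution the predicate's label is searched as well.
    """
    keyword = keyword.lower()
    hits = []
    for triple in triples:
        predicate = triple[1]
        texts = [local_name(predicate)]
        if resolve:
            texts.append(PREDICATE_LABELS.get(predicate, ""))
        if any(keyword in text.lower() for text in texts):
            hits.append(triple)
    return hits

TRIPLES = [("ex:artist", "ex:p100", "ex:cubism"), ("ex:cubism", "ex:p200", "Cubism")]
```

With `resolve=False` the keyword "movement" finds nothing in this toy data, because the predicate is only visible as `p100`; with `resolve=True` the first triple matches through its label.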
SWGET comparison
• SWGET is an implementation of the NautiLOD navigational language
• It allows triples to be filtered (through SPARQL) at each step of the navigation
• We show that our pipeline streaming approach results in faster response times
Results
• Total response time is under 10 seconds (varies based on the number of keywords)
• Navigation hop time averages ~5 seconds
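One way to record how quickly results arrive from a streaming source is to timestamp each item as it is yielded. The helper names `timed` and `mean` below are hypothetical, and this is a generic sketch rather than the measurement setup used for the figures above.

```python
import time
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

def timed(results: Iterable[T]) -> Iterator[Tuple[float, T]]:
    """Pair each streamed result with the seconds elapsed since iteration began."""
    start = time.perf_counter()
    for item in results:
        yield time.perf_counter() - start, item

def mean(values: List[float]) -> float:
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0
```

Wrapping the result stream of a single hop in `timed` gives the time to each relevant triple; averaging the final timestamps over several hops gives a hop-time figure.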
Discussion
• Results suggest that keyword navigation is achievable, although somewhat slow.
• Experiments were run on the live linked data web! Servers optimized for concurrency and high throughput (e.g. Triple Pattern Fragments) might yield faster response times.
Final remarks
• Our approach incentivizes publishers to enrich their structured data (using predicates with meaningful descriptions)
• Concurrent resolution of many URIs at runtime to find answers to queries is becoming increasingly viable; growing bandwidth will make it even more practical
• Upfront querying may not be the only way we query the Web of Linked Data
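The concurrent resolution of many URIs mentioned above could look roughly like the following, using Python's standard thread pool. `fake_fetch` and the `ex:` URIs are hypothetical placeholders for an HTTP dereferencing step, and skipping failed fetches is an assumed policy, not something the slides specify.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]

def resolve_concurrently(uris: Iterable[str],
                         fetch: Callable[[str], List[Triple]],
                         max_workers: int = 8) -> Iterator[Triple]:
    """Dereference URIs in parallel, yielding triples as each fetch finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, uri) for uri in uris]
        for future in as_completed(futures):
            try:
                triples = future.result()
            except Exception:
                # Assumed policy: an unreachable server is skipped, not fatal.
                continue
            yield from triples

def fake_fetch(uri: str) -> List[Triple]:
    """Hypothetical fetcher; a real one would issue an HTTP GET."""
    if uri == "ex:broken":
        raise IOError("unreachable")
    return [(uri, "name", uri.upper())]
```

Results come back in completion order rather than submission order, so a slow server delays only its own triples.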
Use case
director: 1 triple found (view)
know: suggestions known for (17), knows (6), knowledge of (5), …
known for: 17 triples found (view)
act: suggestions actor (56), abstract (48), …
actor: 56 triples found (view)
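The suggestion lists in this use case (for example "act" offering both "actor" and "abstract") are consistent with substring matching over predicate labels, ranked by how many triples use each label. A minimal sketch under that assumption follows; the `suggest` function and the sample `labels` are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def suggest(fragment: str, predicate_labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (label, triple count) pairs whose label contains the fragment,
    most frequent first."""
    fragment = fragment.lower()
    counts = Counter(label for label in predicate_labels
                     if fragment in label.lower())
    return counts.most_common()

# One label per triple reachable at the current hop (made-up data).
labels = ["actor"] * 3 + ["abstract"] * 2 + ["director"]
```

Calling `suggest("act", labels)` on this toy data lists "actor" before "abstract" and leaves out "director", which does not contain the fragment.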
Users don't have to input URIs (as they do when writing SPARQL)
Nor do they have to know the exact structure of the underlying dataset
(they simply type keywords)
SELECT * { <http://viaf.org/viaf/177603646> onto:mov100 ?movement . ?movement my:lab ?label .}
http://viaf.org/viaf/177603646 / movement / name
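A keyword path such as the one above can be split into a starting URI and a list of keyword steps. The sketch below assumes that steps are separated by " / " with surrounding spaces, so that the slashes inside the URI are left alone; `parse_path` is a hypothetical helper, not part of any published tool.

```python
from typing import List, Tuple

def parse_path(expression: str) -> Tuple[str, List[str]]:
    """Split 'URI / keyword / keyword' into a start URI and keyword steps."""
    parts = [part.strip() for part in expression.split(" / ")]
    if not parts[0]:
        raise ValueError("expression must begin with a starting URI")
    # Drop empty steps left behind by stray separators.
    return parts[0], [part for part in parts[1:] if part]
```

For the path above this yields the VIAF URI as the start and the steps "movement" and "name", each of which would then drive one hop of the navigation.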
Query federation is built-in (we're simply following links)
http://viaf.org/viaf/177603646 / movement / same as / movement of / born < 1960 / same as freebase / name
(Path segments span three datasets: VIAF, DBpedia, Freebase)
Future work
• Develop a functioning app (browser extension or add-on to Tabulator)
• Use third-party services to assist the navigation by matching synonyms or translations (BabelNet, WordNet)
• Use other third-party services to assist in the disambiguation of words using the context of the data acquired along the navigation (Babelfy)
• Better methods for effectively crawling Linked Datasets at runtime (that don't strain servers and provide quick response times)