Searching Linked Data with Spinque
Michiel Hildebrand, Wouter Alink, Roberto Cornacchia, Arjen de Vries
Search Engines Amsterdam, January 30 2015
background
concept
product
Information Retrieval and DB integration
Cornacchia et al. Flexible and efficient IR using Array Databases. VLDB’08 Journal
Mühleisen et al. Column Stores as an IR Prototyping Tool. ECIR’14 & SIGIR’14
Search by Strategy
Alink et al. Searching CLEF-IP by strategy. CLEF’09
PatOlympics, 2010 and 2011
Tailored access to connected datasets
Koninklijke Bibliotheek, Wageningen Universiteit, Beeld&Geluid, Elsevier, Heineken, ...
Heterogeneous Data
Hang Li et al. A new approach to intranet search based on information extraction. CIKM’05
Complex information needs
SQL
CSV
XML
HTML
OAI
JSON
Heterogeneous University Data
Financial administration (ERP)
Contract administration (CMS)
Contract documents (CMS attachments)
Publication database (Institutional Repository)
Publication documents (Institutional Repository PDFs)
Employee database (address lists, ERP + CMS)
Companies (CMS + ERP + document mentions)
Subsidy database (CMS)
Departments (address lists, CMS)
Web addresses (extracted from documents)
Topic (assigned to publications)
Research programmes (dependent on funding scheme)
Complex information needs
What funding schemes are the primary source of income?
Can we move to Europe when Dutch funding dries up?
Who has active relations with partner X?
“Valorisation”; new national funding requirements
What industry sectors do we depend upon?
How many projects in smart cities? Green energy? Cloud computing? Etc.
How are strategic decisions implemented?
Has objective “move from Telecom toward ICT” been achieved, and how does it develop over time?
Heterogeneous University Data
Harvest and link data, model as a graph
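A minimal sketch of what "harvest and link, model as a graph" could look like. The record layouts, field names, and the to_triples helper are hypothetical and only illustrate the idea; they are not Spinque's actual indexing pipeline.

```python
from collections import defaultdict

# Hypothetical records, as they might arrive from two different sources.
erp_rows = [{"project": "P1", "funder": "EU-FP7"}]
cms_rows = [{"contract": "C1", "project": "P1", "partner": "Acme"}]

def to_triples(rows, subject_key):
    """Turn flat records into (subject, predicate, object) edges."""
    triples = []
    for row in rows:
        subject = row[subject_key]
        for key, value in row.items():
            if key != subject_key:
                triples.append((subject, key, value))
    return triples

# Both sources end up in one edge list; the shared id "P1" links them.
triples = to_triples(erp_rows, "project") + to_triples(cms_rows, "contract")

# Index edges in both directions so the graph can be traversed either way.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for s, p, o in triples:
    outgoing[s].append((p, o))
    incoming[o].append((p, s))
```

Because the ERP and CMS records use the same project identifier, contract C1 becomes reachable from project P1 without any source-specific join logic.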
Complex information needs
Search by Strategy
Project by topic
Search in attachments of projects
Search for project contracts (by metadata)
Traverse from attachments to projects & combine results
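The "project by topic" strategy can be sketched as two ranked branches that are merged per project. Everything here is an assumption for illustration: the toy data, the term-overlap text_score (a stand-in for a real ranking model), and the probabilistic-OR combination.

```python
def text_score(query, text):
    """Fraction of query terms found in the text (stand-in for a real ranker)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)

# Hypothetical data: attachments point to their project; contract metadata
# is keyed by project id.
attachments = {
    "A1": {"project": "P1", "text": "smart cities sensor pilot"},
    "A2": {"project": "P2", "text": "green energy storage"},
}
contracts = {"P1": {"title": "Urban mobility"}, "P2": {"title": "Smart energy grid"}}

def project_by_topic(query):
    # Branch 1: search attachment text, then traverse attachment -> project.
    via_attachments = {}
    for att in attachments.values():
        pid = att["project"]
        score = text_score(query, att["text"])
        via_attachments[pid] = max(via_attachments.get(pid, 0.0), score)
    # Branch 2: search contract metadata directly.
    via_metadata = {pid: text_score(query, c["title"]) for pid, c in contracts.items()}
    # Combine both branches with a probabilistic OR (assumed, not prescribed).
    combined = {}
    for pid in set(via_attachments) | set(via_metadata):
        a = via_attachments.get(pid, 0.0)
        b = via_metadata.get(pid, 0.0)
        combined[pid] = 1 - (1 - a) * (1 - b)
    ranked = [(pid, s) for pid, s in combined.items() if s > 0]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

With this toy data, project_by_topic("smart cities") ranks P1 first (matched through its attachment) and P2 second (a partial match on contract metadata).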
Topic expert
Search objects about topic
Expand with neighbours in and out
Return related persons, ranked by tf-idf on relations
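A rough sketch of the "topic expert" strategy. The slide does not define "tf-idf on relations", so the weighting below is one possible reading: tf counts a person's relations to matching objects, and idf discounts persons who are related to many objects overall. All data and names are hypothetical.

```python
import math

# Hypothetical objects, persons, and directed relations between them.
objects = {
    "D1": "cloud computing platform",
    "D2": "cloud storage costs",
    "D3": "annual report",
}
persons = {"alice", "bob", "carol"}
edges = [
    ("alice", "wrote", "D1"),
    ("alice", "wrote", "D2"),
    ("bob", "wrote", "D2"),
    ("bob", "wrote", "D3"),
    ("D2", "reviewed_by", "carol"),
]

def topic_experts(topic):
    # Step 1: search objects about the topic (simple term match).
    matched = {oid for oid, text in objects.items() if topic.lower() in text.split()}
    # Step 2: expand with neighbours, following edges in and out.
    tf = {}      # relations between a person and matching objects
    degree = {}  # all relations of a person, matching or not
    for s, _, o in edges:
        for person, other in ((s, o), (o, s)):
            if person in persons and other in objects:
                degree[person] = degree.get(person, 0) + 1
                if other in matched:
                    tf[person] = tf.get(person, 0) + 1
    # Step 3: rank persons by tf * idf (assumed form of the weighting).
    scores = {p: tf[p] * math.log(1 + len(objects) / degree[p]) for p in tf}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For the topic "cloud", alice ranks first (two relations to matching objects), then carol (one relation and no others), then bob (one matching relation out of two).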
Norbert Fuhr, Thomas Rölleke. A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems (1994)
API
STRATEGY EDITOR COMPILER
INDEXING PIPELINE
SQL
CSV
HTML
OAI
XML
APPLICATIONS
Search by Strategy
(visual) modelling of search processes
Rank. Everything. Always.
all-round probabilistic search
Many strategies, one data model
many search engines, one index
Components Supporting the Open Data Exploitation
API
SQL
CSV
HTML
OAI
XML
STRATEGY EDITOR COMPILER
APPLICATIONS
INDEXING PIPELINE
Application front-end: 400 lines of JavaScript
autocompletion
Application back-end: 3 search strategies
location search
location + text search
API Builder for Open Data?
Supporting (search) application developers
Gregory Grefenstette. Search-based applications. 2010
Jamie Callan. Search Engine Support For Software Applications. CIKM 2010 Keynote
Who builds search strategies?
Developers are not IR specialists
Neither are domain specialists
How to handle schema mess in a heterogeneous dataspace?
Happy alignments are all alike, every unhappy alignment is unhappy in its own way
Jacco van Ossenbruggen 2012 (improvisation on Anna Karenina, Leo Tolstoy 1877)
Alignment strategies
Interactive vocabulary alignment, Jacco van Ossenbruggen, Michiel Hildebrand, Victor de Boer, TPDL 2011
Coming soon
Spinque Alignment Service
Beeld&Geluid, Naturalis, Rijksdienst Cultureel Erfgoed (RCE)