semantic-assisted analysis and search in customer specifications
DESCRIPTION
Talk at Future Search Engines 2014 (FoRESEE), INFORMATIK2014 (http://www.informatik2014.de/)
Abstract (translated from German): Targeted search for information in large document collections is one of the key challenges of our time. This paper describes how we implemented the analysis of, and search in, multilingual customer specifications in a current customer project in mechanical engineering. Computational-linguistic and semantic technologies are used for document analysis. The search is based on the faceted browsing paradigm.
TRANSCRIPT
Semantic-assisted Analysis and
Search in Customer Specifications
Martin Voigt, Daniel Hladky
September 2014
[Diagram: ONTOS LINKED DATA INFORMATION WORKBENCH. Input: Multilingual Specifications. Components: Extraction & Analysis, Storage, Indexing, Information & Knowledge Management, Search, Portal. Users: Engineer, Sales.]
I speak about …
The Problem,
Our Solution,
Insights & Further Work.
The Problem
AviComp Controls GmbH
leading engineering contractor
for rotating machinery controls
Customers
Engineers
Sales
> 100k Technical Specifications
http://www.avicomp.com/capabilities/turbo-compressor-controls.html
The Problem
Analysis: 1) task, 2) current solution, 3) ideas
Problems
Multiple, inefficient tools
Heterogeneity
Knowledge management & transfer
http://answerhub.com/article/the-cost-of-knowledge-loss/
Our Solution
http://www.ontos.com/products/ontosldiw/
Our Solution
Extraction & Analysis
Homogenization: PDF conversion (Apache POI) & OCR (CuneiForm)
Text extraction (Apache Tika)
Language detection (language-detection API)
Text preparation, e.g., remove headers & footers
SKOS-based concept identification
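The last two steps above (text preparation and SKOS-based concept identification) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the vocabulary, the `ex:` concept URIs, the function names and the 60% page-share threshold are all assumptions, and the document language is passed in by hand instead of coming from a language-detection step.

```python
import re
from collections import Counter

# Hypothetical SKOS-style vocabulary: concept URI -> labels per language.
VOCABULARY = {
    "ex:Compressor": {"en": ["compressor", "turbo compressor"],
                      "de": ["Verdichter", "Kompressor"]},
    "ex:Valve": {"en": ["valve"], "de": ["Ventil"]},
}


def strip_repeated_lines(pages, min_share=0.6):
    """Drop lines that recur on at least min_share of the pages.

    Such lines are assumed to be running headers or footers.
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    threshold = min_share * len(pages)
    repeated = {line for line, n in counts.items()
                if len(pages) > 1 and n >= threshold}
    cleaned = []
    for page in pages:
        kept = [line for line in page.splitlines()
                if line.strip() and line.strip() not in repeated]
        cleaned.append("\n".join(kept))
    return "\n".join(cleaned)


def identify_concepts(text, language):
    """Return the concept URIs whose labels for `language` occur in the text."""
    found = set()
    for uri, labels in VOCABULARY.items():
        for label in labels.get(language, []):
            # Whole-word, case-insensitive match of the label.
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(uri)
    return sorted(found)
```

A plain label lookup like this does no stemming or disambiguation; it only shows where a controlled vocabulary enters the pipeline.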
Our Solution
Storage via OntoQUAD
Triple and/or QuadStore, SPARQL 1.1, …
Indexing
Full text search, result grouping, faceted browsing, SKOS-based label expansion, …
Apache Solr with lucene-skos plugin (https://github.com/behas/lucene-SKOS)
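To make the indexing features above more concrete, here is a small sketch of how a query with label expansion, facets and result grouping could be assembled for Solr's select handler. The request parameters (`q`, `fq`, `facet`, `facet.field`, `group`, `group.field`, `wt`) are standard Solr parameters; the label table, the field names and the helper functions are assumptions. The expansion is done on the client side here purely for illustration, whereas the lucene-skos plugin performs it inside the analysis chain.

```python
from urllib.parse import urlencode

# Hypothetical label table: concept URI -> all labels across languages.
LABELS = {
    "ex:Compressor": ["compressor", "Verdichter", "Kompressor"],
}


def expand_term(term):
    """Return the term followed by the other labels of any matching concept."""
    expanded = [term]
    for labels in LABELS.values():
        if term.lower() in (label.lower() for label in labels):
            expanded.extend(label for label in labels
                            if label.lower() != term.lower())
    return expanded


def build_select_params(term, facet_fields, filters=None, group_field=None):
    """Build a URL-encoded query string for a faceted Solr select request."""
    query = " OR ".join('"{}"'.format(label) for label in expand_term(term))
    params = [("q", query), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", name) for name in facet_fields]
    params += [("fq", clause) for clause in (filters or [])]
    if group_field:
        params += [("group", "true"), ("group.field", group_field)]
    return urlencode(params)
```

For example, `build_select_params("Verdichter", ["language", "customer"], group_field="customer")` yields a query string that searches all three labels, requests facet counts for two (hypothetical) fields and groups the results by customer.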
Our Solution
Knowledge Management
via OntoDix but SKOS-only
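As an illustration of what a SKOS-only knowledge base holds, the sketch below models one concept with multilingual preferred and alternative labels plus a broader link, and writes it out as Turtle. The `Concept` class, the example URIs and the labels are assumptions and say nothing about how OntoDix stores its data; label strings are not escaped, so quotes inside labels would need extra handling.

```python
from dataclasses import dataclass, field

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"


@dataclass
class Concept:
    uri: str
    pref_labels: dict                               # language -> label
    alt_labels: dict = field(default_factory=dict)  # language -> list of labels
    broader: list = field(default_factory=list)     # URIs of broader concepts


def to_turtle(concept):
    """Serialize one concept as a Turtle statement (prefix line not included)."""
    lines = ["<{}> a skos:Concept".format(concept.uri)]
    for lang, label in sorted(concept.pref_labels.items()):
        lines.append('    skos:prefLabel "{}"@{}'.format(label, lang))
    for lang, labels in sorted(concept.alt_labels.items()):
        for label in labels:
            lines.append('    skos:altLabel "{}"@{}'.format(label, lang))
    for parent in concept.broader:
        lines.append("    skos:broader <{}>".format(parent))
    return " ;\n".join(lines) + " .\n"
```

Prepending `SKOS_PREFIX` to the output of `to_turtle` gives a self-contained Turtle snippet that a triple store could load.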
Our Solution
Search
via AJAX Solr (https://github.com/evolvingweb/ajax-solr)
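AJAX Solr itself is a JavaScript library; the Python sketch below only illustrates the data a faceted browsing front end works with. In Solr's default JSON response, each facet field is a flat list that alternates values and counts, and a client has to pair them up before rendering facet links. The field name `language` and the sample response are assumptions.

```python
import json


def facet_counts(response_text, field_name):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"].get(field_name, [])
    # Even positions hold facet values, odd positions hold their counts.
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical response body for a query with facet.field=language.
SAMPLE_RESPONSE = json.dumps({
    "response": {"numFound": 3, "docs": []},
    "facet_counts": {"facet_fields": {"language": ["de", 2, "en", 1]}},
})
```

Selecting one of the returned values in the user interface would typically add a matching `fq` filter to the next request.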
Insights & Further Work
Iterative development with early customer testing lowers usage barrier
Lessons learned
Development of a knowledge base
Faceted search user interface
Faceted search on RDF
Multilingual disambiguation mechanisms
Q&A
Martin Voigt
Ontos AG / GmbH
Nidau (CH) / Leipzig (DE)
T: +49 341 21559-10
M: +49 178 40 222 58
About Ontos
DoW – CTI Project
Ontos Group
Key Facts
- Established 2001
- 15+ employees
- Share in Eventos RU (30 people)
- approx. 5 million CHF turnover
Industry
- Media/News
- Law Enforcement
- Government
- (Russia)