linked data-based concept recommendation: comparison of different methods in open innovation...
DESCRIPTION
Concept recommendation is a widely used technique that helps users choose the right tags, improve their Web search experience, and accomplish a multitude of other tasks. In finding potential problem solvers in Open Innovation (OI) scenarios, concept recommendation is of crucial importance, as it can help to discover the right topics, directly or laterally related to an innovation problem. Such topics could then be used to identify relevant experts. We propose two Linked Data-based concept recommendation methods for topic discovery. The first one, hyProximity, exploits only the particularities of Linked Data structures, while the other applies a well-known Information Retrieval method, Random Indexing, to the linked data. We compare the two methods against the baseline in gold standard-based and user study-based evaluations, using real problems and solutions from an OI company.
TRANSCRIPT
Linked Data-based Concept Recommendation: Comparison of Different Methods in Open Innovation Scenario
Danica Damljanovic, Milan Stankovic, Philippe Laublet
Innovation
Innovation Platforms
Challenge: Promote innovation problems to an audience of solvers who can propose relevant innovative solutions
Finding Meaningful Connections
Clay mining …
Kaolinite extraction from rocks …
Different communities use different terms and concepts to speak about semantically related things. Such “language” defines communities and separates them. Being able to find meaningful connections between concepts would enable us to build bridges between people and content.
http://bit.ly/hyProximity
Concept recommendation
• Concepts you might not know but might want to use: to annotate your content, to search for content, to search for people…
• Help problem promoters discover relevant concepts (problem promoters are sometimes not field experts)
• Discovery = relevance + unexpectedness
• Structure-based similarity: hyProximity
• Statistical semantics similarity: Random Indexing, a well-known statistical semantics method from Information Retrieval, applied to RDF
Discovering Direct and Lateral Concepts
Linked Data-based Concept Recommendation
[Figure: pipeline. Textual input → Zemanta → DBpedia concepts found in the text → DBpedia exploration → suggestions]
hyProximity
• We start from several seed concepts found directly in the text, and search the DBpedia graph
• The concepts found in the proximity of several seed concepts are considered more “in context” for the given input
• Concepts found at a shorter distance from the seed concepts have higher hyProximity
• Hierarchical: exploring skos:broader relations
• Transversal: exploring transversal links
• Mixed: a linear combination of hierarchical and transversal
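The bullets above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the 1/distance weighting, the undirected traversal, the depth limit and the `alpha` mixing weight are all assumptions made for illustration. Node names are borrowed from the example slides; the edges between them are guesses.

```python
from collections import deque

# Toy graph. Node names come from the slides; the edges are assumptions.
HIERARCHICAL = {  # skos:broader-style links
    "Paris": ["Cities in France"],
    "Seine": ["Rivers in France"],
    "Marne": ["Rivers in France"],
    "Cities in France": ["Things in France"],
    "Rivers in France": ["Things in France"],
}
TRANSVERSAL = {  # links via other properties
    "Seine": ["Paris"],
    "Paris": ["Chanel"],
}

def undirected(edges):
    """Build an undirected adjacency map from a source -> targets map."""
    adj = {}
    for source, targets in edges.items():
        for target in targets:
            adj.setdefault(source, set()).add(target)
            adj.setdefault(target, set()).add(source)
    return adj

def distances(adj, seed, max_depth):
    """Breadth-first distances from one seed, up to max_depth hops."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def proximity(adj, seeds, max_depth=3):
    """Sum 1/distance over all seeds (assumed weighting): concepts close to
    several seeds score highest."""
    scores = {}
    for seed in seeds:
        for node, d in distances(adj, seed, max_depth).items():
            if node not in seeds:
                scores[node] = scores.get(node, 0.0) + 1.0 / d
    return scores

def mixed(seeds, alpha=0.5, max_depth=3):
    """Linear combination of hierarchical and transversal scores."""
    h = proximity(undirected(HIERARCHICAL), seeds, max_depth)
    t = proximity(undirected(TRANSVERSAL), seeds, max_depth)
    return {c: alpha * h.get(c, 0.0) + (1 - alpha) * t.get(c, 0.0)
            for c in set(h) | set(t)}

ranking = sorted(mixed({"Paris", "Seine"}).items(), key=lambda kv: -kv[1])
```

With `alpha=1.0` the sketch reduces to the hierarchical variant, and with `alpha=0.0` to the transversal one.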
Different Distance Functions
[Figure: example graph with nodes Paris, Seine, Marne, Chanel, BMW, Peugeot and categories Rivers in France, Cities in France, Things in France, Products of France, Car Industry; legend: skos:broader, other property; distance labels shown: 2, 2, 2, 2+1]
research.hypios.com/hyproximity
Different Distance Functions
[Figure: example graph with nodes Paris, Seine, Marne, Chanel, BMW, Peugeot and categories Rivers in France, Cities in France, Things in France, Products of France, Car Industry; labels shown on links: “flows through”, “competitor”, “famous for”, “fashion”; legend: skos:broader, other property; distance labels shown: 1, 1, 1]
Random Indexing
• Words that appear in similar contexts (with the same set of other words) are contextually related, e.g. synonyms.
• Synonyms tend not to co-occur with one another directly, so indirect inference is required to draw associations between words used to express the same idea
Two steps to Random Indexing
• Indexing
  o For an RDF graph, generate virtual documents
  o Prepare the corpus (pre-processing)
  o Generate semantic index
• Search: given a term X, calculate the cosine similarity between the vector of that term and the other vectors in the semantic space
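The search step can be sketched as follows. This is a minimal illustration, assuming the semantic space is already available as a plain dictionary from term to vector; the `SPACE` contents and the function names are hypothetical and do not come from the slides.

```python
import math

# Hypothetical semantic space: term -> context vector (values are made up).
SPACE = {
    "clay": [1.0, 2.0, 0.0],
    "kaolinite": [2.0, 4.0, 0.0],
    "rocks": [1.0, 1.0, 0.0],
    "fashion": [0.0, 0.0, 3.0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(term, space, top_k=3):
    """Rank every other term in the space by cosine similarity to `term`."""
    target = space[term]
    ranked = sorted(
        ((cosine(target, vec), other)
         for other, vec in space.items() if other != term),
        reverse=True,
    )
    return [(other, score) for score, other in ranked[:top_k]]

suggestions = most_similar("clay", SPACE)
```

Terms whose vectors point in nearly the same direction as the query term are returned first, regardless of vector length.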
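Building context vectors
Index vectors for documents (dimensionality = n; seed length):
d1: 0 0 -1 1 -1 1
d2: -1 1 0 0 1 -1
…
dp: 0 1 0 -1 -1 1
Term-document matrix:
     d1  d2  ..  dp
t1   1   2   ..  0
t2   3   0   ..  0
..   ..  ..  ..  ..
tq   0   1   ..  10
[Figure: term-document matrix × document index vectors = context vectors for t1, t2, …, tq; remaining labels in the figure: M, D, T]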
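A minimal sketch of how such context vectors could be accumulated. The assumptions here are not confirmed by the slides: the seed length is treated as the number of non-zero entries in each random index vector, +1 and -1 entries are balanced, and a term's context vector is the frequency-weighted sum of the index vectors of the documents it occurs in. All names and default values are hypothetical.

```python
import random

def index_vector(dim, seed_length, rng):
    """Sparse ternary vector: seed_length non-zero entries, half +1, half -1."""
    vec = [0] * dim
    positions = rng.sample(range(dim), seed_length)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec

def build_context_vectors(term_doc_counts, dim=16, seed_length=4, seed=0):
    """term_doc_counts maps term -> {document id: frequency}.

    Returns (document index vectors, term context vectors)."""
    rng = random.Random(seed)
    doc_ids = sorted({d for counts in term_doc_counts.values() for d in counts})
    doc_vectors = {d: index_vector(dim, seed_length, rng) for d in doc_ids}
    context = {}
    for term, counts in term_doc_counts.items():
        vec = [0] * dim
        for doc_id, freq in counts.items():
            # Add the document's index vector once per occurrence of the term.
            for i, x in enumerate(doc_vectors[doc_id]):
                vec[i] += freq * x
        context[term] = vec
    return doc_vectors, context

# Hypothetical counts: t1 occurs once in d1 and twice in d2, t2 three times in d1.
counts = {"t1": {"d1": 1, "d2": 2}, "t2": {"d1": 3}}
doc_vectors, context = build_context_vectors(counts)
```

Because the dimensionality is fixed up front, adding documents does not grow the vectors, which is the usual motivation for the random projection.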
Indexing: virtual documents
[Figure: a representative subgraph for URI=S (resources S, O1, O2; properties P1 to P10; literals L1 to L8) is lexicalised into a virtual document for URI=S, made of triple-like lines such as “S P3 L3”, “S P4 O1” and “S P7 O2”]
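One way to picture the lexicalisation step is the sketch below. It is an assumption-laden illustration, not the procedure from the slides: the representative subgraph is approximated by all triples within a fixed number of hops of the URI, literals are marked with a leading double quote, and each triple becomes one line of local names. The `http://example.org/` URIs are placeholders.

```python
def is_literal(node):
    """Assumed convention for this sketch: literals start with a double quote."""
    return node.startswith('"')

def lexicalise(node):
    """Turn a literal or a URI into plain text (local name for URIs)."""
    if is_literal(node):
        return node.strip('"')
    return node.rsplit("/", 1)[-1].rsplit("#", 1)[-1]

def virtual_document(uri, triples, hops=2):
    """Collect triples within `hops` of `uri` and emit one text line per triple."""
    frontier, seen, used, lines = {uri}, {uri}, set(), []
    for _ in range(hops):
        next_frontier = set()
        for triple in triples:
            s, p, o = triple
            if triple in used or not (s in frontier or o in frontier):
                continue
            used.add(triple)
            lines.append(" ".join(lexicalise(x) for x in triple))
            for node in (s, o):
                # Literals are never expanded further.
                if node not in seen and not is_literal(node):
                    seen.add(node)
                    next_frontier.add(node)
        frontier = next_frontier
    return "\n".join(lines)

EX = "http://example.org/"  # placeholder namespace
TRIPLES = [
    (EX + "S", EX + "P3", '"L3"'),
    (EX + "S", EX + "P4", EX + "O1"),
    (EX + "O1", EX + "P5", '"L4"'),
    (EX + "S", EX + "P7", EX + "O2"),
    (EX + "X", EX + "P9", EX + "Y"),  # unrelated triple, never reached
]
doc = virtual_document(EX + "S", TRIPLES)
```

The resulting text can then be pre-processed and indexed like any other document collection.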
Experiments
• 26 real innovation problems from Hypios
• Measure of success: the suggested concepts appear in the actual solutions (precision, recall, F-measure)
  (+) reasonable list of concepts from real scenarios
  (-) not complete:
    o User study: measure discovery = relevance + unexpectedness
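The gold-standard measure can be sketched as below, assuming the suggested concepts and the concepts found in the actual solutions are both available as collections of URIs. The function name and the example URIs are hypothetical, and the balanced F-measure (F1) is assumed.

```python
def precision_recall_f1(suggested, gold):
    """Compare suggested concept URIs against the URIs found in the solutions."""
    suggested, gold = set(suggested), set(gold)
    hits = len(suggested & gold)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical URIs: 4 suggestions, 5 gold concepts, 2 in common.
suggested = ["ex:a", "ex:b", "ex:c", "ex:d"]
gold = ["ex:a", "ex:b", "ex:x", "ex:y", "ex:z"]
p, r, f1 = precision_recall_f1(suggested, gold)
```

Since the gold standard is incomplete, a suggestion counted as a miss here may still be relevant, which is the gap the user study is meant to cover.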
DBpedia Dataset
• Select a number of properties relevant to the Open Innovation-related scenario
• dbo:product, dbp:products, dbo:industry, dbo:service, dbo:genre, and properties serving to establish a hierarchical categorization of concepts, namely dc:subject and skos:broader
Evaluation
• “Gold standard”
  o Extract problem URIs
  o Extract solution URIs
• Baseline:
  o Google AdWords Keyword Tool: finds similar topics based on their distribution in textual corpora and the corpora of search queries.
  o Suggesting up to 600 concepts which are then used for Web crawling for finding experts.
Evaluation: Results
User Study
• Suggestions being both relevant and unexpected
  o the most valuable discoveries for the user
• 12 users
• 34 problem evaluations
  o 3060 suggested concepts/keywords.
• For the chosen innovation problem, the evaluators were presented with the lists of 30 top-ranked suggestions generated by AdWords, hyProximity (mixed approach) and Random Indexing.
Example
User Study: Results
Conclusion
• Linked Data is a valuable source of knowledge for concept recommendation
• Our two methods are complementary
  o hyProximity better for precision
  o Random Indexing better for recall
• User study: unexpectedness higher with our methods than with the baseline
• Subjective user comments:
  o Random Indexing: generic
  o hyProximity: granular
  o AdWords: redundant
Thank You!
• Find out more: http://research.hypios.com/?page_id=165
Contact us:
• Danica Damljanovic: @dancheeee
• Milan Stankovic: @milstan