linked data-based concept recommendation: comparison of different methods in open innovation...

Linked Data-based Concept Recommendation: Comparison of Different Methods in Open Innovation Scenario

Danica Damljanovic, Milan Stankovic, Philippe Laublet

Innovation

Innovation Platforms

Challenge: Promote innovation problems to an audience of solvers who can propose relevant innovative solutions

Finding Meaningful Connections

Clay mining …

Kaolinite extraction from rocks …

Different communities use different terms and concepts to speak about semantically related things. Such “language” defines communities and separates them. Being able to find meaningful connections between concepts would enable us to build bridges between people and content.

http://bit.ly/hyProximity

Concept recommendation

•  Concepts you might not know but might want to use: to annotate your content, to search for content, to search for people…
•  Help problem promoters discover relevant concepts (problem promoters are sometimes not field experts)
•  Discovery = relevance + unexpectedness


Discovering Direct and Lateral Concepts

•  HyProximity: a structure-based similarity
•  Random Indexing: a statistical semantics similarity, applying a well-known statistical semantics method from Information Retrieval to RDF

Linked Data-based Concept Recommendation

[Figure: pipeline. Textual input → Zemanta → DBpedia concepts found in the text → DBpedia exploration → suggestions]


hyProximity

•  We start from several seed concepts found directly in the text, and search the DBPedia graph

•  The concepts found in the proximity of several seed concepts are considered more “in context” for the given input

•  Concepts found at a shorter distance from the seed concepts have higher hyProximity
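The three bullets above can be read as a scoring rule. The following is only a minimal sketch of that reading, not the authors' formula: it assumes an undirected toy graph, breadth-first hop counts as the distance, and a contribution of 1/(1+d) per seed. The function names, the toy edges and the `max_hops` cut-off are all hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hop_distances(graph, source, max_hops):
    """Breadth-first hop counts from source, explored up to max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def proximity_scores(graph, seeds, max_hops=2):
    """Sum 1/(1+d) over all seeds that reach a concept (assumed scoring rule)."""
    seeds = set(seeds)
    scores = {}
    for seed in seeds:
        for concept, d in hop_distances(graph, seed, max_hops).items():
            if concept in seeds:
                continue  # seeds are the input, not suggestions
            scores[concept] = scores.get(concept, 0.0) + 1.0 / (1 + d)
    return scores

# Illustrative toy edges only; they are not a reproduction of any slide figure.
TOY_EDGES = [
    ("Paris", "Cities in France"),
    ("Seine", "Rivers in France"),
    ("Marne", "Rivers in France"),
    ("Cities in France", "Things in France"),
    ("Rivers in France", "Things in France"),
]
scores = proximity_scores(build_graph(TOY_EDGES), ["Paris", "Seine"])
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {score:.3f}")
```

In this toy run, "Things in France" is reachable from both seeds and ends up ranked first, while "Rivers in France" (one hop from a seed) outranks "Marne" (two hops), which mirrors the second and third bullets.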

Different Distance Functions

•  Hierarchical: exploring skos:broader relations
•  Transversal: exploring transversal links
•  Mixed: a linear combination of hierarchical and transversal

[Figure: example graph. Nodes: Paris, Seine, Marne, Chanel, BMW, Peugeot, Rivers in France, Cities in France, Things in France, Products of France, Car Industry. Edge types: skos:broader, other property. Distance annotations: 2, 2, 2, 2+1. Source: research.hypios.com/hyproximity]
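The hierarchical, transversal and mixed variants listed above differ only in which links they are allowed to follow. A minimal sketch of that idea, assuming triples of plain strings, hop counts restricted by property, a closeness of 1/(1+d), and a hypothetical weight `alpha` for the linear combination; the slides do not give the actual distance formulas, and the toy triples simplify how DBpedia links concepts to categories.

```python
from collections import deque

BROADER = "skos:broader"

def neighbours(triples, node, keep):
    """Nodes linked to `node` in either direction by a property accepted by `keep`."""
    found = set()
    for s, p, o in triples:
        if keep(p):
            if s == node:
                found.add(o)
            elif o == node:
                found.add(s)
    return found

def distance(triples, start, goal, keep, max_hops=4):
    """Shortest hop count over accepted properties, or None if unreachable."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return seen[node]
        if seen[node] == max_hops:
            continue
        for nxt in neighbours(triples, node, keep):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return None

def closeness(d):
    """Assumed conversion from distance to similarity."""
    return 0.0 if d is None else 1.0 / (1 + d)

def hierarchical(triples, a, b):
    return closeness(distance(triples, a, b, lambda p: p == BROADER))

def transversal(triples, a, b):
    return closeness(distance(triples, a, b, lambda p: p != BROADER))

def mixed(triples, a, b, alpha=0.5):
    """Linear combination; alpha is a hypothetical weight."""
    return alpha * hierarchical(triples, a, b) + (1 - alpha) * transversal(triples, a, b)

# Illustrative toy triples only.
TOY_TRIPLES = [
    ("Seine", BROADER, "Rivers in France"),
    ("Marne", BROADER, "Rivers in France"),
    ("Paris", BROADER, "Cities in France"),
    ("Seine", "flows through", "Paris"),
]
print(mixed(TOY_TRIPLES, "Seine", "Marne"))  # related only through the hierarchy
print(mixed(TOY_TRIPLES, "Seine", "Paris"))  # related only through a transversal link
```

Keeping the two components separate makes the weight easy to tune: with `alpha=1.0` the mixed score reduces to the hierarchical one, and with `alpha=0.0` to the transversal one.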

Different Distance Functions

[Figure: example graph with labelled links. Nodes: Paris, Seine, Marne, Chanel, BMW, Peugeot, Rivers in France, Cities in France, Things in France, Products of France, Car Industry, “fashion”. Edge types: skos:broader, other property. Link labels: flows through, competitor, famous for. Distance annotations: 1, 1, 1. Source: research.hypios.com/hyproximity]


Random Indexing

•  Words that appear in similar contexts (with the same set of other words) are contextually related, e.g. synonyms.
•  Synonyms tend not to co-occur with one another directly, so indirect inference is required to draw associations between words used to express the same idea.

Two steps to Random Indexing

•  Indexing
   o  For an RDF graph, generate virtual documents
   o  Prepare the corpus (pre-processing)
   o  Generate the semantic index
•  Search: given a term X, calculate the cosine similarity between the vector of that term and the other vectors in the semantic space
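The search step can be sketched as a ranking by cosine similarity. This is a minimal illustration that assumes the semantic space is already available as a dictionary from term to vector; the vectors and the `most_similar` helper are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(term, vectors, top_k=3):
    """Rank every other term by cosine similarity to `term`."""
    target = vectors[term]
    ranked = sorted(
        ((cosine(target, vec), other) for other, vec in vectors.items() if other != term),
        reverse=True,
    )
    return [(other, score) for score, other in ranked[:top_k]]

# Made-up vectors for illustration; real ones come out of the indexing step.
TOY_VECTORS = {
    "clay": [2, 0, 1],
    "kaolinite": [4, 0, 2],
    "fashion": [0, 3, 0],
}
ranking = most_similar("clay", TOY_VECTORS)
print(ranking)
```

Because cosine similarity ignores vector length, "kaolinite" ranks first here even though its vector is twice as long as the one for "clay".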

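Building context vectors

M × D = T

M: term-document matrix (terms t1 … tq, documents d1 … dp)

      d1  d2  ..  dp
t1     1   2  ..   0
t2     3   0  ..   0
..    ..  ..  ..  ..
tq     0   1  ..  10

D: document vectors (dimensionality = n; the seed length is the number of non-zero entries)

d1     0   0  -1   1  -1   1
d2    -1   1   0   0   1  -1
…
dp     0   1   0  -1  -1   1

T: resulting term vectors t1, t2, …, tq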
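A minimal sketch of how such context vectors can be built, assuming each document receives a sparse random vector of +1/-1 entries and each term vector is the sum of the vectors of the documents the term occurs in, once per occurrence. The dimensionality, seed length, alternating-sign scheme and function names are illustrative choices, not values taken from the slides.

```python
import random

def index_vector(dimension, seed_length, rng):
    """Sparse ternary vector with seed_length non-zero entries, alternating +1 and -1."""
    vec = [0] * dimension
    positions = rng.sample(range(dimension), seed_length)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec

def build_term_vectors(documents, dimension=50, seed_length=4, seed=0):
    """Add each document's index vector to the vector of every token it contains."""
    rng = random.Random(seed)
    term_vectors = {}
    for tokens in documents:
        doc_vec = index_vector(dimension, seed_length, rng)
        for token in tokens:
            vec = term_vectors.setdefault(token, [0] * dimension)
            for i, value in enumerate(doc_vec):
                vec[i] += value
    return term_vectors

# Toy corpus: each inner list stands for one tokenised (virtual) document.
TOY_DOCS = [["clay", "mining"], ["clay", "kaolinite"], ["fashion"]]
term_vectors = build_term_vectors(TOY_DOCS)
print(term_vectors["clay"])
```

The appeal of this construction is that the term vectors stay at a fixed, small dimensionality no matter how many documents are indexed, and new documents can be added incrementally.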

Indexing: virtual documents

[Figure: the representative subgraph for URI=S (resources S, O1, O2; properties P1 to P10; literals L1 to L8) is lexicalised into the virtual document for URI=S, a sequence of labels built from the statements around S, such as "S P3 L3", "S P4 O1" and "S P7 O2"]
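A minimal sketch of the lexicalisation idea, assuming triples are plain string tuples, the representative subgraph is every triple within a fixed number of hops of the URI, and lexicalising means writing out the label of each subject, property and object in turn. The hop limit, the `labels` mapping and the function names are hypothetical; the slide does not specify how the subgraph is chosen.

```python
def representative_subgraph(triples, uri, hops=1):
    """Triples reachable from `uri` within `hops` steps, in either direction."""
    frontier = {uri}
    selected = []
    for _ in range(hops):
        next_frontier = set()
        for triple in triples:
            s, _, o = triple
            if (s in frontier or o in frontier) and triple not in selected:
                selected.append(triple)
                next_frontier.update((s, o))
        frontier = next_frontier
    return selected

def lexicalise(triples, labels):
    """Replace each term by its label (falling back to the term itself) and join."""
    words = []
    for triple in triples:
        words.extend(labels.get(term, term) for term in triple)
    return " ".join(words)

def virtual_document(triples, uri, labels=None, hops=1):
    return lexicalise(representative_subgraph(triples, uri, hops), labels or {})

# Toy triples reusing the slide's placeholder names; the last one is unrelated to S.
TOY_TRIPLES = [
    ("S", "P3", "L3"),
    ("S", "P4", "O1"),
    ("O1", "P5", "L4"),
    ("X", "P9", "Y"),
]
print(virtual_document(TOY_TRIPLES, "S"))
print(virtual_document(TOY_TRIPLES, "S", hops=2))
```

Each virtual document can then be tokenised and fed to the indexing step like any ordinary text document.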

Experiments

•  26 real innovation problems from Hypios
•  Measure of success: the suggested concepts appear in the actual solutions (precision, recall, F-measure)
   (+) reasonable list of concepts from real scenarios
   (-) not complete:
   o  User study: measure discovery = relevance + unexpectedness
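The success measure can be sketched with the standard set-based definitions of precision, recall and F-measure, treating the concepts found in the actual solutions as the gold standard. The concept names below are made up for illustration.

```python
def precision_recall_f1(suggested, gold):
    """Set-based precision, recall and balanced F-measure."""
    suggested, gold = set(suggested), set(gold)
    hits = len(suggested & gold)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical suggestion list and gold-standard solution concepts.
suggested = ["Kaolinite", "Clay", "Mining", "Fashion"]
gold = ["Kaolinite", "Clay", "Flotation"]
p, r, f = precision_recall_f1(suggested, gold)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Because the gold standard is incomplete, a suggestion counted as a miss here may still be useful, which is the gap the user study is meant to cover.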

DBpedia Dataset

•  Select a number of properties relevant to the Open Innovation-related scenario
•  dbo:product, dbp:products, dbo:industry, dbo:service, dbo:genre, and properties serving to establish a hierarchical categorization of concepts, namely dc:subject and skos:broader
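Restricting the graph to the selected properties can be sketched as a simple whitelist filter. This assumes triples are held as prefixed-name string tuples; the sample triples and the `filter_triples` helper are hypothetical, and a real setup would more likely apply the restriction in the query sent to the DBpedia endpoint.

```python
ALLOWED_PROPERTIES = {
    "dbo:product",
    "dbp:products",
    "dbo:industry",
    "dbo:service",
    "dbo:genre",
    "dc:subject",
    "skos:broader",
}

def filter_triples(triples, allowed=ALLOWED_PROPERTIES):
    """Keep only the triples whose property is in the whitelist."""
    return [t for t in triples if t[1] in allowed]

# Made-up sample triples for illustration.
SAMPLE = [
    ("dbr:Peugeot", "dbo:industry", "dbr:Automotive_industry"),
    ("dbr:Peugeot", "dbo:foundingYear", "1810"),
    ("dbr:Seine", "dc:subject", "dbc:Rivers_of_France"),
]
kept = filter_triples(SAMPLE)
print(kept)
```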

Evaluation

•  “Gold standard”
   o  Extract problem URIs
   o  Extract solution URIs
•  Baseline:
   o  Google AdWords Keyword Tool: finds similar topics based on their distribution in textual corpora and the corpora of search queries.
   o  Suggesting up to 600 concepts, which are then used for Web crawling to find experts.

Evaluation: Results


User Study

•  Suggestions that are both relevant and unexpected
   o  the most valuable discoveries for the user
•  12 users
•  34 problem evaluations
   o  3060 suggested concepts/keywords
•  For the chosen innovation problem, the evaluators were presented with the lists of the 30 top-ranked suggestions generated by AdWords, hyProximity (mixed approach) and Random Indexing.

Example

User Study: Results

Conclusion

•  Linked Data is a valuable source of knowledge for concept recommendation
•  Our two methods are complementary
   o  hyProximity is better for precision
   o  Random Indexing is better for recall
•  User study: unexpectedness is higher with our methods than with the baseline
•  Subjective user comments:
   o  Random Indexing: generic
   o  hyProximity: granular
   o  AdWords: redundant

Thank You!

•  Find out more: http://research.hypios.com/?page_id=165

Contact us:
•  Danica Damljanovic: @dancheeee
•  Milan Stankovic: @milstan
