semashup - ensen in aimashup2014 by m.alsarem and p.portier
SEMashup -Mazen Alsarem & Pierre-Edouard Portier 1
How to enhance Web snippets with Linked Data?
Mazen Alsarem & Pierre-Edouard Portier
Laboratory LIRIS, INSA de Lyon, France
SEMashup
Given the query “epimenides knossos paradox”, we find these snippets among the first results returned by the Google search engine:
We enhance these snippets:
Our snippet highlights an alternative excerpt to better summarize the conceptual content of the document.
Alternative excerpt:
Our snippet also accentuates concepts that are present in the document and related to the user's information need as expressed by her query.
Important concepts:
After clicking the concept “Epimenides”:
Auto scrolling to an instance of the concept “Epimenides” in the underlying document:
How is it done?
A mashup of Web of Data services
We use the DBpedia Spotlight service to extract concepts from the document.
We query a DBpedia SPARQL endpoint to find existing triples between the concepts.
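The two steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample payload is hand-written in the shape the public DBpedia Spotlight JSON output is assumed to have (a `Resources` list whose items carry `@URI` and `@surfaceForm`), and the function names and the form of the SPARQL query are assumptions made for the example. No network call is made.

```python
def concept_uris(spotlight_payload):
    """Return the distinct resource URIs of a Spotlight-style payload,
    in order of first appearance."""
    seen = []
    for resource in spotlight_payload.get("Resources", []):
        uri = resource["@URI"]
        if uri not in seen:
            seen.append(uri)
    return seen


def triples_between_query(uris):
    """Build a SPARQL 1.1 query returning the triples whose subject and
    object both belong to `uris` (hypothetical query shape)."""
    values = " ".join(f"<{uri}>" for uri in uris)
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  VALUES ?s {{ {values} }}\n"
        f"  VALUES ?o {{ {values} }}\n"
        "  ?s ?p ?o .\n"
        "}"
    )


# Hand-written sample payload (assumed shape, not a recorded response).
SAMPLE_PAYLOAD = {
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Epimenides", "@surfaceForm": "Epimenides"},
        {"@URI": "http://dbpedia.org/resource/Knossos", "@surfaceForm": "Knossos"},
        {"@URI": "http://dbpedia.org/resource/Epimenides", "@surfaceForm": "Epimenides"},
    ]
}

uris = concept_uris(SAMPLE_PAYLOAD)
query = triples_between_query(uris)
print(query)
```

The resulting query string would then be sent to a DBpedia SPARQL endpoint; each returned row is one edge of the small concept graph.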
[Figure: graph of the extracted concepts. Nodes: dbp_res:Bertrand_Russell, dbp_res:Logic, dbp_res:Mathematics, dbp_res:Zondervan, dbp_res:Grand_Rapids,_Michigan, dbp_res:Callimachus, dbp_res:Alexandria. Edge labels: dbp_ont:mainInterest, dbp_prop:deathPlace, dbp_prop:headquarters.]
To benefit from Linked Data, we need to select which concepts to extend.
We propose to rank the concepts by their importance relative to the user's information need.
To do this efficiently, we cannot rely only on the small graph we built; we also need to go back to the textual content of the document.
Therefore, we introduce a new iterative SVD algorithm.
To each concept, we associate a text made of its abstract and of the sentences of the document that contain its instances.
We build a concept-stem matrix whose entries are frequencies.
We compute a first SVD.
We give more importance to the concepts and the stems close to the query, and then compute a second SVD.
In the reduced SVD space, we measure how the norms of the concepts and the stems evolved.
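As a rough illustration of the first two steps above, the sketch below builds a concept-stem frequency matrix in plain Python. Everything in it is an assumption made for the example: the sample texts are invented stand-ins for "abstract plus matching sentences", `naive_stem` is a crude placeholder for a real stemmer, and the function names are hypothetical.

```python
from collections import Counter
import re


def naive_stem(word):
    """Crude placeholder for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def concept_stem_matrix(concept_texts):
    """Return (concepts, stems, matrix), where matrix[i][j] is the number of
    occurrences of stem j in the text associated with concept i."""
    counts = {
        concept: Counter(naive_stem(w) for w in re.findall(r"[a-z]+", text.lower()))
        for concept, text in concept_texts.items()
    }
    concepts = sorted(counts)
    stems = sorted({stem for counter in counts.values() for stem in counter})
    matrix = [[counts[c][s] for s in stems] for c in concepts]
    return concepts, stems, matrix


# Invented sample texts, for illustration only.
concept_texts = {
    "dbp:Epimenides": "Epimenides of Knossos was a Cretan philosopher. "
                      "Epimenides said that Cretans are liars.",
    "dbp:Paradox": "A paradox is a statement that contradicts itself. "
                   "The liar paradox involves liars.",
}

concepts, stems, matrix = concept_stem_matrix(concept_texts)
for concept, row in zip(concepts, matrix):
    print(concept, dict(zip(stems, row)))
```

The matrix produced this way is what an SVD routine (for instance `numpy.linalg.svd`) would take as input; how the rows and columns close to the query are reweighted before the second SVD is not detailed on the slides.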
Evolution of the norms of the concepts in the reduced SVD space, between iterations 1 and 2:
[Figure: plot with points labeled dbp:Epimenides, dbp:Knossos, dbp:Paradox]
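One way to make the plotted quantity precise is given below. The notation is ours, not the slides': $A^{(t)}$ is the concept-stem matrix at iteration $t$, $k$ is the number of singular values kept, and $c_i^{(t)}$ is the row of concept $i$ in the reduced space. Whether the authors scale the coordinates by the singular values, or use another norm, is not stated; this is only one plausible reading.

```latex
A^{(t)} \approx U_k^{(t)} \, \Sigma_k^{(t)} \, \bigl(V_k^{(t)}\bigr)^{\top},
\qquad
c_i^{(t)} = \bigl(U_k^{(t)} \Sigma_k^{(t)}\bigr)_{i,:},
\qquad
\Delta_i = \bigl\lVert c_i^{(2)} \bigr\rVert_2 - \bigl\lVert c_i^{(1)} \bigr\rVert_2 .
```

Under this reading, a large $\lvert \Delta_i \rvert$ means that stressing the query moved concept $i$ noticeably. The same definition applies to stems, using the rows of $V_k^{(t)} \Sigma_k^{(t)}$.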
The stems and the concepts that moved the most will be stressed at the next iteration; the stems that barely moved will be removed.
Concepts linked by a predicate to concepts elected to be stressed will also be stressed.
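The selection rule above can be sketched as follows. The cut-offs (`top_k`, `min_move`), the function names, the numeric norms, and the `ex:relatedTo` triple are all made up for the example; the slides do not say how "moved the most" or "barely moved" are thresholded.

```python
def norm_moves(before, after):
    """Absolute change of each item's norm between two iterations."""
    return {name: abs(after[name] - before[name]) for name in before}


def most_moved(before, after, top_k):
    """The top_k items whose norm changed the most."""
    moves = norm_moves(before, after)
    return set(sorted(moves, key=moves.get, reverse=True)[:top_k])


def barely_moved(before, after, min_move):
    """Items whose norm changed by less than min_move."""
    return {name for name, move in norm_moves(before, after).items() if move < min_move}


def add_linked_concepts(elected, triples):
    """Add every concept directly linked by a predicate to an elected concept."""
    linked = set()
    for subject, _predicate, obj in triples:
        if subject in elected:
            linked.add(obj)
        if obj in elected:
            linked.add(subject)
    return set(elected) | linked


# Invented norms for iterations 1 and 2.
concept_before = {"dbp:Epimenides": 0.40, "dbp:Knossos": 0.35, "dbp:Paradox": 0.30, "dbp:Crete": 0.10}
concept_after = {"dbp:Epimenides": 0.90, "dbp:Knossos": 0.40, "dbp:Paradox": 0.55, "dbp:Crete": 0.12}
stem_before = {"liar": 0.2, "cretan": 0.3, "the": 0.5}
stem_after = {"liar": 0.7, "cretan": 0.5, "the": 0.51}
triples = [("dbp:Epimenides", "ex:relatedTo", "dbp:Knossos")]  # hypothetical link

stressed_concepts = add_linked_concepts(most_moved(concept_before, concept_after, top_k=1), triples)
stressed_stems = most_moved(stem_before, stem_after, top_k=1)
removed_stems = barely_moved(stem_before, stem_after, min_move=0.05)
print(stressed_concepts, stressed_stems, removed_stems)
```

Propagation here goes one hop only: a concept is added if it shares a triple with an elected concept, in either direction.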
We use a DBpedia SPARQL endpoint to find new triples about the most important resources.
In a pre-processing step, we kept only the DBpedia predicates that carry enough information: we discarded the predicates whose objects, when concatenated, had low entropy.
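A minimal sketch of such an entropy filter is shown below. It assumes character-level Shannon entropy and a fixed threshold; the slides specify neither, and the predicate names, sample objects, and function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(text):
    """Character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * log2(n / total) for n in Counter(text).values())


def informative_predicates(objects_by_predicate, min_entropy):
    """Keep the predicates whose concatenated objects reach `min_entropy`."""
    return {
        predicate
        for predicate, objects in objects_by_predicate.items()
        if shannon_entropy("".join(objects)) >= min_entropy
    }


# Hypothetical predicates with invented object values.
objects_by_predicate = {
    "ex:hasFlag": ["true", "true", "true"],
    "ex:abstract": [
        "Epimenides of Knossos was a Cretan seer.",
        "A paradox is a self-contradictory statement.",
    ],
}

kept = informative_predicates(objects_by_predicate, min_entropy=3.0)
print(kept)
```

A predicate whose objects repeat the same few values concatenates into a highly repetitive string and falls below the threshold, while free-text objects stay above it.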
To rank the triples of the extended graph and build the snippet, we compute a tensor decomposition (CP) of the graph.
We decompose a tensor rather than the adjacency matrix in order to take the types of the predicates into account: each horizontal slice of the tensor is the adjacency matrix for one given predicate.
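The tensor described above can be built as in the sketch below, with one 0/1 adjacency slice per predicate. The resource and predicate names are taken from the graph slide, but the subject-object pairings are our guess at that figure, and the function name and index layout (`tensor[predicate][subject][object]`) are assumptions. The CP decomposition itself, which approximates the tensor by a sum of rank-one terms, is left to a tensor library and is not shown.

```python
def build_tensor(triples):
    """Return (entities, predicates, tensor) where
    tensor[p][s][o] == 1 iff the triple (entities[s], predicates[p], entities[o]) exists."""
    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    predicates = sorted({p for _, p, _ in triples})
    entity_index = {entity: i for i, entity in enumerate(entities)}
    predicate_index = {predicate: i for i, predicate in enumerate(predicates)}
    n = len(entities)
    tensor = [[[0] * n for _ in range(n)] for _ in predicates]
    for s, p, o in triples:
        tensor[predicate_index[p]][entity_index[s]][entity_index[o]] = 1
    return entities, predicates, tensor


# Pairings assumed for illustration only.
triples = [
    ("dbp_res:Bertrand_Russell", "dbp_ont:mainInterest", "dbp_res:Logic"),
    ("dbp_res:Bertrand_Russell", "dbp_ont:mainInterest", "dbp_res:Mathematics"),
    ("dbp_res:Callimachus", "dbp_prop:deathPlace", "dbp_res:Alexandria"),
]

entities, predicates, tensor = build_tensor(triples)
for predicate, adjacency in zip(predicates, tensor):
    print(predicate, sum(map(sum, adjacency)), "edge(s)")
```

Keeping one slice per predicate is what lets the decomposition distinguish, say, a `mainInterest` edge from a `deathPlace` edge, which a single merged adjacency matrix would conflate.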
Thank you!
And, please, come see the live demo!
http://demo.ensen-insa.org