computing semantic relatedness using wikipedia features

20
Intelligent Database Systems Presenter : YAN-SHOU SIE Authors Mohamed Ali Hadj Taieb *, Mohamed Ben Aouicha, Abdelmajid Ben Hamadou 2013. KBS Computing semantic relatedness using Wikipedia features

Upload: azuka

Post on 24-Feb-2016

48 views

Category:

Documents


0 download

DESCRIPTION

Computing semantic relatedness using Wikipedia features. Presenter : YAN-SHOU SIE Authors Mohamed Ali Hadj Taieb * , Mohamed Ben Aouicha , Abdelmajid Ben Hamadou 2013. KBS. Outlines. Motivation Objectives Methodology Experiments Conclusions Comments. Motivation. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Presenter : YAN-SHOU SIE

Authors Mohamed Ali Hadj Taieb *, Mohamed Ben Aouicha,

Abdelmajid Ben Hamadou

2013. KBS

Computing semantic relatedness using Wikipedia features

Page 2: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

OutlinesMotivationObjectivesMethodologyExperimentsConclusionsComments

Page 3: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Motivation

• Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguis-

tics, cognitive science and artificial intelligence.

Page 4: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Objectives

• We propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource that showed good performances.

Page 5: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Methodology• Our semantic relatedness computing system– Filtering Wikipedia category graph– pre-processing• Filtering article content• Porter stemming• Weighting article stems• Providing a Category Semantic Depiction (CSD)

Page 6: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

• Different steps performed to generate the Category Semantic DepictionFiltering Wikipedia category graph

Methodology

Page 7: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Methodology• Filtering Wikipedia category graph– First : clean meta-categories

» We remove all those nodes whose labels contain any of the following strings : Wikipedia, wikiproject, lists, mediawiki,template, user, portal, categories, articles, pages, stub and album

– Second : remove orphan nodes and we keep only the category Contents as root» maximum depth 291 to 221

Page 8: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

• pre-processing– Filtering article content

» Remove html tags,infobox, language translation, hyperlinks. . .

– Porter stemming» filtered a stop list to eliminate words which do not have any

contribution.

– Weighting article stems

– Providing a Category Semantic Depiction (CSD)

Methodology

Page 9: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

• Semantic relatedness computing system architecture– Extraction categories algorithm• WordNet:• resolve the disambiguation pages problem:

– Setp1 : extracting all outLinks– Setp2 : find links containing disambiguation tag in parenthesis– Setp3 : extract categories to the two first links – Final : take the categories of the article assigned to the first link existing in the ordered set

Methodology-

Page 10: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Methodology• Semantic relatedness computing system

architecture– Semantic relatedness computing

Page 11: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Methodology• Evaluating semantic relatedness measuresComparison with human judgments

Pearson product-moment correlation coefficient

Spearman rank order correlation coefficient

Datasets

Page 12: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Experiments• Our semantic relatedness computing system modules using

Wikipedia features– Basic system– First module– Second module– Third module– Forth module

Page 13: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Experiments• Basic system

Page 14: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Experiments• First module: simple patterns

Page 15: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Experiments• Second module: Wikipedia pages

Page 16: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Experiments• Third module: enrichment using categories

neighbors in WCG

Page 17: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Experiments• Forth module: Categories enrichment using WCG

and redirects

Page 18: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Experiments• Application of the SR measure on other datasets– Datasets RG-65 and MC-30– The verbal dataset YP-130

• Solving word choice problems

Page 19: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Conclusions• Our result system shows a good performance and outperforms sometimes

ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches

Page 20: Computing semantic relatedness using Wikipedia features

Intelligent Database Systems Lab

Comments• Advantages

Able to use wiki to get a lot of semantic relationship information, semantic relations for many measurements related work of great help.

• Applications– cognitive science– artificial intelligence