Computing semantic relatedness using Wikipedia features



Intelligent Database Systems Lab

Presenter: YAN-SHOU SIE

Authors: Mohamed Ali Hadj Taieb*, Mohamed Ben Aouicha, Abdelmajid Ben Hamadou

2013. KBS

Computing semantic relatedness using Wikipedia features

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguistics, cognitive science and artificial intelligence.

Objectives

• We propose a novel system for computing semantic relatedness between words. Recent approaches that exploit Wikipedia as a large semantic resource have shown good performance.

Methodology

• Our semantic relatedness computing system
  – Filtering the Wikipedia category graph
  – Pre-processing
    • Filtering article content
    • Porter stemming
    • Weighting article stems
    • Providing a Category Semantic Depiction (CSD)

Methodology

• Different steps performed to generate the Category Semantic Depiction
[Figure omitted; its first step is labeled "Filtering Wikipedia category graph"]

Methodology

• Filtering the Wikipedia category graph
  – First: clean meta-categories
    » We remove all nodes whose labels contain any of the following strings: Wikipedia, wikiproject, lists, mediawiki, template, user, portal, categories, articles, pages, stub and album
  – Second: remove orphan nodes and keep only the category Contents as root
    » The maximum depth goes from 291 to 221
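
The two filtering steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `filter_category_graph`, the dictionary-based graph representation, and the case-insensitive substring match are assumptions. "Orphan" is read here as "no longer reachable from the root Contents", and depth is measured as breadth-first distance from the root, which may differ from the depth measure used in the paper.

```python
from collections import deque

# Strings listed on the slide; matched case-insensitively (an assumption).
META_MARKERS = ("wikipedia", "wikiproject", "lists", "mediawiki", "template",
                "user", "portal", "categories", "articles", "pages",
                "stub", "album")


def is_meta(label):
    """Return True if the category label contains one of the meta strings."""
    low = label.lower()
    return any(marker in low for marker in META_MARKERS)


def filter_category_graph(children, root="Contents"):
    """Filter a category graph given as {category: [subcategories]}.

    Returns (kept, depth): the filtered graph and the breadth-first
    depth of every category that is still reachable from the root.
    """
    # Step 1: drop meta-categories, both as nodes and as children.
    cleaned = {cat: [sub for sub in subs if not is_meta(sub)]
               for cat, subs in children.items() if not is_meta(cat)}

    # Step 2: keep only what is reachable from the root (drops orphans).
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for sub in cleaned.get(node, []):
            if sub not in depth:
                depth[sub] = depth[node] + 1
                queue.append(sub)

    kept = {cat: [s for s in cleaned.get(cat, []) if s in depth]
            for cat in depth}
    return kept, depth


if __name__ == "__main__":
    toy = {
        "Contents": ["Science", "Wikipedia administration"],
        "Science": ["Biology", "Science stubs"],
        "Wikipedia administration": ["Help"],
        "Help": [],
        "Biology": [],
        "Science stubs": [],
    }
    kept, depth = filter_category_graph(toy)
    print(kept)
    print(max(depth.values()))
```

Note that plain substring matching, as worded on the slide, also removes labels such as "Novelists" (which contains "lists"); whether the original system guards against this is not stated.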

Methodology

• Pre-processing
  – Filtering article content
    » Remove HTML tags, infoboxes, language translations, hyperlinks, ...
  – Porter stemming
    » A stop list is applied to eliminate words that make no contribution.
  – Weighting article stems
  – Providing a Category Semantic Depiction (CSD)
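
The pre-processing chain (filter content, remove stop words, stem, weight) can be sketched as below. Everything here is illustrative: `crude_stem` is a simple suffix stripper standing in for the Porter stemmer, not the Porter algorithm itself; the stop list is a tiny sample; and the weight is plain relative term frequency, since the slides do not state which weighting scheme the authors use.

```python
import re
from collections import Counter

# Tiny illustrative stop list (assumption; the real list is not given).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def strip_markup(text):
    """Remove HTML tags and reduce [[target|anchor]] wiki links to the anchor."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", text)
    return text


def crude_stem(word):
    """Strip one common suffix. A stand-in for Porter stemming, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_weights(text):
    """Return {stem: relative frequency} for one article's text."""
    tokens = re.findall(r"[a-z]+", strip_markup(text).lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    counts = Counter(stems)
    total = sum(counts.values())
    return {stem: count / total for stem, count in counts.items()}


if __name__ == "__main__":
    print(stem_weights("<p>The [[Cat|cats]] is chasing the cat</p>"))
```

In practice a full Porter implementation from an existing NLP library would replace `crude_stem`; the rest of the pipeline would stay the same shape.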

Methodology

• Semantic relatedness computing system architecture
  – Category extraction algorithm
    • WordNet:
    • Resolving the disambiguation pages problem:
      – Step 1: extract all outLinks
      – Step 2: find links containing a disambiguation tag in parentheses
      – Step 3: extract the categories of the first two links
      – Final: take the categories of the article assigned to the first link existing in the ordered set
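
One possible reading of the disambiguation steps above is sketched below. The slide leaves several details open, so this is an interpretation rather than the authors' algorithm: the function name `categories_for_ambiguous`, the `article_categories` lookup table, and the choice to return the categories of the first candidate that has an entry in that table are all assumptions.

```python
import re

# A link such as "Mercury (planet)" carries its disambiguation tag in parentheses.
PAREN_TAG = re.compile(r".+\s\([^()]+\)$")


def categories_for_ambiguous(outlinks, article_categories, max_candidates=2):
    """Pick categories for a word whose page is a disambiguation page.

    outlinks: ordered list of link titles found on the disambiguation page.
    article_categories: hypothetical lookup {article title: [categories]}.
    """
    # Steps 1-2: keep the outLinks that carry a parenthesised tag.
    tagged = [link for link in outlinks if PAREN_TAG.match(link)]
    # Step 3: only the first two tagged links are considered.
    candidates = tagged[:max_candidates]
    # Final: use the first candidate that resolves to a known article.
    for link in candidates:
        if link in article_categories:
            return article_categories[link]
    return []


if __name__ == "__main__":
    links = ["Mercury poisoning", "Mercury (planet)",
             "Mercury (element)", "Mercury (mythology)"]
    cats = {"Mercury (element)": ["Chemical elements"],
            "Mercury (mythology)": ["Roman gods"]}
    print(categories_for_ambiguous(links, cats))
```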

Methodology

• Semantic relatedness computing system architecture
  – Semantic relatedness computing

Methodology

• Evaluating semantic relatedness measures
  – Comparison with human judgments
    » Pearson product-moment correlation coefficient
    » Spearman rank order correlation coefficient
  – Datasets
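
The two correlation coefficients named above can be computed as in the following sketch, which compares system scores against human ratings for the same word pairs. The function names are illustrative, and Spearman is computed here as the Pearson correlation of average ranks, a standard formulation that handles ties; the example scores are made up for illustration and are not taken from any dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank order correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    human = [3.9, 3.1, 0.5, 1.2]      # made-up human ratings
    system = [0.80, 0.55, 0.10, 0.30]  # made-up system scores
    print(pearson(human, system), spearman(human, system))
```

Pearson rewards a linear agreement between the raw scores, whereas Spearman only checks that the system orders the word pairs the way humans do, which is why both are typically reported.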

Experiments

• Our semantic relatedness computing system modules using Wikipedia features
  – Basic system
  – First module
  – Second module
  – Third module
  – Fourth module

Experiments

• Basic system

Experiments

• First module: simple patterns

Experiments

• Second module: Wikipedia pages

Experiments

• Third module: enrichment using category neighbors in the WCG (Wikipedia category graph)

Experiments

• Fourth module: category enrichment using the WCG and redirects

Experiments

• Application of the SR (semantic relatedness) measure on other datasets
  – Datasets RG-65 and MC-30
  – The verbal dataset YP-130
• Solving word choice problems

Conclusions

• The resulting system shows good performance and sometimes outperforms the ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches.

Comments

• Advantages
  – The system draws a large amount of semantic relationship information from Wikipedia, which is of great help to work that relies on measuring semantic relations.
• Applications
  – Cognitive science
  – Artificial intelligence
