Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphrase Extraction System

A SUPERVISED KEYPHRASE EXTRACTION SYSTEM. Semantics 2016, Leipzig. Kolawole John Adebayo, Luigi Di Caro, Guido Boella


Post on 10-Apr-2017


Page 1

A SUPERVISED KEYPHRASE EXTRACTION SYSTEM

Semantics 2016, Leipzig

Kolawole John Adebayo, Luigi Di Caro, Guido Boella

Page 2

Outline

• Introduction
• Related Works
• Methodology
• Experiments
• Conclusions

Page 3

Introduction

• Keyphrases
– What?
– Why?
• Document indexing, document summarization, clustering and visualization
• Keyphrase assignment vs. keyphrase extraction
• Unsupervised vs. supervised
• Classification vs. ranking

Page 4

Introduction

Semantic Features

Supervised KeyPhrase Extraction, Keyphrase, Keyword

Page 5

Related Works

| Work | Classification | Features | Algorithm |
|---|---|---|---|
| Witten et al. (1999) | Statistical | TF, TFIDF, length (bi or tri), first occurrence, node degree, etc. | Naïve Bayes |
| P. Turney (2000) | Statistical | Phrase frequency, position, TF, TFIDF, n-gram overlap, etc. | C4.5, GenEx |
| Hulth (2003) | Linguistic | Lexical and syntactic features | |
| Mihalcea and Tarau (2004) | Graph Based | Unsupervised | TextRank |
| Medelyan et al. (2010) | Graph Based | Statistical, lexical, syntactic features | MAUI |

Page 6

Methodology

Training Document → Select Candidate → Extract Feature for Candidates → Combine features with Classifier

Training Document → Select Candidate → Extract Feature for Candidates → Predictor → Extracted Keyphrases
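
The two flows on this slide (fit a classifier on labelled documents, then apply the resulting predictor to extract keyphrases) can be sketched roughly as follows. This is only an illustration of the control flow: the function names, the two toy features and the nearest-centroid stand-in classifier are assumptions made for the example and are not taken from the slides.

```python
def select_candidates(tokens, max_n=2):
    """Return the unique n-grams of up to max_n words found in tokens."""
    seen = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase not in seen:
                seen.append(phrase)
    return seen


def extract_features(phrase, tokens):
    """Toy feature vector: [relative frequency, relative first position]."""
    words = phrase.split()
    n = len(words)
    hits = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]
    return [len(hits) / len(tokens), hits[0] / len(tokens)]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(documents):
    """documents: list of (tokens, gold_keyphrases) pairs.

    Stand-in classifier (an assumption): one centroid per class.
    The training data must contain both keyphrases and non-keyphrases.
    """
    pos, neg = [], []
    for tokens, gold in documents:
        for cand in select_candidates(tokens):
            target = pos if cand in gold else neg
            target.append(extract_features(cand, tokens))
    return {"pos": centroid(pos), "neg": centroid(neg)}


def predict(model, tokens):
    """Keep the candidates whose features lie closer to the positive centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    extracted = []
    for cand in select_candidates(tokens):
        feats = extract_features(cand, tokens)
        if dist(feats, model["pos"]) < dist(feats, model["neg"]):
            extracted.append(cand)
    return extracted
```

Both flows share `select_candidates` and `extract_features`; only the last step differs, which is the point the diagram makes. Any real classifier could replace the centroid model without changing that structure.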

Page 7

Methodology

• Candidate Selection
– Extracts n-grams (range = 1-4) that do not start or end with a stopword
– Candidates should not be proper nouns
– Candidates should not end with an adjective
– Candidates may start with an abbreviation
– Verbs are down-weighted
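
A minimal sketch of these candidate selection rules, assuming the input is already tokenized and tagged with Penn Treebank-style POS tags. The stopword list, the `verb_weight` value and the reading of "should not be proper nouns" as "is not made up entirely of proper nouns" are assumptions for illustration; the slides do not specify them.

```python
# Tiny illustrative stopword list (an assumption, not the authors' list).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "is", "with"}


def select_candidates(tagged, max_n=4, verb_weight=0.5):
    """tagged: list of (word, tag) pairs with Penn Treebank-style tags.

    Returns a dict mapping each accepted candidate phrase to a weight.
    """
    candidates = {}
    for n in range(1, max_n + 1):
        for i in range(len(tagged) - n + 1):
            gram = tagged[i:i + n]
            words = [w for w, _ in gram]
            tags = [t for _, t in gram]
            # Rule: no stopword at the start or end of the n-gram.
            if words[0].lower() in STOPWORDS or words[-1].lower() in STOPWORDS:
                continue
            # Rule (assumed reading): reject n-grams made only of proper nouns.
            # An abbreviation tagged NNP may still start a longer candidate.
            if all(t.startswith("NNP") for t in tags):
                continue
            # Rule: a candidate must not end with an adjective.
            if tags[-1].startswith("JJ"):
                continue
            # Rule: candidates containing a verb are down-weighted, not dropped.
            has_verb = any(t.startswith("VB") for t in tags)
            candidates[" ".join(words).lower()] = verb_weight if has_verb else 1.0
    return candidates
```

For example, given the tagged tokens of "The supervised keyphrase extraction system uses semantic features", the sketch keeps "keyphrase extraction" with full weight, keeps "system uses" with a reduced weight, and drops "the supervised" and "uses semantic".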

Page 8

Methodology

| Category | Description |
|---|---|
| Statistical | TF, TFIDF, keyphrase length |
| Positional | First and last point of appearance; geographical spread (e.g., upper section, mid section and lower section); key candidates' span |
| Lexical | NP, NE, n-grams |
| Semantic | Wikipedia lookup (frequency in Wikipedia), whether it has a Wikipedia page, in/out link frequency on the Wikipedia page |
| Semantic | LDA topic count (T = 50) |
| Semantic | Candidate similarity to POS-filtered words (proper nouns, verbs and adjectives) |
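
The statistical and positional rows of this table lend themselves to a short illustration. The sketch below computes them for one candidate, assuming a smoothed IDF, positions expressed as fractions of document length, and a split of the document into three equal sections; these choices and all names are assumptions, since the slides give only the feature names.

```python
import math


def positions(phrase, tokens):
    """Start indices at which the phrase occurs in the token list."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def stat_pos_features(phrase, tokens, corpus):
    """corpus: list of token lists, used only for document frequency.

    Returns None when the phrase does not occur in tokens.
    """
    pos = positions(phrase, tokens)
    if not pos:
        return None
    total = len(tokens)
    tf = len(pos) / total
    df = sum(1 for doc in corpus if positions(phrase, doc))
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0  # smoothed IDF (assumed)
    first, last = pos[0] / total, pos[-1] / total
    # Section index per occurrence: 0 = upper, 1 = mid, 2 = lower third.
    thirds = {min(3 * p // total, 2) for p in pos}
    return {
        "tf": tf,
        "tfidf": tf * idf,
        "length": len(phrase.split()),
        "first": first,
        "last": last,
        "span": last - first,
        "in_upper": 0 in thirds,
        "in_mid": 1 in thirds,
        "in_lower": 2 in thirds,
    }
```

The lexical and semantic rows (NP/NE tags, Wikipedia statistics, LDA topics) depend on external resources and are left out of this sketch.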

Page 9

Methodology

POS filtered n-grams (2,3,4)

Candidate keyphrase

Embedding Similarity
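
One plausible reading of this slide is that a candidate keyphrase is compared, in embedding space, with the POS-filtered n-grams of the document. The sketch below assumes phrase vectors are the mean of their word vectors and that the feature is the maximum cosine similarity; both choices, the function names and the toy embedding table in the usage note are assumptions, not details given in the slides.

```python
import math


def mean_vector(phrase, emb):
    """Average the vectors of the words in phrase that have an embedding."""
    vecs = [emb[w] for w in phrase.split() if w in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embedding_similarity(candidate, filtered_ngrams, emb):
    """Maximum cosine similarity between the candidate and any filtered n-gram."""
    cand_vec = mean_vector(candidate, emb)
    if cand_vec is None:
        return 0.0
    sims = []
    for gram in filtered_ngrams:
        gram_vec = mean_vector(gram, emb)
        if gram_vec is not None:
            sims.append(cosine(cand_vec, gram_vec))
    return max(sims, default=0.0)
```

With a toy table such as `{"keyphrase": [1.0, 0.0], "keyword": [0.9, 0.1], "weather": [0.0, 1.0], "report": [0.1, 0.9]}`, the candidate "keyphrase" scores much higher against "keyword" than against "weather report". In practice the table would come from pretrained word embeddings.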

Page 10

Results

| Features | Dataset | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Medelyan et al. (2010) | Marujo | 49.4 | - | - |
| Marujo et al. (2013) | Marujo | 55.4 | - | - |
| All-features | Marujo | 58.3 | 42.0 | 48.8 |
| Selected-features | Marujo | 48.7 | 36.5 | 41.7 |

Table 1: Evaluation results on the Marujo dataset
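
The F-Measure column in Table 1 appears to be the balanced F1 score, the harmonic mean of precision and recall. That is an inference from the numbers rather than something the slides state. A small sketch that reproduces the two complete rows:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows of Table 1 that report both precision and recall.
all_features = round(f_measure(58.3, 42.0), 1)       # 48.8
selected_features = round(f_measure(48.7, 36.5), 1)  # 41.7
```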

Page 11

Results

| Features | Dataset | Precision | Recall | F-Measure |
|---|---|---|---|---|
| Selected Features | Combined | 29.9 | 20.3 | 16.9 |
| Selected Features | Reader | 26.4 | 17.1 | 20.7 |
| All-features | Combined | 32.7 | 21.0 | 25.5 |
| All-features | Reader | 30.2 | 18.1 | 22.6 |

Table 2: Evaluation results on the SemEval dataset

Page 12

Results

| Features | Dataset | Precision | Recall | F-Measure |
|---|---|---|---|---|
| (2,5,6,7,8,9) | Combined | 32.1 | 20.6 | 25.0 |
| (1,2,5,7,8,9) | Combined | 31.8 | 20.1 | 24.7 |
| (2,4,5,7,8,9) | Combined | 30.2 | 17.7 | 22.3 |
| (3,4,6,7,8,9) | Combined | 27.4 | 16.3 | 20.4 |

Table 3: Ablation test on the SemEval dataset

Page 13

Good or Bad?

Supervised Keyphrase Extraction, Keyphrase Extraction system, supervised machine learning, Random Forest algorithm, Feature Engineering, Candidate Word, Keyphrase Extraction, Behavioural sciences, supervised classification, Keyphrase overlap

Page 14

References

• A. Hulth. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 216-223. Association for Computational Linguistics, 2003.
• S. N. Kim, O. Medelyan, M.-Y. Kan, and T. Baldwin. Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 21-26. Association for Computational Linguistics, 2010.
• L. Marujo, A. Gershman, J. Carbonell, R. Frederking, and J. P. Neto. Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization. arXiv preprint arXiv:1306.4886, 2013.
• P. Turney. Learning to extract keyphrases from text. 1999.
• X. Jiang, Y. Hu, and H. Li. A ranking approach to keyphrase extraction. 2010.
• T. D. Nguyen and M.-Y. Kan. Keyphrase extraction in scientific publications. In Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, pages 317-326. Springer, 2007.
• I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. Kea: Practical automatic keyphrase extraction. In Proceedings of the Fourth ACM Conference on Digital Libraries, pages 254-255. ACM, 1999.
• R. Mihalcea and P. Tarau. Textrank: Bringing order into texts. Association for Computational Linguistics, 2004.

Page 15

Conclusions

• Many Thanks For The Attention!!!