Uploaded by ikuya-yamada on 14-Aug-2015

Enhancing Named Entity Recognition in Twitter Messages Using Entity Linking

Ikuya Yamada (1,2,3), Hideaki Takeda (3), Yoshiyasu Takefuji (2)

(1) Studio Ousia, (2) Keio University, (3) National Institute of Informatics

Friday, July 31, 2015

STUDIO OUSIA

Background

‣ Twitter NER is difficult because of the noisy, short, and colloquial nature of tweets

‣ The performance of standard NER software degrades significantly on tweets

Entity Linking

New Frozen Boutique to Open at Disney's Hollywood Studios

Linked entities: /wiki/Frozen_(2013_film), /wiki/The_Walt_Disney_Company, /wiki/Disney’s_Hollywood_Studios

‣ Entity Linking: The task of linking entity mentions to entries in a knowledge base (KB) (e.g., Wikipedia)

‣ Entity linking has recently received considerable attention:

✦ Many research papers (since 2006) [Cucerzan 2007, Milne et al. 2008, etc.]

✦ Competitions (TAC KBP, ERD@SIGIR, #Microposts@WWW, etc.)
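
To make the task definition concrete, here is a minimal sketch of one common baseline for the linking step: look the mention up in a dictionary of Wikipedia anchor texts and choose the candidate entity with the highest prior probability (often called "commonness"). The dictionary `MENTION_DICT`, its counts, and the function names are hypothetical and invented for illustration; this is not the system described in these slides.

```python
from typing import Dict, Optional

# Hypothetical mention dictionary: for each lower-cased surface form, how often it
# appears as anchor text pointing to each Wikipedia page. Toy counts, invented.
MENTION_DICT: Dict[str, Dict[str, int]] = {
    "frozen": {"/wiki/Frozen_(2013_film)": 90, "/wiki/Freezing": 10},
    "disney's hollywood studios": {"/wiki/Disney's_Hollywood_Studios": 50},
}


def commonness(mention: str, entity: str) -> float:
    """Prior P(entity | mention) estimated from the anchor-text counts."""
    candidates = MENTION_DICT.get(mention.lower(), {})
    total = sum(candidates.values())
    return candidates.get(entity, 0) / total if total else 0.0


def link_mention(mention: str) -> Optional[str]:
    """Link a mention to its most common candidate entity, or None if unknown."""
    candidates = MENTION_DICT.get(mention.lower())
    if not candidates:
        return None
    return max(candidates, key=lambda entity: commonness(mention, entity))


print(link_mention("Frozen"))    # /wiki/Frozen_(2013_film)
print(link_mention("Boutique"))  # None
```

Commonness alone ignores the surrounding words, which is why entity linking systems typically combine it with context signals such as relatedness to the other entities in the text.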

Can we enhance Twitter NER by using entity linking?

New Frozen Boutique to Open at Disney's Hollywood Studios

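Detecting “Frozen” as a named entity in this tweet is difficult: the tweet is written in headline-style title case, so capitalization gives little evidence, and “frozen” is also a common English word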

Entity Linking

New Frozen Boutique to Open at Disney's Hollywood Studios

Linked entities: /wiki/Frozen_(2013_film), /wiki/The_Walt_Disney_Company, /wiki/Disney’s_Hollywood_Studios

‣ Using entity linking, we can detect “Frozen” because:

✦ “Frozen” is a very popular entity (according to the Wikipedia link structure and page view counts)

✦ “Frozen” is semantically related to the other entities in the context
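
The slides do not say how semantic relatedness is computed. One well-known measure that uses only the Wikipedia link structure is the Milne and Witten (2008) relatedness, which compares the sets of articles linking to two entities. The sketch below assumes that measure; the function name and the inlink sets are hypothetical and invented for illustration.

```python
import math
from typing import Set


def milne_witten_relatedness(inlinks_a: Set[str], inlinks_b: Set[str],
                             num_articles: int) -> float:
    """Link-based relatedness between two Wikipedia entities (Milne and Witten, 2008).

    inlinks_a, inlinks_b: titles of the articles that link to each entity.
    num_articles: total number of articles in the Wikipedia snapshot.
    """
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 0.0  # no shared inbound links: treated as unrelated
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    distance = (math.log(larger) - math.log(common)) / (
        math.log(num_articles) - math.log(smaller))
    # The distance is 0 for identical inlink sets and can exceed 1, so clamp.
    return max(0.0, 1.0 - distance)


# Toy inlink sets (invented for illustration).
frozen = {"Elsa", "Anna", "Walt Disney Animation Studios", "Idina Menzel"}
disney = {"Elsa", "Anna", "Walt Disney Animation Studios", "Mickey Mouse"}
print(round(milne_witten_relatedness(frozen, disney, num_articles=1000), 3))  # 0.948
```

Because it needs only inbound-link sets, this kind of measure can be precomputed from a Wikipedia dump without reading article text.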

Our Approach

‣ Our system first performs entity linking in an end-to-end manner

‣ The detected entity mentions are then used to enhance the NER tasks

‣ Entity data are extracted from several open knowledge bases (Wikipedia, DBpedia, Freebase)

‣ The segmentation and classification tasks are handled by separate components

Pipeline: End-to-End Entity Linking → Segmentation (NER) → Classification (NER)
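
The three-stage architecture can be sketched as plain function composition. The `Mention` record, the stage signatures, and the toy stages in the usage example are all hypothetical; the slides only state that entity linking runs first and that segmentation and classification are separate components that use its output.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


@dataclass
class Mention:
    start: int              # first token of the mention
    end: int                # one past the last token of the mention
    entity: Optional[str]   # linked knowledge-base entry, if any
    score: float            # relevance score assigned by entity linking


Linker = Callable[[List[str]], List[Mention]]
Segmenter = Callable[[List[str], List[Mention]], List[Span]]
Classifier = Callable[[List[str], Span, List[Mention]], str]


def run_pipeline(tokens: List[str], linker: Linker, segmenter: Segmenter,
                 classifier: Classifier) -> List[Tuple[Span, str]]:
    mentions = linker(tokens)              # 1. end-to-end entity linking
    spans = segmenter(tokens, mentions)    # 2. segmentation (NER)
    # 3. classification (NER): assign a type to each segmented span
    return [(span, classifier(tokens, span, mentions)) for span in spans]


# Toy stages, invented for illustration.
tokens = "New Frozen Boutique to Open at Disney's Hollywood Studios".split()
result = run_pipeline(
    tokens,
    linker=lambda toks: [Mention(1, 2, "/wiki/Frozen_(2013_film)", 0.8)],
    segmenter=lambda toks, ms: [(m.start, m.end) for m in ms if m.score > 0.5],
    classifier=lambda toks, span, ms: "movie",
)
print(result)  # [((1, 2), 'movie')]
```

Keeping the stages as independent callables mirrors the design choice on the slide: each NER component can be trained and swapped separately while sharing the same entity linking output.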

End-to-End Entity Linking

‣ An entity linking system specifically designed for tweets:

✦ Does not depend on NER to detect entity mentions (all possible n-grams are considered as mention candidates)

✦ Based on supervised machine learning (random forest) using various kinds of features (trained on the #Microposts2015 dataset)

✦ Winner of a recent Twitter entity linking competition, the #Microposts2015 NEEL Challenge at WWW2015

‣ For further details, please refer to: Yamada et al., “An End-to-End Entity Linking Approach for Tweets,” in Proceedings of #Microposts 2015

Image taken from NEEL2015 Challenge Summary: http://www.slideshare.net/giusepperizzo/neel2015-challenge-summary
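
Treating every n-gram as a mention candidate, as described above, can be sketched as follows. The `max_n` cap and the function name are our assumptions (the slide says "all possible n-grams"); a practical system would also likely prune candidates, for example against a dictionary of Wikipedia anchor texts, which the slide does not describe.

```python
from typing import List, Tuple


def ngram_candidates(tokens: List[str], max_n: int = 3) -> List[Tuple[int, int, str]]:
    """Enumerate every n-gram of up to max_n tokens as a mention candidate.

    Returns (start, end, text) triples, with end exclusive.
    """
    candidates = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_n, len(tokens)) + 1):
            candidates.append((start, end, " ".join(tokens[start:end])))
    return candidates


tokens = "Open at Disney's Hollywood Studios".split()
for start, end, text in ngram_candidates(tokens):
    print(start, end, text)
```

Because no NER step filters the input, a multi-word name such as "Disney's Hollywood Studios" is guaranteed to appear among the candidates as long as it is no longer than `max_n` tokens.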

Segmentation of Named Entities

Segmentation: Approach

‣ Supervised machine learning is used to assign a binary label (entity mention or not) to each possible n-gram

‣ A random forest is used as the machine-learning algorithm

‣ Overlapping mentions are resolved by iteratively selecting the longest entity mention, starting from the beginning of the tweet

‣ The machine-learning features fall into two groups:

✦ Entity-based features

✦ Linguistic features
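
The overlap-resolution step can be read as a greedy left-to-right scan over the n-grams that the classifier labeled positive. The sketch below is one interpretation of the slide's description; the function name and span representation are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Greedy left-to-right overlap resolution.

    Spans are visited by start position; among spans starting at the same token,
    longer ones come first. A span is kept only if it starts at or after the end
    of the last kept span, so kept spans never overlap.
    """
    selected: List[Span] = []
    last_end = 0
    for start, end in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if start >= last_end:
            selected.append((start, end))
            last_end = end
    return selected


# Tokens: New(0) Frozen(1) Boutique(2) to(3) Open(4) at(5) Disney's(6) Hollywood(7) Studios(8)
positive = [(6, 7), (1, 2), (7, 9), (6, 9)]
print(resolve_overlaps(positive))  # [(1, 2), (6, 9)]
```

Here "Disney's Hollywood Studios" wins over its shorter sub-spans because it is the longest positive span starting at token 6.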

Segmentation: Entity-based Features

‣ The relevance score assigned by the entity linking system

‣ The popularity of the entity:

✦ The number of inbound links to the entity in Wikipedia

✦ The average page view count of the entity's Wikipedia page

‣ Mention statistics in Wikipedia:

✦ Link probability

✦ Capitalization probability
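
The slide does not define capitalization probability precisely. Assuming it is the fraction of a mention's occurrences in Wikipedia text that start with an upper-case letter, it could be computed as sketched below; the function name and the toy sentences are hypothetical.

```python
import re
from typing import Iterable


def capitalization_probability(mention: str, sentences: Iterable[str]) -> float:
    """Fraction of occurrences of `mention` (matched case-insensitively)
    whose first character is upper-case. Returns 0.0 if it never occurs.
    """
    pattern = re.compile(r"\b" + re.escape(mention) + r"\b", re.IGNORECASE)
    total = capitalized = 0
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            total += 1
            if match.group(0)[0].isupper():
                capitalized += 1
    return capitalized / total if total else 0.0


sentences = [
    "I watched Frozen again.",
    "The lake was frozen solid.",
    "Tickets for Frozen sold out.",
]
print(capitalization_probability("Frozen", sentences))  # 0.6666666666666666
```

Sentence-initial words are capitalized whether or not they are names, so a real implementation would probably skip those occurrences; that refinement is left out of this sketch.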

Segmentation: Link Probability Feature

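Example: occurrences of “Harajuku” in three Wikipedia articles, where each occurrence appears either as a link or as plain text:

✦ Kyary Pamyu Pamyu: “Her public image is associated with Japan's kawaisa culture centered in Harajuku, Tokyo”

✦ Takeshita Street: “Takeshita Street is a street lined with fashion boutiques, and cafes in Harajuku in Tokyo, Japan.”

✦ Laforet: “Department Store and Museum is a department store located in the Harajuku...”

LINK_PROBABILITY(Harajuku) = 2/3

Link probability is the number of occurrences of the mention as a link divided by its total number of occurrences; here, two of the three occurrences of “Harajuku” are links.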
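
Link probability, as in the Harajuku example above, can be estimated from raw Wikipedia markup by counting how often a surface form is the anchor text of an internal link versus how often it occurs at all. The sketch below assumes MediaWiki `[[Target]]` / `[[Target|anchor]]` link syntax and uses shortened versions of the slide's snippets; which two occurrences are marked as links is invented for illustration, and the function name is hypothetical.

```python
import re
from typing import Iterable

# MediaWiki internal link: [[Target]] or [[Target|anchor text]]
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def link_probability(mention: str, wikitexts: Iterable[str]) -> float:
    """(# times `mention` is the anchor text of a link) / (# times it occurs at all)."""
    occurrence = re.compile(r"\b" + re.escape(mention) + r"\b")
    linked = total = 0
    for text in wikitexts:
        for link in WIKILINK.finditer(text):
            anchor = link.group(2) or link.group(1)
            if anchor == mention:
                linked += 1
        # Replace each link by its anchor text, then count every occurrence once.
        plain = WIKILINK.sub(lambda link: link.group(2) or link.group(1), text)
        total += len(occurrence.findall(plain))
    return linked / total if total else 0.0


docs = [
    "Japan's kawaisa culture centered in [[Harajuku]], Tokyo",
    "fashion boutiques, and cafes in [[Harajuku]] in Tokyo, Japan.",
    "a department store located in the Harajuku",
]
print(link_probability("Harajuku", docs))  # 0.6666666666666666
```

A high link probability indicates that Wikipedia editors usually treat the phrase as an entity name, which is why it is a useful segmentation feature.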

Segmentation: Linguistic Features

‣ Whether or not Stanford NER detects the mention

‣ Part-of-speech tags of the current and surrounding words

‣ Whether or not the current and surrounding words are capitalized

‣ Mention length (# of words, # of characters)
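
A minimal sketch of the linguistic features listed above for a single candidate span. The feature names are hypothetical, and the POS tags and Stanford NER output are assumed to be computed beforehand (neither tool is called here). "Surrounding words" is interpreted as the one word immediately before and after the mention, which is our assumption.

```python
from typing import Dict, List, Set, Tuple, Union

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive
Feature = Union[bool, int, str]


def linguistic_features(tokens: List[str], pos_tags: List[str],
                        stanford_ner_spans: Set[Span], span: Span) -> Dict[str, Feature]:
    """Linguistic features for one candidate mention (hypothetical feature names).

    pos_tags and stanford_ner_spans are assumed to be precomputed by a
    part-of-speech tagger and by Stanford NER.
    """
    start, end = span

    def pos(i: int) -> str:
        # Padding tag for positions outside the tweet.
        return pos_tags[i] if 0 <= i < len(tokens) else "<PAD>"

    def capitalized(i: int) -> bool:
        return 0 <= i < len(tokens) and tokens[i][:1].isupper()

    return {
        "stanford_ner_detected": span in stanford_ner_spans,
        "pos_first": pos(start),
        "pos_last": pos(end - 1),
        "pos_prev": pos(start - 1),
        "pos_next": pos(end),
        "all_capitalized": all(capitalized(i) for i in range(start, end)),
        "prev_capitalized": capitalized(start - 1),
        "next_capitalized": capitalized(end),
        "num_words": end - start,
        "num_chars": len(" ".join(tokens[start:end])),
    }


tokens = ["New", "Frozen", "Boutique"]
tags = ["JJ", "NNP", "NNP"]  # toy POS tags
print(linguistic_features(tokens, tags, set(), (1, 2)))
```

In a full system these values would be one-hot encoded (for the tags) and concatenated with the entity-based features before training the random forest.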

Classification of Named Entities

Classification

‣ Supervised machine learning is used to classify the detected mentions into the predefined entity types

‣ A linear SVM is used as the machine-learning algorithm

‣ Main machine-learning features:

✦ Entity types in knowledge bases (DBpedia Ontology classes and Freebase types)

✦ Entity type detected by Stanford NER (i.e., PERSON, ORGANIZATION, LOCATION)

✦ The average of the word vectors of the words in the n-gram, using Stanford GloVe word embeddings (840B model)

✦ The relevance score assigned by entity linking
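
The classification features listed above could be assembled into a single vector as sketched below. Everything concrete here is an assumption for illustration: the tiny type inventories, the 3-dimensional toy embeddings (standing in for the 300-dimensional GloVe 840B vectors), and the function names. The resulting vectors would then be passed to a linear SVM, for example scikit-learn's `LinearSVC`.

```python
from typing import Dict, List, Optional, Sequence, Set

# Toy 3-dimensional embeddings standing in for GloVe 840B vectors (invented values).
EMBEDDINGS: Dict[str, List[float]] = {
    "hollywood": [0.25, 0.5, 0.0],
    "studios": [0.75, 0.0, 0.5],
}
EMBED_DIM = 3

# Hypothetical type inventories; a real system would enumerate all types it uses.
KB_TYPES = ["dbo:Place", "dbo:Film", "/film/film"]
NER_TYPES = ["PERSON", "ORGANIZATION", "LOCATION"]


def average_word_vector(words: Sequence[str]) -> List[float]:
    """Mean of the embeddings of in-vocabulary words (all zeros if none are known)."""
    vectors = [EMBEDDINGS[w.lower()] for w in words if w.lower() in EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classification_features(words: Sequence[str], kb_types: Set[str],
                            ner_type: Optional[str], relevance: float) -> List[float]:
    """Concatenate KB types, Stanford NER type, averaged embedding, and EL score."""
    kb_part = [1.0 if t in kb_types else 0.0 for t in KB_TYPES]       # multi-hot
    ner_part = [1.0 if t == ner_type else 0.0 for t in NER_TYPES]    # one-hot
    return kb_part + ner_part + average_word_vector(words) + [relevance]


print(classification_features(["Disney's", "Hollywood", "Studios"],
                              {"dbo:Place"}, "LOCATION", 0.9))
# [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.25, 0.25, 0.9]
```

Out-of-vocabulary words (here "Disney's") are simply skipped when averaging, so the embedding part stays defined for any n-gram.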

Results

‣ Our method outperformed the second-ranked method by 10.34 F1 points on the segmentation task and by 5.01 F1 points on the end-to-end task (segmentation and classification)!

Table: Performance of the proposed systems on the segmentation task

Table: Performance of the proposed systems on both the segmentation and classification tasks
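
The results above are reported as F1 scores, and differences such as "10.34 F1" are usually expressed in points on a 0 to 100 scale. A common way to score span-level NER output is exact-match precision, recall, and F1, sketched below; the evaluation script actually used for these results is not described on the slides, and the type labels in the example are hypothetical.

```python
from typing import Set, Tuple

# (start, end, type); for segmentation-only scoring, give every span the same type.
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1: a predicted span counts only if it matches a gold span exactly."""
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {(1, 2, "movie"), (6, 9, "facility")}
predicted = {(1, 2, "movie"), (6, 7, "company")}
print(span_f1(gold, predicted))  # 0.5
```

Under exact matching, predicting only "Disney's" instead of "Disney's Hollywood Studios" earns no credit, which is why segmentation quality has a large effect on the end-to-end score.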

Conclusion

‣ Twitter NER can be enhanced by using entity linking

‣ Entity linking enables us to use high-quality data from knowledge bases for NER tasks

THANK YOU!
