Post on 14-Aug-2015
Enhancing Named Entity Recognition in Twitter Messages Using Entity Linking
Ikuya Yamada (1,2,3), Hideaki Takeda (3), Yoshiyasu Takefuji (2)
(1) Studio Ousia, (2) Keio University, (3) National Institute of Informatics
Friday, July 31, 2015
STUDIO OUSIA
Background
‣ Twitter NER is difficult because of the noisy, short, and colloquial nature of tweets
‣ The performance of standard NER software suffers significantly
Entity Linking
New Frozen Boutique to Open at Disney's Hollywood Studios
/wiki/Frozen_(2013_film)  /wiki/The_Walt_Disney_Company  /wiki/Disney’s_Hollywood_Studios
‣ Entity Linking: The task of linking entity mentions to entries in a knowledge base (KB) (e.g., Wikipedia)
‣ Recently, entity linking has received considerable attention
✦ Many research papers (2006-) [Cucerzan 2007, Milne et al. 2008, etc.]
✦ Competitions (TAC KBP, ERD@SIGIR, #Microposts@WWW, etc.)
New Frozen Boutique to Open at Disney's Hollywood Studios
Detecting “Frozen” as an entity mention in this tweet is difficult
Entity Linking
New Frozen Boutique to Open at Disney's Hollywood Studios
/wiki/Frozen_(2013_film)  /wiki/The_Walt_Disney_Company  /wiki/Disney’s_Hollywood_Studios
‣ By using entity linking, we can detect “Frozen”:
✦ “Frozen” is a very popular entity (from Wikipedia link structure and page view count)
✦ “Frozen” is semantically related to the context entities
Our Approach
‣ Our system first performs entity linking in an end-to-end manner
‣ Detected entity mentions are used to enhance the NER tasks
‣ Entity data are extracted from several open knowledge bases (Wikipedia, DBpedia, Freebase)
‣ Segmentation and classification tasks are addressed by using separate components
End-to-End Entity Linking → Segmentation (NER) → Classification (NER)
End-to-End Entity Linking
‣ An entity linking system specifically designed for tweets
✦ Does not depend on NER to detect entity mentions (considering all possible n-grams as mention candidates)
✦ Based on supervised machine-learning (random forest) using various kinds of features (trained using #Microposts2015 dataset)
✦ Winner of a recent Twitter entity linking competition called #Microposts2015 NEEL Challenge at WWW2015
‣ For further details, please refer to: Yamada et al., An End-to-End Entity Linking Approach for Tweets, in Proceedings of #Microposts 2015
Image taken from NEEL2015 Challenge Summary: http://www.slideshare.net/giusepperizzo/neel2015-challenge-summary
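The idea of treating all possible n-grams as mention candidates can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `ngram_candidates`, the whitespace tokenization, and the `max_len` cap of three tokens are all assumptions made for the example.

```python
def ngram_candidates(tokens, max_len=3):
    """Return (start, end, text) for every n-gram of up to max_len tokens.

    `end` is exclusive. The cap `max_len` is a hypothetical parameter;
    the slides do not state a maximum n-gram length.
    """
    candidates = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            candidates.append((start, end, " ".join(tokens[start:end])))
    return candidates


# Example tweet from the slides, split on whitespace (an assumption;
# a real system would likely use a tweet-aware tokenizer).
tweet = "New Frozen Boutique to Open at Disney's Hollywood Studios"
tokens = tweet.split()
candidates = ngram_candidates(tokens, max_len=3)
```

Every candidate would then be scored by the supervised model, so no separate NER step is needed to propose mentions.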
Segmentation of Named Entities
Segmentation: Approach
‣ Supervised machine-learning is used to assign a binary label to each possible n-gram
‣ Random forest is used as the machine-learning algorithm
‣ Overlaps of mentions are resolved by iteratively selecting the longest entity mention from the beginning of the tweet
‣ Machine-learning features can be classified as follows:
✦ Entity-based features
✦ Linguistic features
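One plausible reading of the overlap-resolution rule is a left-to-right greedy pass that, at each position, keeps the longest positively labeled span and skips anything overlapping it. The sketch below follows that reading; the function name `resolve_overlaps` and the `(start, end)` span representation are assumptions, and the slides do not spell out tie-breaking details.

```python
def resolve_overlaps(mentions):
    """Greedy overlap resolution over positively labeled spans.

    `mentions` is a list of (start, end) token spans, end exclusive.
    Spans are visited from the beginning of the tweet; among spans that
    share a start position, the longest is tried first. A span is kept
    only if it starts at or after the end of the last kept span.
    """
    ordered = sorted(mentions, key=lambda m: (m[0], -(m[1] - m[0])))
    selected = []
    next_free = 0  # first token index not covered by a kept span
    for start, end in ordered:
        if start >= next_free:
            selected.append((start, end))
            next_free = end
    return selected


# Hypothetical positive predictions for the example tweet:
# "Frozen" (1, 2), "Disney's" (6, 7),
# "Disney's Hollywood Studios" (6, 9), "Hollywood Studios" (7, 9).
predicted = [(6, 7), (7, 9), (1, 2), (6, 9)]
resolved = resolve_overlaps(predicted)
```

Under these assumed predictions, the shorter spans nested inside "Disney's Hollywood Studios" are discarded in favor of the full three-token mention.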
Segmentation: Entity-based Features
‣ The relevance score assigned by the entity linking system
‣ The popularity of the entity:
✦ The number of inbound links of the entity in Wikipedia
✦ The average page view count of the Wikipedia entity
‣ Mention statistics in Wikipedia:
✦ Link probability
✦ Capitalization probability
Segmentation: Link Probability Feature
Example: the mention “Harajuku” occurs in three Wikipedia articles, twice as a link and once as plain text:
Kyary Pamyu Pamyu: “Her public image is associated with Japan's kawaisa culture centered in Harajuku, Tokyo”
Takeshita Street: “Takeshita Street is a street lined with fashion boutiques, and cafes in Harajuku in Tokyo, Japan.”
Laforet: “Department Store and Museum is a department store located in the Harajuku...”
LINK_PROBABILITY(Harajuku) = 2/3
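The LINK_PROBABILITY value above can be read as the fraction of a mention's occurrences in Wikipedia that appear as links rather than plain text. A minimal sketch of that computation follows; the function name `link_probability` and the zero-occurrence fallback are assumptions, and which two of the three example occurrences are links is chosen arbitrarily here.

```python
from fractions import Fraction


def link_probability(occurrences):
    """Fraction of occurrences of a mention that appear as links.

    `occurrences` is a list of booleans: True if the occurrence is a
    link, False if it is plain text. Returning 0 for a mention with no
    occurrences is an assumption, not something the slides specify.
    """
    if not occurrences:
        return Fraction(0)
    return Fraction(sum(occurrences), len(occurrences))


# Three occurrences of "Harajuku", two of them links. The order of the
# flags is illustrative only.
harajuku_occurrences = [True, True, False]
harajuku_link_prob = link_probability(harajuku_occurrences)
```

A high link probability suggests that the n-gram is usually treated as an entity name when it appears in Wikipedia text.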
Segmentation: Linguistic Features
‣ Whether or not Stanford NER detects the mention
‣ Part-of-speech tags of the current and surrounding words
‣ Whether or not the current and surrounding words are capitalized
‣ Mention length (# of words, # of characters)
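The capitalization and length features listed above can be sketched without any external tools. This is an illustration under stated assumptions: the function name `linguistic_features`, the feature names, and the one-word context window are hypothetical, and the Stanford NER and part-of-speech features are omitted because they require external models.

```python
def linguistic_features(tokens, start, end):
    """Capitalization and length features for the mention tokens[start:end].

    `end` is exclusive. Only the immediately preceding and following
    words are inspected; the window size is an assumption.
    """
    mention = tokens[start:end]
    prev_word = tokens[start - 1] if start > 0 else ""
    next_word = tokens[end] if end < len(tokens) else ""
    return {
        # True when every word of the mention starts with an uppercase letter.
        "mention_capitalized": all(w[:1].isupper() for w in mention),
        "prev_capitalized": prev_word[:1].isupper(),
        "next_capitalized": next_word[:1].isupper(),
        "num_words": len(mention),
        "num_chars": len(" ".join(mention)),
    }


tokens = "New Frozen Boutique to Open at Disney's Hollywood Studios".split()
features = linguistic_features(tokens, 6, 9)  # "Disney's Hollywood Studios"
```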
Classification of Named Entities
Classification
‣ Supervised machine-learning is used to classify detected mentions into the predefined types
‣ Linear SVM is used as the machine-learning algorithm
‣ Main machine-learning features:
✦ Entity types in knowledge bases (DBpedia Ontology Classes and Freebase Types)
✦ Entity type detected by Stanford NER (i.e., PERSON, ORGANIZATION, LOCATION)
✦ The average of vectors of words in the n-gram using Stanford GloVe word embeddings (840B model)
✦ The relevance score assigned by entity linking
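The word-embedding feature, the average of the vectors of the words in the n-gram, can be sketched as below. The three-dimensional toy vectors stand in for the real GloVe embeddings, and the function name `average_vector` and the handling of out-of-vocabulary words (skipped, with a zero vector if none are known) are assumptions not stated in the slides.

```python
def average_vector(words, embeddings, dim):
    """Element-wise mean of the embedding vectors of `words`.

    Words missing from `embeddings` are skipped; if no word is known,
    a zero vector of length `dim` is returned (an assumed fallback).
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy embeddings for illustration only; real GloVe vectors are much
# higher-dimensional and loaded from a pre-trained file.
toy_embeddings = {
    "hollywood": [1.0, 0.0, 2.0],
    "studios": [3.0, 2.0, 0.0],
}
avg = average_vector(["hollywood", "studios"], toy_embeddings, dim=3)
```

The resulting vector would be concatenated with the other features before being passed to the classifier.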
Results
‣ Our method outperformed the second-ranked method by 10.34 F1 points on the segmentation task and by 5.01 F1 points on the end-to-end (segmentation and classification) task
[Table: Performance of the proposed systems at segmenting entities]
[Table: Performance of the proposed systems at both segmentation and classification tasks]
Conclusion
‣ Twitter NER can be enhanced by using entity linking
‣ Entity linking enables us to use high-quality data from knowledge bases for NER tasks