Presentation at NLDB 2012
TRANSCRIPT
![Page 1: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/1.jpg)
Two-stage Named Entity Recognition using averaged perceptrons
Lars Buitinck Maarten Marx
Information and Language Processing Systems
Informatics Institute
University of Amsterdam
17th Int’l Conf. on Applications of NLP to Information Systems
Buitinck, Marx Two-stage NER
![Page 2: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/2.jpg)
Outline
![Page 3: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/3.jpg)
Named Entity Recognition
- Find names in text and classify them as belonging to persons, locations, organizations, events, products or “miscellaneous”
- Use machine learning
![Page 5: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/5.jpg)
Named Entity Recognition for Dutch
- State-of-the-art algorithm for Dutch by Desmet and Hoste (2011): voting classifiers with GA to train weights
- Good training sets are just becoming available
- Many practitioners retrain Stanford CRF-NER tagger
![Page 8: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/8.jpg)
Overview
- Realize that NER is two problems in one: recognition and classification
- Pipeline solution with two classifiers
- Use custom feature sets for each
- Do not use a precompiled list of names (“gazetteer”)
- Work at the sentence level (because of how training sets are set up)
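
The pipeline idea can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes the first classifier emits one B/I/O tag per token (as the recognition stage of this talk does), and the names `bio_to_spans`, `two_stage_ner`, `recognize` and `classify` are hypothetical. Treating a stray `I` tag as the start of a new entity is also an assumption.

```python
def bio_to_spans(tags):
    """Convert per-token 'B'/'I'/'O' tags to (start, end) spans, end exclusive."""
    spans = []
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity begins here; close the open one first.
            # Assumption: an 'I' without a preceding 'B' also opens an entity.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
        # An 'I' inside an open entity simply extends it.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def two_stage_ner(tokens, recognize, classify):
    """Stage 1 finds entity spans; stage 2 labels each whole span once."""
    tags = recognize(tokens)  # hypothetical callable: tokens -> B/I/O tags
    return [(s, e, classify(tokens, s, e)) for s, e in bio_to_spans(tags)]
```

Because the second stage sees a whole span, it can use features of the complete name rather than of single tokens.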
![Page 13: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/13.jpg)
Recognition stage
- Token-level task: is a token the Beginning of, Inside, or Outside any entity name?
- Features:
  - Word window $w_{i-2}, \ldots, w_{i+2}$
  - POS tags for words in window
  - Conjunction of words and POS tags in window, e.g. $(w_{i-1}, p_{i-1})$
  - Capitalization of tokens in window
  - (Character) prefixes and suffixes of $w_i$ and $w_{i-1}$
  - REs for digits, Roman numerals and punctuation
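
A feature extractor along the lines of the list above might look as follows. This is a minimal sketch, not the feature set actually used: the feature names, the `<PAD>` symbol, lowercasing of words, the affix length limit `max_affix=3` and the exact regular expressions are all assumptions (the Roman numeral pattern in particular is deliberately crude).

```python
import re

# Assumed patterns; the talk does not give the actual expressions.
DIGITS = re.compile(r"^\d+$")
ROMAN = re.compile(r"^[IVXLCDM]+$")  # crude: also matches e.g. "MIX"
PUNCT = re.compile(r"^\W+$")


def token_features(words, pos, i, max_affix=3):
    """String features for token i from a window of two tokens on each side."""
    feats = []
    n = len(words)
    for off in range(-2, 3):
        j = i + off
        w, p = (words[j], pos[j]) if 0 <= j < n else ("<PAD>", "<PAD>")
        feats.append(f"w[{off}]={w.lower()}")         # word in window
        feats.append(f"p[{off}]={p}")                 # POS tag in window
        feats.append(f"wp[{off}]={w.lower()}|{p}")    # word/POS conjunction
        feats.append(f"cap[{off}]={w[:1].isupper()}") # capitalization
    for off in (-1, 0):  # affixes of the previous and the current token
        j = i + off
        if 0 <= j < n:
            w = words[j]
            for k in range(1, min(max_affix, len(w)) + 1):
                feats.append(f"pre[{off}]={w[:k]}")
                feats.append(f"suf[{off}]={w[-k:]}")
    w = words[i]
    if DIGITS.match(w):
        feats.append("is_digits")
    if ROMAN.match(w):
        feats.append("is_roman")
    if PUNCT.match(w):
        feats.append("is_punct")
    return feats
```

Each feature is a plain string, so the output can be fed directly to any sparse linear learner.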
![Page 21: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/21.jpg)
Classification stage
- Don’t do this at token level; we know the entity spans!
- Input is a list of tokens considered an entity by the recognition stage
- Features:
  - The tokens we got from recognition
  - The four surrounding tokens
  - Their pre- and suffixes up to length four
  - Capitalization pattern, as a string on the alphabet $(L|U|O)^*$
  - The occurrence of capitalized tokens, digits and dashes in the entire sentence
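
The span-level features could be computed roughly as below. Several details are guesses rather than facts from the talk: that "four surrounding tokens" means two on each side, that the capitalization pattern has one symbol per entity token based on its first character, and that affixes are taken from both entity and context tokens. The function names are hypothetical.

```python
import re


def cap_symbol(token):
    """Map a token to U (starts uppercase), L (starts lowercase) or O (other)."""
    first = token[:1]
    if first.isupper():
        return "U"
    if first.islower():
        return "L"
    return "O"


def affixes(token, max_len=4):
    """Yield prefix and suffix features of up to max_len characters."""
    for k in range(1, min(max_len, len(token)) + 1):
        yield f"pre={token[:k]}"
        yield f"suf={token[-k:]}"


def span_features(tokens, start, end):
    """Features for the candidate entity tokens[start:end] in one sentence."""
    entity = tokens[start:end]
    # Assumption: two context tokens on each side of the span.
    context = tokens[max(0, start - 2):start] + tokens[end:end + 2]
    feats = [f"ent={t}" for t in entity]
    feats += [f"ctx={t}" for t in context]
    for t in entity + context:
        feats.extend(affixes(t))
    # Capitalization pattern over the entity, a string over {L, U, O}.
    feats.append("cap=" + "".join(cap_symbol(t) for t in entity))
    # Sentence-wide indicators.
    if any(t[:1].isupper() for t in tokens):
        feats.append("sent_has_capitalized")
    if any(re.search(r"\d", t) for t in tokens):
        feats.append("sent_has_digit")
    if any("-" in t for t in tokens):
        feats.append("sent_has_dash")
    return feats
```

For the span "Universiteit van Amsterdam" the pattern feature is `cap=ULU`, which a token-level classifier could not express directly.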
![Page 29: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/29.jpg)
Learning algorithm
- Use averaged perceptron for both stages
- Learns an approximation of max-margin solution (linear SVM)
- 40 iterations
- Used the LBJ machine learning toolkit
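
For readers unfamiliar with the learner, here is a small multiclass averaged perceptron in plain Python. It is a sketch of the general technique over sparse string features, not the LBJ implementation used in this work; the class and method names are made up for illustration, and only the default of 40 iterations is taken from the slide.

```python
from collections import defaultdict


class AveragedPerceptron:
    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)      # (label, feature) -> current weight
        self.total = defaultdict(float)  # sum of each weight over all steps
        self.stamp = defaultdict(int)    # step at which a weight last changed
        self.step = 0

    def score(self, feats, label):
        return sum(self.w.get((label, f), 0.0) for f in feats)

    def predict(self, feats):
        # Ties are broken in favour of the label listed first.
        return max(self.labels, key=lambda y: self.score(feats, y))

    def _update(self, key, delta):
        # Lazily credit the steps during which this weight stayed unchanged.
        self.total[key] += (self.step - self.stamp[key]) * self.w[key]
        self.stamp[key] = self.step
        self.w[key] += delta

    def fit(self, data, iterations=40):
        """Train on (features, label) pairs; call once per model."""
        for _ in range(iterations):
            for feats, gold in data:
                self.step += 1
                guess = self.predict(feats)
                if guess != gold:
                    for f in feats:
                        self._update((gold, f), 1.0)
                        self._update((guess, f), -1.0)
        # Replace each weight by its average over all training steps.
        for key in list(self.w):
            self.total[key] += (self.step - self.stamp[key]) * self.w[key]
            self.stamp[key] = self.step
            self.w[key] = self.total[key] / self.step if self.step else 0.0
        return self
```

Averaging the weight vector over all steps is what distinguishes this from the plain perceptron and makes the final model less sensitive to the last few training examples.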
![Page 33: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/33.jpg)
Evaluation
- Aim for F1 score, as defined in the CoNLL 2002 shared task on NER
- Two corpora: CoNLL 2002 and a subset of SoNaR (courtesy Desmet and Hoste)
- Compare against Stanford and Desmet and Hoste’s algorithm
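
In the CoNLL 2002 shared task an entity counts as correct only if both its boundaries and its type match the gold annotation exactly, and F1 is the harmonic mean of precision and recall over entities. A minimal sketch of that computation follows; the tuple layout `(sentence_id, start, end, type)` and the function name `prf1` are assumptions made for the example.

```python
def prf1(gold, predicted):
    """Entity-level precision, recall and F1 with exact-match scoring.

    Each entity is a hashable tuple, here (sentence_id, start, end, type).
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # entities with matching span and type
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(0, 0, 2, "PER"), (0, 3, 4, "LOC"), (1, 1, 2, "ORG")}
pred = {(0, 0, 2, "PER"), (0, 3, 4, "ORG"), (1, 1, 2, "ORG")}
p, r, f1 = prf1(gold, pred)  # the mistyped LOC/ORG span counts as an error
```

Note that a span with the right boundaries but the wrong type hurts both precision and recall, which is why a weak classification stage is costly under this metric.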
![Page 36: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/36.jpg)
Results on CoNLL 2002
- 309,686 tokens containing 19,901 names, four categories
- 65% training, 22% validation and 12% test sets
- Stanford achieves F1 = 74.72; "miscellaneous" category is hard (< 0.7)
- We achieve F1 = 75.14; "organization" category is hard
![Page 40: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/40.jpg)
Results on SoNaR
- New, large corpus with manual annotations
- Used a 200k-token subset of a preliminary version, three-fold cross validation
- State of the art is Desmet and Hoste (2011) with F1 = 84.44
- Best individual classifier from that paper (CRF) gets 83.77
- Our system: 83.56
- Here, “product” and “miscellaneous” categories are hard
![Page 46: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/46.jpg)
Conclusion
- Near-state-of-the-art performance from simple learners with good feature sets
- No gazetteers, so should be fairly reusable
- (Side conclusion: SoNaR is more easily learnable than CoNLL)
![Page 49: Presentation at NLDB 2012](https://reader034.vdocuments.net/reader034/viewer/2022052505/554e8e6ab4c90573338b4c47/html5/thumbnails/49.jpg)
Future work
- Being integrated in UvA’s xTAS text analysis pipeline
- Used to find entities in Dutch Hansard corpus (forthcoming) and link entities to Wikipedia
- Full SoNaR is now available; new evaluation needed