![Page 1: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/1.jpg)
NAMED ENTITY RECOGNITION
JACOB SU WANG OJO LABS INC.
![Page 2: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/2.jpg)
WE EXPLORE …
IN THIS TALK
• WHAT IS NER, WHAT ARE ITS APPLICATIONS
• WHAT ARE THE METHODS USED IN VARIOUS CONDITIONS
• WHAT MODEL TO USE WHEN?
• HOW DO THE MODELS WORK?
![Page 3: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/3.jpg)
WHAT IS A NAMED ENTITY?
![Page 4: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/4.jpg)
WORDS/PHRASES OF INTEREST IN TEXT
NAMED ENTITIES
• NATURAL NAMED ENTITIES
• PROPER NOUNS
• E.G. PERSON NAME (Steve Jobs), ORGANIZATION (OJO Labs), LOCATION (Austin), ETC.
• DEFINED NAMED ENTITIES
• NON-PROPER-NOUN WORDS/PHRASES WE DEFINE TO BE INFORMATIVE
• E.G. TIME (9pm, 1945), INDICATORS (which, where, etc.), CONTEXTUALLY-SIGNIFICANT TERMS (buffalo hunter, horse trading in Lonesome Dove).
![Page 5: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/5.jpg)
NAMED ENTITY RECOGNITION TASK
pic1: https://www.semanticengine.ws/namedentityrecognition
pic2: https://www.ravn.co.uk/named-entity-recognition-ravn-part-1/
IDENTIFICATION OF WORD/PHRASES OF INTEREST
IN TEXT
![Page 6: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/6.jpg)
APPLICATIONS
![Page 7: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/7.jpg)
QUESTION-ANSWERING: LOCATE INFORMATIVE TEXT
WHY?
Q: WHEN DID ADOLF HITLER COME TO POWER?
![Page 10: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/10.jpg)
MEDICAL DIAGNOSIS: LEVERAGE OPEN DATA SOURCE
WHY?
IS THERE ANY EVIDENCE THAT MEDICINE X IS A GOOD TREATMENT FOR DISEASE Y?
pic: http://www.slideshare.net/larsjuhljensen/one-tagger-many-uses-illustrating-the-power-of-ontologies-in-named-entity-recognition
![Page 11: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/11.jpg)
INFO SUPPLEMENTATION: AMAZON X-RAY
WHY?
pic1: http://www.blogher.com/kindle-paperwhite-smart-bitches-review
pic2: https://adelightfulspace.wordpress.com/2015/09/14/review-amazon-kindle-
![Page 13: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/13.jpg)
TEXT NORMALIZATION: PROPER LEVEL OF ABSTRACTION
WHY?
WHAT DO PEOPLE CARE ABOUT?
WHAT DO COMPANIES CARE ABOUT?
![Page 14: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/14.jpg)
COOL! HOW?
![Page 15: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/15.jpg)
LOOK UP A GAZETTEER?
HOW? http://www.bph-postcodes.co.uk/licenced_products/postcode-sector-town.cgi
![Page 16: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/16.jpg)
LOOK UP A GAZETTEER?
HOW?
STRING MATCHING ALONE IS NOT GOING TO CUT IT, MAINLY BECAUSE OF WEAK GENERALIZATION!
![Page 17: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/17.jpg)
LOOKING-UP AIN’T GONNA CUT IT!
HOW?
ENGLAND: COUNTRY_NAME OR LOCATION?
1945: NUMBER OR TIME?
CHASE: PERSON_NAME OR ORGANIZATION?
ISSUE 1: AMBIGUITY
![Page 18: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/18.jpg)
LOOKING-UP AIN’T GONNA CUT IT!
HOW?
STREET, ST., STRT, …
UNIVERSITY OF TEXAS, UNIV TX, UT, …
NAMED ENTITY RECOGNITION, NER, …
ISSUE 2: VARIANTS
![Page 19: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/19.jpg)
LOOKING-UP AIN’T GONNA CUT IT!
HOW?
MOST ITEMS WILL BE OOVS!
ISSUE 3: OUT-OF-VOCAB ITEMS
ZIPF’S LAW
![Page 20: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/20.jpg)
SOLUTION: FEATURIZATION
HOW?
… W W W NAMED ENTITY W W W …
MUCH RICHER INFORMATION THAN HAVING ONLY THE ITEM ITSELF!!
SEMANTIC
SYNTACTIC
LEXICAL
MORPHOLOGICAL
FEATURE VECTOR OF WORD 1
![Page 21: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/21.jpg)
HOW?
SOLUTION: FEATURIZATION
• E.G. DISAMBIGUATION
1945 (NUMBER):
- PARSER LIKELY TO GIVE NUM/ADJ POS TAG.
- APPEARS IN CONTEXTS MORE SIMILAR TO THOSE OF ARBITRARY-LENGTH DIGIT SEQUENCES THAN TO THOSE OF “YYYY”-FORMAT SEQUENCES.
1945 (TIME):
- MORE LIKELY TO BE THE LAST ITEM BEFORE DELIMITERS (COMMA OR PERIOD).
- MORE LIKELY TO HAVE A PRECEDING ‘IN’.
![Page 22: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/22.jpg)
HOW?
SOLUTION: FEATURIZATION
• E.G. VARIANTS
UT, UNIVERSITY OF TEXAS, UNIV TX, …
- SIMILAR COOCCURRENCE VECTORS (OVER VOCAB).
- THE NON-WORD ABBREVIATIONS (UT, UNIV TX) ARE MORE SIMILAR TO EACH OTHER THAN TO OTHER NON-WORD ABBREVIATIONS.
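The cooccurrence-vector idea can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the toy sentences, the window size of 2, and the single-token spelling `University_of_Texas` are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical, pre-tokenized toy corpus (multi-word names joined into one token).
sentences = [
    ["students", "enroll", "at", "UT", "in", "Austin"],
    ["students", "enroll", "at", "University_of_Texas", "in", "Austin"],
    ["customers", "deposit", "money", "at", "the", "bank"],
]
vectors = cooccurrence_vectors(sentences)
same_entity = cosine(vectors["UT"], vectors["University_of_Texas"])
unrelated = cosine(vectors["UT"], vectors["bank"])
print(same_entity > unrelated)
```

In this toy corpus the two variants share every context word, so they score higher with each other than with an unrelated word. Real corpora would need far more data and usually some reweighting of raw counts.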
![Page 23: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/23.jpg)
HOW?
SOLUTION: FEATURIZATION
• E.G. OUT-OF-VOCAB ITEMS
BARFKNECHT, THORUP, PECKENPAUGH, … (RARE SURNAMES, LESS THAN 0.15% IN 100,000 PEOPLE)
- CONTEXTUAL DISTRIBUTION MORE SIMILAR TO THAT OF “JOHNSON” OR “SMITH” THAN TO THAT OF RANDOM WORDS.
- LIKELY TO BE TAGGED AS PROPER NOUN (“PRON”) BY PARSER.
![Page 24: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/24.jpg)
FEATURIZATION
![Page 25: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/25.jpg)
METHODS
FEATURIZATION
• FEATURE ENGINEERING
• DOMAIN EXPERT KNOWLEDGE
• LINGUISTIC KNOWLEDGE
• FEATURE ABSTRACTION
• “ALMOST FROM SCRATCH” WITH AUTOMATIC FEATURE DISCOVERY
![Page 26: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/26.jpg)
FEATURE ENGINEERING
… W W W NAMED ENTITY W W W …
• MORPHOLOGY: PREFIX, SUFFIX, STEM, ETC. IN ENGLISH
• E.G. anti-, con-, dis-, re-, …, -ly, -ness, …
• LEXICAL: GAZETTEER, SPELLING
• E.G. {city_names}, …, capitalized, …
• SEMANTICS: COOCCURRENCE PATTERN, HEARST PATTERNS
• E.G. cooccurrence counts, CITIES such as …
• SYNTAX: DEPENDENCY CONTEXT, SYNTACTIC PATH
• E.G. (NE, dobj, kill), …, V->NP->N, …
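A hand-engineered feature extractor along these lines might look like the sketch below. It covers only the morphological, lexical, and simple contextual cues. The feature names, the 3-character affix length, and the `gazetteer` argument (assumed to hold lowercased entries) are illustrative assumptions, and syntactic features are left out because they would need a parser.

```python
def word_features(tokens, i, gazetteer=frozenset()):
    """Return a dict of hand-crafted features for the token at position i."""
    word = tokens[i]
    return {
        # Morphology: crude prefix/suffix cues.
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
        # Lexical: spelling and gazetteer membership.
        "lower": word.lower(),
        "is_capitalized": word[:1].isupper(),
        "is_digit": word.isdigit(),
        "in_gazetteer": word.lower() in gazetteer,
        # Context: neighbouring words, with boundary markers.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = "Adolf Hitler came to power in 1933".split()
features_1933 = word_features(tokens, 6)
features_adolf = word_features(tokens, 0, gazetteer=frozenset({"austin"}))
print(features_1933["is_digit"], features_1933["prev_word"])
```

Dictionaries of this shape are what feature-based sequence taggers typically consume, one dict per token.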
![Page 27: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/27.jpg)
FEATURE ABSTRACTION
… W W W NAMED ENTITY W W W …
• MORPHOLOGY: CHARACTER EMBEDDINGS
• from sequences of characters in words (CNN)
• LEXICAL & SEMANTICS: WORD EMBEDDINGS
• from sequences of words in sentences (word2vec with RNN/FNN)
• SYNTAX: WORD EMBEDDINGS + PHRASE EMBEDDINGS
• from sequences of words in sentences (vectors with RecNN)
![Page 28: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/28.jpg)
FEATURE ENGINEERING VS. FEATURE ABSTRACTION
A BIG DIFFERENCE WE CARE ABOUT: INTERPRETABILITY
THEY WORK IN THE CLASSIFICATION TASK BUT I DON’T KNOW WHAT THEY MEAN!!
![Page 29: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/29.jpg)
WHAT MODEL TO USE?
![Page 30: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/30.jpg)
NER
WHAT TO USE?
![Page 31: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/31.jpg)
ALL DEPENDS ON DATA AVAILABILITY & KNOWLEDGE STATE
WHAT MODEL TO USE?
SUPERVISED SEMI-SUPERVISED UNSUPERVISED
ANNOTATION
LABELING SCHEME
![Page 32: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/32.jpg)
WHAT MODEL TO USE?
OUR GENERAL OPINION
PURPOSE AND PERFORMANCE EXPECTATION (ACCURACY/F1)
• SUPERVISED: WEAPONIZED, PRODUCTION-LEVEL MODELS; 95%+
• SEMI-SUPERVISED: EXPLORATORY MODELS, PATTERN DISCOVERY; ~75%
• UNSUPERVISED: ~65%
![Page 33: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/33.jpg)
OVERVIEW
![Page 34: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/34.jpg)
FEATURE ENGINEERING
• SUPERVISED: CONDITIONAL RANDOM FIELDS (CRF)
• SEMI-SUPERVISED: BOOTSTRAPPING, LABEL PROPAGATION
• UNSUPERVISED: HEARST PATTERN, EXTERNAL TAXONOMY
FEATURE ABSTRACTION
• SUPERVISED: RECURRENT NEURAL NETS (RNN)
• SEMI-SUPERVISED: -
![Page 36: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/36.jpg)
LINEAR-CHAIN GRAPHICAL MODEL
CRF
STATES: LATENT VARIABLES THAT TAKE LABELS AS VALUES
OBSERVATION: WORDS
![Page 37: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/37.jpg)
EXAMPLE
ADOLF HITLER CAME TO POWER IN 1933
CRF
![Page 39: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/39.jpg)
OBJECTIVE: FINDING THE BEST PATH!
COMPUTING FOR THE BEST SEQUENCE OF LABELS (Y’s)? EXPENSIVE!!
CRF
![Page 41: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/41.jpg)
COMPLEXITY OF BRUTE-FORCE COMPUTATION
E.G. ATIS DATASET
- TYPICAL SENTENCE: ~15 WORDS
- SIZE OF LABEL SET: 127
- POSSIBLE PATHS / LABEL SEQUENCES: 127^15
CRF
BRUTE FORCE?
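To make the scale concrete, the arithmetic can be checked directly. The comparison figure below assumes a first-order linear-chain model decoded with dynamic programming (roughly sentence length times label-set size squared). That assumption is added here for contrast and is not stated on the slide.

```python
sentence_length = 15   # typical ATIS sentence, per the slide
label_set_size = 127   # ATIS label set, per the slide

# Brute force: every position can take any label independently.
brute_force_paths = label_set_size ** sentence_length

# Assumed alternative: dynamic programming over a first-order chain,
# which considers each (previous label, current label) pair once per position.
dynamic_programming_steps = sentence_length * label_set_size ** 2

print(f"brute force: {brute_force_paths:.2e} label sequences")
print(f"dynamic programming: {dynamic_programming_steps} label-pair evaluations")
```

The brute-force count is on the order of 10^31, which is why enumerating every path is not an option.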
![Page 42: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/42.jpg)
WE WANT TO BREAK THE GRAPH INTO SUBCOMPONENTS (FACTORS)
CRF
FACTORIZATION
$\Psi(X_t, Y_t)$: FACTOR AT TIME T
HITLER CAME TO POWER IN
![Page 43: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/43.jpg)
CRF
FACTORIZATION
$\Psi(X_t, Y_t)$: FACTOR AT TIME T
HITLER CAME TO POWER IN
FEATURE INDICATOR FUNCTION
f1, f2, f3, f4, f5, f6, … = <0, 1, 1, 0, 0, 0, … >
![Page 44: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/44.jpg)
CRF
FEATURE FUNCTIONS (INDICATOR FUNCTIONS)
$$f_k(X_t, Y_t) = \begin{cases} 1 & \text{if } \Psi(X_t, Y_t) \text{ has feature } k \\ 0 & \text{otherwise} \end{cases}$$
E.G.
- X_t-1 IS WORD “TO”
- X_t+1 HAS POS TAG “PREP”
- Y_t+1 IS LABEL “B-PER”
- ETC.
$\Psi(X_t, Y_t)$
HITLER CAME TO POWER IN
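The three example indicators can be written out as a short sketch. The POS tags and labels assigned to the sentence fragment below are hypothetical values chosen for illustration, and the function name is not from any library.

```python
def feature_vector(words, pos_tags, labels, t):
    """Evaluate the three example indicator functions at position t."""
    n = len(words)
    f1 = int(t - 1 >= 0 and words[t - 1].upper() == "TO")   # X_t-1 is word "TO"
    f2 = int(t + 1 < n and pos_tags[t + 1] == "PREP")       # X_t+1 has POS tag "PREP"
    f3 = int(t + 1 < n and labels[t + 1] == "B-PER")        # Y_t+1 is label "B-PER"
    return [f1, f2, f3]


words = ["HITLER", "CAME", "TO", "POWER", "IN"]
pos_tags = ["NOUN", "VERB", "PREP", "NOUN", "PREP"]  # hypothetical tags
labels = ["I-PER", "O", "O", "O", "O"]               # hypothetical labels

vector_at_power = feature_vector(words, pos_tags, labels, 3)
print(vector_at_power)  # [1, 1, 0]
```

Each position in the sentence gets its own 0/1 vector like this, and the weights learned per feature decide how much each indicator counts.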
![Page 45: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/45.jpg)
CRF
BEST SEQUENCE
$$\hat{Y} = \arg\max_{Y} \frac{\prod_{t=1}^{T} \exp\left(\sum_{k=1}^{K} w_k f_k(X_t, Y_t)\right)}{\sum_{Y'} \prod_{t=1}^{T} \exp\left(\sum_{k=1}^{K} w_k f_k(X_t, Y'_t)\right)}$$
w_k: WEIGHT OF FEATURE FUNCTION f_k
THE PROBABILITY OF THE ENTIRE SEQUENCE: NORMALIZED PRODUCT OF THE “WEIGHTED FEATURE SCORES” OF THE FACTORS
THIS COULD BE FORMULATED INTO AN OPTIMIZATION PROBLEM AND SOLVED WITH VARIOUS ALGORITHMS!
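One commonly used algorithm for the argmax is Viterbi decoding, sketched below. Because the denominator does not depend on Y and exp is monotonic, maximizing the product of exponentials is the same as maximizing the sum of weighted feature scores, so the sketch works with additive scores. It assumes those scores have already been collapsed into a per-position `emission` table and a label-to-label `transition` table. Both names and the toy numbers are illustrative assumptions, not part of the talk.

```python
def viterbi(emission, transition):
    """Return the highest-scoring label sequence.

    emission[t][y]: summed weighted feature score of label y at position t.
    transition[p][y]: summed weighted feature score of moving from label p to y.
    """
    n_labels = len(emission[0])
    best = list(emission[0])      # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emission)):
        new_best, pointers = [], []
        for y in range(n_labels):
            prev = max(range(n_labels), key=lambda p: best[p] + transition[p][y])
            pointers.append(prev)
            new_best.append(best[prev] + transition[prev][y] + emission[t][y])
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    y = max(range(n_labels), key=lambda k: best[k])
    path = [y]
    for pointers in reversed(backpointers):
        y = pointers[y]
        path.append(y)
    return path[::-1]


# Toy scores for 3 positions and 2 labels; switching labels is penalized.
emission = [[2, 0], [0, 1], [0, 3]]
transition = [[0, -5], [-5, 0]]
best_path = viterbi(emission, transition)
print(best_path)  # [1, 1, 1]
```

Picking the best label at each position independently would give `[0, 1, 1]` here. The sequence-level search prefers `[1, 1, 1]` because it avoids the transition penalty.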
![Page 46: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/46.jpg)
FACTORIZATION: AS YOU LIKE IT
Sutton & McCallum (2011)
CRF
![Page 47: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/47.jpg)
WE FOUND THESE MODELS LESS APPROPRIATE
CRF
HIDDEN MARKOV MODEL (HMM)
• EMPIRICALLY WEAKER PERFORMANCE
• TROUBLE CAPTURING LONG-DISTANCE DEPENDENCY
MAXIMUM ENTROPY MARKOV MODEL (MEMM)
• CRF IS AN IMPROVED VERSION OF MEMM
SUPPORT VECTOR MACHINE (SVM)
• SVM, AS A BINARY CLASSIFIER, AGGREGATES ERROR!
Lafferty et al. (2001) Sutton & McCallum (2011)
![Page 48: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/48.jpg)
RECAP
SUPERVISED + FEATURE ENGINEERING
IN: ADOLF HITLER CAME TO POWER IN 1933.
NER: LEARNING A MAPPING
OUT: B-PER I-PER O O O O B-TIME.
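Once a tagger has produced B-/I-/O labels like the ones in this recap, the entities still have to be read off the tag sequence. The sketch below shows one way to do that. The function name is hypothetical, and treating a stray I- tag with no preceding B- tag as outside any entity is a simplifying assumption.

```python
def bio_to_spans(tokens, tags):
    """Group B-/I- tagged tokens into (entity text, entity type) pairs."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans


tokens = "ADOLF HITLER CAME TO POWER IN 1933".split()
tags = ["B-PER", "I-PER", "O", "O", "O", "O", "B-TIME"]
entities = bio_to_spans(tokens, tags)
print(entities)  # [('ADOLF HITLER', 'PER'), ('1933', 'TIME')]
```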
![Page 49: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/49.jpg)
TOOLBOX
SUPERVISED + FEATURE ENGINEERING
LIBRARIES
CRF
• pycrfsuite: https://python-crfsuite.readthedocs.io/en/latest/
• crf++: https://taku910.github.io/crfpp/
HMM
• seqlearn: https://github.com/larsmans/seqlearn
• hmmlearn: https://github.com/hmmlearn/hmmlearn
MEMM
• nltk: http://www.nltk.org/_modules/nltk/classify/maxent.html
SVM
• sklearn: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
![Page 51: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/51.jpg)
LEARNING TYPE & OBJECTIVE
SUPERVISED VS. SEMI-SUPERVISED NER
LEARNING TYPE
SUPERVISED: INDUCTIVE LEARNING
• LEARNING A CLASSIFIER THAT INCORPORATES GENERAL RULES
SEMI-SUPERVISED: TRANSDUCTIVE LEARNING
• CLASSIFYING UNLABELED DATA AS SPECIFIC CASES BY THEIR SIMILARITY TO LABELED DATA
![Page 52: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/52.jpg)
SUPERVISED LEARNING: INDUCTIVE
SUPERVISED VS. SEMI-SUPERVISED NER
WE ARE LEARNING THESE PARAMETERS!
classifier(label sequences | word sequences; Θ)
![Page 53: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/53.jpg)
SEMI-SUPERVISED LEARNING: TRANSDUCTIVE
SUPERVISED VS. SEMI-SUPERVISED NER
SIMILAR WORDS SHOULD HAVE SAME LABELS
![Page 54: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/54.jpg)
BOOTSTRAPPING
ALTERNATING/MUTUAL BOOTSTRAPPING
![Page 55: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/55.jpg)
BY LEXICAL FEATURES
EXTRACTION
E.G. REGEX CHARACTERIZATION: Brin (1999)
• AUTHOR
{[A-Z][A-Za-z .,&]; [A-Za-z.]; ...}
• TITLE
{[A-Z0-9][A-Za-z0-9 .,:’#!?;&]; [A-Za-z0-9?!]}
• ...
E.G. LEXICAL RULES: Collins & Singer (1999)
![Page 56: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/56.jpg)
BY CONTEXTUAL FEATURES
EXTRACTION
Riloff & Jones (1999)
![Page 57: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/57.jpg)
BY DISTRIBUTIONAL SIMILARITY
EXTRACTION
Pasca et al. (2006)
![Page 58: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/58.jpg)
BOOTSTRAPPING
ALTERNATING/MUTUAL BOOTSTRAPPING
THE SET OF NAMED ENTITIES!
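The alternating loop can be sketched as follows: known entities yield context patterns, and the patterns in turn yield new entities. This is a deliberately naive illustration, not the procedure of any of the cited papers. The (previous word, next word) pattern definition, the toy sentences, and the absence of any pattern scoring or human filtering are all assumptions.

```python
def bootstrap(sentences, seeds, rounds=2):
    """Alternate between harvesting context patterns and harvesting entities."""
    entities = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Step A: contexts surrounding known entities become patterns.
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in entities:
                    patterns.add((tokens[i - 1], tokens[i + 1]))
        # Step B: tokens found in a known pattern become entities.
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if (tokens[i - 1], tokens[i + 1]) in patterns:
                    entities.add(tokens[i])
    return entities, patterns


# Hypothetical toy corpus with a single seed entity.
sentences = [
    ["she", "lives", "in", "Austin", "with", "friends"],
    ["he", "lives", "in", "Dallas", "with", "family"],
    ["they", "work", "in", "Houston", "with", "us"],
]
entities, patterns = bootstrap(sentences, seeds={"Austin"})
print(sorted(entities))  # ['Austin', 'Dallas', 'Houston']
```

Since every match is accepted without scoring, one overly general pattern would pull in unrelated tokens on the next round.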
![Page 59: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/59.jpg)
LABEL PROPAGATION
ITEMS (TOKENS)
SIMILARITIES
*THE ACTUAL GRAPH IS USUALLY FULLY CONNECTED, BUT NOT NECESSARILY SO IN SOME VARIANTS OF LP
![Page 60: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/60.jpg)
LABEL PROPAGATION
L_1
L_2
<0,1,1,0,0, …>
<1,1,0,1,1, …>
<0,1,1,1,0, …>
LABELED NODES: LABEL + FEATURE VEC
UNLABELED NODES: FEATURE VEC ONLY
![Page 61: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/61.jpg)
LABEL PROPAGATION
Network graph from Mejova (2015), interpretation differs here.
LABELED ITEMS
UNLABELED ITEMS
ITEMS
SIMILARITIES
PROPAGATION
FULLY LABELED!
![Page 62: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/62.jpg)
PROCEDURE
LABEL PROPAGATION
SIMILARITY MATRIX: (l+u) × (l+u), OVER THE l LABELED AND u UNLABELED ITEMS
SOFT LABEL ASSIGNMENT DISTRIBUTION: (l+u) × C, ONE ROW PER ITEM, ONE COLUMN PER CLASS
Zhu & Ghahramani (2002)
![Page 63: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/63.jpg)
STEP 1
LABEL PROPAGATION
NEW DISTRIBUTION MATRIX <= SIMILARITY MATRIX × DISTRIBUTION MATRIX
[(l+u) × (l+u)] × [(l+u) × C] = [(l+u) × C]
Zhu & Ghahramani (2002)
![Page 64: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/64.jpg)
STEP 1
LABEL PROPAGATION
AN INTUITIVE EXPLANATION WHY THE UPDATE WORKS
Zhu & Ghahramani (2002)
INFLUENCE OF LABELED DATA ON UNLABELED DATA PROPORTIONAL TO THEIR SIMILARITY!
FIGURE LABELS: WORD X_i, LABELED WORD X_j
![Page 65: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/65.jpg)
STEP 2
LABEL PROPAGATION
ROW-NORMALIZE THE DISTRIBUTION MATRIX
EACH ROW IS A PROBABILITY DISTRIBUTION OVER CLASSES!
Zhu & Ghahramani (2002)
![Page 66: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/66.jpg)
STEP 3
LABEL PROPAGATION
CLAMP/“REPLENISH” THE LABELED DATA
Zhu & Ghahramani (2002)
![Page 67: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/67.jpg)
CONVERGENCE
LABEL PROPAGATION
UNTIL ALL THE ITEMS ARE LABELED …
Zhu & Ghahramani (2002)
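The three steps (propagate, row-normalize, clamp) can be put together in a small pure-Python sketch. It is a simplified reading of the procedure, not a faithful reimplementation of Zhu & Ghahramani (2002). It uses the raw similarity matrix as given, starts unlabeled rows as uniform distributions, and runs a fixed number of iterations instead of testing for convergence. All three choices, and the toy similarity values, are assumptions.

```python
def label_propagation(similarity, labels, n_classes, n_iter=50):
    """similarity: (l+u) x (l+u) nested list; labels: class index or None per item."""
    n = len(similarity)
    # One-hot rows for the labeled items; these are restored after every update.
    clamped = {i: [1.0 if c == y else 0.0 for c in range(n_classes)]
               for i, y in enumerate(labels) if y is not None}
    # Soft label assignment: (l+u) x C, uniform rows for unlabeled items.
    dist = [list(clamped.get(i, [1.0 / n_classes] * n_classes)) for i in range(n)]
    for _ in range(n_iter):
        # Step 1: new distribution matrix <= similarity matrix x distribution matrix.
        new = [[sum(similarity[i][j] * dist[j][c] for j in range(n))
                for c in range(n_classes)] for i in range(n)]
        # Step 2: row-normalize so each row is a distribution over classes.
        for row in new:
            total = sum(row)
            if total > 0:
                row[:] = [v / total for v in row]
        # Step 3: clamp the labeled items back to their known labels.
        for i, row in clamped.items():
            new[i] = list(row)
        dist = new
    return [max(range(n_classes), key=lambda c: row[c]) for row in dist]


# Items 0 and 1 are labeled; item 2 resembles item 0, item 3 resembles item 1.
similarity = [
    [1.0, 0.1, 0.9, 0.1],
    [0.1, 1.0, 0.1, 0.9],
    [0.9, 0.1, 1.0, 0.2],
    [0.1, 0.9, 0.2, 1.0],
]
predicted = label_propagation(similarity, [0, 1, None, None], n_classes=2)
print(predicted)  # [0, 1, 0, 1]
```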
![Page 68: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/68.jpg)
PROS & CONS
BOOTSTRAPPING VS. LABEL PROPAGATION
BOOTSTRAPPING
PROS
• EASY IMPLEMENTATION
• FAST EXTRACTION
CONS
• LABOR INTENSIVE (HEAVY HUMAN SUPERVISION TO GUARANTEE QUALITY)
• PRONE TO INTRODUCING NOISE
LABEL PROPAGATION
PROS
• CONVERGENCE GUARANTEED
• MORE AUTOMATED
CONS
• PARAMETERS DIFFICULT TO TUNE (PERFORMANCE DEPENDS HEAVILY ON PARAMETER SETTING)
• SLOW WITH LARGE GRAPH (SOPHISTICATED VARIANTS)
![Page 69: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/69.jpg)
RECAP
SEMI-SUPERVISED + FEATURE ENGINEERING
SIMILAR WORDS SHOULD HAVE SAME LABELS
![Page 70: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/70.jpg)
TOOLBOX
SEMI-SUPERVISED + FEATURE ENGINEERING
LIBRARIES
BOOTSTRAPPING
• NONE NEEDED
LABEL PROPAGATION
• sklearn: http://scikit-learn.org/stable/modules/label_propagation.html
• MAD: https://github.com/psorianom/modified_adsorption
Talukdar & Crammer (2009)
![Page 72: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/72.jpg)
OBJECTIVE
UNSUPERVISED LEARNING VS. OTHER
SUPERVISED SEMI-SUPERVISED UNSUPERVISED
ANNOTATION
LABELING SCHEME
LEARNING OBJECTIVE: LABELING STUFF (UNSUPERVISED: ALSO INDUCE A LABELING SCHEME)
![Page 73: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/73.jpg)
GENERAL IDEA
UNSUPERVISED + FEATURE ENGINEERING
INDUCING LABELING SCHEME
![Page 74: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/74.jpg)
INDUCING LABELING SCHEME
UNSUPERVISED LEARNING
HYPERNYMS
HYPONYMS
![Page 75: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/75.jpg)
UNSUPERVISED LEARNING
INDUCING LABELING SCHEME
![Page 76: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/76.jpg)
WHAT IS A HEARST PATTERN?
HEARST PATTERN BASED EXTRACTION
PARADIGM
Y such as X (LABEL CANDIDATE, NE CANDIDATE)
E.G.
CITIES such as Austin, Dallas, and Houston
Hearst (1992)
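The "Y such as X" paradigm can be sketched with a regular expression. This is a deliberately narrow illustration that only handles single capitalized words in a comma-separated list; the pattern and the function name `extract_pairs` are assumptions made for this example, not part of the cited work.

```python
import re

# "Y such as X1, X2, and X3": one label word, then a list of capitalized words.
HEARST = re.compile(
    r"(?P<label>[A-Za-z]+) such as "
    r"(?P<items>[A-Z][a-z]+(?:(?:, and |, | and )[A-Z][a-z]+)*)"
)


def extract_pairs(text):
    """Return (NE candidate, label candidate) pairs found in the text."""
    pairs = []
    for match in HEARST.finditer(text):
        label = match.group("label").upper()
        for item in re.split(r", and |, | and ", match.group("items")):
            pairs.append((item, label))
    return pairs


pairs = extract_pairs("CITIES such as Austin, Dallas, and Houston")
# pairs == [("Austin", "CITIES"), ("Dallas", "CITIES"), ("Houston", "CITIES")]
```

A real extractor would also normalize the label (CITIES to CITY) and handle multi-word phrases.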
![Page 77: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/77.jpg)
HEARST PATTERN BASED EXTRACTION
STEP 1
![Page 78: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/78.jpg)
STEP 2
HEARST PATTERN BASED EXTRACTION
$$\mathrm{LABEL}(X) = \arg\max_{Y} \mathrm{SCORE}(X, Y)$$
Evans (2003)
THIS IS A SELECTION PROCESS FOR LABELS
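The selection step can be sketched as an argmax over candidate labels per entity. The slide does not say how SCORE is computed, so this sketch assumes the simplest choice: the number of times a (candidate, label) pair was extracted. The input pairs are hypothetical.

```python
from collections import Counter


def select_labels(pairs):
    """Pick, for each NE candidate X, the label Y with the highest score.

    SCORE(X, Y) is assumed here to be the extraction count of the pair.
    """
    scores = {}
    for entity, label in pairs:
        scores.setdefault(entity, Counter())[label] += 1
    return {entity: counts.most_common(1)[0][0] for entity, counts in scores.items()}


extracted = [
    ("Austin", "CITY"), ("Austin", "CITY"), ("Austin", "CITY"),
    ("Austin", "NAME"),
    ("Dallas", "CITY"),
]
chosen = select_labels(extracted)
```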
![Page 79: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/79.jpg)
STEP 3
HEARST PATTERN BASED EXTRACTION
Etzioni et al. (2005)
PMI(X, Y_i[X]) = PMI(Austin, CITY such as Austin), where
X = Austin
Y = CITY
Y_i[X] = CITY such as Austin
Y_1 Y_2 Y_3 …. Y_k (Austin,CITY)
Y such as X
GROUNDING
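As a rough sketch of the grounding score, pointwise mutual information can be computed from co-occurrence counts. The formula below is the textbook log-ratio form; the cited work may use a different hit-count based variant, and all counts here are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Textbook PMI: log2( P(x, y) / (P(x) * P(y)) ), estimated from counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: "Austin" seen 100 times, the pattern "CITY such as"
# seen 200 times, the two together 50 times, out of 10,000 contexts.
strong = pmi(50, 100, 200, 10000)
# If they co-occurred only as often as chance predicts, PMI is about zero.
chance = pmi(2, 100, 200, 10000)
```

A high PMI between X and the pattern instantiated with Y is taken as evidence that Y is a plausible label for X.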
![Page 80: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/80.jpg)
EXTERNAL TAXONOMY BASED EXTRACTION
EXAMPLE: WORDNET
HYPERNYM (MORE GENERAL)
HYPONYM (MORE SPECIFIC)
![Page 81: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/81.jpg)
EXTERNAL TAXONOMY BASED EXTRACTION
OBJECTIVE: FIND THE SWEET SPOT
FIND ALL CAPITALIZED WORDS / PHRASES (OPTION: BOOTSTRAP FROM GAZETTEER)
MANUALLY FIND A SET OF HYPERNYMS THAT COVERS ALL THE NE CANDIDATES!
![Page 82: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/82.jpg)
EXTERNAL TAXONOMY BASED EXTRACTION
OBJECTIVE: FIND THE SWEET SPOT
![Page 83: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/83.jpg)
EXTERNAL TAXONOMY BASED EXTRACTION
OBJECTIVE: FIND THE SWEET SPOT
TOPIC SIGNATURE
$$\mathrm{SIG}(X) = \{(\mathit{word}, \mathit{freq}_{\mathit{word}}) \mid \mathit{word} \text{ in } X\text{'s context}\}$$
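A topic signature can be sketched as a bag of context words with frequencies. The slide does not define "context", so this example assumes a fixed window of two tokens on each side; the sentence and the `window` parameter are hypothetical.

```python
from collections import Counter


def topic_signature(tokens, target, window=2):
    """Count the words that appear within `window` tokens of each target occurrence."""
    signature = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Add every token in the window except the target occurrence itself.
        signature.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return signature


tokens = "the village of hobbiton is a quiet village near hobbiton hill".split()
sig = topic_signature(tokens, "hobbiton")
```

Here "village" falls inside the window of both occurrences of the target, so it gets a frequency of 2.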
![Page 84: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/84.jpg)
EXTERNAL TAXONOMY BASED EXTRACTION
OBJECTIVE: FIND THE SWEET SPOT
$$\arg\max_{Y} \mathrm{SIM}(\mathrm{SIG}(X), \mathrm{SIG}(Y)) = \mathrm{LOCATION}$$
SIM=230
SIM=410
SIM=140
CURRENT NODE = ENTITY
![Page 85: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/85.jpg)
EXTERNAL TAXONOMY BASED EXTRACTION
OBJECTIVE: FIND THE SWEET SPOT
$$\arg\max_{Y} \mathrm{SIM}(\mathrm{SIG}(X), \mathrm{SIG}(Y)) = \mathrm{VILLAGE}$$
SIM=410
SIM=251
SIM=533
CURRENT NODE = LOCATION
![Page 86: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/86.jpg)
EXTERNAL TAXONOMY BASED EXTRACTION
OBJECTIVE: FIND THE SWEET SPOT
SWEET SPOT FOUND!!
$$\arg\max_{Y} \mathrm{SIM}(\mathrm{SIG}(X), \mathrm{SIG}(Y)) = \mathrm{VILLAGE}$$
SIM=533
CURRENT NODE = VILLAGE
![Page 87: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/87.jpg)
EXTERNAL TAXONOMY BASED EXTRACTION
OBJECTIVE: FIND THE SWEET SPOT
NE CANDIDATES: MORDOR, HOBBITON, HOBBIT, WIZARD, DWARF, FAIRY
Alfonseca & Manandhar (2002)
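The top-down search for the sweet spot can be sketched as a greedy descent through the taxonomy: at each node, compare the current node with its children and move to the most similar one, stopping when no child beats the current node. This is an illustration under assumptions. The similarity values 230, 410, 140, 251, and 533 come from the slides, but the score for ENTITY and the node names PERSON, ORGANIZATION, and CITY are hypothetical, and in practice each score would be computed as SIM(SIG(X), SIG(Y)).

```python
def find_sweet_spot(children, sim, root):
    """Descend from the root while some child is more similar than the current node."""
    node = root
    while True:
        candidates = [node] + children.get(node, [])
        best = max(candidates, key=lambda y: sim[y])
        if best == node:
            return node  # no child improves on the current node
        node = best


children = {
    "ENTITY": ["PERSON", "LOCATION", "ORGANIZATION"],
    "LOCATION": ["CITY", "VILLAGE"],
}
sim = {
    "ENTITY": 0,          # assumed: the root is the least specific match
    "PERSON": 230,
    "LOCATION": 410,
    "ORGANIZATION": 140,
    "CITY": 251,
    "VILLAGE": 533,
}
label = find_sweet_spot(children, sim, "ENTITY")
```

The descent visits ENTITY, then LOCATION, then VILLAGE, and stops there because VILLAGE has no more similar child.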
![Page 88: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/88.jpg)
RECAP
UNSUPERVISED + FEATURE ENGINEERING
INDUCING LABELING SCHEME
![Page 89: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/89.jpg)
| | FEATURE ENGINEERING | FEATURE ABSTRACTION |
|---|---|---|
| SUPERVISED | CONDITIONAL RANDOM FIELDS (CRF) | RECURRENT NEURAL NETS (RNN) |
| SEMI-SUPERVISED | BOOTSTRAPPING, LABEL PROPAGATION | - |
| UNSUPERVISED | HEARST PATTERN, EXTERNAL TAXONOMY | |
![Page 90: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/90.jpg)
FEATURE ENGINEERING VS. FEATURE ABSTRACTION
FEATURE ENGINEERING
AUTOMATIC FEATURE ABSTRACTION THROUGH JOINT OPTIMIZATION
![Page 91: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/91.jpg)
• WHAT ARE EMBEDDINGS?
FEATURE ABSTRACTION
REPRESENTATION: EMBEDDINGS
REPRESENTATIONS THAT LIVE IN A HIGH-DIMENSIONAL SPACE WHERE THE DISTANCE BETWEEN ITEMS REFLECTS SOME NOTION OF SIMILARITY…
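To make "distance reflects similarity" concrete, here is a small sketch using cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings have many more dimensions and are learned from data.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 for the same direction, near 0 for unrelated vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings.
embeddings = {
    "austin": [0.9, 0.1, 0.0],
    "dallas": [0.8, 0.2, 0.1],
    "jobs": [0.1, 0.9, 0.3],
}
city_city = cosine(embeddings["austin"], embeddings["dallas"])
city_person = cosine(embeddings["austin"], embeddings["jobs"])
```

In this toy space the two city vectors point in nearly the same direction, while the city and person vectors do not.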
![Page 92: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/92.jpg)
FEATURE ABSTRACTION
E.G. WORD EMBEDDINGS
![Page 93: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/93.jpg)
FEATURE ABSTRACTION
PIPELINE: HOW ARE EMBEDDINGS LEARNED?
ONE-HOT
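The one-hot input step can be sketched as follows: multiplying a one-hot vector by the embedding matrix simply selects one row of that matrix. The vocabulary and the matrix values are hypothetical.

```python
vocab = ["adolf", "hitler", "came"]

# Hypothetical embedding matrix: one row per vocabulary word, two dimensions.
E = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]


def one_hot(word):
    """Vector of zeros with a single 1 at the word's vocabulary index."""
    return [1 if w == word else 0 for w in vocab]


def project(vector, matrix):
    """Vector-matrix product: sum of the matrix rows weighted by the vector entries."""
    return [sum(v * row[k] for v, row in zip(vector, matrix)) for k in range(len(matrix[0]))]


embedded = project(one_hot("hitler"), E)
```

This is why libraries implement the projection as a table lookup instead of an actual matrix product.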
![Page 94: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/94.jpg)
PREDICTION RECALIBRATION
ONE-HOT
FEATURE ABSTRACTION
JOINT OPTIMIZATION
![Page 95: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/95.jpg)
FEATURE ABSTRACTION
ONE-HOT
JOINT OPTIMIZATION
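Joint optimization can be sketched with a tiny model in which the embedding table and the classifier are updated by the same gradient step. This is a minimal illustration: the two-word vocabulary, the initial weights, and the learning rate are all assumptions, and real systems use a deep learning library for this.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(E, W, word_id, label_id, lr=0.5):
    """One SGD step on cross-entropy loss, updating classifier AND embedding."""
    x = E[word_id]                      # embedding lookup
    n_labels = len(W[0])
    logits = [sum(x[k] * W[k][c] for k in range(len(x))) for c in range(n_labels)]
    probs = softmax(logits)
    loss = -math.log(probs[label_id])
    # Gradient of the loss with respect to the logits: prediction minus target.
    grad_logits = [p - (1.0 if c == label_id else 0.0) for c, p in enumerate(probs)]
    # Gradient with respect to the embedding, computed before W changes.
    grad_x = [sum(W[k][c] * grad_logits[c] for c in range(n_labels)) for k in range(len(x))]
    for k in range(len(x)):
        for c in range(n_labels):
            W[k][c] -= lr * x[k] * grad_logits[c]   # classifier update
    for k in range(len(x)):
        E[word_id][k] -= lr * grad_x[k]             # embedding update
    return loss


E = [[0.1, -0.2], [-0.1, 0.3]]      # hypothetical embeddings for two words
W = [[0.2, -0.1], [0.0, 0.1]]       # hypothetical classifier weights
data = [(0, 0), (1, 1)]             # (word id, label id) pairs
losses = [sum(train_step(E, W, w, y) for w, y in data) for _ in range(100)]
```

The prediction error flows back through the classifier into the embedding rows, so the representation is adjusted for the labeling task and not learned separately.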
![Page 96: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/96.jpg)
MULTICHANNEL EMBEDDINGS
EMBEDDINGS COULD DRAW ON INFORMATION FROM MULTIPLE SOURCES!
MORPHOLOGICAL LEXICAL SEMANTIC SYNTACTIC
![Page 97: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/97.jpg)
dos Santos & Guimaraes (2014a)
MULTICHANNEL EMBEDDINGS
EXAMPLE: CHAR-WORD JOINT FEATURIZATION
![Page 98: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/98.jpg)
EXAMPLE: CHAR-WORD JOINT FEATURIZATION
dos Santos & Guimaraes (2015)
PROJECTION
PROJECTION
EMBEDDING LV1
EMBEDDING LV2
MULTICHANNEL EMBEDDINGS
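Char-word joint featurization can be sketched as concatenating a word-level embedding with a vector pooled from character-level embeddings. This is a simplified illustration: it uses a plain element-wise max over character vectors, whereas the cited work may use a more elaborate character-level layer. All embedding values and the helper name `char_word_features` are hypothetical.

```python
def char_word_features(word, word_emb, char_emb, unk_word, unk_char):
    """Concatenate the word embedding with a max-pooled character embedding."""
    char_vectors = [char_emb.get(ch, unk_char) for ch in word]
    # Element-wise max over all character vectors (channel 2: morphology/shape).
    pooled = [max(column) for column in zip(*char_vectors)]
    # Channel 1: lexical/semantic information from the word embedding.
    return word_emb.get(word.lower(), unk_word) + pooled


word_emb = {"austin": [0.5, 0.1]}
char_emb = {"A": [1.0, 0.0], "u": [0.2, 0.3]}
features = char_word_features("Austin", word_emb, char_emb, [0.0, 0.0], [0.0, 0.0])
```

The character channel lets the model see cues such as capitalization even for words missing from the word-embedding vocabulary.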
![Page 99: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/99.jpg)
ARCHITECTURE
RECURRENT NEURAL NETS
ADOLF HITLER CAME TO POWER IN 1933
B-PER I-PER O O O O B-TIME
TIME DISTRIBUTED PREDICTION
OUTPUT SEQUENCE: LABELS
INPUT SEQUENCE: WORDS
THE MODEL “REMEMBERS” WHAT HAPPENED IN THE 3 PREVIOUS TIME STEPS
![Page 100: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/100.jpg)
PROCESS
RECURRENT NEURAL NETS
ADOLF
B-PER
PROJECTION TO EMBEDDING SPACE
PROJECTION TO LABEL SPACE
![Page 101: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/101.jpg)
PROCESS
RECURRENT NEURAL NETS
ADOLF HITLER
B-PER I-PER
THE PARAMETERS “REMEMBER”
ITS TRANSITIONAL HISTORY!
THE SAME HIDDEN LAYER AT DIFFERENT TIME POINTS
![Page 102: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/102.jpg)
PROCESS
RECURRENT NEURAL NETS
ADOLF HITLER CAME
B-PER I-PER O
THE PARAMETERS “REMEMBER”
ITS TRANSITIONAL HISTORY!
![Page 103: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/103.jpg)
RESULT
RECURRENT NEURAL NETS
ADOLF HITLER CAME TO POWER IN 1933
B-PER I-PER O O O O B-TIME
AT EACH TIME POINT, THE PREVIOUS HISTORY IS ENCODED IN PARAMETERS
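The process above can be sketched as the forward pass of a simple recurrent net: each word is projected to the embedding space, combined with the previous hidden state, and projected to the label space. This is an untrained illustration with random weights, so the predicted tags are arbitrary; the layer sizes and the label set are assumptions.

```python
import math
import random

LABELS = ["O", "B-PER", "I-PER", "B-TIME"]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_tag(words, emb, W_xh, W_hh, W_hy):
    """Emit one label per word; the hidden state carries the history forward."""
    hidden = [0.0] * len(W_hh)
    tags = []
    for word in words:
        x = emb[word]                                   # projection to embedding space
        pre = [a + b for a, b in zip(matvec(W_xh, x), matvec(W_hh, hidden))]
        hidden = [math.tanh(v) for v in pre]            # same layer reused at every step
        scores = matvec(W_hy, hidden)                   # projection to label space
        tags.append(LABELS[scores.index(max(scores))])
    return tags


rng = random.Random(0)
sentence = "ADOLF HITLER CAME TO POWER IN 1933".split()
emb = {w: [rng.uniform(-1, 1) for _ in range(4)] for w in sentence}
W_xh = random_matrix(5, 4, rng)
W_hh = random_matrix(5, 5, rng)
W_hy = random_matrix(len(LABELS), 5, rng)
tags = rnn_tag(sentence, emb, W_xh, W_hh, W_hy)
```

Training would adjust the three weight matrices and the embeddings so that the output sequence matches the gold labels.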
![Page 104: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/104.jpg)
STATE-OF-THE-ART: BIDIRECTIONAL LSTM-CRF
RECURRENT NEURAL NETS
[Figure labels: Input, Encoder, Classifier, Joint Training]
Lample et al. (2016)
![Page 105: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/105.jpg)
STATE-OF-THE-ART: BIDIRECTIONAL LSTM-CRF
RECURRENT NEURAL NETS
[Figure labels: Input, Encoder, Classifier, Joint Training]
CAN ALSO BE MULTICHANNEL EMBEDDINGS!
Lample et al. (2016)
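The CRF classifier on top of the encoder can be sketched with Viterbi decoding: instead of picking the best label for each word independently, it picks the best label sequence given per-word scores and label-to-label transition scores. In the cited architecture the per-word scores would come from the bidirectional LSTM and the transitions would be learned; here both are hand-set, hypothetical numbers.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions:   one {label: score} dict per token
    transitions: {(previous label, current label): score}, default 0.0
    """
    # best[t][label] = (score of best path ending in label at t, previous label)
    best = [{lab: (emissions[0][lab], None) for lab in labels}]
    for t in range(1, len(emissions)):
        column = {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[t - 1][p][0] + transitions.get((p, cur), 0.0))
            score = best[t - 1][prev][0] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            column[cur] = (score, prev)
        best.append(column)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab][0])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


labels = ["O", "B-PER", "I-PER"]
emissions = [                                   # scores for "ADOLF HITLER CAME"
    {"O": 1.0, "B-PER": 0.9, "I-PER": 0.0},
    {"O": 0.5, "B-PER": 0.1, "I-PER": 2.0},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0},
]
transitions = {("O", "I-PER"): -10.0}           # I-PER should not follow O
decoded = viterbi(emissions, transitions, labels)
```

Taking the per-word maximum would give O, I-PER, O, which is not a valid BIO sequence; the transition penalty makes the decoder prefer B-PER for the first word.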
![Page 106: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/106.jpg)
TOOLBOX
SUPERVISED NER + FEATURE ABSTRACTION
LIBRARIES
RNN
• word embeddings
  - pre-trained: spacy (https://spacy.io/)
  - create new: https://radimrehurek.com/gensim/models/word2vec.html
• neural nets
  - Keras: https://keras.io/layers/recurrent/
  - Tensorflow: https://www.tensorflow.org/tutorials/recurrent/
  - Theano: http://deeplearning.net/tutorial/rnnslu.html
![Page 107: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/107.jpg)
COMPARISON
FEATURE ENGINEERING VS. FEATURE ABSTRACTION
| | FEATURIZATION | INTERPRETABILITY |
|---|---|---|
| FEATURE ENGINEERING | MANUAL | INTERPRETABLE |
| FEATURE ABSTRACTION | AUTOMATIC | NOT INTERPRETABLE |
![Page 108: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/108.jpg)
ARE DEEP LEARNING BASED MODELS NECESSARILY BETTER?
FEATURE ENGINEERING VS. FEATURE ABSTRACTION
• CRF CONVERGES FAST
• CRF IS GOOD IN LOW-DATA SETTINGS
• CRF IS MORE INTERPRETABLE
• PERFORMANCE DIFFERENCE ~1%
LIME (LOCAL INTERPRETABLE MODEL-AGNOSTIC EXPLANATIONS) https://github.com/marcotcr/lime
![Page 109: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/109.jpg)
• OPTION 1 (ABSENT DOMAIN KNOWLEDGE)
• 1) UNSUPERVISED EXPLORATION
• 2) PAID LABELING, THEN SUPERVISED MODEL
• OPTION 2 (EXPERT DOMAIN KNOWLEDGE AVAILABLE)
• 1) PAID LABELING ON SMALL SET
• 2) SEMI-SUPERVISED EXPLORATION
• 3) PAID LABELING, THEN SUPERVISED MODEL
SUGGESTIONS ON MODELING
NEW DOMAIN + UNLABELED DATA
![Page 110: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/110.jpg)
PAID LABELING
TOOLS
CONFIDENT IN DOMAIN KNOWLEDGE
• MECHANICAL TURK (https://www.mturk.com/mturk)
LESS CONFIDENT IN DOMAIN KNOWLEDGE
• CROWDFLOWER (https://www.crowdflower.com/)
• SPARE5 (https://app.spare5.com/fives)
![Page 111: NAMED ENTITY RECOGNITIONsuwangcompling.com/wp-content/uploads/2016/11/ner_overview_ver… · words/phrases of interest in text named entities • natural named entities • proper](https://reader036.vdocuments.net/reader036/viewer/2022063003/5f518f4fe8c58f31ed0b2ca2/html5/thumbnails/111.jpg)
THANK YOU!