![Page 1: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/1.jpg)
Cognitive Systems Institute External Speaker Series January 15, 2015
Chris Biemann
Adaptive Natural Language Processing
![Page 2: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/2.jpg)
2
Natural Language Understanding – the key to intelligent behavior
§ Most information and knowledge is encoded in unstructured form in natural language
§ When humans learn about a new topic, they read about it – machines should do the same
§ Natural language content on the internet is growing constantly
§ Natural language is evolving, and natural language processing should account for that
Cognitive computing
Cognitive computing systems learn and interact naturally with people to extend what either humans or machine could do on their own. They help human experts make better decisions by penetrating the complexity of Big Data.
http://www.research.ibm.com/cognitive-computing
![Page 6: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/6.jpg)
6
Why Language is difficult ..
He sat on the river bank and counted his dough.
She went to the bank and took out some money.
Lexical Layer vs. Concept Layer:
§ synonymous: "dough" and "money" are two words for one concept
§ polysemous: "bank" is one word for two concepts
![Page 9: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/9.jpg)
9
Why Not To Use Dictionaries or Ontologies
Advantages:
§ Sense inventory given
§ Linking to concepts
§ Full control
Photo by zeh fernando under Creative Commons licence
“give a man a fish and you feed him for a day…
Disadvantages:
• Dictionaries have to be created
• Dictionaries are incomplete
• Language changes constantly: new words, new meanings …
http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
![Page 10: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/10.jpg)
10
Structure Discovery Paradigm
… teach a man to fish and you feed him for a lifetime”
Consequences:
§ Only raw text input required
§ No fine-grained control on categories
§ Cognitive system: learns from and adapts to data
[Diagram: Text Data → SD algorithms: find regularities by analysis → annotate data with regularities → use annotations as features → Task]
![Page 11: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/11.jpg)
11
The JoBimText project – www.jobimtext.org
Partners:
§ Lead at IBM: Alfio Gliozzo, IBM Watson DeepQA, Yorktown, NY, USA
§ Lead at TU DA: Chris Biemann, Language Technology, TU Darmstadt, Germany
Software Capabilities:
§ Compute a Distributional Thesaurus
§ Compute Sense Representations
§ 2-Dimensional Text: Contextualized Expansion
§ RESTful API and Web Demo
Features:
§ Scalable architecture
§ Open Source, ASL 2.0
![Page 12: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/12.jpg)
12
2D Text: Matching Meaning beyond Keywords
almost no word overlap
Where was the first professor for electric science established?
In 1883 the first faculty for electrical engineering was founded there.
![Page 13: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/13.jpg)
13
2D Text: Matching Meaning beyond Keywords
Where was the first professor for electric science established?
In 1883 the first faculty for electrical engineering was founded there.
teacher professor student graduate alumnus staff campus
electric mechanical thermal electronic industrial optical automotive
science sciences biology physics economics mathematics psychology
co-found form establish own join rename bear
director emeritus dean lecturer president psychologist historian
electrical heavy-duty antique battery-powered electronic stainless diesel
biology economics sciences mathematics physics math psychology
create form set maintain found abolish strengthen
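The effect of the expansions can be sketched in a few lines of Python. This is only an illustration, not the JoBimText implementation: the content words are lemmatised by hand, and the assignment of the expansion lists to the two sentences is an assumption, because the slide layout cannot be recovered from the transcript.

```python
# Lemmatised content words of the two sentences (lemmatisation done by hand).
question = {"first", "professor", "electric", "science", "establish"}
answer = {"first", "faculty", "electrical", "engineering", "found"}

# Expansion terms copied from the slide.
# ASSUMPTION: the first four lists expand the answer sentence and the last
# four expand the question; the transcript does not say which is which.
answer_expansions = {
    "teacher", "professor", "student", "graduate", "alumnus", "staff", "campus",
    "electric", "mechanical", "thermal", "electronic", "industrial", "optical", "automotive",
    "science", "sciences", "biology", "physics", "economics", "mathematics", "psychology",
    "co-found", "form", "establish", "own", "join", "rename", "bear",
}
question_expansions = {
    "director", "emeritus", "dean", "lecturer", "president", "psychologist", "historian",
    "electrical", "heavy-duty", "antique", "battery-powered", "electronic", "stainless", "diesel",
    "biology", "economics", "sciences", "mathematics", "physics", "math", "psychology",
    "create", "form", "set", "maintain", "found", "abolish", "strengthen",
}


def expanded(words, expansions):
    """Return a sentence's words plus its second 'dimension' of similar terms."""
    return words | expansions


# Plain keyword matching versus matching on the expanded ("2D") text.
keyword_overlap = question & answer
expanded_overlap = expanded(question, question_expansions) & expanded(answer, answer_expansions)

print("keyword overlap: ", sorted(keyword_overlap))
print("expanded overlap:", sorted(expanded_overlap))
```

With plain keywords the two sentences share only "first"; once each sentence carries its expansions, terms such as "professor", "electric", "science" and "found" show up on both sides.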
![Page 15: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/15.jpg)
15
Sipping cappuccino ..
![Page 16: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/16.jpg)
16
.. in Milan.
![Page 18: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/18.jpg)
18
Clustering of DT entries: Sense Induction
bright#JJ
paper#NN
C. Biemann (2006): Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems. Proceedings of the HLT-NAACL-06 Workshop on Textgraphs-06, New York, USA.
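The clustering step can be sketched as follows. This is a minimal Chinese Whispers variant written for illustration: every node starts in its own class, and in each pass the nodes, visited in random order, adopt the class with the highest summed edge weight among their neighbours. The neighbour graph of paper#NN, its similarity weights, and the deterministic tie-breaking by label are assumptions made for this example and are not taken from the slide.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=10, seed=0):
    """Cluster an undirected weighted graph given as (node, node, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w

    # Every node starts in its own class.
    labels = {node: node for node in graph}
    rng = random.Random(seed)
    nodes = sorted(graph)

    for _ in range(iterations):
        rng.shuffle(nodes)
        for node in nodes:
            # Sum the edge weights per class over the node's neighbours.
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Adopt the strongest class; ties are broken by label (a simplification).
            labels[node] = max(scores, key=lambda lab: (scores[lab], lab))
    return labels


# Hypothetical similarity graph among neighbours of paper#NN; the target word
# itself is left out, so its neighbours fall apart into sense clusters.
EDGES = [
    ("newspaper", "article", 0.8), ("newspaper", "magazine", 0.9), ("article", "magazine", 0.7),
    ("cardboard", "plastic", 0.6), ("cardboard", "glass", 0.5), ("plastic", "glass", 0.9),
]

labels = chinese_whispers(EDGES)
senses = defaultdict(set)
for word, label in labels.items():
    senses[label].add(word)
print(list(senses.values()))
```

Each resulting cluster is read as one induced sense of the target word, here a "newspaper" sense and a "material" sense.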
![Page 19: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/19.jpg)
19
Features for Disambiguation
paper 0 (newspaper): read#VB#-dobj 45, reading#VBG#-dobj 45, write#VB#-dobj 38, read#VBD#-dobj 37, writing#VBG#-dobj 36, wrote#VBD#-dobj 34, original#JJ#amod 27, wrote#VBD#-prep_in 26, recent#JJ#amod 26, published#VBN#partmod 25, written#VBN#-dobj 23, published#VBN#-nsubjpass 20, published#VBD#-dobj 19, copy#NN#-prep_of 18, said#VBD#-prep_in 18, author#NN#-prep_of 17, pages#NNS#-prep_of 16, told#VBD#-dobj 15, buy#VB#-dobj 14, published#VBN#-prep_in 14, page#NN#-prep_of 14
paper 1 (material): piece#NN#-prep_of 21, pieces#NNS#-prep_of 17, made#VBN#-prep_from 13, bags#NNS#-nn 11, white#JJ#amod 9, paper#NN#-conj_and 9, glass#NN#-conj_and 9, products#NNS#-nn 9, industry#NN#-nn 8, plastic#NN#conj_and 8, plastic#NN#-conj_and 8, bits#NNS#-prep_of 8, bag#NN#-nn 8, plastic#NN#conj_or 8, sheet#NN#-prep_of 7, recycled#JJ#amod 7, tons#NNS#-prep_of 7, glass#NN#conj_and 7, buy#VB#-dobj 6, plates#NNS#-nn 6, pile#NN#-prep_of 6
These features are shared by "paper" and the members of the respective cluster.
Disambiguation: find features in context.
Example: I am reading an original paper on the paper.
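A minimal sketch of this lookup, assuming the context features of one occurrence of "paper" have already been extracted by a parser. The feature counts are a subset of those on the slide; scoring a sense by the summed counts of its matching features is an assumption, since the slide only says to find the features in context.

```python
# Subset of the per-sense feature counts shown on the slide.
SENSE_FEATURES = {
    0: {  # paper 0 (newspaper)
        "read#VB#-dobj": 45, "reading#VBG#-dobj": 45,
        "original#JJ#amod": 27, "recent#JJ#amod": 26, "buy#VB#-dobj": 14,
    },
    1: {  # paper 1 (material)
        "piece#NN#-prep_of": 21, "made#VBN#-prep_from": 13,
        "white#JJ#amod": 9, "recycled#JJ#amod": 7, "buy#VB#-dobj": 6,
    },
}


def disambiguate(context_features, sense_features):
    """Score each sense by the summed counts of the features found in context."""
    scores = {
        sense: sum(features.get(f, 0) for f in context_features)
        for sense, features in sense_features.items()
    }
    return max(scores, key=scores.get), scores


# ASSUMPTION: features a dependency parser might yield for the first "paper"
# in "I am reading an original paper on the paper."
context = ["reading#VBG#-dobj", "original#JJ#amod"]
best_sense, scores = disambiguate(context, SENSE_FEATURES)
print(best_sense, scores)
```

Both context features occur only in the feature list of sense 0, so this occurrence is assigned the "newspaper" sense.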
![Page 20: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/20.jpg)
20
Paraphrasing with JoBimText
![Page 22: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/22.jpg)
22
JoBimText Model example “beetle”
S. Mitra, R. Mitra, M. Riedl, C. Biemann, A. Mukherjee, P. Goyal (2014): That’s sick dude!: Automatic identification of word sense change across different timescales. Proceedings of ACL-2014, Baltimore, MD, USA
http://www.thezooom.com/2013/01/10749/
![Page 24: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/24.jpg)
24
Outlook: From Similarities and Relations…
Cathy liked the blue dress very much.
She bought it for 15 Euros from the shop.
Similar terms:
§ Cathy: Pat, Brian, Kevin
§ blue: red, purple, green
§ dress: gown, skirt, blouse
§ Euros: currency, greenback, yen
§ shop: store, restaurant, boutique
1: ENTITIES: FIRSTNAME, COLOR, CLOTHING, MONEY, SALESPOINT
2: RELATIONS: HAS-PROPERTY
![Page 25: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/25.jpg)
25
Sneak Preview: Induction of Relations
§ JoBimText model on pairs and paths between pairs
![Page 26: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/26.jpg)
26
… to Frames and Causality
Cathy liked the blue dress very much. (FIRSTNAME, COLOR, CLOTHING; HAS-PROPERTY)
She bought it for 15 Euros from the shop. (MONEY, SALESPOINT)
3: FRAMES
§ POSITIVE-OPINION-ABOUT: subj=FIRSTNAME obj=CLOTHING
  Patterns: FIRSTNAME adored CLOTHING; FIRSTNAME found CLOTHING great
§ VERKAUFSVORGANG (German: sales transaction): subj=AGENT obj=THING für=MONEY loc=SALESPOINT ("für": for)
§ Fillers: FIRSTNAME = Cathy, CLOTHING = dress
4: CAUSALITY
![Page 27: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/27.jpg)
27
Sneak Preview: Frame Induction
![Page 28: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/28.jpg)
28
Applications of JoBimText in IBM Watson
§ JoBimText informs relation extraction: significant improvements in the EMRA application, e.g. for finding drug prescriptions for diseases
§ JoBimText sense clusters are used to inform term matching, e.g. when finding justifications for answers
§ JoBimText is one of the solutions for knowledge induction from text in new domains
![Page 29: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/29.jpg)
29
Conclusion
§ The role of Natural Language Processing in Cognitive Computing is two-fold:
  § the technology for natural interaction with the system
  § a technology subject to be framed in the cognitive paradigm
§ Adaptive Natural Language Processing
  § makes use of static AND dynamically generated resources
  § is driven by (text) data that defines its application domain
  § accounts for language evolution and new meanings by adaptation to the data
  § goes beyond NLP pipelines
![Page 31: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/31.jpg)
31
Thanks..
.. and now some (deep) QA!
www.jobimtext.org
Special Track: Semantic and Cognitive Computing
![Page 33: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/33.jpg)
33
The @-ing (‘holing’) operation: producing pairs of Jos and Bims
SENTENCE: I suffered from a cold and took aspirin.
STANFORD COLLAPSED DEPENDENCIES: nsubj(suffered, I); nsubj(took, I); root(ROOT, suffered); det(cold, a); prep_from(suffered, cold); conj_and(suffered, took); dobj(took, aspirin)

WORD-CONTEXT PAIRS:

| Jo | Bim | Count |
|---|---|---|
| suffered | nsubj(@@, I) | 1 |
| took | nsubj(@@, I) | 1 |
| cold | det(@@, a) | 1 |
| suffered | prep_from(@@, cold) | 1 |
| suffered | conj_and(@@, took) | 1 |
| took | dobj(@@, aspirin) | 1 |
| I | nsubj(suffered, @@) | 1 |
| I | nsubj(took, @@) | 1 |
| a | det(cold, @@) | 1 |
| cold | prep_from(suffered, @@) | 1 |
| took | conj_and(suffered, @@) | 1 |
| aspirin | dobj(took, @@) | 1 |

http://nlp.stanford.edu:8080/parser/
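The holing operation on this example can be sketched in Python. The function name and the triple representation of the dependencies are choices made for this illustration. Each dependency yields two pairs, one per argument, with the argument itself replaced by the hole symbol @@; the root relation is skipped, as on the slide.

```python
from collections import Counter

# Collapsed dependencies of "I suffered from a cold and took aspirin.",
# written as (relation, head, dependent) triples.
DEPENDENCIES = [
    ("nsubj", "suffered", "I"),
    ("nsubj", "took", "I"),
    ("root", "ROOT", "suffered"),
    ("det", "cold", "a"),
    ("prep_from", "suffered", "cold"),
    ("conj_and", "suffered", "took"),
    ("dobj", "took", "aspirin"),
]


def holing(dependencies):
    """Turn dependency triples into counted (Jo, Bim) pairs."""
    pairs = Counter()
    for relation, head, dependent in dependencies:
        if relation == "root":
            continue  # the artificial ROOT node produces no pairs
        # The head is the Jo; its slot in the relation becomes the hole.
        pairs[(head, f"{relation}(@@, {dependent})")] += 1
        # The dependent is the Jo; its slot in the relation becomes the hole.
        pairs[(dependent, f"{relation}({head}, @@)")] += 1
    return pairs


pairs = holing(DEPENDENCIES)
for (jo, bim), count in pairs.items():
    print(jo, bim, count)
```

Run on a whole corpus, the same counter would accumulate how often each word occurs with each context feature.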
![Page 34: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/34.jpg)
34
Distributional Thesaurus (DT)
§ Computed from distributional similarity statistics
§ Entry for a target word consists of a ranked list of neighbors
meeting: meeting 288, meetings 102, hearing 89, session 68, conference 62, summit 51, forum 46, workshop 46, hearings 46, ceremony 45, sessions 41, briefing 40, event 40, convention 38, gathering 36, ...
articulate: articulate 89, explain 19, understand 17, communicate 17, defend 16, establish 15, deliver 14, evaluate 14, adjust 14, manage 13, speak 13, change 13, answer 13, maintain 13, ...
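[Figure: First order: the words "immaculate" and "perfect" linked to context features, among them amod(condition,@@), amod(timing,@@), nsubj(@@,hair), cop(@@,remains) and amod(Church,@@). Second order: "immaculate" and "perfect" linked directly by an edge labeled 3.]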
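The step from first-order to second-order relations can be sketched as follows: two words become neighbours when they share context features, and the number of shared features serves as their similarity. The assignment of features to "immaculate" and "perfect" below is hypothetical, chosen so that the two words share three features; the transcript does not preserve which feature belongs to which word.

```python
from collections import defaultdict
from itertools import combinations

# First order: word -> set of context features.
# ASSUMPTION: illustrative assignment, not recovered from the slide.
FIRST_ORDER = {
    "immaculate": {"amod(condition,@@)", "nsubj(@@,hair)", "cop(@@,remains)", "amod(Church,@@)"},
    "perfect": {"amod(condition,@@)", "nsubj(@@,hair)", "cop(@@,remains)", "amod(timing,@@)"},
    "hard": {"amod(cheese,@@)"},
}


def second_order(first_order):
    """Build word -> ranked neighbour list from the number of shared features."""
    similarities = defaultdict(dict)
    for a, b in combinations(sorted(first_order), 2):
        shared = len(first_order[a] & first_order[b])
        if shared:
            similarities[a][b] = shared
            similarities[b][a] = shared
    # Rank by similarity (descending), then alphabetically for stable output.
    return {
        word: sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
        for word, neighbours in similarities.items()
    }


thesaurus = second_order(FIRST_ORDER)
print(thesaurus["immaculate"])
```

Words without any shared feature, such as "hard" in this toy data, receive no thesaurus entry.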
![Page 35: Biemann ibm cog_comp_jan2015_noanim](https://reader030.vdocuments.net/reader030/viewer/2022032503/55bebd4bbb61eba41d8b47e4/html5/thumbnails/35.jpg)
35
Scaling Computation with MapReduce
Example input: Roomano is a hard Gouda-like cheese from Friesland in the northern part of The Netherlands. It pairs well with aged sherries ...
(On the slide, steps are shown in green and parameters in blue.)

Holing: using grammatical relations

FreqSig: t: min freq; s: min significance

| word | feature | t |
|---|---|---|
| hard#a | cheese#ADJ_MODn | 17 |
| cheese#n | Gouda-like#ADJ_MODa | 5 |
| cheese#n | hard#ADJ_MODa | 17 |
| pair#v | well#ADV_MODa | 3 |
| ... | ... | ... |

| word | feature | s |
|---|---|---|
| hard#a | cheese#ADJ_MODn | 15.8 |
| cheese#n | Gouda-like#ADJ_MODa | 7.6 |
| cheese#n | hard#ADJ_MODa | 0.4 |
| ... | ... | ... |

AggrPerFt

| feature | words |
|---|---|
| cheese#ADJ_MODn | hard#a, yellow#a, French#a |
| hard#ADJ_MODa | cheese#n, stone#n |
| ... | ... |

SimCounts: w: weighting for # words / feature

| word | word | w.sum |
|---|---|---|
| hard#a | yellow#a | 0.234 |
| yellow#a | hard#a | 0.234 |
| cheese#n | stone#n | 3.14 |
| ... | ... | ... |

PruneGraph: p: max number of features per word; s

Convert: sum threshold (output like data below)

ibm: i.b.m. 164, intel 154, hewlett-packard 151, dell 141, cisco 134, microsoft 125, hp 124
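The stages named on this slide can be mimicked in memory to see how they fit together. The sketch below is not the MapReduce implementation. The input counts are hypothetical, pointwise mutual information stands in for the significance measure (the slide does not name one), pruning is applied before aggregation (the transcript does not preserve the step order), and shared features are counted without the weighting w.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical (word, feature) counts, as they might come out of the holing step.
COUNTS = Counter({
    ("hard#a", "cheese#ADJ_MODn"): 3,
    ("yellow#a", "cheese#ADJ_MODn"): 2,
    ("hard#a", "stone#ADJ_MODn"): 2,
    ("yellow#a", "flower#ADJ_MODn"): 2,
    ("cheese#n", "hard#ADJ_MODa"): 3,
    ("stone#n", "hard#ADJ_MODa"): 2,
    ("cheese#n", "yellow#ADJ_MODa"): 2,
    ("pair#v", "well#ADV_MODa"): 1,
})


def freq_sig(counts, t, s):
    """FreqSig: keep pairs with frequency >= t and significance > s (PMI, assumed)."""
    kept = {pair: n for pair, n in counts.items() if n >= t}
    total = sum(kept.values())
    word_n, feat_n = Counter(), Counter()
    for (word, feat), n in kept.items():
        word_n[word] += n
        feat_n[feat] += n
    scored = {}
    for (word, feat), n in kept.items():
        pmi = math.log2(n * total / (word_n[word] * feat_n[feat]))
        if pmi > s:
            scored[(word, feat)] = pmi
    return scored


def prune_graph(scored, p):
    """PruneGraph: keep at most the p most significant features per word."""
    per_word = defaultdict(list)
    for (word, feat), sig in scored.items():
        per_word[word].append((sig, feat))
    return {w: {f for _, f in sorted(fs, reverse=True)[:p]} for w, fs in per_word.items()}


def aggr_per_ft(word_feats):
    """AggrPerFt: invert the mapping to feature -> set of words."""
    per_feat = defaultdict(set)
    for word, feats in word_feats.items():
        for feat in feats:
            per_feat[feat].add(word)
    return per_feat


def sim_counts(per_feat):
    """SimCounts: count shared features for every ordered word pair."""
    sims = defaultdict(Counter)
    for words in per_feat.values():
        for a, b in permutations(sorted(words), 2):
            sims[a][b] += 1
    return sims


def build_dt(counts, t=2, s=0.0, p=2):
    sims = sim_counts(aggr_per_ft(prune_graph(freq_sig(counts, t, s), p)))
    return {word: neighbours.most_common() for word, neighbours in sims.items()}


dt = build_dt(COUNTS)
print(dt)
```

In this toy data "pair#v" falls below the frequency threshold and drops out, while "hard#a" and "yellow#a" become neighbours through the shared feature cheese#ADJ_MODn.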