Natural Language Inference, Reading Comprehension and Deep Learning
Christopher Manning @chrmanning • @stanfordnlp
Stanford University, SIGIR 2016

TRANSCRIPT

Page 1: Natural Language Inference, Reading Comprehension and Deep Learning

Christopher Manning @chrmanning • @stanfordnlp

Stanford University, SIGIR 2016

Page 2: Machine Comprehension, Tested by Question Answering (Burges)

"A machine comprehends a passage of text if, for any question regarding that text that can be answered correctly by a majority of native speakers, that machine can provide a string which those speakers would agree both answers that question, and does not contain information irrelevant to that question."

Page 3: IR needs language understanding

• There were some things that kept IR and NLP apart
  • IR was heavily focused on efficiency and scale
  • NLP was way too focused on form rather than meaning

• Now there are compelling reasons for them to come together
  • Taking IR precision and recall to the next level
  • [car parts for sale] should match: "Selling automobile and pickup engines, transmissions"
  • Example from Jeff Dean's WSDM 2016 talk

• Information retrieval / question answering in mobile contexts
  • Web snippets no longer cut it on a watch!

Page 4: Menu

1. Natural logic: a weak logic over human languages for inference
2. Distributed word representations
3. Deep, recursive neural network language understanding

Page 5: How can information retrieval be viewed more as theorem proving (than matching)?

Page 6: AI2 4th Grade Science Question Answering [Angeli, Nayak & Manning, ACL 2016]

Our "knowledge":
Ovaries are the female part of the flower, which produces eggs that are needed for making seeds.

The question:
Which part of a plant produces the seeds?

The answer choices:
the flower / the leaves / the stem / the roots

Page 7: How can we represent and reason with broad-coverage knowledge?

1. Rigid-schema knowledge bases with well-defined logical inference
2. Open-domain knowledge bases (OpenIE) – no clear ontology or inference [Etzioni et al. 2007ff]
3. Human language text as the KB – no rigid schema, but with "natural logic" we can do formal inference over human language text [MacCartney and Manning 2008]

Page 8: Natural Language Inference [Dagan 2005; MacCartney & Manning 2009]

Does a piece of text follow from or contradict another?

Two senators received contributions engineered by lobbyist Jack Abramoff in return for political favors.
Jack Abramoff attempted to bribe two legislators.
→ Follows

Here, try to prove or refute each candidate against a large text collection:
1. The flower of a plant produces the seeds
2. The leaves of a plant produce the seeds
3. The stem of a plant produces the seeds
4. The roots of a plant produce the seeds

Page 9: Text as Knowledge Base

Storing knowledge as text is easy!
Doing inference over text might be hard.
We don't want to run inference over every fact!
We don't want to store all the inferences!

Page 10: Inferences ... on demand from a query ... [Angeli and Manning 2014]

Page 11: ... using text as the meaning representation

Page 12: Natural Logic: logical inference over text

We are doing logical inference:
The cat ate a mouse ⊨ ¬(No carnivores eat animals)

We do it with natural logic: if I mutate a sentence in this way, do I preserve its truth?
Post-Deal Iran Asks if U.S. Is Still 'Great Satan,' or Something Less ⊨ A Country Asks if U.S. Is Still 'Great Satan,' or Something Less

• A sound and complete weak logic [Icard and Moss 2014]
• Expressive for common human inferences
• "Semantic" parsing is just syntactic parsing
• Tractable: polynomial-time entailment checking
• Plays nicely with lexical-matching back-off methods

Page 13: #1. Common-sense reasoning: Polarity in Natural Logic

We order phrases in partial orders. The simplest one: is-a-kind-of. Also: geographical containment, etc.

Polarity: in a given context, is it valid to move up or down in this order?

Page 14: Example inferences

Quantifiers determine the polarity of phrases.
Valid mutations must consider polarity.

Successful toy inference:
• All cats eat mice ⊨ All house cats consume rodents

Page 15: "Soft" Natural Logic

• We also want to make likely (but not certain) inferences
• Same motivation as Markov logic, probabilistic soft logic, etc.
• Each mutation edge template feature f_i has a cost θ_i ≥ 0
• The cost of an edge is θ_i · f_i
• The cost of a path is θ · f
• We can learn the parameters θ
• Inference is then graph search (see the sketch below)
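To make the graph-search formulation concrete, here is a minimal sketch in Python. The mutation templates, their costs, and the mutation_edges generator are hypothetical stand-ins, not the NaturalLI implementation; the point is only that inference reduces to uniform-cost search where each edge pays its learned cost θ_i · f_i.

```python
import heapq

# Hypothetical learned costs for a few mutation edge templates (all >= 0).
theta = {"hypernym": 0.10, "synonym": 0.05, "drop_modifier": 0.30}

def mutation_edges(fact):
    """Toy stand-in for the lexicon-driven mutations of natural logic."""
    if "house cats" in fact:
        yield "hypernym", fact.replace("house cats", "cats")
    if "consume" in fact:
        yield "synonym", fact.replace("consume", "eat")
    if "rodents" in fact:
        yield "hypernym", fact.replace("rodents", "mice")

def prove(query, knowledge_base, max_cost=2.0):
    """Uniform-cost search from the query toward any stored fact."""
    frontier, best = [(0.0, query)], {query: 0.0}
    while frontier:
        cost, fact = heapq.heappop(frontier)
        if fact in knowledge_base:
            return fact, cost              # low path cost = confident entailment
        if cost > max_cost:
            break                          # give up; back off to a lexical classifier
        for template, mutated in mutation_edges(fact):
            new_cost = cost + theta[template]
            if new_cost < best.get(mutated, float("inf")):
                best[mutated] = new_cost
                heapq.heappush(frontier, (new_cost, mutated))
    return None, float("inf")

print(prove("All house cats consume rodents", {"All cats eat mice"}))
```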

Page 16: #2. Dealing with real sentences

Natural logic works with facts like these in the knowledge base:
Obama was born in Hawaii

But real-world sentences are complex and long:
Born in Honolulu, Hawaii, Obama is a graduate of Columbia University and Harvard Law School, where he served as president of the Harvard Law Review.

Approach:
1. A classifier divides long sentences into entailed clauses
2. Natural logic inference can shorten these clauses

Page 17: Universal Dependencies (UD)
http://universaldependencies.github.io/docs/

A single level of typed dependency syntax that (i) works for all human languages and (ii) gives a simple, human-friendly representation of sentences.

Dependency syntax is better than a phrase-structure tree for machine interpretation – it's almost a semantic network.

UD aims to be linguistically better across languages than earlier representations, such as CoNLL dependencies.

Page 18: Generation of minimal clauses

1. Classification problem: given a dependency edge, does it introduce a clause?
2. Is it missing a controlled subject from the subject/object?
3. Shorten clauses while preserving validity, using natural logic!
   • All young rabbits drink milk ⊭ All rabbits drink milk
   • OK: SJC, the Bay Area's third largest airport, often experiences delays due to weather.
   • Often better: SJC often experiences delays.

Page 19: #3. Add a lexical alignment classifier

• Sometimes we can't quite make the inferences that we would like to make
• We use a simple lexical-match back-off classifier with features:
  • matching words, mismatched words, unmatched words (see the sketch below)
• These always work pretty well
• This was the lesson of the RTE evaluations, and perhaps of IR in general
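As a rough illustration of the kind of back-off features meant here (matching, mismatched, and unmatched words), a bag-of-words version might look like the following; the feature names and the counting scheme are illustrative assumptions, not the classifier from the paper.

```python
def lexical_match_features(premise, hypothesis):
    """Toy bag-of-words overlap features for a premise/hypothesis pair."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return {
        "matching_words": len(p & h),         # words that appear in both sentences
        "unmatched_hypothesis": len(h - p),   # hypothesis words with no premise match
        "unmatched_premise": len(p - h),      # premise words left unused
        "overlap_ratio": len(p & h) / max(len(h), 1),
    }

print(lexical_match_features("not smoking is good for health",
                             "exercising every day is a good health habit"))
```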

Page 20: The full system

• We run our usual search over the split-up, shortened clauses
• If we find a premise, great!
• If not, we use the lexical classifier as an evaluation function

• We work to do this quickly at scale
  • Visit 1M nodes/second; don't re-featurize, just compute deltas
  • 32-byte search states (thanks, Gabor!)

Page 21: Solving NY State 4th grade science (Allen AI Institute datasets)

Multiple-choice questions from real 4th grade science exams:

Which activity is an example of a good health habit?
(A) Watching television (B) Smoking cigarettes (C) Eating candy (D) Exercising every day

In our corpus knowledge base:
• Plasma TVs can display up to 16 million colors ... great for watching TV ... also make a good screen.
• Not smoking or drinking alcohol is good for health, regardless of whether clothing is worn or not.
• Eating candy for dinner is an example of a poor health habit.
• Healthy is exercising

Page 22: Solving 4th grade science (Allen AI NDMC)

System                                                  Dev    Test
KnowBot [Hixon et al., NAACL 2015]                       45     –
KnowBot (augmented with human in the loop)               57     –
IR baseline (Lucene)                                     49     42
NaturalLI                                                52     51
More data + IR baseline                                  62     58
More data + NaturalLI                                    65     61
NaturalLI + lexical classifier                           74     67
Aristo [Clark et al. 2016], 6 systems, even more data     –     71

Test set: New York Regents 4th Grade Science exam multiple-choice questions from AI2.
Training: Basic is Barron's study guide; "more data" is the SciText corpus from AI2.
Score: % correct.

Page 23: Natural Logic

• Can we just use text as a knowledge base?
• Natural logic provides a useful, formal (weak) logic for textual inference
• Natural logic is easily combinable with lexical matching methods, including neural net methods
• The resulting system is useful for:
  • Common-sense reasoning
  • Question answering
  • Open information extraction, i.e., getting relation triples out of text

Page 24: Can information retrieval benefit from distributed representations of words?

Page 25: From symbolic to distributed representations (Sec. 9.2.2)

The vast majority of rule-based or statistical NLP and IR work regarded words as atomic symbols: hotel, conference, walk

In vector space terms, this is a vector with one 1 and a lot of zeroes:

[0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]

We now call this a "one-hot" representation.

Page 26: From symbolic to distributed representations (Sec. 9.2.2)

Its problem:
• If a user searches for [Dell notebook battery size], we would like to match documents with "Dell laptop battery capacity"
• If a user searches for [Seattle motel], we would like to match documents containing "Seattle hotel"

But
motel [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]ᵀ · hotel [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0] = 0

Our query and document vectors are orthogonal. There is no natural notion of similarity in a set of one-hot vectors (see the sketch below).
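A two-line check makes the orthogonality point concrete; the dense vectors in the second half are randomly generated stand-ins (not real GloVe or word2vec vectors), just to show that a distributed representation admits graded similarity where one-hot vectors do not.

```python
import numpy as np

motel = np.zeros(15); motel[10] = 1.0          # one-hot "motel"
hotel = np.zeros(15); hotel[7] = 1.0           # one-hot "hotel"
print(motel @ hotel)                           # 0.0 -- orthogonal, no similarity signal

rng = np.random.default_rng(0)
hotel_vec = rng.normal(size=50)                # pretend dense embeddings
motel_vec = hotel_vec + 0.3 * rng.normal(size=50)   # place "motel" near "hotel"
cos = motel_vec @ hotel_vec / (np.linalg.norm(motel_vec) * np.linalg.norm(hotel_vec))
print(round(float(cos), 2))                    # close to 1 -- graded similarity
```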

Page 27: Capturing similarity

There are many things you can do about similarity, many well known in IR:
• Query expansion with synonym dictionaries
• Learning word similarities from large corpora

But a word representation that encodes similarity wins:
• Fewer parameters to learn (per word, not per pair)
• More sharing of statistics
• More opportunities for multi-task learning

Page 28: Distributional similarity-based representations

You can get a lot of value by representing a word by means of its neighbors.

"You shall know a word by the company it keeps" (J. R. Firth 1957: 11)

One of the most successful ideas of modern NLP:

    government debt problems turning into banking crises as has happened in
    saying that Europe needs unified banking regulation to replace the hodgepodge

These context words will represent "banking".

Page 29: Basic idea of learning neural network word embeddings

We define some model that aims to predict a word based on other words in its context, e.g., choose

    argmax_w  w · ((w_{j−1} + w_{j+1}) / 2)

which has a loss function, e.g.,

    J = 1 − w_j · ((w_{j−1} + w_{j+1}) / 2)

(using unit-norm vectors).

We look at many samples from a big language corpus.
We keep adjusting the vector representations of words to minimize this loss (see the sketch below).
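The loss above can be turned into a tiny training loop. This is a toy sketch with made-up hyperparameters and a single context window, assuming unit-norm vectors as on the slide; it ignores the softmax and negative-sampling machinery that real systems use.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, lr = 25, 0.1
W = {w: rng.normal(size=dim) for w in ["unified", "banking", "regulation"]}
for w in W:
    W[w] /= np.linalg.norm(W[w])                 # unit-norm vectors

def train_step(left, center, right):
    """One SGD step on J = 1 - w_center . ((w_left + w_right) / 2)."""
    context = (W[left] + W[right]) / 2
    center_vec = W[center].copy()
    loss = 1.0 - center_vec @ context
    W[center] += lr * context                    # -dJ/dw_center = context
    W[left]   += lr * center_vec / 2             # -dJ/dw_left   = w_center / 2
    W[right]  += lr * center_vec / 2
    for w in (left, center, right):              # re-project onto the unit sphere
        W[w] /= np.linalg.norm(W[w])
    return loss

for step in range(50):
    loss = train_step("unified", "banking", "regulation")
print(round(float(loss), 3))                     # the loss shrinks as the vectors align
```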

Page 30: With distributed, distributional representations, syntactic and semantic similarity is captured

currency = [0.286, 0.792, −0.177, −0.107, 0.109, −0.542, 0.349, 0.271]

Page 31: Distributional representations can solve the fragility of NLP tools

Standard NLP systems – here, the Stanford Parser – are incredibly fragile because of symbolic representations.

Crazy sentential complement, such as for "likes [(being) crazy]"

Page 32: Distributional representations can capture the long tail of IR similarity

Google's RankBrain:
• Not necessarily as good for the head of the query distribution, but great for seeing similarity in the tail
• 3rd most important ranking signal (we're told ...)

Page 33: Natural Language Inference, Reading Comprehension and Deep Learningmanning/talks/SIGIR2016-Deep-Learning-NL… · Natural Language Inference, Reading Comprehension and Deep Learning

LSA:Count!models• Factorizea(maybeweighted,oftenlog-scaled)term-

document(Deerwester etal.1990)orword-contextmatrix(Schütze1992)intoUΣVT

• Retainonlyksingularvalues,inordertogeneralize

[Cf.Baroni:Don’tcount,predict!Asystematiccomparisonofcontext-countingvs.context-predicting semanticvectors. ACL2014]

LSA (Latent Semantic Analysis) vs. word2vec

k

Sec. 18.3
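A minimal "Count!" model in this spirit, using a toy log-scaled word-context matrix and NumPy's SVD; the words, counts, and dimension k are invented for illustration.

```python
import numpy as np

words = ["hotel", "motel", "walk", "run"]
counts = np.array([[20, 35, 30,  0,  0],   # contexts: book, room, night, fast, shoes
                   [15, 30, 25,  0,  0],
                   [ 0,  1,  2, 10, 12],
                   [ 0,  0,  1, 14, 11]], dtype=float)
X = np.log1p(counts)                        # log scaling of the counts

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
vecs = dict(zip(words, U[:, :k] * S[:k]))   # keep only the top-k singular values

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(cos(vecs["hotel"], vecs["motel"]), 2))   # high: similar contexts
print(round(cos(vecs["hotel"], vecs["run"]), 2))     # low: different contexts
```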

Page 34: Natural Language Inference, Reading Comprehension and Deep Learningmanning/talks/SIGIR2016-Deep-Learning-NL… · Natural Language Inference, Reading Comprehension and Deep Learning

word2vecCBOW/SkipGram:Predict![Mikolov etal.2013]:Simplepredictmodelsforlearningwordvectors• Trainwordvectorstotrytoeither:

• Predictawordgivenitsbag-of-wordscontext(CBOW);or

• Predictacontextword(position-independent)fromthecenterword

• Updatewordvectorsuntiltheycandothispredictionwell

LSA vs. word2vec

Sec. 18.3

Page 35: word2vec encodes semantic components as linear relations
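One way to see these linear relations for yourself is with pretrained vectors; the snippet below assumes the gensim library and its downloadable "glove-wiki-gigaword-100" vectors, neither of which is part of the talk.

```python
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")      # pretrained GloVe word vectors

# king - man + woman ~= queen: analogies fall out as vector offsets
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# the same offset trick works for morphosyntactic relations, e.g. verb forms
print(vectors.most_similar(positive=["walking", "swam"], negative=["walked"], topn=3))
```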

Page 36: COALS model (count-modified LSA) [Rohde, Gonnerman & Plaut, ms., 2005]

Page 37: Count-based vs. direct prediction

Count-based: LSA, HAL (Lund & Burgess), COALS (Rohde et al.), Hellinger-PCA (Lebret & Collobert)
• Fast training
• Efficient usage of statistics
• Primarily used to capture word similarity
• May not use the best methods for scaling counts

Direct prediction: NNLM, HLBL, RNN, word2vec Skip-gram/CBOW (Bengio et al.; Collobert & Weston; Huang et al.; Mnih & Hinton; Mikolov et al.; Mnih & Kavukcuoglu)
• Scales with corpus size
• Inefficient usage of statistics
• Can capture complex patterns beyond word similarity
• Generate improved performance on other tasks

Page 38: Encoding meaning in vector differences [Pennington, Socher & Manning, EMNLP 2014]

Crucial insight: ratios of co-occurrence probabilities can encode meaning components

                            x = solid   x = water   x = gas   x = random
P(x | ice)                  large       large       small     small
P(x | steam)                small       large       large     small
P(x | ice) / P(x | steam)   large       ~1          small     ~1

Page 39: Encoding meaning in vector differences [Pennington, Socher & Manning, EMNLP 2014]

Crucial insight: ratios of co-occurrence probabilities can encode meaning components

                            x = solid    x = water    x = gas      x = fashion
P(x | ice)                  1.9 × 10⁻⁴   3.0 × 10⁻³   6.6 × 10⁻⁵   1.7 × 10⁻⁵
P(x | steam)                2.2 × 10⁻⁵   2.2 × 10⁻³   7.8 × 10⁻⁴   1.8 × 10⁻⁵
P(x | ice) / P(x | steam)   8.9          1.36         8.5 × 10⁻²   0.96

Page 40: Natural Language Inference, Reading Comprehension and Deep Learningmanning/talks/SIGIR2016-Deep-Learning-NL… · Natural Language Inference, Reading Comprehension and Deep Learning

A:Log-bilinearmodel:

withvectordifferences

Encoding meaning in vector differences

Q:Howcanwecaptureratiosofco-occurrenceprobabilitiesasmeaningcomponentsinawordvectorspace?
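The formulas on this slide appear only as images in the source; for reference, the standard GloVe relations they correspond to (Pennington, Socher & Manning 2014) can be written as:

```latex
% Log-bilinear model: dot products approximate log co-occurrence probabilities
w_i \cdot \tilde{w}_j = \log P(i \mid j)

% so that vector differences capture ratios of co-occurrence probabilities
w_x \cdot (\tilde{w}_a - \tilde{w}_b) = \log \frac{P(x \mid a)}{P(x \mid b)}
```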

Page 41: Natural Language Inference, Reading Comprehension and Deep Learningmanning/talks/SIGIR2016-Deep-Learning-NL… · Natural Language Inference, Reading Comprehension and Deep Learning

Nearestwordsto frog:

1.frogs2.toad3.litoria4.leptodactylidae5.rana6.lizard7.eleutherodactylus

Glove Word similarities[Pennington et al., EMNLP 2014]

litoria leptodactylidae

rana eleutherodactylushttp://nlp.stanford.edu/projects/glove/

Page 42: GloVe visualizations

http://nlp.stanford.edu/projects/glove/

Page 43: GloVe visualizations: Company – CEO

Page 44: Named Entity Recognition performance

Model             CoNLL '03 dev   CoNLL '03 test   ACE2   MUC7
Categorical CRF   91.0            85.4             77.4   73.4
SVD (log tf)      90.5            84.8             73.6   71.5
HPCA              92.6            88.7             81.7   80.7
C&W               92.2            87.4             81.7   80.2
CBOW              93.1            88.2             82.2   81.1
GloVe             93.2            88.3             82.9   82.2

F1 score of a CRF trained on CoNLL 2003 English with 50-dimensional word vectors.

Page 45: Word embeddings: Conclusion

GloVe translates meaningful relationships between word-word co-occurrence counts into linear relations in the word vector space.

GloVe shows the connection between the Count! work and the Predict! work: appropriate scaling of counts gives the properties and performance of the Predict! models.

A lot of other important work in this line of research:
[Levy & Goldberg 2014] [Arora, Li, Liang, Ma & Risteski 2015] [Hashimoto, Alvarez-Melis & Jaakkola 2016]

Page 46: Can we use neural networks to understand, not just word similarities, but language meaning in general?

Page 47: Compositionality

Artificial intelligence requires being able to understand bigger things from knowing about smaller parts.

Page 48: We need more than word embeddings!

How can we know when larger linguistic units are similar in meaning?

The snowboarder is leaping over the mogul.
A person on a snowboard jumps into the air.

People interpret the meaning of larger text units – entities, descriptive terms, facts, arguments, stories – by semantic composition of smaller elements.

Page 49: Beyond the bag of words: sentiment detection

Is the tone of a piece of text positive, negative, or neutral?

• The sentiment is that sentiment is "easy"
• Detection accuracy for longer documents is ~90%, BUT:

... loved ... great ... impressed ... marvelous ...

Page 50: Stanford Sentiment Treebank

• 215,154 phrases labeled in 11,855 sentences
• Can train and test compositions

http://nlp.stanford.edu:8080/sentiment/

Page 51: Tree-Structured Long Short-Term Memory Networks [Tai et al., ACL 2015]

Page 52: Tree-structured LSTM

Generalizes the sequential LSTM to trees with any branching factor (the child-sum equations are given below).
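For reference, the child-sum Tree-LSTM of Tai et al. (2015) replaces the single previous hidden state of a sequential LSTM with a sum over the children C(j) of node j, with one forget gate per child; the equations below follow the paper.

```latex
\tilde{h}_j = \sum_{k \in C(j)} h_k
i_j   = \sigma\!\left(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\right)
f_{jk} = \sigma\!\left(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\right)
o_j   = \sigma\!\left(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\right)
u_j   = \tanh\!\left(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\right)
c_j   = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k
h_j   = o_j \odot \tanh(c_j)
```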

Page 53: Positive/Negative results on Treebank

[Bar chart: accuracy (75–95%) for BiNB, RNN, MV-RNN, RNTN, and TreeLSTM, comparing training with sentence labels vs. training with the Treebank]

Page 54: Experimental results on Treebank

• The TreeRNN can capture constructions like "X but Y"
• Bi-word Naïve Bayes is only 58% on these

Page 55: Stanford Natural Language Inference corpus
http://nlp.stanford.edu/projects/snli/
570K Turker-judged sentence pairs, based on an assumed picture

A man rides a bike on a snow covered road. / A man is outside. → ENTAILMENT
2 female babies eating chips. / Two female babies are enjoying chips. → NEUTRAL
A man in an apron shopping at a market. / A man in an apron is preparing dinner. → CONTRADICTION

Page 56: NLI with Tree-RNNs [Bowman, Angeli, Potts & Manning, EMNLP 2015]

Approach: we would like to work out the meaning of each sentence separately – a pure compositional model. Then we compare the two sentence vectors with a neural network and classify the pair for inference (a rough sketch follows below).

[Diagram: learned word vectors → composition NN layer (e.g., "man outside" vs. "man in snow") → comparison NN layer(s) → softmax classifier → P(Entail) = 0.8]
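Here is a bare-bones, untrained sketch of that pipeline in NumPy: compose each sentence into a vector, feed the concatenation to a comparison layer, and classify with a softmax. Composition here is just a sum of toy word vectors rather than the TreeRNN used in the paper, and all sizes and names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden = 50, 64
labels = ["entailment", "neutral", "contradiction"]

E = {}                                                    # toy embedding table
embed = lambda w: E.setdefault(w, rng.normal(scale=0.1, size=dim))
compose = lambda s: np.sum([embed(w) for w in s.lower().split()], axis=0)

W1 = rng.normal(scale=0.1, size=(hidden, 2 * dim)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(len(labels), hidden)); b2 = np.zeros(len(labels))

def predict(premise, hypothesis):
    pair = np.concatenate([compose(premise), compose(hypothesis)])   # comparison input
    h = np.tanh(W1 @ pair + b1)                                      # comparison NN layer
    logits = W2 @ h + b2
    probs = np.exp(logits - logits.max()); probs /= probs.sum()      # softmax classifier
    return dict(zip(labels, probs.round(2)))

# With random weights the probabilities are meaningless; training would tune E, W1, W2.
print(predict("a man rides a bike on a snow covered road", "a man is outside"))
```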

Page 57: Tree recursive NNs (TreeRNNs)

Theoretically appealing
Very empirically competitive

But:
• Prohibitively slow
• Usually require an external parser
• Don't exploit the complementary linear structure of language

Page 58: A recurrent NN allows efficient batched computation on GPUs

Page 59: TreeRNN: input-specific structure undermines batched computation

Page 60: The Shift-reduce Parser-Interpreter NN (SPINN) [Bowman, Gauthier et al. 2016]

The base model is equivalent to a TreeRNN, but ... it supports batched computation: 25× speedups

Plus:
• An effective new hybrid that combines linear and tree-structured context
• Can stand alone without a parser

Page 61: Beginning observation: binary trees = transition sequences

SHIFT SHIFT REDUCE SHIFT SHIFT REDUCE REDUCE
SHIFT SHIFT SHIFT SHIFT REDUCE REDUCE REDUCE
SHIFT SHIFT SHIFT REDUCE SHIFT REDUCE REDUCE

(Each binary bracketing of a sentence corresponds to exactly one SHIFT/REDUCE sequence; see the sketch below.)
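A few lines of Python make the correspondence explicit; the tuple-based tree encoding is an assumption for the sketch, not SPINN's input format.

```python
def tree_to_transitions(tree):
    """A binary tree maps to exactly one SHIFT/REDUCE sequence."""
    if isinstance(tree, str):
        return ["SHIFT"]                     # leaf: push the next word
    left, right = tree
    return tree_to_transitions(left) + tree_to_transitions(right) + ["REDUCE"]

def transitions_to_tree(tokens, transitions):
    """Replaying the sequence with a stack recovers the original tree."""
    stack, buffer = [], list(tokens)
    for op in transitions:
        if op == "SHIFT":
            stack.append(buffer.pop(0))
        else:                                # REDUCE: merge the top two elements
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))
    return stack[0]

tree = (("the", "cat"), ("sat", "down"))
ops = tree_to_transitions(tree)
print(ops)                                   # SHIFT SHIFT REDUCE SHIFT SHIFT REDUCE REDUCE
print(transitions_to_tree(["the", "cat", "sat", "down"], ops) == tree)   # True
```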

Page 62: The Shift-reduce Parser-Interpreter NN (SPINN)

Page 63: The Shift-reduce Parser-Interpreter NN (SPINN)

The model includes a sequence LSTM RNN:
• It acts as a simple parser by predicting SHIFT or REDUCE
• It also gives left sequence context as input to the composition function

Page 64: Implementing the stack

• Naïve implementation: simulate the stacks in a batch with a fixed-size multidimensional array at each time step
  • Backpropagation requires that each intermediate stack be maintained in memory
  • ⇒ A large amount of data copying and movement is required

• Efficient implementation (sketched after the example below):
  • Have only one stack array per example
  • At each time step, append the current head of the stack
  • Keep a list of backpointers for REDUCE operations
  • Similar to zipper data structures employed elsewhere

Page 65: A thinner stack

Step   Array entry          Backpointers
1      Spot                 1
2      sat                  1 2
3      down                 1 2 3
4      (sat down)           1 4
5      (Spot (sat down))    5
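A rough single-example sketch of this "thin stack" idea: one append-only array of node values plus backpointers, so a REDUCE never copies an intermediate stack. The class and method names are invented; the real SPINN implementation batches this computation on a GPU.

```python
class ThinStack:
    def __init__(self):
        self.values = []       # append-only: every node value ever created
        self.backptrs = []     # for REDUCE nodes, the (left, right) child indices
        self.top = []          # the current stack, as indices into `values`

    def shift(self, value):
        self.values.append(value)
        self.backptrs.append(None)
        self.top.append(len(self.values) - 1)

    def reduce(self, compose):
        right, left = self.top.pop(), self.top.pop()
        self.values.append(compose(self.values[left], self.values[right]))
        self.backptrs.append((left, right))          # no copying of earlier stacks
        self.top.append(len(self.values) - 1)

stack = ThinStack()
compose = lambda l, r: f"({l} {r})"
for op, word in [("SHIFT", "Spot"), ("SHIFT", "sat"), ("SHIFT", "down"),
                 ("REDUCE", None), ("REDUCE", None)]:
    stack.shift(word) if op == "SHIFT" else stack.reduce(compose)
print(stack.values[stack.top[-1]])                   # (Spot (sat down))
```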

Page 66: [figure]

Page 67: Using SPINN for natural language inference

[Diagram: two SPINN encoders process "the cat sat down" and "the cat is angry", composing (the cat)(sat down) and (the cat)(is angry) into sentence vectors o₁ and o₂ for comparison]

Page 68: SNLI results

Model                                                       % Accuracy (test set)
Feature-based classifier                                    78.2
Previous SOTA sentence encoder [Mou et al. 2016]            82.1
LSTM RNN sequence model                                     80.6
Tree LSTM                                                   80.9
SPINN                                                       83.2
SOTA (sentence-pair alignment model) [Parikh et al. 2016]   86.8

Page 69: Successes for SPINN over LSTM

Examples with negation:
• P: The rhythmic gymnast completes her floor exercise at the competition.
• H: The gymnast cannot finish her exercise.

Long examples (>20 words):
• P: A man wearing glasses and a ragged costume is playing a Jaguar electric guitar and singing with the accompaniment of a drummer.
• H: A man with glasses and a disheveled outfit is playing a guitar and singing along with a drummer.

Page 70: Envoi

• There are very good reasons for wanting to represent meaning with distributed representations
  • So far, distributional learning has been most effective for this
  • But cf. [Young, Lai, Hodosh & Hockenmaier 2014] on denotational representations, using visual scenes

• However, we want not just word meanings, but also:
  • Meanings of larger units, calculated compositionally
  • The ability to do natural language inference

• The SPINN model is fast – close to recurrent networks!
  • Its hybrid sequence/tree structure is psychologically plausible and outperforms other sentence composition methods

Page 71: Natural Language Inference, Reading Comprehension and Deep Learningmanning/talks/SIGIR2016-Deep-Learning-NL… · Natural Language Inference, Reading Comprehension and Deep Learning

2011201320152017speechvisionNLPIR

Final Thoughts

Youarehere

IR NLP

Page 72: Final Thoughts

I'm certain that deep learning will come to dominate SIGIR over the next couple of years ... just like speech, vision, and NLP before it. This is a good thing. Deep learning provides some powerful new techniques that are being amazingly successful on many hard applied problems. However, we should realize that there is also currently a huge amount of hype about deep learning and artificial intelligence. We should not let a genuine enthusiasm for important and successful new techniques lead to irrational exuberance or a diminished appreciation of other approaches. Finally, despite the efforts of a number of people, in practice there has been a considerable division between the human language technology fields of IR, NLP, and speech. Partly this is due to organizational factors and partly because at one time the subfields each had a very different focus. However, recent changes in emphasis – with IR people wanting to understand the user better and NLP people much more interested in meaning and context – mean that there are a lot of common interests, and I would encourage much more collaboration between NLP and IR in the next decade.