dependency parsing 2 - university of maryland · data-driven dependency parsing goal:learn a good...

52
Dependency Parsing 2 CMSC 723 / LING 723 / INST 725 Marine Carpuat Fig credits: Joakim Nivre, Dan Jurafsky & James Martin

Upload: others

Post on 02-Oct-2020

24 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

DependencyParsing2CMSC723/LING723/INST725

MarineCarpuat

Figcredits:Joakim Nivre,DanJurafsky &JamesMartin

Page 2: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

DependencyParsing

• Formalizingdependencytrees

• Transition-baseddependencyparsing• Shift-reduceparsing• Transitionsystem• Oracle• Learning/predictingparsingactions

Page 3: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Data-drivendependencyparsing

Goal: learnagoodpredictorofdependencygraphsInput:sentenceOutput:dependencygraph/treeG=(V,A)

Canbeframedasastructuredpredictiontask- verylargeoutputspace- withinterdependentlabels

2dominantapproaches:transition-basedparsingandgraph-basedparsing

Page 4: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Transition-baseddependencyparsing

• Buildsonshift-reduceparsing[Aho &Ullman,1927]

• Configuration• Stack• Inputbuffer ofwords• Setofdependencyrelations

• Goalofparsing• findafinalconfigurationwhere• allwordsaccountedfor• Relationsformdependencytree

Page 5: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Transitionoperators

• Transitions:produceanewconfigurationgivencurrentconfiguration

• Parsingisthetaskof• Findingasequenceoftransitions• Thatleadsfromstartstatetodesiredgoalstate

• Startstate• StackinitializedwithROOTnode• Inputbufferinitializedwithwordsinsentence• Dependencyrelationset=empty

• Endstate• Stackandwordlistsareempty• Setofdependencyrelations=finalparse

Page 6: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

ArcStandardTransitionSystem

• Defines3transitionoperators[Covington,2001;Nivre 2003]• LEFT-ARC:• createhead-dependentrel.betweenwordattopofstackand2nd word(undertop)• remove2nd wordfromstack

• RIGHT-ARC:• Createhead-dependentrel.betweenwordon2nd wordonstackandwordontop• Removewordattopofstack

• SHIFT• Removewordatheadofinputbuffer• Pushitonthestack

Page 7: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Arcstandardtransitionsystems

• Preconditions• ROOTcannothaveincomingarcs• LEFT-ARCcannotbeappliedwhenROOTisthe2nd elementinstack• LEFT-ARCandRIGHT-ARCrequire2elementsinstacktobeapplied

Page 8: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Transition-basedDependencyParser

• Assumeanoracle

• Parsingcomplexity• Linearinsentencelength!

• Greedyalgorithm• UnlikeViterbiforPOStagging

Page 9: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Transition-BasedParsingIllustrated

Page 10: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Wheretowegetanoracle?

• Multiclassclassificationproblem• Input:currentparsingstate(e.g.,currentandpreviousconfigurations)• Output:onetransitionamongallpossibletransitions• Q:sizeofoutputspace?

• Supervisedclassifierscanbeused• E.g.,perceptron

• Openquestions• Whataregoodfeaturesforthistask?• Wheredowegettrainingexamples?

Page 11: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

GeneratingTrainingExamples

• Whatwehaveinatreebank • Whatweneedtotrainanoracle• Pairsofconfigurationsandpredictedparsingaction

Page 12: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Generatingtrainingexamples

• Approach:simulateparsingtogeneratereferencetree

• Given• Acurrentconfig withstackS,dependencyrelationsRc• Areferenceparse(V,Rp)

• Do

Page 13: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Let’stryitout

Page 14: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Features

• Configurationconsistofstack,buffer,currentsetofrelations

• Typicalfeatures• Featuresfocusontoplevelofstack• Usewordforms,POS,andtheirlocationinstackandbuffer

Page 15: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Featuresexample

• Givenconfiguration • Exampleofusefulfeatures

Page 16: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Featuresexample

Page 17: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Researchhighlight:Dependencyparsingwithstack-LSTMs• FromDyeretal.2015:http://www.aclweb.org/anthology/P15-1033

• Idea• Insteadofhand-craftedfeature• Predictnexttransitionusingrecurrentneuralnetworkstolearnrepresentationofstack,buffer,sequenceoftransitions

Page 18: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Researchhighlight:Dependencyparsingwithstack-LSTMs

Page 19: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Researchhighlight:Dependencyparsingwithstack-LSTMs

Page 20: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

AlternateTransitionSystems

Page 21: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Note:Adifferentwayofwritingarc-standardtransitionsystem

Page 22: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Aweaknessofarc-standardparsing

Rightdependentscannotbeattachedtotheirheaduntilalltheirdependentshavebeenattached

Page 23: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

ArcEagerParsing• LEFT-ARC:

• Createhead-dependentrel.betweenwordatfrontofbufferandwordattopofstack

• popthestack• RIGHT-ARC:

• Createhead-dependentrel.betweenwordontopofstackandwordatfrontofbuffer

• Shiftbufferheadtostack• SHIFT

• Removewordatheadofinputbuffer• Pushitonthestack

• REDUCE• Popthestack

Page 24: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

ArcEagerParsingExample

Page 25: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Trees&Forests

• Adependencyforest(here)isadependencygraphsatisfying• Root• Single-Head• Acyclicity• butnot Connectedness

Page 26: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Propertiesofthistransition-basedparsingalgorithm

- Correctness- Foreverycompletetransitionsequence,theresultinggraphisaprojectivedependencyforest(soundness)

- ForeveryprojectivedependencyforestG,thereisatransitionsequencethatgeneratesG(completeness)

- Trick:forestcanbeturnedintotreebyaddinglinkstoROOT0

Page 27: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Dealingwithnon-projectivity

Page 28: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Projectivity• Arc fromheadtodependentisprojective• Ifthereisapathfromheadtoeverywordbetweenheadanddependent

• Dependencytree isprojective• Ifallarcsareprojective• Orequivalently,ifitcanbedrawnwithnocrossingedges

• Projectivetreesmakecomputationeasier• Butmosttheoreticalframeworksdonotassumeprojectivity• Needtocapturelong-distancedependencies,freewordorder

Page 29: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Arc-standardparsingcan’tproducenon-projectivetrees

Page 30: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =
Page 31: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Howfrequentarenon-projectivestructures?

• StatisticsfromCoNLL sharedtask• NPD=nonprojectivedependencies• NPS=nonprojectivesentences

Page 32: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Howtodealwithnon-projectivity?(1)changethetransitionsystem

• Addnewtransitions• Thatapplyto2nd wordofthestack• Topwordofstackistreatedascontext

[Attardi 2006]

Page 33: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Howtodealwithnon-projectivity?(2)pseudo-projectiveparsing

Solution:• “projectivize”anon-projectivetreebycreatingnewprojectivearcs• Thatcanbetransformedbackintonon-projectivearcsinapost-processingstep

Page 34: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Howtodealwithnon-projectivity?(2)pseudo-projectiveparsing

Solution:• “projectivize”anon-projectivetreebycreatingnewprojectivearcs• Thatcanbetransformedbackintonon-projectivearcsinapost-processingstep

Page 35: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Graph-basedparsing

Page 36: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Graphconceptsrefresher

Page 37: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

DirectedSpanningTrees

Page 38: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

MaximumSpanningTree

• Assumewehaveanarcfactoredmodeli.e.weightofgraphcanbefactoredassumorproductofweightsofitsarcs

• Chu-Liu-Edmondsalgorithmcanfindthemaximumspanningtreeforus!• Greedyrecursivealgorithm• Naïveimplementation:O(n^3)

Page 39: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Chu-Liu-Edmondsillustrated

Page 40: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Chu-Liu-Edmondsillustrated

Page 41: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Chu-Liu-Edmondsillustrated

Page 42: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Chu-Liu-Edmondsillustrated

Page 43: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Chu-Liu-Edmondsillustrated

Page 44: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =
Page 45: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Arcweightsaslinearclassifiers

Page 46: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Exampleofclassifierfeatures

Page 47: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

HowtoscoreagraphGusingfeatures?

Arc-factoredmodelassumption

Bydefinitionofarcweightsaslinearclassifiers

Page 48: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Howcanwelearntheclassifierfromdata?

Page 49: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

DependencyParsing:whatyoushouldknow• Formalizingdependencytrees

• Transition-baseddependencyparsing• Shift-reduceparsing• Transitionsystem:arcstandard,arceager• Oracle• Learning/predictingparsingactions

• Graph-baseddependencyparsing

• Aflexibleframeworkthatallowsmanyextensions• RNNsvsfeatureengineering,non-projectivity

Page 50: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =
Page 51: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Extension:dynamicoracle

Problemwithstandardclassifier-basedoracle:- Itis“static”

- ie tiedtooptimalconfig sequencethatproducesgoldtree

- Whatiftherearemultiplesequencesforasinglegoldtree?- Howcanwerecoveriftheparserdeviatesfromgoldsequence?

Onesolution:“dynamicoracle”[Goldberg&Nivre 2012]

SeealsoLocallyOptimalLearningtoSearch[Changetal.ICML2015]

Page 52: Dependency Parsing 2 - University Of Maryland · Data-driven dependency parsing Goal:learn a good predictor of dependency graphs Input: sentence Output: dependency graph/tree G =

Extension:dynamicoracleProblemwithstandard

See[Goldberg&Nivre 2012]fordetails