CS546: Machine Learning and Natural Language
Preparation for the Term Project:
– Dependency Parsing
– Dependency Representation for Semantic Role Labeling
Slides for Dependency Parsing are based on Joakim Nivre and Sandra Kuebler's slides from their ACL 2006 tutorial
Outline
– Dependency Parsing:
  • Formalism
  • Dependency parsing algorithms
– Semantic Role Labeling:
  • Dependency formalism
  • Basic approach for the first part of the term project
– Pipeline for the first assignment
Dependency Grammar
• Formalization by Lucien Tesniere [Tesniere, 1959]
• The idea was known long before (e.g., Panini, India, >2000 years ago)
• Studied extensively in the Prague School approach to syntax
• (in the US, research focused more on the constituent formalism)
[Figure: an example of phrase structure (or constituent structure)]
Constituent vs Dependency
• There are advantages to dependency structures:
  – for free (or semi-free) word order languages
  – easier to convert to predicate-argument structure
  – ...
• But there are drawbacks too...
• You can try to convert one representation into the other
  – but, in general, these formalisms are not equivalent
Dependency structures for NLP tasks
• Most approaches have focused on constituent tree-based features
• But this is now changing:
  – Machine Translation (e.g., Menezes & Quirk, 07)
  – Summarization and sentence compression (e.g., Filippova & Strube, 08)
  – Opinion mining (e.g., Lerman et al, 08)
  – Information extraction, Question Answering (e.g., Bouma et al, 06)
• All these well-formedness conditions on dependency trees will be violated for the semantic dependency graphs we will consider later
• You can think of projectivity as (a notion related to) planarity: no two dependency arcs drawn above the sentence cross
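To make the projectivity condition concrete, here is a minimal sketch (illustrative 1-based conventions, not tied to any toolkit) that tests whether a dependency tree is projective by checking that no two arcs cross:

```python
# A dependency tree as a head list: heads[d-1] is the head of word d
# (words are numbered from 1; the root's head is 0). The tree is
# projective iff no two arcs cross when drawn above the sentence.

def is_projective(heads):
    arcs = [(min(d, h), max(d, h)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # two arcs cross iff exactly one endpoint of the second
            # lies strictly inside the span of the first
            if l1 < l2 < r1 < r2:
                return False
    return True

print(is_projective([2, 3, 0, 3]))  # nested arcs -> True
print(is_projective([3, 1, 0, 2]))  # crossing arcs -> False
```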
Algorithms
• Data-driven inference algorithms we will consider:
  – graph-based approaches
  – transition-based approaches
• We will not consider:
  – rule-based systems
  – constraint satisfaction
Converting to Constituent Formalism
Idea:
• Convert dependency structures to constituent structures
  – easy for projective dependency structures
• Apply algorithms for constituent parsing to them
  – e.g., CKY; if some of you attend Julia Hockenmaier's class on parsing, it was/will be covered there
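As a minimal sketch of the conversion idea (names are illustrative), each head word of a projective tree projects a constituent spanning itself and its dependents' subtrees, so the tree maps to a nested bracketing that constituent parsing algorithms such as CKY can work with:

```python
# Convert a projective dependency tree (head list, 1-based, root head 0)
# into a nested bracketing: each head projects a constituent covering
# its left dependents, itself, and its right dependents.

def to_brackets(heads, words, h=None):
    if h is None:
        h = heads.index(0) + 1            # start from the root word
    deps = [d for d, hd in enumerate(heads, start=1) if hd == h]
    left = [to_brackets(heads, words, d) for d in deps if d < h]
    right = [to_brackets(heads, words, d) for d in deps if d > h]
    parts = left + [words[h - 1]] + right
    return parts[0] if len(parts) == 1 else "(" + " ".join(parts) + ")"

# "the" -> "dog", "dog" -> "barks", "barks" -> root
print(to_brackets([2, 3, 0], ["the", "dog", "barks"]))
# -> ((the dog) barks)
```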
Converting to Constituent Formalism
• Different independence assumptions lead to different statistical models
  – both accuracy and parsing time (dynamic programming) vary
Graph-Based (Edge-Factored) Parsing
• Features f(i, j) can include dependence on any words in the sentence, i.e., f(i, j, sent)
• But the score still decomposes over the edges in the graph
• Strong independence assumption
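A minimal sketch of edge-factored scoring, assuming a sparse weight vector w stored as a dict and a toy feature function (both illustrative, not from any particular system); the point is that the tree score is just the sum of per-edge scores:

```python
# Toy edge features: f(i, j, sent) may look at any words in the
# sentence, but each feature is tied to a single candidate edge (i, j).
def f(i, j, sent):
    return {"head=" + sent[i]: 1.0,
            "dep=" + sent[j]: 1.0,
            "pair=" + sent[i] + "_" + sent[j]: 1.0,
            "dist=" + str(j - i): 1.0}

def edge_score(w, i, j, sent):
    return sum(w.get(k, 0.0) * v for k, v in f(i, j, sent).items())

def tree_score(w, heads, sent):
    # sent[0] is an artificial <root> token; heads[d-1] is the head of
    # word d. The strong independence assumption: the total score is a
    # sum over edges, so decoding reduces to MST / Eisner's algorithm.
    return sum(edge_score(w, h, d, sent) for d, h in enumerate(heads, start=1))
```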
Online Learning (Structured Perceptron)
• Joint feature representation f(x, y)
  – we will talk about it more later
• Algorithm: for each training sentence x with gold tree y, predict ŷ = argmax_y' w · f(x, y') and, if ŷ ≠ y, update w ← w + f(x, y) − f(x, ŷ)
  – here we run MST or Eisner's algorithm to compute the argmax
• Features over edges only
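A minimal sketch of one training epoch, reusing the toy feature function f from the edge-factored sketch above; `decode` stands in for MST or Eisner's algorithm (its implementation is assumed, not shown):

```python
def perceptron_epoch(w, data, decode):
    # data: iterable of (sent, gold_heads) pairs; w: dict of weights
    for sent, gold_heads in data:
        pred_heads = decode(w, sent)      # y^ = argmax_y w . f(x, y)
        if pred_heads != gold_heads:
            # w <- w + f(x, y) - f(x, y^), edge by edge
            for d, (g, p) in enumerate(zip(gold_heads, pred_heads), start=1):
                for k, v in f(g, d, sent).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in f(p, d, sent).items():
                    w[k] = w.get(k, 0.0) - v
    return w
```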
Parsing Algorithms
• Here, when we say parsing algorithm (= derivation order), we often mean a mapping: given a tree, map it to the sequence of actions which creates this tree
• Tree T is equivalent to this sequence of actions: d1, ..., dn
• Therefore, P(T) = P(d1, ..., dn) = P(d1) P(d2 | d1) ... P(dn | dn-1, ..., d1)
• Ambiguous: sometimes “parsing algorithm” refers to the decoding algorithm used to find the most likely sequence
  – you can use classifiers here and search for the most likely sequence (recall Maryam’s talk)
• Most algorithms are restricted to projective structures, but not all
  – the transition system sketched below can handle only projective structures
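To make the tree-to-action mapping concrete, here is a minimal sketch of an oracle for an arc-standard shift-reduce system (one common transition system; the slide does not commit to a particular one):

```python
def oracle_actions(heads):
    # heads[d-1] is the head of word d (1-based); the root's head is 0
    n = len(heads)
    remaining = [0] * (n + 1)   # unattached dependents left for each word
    for h in heads:
        remaining[h] += 1
    stack, queue, actions = [0], list(range(1, n + 1)), []
    while queue or len(stack) > 1:
        if len(stack) > 1:
            s0, s1 = stack[-1], stack[-2]
            if s1 != 0 and heads[s1 - 1] == s0 and remaining[s1] == 0:
                actions.append("LEFT-ARC"); stack.pop(-2)
                remaining[s0] -= 1
                continue
            if heads[s0 - 1] == s1 and remaining[s0] == 0:
                actions.append("RIGHT-ARC"); stack.pop()
                remaining[s1] -= 1
                continue
        if not queue:   # happens exactly when the tree is non-projective
            raise ValueError("non-projective tree: no valid action")
        actions.append("SHIFT"); stack.append(queue.pop(0))
    return actions

print(oracle_actions([2, 3, 0]))  # "the dog barks"
# -> ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'LEFT-ARC', 'RIGHT-ARC']
```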
How to learn in this case?
• Your training examples are {(dn; d1, ..., dn-1)}: a collection of parsing contexts
• You want to predict the correct action, i.e., model P(dn | dn-1, ..., d1)
• How do you define a feature representation of the history (dn-1, ..., d1)? You can think of it instead in terms of:
  – the partial tree corresponding to it
  – the current contents of the queue (Q) and stack (S)
  – the most important features are the top of S and the front of Q (only between them can you potentially create links)
• (Inference: you can do it greedily or with beam search; see the sketch below)
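A minimal sketch of greedy decoding with a trained classifier over the same transition system as above; `classifier.predict` and the two features are illustrative stand-ins for whatever model and feature set you use (beam search would keep the k best action sequences instead):

```python
def greedy_parse(words, classifier):
    stack, queue, heads = [0], list(range(1, len(words) + 1)), {}
    while queue or len(stack) > 1:
        # the most important features: top of stack S and front of queue Q
        feats = {
            "s0_word": words[stack[-1] - 1] if stack[-1] > 0 else "<root>",
            "q0_word": words[queue[0] - 1] if queue else "<none>",
        }
        action = classifier.predict(feats)  # SHIFT / LEFT-ARC / RIGHT-ARC
        if action == "LEFT-ARC" and len(stack) > 1:
            heads[stack[-2]] = stack[-1]; stack.pop(-2)
        elif action == "RIGHT-ARC" and len(stack) > 1:
            heads[stack[-1]] = stack[-2]; stack.pop()
        elif queue:                          # SHIFT
            stack.append(queue.pop(0))
        else:                                # no valid action remains
            break
    return heads                             # dependent -> head
```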
Results: Transition-based vs Graph-Based
CoNLL-2006 Shared Task, average over 12 languages (Labeled Attachment Score):
  McDonald et al (MST): 80.27
  Nivre et al (Transitions): 80.19
• The results are essentially the same
• A lot of research in both directions
  – e.g., Latent Variable Models for Transition-Based Parsing (Titov and Henderson, 07): best single-model system in CoNLL-2007 (third overall)
Non-Projective Parsing
• Graph-Based Algorithms (McDonald)
• Post-Processing of Projective Algorithms (Hall and Novak, 05)
• Transition-Based Algorithms which handle non-projectivity (Attardi, 06; Titov et al, 08; Nivre et al, 08)
• Pseudo-Projective Parsing: removing non-projective (crossing) links and encoding them in labels (Nivre and Nilsson, 05); one lift step is sketched below
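A minimal sketch of a single lift step of the pseudo-projective transformation; a real implementation (Nivre and Nilsson, 05) repeats such lifts until the tree is projective and uses richer label-encoding schemes:

```python
def lift_once(heads, labels, d):
    # Reattach dependent d (1-based) from its head h to h's own head,
    # and record the original relation in d's label so that the
    # transformation can be undone after (projective) parsing.
    h = heads[d - 1]
    assert h > 0, "cannot lift a dependent of the root"
    heads[d - 1] = heads[h - 1]
    labels[d - 1] = labels[d - 1] + "|" + labels[h - 1]
    return heads, labels
```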
First Phase of Term Project
– The goal is to construct joint syntax-SRL (Semantic Role Labeling) dependency structures
  • similar to the CoNLL-2008 and 2009 Shared Tasks
  • the 2nd phase will focus on SRL
– For now we need to create the entire pipeline:
  • Tagger: SVM tagger
  • Pseudo-Projective Transformations: tool by Nilsson & Nivre
  • Dependency Parser: MaltParser by Nivre et al
  • Implement a basic classifier for SRL (see next slide)
– Due after Spring Break
– I will send the description by email
First Phase of Term Project
[Figure: an example sentence with its syntactic structure (dependency tree) and its semantic structure (SRL graph)]
• Properties of the semantic (SRL) structure:
  – words can have multiple heads (parents)
  – we need to annotate predicates with senses (predicates are the potential parents in the graph); not indicated in the figure
• It is not the most standard formalism for SRL
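Since words may have multiple parents, the semantic structure is a graph rather than a tree; a minimal sketch of one possible representation (the indices, sense label, and role names are illustrative):

```python
# Arcs are (predicate, argument, role) triples, so a word can appear as
# the argument of several predicates; each predicate also carries a sense.
semantic_graph = {
    "senses": {3: "schedule.01"},               # predicate index -> sense
    "arcs": {(3, 2, "A1"), (3, 6, "AM-TMP")},   # labeled predicate-argument arcs
}
```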
SRL Pipeline
– 1st Stage: For every word, decide whether it is a predicate (binary classification)
– 2nd Stage: For all words which are predicates, predict their sense
– 3rd Stage: For every pair of words (A, B), decide:
  • word A is an argument of word B
  • word B is an argument of word A
  • there is no SRL relation between them
  (constraint: only predicates can be parents)
– 4th Stage: Label all the relations (the whole pipeline is sketched below)
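As referenced above, a minimal sketch of the four stages chained together; the classifier arguments are illustrative stand-ins for models you would train (e.g., averaged perceptrons over word and dependency-parse features):

```python
def srl_pipeline(words, parse, is_pred, sense_clf, arg_clf, label_clf):
    # Stage 1: binary predicate identification for every word
    preds = [w for w in range(len(words)) if is_pred(w, words, parse)]
    # Stage 2: sense prediction for every predicate
    senses = {p: sense_clf(p, words, parse) for p in preds}
    # Stage 3: argument identification; looping over predicates as the
    # only potential parents enforces the constraint on the slide
    arcs = [(p, a) for p in preds for a in range(len(words))
            if a != p and arg_clf(p, a, words, parse)]
    # Stage 4: label every predicate-argument relation
    labeled = [(p, a, label_clf(p, a, words, parse)) for p, a in arcs]
    return senses, labeled
```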
SRL Pipeline
– Use any features:
  • hint: dependency parse features are going to be very useful
  • see the CoNLL 2008 shared task papers for which features were useful
– Use any learning algorithm:
  • you can use a package (e.g., SNoW)
  • or implement one yourself (e.g., the averaged perceptron is easy)
– Do not use any existing SRL tools
Next lectures
– I will be away for 2 weeks
– Next week (Mar 9 – Mar 15):
  • Wednesday: Alex Klementiev on Weak Supervision
  • Friday: Kevin Small on Active Learning, plus a student presentation by Ryan
– 2nd week (Mar 16 – Mar 22):
  • work on the project
– 1st phase will be due around April 1 (exact dates later)