syntactic parsing introduction - cl.lingfil.uu.se

56
Syntactic Parsing Introduction Joakim Nivre and Daniel Dakota Department of Linguistics and Philology Partly based on slides from Marco Kuhlmann and Sara Stymne

Upload: others

Post on 03-Jun-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Parsing

Introduction

Joakim Nivre and Daniel DakotaDepartment of Linguistics and Philology

Partly based on slides from Marco Kuhlmann and Sara Stymne

Page 2: Syntactic Parsing Introduction - cl.lingfil.uu.se

Today

• Introduction to syntactic parsing

• Course information

Page 3: Syntactic Parsing Introduction - cl.lingfil.uu.se

What is syntactic parsing?

• Given a natural language sentence, compute a representation of its syntactic structure

Page 4: Syntactic Parsing Introduction - cl.lingfil.uu.se

What is syntactic parsing?

• Given a natural language sentence, compute a representation of its syntactic structure

Page 5: Syntactic Parsing Introduction - cl.lingfil.uu.se

What is syntactic parsing?

• Given a natural language sentence, compute a representation of its syntactic structure

Page 6: Syntactic Parsing Introduction - cl.lingfil.uu.se

What is syntactic parsing?

• Given a natural language sentence, compute a representation of its syntactic structure

Page 7: Syntactic Parsing Introduction - cl.lingfil.uu.se

Why study syntactic parsing?

• Syntactic structure determines semantic interpretation

Disney acquired PixarPixar acquired DisneyPixar was acquired by Disney

• Natural language understanding applications

- Information extraction- Question answering

• Grammar checking

• Modeling human sentence processing

Page 8: Syntactic Parsing Introduction - cl.lingfil.uu.se

Parsing as Structured Prediction

Parsing is a structured prediction problem:

X → Y

• Input space X: sentences

• Output space Y: syntactic representations

• X and Y are infinite sets of structured objects

• Complexity requires specialised algorithms

Page 9: Syntactic Parsing Introduction - cl.lingfil.uu.se

Input Space

A sentence x ∈ X is a sequence of tokens x1, …, xn

• How do we delimit sentences?

• How do we split sentences into tokens?

- Different writing systems (white space or not)

- Different degrees of morphological complexity

• How do we preprocess tokens before parsing?

- Part-of-speech tagging

- Morphological analysis

- Lemmatization

Page 10: Syntactic Parsing Introduction - cl.lingfil.uu.se

Output Space

A syntactic representation y ∈ Y is a labeled graph

• Constituency-based representations

- Sentences decomposed into phrases and words

- Focus on internal structure of phrases

• Dependency-based representations

- Words connected by dependency relations

- Focus on functional role of words

• Hybrid representations exist

Page 11: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Representations

Page 12: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Representations

Terminal nodes

Page 13: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Representations

Nonterminal nodes

Page 14: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Representations

Root node

Page 15: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Representations

Page 16: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Representations

Nodes = Terminal nodes

Page 17: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Representations

Labeled arcs

Page 18: Syntactic Parsing Introduction - cl.lingfil.uu.se

Syntactic Representations

Root node

Page 19: Syntactic Parsing Introduction - cl.lingfil.uu.se

Mapping

What is the relation between inputs and outputs?

X → Y

• Grammar G generates x ∈ X with analysis y ∈ Y

- Recognition – does G generate x (yes/no)?

- Parsing – derive (all) y assigned to x by G

• Every instance of x has a single interpretation y

- Disambiguation – select the “best” y for x

- Parsing as optimization: y* = argmaxy f(x, y)

Page 20: Syntactic Parsing Introduction - cl.lingfil.uu.se

Ambiguity

I shot an elephant in my pajamas.

• This sentence is ambiguous. In what way?

• What should happen if we parse the sentence?

Page 21: Syntactic Parsing Introduction - cl.lingfil.uu.se

Ambiguity

Page 22: Syntactic Parsing Introduction - cl.lingfil.uu.se

Ambiguity

Recognition: Yes

Page 23: Syntactic Parsing Introduction - cl.lingfil.uu.se

Ambiguity

Recognition: Yes

Parsing with G

Page 24: Syntactic Parsing Introduction - cl.lingfil.uu.se

Ambiguity

Recognition: Yes

Parsing with G

Disambiguation

Page 25: Syntactic Parsing Introduction - cl.lingfil.uu.se

Parsing Models

A model for parsing (with disambiguation):

• GEN(x) ⊆ Y

- Defines the set of candidate analyses for x

• EVAL(x, y) ∈ R

- Scores an analysis y in relation to x

• Parsing algorithm

- Finds the highest scoring analysis y*

Page 26: Syntactic Parsing Introduction - cl.lingfil.uu.se

Parsing Models

PCFG Parsing Dependency Parsing

GEN(x) { y | G derives x with y } { y | y is a spanning tree for x }

EVAL(x, y) PG(x, y) = PG(y) S(x, y) = ∑ S(xi, xj) for (xi, xj) in y

Algorithm CKY, Earley, … Collins, Eisner, …

Page 27: Syntactic Parsing Introduction - cl.lingfil.uu.se

Parsing as Search

• Parsing algorithms have to search the space of candidate trees

• In order to search the space of trees we have to build them

• To do parsing efficiently, we have to use “smart” search strategies

Page 28: Syntactic Parsing Introduction - cl.lingfil.uu.se

How many trees are there?

0

400

800

1200

1600

1 2 3 4 5 6 7 8

linear cubic exponential

Page 29: Syntactic Parsing Introduction - cl.lingfil.uu.se

Search Strategies

• Naive enumeration takes exponential time: O(2n)

• Dynamic programming

- Solve each subproblem only once

- Parsing in polynomial time, often cubic: O(n3)

- Examples: CKY, Earley, Eisner

• Greedy search and beam search

- Best-first search of a (small) subspace

- Parsing in sub-cubic time, often linear: O(n)

- Example: Transition-based parsing

Page 30: Syntactic Parsing Introduction - cl.lingfil.uu.se

Structured Prediction Again

• The algorithmic challenges arising in parsing are typical of structured prediction problems.

• Many of the techniques can be applied directly to other problems, for example, sequence labeling.

• This parsing course has an algorithmic focus in order to teach these techniques.

Page 31: Syntactic Parsing Introduction - cl.lingfil.uu.se

Treebanks

Syntactically annotated corpora (treebanks) have two main uses in parser development:

• The parameters of the scoring function EVAL(x, y) can be estimated from a training set

- This may involve more or less advanced machine learning methods – not a focus of this course

- Grammars may also be extracted from treebanks

• Parsing accuracy can be evaluated on a test set

- Most evaluation metrics compute partial matches between parse trees and treebank trees

- Test set must be distinct from training set – why?

Page 32: Syntactic Parsing Introduction - cl.lingfil.uu.se

Questions?

Page 33: Syntactic Parsing Introduction - cl.lingfil.uu.se

Course Information

All course information is in Studium

(TimeEdit schedule is not up to date)

Page 34: Syntactic Parsing Introduction - cl.lingfil.uu.se

Intended Learning Outcomes

At the end of the course, you should be able to

• explain the standard models and algorithms used in phrase structure and dependency parsing

• implement and evaluate some of these techniques

• critically evaluate scientific publications in the field of syntactic parsing

• design, evaluate, or theoretically analyse the syntactic component of an NLP system

Page 35: Syntactic Parsing Introduction - cl.lingfil.uu.se

Examination

Examination is continuous and based on

• 3 assignments

- 2 programming assignments

- 1 literature review

• 2 literature seminars

• 1 project

Page 36: Syntactic Parsing Introduction - cl.lingfil.uu.se

Programming Assignments

Assignment 1: PCFG parsing

• Implement CKY algorithm

• Evaluate parser on treebank data

Assignment 3: Dependency parsing

• Implement arc-eager transition system

• Implement oracle for arc-eager transition system

• Evaluate oracle on treebank data

Page 37: Syntactic Parsing Introduction - cl.lingfil.uu.se

Literature Review

Assignment 2: Literature review

• Pick two research articles about parsing

- Journal, conference or workshop papers

- Main topic should be parsing methods

• Write a 3-page report

- Summarize, analyze, critically review

Page 38: Syntactic Parsing Introduction - cl.lingfil.uu.se

Literature Seminars

Seminars to discuss an article in smaller groups

• Preparation:

- Read article

- Go through questions and discussion points

• Active participation is obligatory

- If you miss a seminar or do not participate actively, you have to hand in a written report

Page 39: Syntactic Parsing Introduction - cl.lingfil.uu.se

Project

Organization:

• Define your own project – suggestions in Studium

• Work individually or in pairs

Activities:

• Project proposal – February 26

- Assignment of supervisor

• Oral discussion (only for pairs) – March 22

• Project report – March 26

Page 40: Syntactic Parsing Introduction - cl.lingfil.uu.se

Learning Outcomes and Examination

At the end of the course, you should be able to

• explain the standard models and algorithms used in phrase structure and dependency parsing

• implement and evaluate some of these techniques

• critically evaluate scientific publications in the field of syntactic parsing

• design, evaluate, or theoretically analyse the syntactic component of an NLP system

Page 41: Syntactic Parsing Introduction - cl.lingfil.uu.se

Learning Outcomes and Examination

At the end of the course, you should be able to

• explain the standard models and algorithms used in phrase structure and dependency parsing

• implement and evaluate some of these techniques

• critically evaluate scientific publications in the field of syntactic parsing

• design, evaluate, or theoretically analyse the syntactic component of an NLP system

Ass 1–3Seminars

Page 42: Syntactic Parsing Introduction - cl.lingfil.uu.se

Learning Outcomes and Examination

At the end of the course, you should be able to

• explain the standard models and algorithms used in phrase structure and dependency parsing

• implement and evaluate some of these techniques

• critically evaluate scientific publications in the field of syntactic parsing

• design, evaluate, or theoretically analyse the syntactic component of an NLP system

Ass 1–3Seminars

Ass 1, 3

Page 43: Syntactic Parsing Introduction - cl.lingfil.uu.se

Learning Outcomes and Examination

At the end of the course, you should be able to

• explain the standard models and algorithms used in phrase structure and dependency parsing

• implement and evaluate some of these techniques

• critically evaluate scientific publications in the field of syntactic parsing

• design, evaluate, or theoretically analyse the syntactic component of an NLP system

Ass 1–3Seminars

Ass 1, 3

Ass 2Seminars

Page 44: Syntactic Parsing Introduction - cl.lingfil.uu.se

Learning Outcomes and Examination

At the end of the course, you should be able to

• explain the standard models and algorithms used in phrase structure and dependency parsing

• implement and evaluate some of these techniques

• critically evaluate scientific publications in the field of syntactic parsing

• design, evaluate, or theoretically analyse the syntactic component of an NLP system

Ass 1–3Seminars

Ass 1, 3

Ass 2Seminars

Project

Page 45: Syntactic Parsing Introduction - cl.lingfil.uu.se

Grading

Grading scales:

• Assignments and project: U–G–VG

• Literature seminars: U–G

To achieve G on the course:

• G on all assignments, seminars and project

To achieve VG on the course:

• VG on 3 assignments or 1 assignment + project

Page 46: Syntactic Parsing Introduction - cl.lingfil.uu.se

Teachers

Joakim Nivre

• Examiner and course coordinator

• Lectures, seminars, project supervision

• Assignment 2 and 3

Daniel Dakota

• Lectures, seminars, project supervision

• Assignments 1 and 2

Page 47: Syntactic Parsing Introduction - cl.lingfil.uu.se

Teaching

All teaching and tutoring is on line

• Monday and Wednesday 10.15–12:

- 10 sessions (90 min, whole class)

• Live lecture on Zoom

• Recorded lecture + Q&A

- 2 literature seminars (45 min, small groups)

• Monday 13:15–14 (from February 1):

- Tutoring for assignments and project

Page 48: Syntactic Parsing Introduction - cl.lingfil.uu.se

Teaching

All teaching and tutoring is on line

• Monday and Wednesday 10.15–12:

- 10 sessions (90 min, whole class)

• Live lecture on Zoom

• Recorded lecture + Q&A

- 2 literature seminars (45 min, small groups)

• Monday 13:15–14 (from February 1):

- Tutoring for assignments and project

CHECK SCHEDULE

Page 49: Syntactic Parsing Introduction - cl.lingfil.uu.se

Lectures

• Lectures and course books cover basic parsing algorithms in detail

• They touch on more advanced material, but you will need to read up on that independently

• To get the most out of the lectures, prepare by watching recorded lectures and reading the literature in advance

Page 50: Syntactic Parsing Introduction - cl.lingfil.uu.se

Course Workload

7.5 hp means about 200 hours work:

~ 40 h lectures (including preparation)

2 h seminars

158 h work on your own

~ 80 h assignment work (including reading)

~ 10 h seminar preparation

~ 68 h project work

Page 51: Syntactic Parsing Introduction - cl.lingfil.uu.se

Deadlines

Assignment Deadline1: PCFG parsing February 19

2: Literature review March 123: Dependency parsing March 26Project proposal February 26Project report March 26

Backup April 23

Seminars Date

Seminar 1 February 8, 10Seminar 2 March 1, 3

Page 52: Syntactic Parsing Introduction - cl.lingfil.uu.se

Reading: Course Books

Daniel Jurafsky and James H. Martin. Speech and Language Processing. 3rd edition. 2020. Chapters 12–15. Available on line at: https://web.stanford.edu/~jurafsky/slp3/.

Sandra Kübler, Ryan McDonald, and Joakim Nivre. Dependency Parsing. Morgan and Claypool. 2009. Chapters 1–4, 6. Available as e-book at: ub.uu.se.

Page 53: Syntactic Parsing Introduction - cl.lingfil.uu.se

Reading: Articles

Seminar 1:

• Mark Johnson. PCFG Models of Linguistic Tree Representations. Computational Linguistics 24(4), 613–632.

Seminar 2:

• Joakim Nivre and Jens Nilsson. Pseudo-Projective Dependency Parsing. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 99–106.

Individual selection of articles for literature review (Ass 2).

Page 54: Syntactic Parsing Introduction - cl.lingfil.uu.se

Evaluation 2020

• Overall score: 3.7 / 5 (Median 4)

• What did you find valuable to the degree that it should be retained:

- Assignments interesting and suitably challenging. Flexibility in choosing and executing final project was good. Programming assignments were useful (but hard).

- The combination of implementing basic algorithms such as CKY and talking about more advanced approaches was really helpful.

- Very well organised. Great slides and lectures.

• Which course components did you find most in need of revision?

- The workload felt a bit too high for a 7.5 credit course.

- Vague assignment instructions. The first lab was the most difficult and confusing.

Page 55: Syntactic Parsing Introduction - cl.lingfil.uu.se

Changes 2021

• One assignment less than in 2021.

• Assignment 1 redesigned and simplified.

Page 56: Syntactic Parsing Introduction - cl.lingfil.uu.se

Questions?