Discourse Analysis David M. Cassel Natural Language Processing Villanova University April 21st, 2005



Discourse Analysis

Discourse: a collocated, related group of sentences (definition from the book)



Discourse Analysis

Discourse Model -- a model to represent the entities mentioned in the discourse

Coreference or Anaphora Resolution -- determining which entity a referring expression refers to

Coherence -- modeling the logical flow of the discourse

The book also discusses psycholinguistic studies of reference and coherence.



Anaphora Resolution

Before the game, manager Charlie Manuel said Gavin Floyd's performance would not affect whether he remains with the team when Vicente Padilla comes off the disabled list Tuesday.

Then Floyd went out and had a nightmarish first inning: four walks, one wild pitch, one hit, four runs.

After the game, Manuel said Floyd's disastrous outing had not changed his mind. The righthander will remain with the club and be used in relief.

"The pitcher we saw in St. Louis is a pitcher who has the ability to be a very good major-league pitcher," he said. "He didn't have command of his fastball and couldn't get his breaking ball over tonight... . Maybe the cold was affecting his breaking ball, because he was bouncing a lot of them."

-- Sam Carchidi, Philadelphia Inquirer, 4/16/05



Discourse Model

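[Figure: the discourse model for the article above. Gavin Floyd, Charlie Manuel, and Vicente Padilla are entities evoked (introduced) by their first mentions; the later expressions he, Floyd, The righthander, The pitcher we saw in St. Louis, and his refer to the Gavin Floyd entity and corefer with one another.]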

Adapted from Figure 18.1, Speech & Language Processing
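The evoke/refer/corefer relations in the figure can be mirrored by a very small data structure. This is a toy sketch, assuming mentions have already been detected and linked to entities by hand; the names `DiscourseModel`, `evoke`, `refer`, and `corefer` are hypothetical, chosen only to echo the figure's labels. Mentions get integer ids so that repeated strings such as "he" stay distinguishable.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A discourse entity and the expressions that have mentioned it so far."""
    name: str
    mentions: list = field(default_factory=list)


class DiscourseModel:
    """Toy discourse model (hypothetical API): entities plus an index of mentions."""

    def __init__(self):
        self.entities = {}        # entity name -> Entity
        self.mention_entity = []  # mention id (list index) -> entity name

    def _record(self, name, expression):
        self.entities[name].mentions.append(expression)
        self.mention_entity.append(name)
        return len(self.mention_entity) - 1  # id of the new mention

    def evoke(self, name, expression):
        """A first mention introduces (evokes) a new entity."""
        if name in self.entities:
            raise ValueError(f"{name!r} is already in the model")
        self.entities[name] = Entity(name)
        return self._record(name, expression)

    def refer(self, name, expression):
        """A later mention refers to an entity that is already in the model."""
        if name not in self.entities:
            raise KeyError(f"{name!r} has not been evoked yet")
        return self._record(name, expression)

    def corefer(self, mention_a, mention_b):
        """Two mentions corefer when they point at the same entity."""
        return self.mention_entity[mention_a] == self.mention_entity[mention_b]


# Usage with expressions from the article (links assigned by hand).
model = DiscourseModel()
manuel = model.evoke("Charlie Manuel", "manager Charlie Manuel")
floyd = model.evoke("Gavin Floyd", "Gavin Floyd")
padilla = model.evoke("Vicente Padilla", "Vicente Padilla")
righthander = model.refer("Gavin Floyd", "The righthander")
print(model.corefer(floyd, righthander), model.corefer(floyd, manuel))  # True False
```

A real system would have to find the links itself; that is exactly the anaphora resolution problem the following slides address.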


Types of Anaphoric References

Indefinite noun phrases: A baseball player like that should do well.

Definite noun phrases: The righthander will remain with the club.

Pronouns: He had a bad game.

Demonstratives: This player has a bright future.

One-anaphora: I saw no less than 6 Acura Integras today. Now I want one. (from book)



Reference Constraints

Number Agreement: Floyd pitched 6 innings. They went well.

Person and Case: He didn’t have command of his fastball.

Gender Agreement: Floyd took his glove with him. It fit well.

Syntactic Constraints: Floyd threw him the ball.

Selectional Restrictions: Floyd stepped onto the mound with the ball. He threw it really fast.
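Number and gender agreement act as a hard filter on candidate antecedents. A minimal sketch, assuming each candidate already carries number and gender features (in practice these would come from a lexicon or tagger); the feature table `PRONOUN_FEATURES` and the helper names are hypothetical, and the person/case, syntactic, and selectional constraints are not modeled.

```python
from dataclasses import dataclass

# Hypothetical feature table: pronoun -> (number, gender).
# None means the pronoun places no constraint on that feature.
PRONOUN_FEATURES = {
    "he": ("sg", "masc"),
    "him": ("sg", "masc"),
    "she": ("sg", "fem"),
    "her": ("sg", "fem"),
    "it": ("sg", "neut"),
    "they": ("pl", None),
    "them": ("pl", None),
}


@dataclass
class Candidate:
    text: str
    number: str  # "sg" or "pl"
    gender: str  # "masc", "fem", or "neut"


def agrees(pronoun, candidate):
    """True if the candidate matches the pronoun in number and (if constrained) gender."""
    number, gender = PRONOUN_FEATURES[pronoun.lower()]
    if candidate.number != number:
        return False
    return gender is None or candidate.gender == gender


def filter_by_agreement(pronoun, candidates):
    """Keep only the candidates that pass the number and gender constraints."""
    return [c for c in candidates if agrees(pronoun, c)]


# Candidates drawn from the constraint examples.
candidates = [
    Candidate("Floyd", "sg", "masc"),
    Candidate("6 innings", "pl", "neut"),
    Candidate("his glove", "sg", "neut"),
]

for pronoun in ("They", "It", "He"):
    print(pronoun, "->", [c.text for c in filter_by_agreement(pronoun, candidates)])
# They -> ['6 innings']
# It -> ['his glove']
# He -> ['Floyd']
```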



Preferences

Recency: Floyd threw the ball. Lieberthal picked it up. He put the ball in his pocket.

Grammatical Role: Floyd threw the ball to Lieberthal. His arm was getting tired.

Repeated Mention: (see the article)

Parallelism: Floyd threw a ball to Lieberthal. Wagner threw a ball to him, too.

Verb Semantics: John telephoned Bill. He lost the pamphlet on Acuras. (compare: John criticized Bill. He lost the pamphlet on Acuras.)



Pronoun Resolution Algorithms

Traditional

Carter: shallow parsing

Rich, LuperFoy: distributed architecture

Carbonell, Brown: multi-strategy

Rico Pérez: scalar product

Mitkov: combination of linguistic, statistical (high 80s)

Lappin, Leass: syntax-based (86%)

Hobbs: Tree Search Algorithm (91.7%)

Grosz, Joshi, Weinstein: Centering Algorithm (77.6%)

Hobbs: Coherence


Alternative

Nasukawa: knowledge-independent (93.8%)

Dagan, Itai: statistical, corpus processing (87% for “genuine” it)

Connolly, Burger, Day: machine learning

Aone, Bennett: machine learning (“close to 90%”)

Mitkov: uncertainty reasoning

Mitkov: 2-engine (~90%)

Tin, Akman: situational semantics

Say, Vakman



Lappin & Leass

The book presents a slightly modified version of the algorithm for nonreflexive, third-person pronouns. It has two parts:

Update the discourse model with salience values

Resolve pronouns

Let’s apply this to some text: In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.



Salience Factors

Factor                                          Weight
Sentence recency                                100
Subject emphasis                                80
Existential emphasis                            70
Accusative (direct object) emphasis             50
Indirect object, oblique complement emphasis    40
Non-adverbial emphasis                          50
Head noun emphasis                              80
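Each newly mentioned noun phrase starts with the sum of the weights of the factors that apply to it. A minimal sketch, assuming a parser has already decided which factors apply; the dictionary keys are hypothetical labels for the rows of the salience factor table. The checks reproduce the worked example's totals for Gavin Floyd (310) and the afternoon (180).

```python
# Weights from the salience factor table; the dictionary keys are hypothetical labels.
SALIENCE_WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "existential": 70,
    "direct_object": 50,
    "indirect_object_or_oblique": 40,
    "non_adverbial": 50,
    "head_noun": 80,
}


def initial_salience(factors):
    """Sum the weights of the factors that apply to a newly mentioned noun phrase."""
    factors = set(factors)
    unknown = factors - set(SALIENCE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown salience factors: {sorted(unknown)}")
    return sum(SALIENCE_WEIGHTS[f] for f in factors)


# "Gavin Floyd": recent, subject, not inside an adverbial PP, a head noun.
print(initial_salience({"recency", "subject", "non_adverbial", "head_noun"}))  # 310
# "the afternoon" (inside the adverbial "In the afternoon"): recent, a head noun.
print(initial_salience({"recency", "head_noun"}))  # 180
```

Deciding which factors apply (for example, whether a prepositional phrase is adverbial) is the hard part and is left to the caller here.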


Pronoun Salience

Factor              Weight
Role parallelism    35
Cataphora           -175


L&L Algorithm

Collect the potential referents (up to four sentences back).

Remove potential referents that do not agree in number or gender with the pronoun.

Remove potential referents that do not pass intrasentential syntactic coreference constraints.

Compute the total salience value of each remaining referent by adding any applicable pronoun salience values (role parallelism, cataphora) to its existing salience value.

Select the referent with the highest salience value. In case of ties, select the closest referent in terms of string position.
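The five steps can be sketched as follows. This is a simplified illustration, not Lappin and Leass's own implementation: it assumes every mention arrives with a sentence index, a word offset, a grammatical-role label, number, gender, and current salience already computed (all field names are hypothetical). Role parallelism is approximated by comparing role labels, cataphora by a candidate appearing after the pronoun in the same sentence, and the intrasentential syntactic filter is a pluggable stub (`passes_syntax`).

```python
from dataclasses import dataclass

ROLE_PARALLELISM = 35  # pronoun salience factors from the table above
CATAPHORA = -175


@dataclass
class Mention:
    name: str
    sentence: int      # sentence index
    position: int      # word offset in the whole text (hypothetical tokenization)
    role: str          # grammatical role label, e.g. "subj", "obj", "obl"
    number: str        # "sg" or "pl"
    gender: str        # "masc", "fem", or "neut"
    salience: int = 0  # current salience of the referent (unused for the pronoun)


def resolve(pronoun, referents, passes_syntax=lambda pronoun, ref: True):
    """Return (best referent, score), or None if no candidate survives the filters."""
    # Step 1: candidates from the current sentence and up to four sentences back.
    pool = [r for r in referents if 0 <= pronoun.sentence - r.sentence <= 4]
    # Step 2: drop candidates that disagree in number or gender.
    pool = [r for r in pool if r.number == pronoun.number and r.gender == pronoun.gender]
    # Step 3: drop candidates ruled out by intrasentential syntax (a stub by default).
    pool = [r for r in pool if passes_syntax(pronoun, r)]
    if not pool:
        return None

    def score(r):
        # Step 4: add the pronoun-specific factors to the existing salience.
        total = r.salience
        if r.role == pronoun.role:
            total += ROLE_PARALLELISM
        if r.sentence == pronoun.sentence and r.position > pronoun.position:
            total += CATAPHORA  # the candidate follows the pronoun
        return total

    # Step 5: highest score wins; ties go to the candidate closest in string position.
    best = max(pool, key=lambda r: (score(r), -abs(pronoun.position - r.position)))
    return best, score(best)


# The final "He" of the worked example. Word offsets and roles are hypothetical;
# Floyd's value is his 465 halved (rounded down) when the third sentence starts.
referents = [
    Mention("the afternoon", 0, 2, "obl", "sg", "neut", salience=45),
    Mention("Gavin Floyd", 1, 11, "subj", "sg", "masc", salience=465 // 2),
    Mention("baseball", 0, 6, "obj", "sg", "neut", salience=62),
    Mention("the park", 0, 9, "obl", "sg", "neut", salience=37),
    Mention("a bar", 1, 15, "obl", "sg", "neut", salience=115),
    Mention("Mike Lieberthal", 1, 18, "obl", "sg", "masc", salience=75),
    Mention("a beer", 2, 22, "obj", "sg", "neut", salience=280),
]
he = Mention("He", 2, 19, "subj", "sg", "masc")
best, best_score = resolve(he, referents)
print(best.name, best_score)  # Gavin Floyd 267
```

Only Floyd and Lieberthal survive the gender filter, and the role-parallelism bonus widens Floyd's lead.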



Example

In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.

Columns: Rec, Subj, Exist, Obj, Ind-Obj, Non-Adv, Head Noun, Total (each row lists only the factors that apply, followed by its total)

the afternoon 100 80 180

Gavin Floyd 100 80 50 80 310

baseball 100 50 50 50 250

the park 100 50 150


Example

In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.

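Columns: Carry (previous total, halved at the new sentence), Rec, Subj, Exist, Obj, Ind-Obj, Non-Adv, Head Noun, Total (each row lists only the values that apply; a row with a single number shows only the carried value)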

the afternoon 90

Gavin Floyd 155

baseball 125

the park 75

a bar 100 50 80 230

Mike Lieberthal 100 50 150


Example

In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.

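Columns: Carry, Rec, Subj, Exist, Obj, Ind-Obj, Non-Adv, Head Noun, Total (each row lists only the values that apply; a row with a single number shows only the carried value)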

the afternoon 90

{Gavin Floyd, he} 155 100 80 50 80 465

baseball 125

the park 75

a bar 100 50 80 230

Mike Lieberthal 100 50 150
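The carry column and the merged {Gavin Floyd, he} row follow two bookkeeping rules: when a new sentence starts, every entity's salience is cut in half, and when a pronoun is resolved, its own factor weights are added to its antecedent's total. A minimal sketch, assuming salience is kept in a plain dictionary keyed by entity name (a hypothetical representation) and that halving rounds down, as the later carry values 62 and 37 suggest.

```python
def start_sentence(salience):
    """Halve every entity's salience (rounding down) at a sentence boundary."""
    return {entity: value // 2 for entity, value in salience.items()}


def add_mention(salience, entity, points):
    """Add a mention's factor weights to its entity, creating the entity if it is new."""
    salience[entity] = salience.get(entity, 0) + points
    return salience[entity]


# Totals after the first sentence (from the first example table).
salience = {"the afternoon": 180, "Gavin Floyd": 310, "baseball": 250, "the park": 150}

# Second sentence: carry the halved values forward ...
salience = start_sentence(salience)
# ... add the new noun phrases ...
add_mention(salience, "a bar", 230)
add_mention(salience, "Mike Lieberthal", 150)
# ... and fold "he" (recency 100 + subject 80 + non-adverbial 50 + head noun 80)
# into the Gavin Floyd entity it was resolved to.
add_mention(salience, "Gavin Floyd", 100 + 80 + 50 + 80)

print(salience)
# {'the afternoon': 90, 'Gavin Floyd': 465, 'baseball': 125, 'the park': 75,
#  'a bar': 230, 'Mike Lieberthal': 150}
```

Calling `start_sentence` once more gives the carry values for the third sentence.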


Example

In the afternoon, Gavin Floyd played baseball at the park. Then he went to a bar with Mike Lieberthal. He enjoyed a beer.

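Carry (previous totals, halved and rounded down)

the afternoon 45

{Gavin Floyd, he} 232

baseball 62

the park 37

a bar 115

Mike Lieberthal 75

a beer 280 (new in this sentence, not carried)

Only Gavin Floyd and Mike Lieberthal agree with He in number and gender. Gavin Floyd gets 35 points for role parallelism (his last mention, he, was also a subject); Mike Lieberthal does not.

Floyd => 267 points

Lieberthal => 75 points

We pick Floyd as the antecedent of He.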


Summary

Discourse Analysis requires processing more text than POS tagging or finding entities.

Part of tracing the flow of discourse is resolving anaphora.

That resolution lets us capture more relationships and other information than we could otherwise.
