
Robust Shallow Semantic Parsing of Text

Dipanjan Das
Carnegie Mellon University

Toyota Technological Institute at Chicago
February 14, 2012

Natural Language Understanding

I want to go to Chicago on Sunday

Shallow Syntax: Part-of-Speech Tagging

I   want   to    go   to    Chicago   on    Sunday
P   V      ADP   V    ADP   N         ADP   N

Deeper Syntax: Dependency Parsing

(a dependency tree over the same tagged sentence)

Shallow Semantics: Frames and Roles

A predicate evokes a frame, which encodes an event or scenario; spans of the sentence fill participants, or roles, for the frame. Here, "I" is the Experiencer of the frame evoked by "want".

Focus of this talk! (Das, Schneider, Chen and Smith, NAACL 2010; Das and Smith, ACL 2011)

Outline

1. Why semantic analysis?
   (motivation, applications, choice of formalism)

2. Statistical models for structure prediction
   (frame identification, with latent variables; argument identification, with dual decomposition)

3. Semi-supervised learning for robustness
   (novel graph-based learning algorithms)

Bengal ’s massive stock of food was reduced to nothing
N      X  A       N     ADP N   V   V       ADP N

There is a large body of research on syntax, including Das and Petrov, ACL 2011; Cohen, Das and Smith, EMNLP 2011; Martins, Das, Smith and Xing, EMNLP 2008.

But syntax leaves the semantic questions open. Is "stock" a store or a financial entity? Store of what? Of what size? Whose store? What was reduced? To what?

Frame semantics answers them: "stock" evokes the STORE frame, and "reduced" evokes CAUSE_CHANGE_OF_POSITION_ON_A_SCALE.

Origins: (Computational) Linguistics

Case Grammar ("The Case for Case", Fillmore, 1968)
(cases are words/phrases required by a predicate)

I gave some money to him
Agent   Object         Beneficiary

Semantic valency of a predicate
Correlation with syntax (e.g. subject and object)
Obligatory cases / optional cases

Frames ("A Framework for Representing Knowledge", Minsky, 1975)

Frame Semantics ("Frame Semantics", Fillmore, 1982)
Relates the meaning of a word with world knowledge
(e.g. gave evokes a GIVING frame; it has several participating roles; the frame is evoked by other words, such as bequeath, contribute, donate)

Scripts ("Scripts, Plans, Goals and Understanding", Schank and Abelson, 1977)

Datasets: FrameNet, PropBank, VerbNet, NomBank and OntoNotes lead to data-driven shallow semantic parsing; MUC, ACE and GENIA lead to information extraction (template filling). The two lines of work are structurally similar!

Slide idea taken from Brendan O’Connor

Why this Linguistic Formalism?

Semantic analysis spans a spectrum from shallow to deep.

PropBank-style Semantic Role Labeling (Pradhan et al., 2008; Punyakanok et al., 2008), at the shallow end:

Bengal ’s massive stock of food was reduced to nothing
          A1                                      A4

Symbolic set of semantic roles (six total); verb-specific meaning for these labels; conflates the meaning of different roles due to oversimplification (Yi et al., 2007).

Semantic Parsing into Logical Forms (Ge and Mooney, 2005; Zettlemoyer and Collins, 2005), at the deep end:

What states border the state that borders the most states

Trained on very restricted domains; poor lexical coverage.

Frame-Semantic Parsing (Gildea and Jurafsky, 2002; Johansson and Nugues, 2007; this work) sits in between:
deeper than PropBank-style semantic role labeling;
models all part-of-speech categories;
larger lexical coverage than logical form parsers;
lexicon actively increasing in size;
does not model quantification or negation, unlike logical forms.

1. Why semantic analysis? Possible applications.

Question Answering (Bilotti et al., 2007)

Bengal ’s massive stock of food was reduced to nothing
Whose stock of food was diminished?

Frames connect lexical variants: "stock" and "reserve" evoke the same STORE frame, so the question can match the answer sentence even when the words differ.

Information Extraction

In 1997, France's stock of unirradiated civil plutonium increased to 72 tons.
Saudi Arabia has 267 billion barrels in reserves of oil.
Does Egypt have stockpiles of biological weapons?
Bengal’s massive stock of food was reduced to nothing.

Possessor      Descriptor            Resource
Bengal         massive               food
France         -                     unirradiated civil plutonium
Saudi Arabia   267 billion barrels   oil
Egypt          -                     biological weapons

Multilingual Applications

Bengal ’s massive stock of food was reduced to nothing
[the same sentence translated into another language; the script was lost in transcription]

Multilingual KBs, cross-lingual IR, machine translation.

2. Statistical models for structure prediction
(Das, Schneider, Chen and Smith, NAACL 2010)

Structure of Lexicon and Data

PLACING is a frame. Its core roles are Agent, Cause, Goal and Theme; its non-core roles are Area and Time. Agent and Cause stand in an "excludes" relationship. Its predicates include archive.V, arrange.V, bag.V, bestow.V, bin.V.

Frames are linked to one another in the lexicon by inheritance and "used by" relations, e.g.:

PLACING: Agent, Cause, Goal, Theme, Area, Time
DISPERSAL: Agent, Cause, Individuals, Distance, Time
TRANSITIVE_ACTION: Agent, Cause, Patient, Event, Place, Time
INSTALLING: Agent, Component, Fixed_location, Area, Time
STORING: Agent, Location, Theme, Area, Time
STORE: Possessor, Resource, Supply, Descriptor

Structure of Lexicon and Data: two datasets

Benchmark Dataset (SemEval 2007), for comparison with past state of the art (very small dataset):
665 frames, 720 role labels, 8.4K unique predicate types
Training set: 2.2K sentences, 11.2K predicate tokens
Test set: 120 sentences, 1.1K predicate tokens

New Data (FrameNet 1.5, 2010):
877 frames, 1068 role labels, 9.3K unique predicate types
Training set: 3.3K sentences, 19.6K predicate tokens
Test set: 2,420 sentences, 4.5K predicate tokens

2. Statistical models for structure prediction: frame identification (argument identification comes next).

Frame Identification

Bengal ’s massive stock of food was reduced to nothing
N X A N ADP N V V ADP N

The predicate "stock" is ambiguous; find the best among all the frames.

Direct modeling using logistic regression has two problems:
1. It is unable to model unknown predicates at test time.
2. Number of features: ≈ 50 million.

Instead: logistic regression with a latent variable. The latent variable ranges over the predicates that evoke each frame in the supervised data, e.g. cargo.N, inventory.N, reserve.N, stockpile.N, store.N, supply.N evoke STORE. The model does not look at the observed predicate's surface form; features instead connect it to a latent prototype predicate through lexical-semantic relations. For example, if the frame is STORE, the prototype is stockpile.N, and synonym ∈ LexSem, a feature fires; the relation set LexSem (here {synonym}, linking stock.N to stockpile.N) comes from WordNet!

Number of features: ≈ 500K (1% of the features we had before).

Aside: probabilistic modeling of language meaning using syntax, lexical semantics and latent structure also appears in paraphrase identification (Das and Smith, ACL 2009).
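To make the latent-variable setup concrete, here is a minimal sketch, assuming a toy lexicon and a caller-supplied WordNet relation lookup; the talk's actual feature set is far richer:

```python
import math
from collections import defaultdict

# Toy lexicon (assumed): frame -> prototype predicates seen in supervised data.
LEXICON = {
    "STORE": ["cargo.N", "inventory.N", "reserve.N", "stockpile.N", "store.N", "supply.N"],
    "STORING": ["store.V", "stock.V"],
}

def features(frame, prototype, predicate, lexsem):
    # Features conjoin the frame with lexical-semantic relations (e.g. "synonym")
    # linking the observed predicate to the latent prototype; the observed
    # predicate's surface form itself never appears in a feature.
    return [(frame, rel) for rel in lexsem(predicate, prototype)]

def unnorm(frame, predicate, weights, lexsem):
    # Marginalize over the latent prototype predicates for this frame.
    return sum(
        math.exp(sum(weights[f] for f in features(frame, proto, predicate, lexsem)))
        for proto in LEXICON[frame]
    )

def p_frame(frame, predicate, weights, lexsem):
    # p(frame | predicate); sentence context is omitted for brevity.
    z = sum(unnorm(f, predicate, weights, lexsem) for f in LEXICON)
    return unnorm(frame, predicate, weights, lexsem) / z

# Usage with hypothetical weights and a hypothetical relation lookup:
weights = defaultdict(float, {("STORE", "synonym"): 1.5})
lexsem = lambda p, q: {"synonym"} if {p, q} <= {"stock.N", "stockpile.N", "store.N"} else set()
print(p_frame("STORE", "stock.N", weights, lexsem))
```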

Frame Identification

Training: maximum conditional log-likelihood (batch training with L-BFGS).

Fast inference: if the predicate was seen in supervised data, restrict the search to the frames it evoked there; if it is unseen, search over all frames.
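In symbols, the training criterion is the usual latent-variable conditional likelihood (my reconstruction of the formulas that appeared as images on the slides; g is the feature vector and L_f the set of prototype predicates for frame f):

```latex
\max_{\theta} \sum_{i=1}^{N} \log p_{\theta}(f_i \mid x_i)
  \;=\; \sum_{i=1}^{N} \log
    \frac{\sum_{\ell \in L_{f_i}} \exp\{\theta^{\top} g(f_i, \ell, x_i)\}}
         {\sum_{f'} \sum_{\ell' \in L_{f'}} \exp\{\theta^{\top} g(f', \ell', x_i)\}}
```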

Frame Identification: Results (F-Measure)

Benchmark, automatically identified predicates:
UTD (Bejan and Hathaway, 2007): 60.4
LTH (Johansson and Nugues, 2007): 64.0
This Work: 68.3
Benchmark, given predicates: This Work: 74.2

New Data, given predicates:
This Work: 90.5
This Work, no hidden variable: 80.0

2. Statistical models for structure prediction: argument identification (dual decomposition).

Argument Identification

Bengal ’s massive stock of food was reduced to nothing

Given the STORE frame evoked by "stock", map each of its roles to a span. Candidate spans include: stock; Bengal ’s; Bengal; massive stock; of food; food; massive; Bengal ’s massive stock of food; massive stock of food.

The ideal mapping assigns each role its correct span. A careless mapping can violate overlap constraints, with two roles claiming overlapping spans.

Argument Identification: other types of structural constraints

Mutual exclusion constraint. In PLACING (predicates archive.V, arrange.V, bag.V, bestow.V, bin.V), Agent excludes Cause: if an agent places something, there cannot be a cause role in the sentence.

The waiter placed food on the table. (Agent)
In Kabul, hauling water put food on the table. (Cause)

Requires constraint. In SIMILARITY (roles Dimension, Differentiating_fact, Entity_1, Entity_2, Degree; predicates difference.N, resemble.V, unlike.A, vary.V), the first entity requires the second entity.

A mulberry resembles a loganberry. (first entity: a mulberry; second entity: a loganberry)
*A mulberry resembles. (ill-formed: the second entity is missing)

Argument Identification: a constrained optimization problem

Introduce a binary variable for each (role, span) tuple, collected into a binary vector for all (role, span) tuples. Constraints:

Uniqueness: each role is mapped to exactly one candidate span (a null span lets a role go unfilled).
Prevents overlap: no two roles may claim overlapping spans.
More structural constraints: mutual exclusion, requires, and so on.

The result is an integer linear program (ILP), as in Punyakanok, Roth and Yih (2008). But ILPs often admit only very slow solutions, and fast ILP solvers are proprietary.
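Written out (my reconstruction; c(r, s) is the model score for assigning span s to role r, and z are the binary variables just described):

```latex
\max_{z}\;\; \sum_{r} \sum_{s} c(r,s)\, z_{r,s}
\quad \text{s.t.}\quad
z_{r,s} \in \{0,1\},\qquad
\sum_{s} z_{r,s} = 1 \;\;\forall r \;\;\text{(uniqueness; $s$ includes a null span)},\qquad
\sum_{r}\, \sum_{s \,\ni\, t} z_{r,s} \le 1 \;\;\forall\,\text{tokens } t \;\;\text{(no overlap)}.
```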

Argument Identification: an alternate approach

Dual decomposition with the alternating direction method of multipliers (ADMM), developed with colleagues at CMU (Das, Martins and Smith, forthcoming).

A basic part is a (role, span) tuple; the entire space is all tuples. Break the problem down into many small components, e.g.:
find the best span for a role;
for a sentence position, find the best tuple;
for a pair of mutually exclusive roles, find the best tuple;
and impose agreement between the components.

Each component scores a binary vector over the tuples it uses, and the total score of an assignment sums the component scores. A witness vector enforces consensus among components that share tuples.

Primal: maximize the total score subject to the agreement constraints. Primal': the same problem with the integer constraints relaxed. The relaxation is solved through an augmented Lagrangian function whose saddle point can be found using several decoupled worker problems, with three types of iterative updates:

1. Lagrange multiplier updates
2. Consensus variable updates
3. Worker updates, at the decoupled workers
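In the standard consensus-ADMM form (the slides showed the symbols as images; λ_i, u and z_i below are my names for the multipliers, the consensus variables and each worker's local variables, and ρ > 0 is the penalty parameter):

```latex
z_i^{(t+1)} \;=\; \arg\max_{z_i}\; \theta_i^{\top} z_i
  \;-\; \lambda_i^{(t)\top} z_i
  \;-\; \tfrac{\rho}{2}\,\bigl\| z_i - u^{(t)} \bigr\|_2^2
  \qquad\text{(3. worker updates, fully decoupled)}

u^{(t+1)} \;=\; \tfrac{1}{n} \sum_{i=1}^{n}
  \Bigl( z_i^{(t+1)} + \tfrac{1}{\rho}\,\lambda_i^{(t)} \Bigr)
  \qquad\text{(2. consensus variable updates)}

\lambda_i^{(t+1)} \;=\; \lambda_i^{(t)} \;+\; \rho\,\bigl( z_i^{(t+1)} - u^{(t+1)} \bigr)
  \qquad\text{(1. Lagrange multiplier updates)}
```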

For example, for each role we have a worker that imposes an XOR/uniqueness constraint; its update is a projection onto a simplex, a simple sort operation. The challenge is to define fast, simple workers.
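The projection the XOR worker needs is the classic Euclidean projection onto the probability simplex, which really does reduce to one sort plus a scan. A self-contained sketch (the standard algorithm of Duchi et al., 2008, not code from the talk):

```python
import numpy as np

def project_onto_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                     # sort scores in decreasing order
    cssv = np.cumsum(u) - 1.0                # cumulative sums, minus the unit mass
    ind = np.arange(1, v.size + 1)
    rho = ind[u - cssv / ind > 0][-1]        # largest index kept above the threshold
    theta = cssv[rho - 1] / rho              # the clipping threshold
    return np.maximum(v - theta, 0.0)

# e.g. project_onto_simplex(np.array([0.6, 0.9, -0.1])) -> [0.35, 0.65, 0.0]
```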

Argument Identification: advantages of the alternate approach

1) Significant speedup: time to decode the test set is 43.12 seconds for CPLEX (ILP) versus 4.78 seconds for dual decomposition.
2) No proprietary solver necessary.

And the relaxation is tight in practice: a certificate of optimality in >99% of examples.

Argument Identification: learning

Maximum conditional log-likelihood of local (role, span) pairs (batch training using L-BFGS).

Argument Identification: Results (New Data)

Benefit of joint inference:

                     Precision   Recall   F-Measure
Local                82          76.4     79.1       (501 linguistic violations)
Dual Decomposition   83.8        76.2     79.8       (no violations)

Full Parsing: Final Results (F-Measure)

Benchmark, automatically identified predicates: UTD 37.9, LTH 45.6, This Work 50.2
Benchmark, given predicates: This Work 53.6
New Data, given predicates: This Work 68.5

3. Semi-supervised learning for robustness: novel graph-based learning algorithms
(Das and Smith, ACL 2011)

Results on Unknown Predicates (F-Measure)

Frame Identification: all predicates 90.5; unknown predicates 46.6
Full Parsing: all predicates 68.5; unknown predicates 30.2

Handling Unknown Predicates

The supervised data covers only 9,263 predicates, but English has far more potential predicates (~65,000 in newswire English). Remedy: lexicon expansion using graph-based semi-supervised learning.

How can label propagation help? Build a graph with potential predicates as vertices, computing the similarity matrix from co-occurrence statistics, and keep a label distribution at each vertex: a distribution over the frames that the predicate can evoke. The idea is very similar to Das and Petrov (ACL 2011), who do unsupervised lexicon expansion for POS tagging.
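A sketch of one plausible graph construction (cosine similarity over co-occurrence vectors with a k-nearest-neighbor cutoff; the talk specifies only that the similarity matrix comes from co-occurrence statistics, so the details here are assumptions):

```python
import numpy as np

def build_predicate_graph(cooc: np.ndarray, k: int = 10) -> np.ndarray:
    """cooc[i] is predicate i's co-occurrence vector over context features.
    Returns a symmetric weight matrix keeping each vertex's k nearest
    neighbors by cosine similarity."""
    unit = cooc / np.maximum(np.linalg.norm(cooc, axis=1, keepdims=True), 1e-12)
    sim = unit @ unit.T                      # cosine similarity between predicates
    np.fill_diagonal(sim, -np.inf)           # no self-loops
    W = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    nbrs = np.argsort(sim, axis=1)[:, -k:]   # k most similar predicates per row
    W[rows, nbrs] = sim[rows, nbrs]
    return np.maximum(W, W.T)                # symmetrize the kept edges
```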

Example Graph

Seed predicates (with supervised frame distributions) and unseen predicates sit in the same graph. Graph propagation pushes distributions from the seeds to their neighbors, and continues till convergence.

Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data

Labeled and unlabeled datapoints are vertices connected by a symmetric weight matrix (the example graph has edge weights 0.9, 0.9, 0.8, 0.1, 0.05, 0.01). Labeled vertices carry supervised label distributions; the distributions at the remaining vertices are to be found.

Label Propagation (Das and Smith, forthcoming)

Minimize an objective with three terms:
1. a term that brings the observed and induced distributions at labeled vertices closer;
2. a term that brings the distributions of similar vertices closer;
3. a term that induces sparse distributions at each vertex.
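One way to write such an objective (my reconstruction: q_v are the distributions being learned, r_v the supervised seed distributions, W the symmetric weight matrix, Δ the probability simplex, and Ω a sparsity-inducing penalty whose exact form is a modeling choice):

```latex
\min_{q}\;\;
  \sum_{v \in \text{seeds}} \bigl\| q_v - r_v \bigr\|_2^2
  \;+\; \mu \sum_{(u,v)} W_{uv}\, \bigl\| q_u - q_v \bigr\|_2^2
  \;+\; \nu \sum_{v} \Omega(q_v)
  \qquad \text{s.t. } q_v \in \Delta \;\; \forall v
```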

Constrained Inference

If the predicate was seen in supervised data, search only over the frames it evoked there; else, if the predicate is in the graph, search over the top frames of its propagated distribution; otherwise, search over all frames.

Six times faster inference on unknown predicates!
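As a decision rule, the constrained inference is a three-way branch. In this sketch, seed_lexicon, graph_top_frames and the cutoff m are hypothetical names standing in for the supervised lexicon, the propagated graph distributions, and a top-m truncation:

```python
def candidate_frames(predicate, seed_lexicon, graph_top_frames, all_frames, m=2):
    """Shrink the frame search space before scoring frames for a predicate."""
    if predicate in seed_lexicon:               # seen in supervised data
        return seed_lexicon[predicate]
    if predicate in graph_top_frames:           # unseen, but present in the graph
        return graph_top_frames[predicate][:m]  # top frames of the propagated distribution
    return all_frames                           # truly unknown: fall back to all frames
```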

Results on Unknown Predicates (F-Measure)

Frame Identification: Supervised 46.6; Self-Training 42.7; Graph-Based 65.3
Full Parsing: Supervised 30.2; Self-Training 26.6; Graph-Based 46.7

Conclusions

Parsing using the theory of frame semantics
– richer output than popular SRL systems (Kingsbury and Palmer, 2002)
– domain general in comparison to deep semantic parsers

Significantly better performance on benchmark datasets than previous work
– fewer independence assumptions
– only two statistical models
– semi-supervised extensions

Future Work

– Train parsers in other languages: Spanish, German, Portuguese, Japanese, Chinese, Swedish
– Use the presented techniques for deeper semantic analysis tasks, especially semi-supervised learning
– Use the parser for NLP applications (right now it is being used to bootstrap more annotations)

Revisiting the historical picture (Case Grammar, Frames, Frame Semantics, Scripts, and the datasets they inspired): the next steps feed it from both ends, with more annotations and larger lexicons for data-driven shallow semantic parsing.

Last Word

Parser available at: http://www.ark.cs.cmu.edu/SEMAFOR
(200 downloads in the past 6 months)

Thank you! (JUDGMENT_DIRECT_ADDRESS)