History of Major NLP Products & Services in Toshiba


Page 1

Toshiba Confidential, TOSHIBA OF EUROPE LTD.

History of Major NLP Products & Services in Toshiba

1978  "JW-10": Japanese word processor

1985  "ASTRANSAC EJ": English-to-Japanese (EtoJ) MT system

1989  "ASTRANSAC JE": Japanese-to-English (JtoE) MT system

1995  "The 翻訳": PC MT system (Internet & personal)

1996  "News Watch": information filtering service

1999  "Fresh Eye": Internet search engine/portal

2001  "KnowledgeMeister": knowledge management (KM) support system

2005  Chinese-Japanese translation service

2006  "KnowledgeMeister - Succeed"

Page 2

Hideki Hirakawa, Toshiba of Europe Ltd.

Integrated Use of Phrase Structure Forest and Dependency Forest in Preference Dependency Grammar (PDG)

29 January, 2008

Page 3

Agenda

Phrase Structure and Dependency Structure Analysis

Overview of the Preference Dependency Grammar (PDG)

Packed Shared Data Structure “Dependency Forest”

Evaluation of Dependency Forest

Conclusion

Page 4

Phrase Structure (PS) and Dependency Structure (DS)

Two major syntactic representation schemes

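[Figure, left panel "Phrase Structure (PS)": phrase structure tree for "time fly like an arrow", with phrase nodes s, np, vp, pp and np over the parts of speech n, v, pre, det and n]

Information explicitly expressed by PS:
- Phrases (non-terminal nodes)
- Structural categories (non-terminal labels)

[Figure, right panel "Dependency Structure (DS)": dependency tree for "time fly like an arrow", with arcs labelled sub, vpp, pre and det]

Information explicitly expressed by DS:
- Head-dependent relations (directed arcs)
- Functional categories (arc labels)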

Page 5

Relation between PS (Constituency) and DS

Constituency and dependency describe different dimensions.

A phrase-structure tree (PST) is closely related to a derivation, whereas a dependency tree rather describes the product of a process of derivation.

Constituency and dependency are not adversaries; they are complementary notions. Using them together, we can overcome the problems that each notion has individually.

Formal & Computational Aspects of Dependency Grammar [Kruijff 02]

Page 6

Integrated Use of Phrase and Dependency Structures

Phrase structure analysis
- Lexicalized PCFG: lexical information (including dependency relations) improves PS analysis accuracy (e.g. Charniak 1997; Collins 1999; Bikel 2004)
- Use of dependency relations as discriminative features of maximum entropy phrase structure parsers (e.g. HPSG parser (Oepen 2002), reranking parser (Charniak and Johnson 2005))
- Use of an additional, independent shallow dependency parser (Sagae et al. 2007)

Dependency analysis
- Almost no use of phrase structure information (Kakari-uke (Japanese bunsetsu dependency) parsers, MSTParser (McDonald 2005), Malt parser (Nivre 2004))

Integration requires mapping
- Integrating PS and DS requires a mapping between the two structures of a sentence, because a sentence analyzer cannot combine linguistic information from both without a correspondence between them.

Page 7

Mapping between PS and DS (previous research)

- Conversion between PS and DS based on heuristics: phrase structure tree (PST) → dependency tree (DT) [Collins 99], DT → PST [Xia&Palmer 00] ⇒ used for measuring parse accuracy, treebank creation, etc.
- Grammar equivalence: [Gaifman 65] and [Abney 94] studied the equivalence relation between PSG (CFG) and DG (Tesnière-model DG) ⇒ DG is strongly equivalent to only a subclass of CFG*1
- Structure mapping based on packed shared data structures: a partial structure mapping framework based on the Syntactic Graph [Seo&Simmons 89] creates mappings between PSTs and DTs using partial structure mapping rules (described later) ⇒ the syntactic graph generates inappropriate mappings [Hirakawa 06]
- Complete mapping based on the "Dependency Forest" ⇒ integrated use of PS and DS (described later)
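The heuristic PST → DT conversion mentioned above [Collins 99] can be sketched with a head table: each phrase chooses one child as its head, and the lexical head of every non-head child becomes a dependent of the phrase's lexical head. The head table below is a toy assumption for "time fly like an arrow", not Collins' actual head rules, and words double as node identifiers.

```python
# A phrase node is (label, [children]); a leaf is (pos, word).
# Toy head table (hypothetical, not Collins' rules): preferred head child per phrase label.
HEAD_TABLE = {"s": ["vp"], "vp": ["v"], "np": ["n"], "pp": ["pre"]}


def is_leaf(node):
    return isinstance(node[1], str)


def head_index(label, children):
    """Index of the head child according to HEAD_TABLE (rightmost child as fallback)."""
    child_labels = [child[0] for child in children]
    for candidate in HEAD_TABLE.get(label, []):
        if candidate in child_labels:
            return child_labels.index(candidate)
    return len(children) - 1


def to_dependencies(tree, arcs):
    """Return the lexical head of `tree`; append (dependent, head) word pairs to `arcs`."""
    if is_leaf(tree):
        return tree[1]
    label, children = tree
    heads = [to_dependencies(child, arcs) for child in children]
    h = head_index(label, children)
    for i, dependent in enumerate(heads):
        if i != h:
            arcs.append((dependent, heads[h]))
    return heads[h]


tree = ("s", [
    ("np", [("n", "time")]),
    ("vp", [("v", "fly"),
            ("pp", [("pre", "like"),
                    ("np", [("det", "an"), ("n", "arrow")])])]),
])
arcs = []
root = to_dependencies(tree, arcs)
print(root)  # fly
print(arcs)  # [('an', 'arrow'), ('arrow', 'like'), ('like', 'fly'), ('time', 'fly')]
```

The four pairs correspond to the det, pre, vpp and sub arcs of the DS figure. Using words as identifiers only works because no word repeats in this sentence; a real converter would key nodes by position.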

Page 8

Agenda

Phrase Structure and Dependency Structure Analysis

Overview of the Preference Dependency Grammar (PDG)

Packed Shared Data Structure “Dependency Forest”

Evaluation of Dependency Forest

Conclusion

Page 9

Basic Sentence Analysis Model

[Figure: a sentence is mapped into the interpretation space; each interpretation is correct (◎), plausible (○) or implausible (×); constraint knowledge rejects or accepts interpretations, and preference knowledge ranks the accepted ones (◎ > ○ > ×).]

- Generation Knowledge: generates all possible interpretations
- Interpretation Space: prescribed by the interpretation description scheme
- Constraint Knowledge: rejects interpretations
- Preference Knowledge: defines the preference order of interpretations
- Optimum Interpretation Extraction: extracts ◎, the optimum interpretation
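The four knowledge types of the basic model can be read as a generate, filter, rank and extract pipeline. The sketch below is purely illustrative: interpretations are POS sequences for "time flies", and the lexicon, the single constraint and the preference scores are invented for the example.

```python
from itertools import product

# Generation knowledge (hypothetical lexicon): candidate POS tags for each word.
LEXICON = {"time": ["n", "v"], "flies": ["n", "v"]}

# Preference knowledge (hypothetical scores): higher means more preferred.
PREFERENCE = {("n", "v"): 0.7, ("v", "n"): 0.2, ("v", "v"): 0.1}


def generate(words):
    """Generation knowledge: every POS sequence, i.e. the interpretation space."""
    return [tuple(tags) for tags in product(*(LEXICON[w] for w in words))]


def satisfies_constraints(tags):
    """Constraint knowledge (hypothetical): an interpretation must contain a verb."""
    return "v" in tags


def analyze(words):
    accepted = [t for t in generate(words) if satisfies_constraints(t)]  # reject/accept
    return max(accepted, key=lambda t: PREFERENCE.get(t, 0.0))          # optimum extraction


print(generate(["time", "flies"]))  # [('n', 'n'), ('n', 'v'), ('v', 'n'), ('v', 'v')]
print(analyze(["time", "flies"]))   # ('n', 'v')
```

Enumerating the whole space is only feasible for toy inputs; the packed shared structures discussed later exist to avoid exactly this.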

Page 10

Example (1): Probabilistic Context-Free Grammar (PCFG)

- Generation Knowledge: CFG rules
- Interpretation Space: phrase structures (parse trees)
- Constraint Knowledge: no constraints
- Preference Knowledge: probabilities of the CFG rules
- Optimum Interpretation Extraction: the Viterbi algorithm

[Figure: the basic model instantiated for PCFG; the sentence's interpretations are ranked ◎ > ○ > × and ◎, the optimum interpretation, is extracted.]
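For a PCFG, the Viterbi search over parse trees is commonly implemented as probabilistic CKY. A minimal sketch, assuming a toy grammar in Chomsky normal form whose rules and probabilities are invented for "time flies like an arrow":

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form (hypothetical probabilities).
LEXICAL = {  # (lhs, word) -> probability
    ("NP", "time"): 0.3, ("N", "time"): 0.4, ("V", "time"): 0.2,
    ("NP", "flies"): 0.2, ("N", "flies"): 0.3, ("V", "flies"): 0.4,
    ("V", "like"): 0.4, ("P", "like"): 1.0,
    ("Det", "an"): 1.0, ("N", "arrow"): 0.3,
}
BINARY = [  # (lhs, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("NP", "N", "N", 0.2), ("NP", "Det", "N", 0.3),
    ("VP", "V", "NP", 0.4), ("VP", "V", "PP", 0.6),
    ("PP", "P", "NP", 1.0),
]


def viterbi_cky(words):
    """best[(i, j)][label] = (best probability, backpointer) for the span i..j."""
    n = len(words)
    best = defaultdict(dict)
    for i, w in enumerate(words):
        for (lhs, word), p in LEXICAL.items():
            if word == w:
                best[(i, i + 1)][lhs] = (p, w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for lhs, left, right, p in BINARY:
                    if left in best[(i, k)] and right in best[(k, j)]:
                        prob = p * best[(i, k)][left][0] * best[(k, j)][right][0]
                        if prob > best[(i, j)].get(lhs, (0.0, None))[0]:
                            best[(i, j)][lhs] = (prob, (k, left, right))
    return best


def build_tree(best, i, j, label):
    _prob, back = best[(i, j)][label]
    if isinstance(back, str):  # lexical entry
        return (label, back)
    k, left, right = back
    return (label, build_tree(best, i, k, left), build_tree(best, k, j, right))


words = "time flies like an arrow".split()
chart = viterbi_cky(words)
prob, _ = chart[(0, len(words))]["S"]
print(build_tree(chart, 0, len(words), "S"), prob)
```

With these numbers the best parse reads "time" as the subject NP and "flies like an arrow" as the VP (probability 0.00648), beating the "time flies" compound-noun reading.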

Page 11

Basic Sentence Analysis Model of PDG

Multilevel Packed Shared Data Connection Model
(a) NLA system with a multilevel interpretation space
(b) Packed shared data structures and interpretation mapping
(c) Interpretations are externalizations of the lower-level interpretations

[Figure: the sentence is analyzed at three levels, each with its own generation knowledge (GK1 to GK3), constraint knowledge (CK1 to CK3), preference knowledge (PK1 to PK3) and interpretation space (IS1 to IS3); the Level 1, Level 2 and Level 3 interpretations are linked by mappings, and optimum interpretation extraction yields ◎, the optimum interpretation.]

PK: Preference Knowledge, CK: Constraint Knowledge, GK: Generation Knowledge, IS: Interpretation Space

Key points: 1. Data structure, 2. Optimum solution search

Page 12

PDG Implementation Model (data structure)

WPP = Word POS Pair; phrase structure forest (PSF) = (packed shared) parse forest

[Figure: for the sentence "Time flies", the Morphological Layer holds the WPP trellis (time/n, time/v, fly/n, fly/v), which packs all WPP sequences. The Syntactic Layer holds the phrase structure forest (np, vp, s, roots), which packs all PSTs, and the dependency forest (a sub arc between time/n and fly/v, an obj arc between fly/n and time/v, and top arcs), which packs all DTs. Interpretation mapping links the layers; the optimum dependency tree (time/n attached to fly/v by sub, fly/v attached to top) is extracted together with its PST and WPP sequence.]

- PDG is an all-pair dependency analysis method with a three-level architecture utilizing three packed shared data structures.
- The PS and DS levels are used in an integrated way in the syntactic layer.

Page 13

Comparison with other dependency analysis methods

- CDG (Constraint Dependency Grammar): all morphological interpretations, no CFG grammar, all DS interpretations ⇒ combinatorial explosion
- MSTParser (Maximum Spanning Tree Parser): 1-best morphological interpretation, no CFG grammar, all DS interpretations with no POS ambiguities ⇒ over-pruning
- PDG: all morphological interpretations, all PS interpretations with CFG filtering, well-formed DS interpretations ⇒ ◎, the optimum interpretation

[Figure: the three methods compared across the morphology level, the PS level and the DS level.]

Page 14

PDG Implementation Model (optimum solution search)

Integration of preference knowledge: preference scores based on the multilevel data structures are integrated into scores on a DF.

[Figure: for "Time flies", WPP sequence scores (morphological layer: WPP trellis), phrase structure scores (syntactic layer: PS forest) and dependency scores (syntactic layer: dependency forest) are combined by score integration. The optimum solution search on the scored dependency forest uses the Graph Branch Algorithm and returns the optimum dependency tree (time/n attached to fly/v by sub, fly/v attached to top).]
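A minimal sketch of the score-integration idea, assuming (hypothetically) that each level contributes an additive score attached to dependency-forest arcs and that a tree's score is the sum of its arc scores. All numbers are invented, and the search is a brute-force maximum over explicitly listed candidate trees, not the Graph Branch Algorithm used in PDG.

```python
# Hypothetical preference scores for "Time flies"; an arc is (label, dependent WPP, head WPP).
WPP_SCORE = {"time/n": 2.0, "time/v": 1.0, "fly/v": 2.0, "fly/n": 1.5}
PS_SCORE = {"sub": 1.0, "obj": 0.5, "top": 0.0}  # score of the rule that introduced the arc
DEP_SCORE = {
    ("sub", "time/n", "fly/v"): 3.0,
    ("obj", "fly/n", "time/v"): 1.0,
    ("top", "fly/v", "top"): 0.0,
    ("top", "time/v", "top"): 0.0,
}


def integrated_arc_score(arc):
    """Fold the WPP, phrase-structure and dependency scores into one score on a DF arc.

    Each WPP is the dependent of exactly one arc in a tree (the top arc included),
    so attaching WPP scores to the dependent counts every word exactly once.
    """
    label, dependent, _head = arc
    return WPP_SCORE[dependent] + PS_SCORE[label] + DEP_SCORE[arc]


def tree_score(tree):
    return sum(integrated_arc_score(arc) for arc in tree)


# Brute force over explicitly listed candidate trees (not the Graph Branch Algorithm).
candidates = {
    "time/n fly/v": [("sub", "time/n", "fly/v"), ("top", "fly/v", "top")],
    "time/v fly/n": [("obj", "fly/n", "time/v"), ("top", "time/v", "top")],
}
best = max(candidates, key=lambda name: tree_score(candidates[name]))
print(best, tree_score(candidates[best]))  # time/n fly/v 8.0
```

Attaching every level's score to arcs is what lets a single search over the dependency forest account for morphological and phrase-structure preferences as well.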

Page 15

PDG Analysis Flow

Sentence → Extended Chart Parser (forest generation) → WPP Trellis, PS Forest, Dependency Forest → Scoring → Scored Dependency Forest → Optimum Tree Search → The Optimum Tree

- Forest generation: dependency forest generation
- Scoring: preference score integration
- Optimum tree search: based on CM and PM

Data used by the flow include the co-occurrence score matrix.

Page 16

Agenda

Phrase Structure and Dependency Structure Analysis

Overview of the Preference Dependency Grammar (PDG)

Packed Shared Data Structure “Dependency Forest”

Evaluation of Dependency Forest

Conclusion

Page 17

Partial Structure Mapping Method [Seo&Simmons 89]

Grammar rule: partial structure mapping rule
- Headed CFG rule: Y/wh → X1/w1 … Xh/wh … Xn/wn
- Partial dependency tree: the head word wh governs the other words w1 … wn through arcs labelled d1 … dn

[Figure: the parser maps a sentence to a set of phrase structure trees, held in a packed shared phrase structure (phrase structure forest); the mapping rules map these to a set of dependency trees, held in a packed shared dependency structure (syntactic graph).]

Page 18

Syntactic Graph

- Packed shared data structure for dependency trees
- Encompasses all dependency trees corresponding to the phrase structure trees in the parse forest for a sentence
- Node: WPP; arc: dependency relation

[Figure: syntactic graph and exclusion matrix for "Time flies like an arrow". Nodes: [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n]; arc labels such as mod, npp, det and vpp; the exclusion matrix has one row and one column per arc (13 arcs), with entries marked 1.]

Page 19

Completeness and Soundness of the Syntactic Graph

Definitions
- Completeness: for every parse tree in the forest, there is a syntactic reading from the syntactic graph that is structurally equivalent to that parse tree.
  ∀PST (phrase structure tree) ∃DT (dependency tree): PST corresponds to DT
- Soundness: for every syntactic reading from the syntactic graph, there is a parse tree in the forest that is structurally equivalent to that syntactic reading.
  ∀DT (dependency tree) ∃PST (phrase structure tree): PST corresponds to DT

Problem of the syntactic graph: violation of soundness [Hirakawa 06]

[Figure: correspondence between the phrase structure forest (PT: phrase structure trees) and the syntactic graph (DT: dependency trees), illustrating completeness and the violation of soundness (×).]

Page 20

Example of the violation of soundness

Noun phrase: Tokyo taxi driver call center

[Figure: the phrase structure forest contains three readings (a), (b) and (c) (noun phrases np1 to np3), whose dependency trees are the arc sets {nc-1, nc-2, nc-6, nj-7, rt-8}, {nc-1, nc-3, nc-6, nj-5, rt-8} and {nc-2, nc-3, nc-6, nj-4, rt-8}; the syntactic graph with its exclusion matrix (EM) over arcs 1 to 8 packs all three. Reading (d) is the arc set {nc-1, nc-2, nc-3, nc-6, rt-8}.]

The syntactic graph for (a), (b) and (c) also generates (d), which has no corresponding phrase structure tree in the phrase structure forest.
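Why a pairwise exclusion matrix can over-generate: the sketch below takes the three readings' arc sets as listed for this example, records which arc pairs ever co-occur (the complement of the exclusion relation), and enumerates every five-arc set whose arcs are pairwise compatible. Treating a reading as just a set of five arc IDs (one arc per word, including the root arc) is a simplifying assumption; heads and labels are ignored.

```python
from itertools import combinations

# Dependency trees of the three phrase-structure readings (arc IDs from the example).
readings = [
    frozenset({"nc-1", "nc-2", "nc-6", "nj-7", "rt-8"}),
    frozenset({"nc-1", "nc-3", "nc-6", "nj-5", "rt-8"}),
    frozenset({"nc-2", "nc-3", "nc-6", "nj-4", "rt-8"}),
]

# Syntactic graph: the union of all arcs. Two arcs are compatible if they co-occur
# in some reading; the exclusion matrix marks every other pair.
arcs = sorted(set().union(*readings))
compatible = {frozenset(pair) for r in readings for pair in combinations(r, 2)}


def pairwise_compatible(arc_set):
    return all(frozenset(pair) in compatible for pair in combinations(arc_set, 2))


generated = {frozenset(c) for c in combinations(arcs, 5) if pairwise_compatible(c)}
spurious = generated - set(readings)
print(sorted(sorted(s) for s in spurious))  # [['nc-1', 'nc-2', 'nc-3', 'nc-6', 'rt-8']]
```

The extra set is reading (d): each of its arc pairs is licensed by some reading (nc-1/nc-2 by the first, nc-1/nc-3 by the second, nc-2/nc-3 by the third), but no single reading licenses all three at once, which pairwise exclusion cannot express.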

Page 21

Dependency Forest [Hirakawa 06]: packed shared data structure for dependency trees

- Dependency Forest (DF) = Dependency Graph (DG) + Co-occurrence Matrix (CM)
- CM: defines the arc co-occurrence relation of the dependency forest (equivalent arcs are allowed in a DF)

[Figure: dependency graph and co-occurrence matrix of the dependency forest for "Time flies like an arrow." Nodes: 0,time/n; 0,time/v; 1,fly/v; 1,fly/n; 2,like/p; 2,like/v; 3,an/det; 4,arrow/n; root. Arcs: nc2, obj4, det14, pre15, obj16, vpp18, npp19, vpp20, sub23, sub24, obj25, rt29, rt31, rt32. The matrix covers arcs 2, 24, 4, 23, 19, 18, 20, 14, 16, 15, 31, 29 and 32, with ○ marking co-occurring pairs.]
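A minimal sketch of reading well-formed trees out of a dependency forest, assuming a tree is a set of arcs that (i) gives every word position exactly one arc as dependent (single role, root arc included) and (ii) contains only arc pairs marked as co-occurring in the CM. The tiny forest for "Time flies" and its arc IDs are illustrative assumptions, not the slide's matrix.

```python
from itertools import combinations

# Hypothetical dependency forest for "Time flies":
# arc_id -> (label, (position, dependent WPP), head WPP or None for the root arc)
ARCS = {
    "sub1": ("sub", (0, "time/n"), "fly/v"),
    "rt2": ("root", (1, "fly/v"), None),
    "obj3": ("obj", (1, "fly/n"), "time/v"),
    "rt4": ("root", (0, "time/v"), None),
}
# Co-occurrence matrix stored as the set of unordered arc pairs allowed together.
CM = {frozenset({"sub1", "rt2"}), frozenset({"obj3", "rt4"})}


def well_formed_trees(arcs, cm, n_words):
    """Enumerate arc sets satisfying the single-role and co-occurrence conditions."""
    trees = []
    for combo in combinations(sorted(arcs), n_words):
        positions = sorted(arcs[a][1][0] for a in combo)
        single_role = positions == list(range(n_words))  # each word is a dependent once
        co_occur = all(frozenset(pair) in cm for pair in combinations(combo, 2))
        if single_role and co_occur:
            trees.append(set(combo))
    return trees


print(well_formed_trees(ARCS, CM, 2))
```

The pair sub1 + obj3 covers each word exactly once but is rejected by the CM, since sub1 assumes fly/v while obj3 assumes fly/n. Brute-force enumeration is exponential and only suits toy forests.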

Page 22

Features of the Dependency Forest

- Mapping is assured (phrase structure tree ⇔ dependency tree) → usable for the multilevel packed shared data connection model
- High flexibility in describing constraints, e.g. non-projective dependency structures*1

*1: a dependency structure that violates at least one of the following projectivity conditions: "no crossing dependency exists" and "no dependency covers the top node"
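The two projectivity conditions in footnote *1 can be checked directly on a head array. A minimal sketch, assuming words are numbered left to right, heads[i] is the position of word i's head and -1 marks the top node; the first test sentence is the artificial non-projective example used later in this deck, with "which was Persian" treated as a single node.

```python
def is_projective(heads):
    """True if no two arcs cross and no arc covers the top node."""
    spans = [(min(d, h), max(d, h)) for d, h in enumerate(heads) if h >= 0]
    top = heads.index(-1)
    for a, b in spans:
        if a < top < b:  # an arc spans over the top node
            return False
    for a1, b1 in spans:
        for a2, b2 in spans:
            if a1 < a2 < b1 < b2:  # partial overlap means the arcs cross
                return False
    return True


# She(0) saw(1) the(2) cat(3) curiously(4) [which was Persian](5); "saw" is the top node.
non_projective = [1, -1, 3, 1, 1, 3]
projective = [1, -1, 3, 1]  # She saw the cat
print(is_projective(non_projective), is_projective(projective))  # False True
```

In the first sentence the rel arc (cat ← which was Persian, span 3..5) crosses the adv arc (saw ← curiously, span 1..4).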

Page 23

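Generation Flow of Phrase Structure Forest and Dependency Forest

Input sentence → Morphological Analysis (using the Dictionary) → WPP Trellis → Chart Parsing (using the Extended CFG) → Parse Forest → DF Extraction → Initial Dependency Forest → DF Reduction → Dependency Forest → Optimum Solution Search → Dependency Tree

- PDG analysis processes: morphological analysis, chart parsing, DF extraction, DF reduction, optimum solution search
- PDG data structures: WPP trellis, parse forest, initial dependency forest, dependency forest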

Page 24

PDG Grammar Rule: extended CFG rule with phrase head and mapping to dependency structure

- Rewriting rule part: y/Xh → x1/X1, ..., xn/Xn, where each Xi is a variable and the phrase head Xh is one of X1, ..., Xn
- Dependency structure part: [arc(arcname1, Xi, Xj), ..., arc(arcname(n-1), Xk, Xl)], a dependency tree with nodes X1, ..., Xn and top node Xh

Example: vp/V → v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]

For "see a girl in the forest":
- Phrase structure: vp/V(=see/v) → v/V(=see/v) np/NP(=girl/n) pp/PP(=in/pre)
- Dependency structure: NP (= girl/n) depends on V (= see/v) via obj, and PP (= in/pre) depends on V via vpp
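A minimal sketch of how a rule's dependency structure part could be instantiated once the rule variables are bound to the heads of the constituents. The dataclass encoding is an assumption for illustration; the rule and the bindings are the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PDGRule:
    lhs: str                                   # phrase category y
    head_var: str                              # Xh, must be one of the rhs variables
    rhs: Tuple[Tuple[str, str], ...]           # ((category x_i, variable X_i), ...)
    arcs: Tuple[Tuple[str, str, str], ...]     # ((arc name, dependent var, head var), ...)


def instantiate(rule: PDGRule, bindings: Dict[str, str]) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Bind rule variables to WPPs; return the phrase head and the dependency arcs."""
    variables = [var for _, var in rule.rhs]
    if rule.head_var not in variables:
        raise ValueError("the phrase head Xh must be one of X1..Xn")
    if len(rule.arcs) != len(variables) - 1:
        raise ValueError("a dependency tree over n nodes needs n-1 arcs")
    arcs = [(name, bindings[dep], bindings[head]) for name, dep, head in rule.arcs]
    return bindings[rule.head_var], arcs


# vp/V -> v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
vp_rule = PDGRule("vp", "V", (("v", "V"), ("np", "NP"), ("pp", "PP")),
                  (("obj", "NP", "V"), ("vpp", "PP", "V")))
head, arcs = instantiate(vp_rule, {"V": "see/v", "NP": "girl/n", "PP": "in/pre"})
print(head)  # see/v
print(arcs)  # [('obj', 'girl/n', 'see/v'), ('vpp', 'in/pre', 'see/v')]
```

The n-1 check mirrors the arcname1 … arcname(n-1) pattern: a rule with n right-hand-side nodes contributes a tree over exactly those nodes.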

Page 25

Standard Chart Parsing: Structure of Standard Edge

Input: "a cat chases …" (input positions 0 1 2 3)

- Lexical edges: <0,1, det → [a] ・>, <1,2, n → [cat] ・>, <2,3, v → [chase] ・>
- Inactive edge: <0,2, np → det noun ・>
- Active edge: <0,2, s → np ・ vp pp>

Edge structure, e.g. <0,2, s → np ・ vp pp>: start position (0), end position (2), head category (s), found constituents (np, left of ・) and remaining constituents (vp pp, right of ・).
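The edge notation <start, end, category → found ・ remaining> maps onto a small immutable record. A minimal sketch of the fundamental rule of chart parsing, which extends an active edge with an adjacent inactive edge whose category is the next remaining constituent; the vp edge over "chases" is a hypothetical input.

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    start: int                   # start position
    end: int                     # end position
    category: str                # head category (left-hand side of the rule)
    found: Tuple[str, ...]       # constituents found so far (left of ・)
    remaining: Tuple[str, ...]   # constituents still needed (right of ・)

    def is_active(self) -> bool:
        return len(self.remaining) > 0


def fundamental_rule(active: Edge, inactive: Edge) -> Optional[Edge]:
    """Extend an active edge with an adjacent inactive edge of the next needed category."""
    if (active.is_active() and not inactive.is_active()
            and active.end == inactive.start
            and active.remaining[0] == inactive.category):
        return Edge(active.start, inactive.end, active.category,
                    active.found + (inactive.category,), active.remaining[1:])
    return None


s_edge = Edge(0, 2, "s", ("np",), ("vp", "pp"))  # <0,2, s → np ・ vp pp>
vp_edge = Edge(2, 3, "vp", ("v",), ())           # hypothetical inactive vp over "chases"
print(fundamental_rule(s_edge, vp_edge))
# Edge(start=0, end=3, category='s', found=('np', 'vp'), remaining=('pp',))
```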

Page 26

Structure of PDG Edge

Input: "a cat chases" (positions 0 1 2 3)

Two extensions to the standard edge structure:
(1) Mapping to dependency structure: PDG single edge = standard edge + phrase head + dependency structure (tree)
(2) Packing of inactive edges: a PDG (packed) edge is a set of sharable PDG single edges

Examples:
<0,1, det → [a] ・ : [a-det-0]>
<1,2, n → [cat] ・ : [cat-n-1]>
<2,3, v → [chase] ・ : [chase-v-2]>
<0,2, np/[cat-n-1] → det/[a-det-0] noun/[cat-n-1] ・ : arc(det,[a-det-0],[cat-n-1])>
<0,2, s/V → np/[cat-n-1] ・ vp/V pp/PP : [arc(sub,[cat-n-1],V), arc(vpp,PP,V)]>
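A minimal sketch of the two extensions, assuming (hypothetically) that inactive single edges are packed when they share span, category and phrase head, so that they are interchangeable inside any larger constituent. The packing key and the example edges for "She saw a cat in the forest" are assumptions, not taken from the slide.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Arc = Tuple[str, str, str]  # (label, dependent WPP, head WPP)


class SingleEdge(NamedTuple):
    """PDG single edge = standard (inactive) edge + phrase head + dependency structure."""
    start: int
    end: int
    category: str
    head: str                      # phrase head WPP, e.g. "saw-v-1"
    children: Tuple[str, ...]      # categories of the constituents
    arcs: Tuple[Arc, ...]          # dependency structure introduced by the rule


def pack(edges: List[SingleEdge]) -> Dict[tuple, List[SingleEdge]]:
    """Group single edges into packed edges keyed by (start, end, category, head)."""
    packed: Dict[tuple, List[SingleEdge]] = defaultdict(list)
    for e in edges:
        packed[(e.start, e.end, e.category, e.head)].append(e)
    return dict(packed)


# "She saw a cat in the forest" (words 0-6): two vp analyses over 1..7 share the head saw.
edges = [
    SingleEdge(1, 7, "vp", "saw-v-1", ("vp", "pp"), (("vpp", "in-pre-4", "saw-v-1"),)),
    SingleEdge(1, 7, "vp", "saw-v-1", ("v", "np"), (("obj", "cat-n-3", "saw-v-1"),)),
    SingleEdge(2, 7, "np", "cat-n-3", ("np", "pp"), (("npp", "in-pre-4", "cat-n-3"),)),
]
packed = pack(edges)
for key, alternatives in packed.items():
    print(key, len(alternatives))
# (1, 7, 'vp', 'saw-v-1') 2
# (2, 7, 'np', 'cat-n-3') 1
```

Including the phrase head in the key keeps the dependency mapping exact: alternatives packed together always attach to the same head word.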

Page 27

Generation of Phrase Structure Forest and Initial Dependency Forest

- Bottom-up chart parser using the agenda
- Terminates when the agenda becomes empty (φ)

Chart contents (examples):
- Active edges: <E12 s2 → … ・ …>, …
- Inactive edges:
  <Eroot root → [s1 s2] [ds1 ds2]>
  <E1 s1 → [[np1 vp1]] [ds11]>
  <E2 s2 → …>
  <E3 np1 → [[det1 n1]] : [ds31]>
  <E4 vp1 → [[v1 np2] [v1 np3 pp1]] : [ds41 ds42]>
  <E52 np2 → … ・>, …

Phrase structure forest: the set of inactive edges reachable from the root edge.

Initial dependency graph: the set of arcs in the PS forest, e.g.
arc(root-17,[like]-v-2,[root]-x), arc(root-24,[flies]-v-1,[root]-x), arc(root-27,[time]-v-0,[root]-x), arc(sub-16,[flies]-n-1,[like]-v-2), arc(nc-4,[time]-n-0,[flies]-n-1), arc(obj-14,[arrow]-n-4,[like]-v-2), …

Initial co-occurrence matrix, set by conditions CM1 to CM3. For example, for <E1 s1 → [[np1 vp1 pp1]] [Arc1, Arc2]>, whose constituents <E3 np1 → …> and <E4 vp1 → …> govern Arc3, … and Arc8, Arc9, …:
- CM1: between arcs in the DS
- CM2: between arcs in the DS and arcs governed by the constituents
- CM3: between arcs governed by different constituents

[Figure: co-occurrence matrix over arcs A1, A2, A3, A4, A8 and A9, with ○ marking the pairs set by CM1 to CM3.]
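A minimal sketch of setting the initial co-occurrence matrix with CM1 to CM3 while walking a packed forest from the root edge. The forest encoding (node → alternatives, each a pair of rule arcs and child nodes) and the small "Time flies" forest are assumptions for illustration.

```python
from itertools import combinations, product

# Hypothetical packed forest for "Time flies":
# node -> list of alternatives, each (arcs introduced by the rule's DS part, child nodes).
FOREST = {
    "root": [(["rt-fly"], ["s1"]), (["rt-time"], ["s2"])],  # two packed root analyses
    "s1": [(["sub"], ["np_time", "vp_fly"])],               # s -> np vp (time/n, fly/v)
    "s2": [([], ["vp_time"])],                              # s -> vp    (imperative)
    "vp_time": [(["obj"], ["v_time", "np_fly"])],           # vp -> v np (time/v, fly/n)
    "np_time": [([], [])], "vp_fly": [([], [])],
    "v_time": [([], [])], "np_fly": [([], [])],
}


def build_cm(forest, root="root"):
    """Initial co-occurrence matrix (set of unordered arc pairs) via CM1 to CM3."""
    memo = {}

    def governed(node):
        # All arcs under `node`, over every packed alternative.
        if node not in memo:
            arcs = set()
            for ds, children in forest[node]:
                arcs.update(ds)
                for child in children:
                    arcs |= governed(child)
            memo[node] = arcs
        return memo[node]

    cm, seen, stack = set(), set(), [root]
    while stack:  # visit every edge reachable from the root edge
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for ds, children in forest[node]:
            cm.update(frozenset(p) for p in combinations(ds, 2))               # CM1
            for child in children:                                             # CM2
                cm.update(frozenset((a, b)) for a in ds for b in governed(child))
            for c1, c2 in combinations(children, 2):                           # CM3
                cm.update(frozenset(p) for p in product(governed(c1), governed(c2)))
            stack.extend(children)
    return cm


print(build_cm(FOREST))
```

Because each alternative only pairs its own arcs with its own constituents, sub and obj (from different root alternatives) never become co-occurring. CM3 only adds pairs when two constituents of the same rule both govern arcs, which this tiny forest does not exercise.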

Page 28

Generation of Phrase Structure Forest and Initial Dependency Forest

[Figure: state of the chart after parsing "Time flies like an arrow": the agenda is empty (φ); active edges such as <E12 s2 → … ・ …> and inactive edges such as <E52 np2 → … ・>, <Er root → [s] ・> and <E2 s2 → … ・> make up the phrase structure forest. The initial dependency graph (the set of arcs in the PS forest) and the initial co-occurrence matrix (set by conditions CM1 to CM3 over arcs 2, 24, 4, 25, 23, 19, 18, 20, 14, 16, 15, 31, 29 and 32; ○ marks co-occurring pairs) together form the initial dependency forest over the WPPs [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det] and [4,arrow,n], linked to numbered PS-forest edges (np, vp, pp, s and root).]

Page 29

Reduction of the Initial Dependency Forest

- Equivalent arcs (same label, dependent and head) can be generated from two grammar rules:
  vp/V → v/V, np/NP : [arc(obj,NP,V)]
  vp/V → v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
- Reduction: more than one equivalent arc is merged into one arc without increasing the number of generalized dependency trees in the dependency forest.

[Figure: dependency graph (arcs nc2, obj4, obj25, vpp18, npp19, sub23, sub24, …) and co-occurrence matrix before reduction (arcs 2, 24, 4, 25, 23, 19, 18, 20, 14, 16, 15, 31, 29, 32) and after reduction (arc 25 no longer present).]
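A minimal sketch of the reduction step, assuming arcs are equivalent when they have the same label, dependent and head, and that the merged arc inherits the union of the co-occurrence pairs of the arcs it replaces. The arc IDs, endpoints and co-occurrence pairs are hypothetical.

```python
def reduce_forest(arcs, cm):
    """Merge equivalent arcs (same label, dependent and head) into one representative.

    arcs: dict arc_id -> (label, dependent, head), in generation order
    cm:   set of frozenset({arc_id, arc_id}) co-occurring pairs
    """
    rep_of_key, rep = {}, {}
    for arc_id, key in arcs.items():  # the first arc with a given key is kept
        rep[arc_id] = rep_of_key.setdefault(key, arc_id)
    reduced_arcs = {a: key for a, key in arcs.items() if rep[a] == a}
    reduced_cm = set()
    for pair in cm:
        a, b = (rep[x] for x in pair)
        if a != b:
            reduced_cm.add(frozenset({a, b}))
    return reduced_arcs, reduced_cm, rep


# Hypothetical arcs for the imperative reading of "Time flies like an arrow":
# obj_a comes from vp/V -> v/V,np/NP and obj_b from vp/V -> v/V,np/NP,pp/PP.
ARCS = {
    "obj_a": ("obj", "1,fly/n", "0,time/v"),
    "npp": ("npp", "2,like/p", "1,fly/n"),
    "obj_b": ("obj", "1,fly/n", "0,time/v"),
    "vpp": ("vpp", "2,like/p", "0,time/v"),
    "rt": ("root", "0,time/v", "root"),
}
CM = {frozenset(p) for p in [("obj_a", "npp"), ("obj_a", "rt"), ("npp", "rt"),
                             ("obj_b", "vpp"), ("obj_b", "rt"), ("vpp", "rt")]}
reduced_arcs, reduced_cm, rep = reduce_forest(ARCS, CM)
print(sorted(reduced_arcs))  # ['npp', 'obj_a', 'rt', 'vpp']
print(len(reduced_cm))       # 5
```

In this example the two trees before reduction, {obj_a, npp, rt} and {obj_b, vpp, rt}, become {obj_a, npp, rt} and {obj_a, vpp, rt}: one arc fewer and still two trees, because npp and vpp both make "like" a dependent and can never co-occur.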

Page 30

Completeness and Soundness of the Dependency Forest

- Completeness: every phrase structure tree in the parse forest has a corresponding dependency tree in the dependency forest.
  ∀PT (phrase structure tree) ∃DT (dependency tree): dep_tree(PT) = DT
- Soundness: every dependency tree in the dependency forest has a corresponding phrase structure tree in the phrase structure forest.
  ∀DT (dependency tree) ∃PT (phrase structure tree): dep_tree(PT) = DT

[Figure: phrase structure forest (PT) and dependency forest (DT); ○ marks trees with a counterpart, × trees without one.]

- 1:N correspondence in general
- The completeness and soundness of the dependency forest are assured [Hirakawa 06].

Page 31

Evaluation of the Dependency Forest Framework

- Analysis of prototypical ambiguous sentences
- 1-to-N and N-to-1 correspondences between phrase structure trees and dependency trees
- Generation of non-projective dependency trees

Page 32

Grammar for Ambiguous Sentences

Grammar rules for typical ambiguities (PP-attachment, coordination, be-verb usage)

=== s / Sentence ===
(R1) s/VP → np/NP, vp/VP : [arc(sub,NP,VP)]   % Declarative sentence
(R2) s/VP → vp/VP : []   % Imperative sentence

=== np / Noun phrase ===
(R3) np/N → n/N : []   % Single noun
(R4) np/N2 → n/N1, n/N2 : [arc(nc,N1,N2)]   % Compound noun
(R5) np/N → det/DET, n/N : [arc(det,DET,N)]   % Determiner + noun
(R6) np/NP → np/NP, pp/PP : [arc(npp,PP,NP)]   % Prepositional phrase attachment
(R7) np/N → ving/V, n/N : [arc(adjs,V,N)]   % Adjectival usage (subject)
(R8) np/N → ving/V, n/N : [arc(adjo,V,N)]   % Adjectival usage (object)
(R9) np/V → ving/V, np/NP : [arc(obj,NP,V)]   % Gerund phrase
(R10) np/V → ving/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]   % Gerund phrase with PP
(R11) np/NP → np/NP0, and/AND, np/NP : [arc(and,NP0,NP), arc(cnj,AND,NP0)]   % Coordination (and)
(R12) np/NP → np/NP0, or/OR, np/NP : [arc(or,NP0,NP), arc(cnj,OR,NP0)]   % Coordination (or)

=== vp / Verb phrase ===
(R13) vp/V → v/V : []   % Intransitive verb
(R14) vp/V → v/V, np/NP : [arc(obj,NP,V)]   % Transitive verb
(R15) vp/V → be/BE, ving/V, np/NP : [arc(obj,NP,V), arc(prg,BE,V)]   % Progressive
(R16) vp/BE → be/BE, np/NP : [arc(dsc,NP,BE)]   % Copular
(R17) vp/VP → vp/VP, pp/PP : [arc(vpp,PP,VP)]   % PP-attachment
(R18) vp/VP → adv/ADV, vp/VP : [arc(adv,ADV,VP)]   % Adverb modification
(R19) vp/V → v/V, np/NP, adv/ADV, relc/RELP : [arc(obj,NP,V), arc(adv,ADV,V), arc(rel,RELP,NP)]   % Non-projective pattern

=== pp / Prepositional phrase ===
(R20) pp/P → pre/P, np/NP : [arc(pre,NP,P)]

Page 33

PP-attachment Ambiguity

Input sentence: I saw a girl with a telescope in the forest.

Five well-formed dependency trees.

Nodes: 0,I: [i]-n-0; 1,saw: [saw]-v-1; 2,a: [a]-det-2; 3,girl: [girl]-n-3; 4,with: [with]-pre-4; 5,a: [a]-det-5; 6,telescope: [telescope]-n-6; 7,in: [in]-pre-7; 8,the: [the]-det-8; 9,forest: [forest]-n-9; root: [root]-x-root

[Figure: dependency forest with arcs sub33,20; obj6,20; det4,0; det11,0; det42,0; npp14,10; pre12,10; pre24,10; vpp16,15; vpp27,5; npp29,5; npp26,5; root23,0, and the co-occurrence matrix over arcs 33, 4, 6, 14, 16, 11, 12, 29, 26, 27, 23, 24 and 42 (○ marks co-occurring pairs); the missing entries are annotated "Crossing" and "Single role".]
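The count of five follows from the attachment choices of the two prepositions under the no-crossing condition. A minimal sketch, assuming "with" may attach to saw (1) or girl (3) and "in" to saw (1), girl (3) or telescope (6), and that all other arcs are fixed:

```python
from itertools import product

WITH, IN = 4, 7          # word positions of the prepositions
WITH_HEADS = [1, 3]      # saw, girl
IN_HEADS = [1, 3, 6]     # saw, girl, telescope


def crosses(arc1, arc2):
    """True if two arcs, given as (position, position), partially overlap."""
    (a1, b1), (a2, b2) = sorted(arc1), sorted(arc2)
    return a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1


readings = [(hw, hi) for hw, hi in product(WITH_HEADS, IN_HEADS)
            if not crosses((WITH, hw), (IN, hi))]
print(len(readings), readings)  # 5 [(1, 1), (1, 6), (3, 1), (3, 3), (3, 6)]
```

Of the six combinations, only "with → saw" plus "in → girl" is excluded, because the arc 1..4 crosses the arc 3..7; this corresponds to the "Crossing" exclusions in the co-occurrence matrix.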

Page 34

Coordination Scope Ambiguity

Input sentence: Earth and Moon or Jupiter and Ganymede.

Five well-formed dependency trees.

Nodes: 0,earth: [earth]-n-0; 1,and: [and]-and-1; 2,moon: [moon]-n-2; 3,or: [or]-or-3; 4,jupiter: [jupiter]-n-4; 5,and: [and]-and-5; 6,ganymede: [ganymede]-n-6; root: [root]-x-root

[Figure: dependency forest with arcs and12,10; and25,20; and18,12; and14,5; or9,4; or22,3; cnj2,0; cnj6,0; cnj14,0; root26,0, and the co-occurrence matrix over arcs 25, 12, 4, 2, 22, 9, 6, 18, 14 and 26 (○ marks co-occurring pairs); the missing entries are annotated "Crossing" and "Single role".]
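Five is also the number of binary bracketings of four conjuncts (the Catalan number C3), which is what the recursive coordination rules R11 and R12 can build over "Earth and Moon or Jupiter and Ganymede". A minimal sketch that enumerates the bracketings, ignoring which conjunction joins each pair:

```python
def bracketings(items):
    """All binary bracketings of a sequence of conjuncts."""
    if len(items) == 1:
        return [items[0]]
    result = []
    for k in range(1, len(items)):
        for left in bracketings(items[:k]):
            for right in bracketings(items[k:]):
                result.append((left, right))
    return result


conjuncts = ["Earth", "Moon", "Jupiter", "Ganymede"]
trees = bracketings(conjuncts)
print(len(trees))  # 5
for t in trees:
    print(t)
```

Because R11 and R12 make the last conjunct the head, each bracketing attaches the and/or arcs to different heads, so the five bracketings give five distinct dependency trees.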

Page 35

Structural Interpretation Ambiguity and PP-attachment Ambiguity

Input sentence: My hobby is watching birds with telescope

Ten well-formed dependency trees.

Nodes: 0,my: [my]-det-0; 1,hobby: [hobby]-n-1; 2,is: [is]-be-2; 3,watching: [watching]-ving-3; 4,birds: [birds]-n-4; 5,with: [with]-pre-5; 6,telescope: [telescope]-n-6; root: [root]-x-root

[Figure: dependency forest with arcs det1,0; sub35,1; sub38,10; sub5,5; prg2,10; adj4,12; dsc33,8; dsc36,10; obj6,15; npp23,5; npp27,3; vpp24,7; pre22,0; root44,0; root41,0, and the co-occurrence matrix over arcs 1, 38, 35, 2, 4, 33, 36, 6, 5, 23, 27, 24, 22, 44 and 41 (○ marks co-occurring pairs).]

Page 36

N to 1 Correspondence from PSTs to One DT (1)

Spurious ambiguity (Eisner96), (Noro05)

(R17) vp/VP → vp/VP, pp/PP : [arc(vpp,PP,VP)]   % PP-attachment
(R18) vp/VP → adv/ADV, vp/VP : [arc(adv,ADV,VP)]   % Adverb modification

Example: She curiously saw a cat in the forest
- Rule application R17 → R18: [s She [vp curiously [vp [vp saw a cat] [pp in the forest]]]]
- Rule application R18 → R17: [s She [vp [vp curiously [vp saw a cat]] [pp in the forest]]]

Both phrase structure trees map to the same dependency tree, in which "curiously" (adv) and "in the forest" (vpp) both depend on "saw".

Page 37

N to 1 Correspondence from PSTs to One DT (2)

Modification scope problem (Mel'čuk88): a dependency structure has modification-scope ambiguities when a head word has dependents on both its right-hand and left-hand sides, e.g. "Earth and Jupiter in Solar System."

- The phrase structure trees [np [np Earth] and [np [np Jupiter] [pp in Solar System]]] and [np [np [np Earth] and [np Jupiter]] [pp in Solar System]] correspond to one dependency tree.

Nodes: 0,Earth; 1,and; 2,Jupiter; 3,in; 4,Solar System; root
Arcs: cnj2,0; and4,20; pre7,0; npp8,0; root12,0

Co-occurrence matrix (all pairs co-occur):

      4   2   8   7   12
  4   -   ○   ○   ○   ○
  2   ○   -   ○   ○   ○
  8   ○   ○   -   ○   ○
  7   ○   ○   ○   -   ○
  12  ○   ○   ○   ○   -

- Introduction of "grouping" (coordination and operator words, e.g. not, only) [Mel'čuk88]
- Japanese has no modification scope problem because it has no right-to-left dependency.
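Both N-to-1 cases can be reproduced by reading arcs off phrase structure trees with the grammar rules R6, R11, R17 and R18. A minimal sketch, assuming a PST is written as nested tuples tagged with the rule applied at each node and that each leaf stands for the head word of a constituent (e.g. "in" for "in the forest"):

```python
# Each rule maps its children's head words to (phrase head, arcs); arcs are (label, dependent, head).
RULES = {
    "R6": lambda np, pp: (np, [("npp", pp, np)]),                               # np/NP -> np/NP, pp/PP
    "R11": lambda np0, conj, np: (np, [("and", np0, np), ("cnj", conj, np0)]),  # np/NP -> np/NP0, and/AND, np/NP
    "R17": lambda vp, pp: (vp, [("vpp", pp, vp)]),                              # vp/VP -> vp/VP, pp/PP
    "R18": lambda adv, vp: (vp, [("adv", adv, vp)]),                            # vp/VP -> adv/ADV, vp/VP
}


def dep_tree(node, arcs):
    """Return the head word of `node`, collecting arcs bottom-up. Leaves are strings."""
    if isinstance(node, str):
        return node
    rule, *children = node
    head, new_arcs = RULES[rule](*(dep_tree(child, arcs) for child in children))
    arcs.extend(new_arcs)
    return head


def arcs_of(pst):
    arcs = []
    dep_tree(pst, arcs)
    return frozenset(arcs)


# Spurious ambiguity: "curiously saw a cat in the forest" ("saw" heads "saw a cat").
pst_r17_first = ("R18", "curiously", ("R17", "saw", "in"))
pst_r18_first = ("R17", ("R18", "curiously", "saw"), "in")
# Modification scope: "Earth and Jupiter in Solar System".
pst_narrow = ("R11", "Earth", "and", ("R6", "Jupiter", "in"))
pst_wide = ("R6", ("R11", "Earth", "and", "Jupiter"), "in")

print(arcs_of(pst_r17_first) == arcs_of(pst_r18_first))  # True
print(arcs_of(pst_narrow) == arcs_of(pst_wide))          # True
```

In the second pair, R11 makes "Jupiter" the head of "Earth and Jupiter", so the PP attaches to "Jupiter" whichever bracketing is chosen; the dependency tree alone cannot tell the two scopes apart, which is the motivation for grouping.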

Page 38

Generation of Non-projective Dependency Tree

Grammar rule for a non-projective dependency tree:
(R19) vp/V → v/V, np/NP, adv/ADV, relc/RELP : [arc(obj,NP,V), arc(adv,ADV,V), arc(rel,RELP,NP)]

Input sentence: She saw the cat curiously which was Persian*1

Nodes: 0,She; 1,saw; 2,the; 3,cat; 4,curiously; 5,which was Persian; root
Arcs: sub12,20; det4,0; obj6,20; adv10,15; rel11,10; root14,0

Co-occurrence matrix (all pairs co-occur):

      12  4   6   10  11  14
  12  -   ○   ○   ○   ○   ○
  4   ○   -   ○   ○   ○   ○
  6   ○   ○   -   ○   ○   ○
  10  ○   ○   ○   -   ○   ○
  11  ○   ○   ○   ○   -   ○
  14  ○   ○   ○   ○   ○   -

*1: Artificial example for showing the rule applicability

Page 39

Conclusion

- The dependency forest is a packed shared data structure:
  - a bridge between phrase structure and dependency structure, usable for the Multilevel Packed Shared Data Connection Model of PDG
  - high flexibility in describing constraints

Future work
- Extension of the framework for the modification scope problem (grouping)
- Real-world system implementation