Phrase-based (the latter half)
![Page 1: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/1.jpg)
Graph Structure
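• The search space of the phrase-based model is represented as a search graph
• It is a weighted directed acyclic graph $G = \langle \Phi, V, E, s, g, A \rangle$:
・ Φ: set of phrase pairs; the weight of a phrase pair is the inner product of the feature vector $\mathbf{h}(\cdot)$ and the weight vector $\mathbf{w}$
・ V: vertices (partial hypotheses)
・ $E \subseteq V \times V \times \Phi \times A$: edges; each edge carries a phrase pair and a weight, and the edge weights along a route give the route weight
・ A: set of weights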
![Page 2: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/2.jpg)
Graph Structure
• out(v): set of edges that leave vertex v
• in(v): set of edges that enter vertex v
-> Each phrase pair is an edge linking two vertices through out() and in(). In Fig. 5.8, the phrase pair ⟨へ行った, I went to⟩ belongs to out(v) for v = ⟨-----, 0, <s>⟩ and to in(v′) for v′ = ⟨--・・・, 9, went to⟩
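The graph structure above can be sketched as follows. This is a minimal illustration, assuming edges are stored as (tail, head, phrase pair, feature vector) records and that the edge weight is the inner product $\mathbf{w} \cdot \mathbf{h}$; the class and function names, vertex names, and feature values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    tail: str        # vertex the edge leaves (partial hypothesis before)
    head: str        # vertex the edge enters (partial hypothesis after)
    phrase: tuple    # phrase pair (source phrase, target phrase)
    features: tuple  # feature vector h(phrase pair)


def edge_weight(edge, w):
    """Edge weight = inner product of the weight vector w and h(edge)."""
    return sum(wi * hi for wi, hi in zip(w, edge.features))


class SearchGraph:
    """Hypothetical container indexing edges by out(v) and in(v)."""

    def __init__(self, start, goal, edges):
        self.start, self.goal = start, goal
        self._out = defaultdict(list)
        self._in = defaultdict(list)
        for e in edges:
            self._out[e.tail].append(e)
            self._in[e.head].append(e)

    def out(self, v):
        return self._out.get(v, [])

    def in_(self, v):  # "in" is a Python keyword
        return self._in.get(v, [])


# Hypothetical features and weights for the phrase pair <へ行った, I went to>
h = (-1.2, -0.7, 1.0)
w = (1.0, 0.5, -0.3)
e = Edge("v0", "v1", ("へ行った", "I went to"), h)
g = SearchGraph("v0", "v1", [e])
print(edge_weight(e, w))
```

The edge is reachable both as a member of out("v0") and of in_("v1"), which is how a phrase pair links two partial hypotheses.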
![Page 3: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/3.jpg)
Graph Structure
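• If $\Psi = (e_1, e_2, \dots, e_n)$ is a route from the start vertex to any vertex, with $\mathrm{head}(e_i) = \mathrm{tail}(e_{i+1})$, and $\langle \bar{f}(e), \bar{e}(e) \rangle$ denotes the phrase pair on edge e, then
・ Source-language phrase set: $\{\bar{f}(e_1), \dots, \bar{f}(e_n)\}$
・ Target-language phrases (concatenated in route order): $\bar{e}(e_1)\,\bar{e}(e_2) \cdots \bar{e}(e_n)$
・ Route weight: $w(\Psi) = w(e_1) \otimes w(e_2) \otimes \cdots \otimes w(e_n) = \bigotimes_{i=1}^{n} w(e_i)$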
![Page 4: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/4.jpg)
Graph Structure
• In Fig. 5.8, for the route
Start -> ⟨行った, He went⟩ -> ⟨へ, to⟩ -> ⟨領事館, the consulate⟩
-> the source-language phrases 「行った」「へ」「領事館」 are translated in parallel as "He went to the consulate"
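A route and its three derived quantities (source phrase set, target string, route weight) can be sketched for the Fig. 5.8 example. This assumes edges are plain dicts and that weights are log-domain scores combined with ⊗ = +; the vertex names and weights are hypothetical.

```python
import operator
from functools import reduce


def is_route(edges):
    """Consecutive edges must connect: head(e_i) == tail(e_{i+1})."""
    return all(a["head"] == b["tail"] for a, b in zip(edges, edges[1:]))


def source_phrases(route):
    return [e["src"] for e in route]


def target_string(route):
    # Target phrases concatenated in route order give the translation
    return " ".join(e["tgt"] for e in route)


def route_weight(route, otimes, one):
    """w(Psi) = w(e_1) ⊗ ... ⊗ w(e_n), starting from the ⊗-identity."""
    return reduce(otimes, (e["w"] for e in route), one)


route = [
    {"tail": "start", "head": "v1", "src": "行った", "tgt": "He went", "w": -1.0},
    {"tail": "v1", "head": "v2", "src": "へ", "tgt": "to", "w": -0.5},
    {"tail": "v2", "head": "v3", "src": "領事館", "tgt": "the consulate", "w": -2.0},
]

print(is_route(route))                          # True
print(source_phrases(route))                    # ['行った', 'へ', '領事館']
print(target_string(route))                     # He went to the consulate
print(route_weight(route, operator.add, 0.0))   # -3.5
```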
![Page 5: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/5.jpg)
Semiring
• A set R equipped with two binary operations: addition "+" and multiplication "×"
• Associative: a+(b+c)=(a+b)+c, a×(b×c)=(a×b)×c
• Commutative (addition): a+b=b+a
• Distributive: a×(b+c)=(a×b)+(a×c), (b+c)×a=(b×a)+(c×a)
• Identity elements: 0+a=a+0=a; 1×a=a×1=a; and 0 annihilates: 0×a=a×0=0
• Additive inverses and multiplicative inverses are not defined
![Page 6: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/6.jpg)
Semiring
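• In Table 5.1, the tropical semiring is used to solve the maximization of the route weight in the decoder

| Semiring | A | ⊕ | ⊗ | $\bar{0}$ | $\bar{1}$ |
|---|---|---|---|---|---|
| Tropical | ℝ ∪ {−∞} | max | + | −∞ | 0 |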
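The semiring axioms and the tropical instance from Table 5.1 can be checked mechanically. This is a sketch with a hypothetical `Semiring` record and a spot check on a few sample values (dyadic fractions, so floating-point addition is exact); it is a sanity check, not a proof.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # ⊕
    times: Callable[[float, float], float]  # ⊗
    zero: float                             # ⊕-identity, annihilates under ⊗
    one: float                              # ⊗-identity


TROPICAL = Semiring(plus=max, times=lambda a, b: a + b, zero=-math.inf, one=0.0)


def check_axioms(K, samples):
    """Raise AssertionError if any semiring axiom fails on the samples."""
    for a, b, c in itertools.product(samples, repeat=3):
        assert K.plus(a, K.plus(b, c)) == K.plus(K.plus(a, b), c)
        assert K.times(a, K.times(b, c)) == K.times(K.times(a, b), c)
        assert K.plus(a, b) == K.plus(b, a)
        # Left and right distributivity
        assert K.times(a, K.plus(b, c)) == K.plus(K.times(a, b), K.times(a, c))
        assert K.times(K.plus(b, c), a) == K.plus(K.times(b, a), K.times(c, a))
    for a in samples:
        assert K.plus(K.zero, a) == a == K.plus(a, K.zero)
        assert K.times(K.one, a) == a == K.times(a, K.one)
        assert K.times(K.zero, a) == K.zero == K.times(a, K.zero)
    return True


SAMPLES = [-math.inf, -2.5, -1.0, 0.0, 0.75]
print(check_axioms(TROPICAL, SAMPLES))  # True
```

No inverses are needed: max has no inverse, which is exactly why the tropical structure is a semiring rather than a ring.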
![Page 7: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/7.jpg)
Semiring
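• In the weighted directed graph G, let $\Psi = (e_1, \dots, e_n)$ be a route from the start vertex s to the goal vertex g for the source-language input f
• The score of Ψ is the ⊗-product of the weights of its edges: $w(\Psi) = \bigotimes_{i=1}^{n} w(e_i)$
-> The problem of maximizing this score is $\max_{\Psi} w(\Psi) = \bigoplus_{\Psi \in \mathcal{P}(s, g)} \bigotimes_{e \in \Psi} w(e)$, where $\mathcal{P}(s, g)$ is the set of routes from s to g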
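The maximization problem above can be stated directly as a brute-force search over all routes. This sketch assumes a tiny hypothetical DAG given as an adjacency dict of (head, weight) pairs; it enumerates routes explicitly, so its cost grows exponentially, which motivates the forward algorithm.

```python
import math


def all_routes(out, v, goal):
    """Yield every route from v to goal as a list of (head, weight) edges."""
    if v == goal:
        yield []
        return
    for head, w in out.get(v, []):
        for rest in all_routes(out, head, goal):
            yield [(head, w)] + rest


def best_route_weight(out, start, goal):
    """⊕ (max) over routes of the ⊗-product (+) of edge weights."""
    best = -math.inf  # tropical 0̄
    for route in all_routes(out, start, goal):
        best = max(best, sum(w for _, w in route))  # empty sum = 0 = 1̄
    return best


# Hypothetical graph: s->a->g has weight -2.0, s->b->g has weight -2.5
out = {"s": [("a", -1.0), ("b", -0.5)], "a": [("g", -1.0)], "b": [("g", -2.0)]}
print(best_route_weight(out, "s", "g"))  # -2.0
```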
![Page 8: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/8.jpg)
Semiring
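• In Fig. 5.7, line 11, the max taken over Q(…) implements the additive operation ⊕ at each vertex of G
• Because the semiring satisfies distributivity, the weight α(v) of any vertex v ∈ V can be computed from the weights of its predecessors:
$\alpha(v) = \bigoplus_{\Psi \in \mathcal{P}(s, v)} \bigotimes_{e \in \Psi} w(e) = \bigoplus_{e \in \mathrm{in}(v)} \alpha(\mathrm{tail}(e)) \otimes w(e)$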
![Page 9: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/9.jpg)
Semiring
• Forward-backward algorithm: finds the maximum route weight in the graph structure
• topological order(G): list of the vertices of G arranged in topological order
• External variables: α(v) and β(v), the forward and backward weights of each vertex v
![Page 10: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/10.jpg)
Semiring
FORWARD(G)
• α(Start) = $\bar{1}$
• For each v ∈ topological order(G): $\alpha(v) = \bigoplus_{e \in \mathrm{in}(v)} \alpha(\mathrm{tail}(e)) \otimes w(e)$
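FORWARD(G) can be sketched as below. Distributivity is what makes this work: α(tail(e)) already summarizes every route into tail(e), so each edge is processed once instead of once per route. The sketch assumes edges are (tail, head, weight) triples and defaults to the tropical semiring; it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) for the topological order.

```python
import math
from graphlib import TopologicalSorter


def forward(vertices, edges, start, plus=max, times=lambda a, b: a + b,
            zero=-math.inf, one=0.0):
    """alpha(v) = ⊕_{e in in(v)} alpha(tail(e)) ⊗ w(e), in topological order."""
    preds = {v: set() for v in vertices}
    in_edges = {v: [] for v in vertices}
    for tail, head, w in edges:
        preds[head].add(tail)
        in_edges[head].append((tail, w))
    alpha = {v: zero for v in vertices}
    alpha[start] = one
    # static_order() yields each vertex after all of its predecessors
    for v in TopologicalSorter(preds).static_order():
        for tail, w in in_edges[v]:
            alpha[v] = plus(alpha[v], times(alpha[tail], w))
    return alpha


V = ["s", "a", "b", "g"]
E = [("s", "a", -1.0), ("s", "b", -0.5), ("a", "g", -1.0), ("b", "g", -2.0)]
print(forward(V, E, "s")["g"])  # -2.0, same as exhaustive enumeration
```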
![Page 11: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/11.jpg)
Semiring
BACKWARD(G)
• β(Goal) = $\bar{1}$
• For each v ∈ inverse topological order(G): $\beta(v) = \bigoplus_{e \in \mathrm{out}(v)} w(e) \otimes \beta(\mathrm{head}(e))$
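BACKWARD(G) mirrors the forward pass over out-edges. In this sketch (tropical semiring by default, edges as (tail, head, weight) triples) the inverse topological order is obtained by giving `graphlib.TopologicalSorter` each vertex's successors in place of its predecessors.

```python
import math
from graphlib import TopologicalSorter


def backward(vertices, edges, goal, plus=max, times=lambda a, b: a + b,
             zero=-math.inf, one=0.0):
    """beta(v) = ⊕_{e in out(v)} w(e) ⊗ beta(head(e)), in inverse topological order."""
    succs = {v: set() for v in vertices}
    out_edges = {v: [] for v in vertices}
    for tail, head, w in edges:
        succs[tail].add(head)
        out_edges[tail].append((head, w))
    beta = {v: zero for v in vertices}
    beta[goal] = one
    # Treating successors as "predecessors" yields the inverse topological order
    for v in TopologicalSorter(succs).static_order():
        for head, w in out_edges[v]:
            beta[v] = plus(beta[v], times(w, beta[head]))
    return beta


V = ["s", "a", "b", "g"]
E = [("s", "a", -1.0), ("s", "b", -0.5), ("a", "g", -1.0), ("b", "g", -2.0)]
print(backward(V, E, "g")["s"])  # -2.0
```

β(s) equals α(g) from the forward pass, and α(v) ⊗ β(v) gives the best weight of any whole route through v.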
![Page 12: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/12.jpg)
Semiring
For the problem of choosing the optimal translation from the search space expressed as the weighted directed graph G: tropical semiring + forward algorithm -> Viterbi search (the tropical semiring is the log-domain form of the Viterbi semiring)
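To obtain the optimal translation itself, not only its score, the forward pass over the tropical semiring can keep a backpointer per vertex, which is the usual Viterbi construction. In this sketch the edges are hypothetical (tail, head, weight, target phrase) tuples, and the goal is assumed to be reachable.

```python
import math
from graphlib import TopologicalSorter


def viterbi(vertices, edges, start, goal):
    """Forward algorithm with ⊕ = max, ⊗ = +, plus backpointers."""
    preds = {v: set() for v in vertices}
    in_edges = {v: [] for v in vertices}
    for tail, head, w, label in edges:
        preds[head].add(tail)
        in_edges[head].append((tail, w, label))
    alpha = {v: -math.inf for v in vertices}
    alpha[start] = 0.0
    back = {}
    for v in TopologicalSorter(preds).static_order():
        for tail, w, label in in_edges[v]:
            score = alpha[tail] + w
            if score > alpha[v]:
                alpha[v] = score
                back[v] = (tail, label)  # best incoming edge so far
    labels, v = [], goal
    while v != start:  # follow backpointers from goal to start
        v, label = back[v]
        labels.append(label)
    return alpha[goal], labels[::-1]


V = ["s", "a", "b", "c", "g"]
E = [("s", "a", -1.0, "He went"), ("a", "b", -0.5, "to"),
     ("b", "g", -2.0, "the consulate"), ("s", "c", -1.5, "He went to"),
     ("c", "g", -2.5, "the consulate")]
score, phrases = viterbi(V, E, "s", "g")
print(score, " ".join(phrases))  # -3.5 He went to the consulate
```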
![Page 13: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/13.jpg)
k-best
• Besides the single best route found by the forward-backward algorithm, k-best algorithms enumerate the k routes with the highest weights
• Dijkstra's algorithm: solves the single-source shortest path problem
• Eppstein's algorithm: enumerates multiple (k shortest) paths efficiently using heaps
![Page 14: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/14.jpg)
k-best
• Assume the tropical semiring, and that the backward algorithm has already been run
• Calculate the weights and choose the maximum
• Algorithm of Fig. 5.10:
・ cand: priority queue
・ ⟨·, s⟩: partial route at the start vertex s
・ ⟨·, ·⟩: partial route extended by a vertex and an edge e ∈ out()
・ D: set of ⟨·, ·⟩ already taken out of cand
![Page 15: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/15.jpg)
k-best
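• k = 1: initialize cand
• Optimize the weights of partial routes and of whole routes
• Repeat k times:
・ Take the optimal partial route ⟨·, s⟩ out of cand (kept as a heap) and register it in D
・ Extend it with each edge in out() and insert the extensions into cand
[Figure: whole route, D, cand, and the optimal route]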
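The k-best procedure can be sketched as a best-first search over partial routes, prioritized by (weight so far) + β(v). Because β(v) is the exact best completion weight from the backward algorithm, whole routes leave the priority queue in non-increasing weight order. This is an illustration of that idea, not necessarily the exact algorithm of Fig. 5.10; the graph, labels, and β values below are hypothetical (β is what the backward algorithm gives for this graph).

```python
import heapq
import itertools
import math


def k_best(out_edges, beta, start, goal, k):
    """Return up to k (weight, phrases) routes in non-increasing weight.

    out_edges[v]: list of (head, weight, label); beta[v]: exact best
    completion weight from v to goal (from the backward algorithm).
    """
    tie = itertools.count()  # tie-breaker so heap entries always compare
    # cand: max-heap via negated priority = -(weight so far + beta(v))
    cand = [(-beta[start], next(tie), 0.0, start, ())]
    results = []
    while cand and len(results) < k:
        _, _, w, v, labels = heapq.heappop(cand)
        if v == goal:
            results.append((w, list(labels)))
            continue
        for head, ew, label in out_edges.get(v, []):
            if beta[head] == -math.inf:
                continue  # goal unreachable from head
            nw = w + ew
            heapq.heappush(cand, (-(nw + beta[head]), next(tie), nw, head,
                                  labels + (label,)))
    return results


OUT = {"s": [("a", -1.0, "He went"), ("c", -1.5, "He went to")],
       "a": [("b", -0.5, "to")],
       "b": [("g", -2.0, "the consulate")],
       "c": [("g", -2.5, "the consulate")]}
BETA = {"s": -3.5, "a": -2.5, "b": -2.0, "c": -2.5, "g": 0.0}
for weight, phrases in k_best(OUT, BETA, "s", "g", 2):
    print(weight, " ".join(phrases))
```

Each partial route is pushed at most once, so no route is produced twice; a route is emitted only when every partial route still in the queue has a lower or equal best completion.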
![Page 16: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/16.jpg)
Limitation of Search Space
• If the search space is large (any reordering is allowed), the amount of computation in the decoding algorithm becomes massive -> limits are necessary:
・ Distortion limit (constraint)
・ Reordering limit (constraint)
![Page 17: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/17.jpg)
Distortion Constraint
• Set an upper limit d on the distance between consecutively translated phrase pairs
• Purpose: a large distortion makes the distortion penalty large and the model score small, so such hypotheses are excluded
• For language pairs without large reordering, the distortion constraint works efficiently
• If d = 0: no skipping; translate smoothly from left to right -> monotone translation
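A distortion limit check can be sketched as below. It assumes the common distance measure |start_k − end_{k−1} − 1| between the previous phrase's last source position and the next phrase's first position (0-based, inclusive spans); the slide does not spell out the measure, and the spans in the example are hypothetical.

```python
def distortion(prev_end, next_start):
    """Jump between consecutive source phrases; prev_end = -1 before the first phrase."""
    return abs(next_start - prev_end - 1)


def respects_limit(phrase_order, d):
    """phrase_order: source spans (start, end) in the order they are translated."""
    prev_end = -1
    for start, end in phrase_order:
        if distortion(prev_end, start) > d:
            return False
        prev_end = end
    return True


monotone = [(0, 1), (2, 2), (3, 4)]
reordered = [(3, 4), (2, 2), (0, 1)]
print(respects_limit(monotone, 0))   # True: d = 0 allows only monotone order
print(respects_limit(reordered, 2))  # False: the first jump has distance 3
print(respects_limit(reordered, 3))  # True
```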
![Page 18: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/18.jpg)
Distortion Constraint
• Constraint for the case where partial phrases that have not been translated yet remain before the ending point: (…): position of the first source-language phrase; (…): first position of the translated phrase; if (…), add d
・ IBM constraint
[Figure: phrases spanning $start_k$ … $end_k$; positions beyond the limit need not be examined]
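The slide's own condition is not recoverable, so the following sketches the IBM constraint in its commonly cited form, as an assumption: a new phrase must start at one of the first k uncovered source positions (and cover only untranslated words). The coverage vectors and k values are hypothetical.

```python
def ibm_allowed(coverage, start, end, k):
    """coverage[j] is True if source position j is already translated."""
    if any(coverage[start:end + 1]):
        return False  # the phrase overlaps words that are already translated
    uncovered = [j for j, covered in enumerate(coverage) if not covered]
    return start in uncovered[:k]  # must begin at one of the first k gaps


empty = [False] * 5
print(ibm_allowed(empty, 0, 1, 1))  # True: monotone start
print(ibm_allowed(empty, 2, 2, 1))  # False: skips two uncovered positions
print(ibm_allowed(empty, 2, 2, 3))  # True with k = 3
```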
![Page 19: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/19.jpg)
Beam Search
・ To reduce computation, prune unpromising partial hypotheses and keep only those with high scores
・ Group the vertices of the search graph and, within each group, prune partial hypotheses with low scores
![Page 20: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/20.jpg)
Beam Search
[Figure: partial hypotheses in each group, showing which are pruned and which are chosen]
![Page 21: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/21.jpg)
Beam Search
Kinds of grouping and pruning:
- Coverage vector grouping
- Cardinality grouping (by the number of covered source words)
- Beam width pruning
- Histogram pruning
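Grouping and pruning can be combined in one step. This sketch assumes hypotheses are (coverage set, log score) pairs; it groups by cardinality by default (pass `key=lambda cov: cov` for coverage-vector grouping), interprets beam width pruning as dropping hypotheses more than `threshold` below the group's best (an assumption), and then applies histogram pruning to keep at most `beam_size` per group.

```python
import heapq
from collections import defaultdict


def prune(hypotheses, beam_size, threshold, key=len):
    """hypotheses: list of (coverage frozenset, log score); higher is better."""
    groups = defaultdict(list)
    for cov, score in hypotheses:
        groups[key(cov)].append((cov, score))
    kept = []
    for group in groups.values():
        best = max(score for _, score in group)
        # Beam width (threshold) pruning relative to the group's best score
        survivors = [h for h in group if h[1] >= best - threshold]
        # Histogram pruning: keep at most beam_size hypotheses per group
        kept.extend(heapq.nlargest(beam_size, survivors, key=lambda h: h[1]))
    return kept


hyps = [(frozenset({0}), -1.0), (frozenset({1}), -1.5), (frozenset({2}), -4.0),
        (frozenset({0, 1}), -2.0), (frozenset({1, 2}), -2.5),
        (frozenset({0, 2}), -3.0)]
print(sorted(score for _, score in prune(hyps, beam_size=2, threshold=2.0)))
```

Grouping by cardinality compares only hypotheses that have translated the same number of words, which keeps the comparison fairer than one global beam.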
![Page 22: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/22.jpg)
Heuristic Function
• Prevents partial hypotheses whose remaining source words have not been translated yet from being pruned prematurely
• Gives a predicted score for the rest of the route, as in A* search, so that the estimated score of the whole route is maximized
• -> can reduce search errors
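One common way to predict the score of the untranslated part is a future score table over source spans (the Koehn-style future cost estimate, swapped in here as an assumption). The best score of each span is either its best single phrase or the best split into two sub-spans, and a hypothesis's estimate sums the table over its maximal uncovered spans. The phrase scores are hypothetical; since the estimate ignores language model context and distortion, it is not guaranteed to be an admissible A* heuristic.

```python
import math


def future_table(n, phrase_score):
    """phrase_score[(i, j)]: best log score of a phrase covering source i..j."""
    fut = [[-math.inf] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = phrase_score.get((i, j), -math.inf)
            for m in range(i, j):  # best split into i..m and m+1..j
                best = max(best, fut[i][m] + fut[m + 1][j])
            fut[i][j] = best
    return fut


def future_score(coverage, fut):
    """Sum of table entries over the maximal uncovered spans."""
    total, j, n = 0.0, 0, len(coverage)
    while j < n:
        if coverage[j]:
            j += 1
            continue
        k = j
        while k + 1 < n and not coverage[k + 1]:
            k += 1
        total += fut[j][k]
        j = k + 1
    return total


fut = future_table(3, {(0, 0): -1.0, (1, 1): -2.0, (2, 2): -1.5, (0, 1): -2.5})
print(future_score([True, False, False], fut))  # -3.5
print(future_score([False, True, False], fut))  # -2.5
```

A hypothesis is then ranked by its current score plus `future_score(...)` rather than by its current score alone.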
![Page 23: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/23.jpg)
Pre-reordering Method
Translation between languages with significantly different grammatical structures:
• Pre-reordering rules
• Pre-reordering models
• Pre-reordering learning
![Page 24: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/24.jpg)
Pre-reordering Rule
• Based on the tree obtained by syntactic analysis, reorder the source sentence into target-language word order
• Rule based on head-driven phrase structure grammar (HPSG):
- Syntactic analysis
- Move the heads to the end (head finalization)
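A rule of this kind can be sketched as head finalization on a parse tree: at every node, the head child is moved after its siblings, which turns English SVO order into Japanese-like SOV order. The tree and the head indices below are hypothetical stand-ins for the output of an HPSG parser; real rules also handle particles and exceptions, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    head: int = 0    # index of the head child
    word: str = ""   # set on leaves only


def head_finalize(node):
    """Return the leaf words with each node's head child moved to the end."""
    if not node.children:
        return [node.word]
    kids = [head_finalize(child) for child in node.children]
    head = kids.pop(node.head)
    return [w for kid in kids for w in kid] + head


tree = Node("S", [
    Node("NP", [Node("N", word="John")]),
    Node("VP", [
        Node("V", word="ate"),
        Node("NP", [Node("Det", word="an"), Node("N", word="apple")], head=1),
    ], head=0),
], head=1)
print(" ".join(head_finalize(tree)))  # John an apple ate
```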
![Page 25: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/25.jpg)
Pre-reordering Model
• The source language must have a syntactic analysis tool and a morphological analysis tool
• Bilingual data are necessary
• The probabilities of the obtained pre-reordering patterns are estimated by maximum-likelihood estimation (MLE)
• Suitable pre-reordering patterns are chosen based on part-of-speech tags from morphological analysis, or on word classes obtained by clustering
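The MLE step reduces to relative frequencies: the probability of reordering a pattern p into r is count(p, r) / count(p). This sketch assumes patterns are tuples of part-of-speech tags already extracted from aligned bilingual data; the observations are hypothetical.

```python
from collections import Counter


def estimate(observations):
    """observations: (pattern, reordered pattern) pairs from aligned data."""
    pair_counts = Counter(observations)
    pattern_counts = Counter(p for p, _ in observations)
    return {(p, r): c / pattern_counts[p] for (p, r), c in pair_counts.items()}


obs = [(("VB", "NN"), ("NN", "VB"))] * 3 + [(("VB", "NN"), ("VB", "NN"))]
probs = estimate(obs)
print(probs[(("VB", "NN"), ("NN", "VB"))])  # 0.75
```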
![Page 26: Phrase-based(the latter half)](https://reader036.vdocuments.net/reader036/viewer/2022062515/55d18370bb61eb57678b461a/html5/thumbnails/26.jpg)
Pre-reordering Learning
• For language pairs without syntactic analysis tools or morphological analysis tools
• A provisional tree structure is generated automatically in place of syntactic analysis results
• Tree nodes are divided into two labels: the reordering label [X] and the no-reordering label ⟨X⟩
• The reordering model is formulated as a linear ordering problem (LOP); an approximate solution is found and used to build the parse tree
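The linear ordering problem can be illustrated with an exact dynamic program over subsets: given B[i][j], the benefit of placing word i before word j, find the permutation maximizing the sum of B over all ordered pairs. This sketch is exponential in the sentence length (O(2^n · n^2)), which is why practical systems settle for approximate solutions as the slide says; the matrix is hypothetical.

```python
def solve_lop(B):
    """Return (best score, best order) maximizing sum of B[i][j] for i before j."""
    n = len(B)
    best = {0: (0.0, [])}  # subset bitmask of placed items -> (score, order)
    for mask in range(1 << n):  # supersets have larger masks, so come later
        if mask not in best:
            continue
        score, order = best[mask]
        for j in range(n):
            if mask & (1 << j):
                continue
            # Placing j next puts every already-placed item before it
            cand = score + sum(B[i][j] for i in order)
            new = mask | (1 << j)
            if new not in best or cand > best[new][0]:
                best[new] = (cand, order + [j])
    return best[(1 << n) - 1]


B = [[0, 1, 0],
     [3, 0, 0],
     [2, 4, 0]]
print(solve_lop(B))  # (9.0, [2, 1, 0])
```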