coordination structure analysis using dual decomposition · 1 introduction...

Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 430–438,Avignon, France, April 23 - 27 2012. c©2012 Association for Computational Linguistics

Coordination Structure Analysis using Dual Decomposition

Atsushi Hanamoto 1 Takuya Matsuzaki 1

1. Department of Computer Science, University of Tokyo, Japan2. Web Search & Mining Group, Microsoft Research Asia, China{hanamoto, matuzaki}@is.s.u-tokyo.ac.jp

[email protected]

Jun’ichi Tsujii 2

Abstract

Coordination disambiguation remains a dif-ficult sub-problem in parsing despite thefrequency and importance of coordinationstructures. We propose a method for disam-biguating coordination structures. In thismethod, dual decomposition is used as aframework to take advantage of both HPSGparsing and coordinate structure analysiswith alignment-based local features. Weevaluate the performance of the proposedmethod on the Genia corpus and the WallStreet Journal portion of the Penn Tree-bank. Results show it increases the per-centage of sentences in which coordinationstructures are detected correctly, comparedwith each of the two algorithms alone.

1 Introduction

Coordination structures often give syntactic ambi-guity in natural language. Although a wrong anal-ysis of a coordination structure often leads to atotally garbled parsing result, coordination disam-biguation remains a difficult sub-problem in pars-ing, even for state-of-the-art parsers.

One approach to solve this problem is a gram-matical approach. This approach, however, of-ten fails in noun and adjective coordinations be-cause there are many possible structures in thesecoordinations that are grammatically correct. Forexample, a noun sequence of the form “n0 n1

and n2 n3” has as many as five possible struc-tures (Resnik, 1999). Therefore, a grammaticalapproach is not sufficient to disambiguate coor-dination structures. In fact, the Stanford parser(Klein and Manning, 2003) and Enju (Miyao andTsujii, 2004) fail to disambiguate a sentence I am

a freshman advertising and marketing major. Ta-ble 1 shows the output from them and the correctcoordination structure.

The coordination structure above is obvious tohumans because there is a symmetry of conjuncts(-ing) in the sentence. Coordination structures of-ten have such structural and semantic symmetryof conjuncts. One approach is to capture localsymmetry of conjuncts. However, this approachfails in VP and sentential coordinations, whichcan easily be detected by a grammatical approach.This is because conjuncts in these coordinationsdo not necessarily have local symmetry.

It is therefore natural to think that consider-ing both the syntax and local symmetry of con-juncts would lead to a more accurate analysis.However, it is difficult to consider both of themin a dynamic programming algorithm, which hasbeen often used for each of them, because it ex-plodes the computational and implementationalcomplexity. Thus, previous studies on coordina-tion disambiguation often dealt only with a re-stricted form of coordination (e.g. noun phrases)or used a heuristic approach for simplicity.

In this paper, we present a statistical analysismodel for coordination disambiguation that usesthe dual decomposition as a framework. We con-sider both of the syntax, and structural and se-mantic symmetry of conjuncts so that it outper-forms existing methods that consider only eitherof them. Moreover, it is still simple and requiresonly O(n4) time per iteration, where n is the num-ber of words in a sentence. This is equal to thatof coordination structure analysis with alignment-based local features. The overall system still has aquite simple structure because we need just slightmodifications of existing models in this approach,

430

Stanford parser/EnjuI am a ( freshman advertising ) and (marketing major )

Correct coordination structureI am a freshman ( ( advertising and mar-keting ) major )

Table 1: Output from the Stanford parser, Enju and thecorrect coordination structure

so we can easily add other modules or features forfuture.

The structure of this paper is as follows. First,we describe three basic methods required in thetechnique we propose: 1) coordination structureanalysis with alignment-based local features, 2)HPSG parsing, and 3) dual decomposition. Fi-nally, we show experimental results that demon-strate the effectiveness of our approach. We com-pare three methods: coordination structure anal-ysis with alignment-based local features, HPSGparsing, and the dual-decomposition-based ap-proach that combines both.

2 Related Work

Many previous studies for coordination disam-biguation have focused on a particular type of NPcoordination (Hogan, 2007). Resnik (1999) dis-ambiguated coordination structures by using se-mantic similarity of the conjuncts in a taxonomy.He dealt with two kinds of patterns, [n0 n1 andn2 n3] and [n1 and n2 n3], where ni are all nouns.He detected coordination structures based on sim-ilarity of form, meaning and conceptual associa-tion between n1 and n2 and between n1 and n3.Nakov and Hearst (2005) used the Web as a train-ing set and applied it to a task that is similar toResnik’s.

In terms of integrating coordination disam-biguation with an existing parsing model, our ap-proach resembles the approach by Hogan (2007).She detected noun phrase coordinations by find-ing symmetry in conjunct structure and the depen-dency between the lexical heads of the conjuncts.They are used to rerank the n-best outputs of theBikel parser (2004), whereas two models interactwith each other in our method.

Shimbo and Hara (2007) proposed analignment-based method for detecting and dis-ambiguating non-nested coordination structures.

They disambiguated coordination structuresbased on the edit distance between two conjuncts.Hara et al. (2009) extended the method, dealingwith nested coordinations as well. We used theirmethod as one of the two sub-models.

3 Background

3.1 Coordination structure analysis withalignment-based local features

Coordination structure analysis with alignment-based local features (Hara et al., 2009) is a hy-brid approach to coordination disambiguation thatcombines a simple grammar to ensure consistentglobal structure of coordinations in a sentence,and features based on sequence alignment to cap-ture local symmetry of conjuncts. In this section,we describe the method briefly.

A sentence is denoted by x = x1...xk, where xi

is the i-th word of x. A coordination boundariesset is denoted by y = y1...yk, where

yi =

(bl, el, br, er) (if xi is a coordinating

conjunction having leftconjunct xbl

...xeland

right conjunct xbr ...xer)null (otherwise)

In other words, yi has a non-null valueonly when it is a coordinating conjunction.For example, a sentence I bought books andstationary has a coordination boundaries set(null, null, null, (3, 3, 5, 5), null).

The score of a coordination boundaries set isdefined as the sum of score of all coordinatingconjunctions in the sentence.

score(x, y) =

k∑m=1

score(x, ym)

=k∑

m=1

w · f(x, ym) (1)

where f(x, ym) is a real-valued feature vector ofthe coordination conjunct xm. We used almost thesame feature set as Hara et al. (2009): namely, thesurface word, part-of-speech, suffix and prefix ofthe words, and their combinations. We used theaveraged perceptron to tune the weight vector w.

Hara et al. (2009) proposed to use a context-free grammar to find a properly nested coordina-tion structure. That is, the scoring function Eq (1)

431

COORD Coordination.CJT Conjunct.

N Non-coordination.CC Coordinating conjunction like “and”.W Any word.

Table 2: Non-terminals

Rules for coordinations:COORDi,m → CJTi,jCCj+1,k−1CJTk,m

Rules for conjuncts:CJTi,j → (COORD|N)i,j

Rules for non-coordinations:Ni,k → COORDi,jNj+1,k

Ni,j →Wi,i(COORD|N)i+1,j

Ni,i →Wi,i

Rules for pre-terminals:CCi,i → (and|or|but|, |; |+|+/−)i

CCi,i+1 → (, |; )i(and|or|but)i+1

CCi,i+2 → (as)i(well)i+1(as)i+2

Wi,i → ∗i

Table 3: Production rules

is only defined on the coordination structures thatare licensed by the grammar. We only slightly ex-tended their grammar for convering more varietyof coordinating conjunctions.

Table 2 and Table 3 show the non-terminals andproduction rules used in the model. The only ob-jective of the grammar is to ensure the consistencyof two or more coordinations in a sentence, whichmeans for any two coordinations they must be ei-ther non-overlapping or nested coordinations. Weuse a bottom-up chart parsing algorithm to out-put the coordination boundaries with the highestscore. Note that these production rules don’t needto be isomorphic to those of HPSG parsing andactually they aren’t. This is because the two meth-ods interact only through dual decomposition andthe search spaces defined by the methods are con-sidered separately.

This method requires O(n4) time, where n isthe number of words. This is because there areO(n2) possible coordination structures in a sen-tence, and the method requires O(n2) time to geta feature vector of each coordination structure.

3.2 HPSG parsing

HPSG (Pollard and Sag, 1994) is one of thelinguistic theories based on lexicalized grammar

signPHON list of string

SYNSEM

synsem

LOCAL

local

CAT

category

HEADheadMODL synsemMODR synsem

SUBJ list of synsemCOMPS list of synsem

SEM semantics

NONLOCnonlocalREL list of localSLASH list of local

Figure 1: HPSG sign

2SUBJ < >COMPS < >2

HEADSUBJ < >COMPS < >

1


1

HEADSUBJ COMPS < | >

1COMPS < >

HEADSUBJ COMPS

1

23 4

3

42

Figure 2: Subject-Head Schema (left) and Head-Complement Schema (right)

and unbounded dependencies. SEM feature rep-resents the semantics of a constituent, and in thisstudy it expresses a predicate-argument structure.Figure 2 presents the Subject-Head Schema

and the Head-Complement Schema1 defined in(Pollard and Sag, 1994). In order to express gen-eral constraints, schemata only provide sharing offeature values, and no instantiated values.Figure 3 has an example of HPSG parsing

of the sentence “Spring has come.” First, eachof the lexical entries for “has” and “come” areunified with a daughter feature structure of theHead-Complement Schema. Unification providesthe phrasal sign of the mother. The sign of thelarger constituent is obtained by repeatedly apply-ing schemata to lexical/phrasal signs. Finally, thephrasal sign of the entire sentence is output on thetop of the derivation tree.

3 Acquiring HPSG from the PennTreebank

As discussed in Section 1, our grammar devel-opment requires each sentence to be annotatedwith i) a history of rule applications, and ii) ad-ditional annotations to make the grammar rulesbe pseudo-injective. In HPSG, a history of ruleapplications is represented by a tree annotatedwith schema names. Additional annotations are

1The value of category has been presented for simplicity,while the other portions of the sign have been omitted.

Spring

HEAD nounSUBJ < >COMPS < >

HEAD verbSUBJ < >

COMPS < >

5

has

HEAD verb

SUBJ < >

COMPS < >

come

HEAD verbSUBJ < >COMPS < >

5HEAD nounSUBJ < >COMPS < >


1COMPS < >

HEADSUBJ COMPS

1

23 4

3

42

UnifyUnify

Head-complementschema

Lexical entries

Spring

HEAD nounSUBJ < >COMPS < > 2


1

has


1

come

2


1


1

subject-head

head-comp

Figure 3: HPSG parsing

required because HPSG schemata are not injec-tive, i.e., daughters’ signs cannot be uniquely de-termined given the mother. The following annota-tions are at least required. First, the HEAD featureof each non-head daughter must be specified sincethis is not percolated to the mother sign. Second,SLASH/REL features are required as described inour previous study (Miyao et al., 2003a). Finally,the SUBJ feature of the complement daughter inthe Head-Complement Schema must be specifiedsince this schema may subcategorize an unsatu-rated constituent, i.e., a constituent with a non-empty SUBJ feature. When the corpus is anno-tated with at least these features, the lexical en-tries required to explain the sentence are uniquelydetermined. In this study, we define partially-specified derivation trees as tree structures anno-tated with schema names and HPSG signs includ-ing the specifications of the above features.We describe the process of grammar develop-

ment in terms of the four phases: specification,externalization, extraction, and verification.

3.1 SpecificationGeneral grammatical constraints are defined inthis phase, and in HPSG, they are representedthrough the design of the sign and schemata. Fig-ure 1 shows the definition for the typed featurestructure of a sign used in this study. Some morefeatures are defined for each syntactic category al-

Figure 1: subject-head schema (left) and head-complement schema (right); taken from Miyao et al.(2004).

formalism. In a lexicalized grammar, quite asmall numbers of schemata are used to explaingeneral grammatical constraints, compared withother theories. On the other hand, rich word-specific characteristics are embedded in lexicalentries. Both of schemata and lexical entriesare represented by typed feature structures, andconstraints in parsing are checked by unificationamong them. Figure 1 shows examples of HPSGschema.

Figure 2 shows an HPSG parse tree of the sen-tence “Spring has come.” First, the lexical en-tries of “has” and “come” are joined by head-complement schema. Unification gives the HPSGsign of mother. After applying schemata to HPSGsigns repeatedly, the HPSG sign of the whole sen-tence is output.

We use Enju for an English HPSG parser(Miyao et al., 2004). Figure 3 shows how a co-ordination structure is built in the Enju grammar.First, a coordinating conjunction and the rightconjunct are joined by coord right schema. Af-terwards, the parent and the left conjunct arejoined by coord left schema.

The Enju parser is equipped with a disam-biguation model trained by the maximum entropymethod (Miyao and Tsujii, 2008). Since we donot need the probability of each parse tree, wetreat the model just as a linear model that definesthe score of a parse tree as the sum of featureweights. The features of the model are definedon local subtrees of a parse tree.

The Enju parser takes O(n3) time since it usesthe CKY algorithm, and each cell in the CKYparse table has at most a constant number of edgesbecause we use beam search algorithm. Thus, wecan regard the parser as a decoder for a weightedCFG.

3.3 Dual decomposition

Dual decomposition is a classical method to solvecomplex optimization problems that can be de-

432

signPHON list of string

SYNSEM

synsem

LOCAL

local

CAT

category

HEADheadMODL synsemMODR synsem

SUBJ list of synsemCOMPS list of synsem

SEM semantics

NONLOCnonlocalREL list of localSLASH list of local

Figure 1: HPSG sign

2SUBJ < >COMPS < >2


1


1


1COMPS < >

HEADSUBJ COMPS

1

23 4

3

42

Figure 2: Subject-Head Schema (left) and Head-Complement Schema (right)

and unbounded dependencies. SEM feature rep-resents the semantics of a constituent, and in thisstudy it expresses a predicate-argument structure.Figure 2 presents the Subject-Head Schema

and the Head-Complement Schema1 defined in(Pollard and Sag, 1994). In order to express gen-eral constraints, schemata only provide sharing offeature values, and no instantiated values.Figure 3 has an example of HPSG parsing

of the sentence “Spring has come.” First, eachof the lexical entries for “has” and “come” areunified with a daughter feature structure of theHead-Complement Schema. Unification providesthe phrasal sign of the mother. The sign of thelarger constituent is obtained by repeatedly apply-ing schemata to lexical/phrasal signs. Finally, thephrasal sign of the entire sentence is output on thetop of the derivation tree.

3 Acquiring HPSG from the PennTreebank

As discussed in Section 1, our grammar devel-opment requires each sentence to be annotatedwith i) a history of rule applications, and ii) ad-ditional annotations to make the grammar rulesbe pseudo-injective. In HPSG, a history of ruleapplications is represented by a tree annotatedwith schema names. Additional annotations are

1The value of category has been presented for simplicity,while the other portions of the sign have been omitted.

Spring

HEAD nounSUBJ < >COMPS < >

HEAD verbSUBJ < >

COMPS < >

5

has

HEAD verb

SUBJ < >

COMPS < >

come


5HEAD nounSUBJ < >COMPS < >


1COMPS < >

HEADSUBJ COMPS

1

23 4

3

42

UnifyUnify

Head-complementschema

Lexical entries

Spring

HEAD nounSUBJ < >COMPS < > 2


1

has


1

come

2


1


1

subject-head

head-comp

Figure 3: HPSG parsing

required because HPSG schemata are not injec-tive, i.e., daughters’ signs cannot be uniquely de-termined given the mother. The following annota-tions are at least required. First, the HEAD featureof each non-head daughter must be specified sincethis is not percolated to the mother sign. Second,SLASH/REL features are required as described inour previous study (Miyao et al., 2003a). Finally,the SUBJ feature of the complement daughter inthe Head-Complement Schema must be specifiedsince this schema may subcategorize an unsatu-rated constituent, i.e., a constituent with a non-empty SUBJ feature. When the corpus is anno-tated with at least these features, the lexical en-tries required to explain the sentence are uniquelydetermined. In this study, we define partially-specified derivation trees as tree structures anno-tated with schema names and HPSG signs includ-ing the specifications of the above features.We describe the process of grammar develop-

ment in terms of the four phases: specification,externalization, extraction, and verification.

3.1 SpecificationGeneral grammatical constraints are defined inthis phase, and in HPSG, they are representedthrough the design of the sign and schemata. Fig-ure 1 shows the definition for the typed featurestructure of a sign used in this study. Some morefeatures are defined for each syntactic category al-

Figure 2: HPSG parsing; taken from Miyao et al.(2004).

Coordina(on�

Le3,Conjunct� Par(al,Coordina(on�

Coordina(ng,Conjunc(on�

Right,Conjunct�

← coord_right_schema

← coord_left_schema

Figure 3: Construction of coordination in Enju

composed into efficiently solvable sub-problems.It is becoming popular in the NLP communityand has been shown to work effectively on sev-eral NLP tasks (Rush et al., 2010).

We consider an optimization problem

arg maxx

(f(x) + g(x)) (2)

which is difficult to solve (e.g. NP-hard), whilearg maxx f(x) and arg maxx g(x) are effectivelysolvable. In dual decomposition, we solve

minu

maxx,y

(f(x) + g(y) + u(x− y))

instead of the original problem.To find the minimum value, we can use a sub-

gradient method (Rush et al., 2010). The subgra-dient method is given in Table 4. As the algorithm

u(1) ← 0for k = 1 to K do

x(k) ← arg maxx(f(x) + u(k)x)y(k) ← arg maxy(g(y)− u(k)y)if x = y then

return u(k)

end ifu(k+1) ← uk − ak(x

(k) − y(k))end forreturn u(K)

Table 4: The subgradient method

shows, you can use existing algorithms and don’tneed to have an exact algorithm for the optimiza-tion problem, which are features of dual decom-position.

If x(k) = y(k) occurs during the algorithm, thenwe simply take x(k) as the primal solution, whichis the exact answer. If not, we simply take x(K),the answer of coordination structure analysis withalignment-based features, as an approximate an-swer to the primal solution. The answer does notalways solve the original problem Eq (2), but pre-vious works (e.g., (Rush et al., 2010)) has shownthat it is effective in practice. We use it in thispaper.

4 Proposed method

In this section, we describe how we apply dualdecomposition to the two models.

4.1 Notation

We define some notations here. First we describeweighted CFG parsing, which is used for bothcoordination structure analysis with alignment-based features and HPSG parsing. We follows theformulation by Rush et al., (2010). We assume acontext-free grammar in Chomsky normal form,with a set of non-terminals N . All rules of thegrammar are either the form A→ BC or A→ wwhere A, B,C ∈ N and w ∈ V . For rules of theform A→ w we refer to A as the pre-terminal forw.

Given a sentence with n words, w1w2...wn, aparse tree is a set of rule productions of the form⟨A → BC, i, k, j⟩ where A,B, C ∈ N , and1 ≤ i ≤ k ≤ j ≤ n. Each rule production rep-resents the use of CFG rule A→ BC where non-terminal A spans words wi...wj , non-terminal B

433

spans word wi...wk, and non-terminal C spansword wk+1...wj if k < j, and the use of CFGrule A→ wi if i = k = j.

We now define the index set for the coordina-tion structure analysis as

Icsa = {⟨A→ BC, i, k, j⟩ : A,B, C ∈ N,

1 ≤ i ≤ k ≤ j ≤ n}

Each parse tree is a vector y = {yr : r ∈ Icsa},with yr = 1 if rule r is in the parse tree, and yr =0 otherwise. Therefore, each parse tree is repre-sented as a vector in {0, 1}m, where m = |Icsa|.We use Y to denote the set of all valid parse-treevectors. The set Y is a subset of {0, 1}m.

In addition, we assume a vector θcsa = {θcsar :

r ∈ Icsa} that specifies a score for each rule pro-duction. Each θcsa

r can take any real value. Theoptimal parse tree is y∗ = arg maxy∈Y y · θcsa

where y · θcsa =∑

r yr · θcsar is the inner product

between y and θcsa.We use similar notation for HPSG parsing. We

define Ihpsg , Z and θhpsg as the index set forHPSG parsing, the set of all valid parse-tree vec-tors and the weight vector for HPSG parsing re-spectively.

We extend the index sets for both the coor-dination structure analysis with alignment-basedfeatures and HPSG parsing to make a constraintbetween the two sub-problems. For the coor-dination structure analysis with alignment-basedfeatures we define the extended index set to beI ′csa = Icsa

∪Iuni where

Iuni = {(a, b, c) : a, b, c ∈ {1...n}}

Here each triple (a, b, c) represents that wordwc is recognized as the last word of the rightconjunct and the scope of the left conjunct orthe coordinating conjunction is wa...wb

1. Thuseach parse-tree vector y will have additional com-ponents ya,b,c. Note that this representation isover-complete, since a parse tree is enough todetermine unique coordination structures for asentence: more explicitly, the value of ya,b,c is

1This definition is derived from the structure of a co-ordination in Enju (Figure 3). The triples show wherethe coordinating conjunction and right conjunct are incoord right schema, and the left conjunct and partial coor-dination are in coord left schema. Thus they alone enablenot only the coordination structure analysis with alignment-based features but Enju to uniquely determine the structureof a coordination.

1 if rule COORDa,c → CJTa,bCC , CJT ,c orCOORD ,c → CJT , CCa,bCJT ,c is in the parsetree; otherwise it is 0.

We apply the same extension to the HPSG in-dex set, also giving an over-complete representa-tion. We define za,b,c analogously to ya,b,c.

4.2 Proposed methodWe now describe the dual decomposition ap-proach for coordination disambiguation. First, wedefine the set Q as follows:

Q = {(y, z) : y ∈ Y, z ∈ Z, ya,b,c = za,b,c

for all (a, b, c) ∈ Iuni}

Therefore, Q is the set of all (y, z) pairs thatagree on their coordination structures. The coor-dination structure analysis with alignment-basedfeatures and HPSG parsing problem is then tosolve

max(y,z)∈Q

(y · θcsa + γz · θhpsg) (3)

where γ > 0 is a parameter dictating the relativeweight of the two models and is chosen to opti-mize performance on the development test set.

This problem is equivalent to

maxz∈Z

(g(z) · θcsa + γz · θhpsg) (4)

where g : Z → Y is a function that maps aHPSG tree z to its set of coordination structuresz = g(y).

We solve this optimization problem by usingdual decomposition. Figure 4 shows the result-ing algorithm. The algorithm tries to optimizethe combined objective by separately solving thesub-problems again and again. After each itera-tion, the algorithm updates the weights u(a, b, c).These updates modify the objective functions forthe two sub-problems, encouraging them to agreeon the same coordination structures. If y(k) =z(k) occurs during the iterations, then the algo-rithm simply returns y(k) as the exact answer. Ifnot, the algorithm returns the answer of coordina-tion analysis with alignment features as a heuristicanswer.

It is needed to modify original sub-problemsfor calculating (1) and (2) in Table 4. We modifiedthe sub-problems to regard the score of u(a, b, c)as a bonus/penalty of the coordination. The mod-ified coordination structure analysis with align-ment features adds u(k)(i, j, m) and u(k)(j+1, l−

434

u(1)(a, b, c)! 0 for all (a, b, c) " Iunifor k = 1 to K do

y(k) ! argmaxy!Y(y · !csa #!

(a,b,c)!Iuni u(k)(a, b, c)ya,b,c) ... (1)

z(k) ! argmaxz!Z(z · !hpsg +!

(a,b,c)!Iuni u(k)(a, b, c)za,b,c) ... (2)

if y(k)(a, b, c) = z(k)(a, b, c) for all (a, b, c) " Iuni thenreturn y(k)

end iffor all (a, b, c) " Iuni do

u(k+1)(a, b, c)! u(k)(a, b, c)# ak(y(k)(a, b, c)# z(k)(a, b, c))end for

end forreturn y(K)

Figure 4: Proposed algorithm

w · f(x, (i, j, l,m)) to the score of the sub-tree, when the rule production COORDi,m $CJTi,jCCj+1,l"1CJTl,m is applied.

The modified Enju adds u(k)(i, j, l) when co-ord left schema is applied, where word wc

is recognized as a coordinating conjunctionand left side of its scope is wa...wb, or co-ord right schema is applied, where word wc

is recognized as a coordinating conjunction andright side of its scope is wa...wb.

5 Experiments

5.1 Test/Training dataWe trained the alignment-based coordinationanalysis model on both the Genia corpus (?)and the Wall Street Journal portion of the PennTreebank (?), and evaluated the performance ofour method on (i) the Genia corpus and (ii) theWall Street Journal portion of the Penn Treebank.More precisely, we used HPSG treebank con-verted from the Penn Treebank and Genia, andfurther extracted the training/test data for coor-dination structure analysis with alignment-basedfeatures using the annotation in the Treebank. Ta-ble ?? shows the corpus used in the experiments.

The Wall Street Journal portion of the PennTreebank has 2317 sentences from WSJ articles,and there are 1356 COOD tags in the sentences,while the Genia corpus has 1754 sentences fromMEDLINE abstracts, and there are 1848 COODtags in the sentences. COOD tags are furthersubcategorized into phrase types such as NP-COOD or VP-COOD. Table ?? shows the per-centage of each phrase type in all COOD tags.It indicates the Wall Street Journal portion of the

COORD WSJ GeniaNP 63.7 66.3VP 13.8 11.4ADJP 6.8 9.6S 11.4 6.0PP 2.4 5.1Others 1.9 1.5

Table 6: The percentage of each conjunct type (%) ofeach test set

Penn Treebank has more VP-COOD tags and S-COOD tags, while the Genia corpus has moreNP-COOD tags and ADJP-COOD tags.

5.2 Implementation of sub-problems

We used Enju (?) for the implementation ofHPSG parsing, which has a wide-coverage prob-abilistic HPSG grammar and an efficient parsingalgorithm, while we re-implemented Hara et al.,(2009)’s algorithm with slight modifications.

5.2.1 Step sizeWe used the following step size in our algo-

rithm (Figure ??). First, we initialized a0, whichis chosen to optimize performance on the devel-opment set. Then we defined ak = a0 · 2"!k ,where "k is the number of times that L(u(k!)) >L(u(k

!"1)) for k# % k.

5.3 Evaluation metric

We evaluated the performance of the tested meth-ods by the accuracy of coordination-level brack-eting (?); i.e., we count each of the coordinationscopes as one output of the system, and the system

Figure 4: Proposed algorithm

1, m), as well as adding w · f(x, (i, j, l, m)) tothe score of the subtree, when the rule produc-tion COORDi,m → CJTi,jCCj+1,l−1CJTl,m isapplied.

The modified Enju adds u(k)(a, b, c) whencoord right schema is applied, where wordwa...wb is recognized as a coordinating conjunc-tion and the last word of the right conjunct iswc, or coord left schema is applied, where wordwa...wb is recognized as the left conjunct and thelast word of the right conjunct is wc.

5 Experiments

5.1 Test/Training data

We trained the alignment-based coordinationanalysis model on both the Genia corpus (Kimet al., 2003) and the Wall Street Journal portionof the Penn Treebank (Marcus et al., 1993), andevaluated the performance of our method on (i)the Genia corpus and (ii) the Wall Street Jour-nal portion of the Penn Treebank. More precisely,we used HPSG treebank converted from the PennTreebank and Genia, and further extracted thetraining/test data for coordination structure analy-sis with alignment-based features using the anno-tation in the Treebank. Table 5 shows the corpusused in the experiments.

The Wall Street Journal portion of the PennTreebank in the test set has 2317 sentences fromWSJ articles, and there are 1356 coordinationsin the sentences, while the Genia corpus in thetest set has 1764 sentences from MEDLINE ab-stracts, and there are 1848 coordinations in thesentences. Coordinations are further subcatego-

COORD WSJ GeniaNP 63.7 66.3VP 13.8 11.4ADJP 6.8 9.6S 11.4 6.0PP 2.4 5.1Others 1.9 1.5

Table 6: The percentage of each conjunct type (%) ofeach test set

rized into phrase types such as a NP coordinationor PP coordination. Table 6 shows the percentageof each phrase type in all coordianitons. It indi-cates the Wall Street Journal portion of the PennTreebank has more VP coordinations and S co-ordianitons, while the Genia corpus has more NPcoordianitons and ADJP coordiations.

5.2 Implementation of sub-problems

We used Enju (Miyao and Tsujii, 2004) forthe implementation of HPSG parsing, which hasa wide-coverage probabilistic HPSG grammarand an efficient parsing algorithm, while we re-implemented Hara et al., (2009)’s algorithm withslight modifications.

5.2.1 Step size

We used the following step size in our algo-rithm (Figure 4). First, we initialized a0, whichis chosen to optimize performance on the devel-opment set. Then we defined ak = a0 · 2−ηk ,where ηk is the number of times that L(u(k′)) >L(u(k′−1)) for k′ ≤ k.

435

Task (i) Task (ii)Training WSJ (sec. 2–21) + Genia (No. 1–1600) WSJ (sec. 2–21)Development Genia (No. 1601–1800) WSJ (sec. 22)Test Genia (No. 1801–1999) WSJ (sec. 23)

Table 5: The corpus used in the experiments

Proposed Enju CSAPrecision 72.4 66.3 65.3Recall 67.8 65.5 60.5F1 70.0 65.9 62.8

Table 7: Results of Task (i) on the test set. The preci-sion, recall, and F1 (%) for the proposed method, Enju,and Coordination structure analysis with alignment-based features (CSA)

5.3 Evaluation metric

We evaluated the performance of the tested meth-ods by the accuracy of coordination-level bracket-ing (Shimbo and Hara, 2007); i.e., we count eachof the coordination scopes as one output of thesystem, and the system output is regarded as cor-rect if both of the beginning of the first outputconjunct and the end of the last conjunct matchannotations in the Treebank (Hara et al., 2009).

5.4 Experimental results of Task (i)

We ran the dual decomposition algorithm with alimit of K = 50 iterations. We found the twosub-problems return the same answer during thealgorithm in over 95% of sentences.

We compare the accuracy of the dual decompo-sition approach to two baselines: Enju and coor-dination structure analysis with alignment-basedfeatures. Table 7 shows all three results. The dualdecomposition method gives a statistically signif-icant gain in precision and recall over the twomethods2.

Table 8 shows the recall of coordinations ofeach type. It indicates our re-implementation ofCSA and Hara et al. (2009) have a roughly simi-lar performance, although their experimental set-tings are different. It also shows the proposedmethod took advantage of Enju and CSA in NPcoordination, while it is likely just to take the an-swer of Enju in VP and sentential coordinations.This means we might well use dual decomposi-

2p < 0.01 (by chi-square test)

60%$

65%$

70%$

75%$

80%$

85%$

90%$

95%$

100%$

1$ 3$ 5$ 7$ 9$ 11$13$15$17$19$21$23$25$27$29$31$33$35$37$39$41$43$45$47$49$

accuracy certificates

Figure 5: Performance of the approach as a function ofK of Task (i) on the development set. accuracy (%):the percentage of sentences that are correctly parsed.certificates (%): the percentage of sentences for whicha certificate of optimality is obtained.

tion only on NP coordinations to have a better re-sult.

Figure 5 shows performance of the approach asa function of K, the maximum number of iter-ations of dual decomposition. The graphs showthat values of K much less than 50 produce al-most identical performance to K = 50 (withK = 50, the accuracy of the method is 73.4%,with K = 20 it is 72.6%, and with K = 1 itis 69.3%). This means you can use smaller K inpractical use for speed.

5.5 Experimental results of Task (ii)

We also ran the dual decomposition algorithmwith a limit of K = 50 iterations on Task (ii).Table 9 and 10 show the results of task (ii). Theyshow the proposed method outperformed the twomethods statistically in precision and recall3.

Figure 6 shows performance of the approach asa function of K, the maximum number of iter-ations of dual decomposition. The convergencespeed for WSJ was faster than that for Genia. Thisis because a sentence of WSJ often have a simplercoordination structure, compared with that of Ge-nia.

3p < 0.01 (by chi-square test)

436

COORD # Proposed Enju CSA # Hara et al. (2009)Overall 1848 67.7 63.3 61.9 3598 61.5NP 1213 67.5 61.4 64.1 2317 64.2VP 208 79.8 78.8 66.3 456 54.2ADJP 193 58.5 59.1 54.4 312 80.4S 111 51.4 52.3 34.2 188 22.9PP 110 64.5 59.1 57.3 167 59.9Others 13 78.3 73.9 65.2 140 49.3

Table 8: The number of coordinations of each type (#), and the recall (%) for the proposed method, Enju,coordination structure analysis with alignment-based features (CSA) , and Hara et al. (2009) of Task (i) on thedevelopment set. Note that Hara et al. (2009) uses a different test set and different annotation rules, although itstest data is also taken from the Genia corpus. Thus we cannot compare them directly.

Proposed Enju CSAPrecision 76.3 70.7 66.0Recall 70.6 69.0 60.1F1 73.3 69.9 62.9

Table 9: Results of Task (ii) on the test set. The preci-sion, recall, and F1 (%) for the proposed method, Enju,and Coordination structure analysis with alignment-based features (CSA)

COORD # Proposed Enju CSAOverall 1017 71.6 68.1 60.7NP 573 76.1 71.0 67.7VP 187 62.0 62.6 47.6ADJP 73 82.2 75.3 79.5S 141 64.5 62.4 42.6PP 19 52.6 47.4 47.4Others 24 62.5 70.8 54.2

Table 10: The number of coordinations of each type(#), and the recall (%) for the proposed method, Enju,and coordination structure analysis with alignment-based features (CSA) of Task (ii) on the developmentset.

6 Conclusion and Future Work

In this paper, we presented an efficient method fordetecting and disambiguating coordinate struc-tures. Our basic idea was to consider both gram-mar and symmetries of conjuncts by using dualdecomposition. Experiments on the Genia corpusand the Wall Street Journal portion of the PennTreebank showed that we could obtain statisti-cally significant improvement in accuracy whenusing dual decomposition.

We would need a further study in the follow-ing points of view: First, we should evaluate our

60%$

65%$

70%$

75%$

80%$

85%$

90%$

95%$

100%$

1$ 3$ 5$ 7$ 9$ 11$13$15$17$19$21$23$25$27$29$31$33$35$37$39$41$43$45$47$49$

accuracy certificates

Figure 6: Performance of the approach as a function ofK of Task (ii) on the development set. accuracy (%):the percentage of sentences that are correctly parsed.certificates (%): the percentage of sentences for whicha certificate of optimality is provided.

method with corpus in different domains. Be-cause characteristics of coordination structuresdiffers from corpus to corpus, experiments onother corpus would lead to a different result. Sec-ond, we would want to add some features to coor-dination structure analysis with alignment-basedlocal features such as ontology. Finally, we canadd other methods (e.g. dependency parsing) assub-problems to our method by using the exten-sion of dual decomposition, which can deal withmore than two sub-problems.

Acknowledgments

The second author is partially supported by KAK-ENHI Grant-in-Aid for Scientific Research C21500131 and Microsoft CORE project 7.

437

ReferencesKazuo Hara, Masashi Shimbo, Hideharu Okuma, and

Yuji Matsumoto. 2009. Coordinate structure analy-sis with global structural constraints and alignment-based local features. In Proceedings of the 47th An-nual Meeting of the ACL and the 4th IJCNLP of theAFNLP, pages 967–975, Aug.

Deirdre Hogan. 2007. Coordinate noun phrase dis-ambiguation in a generative parsing model. In Pro-ceedings of the 45th Annual Meeting of the Asso-ciation of Computational Linguistics (ACL 2007),pages 680–687.

Jun-Dong Kim, Tomoko Ohta, and Jun’ich Tsujii.2003. Genia corpus - a semantically annotated cor-pus for bio-textmining. Bioinformatics, 19.

Dan Klein and Christopher D. Manning. 2003. Fastexact inference with a factored model for naturallanguage parsing. Advances in Neural InformationProcessing Systems, 15:3–10.

Mitchell P. Marcus, Beatrice Santorini, and Mary AnnMarcinkiewicz. 1993. Building a large annotatedcorpus of english: The penn treebank. Computa-tional Linguistics, 19:313–330.

Yusuke Miyao and Jun’ich Tsujii. 2004. Deep lin-guistic analysis for the accurate identification ofpredicate-argument relations. In Proceeding ofCOLING 2004, pages 1392–1397.

Yusuke Miyao and Jun’ich Tsujii. 2008. Featureforest models for probabilistic hpsg parsing. MITPress, 1(34):35–80.

Yusuke Miyao, Takashi Ninomiya, and Jun’ichi Tsu-jii. 2004. Corpus-oriented grammar developmentfor acquiring a head-driven phrase structure gram-mar from the penn treebank. In Proceedings ofthe First International Joint Conference on NaturalLanguage Processing (IJCNLP 2004).

Preslav Nakov and Marti Hearst. 2005. Using the webas an implicit training set: Application to structuralambiguity resolution. In Proceedings of the HumanLanguage Technology Conference and Conferenceon Empirical Methods in Natural Language (HLT-EMNLP 2005), pages 835–842.

Carl Pollard and Ivan A. Sag. 1994. Head-drivenphrase structure grammar. University of ChicagoPress.

Philip Resnik. 1999. Semantic similarity in a takon-omy. Journal of Artificial Intelligence Research,11:95–130.

Alexander M. Rush, David Sontag, Michael Collins,and Tommi Jaakkola. 2010. On dual decomposi-tion and linear programming relaxations for natu-ral language processing. In Proceeding of the con-ference on Empirical Methods in Natural LanguageProcessing.

Masashi Shimbo and Kazuo Hara. 2007. A discrimi-native learning model for coordinate conjunctions.

In Proceedings of the 2007 Joint Conference onEmpirical Methods in Natural Language Process-ing and Computational Natural Language Learn-ing, pages 610–619, Jun.

438

coordination structure analysis using dual decomposition · 1 introduction...

Documents