
PARSING MILDLY CONTEXT-SENSITIVE RMS

Tilman Becker    Dominik Heckmann

German Research Center for Artificial Intelligence (DFKI GmbH) {becker,heckmann}@dfki.de

Abstract

We introduce Recursive Matrix Systems (RMS), which encompass mildly context-sensitive formalisms, and present efficient parsing algorithms for linear and context-free variants of RMS. The time complexities are $O(n^{2h+1})$ and $O(n^{3h})$, respectively, where $h$ is the height of the matrix. It is possible to represent Tree Adjoining Grammars (TAG [1], MC-TAG [2], and R-TAG [3]) uniformly as RMS.

1 Recursive Matrix Systems (RMS)

RMS = (G, I) is a two-step formalism. In a first step, a grammar G = (N, T, S, P) generates a set of recursive matrices. In a second step, a yield function maps, according to the interpretation I, the recursive matrices to a set of strings L(G, I). The recursive matrix generating grammars are ordinary phrase structure grammars, where the terminal symbols are replaced by vertical vectors. These vertical vectors are filled with terminal symbols, nonterminal symbols, or they are left empty.

The two-step formalism: $S \in N \;\overset{\text{derive}}{\Longrightarrow}\; m \in M \;\overset{\text{yield}}{\vdash_{I}}\; w \in T^{*}$.

Example: $G_1 = (N, T, S, P) = (\{S\}, \{a, b, c\}, S, \{\, S \to \left(\begin{smallmatrix}a\\ b\\ c\end{smallmatrix}\right) S,\ S \to \left(\begin{smallmatrix}a\\ b\\ c\end{smallmatrix}\right) \,\})$. Every time the first rule is applied, a new column with terminals is added to the matrix. The last rule terminates the process.

A possible derivation: $S \Rightarrow \left(\begin{smallmatrix}a\\ b\\ c\end{smallmatrix}\right) S \Rightarrow \left(\begin{smallmatrix}a\\ b\\ c\end{smallmatrix}\right)\left(\begin{smallmatrix}a\\ b\\ c\end{smallmatrix}\right) S \Rightarrow \left(\begin{smallmatrix}a & a & a\\ b & b & b\\ c & c & c\end{smallmatrix}\right)$.

The product of the derivation process is a matrix with terminals as elements. Finally, these terminal symbols are combined into one string. There are many possible interpretation functions. However, it seems reasonable to read the terminal symbols within the matrix row by row from top to bottom, and each row alternating from left to right and from right to left. For height 3 this interpretation can be visualized by the arrow vector $\rightleftarrows\, = (\rightarrow, \leftarrow, \rightarrow)^{T}$. The grammar $G_1$ together with this interpretation generates strings of the form $aaa \ldots bbb \ldots ccc \ldots$. Thus, the generated language is $L(G_1, \rightleftarrows) = \{\, a^{n} b^{n} c^{n} \mid n \in \mathbb{N} \,\}$.

Definition: Let $h$ be a positive integer. A recursive matrix grammar is a four-tuple $G = (N, T, S, P)$, where $N$ and $T$ are disjoint alphabets, $S \in N$, and $P$ is a finite set of ordered pairs $(u, v) \in N \times (N \cup C_h)^{*}$, where $C_h = \{\, (c_1, \ldots, c_h)^{T} \mid c_x \in T \cup N \cup \{\varepsilon\},\ 1 \le x \le h \,\}$ is the set of vertical vectors (columns) of height $h$.

An interpretation $I \in I_h$ is a vertical vector with left and right arrows as elements, where $I_h = \{\, (d_1, \ldots, d_h)^{T} \mid d_x \in \{\rightarrow, \leftarrow\},\ 1 \le x \le h \,\}$ is the set of interpretations of height $h$.
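To make the two-step formalism concrete, the following minimal sketch (an illustration in Python, not code from the paper; the names derive and zigzag_yield are invented here) builds the recursive matrix for $G_1$ and applies the alternating row interpretation $\rightleftarrows$:

# Step 1: derive a 3 x n matrix by applying S -> (a,b,c)^T S  (n-1 times)
# and the terminating rule S -> (a,b,c)^T once.
def derive(n: int) -> list[list[str]]:
    column = ["a", "b", "c"]
    return [[column[row]] * n for row in range(3)]

# Step 2: the zigzag yield reads rows top to bottom, alternating
# left-to-right (even rows) and right-to-left (odd rows).
def zigzag_yield(matrix: list[list[str]]) -> str:
    parts = []
    for x, row in enumerate(matrix):
        parts.append("".join(row if x % 2 == 0 else reversed(row)))
    return "".join(parts)

print(zigzag_yield(derive(3)))  # prints "aaabbbccc", an element of L(G1)

Because every row of $G_1$'s matrices repeats a single terminal, reversing the odd rows does not change the output here; the direction only matters for grammars with heterogeneous rows.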


2 Parsing Schemata for Recursive Matrix Systems (RMS)

The notation for the parsing schemata is close to [4]. An Earley parser for RMS can be found in [2]. RMS offer brief vector notations: let $\bar\imath = (i_1, \ldots, i_h)^{T}$, $\bar\imath' = (i_2, \ldots, i_{h+1})^{T}$, $\bar\jmath = (j_1, \ldots, j_h)^{T}$, $\bar k = (k_1, \ldots, k_h)^{T}$, and $\bar V = (V_1, \ldots, V_h)^{T}$.

The set of hypotheses: $\mathcal{H} = \{\, [a \mid i{-}1 \mid i] \mid a = a_i,\ 1 \le i \le n \,\} \cup \{\, [\varepsilon \mid i \mid i] \mid 0 \le i \le n \,\}$. The set of items $\mathcal{J}$ depends on the height $h$: $\mathcal{J}(h) = \{\, [A \mid i \mid j],\ [A \mid \bar\imath \mid \bar\jmath\,] \,\}$. The parsing schema $\mathbb{P}(\mathrm{CYK})$ for $h\mathrm{RMS}(\mathrm{CFG}, \rightleftarrows)$ ($=$ MC-TAG) is defined by the deduction rules:

$D_{\text{init}} = \{\, [V_1 \mid i_1 \mid k_1], \ldots, [V_h \mid i_h \mid k_h] \vdash [A \mid \bar\imath \mid \bar k] \mid (A \to \bar V) \in P \,\}$
$D_{\text{combine}} = \{\, [B \mid \bar\imath \mid \bar\jmath\,], [C \mid \bar\jmath \mid \bar k] \vdash [A \mid \bar\imath \mid \bar k] \mid (A \to BC) \in P \,\}$
$D_{\text{linearize}} = \{\, [A \mid \bar\imath \mid \bar\imath'\,] \vdash [A \mid i_1 \mid i_{h+1}] \,\}$
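Read bottom-up, the schema is a CYK-style chart computation. The sketch below is an illustration under assumptions of my own, not the paper's algorithm: $G_1$ is put into a column/binary normal form with nonterminals S and T (column rules S -> (a,b,c)^T and T -> (a,b,c)^T, binary rule S -> T S), empty columns are omitted, the reading direction inside a row is ignored (harmless for $G_1$, whose rows are homogeneous), and index tuples are enumerated by brute force rather than with the schema's $O(n^{3h})$ bookkeeping. An item (A, starts, ends) plays the role of $[A \mid \bar\imath \mid \bar\jmath\,]$.

from itertools import product

H = 3                                   # height of the matrices
COLUMN_RULES = {"S": [("a", "b", "c")], "T": [("a", "b", "c")]}
BINARY_RULES = [("S", "T", "S")]        # A -> B C

def recognize(w: str) -> bool:
    n = len(w)
    chart = set()
    # D_init: a column rule A -> (c_1,...,c_h)^T gives an item whose
    # row x covers exactly one occurrence of c_x in the input.
    for lhs, columns in COLUMN_RULES.items():
        for col in columns:
            for starts in product(range(n), repeat=H):
                if all(w[starts[x]] == col[x] for x in range(H)):
                    chart.add((lhs, starts, tuple(s + 1 for s in starts)))
    # D_combine: [B | i | j] and [C | j | k] give [A | i | k] for A -> B C.
    changed = True
    while changed:
        changed = False
        for lhs, b, c in BINARY_RULES:
            for (B, i, j) in list(chart):
                if B != b:
                    continue
                for (C, j2, k) in list(chart):
                    if C == c and j2 == j and (lhs, i, k) not in chart:
                        chart.add((lhs, i, k))
                        changed = True
    # D_linearize: the h rows must be adjacent segments covering the input.
    return any(
        A == "S" and starts[0] == 0 and ends[H - 1] == n
        and all(ends[x] == starts[x + 1] for x in range(H - 1))
        for (A, starts, ends) in chart
    )

print(recognize("aaabbbccc"))  # True
print(recognize("aabbbccc"))   # False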

The parsing schema $\mathbb{P}(\mathrm{CYK})$ for $2\mathrm{RMS}(\mathrm{LIN}, \rightleftarrows)$ ($=$ R-TAG) is defined by the deduction rules:

$D_{\text{adjoin}}$ combines an item for the embedded nonterminal $B$ with items for the terminal columns $V_1, V_2, V_3, V_4$ of a linear rule $A \to \bar V B \bar V'$ (with $\bar V = (V_1, V_2)^{T}$ and $\bar V' = (V_3, V_4)^{T}$): the two rows of $B$ are wrapped on the left and on the right, giving an item for $A$ over the indices $i_1, \ldots, i_4$ and $j_1, \ldots, j_4$. $D_{\text{linearize}} = \{\, [A \mid i_1, i_2 \mid i_2, i_3] \vdash [A \mid i_1 \mid i_3] \,\}$ collapses an item whose two rows are adjacent into a string item.
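As a concrete reading of the $h = 2$ items (a worked example of my own, not from the paper): an item $[A \mid (0, 4) \mid (4, 8)]$ states that row 1 of a matrix derivable from $A$ covers input positions 0 through 4 and row 2 covers positions 4 through 8; since the end of row 1 meets the start of row 2, $D_{\text{linearize}}$ yields the string item $[A \mid 0 \mid 8]$, i.e., $A$ covers the whole input of length 8.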

The time complexity of the algorithms can be derived from the number of relevant indices, where each vector $\bar\imath$ represents $h$ indices. Step $D_{\text{combine}}$ in algorithm $\mathbb{P}(\mathrm{CYK})$ for $h\mathrm{RMS}(\mathrm{CFG}, \rightleftarrows)$ has three relevant vectors $\bar\imath, \bar\jmath, \bar k$, which each represent $h$ indices, resulting in a time complexity of $O(n^{3h})$. Step $D_{\text{linearize}}$ has an $O(n^{h+1})$ time complexity. The correctness is inherited from the CFG cases. The following figure explains the variables of the deduction rule $D_{\text{combine}}$ for $\mathbb{P}(\mathrm{CYK})$ applied to $2\mathrm{RMS}(\mathrm{CFG}, \rightleftarrows)$:

[Figure D-COMBINE: the antecedent items $[B \mid \bar\imath \mid \bar\jmath\,]$ and $[C \mid \bar\jmath \mid \bar k]$, their spans on the input string, and the consequence item $[A \mid \bar\imath \mid \bar k]$, labelled ANTECEDENTS, INPUT-STRING, and CONSEQUENCE.]
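As a worked instance of the index count above (using the bounds stated in the abstract): for $h = 2$, $D_{\text{combine}}$ ranges over the three vectors $\bar\imath, \bar\jmath, \bar k$ of two positions each, so $O(n^{3h}) = O(n^{6})$; for the linear variant, $O(n^{2h+1}) = O(n^{5})$, the bound obtained for $2\mathrm{RMS}(\mathrm{LIN}, \rightleftarrows)$, i.e., for R-TAG [3].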

References


[1] Tilman Becker and Dominik Heckmann. 1998. Recursive Matrix Systems (RMS) and TAG. Proceedings of the Fourth International Workshop on TAG (TAG+4), Philadelphia, PA.
[2] Dominik Heckmann. 1999. Recursive Matrix Systems. Diploma Thesis, University of Saarland.
[3] Giorgio Satta and William Schuler. 1998. Restrictions on Tree Adjoining Languages. Proceedings of COLING-ACL '98, pages 1176-1182.
[4] Klaas Sikkel and Anton Nijholt. 1997. Parsing of Context-Free Languages. Handbook of Formal Languages, Vol. 2, Springer, Berlin.
