Bidirectional charts: a potential technique for parsing spoken natural language sentences

Computer Speech and Language (1989) 3, 219-237

Oliviero Stock, Istituto per la Ricerca Scientifica e Tecnologica, 38050 Povo, Trento, Italy

Rino Falcone* and Patrizia Insinnamo, Fondazione Ugo Bordoni, Roma, Italy

Abstract

The use of “high level” knowledge sources in recognizing continuous speech is aimed at reducing the hypothesis space generated by acoustic-phonetic analysis. In this, a sentence parser can be a basic resource, provided that it can deal with the ambiguity of the input and with the fact that fragments may have been recognized even hypothetically. One of the most successful techniques for parsing natural language is chart parsing. Chart parsing is directional in the sense that it works from a starting point (usually the beginning of the sentence) and usually proceeds to the right. We describe the concept of a chart that works outward from islands (reliably recognized fragments), makes sense of as much of the sentence as possible, and then goes on to make predictions about missing fragments.

1. Introduction

Using “high level” knowledge sources (lexical, syntactic, semantic and so on) has proved to be a necessary step in the realization of a continuous speech recognizer (see for instance Walker, 1976). Building algorithms that rely on such knowledge, in fact, contributes to the solution of problems arising from the acoustic-phonetic analysis of the signal.

Decoding the vocal signal is a process that must take into account phenomena such as the coarticulatory processes typical of continuous speech and the presence of many sources of variability of the signal (anatomic characteristics of the speaker, speech rate, prosody and so on). The result is that, at the level of acoustic-phonetic analysis, there is great uncertainty as to how to segment the signal and what label to give to the segments. Therefore acoustic-phonetic analysis generates a space of possible interpretative hypotheses of the signal. In order to reduce such a space (and a fortiori if the aim is to get an interpretation of the sentence) it is necessary to make use of different knowledge sources (Woods, 1982; Stringa, 1988). Among classical ARPA systems, in Harpy (Lowerre & Reddy, 1978) the knowledge sources are compiled into a single network, while in Hearsay II (Lesser, Fennell, Erman & Reddy, 1975), the knowledge sources are separate and communicate through a blackboard. The role of each knowledge source is to control the plausibility of hypotheses emerging from other knowledge sources and to create new hypotheses.

*Present address: IAC-Consiglio Nazionale delle Ricerche, Viale del Policlinico 137, Roma, Italy.

0885-2308/89/030219+19 $03.00/0 © 1989 Academic Press Limited

The parser described in this work is proposed as a “high level” component that operates on a lattice of scored word hypotheses produced by acoustic-phonetic analysis. Each word hypothesis is characterized by: (a) the hypothesized recognized string; (b) the score of this hypothesis; (c) the time interval that this hypothesis spans. We consider two different thresholds for scores: the word hypotheses with scores above the higher threshold are considered “very reliable”, and their role will be to drive the process. The word hypotheses with scores between the two thresholds, if structurally close to an island, will be included in the analysis without a driving role. The word hypotheses below the lower threshold are not considered, at least in the first pass.
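The three-way split of word hypotheses can be sketched as follows. This is only an illustration: the names `WordHypothesis` and `classify`, the numeric scores and thresholds, and the treatment of scores exactly equal to a threshold are assumptions, not part of the original description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHypothesis:
    word: str     # (a) the hypothesized recognized string
    score: float  # (b) the score of the hypothesis
    start: float  # (c) the time interval spanned, in seconds
    end: float


def classify(hyp, high, low):
    """Return 'a' (island, drives the process), 'b' (usable near an
    island, no driving role) or 'c' (ignored in the first pass)."""
    if hyp.score > high:
        return "a"
    if hyp.score >= low:  # boundary handling is an assumption
        return "b"
    return "c"


# Hypothetical lattice fragment with made-up scores.
lattice = [
    WordHypothesis("the", 0.55, 0.00, 0.12),
    WordHypothesis("red", 0.91, 0.12, 0.40),
    WordHypothesis("bread", 0.20, 0.10, 0.40),
]
classes = {h.word: classify(h, high=0.8, low=0.4) for h in lattice}
```

Here "red" would become an island, "the" a class b word, and "bread" would be set aside until a prediction asks for it.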

We shall discuss an abstract non-deterministic algorithm appropriate for working with the constraints discussed above, without tying it to a particular representation of linguistic knowledge. The representation can be in the form of rewriting rules (as in the reported examples), slot-filler grammars, recursive or augmented transition networks, etc. Our starting point will be a well-founded technique that has also been proved valid experimentally, namely chart parsing. Chart parsing is appropriate for dealing with local ambiguity in natural language, and it reduces the exponential complexity of a backtracking-based algorithm. It works very well with well-formed input, but the technique was not conceived for working with uncertain input and, even worse, with fragmentary input. The drawback of charts is that they go in one direction, i.e. you must be at the beginning (or at the end) of a constituent to analyze it. So, as it stands, the technique would simply not work for our purposes. What we want is to maintain the chart paradigm, but to change the technique in such a way that, moving outward from islands: (a) the analyzer can reach the borders of the sentence fragments poorly recognized by the lower level component; (b) the space of non-deterministic attempts is constrained by the most consolidated word interpretations, thereby reducing the combinatorial explosion of attempts; in fact, if we started just from the first hypotheses at the border of a constituent and not from the most reliable ones, wherever they may be, an enormous number of useless attempts would follow; (c) the particular search tactics, given the general control strategy outlined here, are left to external scheduling functions that may interact with other knowledge sources and that can maximize efficiency by selecting tasks on agendas; (d) the chart configuration may be used by heuristics that try to make predictions about missing fragments.
The absence of likely word interpretations would normally cause a failure of the process. The heuristics cause the insertion of likely word interpretations in the chart. After the insertion, normal chart processing activity is resumed. Point (d) means that the higher level component will tell the lower level component to “do its best” to find, in a given place, an instance of what was predicted: in the simplest case we can think of, the direct recovery of a word hypothesis scoring below the lower threshold.

So, in sum, we are talking of a chart that works outward from islands and makes sense of as much of the sentence as possible. Furthermore, where the signal was simply not even tentatively recognized, or we are in a place structurally far from islands, predictions can be made on the basis of the configuration and of a set of heuristics. After the application of these heuristics, and the introduction of new low-level hypotheses, the algorithm works on in the same way and, if the situation is not unrecoverable, concludes with one (or more) complete analyses of the sentence.

Compared with other techniques used for island parsing, such as those described in Woods (1982), what is presented here, besides providing an abstract mechanism for specifying control and reducing search spaces, yields a bridge between some speech research problems and some contemporary research in computational linguistics. This is because (monodirectional) chart parsing is considered a fundamental resource for parsing with a set of recent linguistic formalisms, especially because it can be very well combined with unification-based techniques, where order independence of operations is crucial (Shieber, 1986). On the other hand, it may be worth noting that island parsing can also give something to computational linguistics. A number of contemporary linguistic theories (e.g. head-driven phrase structure grammar, Sag & Pollard, 1987) emphasize the role of particular words acting as the head of a constituent (e.g. a verb in a verb phrase). As a matter of general parsing strategy it seems very interesting to couple the individuation of the head with an island mechanism that guarantees local control over the process, limiting the number of attempts by virtue of subcategorization as specified for the particular head in the lexicon.

In this paper we shall first briefly review the chart concept and some of its properties within a speech perspective. We shall then describe bidirectional charts and the behaviour of the proposed algorithm. In more detail, with the work described in Section 3, we aim at the ability to parse fragments of a sentence, as part of a general non-deterministic approach that maintains the aspects of efficiency typical of chart parsing. We want to constrain the analysis with the most reliable word hypotheses coming from the lower level of analysis, which can occur anywhere in the fragments. These hypotheses not only bind the possible (partial) interpretations assigned to the fragments, but also exclude a combinatorial explosion of attempts. A number of examples will be given to help make specific points clearer. In Section 4 we try to show how, given this general framework, it is possible to add heuristics that can help hypothesize the nature of the missing fragments. The insertion in the data structure of new elements predicted in this way causes the resumption of the chart process, possibly up to a complete analysis of the sentence. Some technical details, useful for a more efficient implementation, are discussed at the end of the paper, before the conclusions.

2. Chart parsing

Chart parsing is a very powerful idea for parsing natural language. It was introduced by Kay (1980) and Kaplan (1973) and historically was based on Earley’s algorithm (1970). The most basic goal in introducing the chart was to reduce the complexity of a non-deterministic parsing algorithm. Such an algorithm, in general, has to return all possible structures of a given sentence and therefore will have to go across all the different paths in the abstract tree of non-deterministic choice points. In following one particular path, the algorithm will proceed by evolving its context and building its structures in its own space. A sentence fragment may be analyzed in the same way in several different paths, which merely make a different use of the structures resulting from the analysis of the fragment. (This is typical, for instance, of algorithms based on simple backtracking: take, for instance, the analysis process of the sentence “I saw the man in the park with the telescope”.) A non-deterministic algorithm of this kind takes exponential time in the worst case since, generally speaking, the number of paths grows exponentially with the length of the sentence (see also Church & Patil, 1982). Introducing an algorithm that makes use of a table of recognized well-formed substrings avoids this and results in polynomial complexity. Chart parsing has several further good qualities that will be discussed shortly.


Chart parsing was introduced for context-free languages, but, in relation to parsing natural languages, it should be noted that: (a) a strong tendency in contemporary linguistics is to consider natural language as context-free (for instance bringing in metarules, as in GPSG (Gazdar & Pullum, 1982)); (b) techniques have been developed for treating what normally are considered non-context-free phenomena within the chart. This is the case of the so-called long distance dependencies, a phenomenon that is typical of relative and interrogative clauses, where a gap can occur in the structure far away from its filler.

We shall briefly review the main concepts. A more didactic presentation can be found in Thompson & Ritchie (1984). A chart is a directed graph representing the state of the parser. Given the input string, the junctures between words are called vertices and are represented as nodes in the chart. Each vertex has an arbitrary number of arcs, called edges, entering and leaving it. An edge is therefore a link between two vertices. In the classic chart definition an edge may be of two types: inactive or active. An inactive edge stands for a recognized constituent (the edge spans the words that are included). An active edge represents a partially recognized constituent. For an inactive edge there is a specification of the category of the constituent. In the case of an active edge, a rewriting rule in the grammar and a position in the right-hand side of that rule are provided, thus indicating what is still needed to complete the recognition of the constituent. If the rule is $R: C_0 \to C_1 \dots C_n$, there are specifications $R$ and $i$, with $0 \le i < n$, where $i$ is the position on the right-hand side of rule $R$. A word in the string is itself represented as an inactive edge connecting two adjoining vertices. An empty active edge is an active edge that spans no words and is therefore represented in the chart as a link cycling over one vertex. It means, at least in top-down parsing, a prediction of the application of a rule. A simple chart is shown in Fig. 1. Active edges are drawn as arcs in the lower part of the figure, and inactive edges on the upper side.
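One possible encoding of these classic chart objects is sketched below. The class and field names (`Rule`, `Inactive`, `Active`, `pos`) and the use of integers for vertices are illustrative assumptions, not the authors' data structures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    lhs: str               # C_0
    rhs: Tuple[str, ...]   # C_1 ... C_n


@dataclass(frozen=True)
class Inactive:
    start: int  # left vertex
    end: int    # right vertex
    cat: str    # category of the recognized constituent


@dataclass(frozen=True)
class Active:
    start: int
    end: int
    rule: Rule
    pos: int    # 0 <= pos < len(rule.rhs): symbols recognized so far

    def needed(self):
        """Category still needed immediately to the right."""
        return self.rule.rhs[self.pos]

    def is_empty(self):
        """An empty active edge cycles over a single vertex."""
        return self.start == self.end and self.pos == 0


NP = Rule("NP", ("DET", "N"))
words = [Inactive(0, 1, "DET"), Inactive(1, 2, "N")]  # lexical edges
prediction = Active(0, 0, NP, 0)  # empty active edge at vertex 0
partial = Active(0, 1, NP, 1)     # DET found, N still needed
```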

It is important to note that edges are only added to the chart, never removed. A new edge may be added in the following ways:

(1) Given an active edge A spanning from $V_a$ to $V_b$ and an inactive edge I spanning from $V_b$ to $V_c$, where A refers to rule R and to position i, and the category of I is just $C_{i+1}$, the (i+1)th symbol of the right-hand side of R, then a new edge E can be added to the chart, which will span from $V_a$ to $V_c$. If $C_{i+1}$ was the last symbol in R, E will be an inactive edge with category equal to the symbol on the left-hand side of R; if not, it will be an active edge with rule R and position i+1.

Figure 1. (Chart with an active edge labelled NP → DET N, position: 1, and an inactive edge labelled NP.)


In Fig. 1 the active edge with:

rule: NP → DET N; position: 1; from: $V_a$; to: $V_b$; $C_{i+1}$ = N

is combined with the inactive edge with:

category: N; from: $V_b$; to: $V_c$

yielding an inactive edge with:

category: NP; from: $V_a$; to: $V_c$

(2) Empty active edges are placed in the chart at particular points, according to the general strategy used. If the parser is a top-down one, when, given an active edge with rule R and position i that has reached the vertex V, there is a rule R’ with left-hand side equal to $C_{i+1}$, the (i+1)th symbol of the right-hand side of R, an empty active edge is introduced on the vertex V with rule R’, provided that one such edge is not already present on that vertex. If the parser is a bottom-up one, when, given an inactive edge with category C that has reached the vertex V, there is a rule R’ that has C as the first symbol of its right-hand side, an empty active edge is introduced on the vertex V with rule R’, provided that one such edge is not already present on that vertex.
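The two addition rules can be put together in a minimal monodirectional bottom-up chart parser. The sketch below makes several assumptions: edges are named tuples, vertices are integers, the empty active edge of rule (2) is placed on the vertex where the inactive edge starts (the usual bottom-up convention), and the three-rule toy grammar in the usage lines is invented for the example.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "lhs rhs")
Inactive = namedtuple("Inactive", "start end cat")
Active = namedtuple("Active", "start end rule pos")


def combine(active, inactive):
    """Addition rule (1): extend an active edge with an adjacent
    inactive edge of the needed category, or return None."""
    if active.end != inactive.start:
        return None
    if active.rule.rhs[active.pos] != inactive.cat:
        return None
    pos = active.pos + 1
    if pos == len(active.rule.rhs):  # last symbol: constituent complete
        return Inactive(active.start, inactive.end, active.rule.lhs)
    return Active(active.start, inactive.end, active.rule, pos)


def predict_bottom_up(inactive, grammar):
    """Addition rule (2), bottom-up case: one empty active edge for
    every rule whose right-hand side begins with the edge's category."""
    return [Active(inactive.start, inactive.start, rule, 0)
            for rule in grammar if rule.rhs[0] == inactive.cat]


def parse(words, grammar):
    """Edges are only ever added; the set prevents duplicated work."""
    chart, todo = set(), list(words)
    while todo:
        edge = todo.pop()
        if edge in chart:
            continue
        chart.add(edge)
        if isinstance(edge, Inactive):
            new = predict_bottom_up(edge, grammar)
            new += [combine(a, edge) for a in chart
                    if isinstance(a, Active)]
        else:
            new = [combine(edge, i) for i in chart
                   if isinstance(i, Inactive)]
        todo += [e for e in new if e is not None]
    return chart


# Toy grammar and input, invented for the example.
grammar = [Rule("NP", ("DET", "N")), Rule("S", ("NP", "V"))]
words = [Inactive(0, 1, "DET"), Inactive(1, 2, "N"), Inactive(2, 3, "V")]
chart = parse(words, grammar)
```

With this input the chart ends up containing an inactive S edge spanning vertices 0 to 3.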

The whole process of parsing aims at getting one or more (if the sentence is ambiguous) inactive edges to span the whole string, with category S, the distinguished initial symbol in the grammar. As mentioned above, one of the great advantages of chart parsing is that, whatever the strategy adopted, work is never duplicated. For example, if an active edge reaches a vertex from where an inactive edge starts, and if the edge addition rule (1) can be applied, then the analysis takes advantage of previous partial analyses. Another advantage is that the mechanism is perfectly suited for both bottom-up and top-down parsing, depending only on the form of the addition rule (2). A further advantage is that the chart can be supplemented with an agenda. In this way, instead of introducing new edges following the rigid application of the algorithm, tasks can be added to the agenda and at each stage a scheduling function can decide the order in which tasks should be performed. The scheduling function can very easily implement depth-first control and breadth-first control, but any kind of control can in principle be inserted (see for instance Stock, 1987). Also the input relation with other levels of analysis is very coherent: lexical ambiguity results in the very simple fact that more than one inactive edge is introduced for one ambiguous word.
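The separation between agenda and scheduling function might look like the sketch below. The class `Agenda`, the three scheduling functions and the representation of tasks as (score, name) pairs are assumptions made for the illustration. Here the tasks are all queued up front, so the selection order is the only thing shown.

```python
from collections import deque


class Agenda:
    """Tasks wait here until the scheduling function selects one."""

    def __init__(self, select):
        self.tasks = deque()
        self.select = select  # callable that removes and returns a task

    def add(self, task):
        self.tasks.append(task)

    def next(self):
        return self.select(self.tasks)

    def __bool__(self):
        return bool(self.tasks)


def depth_first(tasks):
    return tasks.pop()        # most recently added task first


def breadth_first(tasks):
    return tasks.popleft()    # oldest task first


def best_first(tasks):
    best = max(tasks)         # highest score first
    tasks.remove(best)
    return best


def run(select, tasks):
    """Drain an agenda and report the order in which tasks were chosen."""
    agenda = Agenda(select)
    for task in tasks:
        agenda.add(task)
    order = []
    while agenda:
        order.append(agenda.next())
    return order


tasks = [(0.5, "t1"), (0.9, "t2"), (0.7, "t3")]
```

Swapping `depth_first` for `best_first` changes the control regime without touching the edge addition rules, which is the point of keeping the agenda separate.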

It is worth noting that the sophistication of the basic top-down and bottom-up control schemata can be enhanced in a number of ways, so as to result in better performance (smaller number of useless edges introduced in the chart). Wiren (1987) compares a number of proposals, while Stock (1986) proposes a mixed strategy approach.


3. Bidirectionality

Chart parsing has a positive aspect and some evident problems in facing speech recognition. Typical of continuous speech recognition are the following aspects:

(1) The separation between words is not unequivocally given; one of the tasks of the sentence parser is precisely to come up with suggestions for word separations. In the chart this can be accomplished very well by introducing more vertices, one for each hypothetical separation point. Vertices must be ordered and here ordering is achieved by means of the time order relation. We can therefore introduce a vertex structure.

$$V_{t_0}, \dots, V_{t_i}, \dots, V_{t_n}, \qquad t_i < t_{i+1}, \quad 0 \le i \le n-1,$$

where at least one lexical edge arrives or leaves for every vertex. The final analysis does not necessarily “make use” of all the vertices in the chart.

(2) Some words in the input matrix are anchored as “surely” recognized while others are only very tentative interpretations. It makes sense that the analysis should prefer elements of the first type as starting points. This is the concept of island parsing, in which the parser tries to make sense of portions of a sentence starting from fixed points (islands) that can occur in any position. The chart mechanism as described in Section 2 cannot handle this kind of task. Island parsing has to get to the extreme borders of the recognizable fragments, and from that position help to make suggestions for the unrecognized fragments based on both the left and the right contexts. Here again the chart mechanism described in Section 2 cannot handle this task.
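The time-ordered vertex structure of point (1) can be derived from a word lattice roughly as follows. The function name `build_vertices`, the triple format of the hypotheses and the sample words are assumptions. The sketch also assumes that hypotheses meeting at a boundary carry exactly the same time stamp, which real lattices would only satisfy after some alignment step.

```python
def build_vertices(hypotheses):
    """hypotheses: list of (word, start_time, end_time) triples.
    Returns the ordered vertex times and the lexical edges expressed
    as (from_vertex, to_vertex, word)."""
    # One vertex per distinct hypothetical separation point, in time order.
    times = sorted({t for _, start, end in hypotheses for t in (start, end)})
    index = {t: i for i, t in enumerate(times)}
    edges = [(index[start], index[end], word)
             for word, start, end in hypotheses]
    return times, edges


# Hypothetical lattice: "aboard" competes with "a" followed by "board".
hyps = [("a", 0.0, 0.1), ("aboard", 0.0, 0.5),
        ("board", 0.1, 0.5), ("sank", 0.5, 0.9)]
times, edges = build_vertices(hyps)
```

Every vertex has at least one lexical edge arriving or leaving by construction, and an analysis that uses "aboard" simply never makes use of the vertex at 0.1.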

The concept of bidirectional chart will now be introduced. Steel & De Roeck (1987), independently of the present work, have introduced a similar idea (originating from a suggestion of H. Thompson), as far as the basic data structure is concerned. One major difference is in their goals: rather than addressing the problem of speech recognition, or of any kind of ill-formed input, they address some specific linguistic phenomena, like coordination, that may pose problems with monodirectional chart parsing. The consequence is that they do not have the concept of islands that can dynamically occur in any position of the sentence, and all the problems that derive from that assumption. In particular they do not address the problem of merging two different attempts to recognize a fragment into one that subsumes both attempts. The subsumption relation in edge sets, proposed in a general way, is also an original idea of the present work. Finally, heuristics of the kind suggested here are specific to the fragmentary input problem.

Let us now give the main ideas of the data structures involved in our approach. Here an active edge is a data structure that includes two positions in the rule involved (an initial position and a final position), such that a given edge spans a sub-string of the sentence with reference to a fragment of the right-hand side of the rule. The active edge of Fig. 2 has:

rule: NP → DET ADJ N; fromposition: 1; toposition: 2; from: $V_b$; to: $V_c$; sub-inactives: (#2)


Figure 2. (Lexical edges #1 DET, #2 ADJ and #3 N, with an active edge for NP → DET ADJ N having fromposition: 1, toposition: 2.)

It spans the substring “red” with reference to the fragment ADJ on the right-hand side of the rule. Therefore an active edge is characterized by from, the left vertex; to, the right vertex; rule, the referred rule; fromposition, the first of the two positions in the rule; toposition, the second of the positions; and sub-inactives, the list of the immediately spanned inactive edges that were included. The fromposition and toposition notation corresponds to a notation with two dots in the right-hand side of the rule, where what is to the right of the rightmost dot and what is to the left of the leftmost dot is still needed. Inactive edges are characterized as usual, by from, to and cat, the category.
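A bidirectional active edge with its two rule positions could be represented as in the sketch below. Since `from` is a reserved word in Python, the fields are renamed `start`, `end`, `frompos` and `topos`; these names, the integer vertices and the helper methods are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    lhs: str
    rhs: Tuple[str, ...]


@dataclass(frozen=True)
class Active:
    start: int              # "from": the left vertex
    end: int                # "to": the right vertex
    rule: Rule
    frompos: int            # fromposition
    topos: int              # toposition
    subs: Tuple[str, ...]   # sub-inactives, as edge names

    def needed_left(self):
        """Symbols to the left of the leftmost dot."""
        return self.rule.rhs[:self.frompos]

    def needed_right(self):
        """Symbols to the right of the rightmost dot."""
        return self.rule.rhs[self.topos:]

    def dotted(self):
        """Two-dot rendering of the rule."""
        rhs = list(self.rule.rhs)
        rhs.insert(self.topos, ".")    # insert the right dot first
        rhs.insert(self.frompos, ".")
        return f"{self.rule.lhs} -> {' '.join(rhs)}"


NP = Rule("NP", ("DET", "ADJ", "N"))
# The edge of Fig. 2, with vertices V_b and V_c numbered 1 and 2.
fig2 = Active(start=1, end=2, rule=NP, frompos=1, topos=2, subs=("#2",))
```

For the edge of Fig. 2, `fig2.dotted()` renders the rule with ADJ between the two dots, DET still needed on the left and N on the right.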

It may now be stated that an active edge E is locally rightward largest if there is no other active edge E’ with from(E’) = from(E), rule(E’) = rule(E), fromposition(E’) = fromposition(E) and sub-inactives(E’) including as an initial substring sub-inactives(E). Likewise, we can define a locally leftward largest edge. In Fig. 3 the active edge with:

rule: NP → DET ADJ N; fromposition: 0; toposition: 2; from: $V_a$; to: $V_c$; sub-inactives: (#1 #2)

is locally leftward largest.

Figure 3. (Lexical edges #1 DET, #2 ADJ and #3 N, with an active edge for NP → DET ADJ N having fromposition: 0, toposition: 2.)
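The two "locally largest" predicates can be written directly from the definition. In this sketch edges are named tuples, rules are plain strings, and the leftward case is taken to be the mirror image of the rightward one (same right vertex, same toposition, sub-inactives included as a final substring); that mirroring is an assumption, since the text only says "likewise".

```python
from collections import namedtuple

Active = namedtuple("Active", "start end rule frompos topos subs")


def rightward_largest(e, chart):
    """No other edge starts where e starts, for the same rule and
    fromposition, with sub-inactives that begin with those of e."""
    return not any(
        o != e and o.start == e.start and o.rule == e.rule
        and o.frompos == e.frompos
        and o.subs[:len(e.subs)] == e.subs
        for o in chart)


def leftward_largest(e, chart):
    """Mirror image: same right vertex, rule and toposition, with
    sub-inactives that end with those of e."""
    return not any(
        o != e and o.end == e.end and o.rule == e.rule
        and o.topos == e.topos
        and len(o.subs) >= len(e.subs)
        and o.subs[len(o.subs) - len(e.subs):] == e.subs
        for o in chart)


rule = "NP -> DET ADJ N"
small = Active(1, 2, rule, 1, 2, ("#2",))       # edge of Fig. 2
big = Active(0, 2, rule, 0, 2, ("#1", "#2"))    # edge of Fig. 3
chart = [small, big]
```

With both edges in the chart, the edge of Fig. 3 is locally leftward largest, while the edge of Fig. 2 no longer is.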


In the following we shall use for our examples this simple grammar (for simplicity’s sake we do not express the right-hand side of the rules in a more compact way):

S → NP V NP PP
NP → DET ADJ N
NP → DET N S’
S’ → RELPRON NP V
PP → PREP NP

As we shall see, the edges involved in edge combination rules may be one active and one inactive or both active. Every time we are to combine two edges, we must perform a redundancy check, to ensure that our present action will not lead to a result (an analysis of a constituent) that will also be obtained in a different way. We call this check the r/check.

r/Check

If the edges involved in an operation, say $E_1$ the leftmost edge and $E_2$ the rightmost edge, are both active and, respectively, $E_1$ is locally leftward largest and $E_2$ is locally rightward largest, we shall say that r/check($E_1$,$E_2$) is positive. If $E_1$ is active and $E_2$ is inactive (symmetrically in the other case) and $E_1$ is locally leftward largest and there is no active edge A such that rule(A) = rule($E_1$), fromposition(A) = toposition($E_1$) and sub-inactives(A) includes as the first element $E_2$, then we shall say that r/check($E_1$,$E_2$) is positive. In all the other cases the r/check is negative.
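One reading of the r/check is sketched below. It is tentative: the active/inactive case follows the condition as stated above, the symmetric case is mirrored by assumption, and the edge names, the `name` field of inactive edges and the string encoding of rules are all illustrative.

```python
from collections import namedtuple

Active = namedtuple("Active", "start end rule frompos topos subs")
Inactive = namedtuple("Inactive", "start end cat name")


def rightward_largest(e, actives):
    return not any(o != e and o.start == e.start and o.rule == e.rule
                   and o.frompos == e.frompos
                   and o.subs[:len(e.subs)] == e.subs for o in actives)


def leftward_largest(e, actives):
    return not any(o != e and o.end == e.end and o.rule == e.rule
                   and o.topos == e.topos and len(o.subs) >= len(e.subs)
                   and o.subs[len(o.subs) - len(e.subs):] == e.subs
                   for o in actives)


def r_check(e1, e2, chart):
    """e1 is the leftmost edge, e2 the rightmost; True means positive."""
    actives = [e for e in chart if isinstance(e, Active)]
    a1, a2 = isinstance(e1, Active), isinstance(e2, Active)
    if a1 and a2:
        return leftward_largest(e1, actives) and rightward_largest(e2, actives)
    if a1:  # e2 is inactive
        return leftward_largest(e1, actives) and not any(
            a.rule == e1.rule and a.frompos == e1.topos
            and a.subs[:1] == (e2.name,) for a in actives)
    if a2:  # e1 is inactive: mirrored condition (assumption)
        return rightward_largest(e2, actives) and not any(
            a.rule == e2.rule and a.topos == e2.frompos
            and a.subs[-1:] == (e1.name,) for a in actives)
    return False


# Rule L -> A B C with adjacent inactive edges B and C, each of which
# has already produced its own active edge.
r = "L -> A B C"
b, c = Inactive(0, 1, "B", "#1"), Inactive(1, 2, "C", "#2")
act_b = Active(0, 1, r, 1, 2, ("#1",))
act_c = Active(1, 2, r, 2, 3, ("#2",))
chart = [b, c, act_b, act_c]
```

In this configuration the check lets the two active edges combine but blocks the combination of `act_b` with the inactive edge `c`, so the same result is not built twice.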

We maintain the usual edge combination rule (we call it A-I because it combines an active edge and an inactive edge) with the extension to the two directions.

A-I rule

Given an active edge A and an inactive edge I, with from(I) = to(A), and having named i = toposition(A), with $i \neq n$ (the number of symbols in the right-hand side of the rule), cat(I) = $C_{i+1}$, the (i+1)th symbol of the right-hand side of rule(A), and given that r/check(A,I) is positive, then a new edge E can be added to the chart, with from(E) = from(A), to(E) = to(I). If i+1 = n (that is, $C_{i+1}$ was the last symbol in rule(A)) and fromposition(A) = 0, E will be an inactive edge with cat(E) equal to the left-hand side of rule(A); if not, it will be an active edge with rule(E) = rule(A), fromposition(E) = fromposition(A) and toposition(E) = i+1.

Similarly, if to(I) = from(A), and having named i = fromposition(A), with $i \neq 0$, cat(I) = $C_i$, the ith symbol of the right-hand side of rule(A), then a new edge E can be added to the chart, with from(E) = from(I), to(E) = to(A). If i-1 = 0 and toposition(A) is equal to the length of the right-hand side of rule(A), E will be an inactive edge with cat(E) equal to the left-hand side of rule(A); if not, it will be an active edge with rule(E) = rule(A), fromposition(E) = i-1 and toposition(E) = toposition(A).

Figure 4. (Inactive edges #1 NP and #2 PP, with an active edge for S → NP V NP PP having toposition: 4.)

In Fig. 4, the active edge with:

rule: S → NP V NP PP; fromposition: 3; toposition: 4; from: $V_d$; to: $V_e$; sub-inactives: (#2)

is combined through the A-I rule with the inactive edge with:

category: NP; from: $V_c$; to: $V_d$

yielding the active edge with:

rule: S → NP V NP PP; fromposition: 2; toposition: 4; from: $V_c$; to: $V_e$; sub-inactives: (#1 #2)
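The two directions of the A-I rule can be sketched as a single function. The tuple encoding, the integer vertices and the helper `finish` are assumptions, the r/check is taken to have been performed by the caller, and a newly created inactive edge gets `None` as its name because naming is left to whatever code maintains the chart.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "lhs rhs")
Active = namedtuple("Active", "start end rule frompos topos subs")
Inactive = namedtuple("Inactive", "start end cat name")


def finish(start, end, rule, frompos, topos, subs):
    """Inactive edge if the whole right-hand side is covered,
    otherwise a (larger) active edge."""
    if frompos == 0 and topos == len(rule.rhs):
        return Inactive(start, end, rule.lhs, None)
    return Active(start, end, rule, frompos, topos, subs)


def a_i(active, inactive):
    """A-I rule in both directions; returns the new edge or None.
    The r/check is assumed to be positive."""
    rule, rhs = active.rule, active.rule.rhs
    # Rightward: the inactive edge starts where the active edge ends
    # and has the category just after the rightmost dot.
    if (inactive.start == active.end and active.topos < len(rhs)
            and rhs[active.topos] == inactive.cat):
        return finish(active.start, inactive.end, rule, active.frompos,
                      active.topos + 1, active.subs + (inactive.name,))
    # Leftward: the inactive edge ends where the active edge starts
    # and has the category just before the leftmost dot.
    if (inactive.end == active.start and active.frompos > 0
            and rhs[active.frompos - 1] == inactive.cat):
        return finish(inactive.start, active.end, rule, active.frompos - 1,
                      active.topos, (inactive.name,) + active.subs)
    return None


# The example of Fig. 4, with V_c, V_d, V_e numbered 2, 3, 4.
S = Rule("S", ("NP", "V", "NP", "PP"))
pp_edge = Active(3, 4, S, 3, 4, ("#2",))
np_edge = Inactive(2, 3, "NP", "#1")
result = a_i(pp_edge, np_edge)
```

Here `result` is the active edge with fromposition 2, toposition 4 and sub-inactives (#1 #2), as in the worked example.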

Let us now recall our classification of word hypotheses into three classes, say a, b, c, in relation to their scores. As stated earlier, we consider word hypotheses of class a the islands for our process. The algorithm will proceed outward from the islands and bottom-up when a constituent including an island is completed. It is important to note that as the process is entirely bottom-up starting from islands, a necessary condition for each constituent to be worked out is that at least one island is (directly or indirectly) subsumed by it. This ensures reliability for a fragment, and constrains, as we shall see, hypotheses on other fragments. A consequence of this aspect is that the higher the branching factor in the grammar, at the level of rules that are rewritten with lexical categories, the fewer islands are needed.

We can now state the rule that causes the introduction of new edges within the bottom-up strategy.


I/bu rule

Whenever an inactive edge I is introduced in the chart, a new active edge is introduced for every rule R in the grammar that includes on its right-hand side the symbol cat(I) and, in relation to R, for every position i such that cat(I) is the (i+1)th symbol on the right-hand side of R. Let us denote such a generic active edge as A; its characteristics will be from(A) = from(I), to(A) = to(I), rule(A) = R, fromposition(A) = i, toposition(A) = i+1, sub-inactives(A) = list(I).

In Fig. 5, from the inactive edge with:

category: PP; from: $V_d$; to: $V_e$

by applying the I/bu rule an active edge is generated with:

rule: S → NP V NP PP; fromposition: 3; toposition: 4; from: $V_d$; to: $V_e$; sub-inactives: (#1)
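The I/bu rule amounts to a scan of every right-hand side for the category of the new inactive edge. The sketch below uses the example grammar given earlier in this section; the tuple encoding, the integer vertices and the function name `i_bu` are assumptions. Rules with a single right-hand-side symbol would need special treatment and are not handled here.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "lhs rhs")
Active = namedtuple("Active", "start end rule frompos topos subs")
Inactive = namedtuple("Inactive", "start end cat name")

GRAMMAR = [
    Rule("S", ("NP", "V", "NP", "PP")),
    Rule("NP", ("DET", "ADJ", "N")),
    Rule("NP", ("DET", "N", "S'")),
    Rule("S'", ("RELPRON", "NP", "V")),
    Rule("PP", ("PREP", "NP")),
]


def i_bu(inactive, grammar):
    """One active edge per rule R and per position i such that
    cat(I) is the (i+1)th symbol of the right-hand side of R."""
    return [Active(inactive.start, inactive.end, rule, i, i + 1,
                   (inactive.name,))
            for rule in grammar
            for i, symbol in enumerate(rule.rhs)
            if symbol == inactive.cat]


# The example of Fig. 5, with the PP edge placed on vertices 3 and 4.
pp = Inactive(3, 4, "PP", "#1")
from_pp = i_bu(pp, GRAMMAR)

# An NP occurs in four places in the grammar, so it yields four edges.
np = Inactive(0, 1, "NP", "#1")
from_np = [(e.rule.lhs, e.frompos) for e in i_bu(np, GRAMMAR)]
```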

It may be noted that something is needed to take care of the particular case of cycling recursion for rules with a single element in the right-hand side, such as A → B, B → A. This case would cause an infinite number of applications of the I/bu rule. The undesired behaviour can be blocked with techniques similar to those used for the well known case of left recursion, which will not be discussed in further detail here.

At this point the mechanism has one major shortcoming. When starting from different islands that should eventually yield the same constituent, the process would cause a lot of redundancy. For instance assume that we have a rule r: L → A B C, and that we have an inactive edge of category B coming from the left into a vertex V and an inactive edge of category C coming from the right into the vertex V. Each edge may cause the introduction of an active edge with rule r, as specified in the I/bu rule. But then, applying the A-I rule in the two cases, combining an active edge with its neighboring inactive edge to its right or to its left, two instances of exactly the same active edge would be produced.

Figure 5. (Inactive edge #1 PP, with an active edge for S → NP V NP PP having toposition: 4.)


Furthermore, there is no way to merge two attempts, originating from two different islands, to recognize the same constituent into a single attempt. As a consequence, work would be done that is predictably redundant, and partial structures would be built that eventually can be used only to produce a copy of what we have already obtained. The presence of an operation for merging attempts (in chart terms, producing a new edge out of two given active edges), together with the redundancy check specified before, is the key point in guaranteeing that the complexity of the algorithm is the same as that of the basic chart algorithm with a CFG, $O(n^3)$ time and $O(n^2)$ space (Aho & Ullman, 1972).

Let us now define this new basic operation for introducing a new edge in the chart, which we shall call the A-A rule.

A-A rule

If we have two active edges, $A_1$ and $A_2$, with to($A_1$) = from($A_2$), rule($A_1$) = rule($A_2$), toposition($A_1$) = fromposition($A_2$), and r/check($A_1$,$A_2$) is positive, then we can introduce a new active edge $A_3$ into the chart with from($A_3$) = from($A_1$), to($A_3$) = to($A_2$), rule($A_3$) = rule($A_1$), fromposition($A_3$) = fromposition($A_1$), toposition($A_3$) = toposition($A_2$), sub-inactives($A_3$) = concat(sub-inactives($A_1$), sub-inactives($A_2$)), where concat is the usual string concatenation operator.

If fromposition($A_3$) = 0 and toposition($A_3$) = n, the number of symbols in the right-hand side of rule($A_3$), an inactive edge I is introduced instead, with from(I) = from($A_3$), to(I) = to($A_3$) and cat(I) equal to the left-hand side of rule($A_3$).

As an example, let us refer to Fig. 6. Starting bottom-up from island 1 we obtain the active edge #5 with:

rule: S → NP V NP PP; fromposition: 1; toposition: 2; from: $V_b$; to: $V_c$; sub-inactives: (#2)

Starting bottom-up from island 2 the inactive edge #1 is obtained, with:

category: NP; from: $V_a$; to: $V_b$

Combining edge #1 and edge #5, the active edge #6 is obtained, with:

rule: S → NP V NP PP; fromposition: 0; toposition: 2; from: $V_a$; to: $V_c$; sub-inactives: (#1 #2)

Figure 6. (Chart for the example, showing the islands and the active edges #5, #6, #7 and #8, all for rule S → NP V NP PP; edge #6 has fromposition: 0, toposition: 2 and edge #8 has fromposition: 2.)

Starting bottom-up from island 3 the active edge #7 is obtained, with:

rule: S → NP V NP PP; fromposition: 3; toposition: 4; from: $V_d$; to: $V_e$; sub-inactives: (#4)

Similarly, from island 4 the inactive edge #3 is obtained with:

category: NP; from: $V_c$; to: $V_d$

This edge could be combined both with edges #6 and #7. Let us assume that control gives priority to the latter combination. Thus the combination of edge #3 with edge #7 generates the active edge #8 with:

rule: S → NP V NP PP; fromposition: 2; toposition: 4; from: $V_c$; to: $V_e$; sub-inactives: (#3 #4)

Combining edges #6 and #8 through the A-A rule the edge #9 is obtained with:

category: S; from: $V_a$; to: $V_e$
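The last step of this example, merging edges #6 and #8, can be reproduced with a small sketch of the A-A rule. As before, the tuple encoding and integer vertices ($V_a$ to $V_e$ numbered 0 to 4) are assumptions, the r/check is taken to be positive, and the resulting inactive edge is left unnamed.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "lhs rhs")
Active = namedtuple("Active", "start end rule frompos topos subs")
Inactive = namedtuple("Inactive", "start end cat name")


def a_a(a1, a2):
    """A-A rule: merge two adjacent attempts at the same rule.
    Returns the new edge, or None if the edges do not fit together."""
    if not (a1.end == a2.start and a1.rule == a2.rule
            and a1.topos == a2.frompos):
        return None
    if a1.frompos == 0 and a2.topos == len(a1.rule.rhs):
        # The whole right-hand side is covered: inactive edge instead.
        return Inactive(a1.start, a2.end, a1.rule.lhs, None)
    return Active(a1.start, a2.end, a1.rule, a1.frompos, a2.topos,
                  a1.subs + a2.subs)


S = Rule("S", ("NP", "V", "NP", "PP"))
e5 = Active(1, 2, S, 1, 2, ("#2",))
e6 = Active(0, 2, S, 0, 2, ("#1", "#2"))
e8 = Active(2, 4, S, 2, 4, ("#3", "#4"))
e9 = a_a(e6, e8)        # inactive S edge spanning the whole string
partial = a_a(e5, e8)   # still lacks the initial NP, so stays active
```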

Let us briefly discuss control in the present framework. It is convenient to use an agenda in which tasks are accumulated, scheduled and removed, so that control is kept separate. At the beginning of the process the agenda is filled with all the tasks originated by the class a word hypotheses. Thereafter, the application of the rules causes the introduction of new tasks in the agenda, instead of an immediate execution. It seems reasonable, however, that the application of the I/bu rule is not subject to this discipline. Rather, the effects of the rule are immediately brought into the chart.

Note also that working with an agenda, it may become useful to perform a second r/check before executing a task, in order to take into account the fact that new edges may have been added to the chart since the task was established in the agenda.

4. Bidirectional charts and predictions

Processing based on lexical hypotheses with scores above the minimal threshold (class a and class b words) may not lead to the recognition of a whole sentence, even if originally grammatical. This happens when some substrings of the sentence were not recognized by the acoustic-phonetic analyzer (class c words). This happens also if what should be interpreted as a constituent has no islands among its subconstituents: as in the process there is no top-down component, the constituent would never be generated. This seems correct, because the possible constituent would be structurally “too far” from the islands, and therefore unreliable. In the following discussion we shall assume that a missing fragment in fact consists of a single word. Of course this is not good enough for the general cases, but here we are just talking of heuristics relying exclusively on the syntactic configuration of the recognized fragments, and also these heuristics may be consistently extended. Should other aspects (such as semantics) of the recognized fragments, or partial low level information of the missing fragments be considered, then predictions might be possibly made for missing fragments larger than single words. The presence of unrecognized substrings will mean that at the critical vertices, i.e. the vertices adjacent to the unrecognized substring, the process cannot go on.

We shall now introduce some heuristics to help us make predictions about the missing fragment, so that the lower level processor can do its best to comply with this prediction and possibly yield a new word hypothesis to be introduced in the chart. This hypothesis is to be considered as a class b word. The process will then proceed as specified in Section 3. The example in Fig. 7 clarifies the meaning of the first of these heuristics. The presence of the two active edges suggests that the word missing between the critical vertices $V_b$ and $V_c$ has ADJ category.


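Figure 7. An active edge with rule NP → DET ADJ N (fromposition: 0, toposition: 1) enters the critical vertex V_c from the left; an active edge with the same rule NP → DET ADJ N (fromposition: 2) exits the critical vertex V_c' to the right.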

Formally we can state this general concept as follows:

Heuristics 1

If at the critical vertex V an active edge A enters from the left with rule(A) = R and toposition = n, and in the next critical vertex an edge A' with rule(A') = R and fromposition = n + 1 exits to the right, then a word with category corresponding to the (n + 1)th symbol of the right-hand side of rule R is missing at vertex V.
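As an illustration only, Heuristics 1 might be sketched in Python as follows. The names Rule, ActiveEdge and heuristic_1 are assumptions introduced for this sketch and are not part of the original formulation; positions count symbols of the right-hand side, as in Fig. 7.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    lhs: str
    rhs: Tuple[str, ...]


@dataclass(frozen=True)
class ActiveEdge:
    # Hypothetical minimal edge record: only the fields used by the heuristic.
    rule: Rule
    fromposition: int
    toposition: int


def heuristic_1(entering: ActiveEdge, exiting: ActiveEdge) -> Optional[str]:
    """Return the category predicted for the gap, or None if the heuristic does not apply.

    `entering` ends at the left critical vertex, `exiting` starts at the next one.
    """
    n = entering.toposition
    if entering.rule != exiting.rule:
        return None
    if exiting.fromposition != n + 1:
        return None
    if not 0 <= n < len(entering.rule.rhs):
        return None
    # The (n+1)th symbol in the text's 1-based counting is rhs[n] here.
    return entering.rule.rhs[n]


np_rule = Rule("NP", ("DET", "ADJ", "N"))
left = ActiveEdge(np_rule, fromposition=0, toposition=1)
# The toposition of the right edge is not used by the heuristic; 3 is an assumed value.
right = ActiveEdge(np_rule, fromposition=2, toposition=3)
print(heuristic_1(left, right))  # ADJ
```

With the two edges of Fig. 7 the sketch predicts ADJ, in agreement with the discussion above.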

Let us now give two definitions derived from a concept due to Kay (Kay, 1980) for monodirectional charts, which we shall use in the following heuristics. Let X be a symbol (a category) and Y_1 be a rule. Let us denote by LY_i the left-hand side of a rule Y_i, by RY_i its right-hand side, and by RY_i(k) the kth symbol in the right-hand side of rule Y_i.

Definition. X is reachable from the left from Y_1 iff there exists a finite chain of rules Y_1 ... Y_n such that X is the first symbol of the right-hand side of Y_n and, for every i, 2 ≤ i ≤ n, the symbol LY_i, left-hand side of Y_i, is the first symbol of RY_{i-1}, right-hand side of Y_{i-1}.

For example, X is reachable from the left from Y_1 if:

rule Y_1: A → B C ...
rule Y_2: B → D E ...
rule Y_3: D → F G ...
...
rule Y_n: V → X ...

Definition. X is reachable from the right from Y_1 iff there exists a finite chain of rules Y_1 ... Y_n such that X is the last symbol of the right-hand side of Y_n and, for every i, 2 ≤ i ≤ n, the symbol LY_i, left-hand side of Y_i, is the last symbol of RY_{i-1}, right-hand side of Y_{i-1}.

For example, X is reachable from the right from Y_1 if:

rule Y_1: A → B ... C
rule Y_2: C → D ... E
rule Y_3: E → F ... G
...
rule Y_n: V → W ... X
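The two reachability relations can be checked by following first (or last) right-hand-side symbols through the grammar. The following Python fragment is a minimal sketch under our own assumptions: rules are represented as (left-hand side, right-hand side) pairs, and the function name reachable and the three-rule toy grammar are hypothetical.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


def reachable(x: str, start: Rule, grammar: List[Rule], side: str = "left") -> bool:
    """True if symbol x is reachable from rule `start`, from the left or from the right."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    corner_index = 0 if side == "left" else -1
    seen: Set[Rule] = set()
    pending = [start]
    while pending:
        rule = pending.pop()
        if rule in seen or not rule[1]:
            continue
        seen.add(rule)
        corner = rule[1][corner_index]  # first or last symbol of the right-hand side
        if corner == x:
            return True
        # Continue the chain with every rule whose left-hand side is that symbol.
        pending.extend(r for r in grammar if r[0] == corner)
    return False


# Toy chain in the spirit of the examples above (concrete symbols are assumed).
y1: Rule = ("A", ("B", "C"))
y2: Rule = ("B", ("D", "E"))
y3: Rule = ("D", ("X", "G"))
grammar = [y1, y2, y3]

print(reachable("X", y1, grammar, "left"))   # True
print(reachable("X", y1, grammar, "right"))  # False
```

The set of visited rules guarantees termination on recursive grammars, since each rule is expanded at most once.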


Let us also define the length predicate for a string.

Definition. length(x) is the number of symbols in x.

We are now ready to introduce heuristics 2, which brings in recursion. Let us first give an example, before stating things formally.

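Figure 8. The inactive edges DET and ADJ are spanned by an active edge with rule NP → DET ADJ N that enters the critical vertex V_c; an inactive edge V is spanned by an active edge with rule S → NP V NP PP (fromposition: 1, toposition: 2) that exits the critical vertex V_c'.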

In Fig. 8, in the critical vertex V_c, an active edge enters from the left with:

rule α: NP → DET ADJ N
fromposition: 0   toposition: 2
from: V_i   to: V_c
Rα: DET ADJ N   Lα: NP   length(Rα): 3

From the critical vertex V_c' an active edge exits with:

rule β: S → NP V NP PP
fromposition: 1   toposition: 2
from: V_c'   to: V_j
Rβ: NP V NP PP   Lβ: S

It may be noted that Rβ(1) = Lα = NP. Therefore it can be hypothesized that a word with category Rα(3) = N is missing between vertices V_c and V_c'. We put this idea formally as follows.

Heuristics 2

If at the critical vertex V an active edge A enters with rule(A) = α, toposition = n and length(Rα) = n + 1, and from the next critical vertex an active edge A' exits to the right with rule(A') = β and fromposition = m, and either Rβ(m) = Lα, or there exists a rule λ with Lλ = Rβ(m) and Lα is reachable from the right from λ, then a word with category corresponding to Rα(n + 1) is missing at vertex V.

Another example is given in Fig. 9. The active edge with:

rule α: S' → RELPRON NP V
fromposition: 0   toposition: 2
from: V_i   to: V_c
Rα: RELPRON NP V   Lα: S'   Rα(3): V

enters the critical vertex V_c. From the critical vertex V_c' the following active edge exits:

rule β: S → NP V NP PP
fromposition: 1   toposition: 2
from: V_c'   to: V_j
Rβ(1): NP

There is a rule NP → DET N S', which we call λ, such that:

Lλ = Rβ(1)

and Lα = S' is reachable from the right from λ; in fact:

NP → DET N S'
S' → RELPRON NP V

Therefore a word with category V is supposedly missing between vertices V_c and V_c'. There is an analogous heuristic that operates symmetrically, reversing the situations at the two critical vertices. Let us now consider the case of a missing fragment at the end or at the beginning of the sentence.

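Figure 9. The inactive edges RELPRON and NP are spanned by an active edge with rule S' → RELPRON NP V that enters the critical vertex V_c; an inactive edge V is spanned by an active edge with rule S → NP V NP PP (fromposition: 1) that exits the critical vertex V_c'.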
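Heuristics 2 can be illustrated with a short Python sketch, applied here to the configurations of Figs 8 and 9. All names (Rule, ActiveEdge, reachable_from_right, heuristic_2) and the four-rule grammar are assumptions made for the sketch. The predicted category is taken to be Rα(n + 1), the single symbol of α still missing, as in the two worked examples.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]


class ActiveEdge(NamedTuple):
    rule: Rule
    fromposition: int
    toposition: int


def reachable_from_right(x: str, start: Rule, grammar: List[Rule]) -> bool:
    """Follow last right-hand-side symbols from `start` until x is found."""
    seen, pending = set(), [start]
    while pending:
        rule = pending.pop()
        if rule in seen or not rule.rhs:
            continue
        seen.add(rule)
        last = rule.rhs[-1]
        if last == x:
            return True
        pending.extend(r for r in grammar if r.lhs == last)
    return False


def heuristic_2(entering: ActiveEdge, exiting: ActiveEdge,
                grammar: List[Rule]) -> Optional[str]:
    alpha, n = entering.rule, entering.toposition
    beta, m = exiting.rule, exiting.fromposition
    if len(alpha.rhs) != n + 1:          # exactly one symbol of alpha is missing
        return None
    if not 1 <= m <= len(beta.rhs):
        return None
    expected = beta.rhs[m - 1]           # R-beta(m), counted from 1 in the text
    direct = expected == alpha.lhs
    via_chain = any(lam.lhs == expected
                    and reachable_from_right(alpha.lhs, lam, grammar)
                    for lam in grammar)
    return alpha.rhs[n] if direct or via_chain else None  # R-alpha(n+1)


s_rule = Rule("S", ("NP", "V", "NP", "PP"))
np_rel = Rule("NP", ("DET", "N", "S'"))
np_adj = Rule("NP", ("DET", "ADJ", "N"))
rel = Rule("S'", ("RELPRON", "NP", "V"))
grammar = [s_rule, np_rel, np_adj, rel]

exiting = ActiveEdge(s_rule, fromposition=1, toposition=2)
print(heuristic_2(ActiveEdge(np_adj, 0, 2), exiting, grammar))  # N  (Fig. 8)
print(heuristic_2(ActiveEdge(rel, 0, 2), exiting, grammar))     # V  (Fig. 9)
```

In the first call the condition Rβ(m) = Lα holds directly; in the second it holds only through the rule NP → DET N S', which plays the role of λ.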


Heuristics 3

If at the critical vertex V, the last vertex to the right introduced in the chart, an active edge A enters with rule(A) = α, toposition = n and length(Rα) = n + 1, and there is an unrecognized substring starting from V, and Rα(n + 1) is reachable from the right from at least one rule β with Lβ = S, the initial symbol of the grammar, then a word with category corresponding to Rα(n + 1) is missing at vertex V.

Again, there is an analogous heuristic that operates symmetrically if the unrecognized substring is at the beginning of the sentence.
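A corresponding sketch of Heuristics 3 is given below, again in Python and again with hypothetical names and a toy grammar. It assumes that the caller has already established that V is the rightmost vertex of the chart and that an unrecognized substring starts there; only the grammatical test is shown.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]


class ActiveEdge(NamedTuple):
    rule: Rule
    fromposition: int
    toposition: int


def reachable_from_right(x: str, start: Rule, grammar: List[Rule]) -> bool:
    """Follow last right-hand-side symbols from `start` until x is found."""
    seen, pending = set(), [start]
    while pending:
        rule = pending.pop()
        if rule in seen or not rule.rhs:
            continue
        seen.add(rule)
        last = rule.rhs[-1]
        if last == x:
            return True
        pending.extend(r for r in grammar if r.lhs == last)
    return False


def heuristic_3(entering: ActiveEdge, grammar: List[Rule],
                start_symbol: str = "S") -> Optional[str]:
    """Predict the category of a word missing at the right end of the sentence."""
    alpha, n = entering.rule, entering.toposition
    if len(alpha.rhs) != n + 1:      # alpha must lack exactly its last symbol
        return None
    missing = alpha.rhs[n]           # R-alpha(n+1)
    for beta in grammar:
        if beta.lhs == start_symbol and reachable_from_right(missing, beta, grammar):
            return missing
    return None


# Toy grammar, assumed for the example only.
s_rule = Rule("S", ("NP", "VP"))
vp_rule = Rule("VP", ("V", "NP"))
np_rule = Rule("NP", ("DET", "N"))
grammar = [s_rule, vp_rule, np_rule]

# A sentence-final NP whose DET was recognized but whose N is missing.
print(heuristic_3(ActiveEdge(np_rule, 0, 1), grammar))  # N
```

Here N closes the chain S → NP VP, VP → V NP, NP → DET N, so it can end a sentence and is predicted as the missing word category.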

As specified above, these heuristics may eventually lead to a further activity in the chart and, if the situation is recoverable, to an overall solution of the sentence recognition problem, i.e. an inactive edge with category S, spanning from the beginning to the end of the sentence.

5. Adding more structure to the vertices

Bidirectional charts do not cause any substantial increase in the number of edges, if compared with traditional charts. Nonetheless, lists of edges outcoming from vertices must be checked more frequently. For instance, the A-A rule requests that two active edge lists are checked when a new active edge is introduced in the chart (one to the left and one to the right). In particular, comparisons among active edges exiting from the same vertex and referring to the same rule are common (consider the r/check introduced above). Therefore it is useful to propose a modification to the vertex structure in relation to outcoming active edges, that improves the efficiency of the mechanism.

Associated with each vertex V usually there are two sets of edges, a right-hand one and a left-hand one. The difference is that we introduce sets of structures that we call edgeselections. An edgeselection is composed of a rule, say r, a position, say i, and a forest that represents a partial ordering of the edges. Each tree in the forest has edges as nodes and all the edges share the same rule r and, if we are talking of the right-hand list, the same fromposition i. For each tree the root is the minimal edge, i.e. an edge spanning a fragment of r from position i and which is not “larger” than any other edge exiting from V with rule r and fromposition i. Recursively, for each edge E in the tree of edges, E is “larger” than its mother: i.e. it spans a larger fragment of the rule r starting from i, “including” all the inactive edges used in the construction of the mother. The structure is not a list but a tree, because symbols in the right-hand side of the rule may be realized spanning different portions of the input string. Similarly, the whole structure is a forest because there is a set of initial nodes, depending on how the initial element of the considered portion of the right-hand side of the rule is realized.

For an example, see Fig. 10. Let α be: L → A B C. All active edges (lower portion of the chart) have rule = α and fromposition = 0. In the figure the inactive edges that were “included” to produce the edge (the sub-inactives field) are reported inside parentheses on active edges. The vertex V will include in the right-hand active edges field an edgeselection with rule = α, position = 0, and forest: [(#5 #7) (#6 #8)]. Likewise for the left-hand field. When a new edge is inserted in the chart, the appropriate edgeselections will be affected, both for the left and for the right vertices. The edge is inserted in the chart only if it results in a leaf for one of the trees, or if it starts a new tree, and therefore, by construction, this structure provides a partial ordering of edges and avoids redundancy. It can be proved that alternative interpretations are not ignored because of bidirectionality.

Figure 10.
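The edgeselection structure might be represented along the following lines. This is a sketch under simplifying assumptions, not the implementation used by the authors: an edge is reduced to an identifier plus the ordered tuple of its sub-inactives, “larger” is approximated by a prefix test on that tuple, and edges are assumed to arrive smallest first. The edge numbers and inactive-edge names are illustrative and are not read off Fig. 10.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    edge_id: int
    sub_inactives: Tuple[str, ...]          # inactive edges "included" in the edge
    children: List["Node"] = field(default_factory=list)


@dataclass
class EdgeSelection:
    rule: str
    position: int
    forest: List[Node] = field(default_factory=list)

    def insert(self, edge_id: int, sub_inactives: Tuple[str, ...]) -> bool:
        """Add the edge as a new leaf or a new root; return False if it is redundant."""
        siblings = self.forest
        while True:
            for node in siblings:
                if node.sub_inactives == sub_inactives:
                    return False             # same realization already present
                if sub_inactives[:len(node.sub_inactives)] == node.sub_inactives:
                    siblings = node.children  # the new edge is "larger": descend
                    break
            else:
                siblings.append(Node(edge_id, sub_inactives))
                return True


def shape(nodes: List[Node]) -> list:
    """Nested (edge_id, children) view of a forest, for inspection."""
    return [(n.edge_id, shape(n.children)) for n in nodes]


sel = EdgeSelection(rule="L -> A B C", position=0)
results = [
    sel.insert(5, ("a1",)),         # starts a first tree
    sel.insert(6, ("a2",)),         # a different realization of A: second tree
    sel.insert(7, ("a1", "b1")),    # extends edge 5
    sel.insert(8, ("a2", "b2")),    # extends edge 6
    sel.insert(9, ("a1", "b1")),    # duplicates edge 7 and is rejected
]
print(results)            # [True, True, True, True, False]
print(shape(sel.forest))  # [(5, [(7, [])]), (6, [(8, [])])]
```

The rejected fifth insertion shows how the forest filters redundant edges at insertion time, without scanning the whole list of edges leaving the vertex.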

We have also been working on two further aspects: the first one does away altogether with the concept of the inactive edge. The second one better studies the interaction between chart and agenda; in particular the idea of a delayed evaluation of the precise characteristics of a task is followed. Future work will focus on this point.

6. Conclusions

We have introduced a mechanism that extends the chart algorithm with bidirectionality. This step is a major one, since a monodirectional chart would simply not be able to base its processing on well recognized fragments (the so called islands) and derive hypotheses about the other parts of the input string. Instead, with the mechanism proposed here, for any place where reliable fragments occur in the sentence, the process will extend to both the left and the right of the islands, until possibly completely missing fragments, or borders of “unreliable” constituent hypotheses are reached. At that point, by virtue of the fact that both a left and a right context were found, heuristics can be introduced that predict the nature of the missing fragments.

The described mechanism is particularly advantageous when dealing with complex sentences, because it is an inherently non-deterministic mechanism, capable of dealing with the complex local ambiguity typical of natural language. An important aspect is that the mechanism is completely independent of the particular linguistic theory adopted. In technical terms, the linguistic representation is reflected only in the particular functional description, and in its particular operations, which are added to the edges of the chart and will provide the necessary information for constraining the process and allowing better predictions.

Altogether, the aim of this work was to cast a bridge between developments in computational linguistics and speech research problems and concepts. Bidirectional charts, as introduced here, seem suitable not only for speech recognition but also for processing other forms of ill-formed input; they also seem well suited to processing well-formed strings when combined with a head-driven linguistic theory.


The authors wish to thank the anonymous referees and Giorgio Satta for their comments, which helped improve substantially the quality of this presentation. Giorgio also helped by identifying an error in a previous version.

References

Aho, A. V. & Ullman, J. D. (1972). The Theory of Parsing, Translation and Compiling. Vol. 1: Parsing. Prentice-Hall, Englewood Cliffs, New Jersey.

Church, K. & Patil, R. (1982). Coping with syntactic ambiguity or how to put the block in the box on the table. American Journal of Computational Linguistics 8, 139-149.

Earley, J. (1970). An efficient context-free parsing algorithm. Communications of the Association for Computing Machinery 13, 94-102.

Gazdar, G. & Pullum G. K. (1982). Generalized Phrase Structure Grammar: A Theoretical Synopsis. Indiana University Linguistics Club, Bloomington, Indiana.

Kaplan, R. (1973). A general syntactic processor. In Natural Language Processing (Rustin, E., ed.), Prentice-Hall, Englewood Cliffs, New Jersey.

Kay, M. (1980). Algorithm Schemata and Data Structures in Syntactic Processing. Xerox, Palo Alto Research Center.

Lesser, V. R., Fennell, R. D., Erman, L. D. & Reddy, R. (1975). Organization of the Hearsay II speech understanding system. IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-23, 11-23.

Lowerre, B. T. & Reddy, R. (1978). The Harpy speech understanding system. In Trends In Speech Recognition (W. A. Lea, ed.), Prentice-Hall, New York.

Sag, I. & Pollard, C. (1987). Head driven phrase structure grammar: an informal synopsis. Report CSLI 87-79, CSLI, Stanford University.

Shieber, S. M. (1986). An introduction to unification-based approaches to grammar. CSLI Lecture Notes Series, No. 4, distributed by the University of Chicago Press, Chicago, Illinois.

Steel, S. & De Roeck, A. (1987). Bidirectional chart parsing. Proceedings of AISB-87, Edinburgh.

Stock, O. (1986). Dynamic unification in lexically based parsing. Proceedings of the Seventh European Conference on Artificial Intelligence, Brighton. Also in: Advances in Artificial Intelligence II (B. Du Boulay, D. Hogg & L. Steels, eds), North Holland, Amsterdam.

Stock, O. (1987). Coping with dynamic syntactic strategies: an experimental environment for an experimental parser. Proceedings of the Third Conference of the Association for Computational Linguistics, European Chapter, Copenhagen.

Stringa, L. (1988). An artificial intelligence approach to speech recognition and understanding. Pattern Recognition Letters, 8, 39-45.

Thompson, H. S. & Ritchie, G. (1984). Implementing natural language parsers. In Artificial Intelligence: Tools, Techniques and Applications, (T. O’Shea, & M. Eisenstadt, eds), Harper & Row, New York.

Walker, D. E. (1976). Speech understanding through syntactic and semantic analysis. IEEE Transactions on Computers, C-28.

Wiren, M. (1987). A comparison of rule invocation strategies in context-free chart parsing. In Proceedings of the Third Conference of the European Chapter of the Association for Computational Linguistics, Copenhagen.

Woods, W. (1982). Optimal search strategies for speech understanding control. Artificial Intelligence, 18, 295-326.
