
A MEMORY-EFFICIENT ε-REMOVAL ALGORITHM FOR WEIGHTED FINITE-STATE AUTOMATA

Thomas Hanneforth, Universität Potsdam

Overview

ε-transitions in finite-state based NLP

Removing ε-transitions in weighted finite-state automata: an algorithm by M. Mohri

Some formal definitions

An improved algorithm demonstrated in the case of acyclic automata

Experiments

ε-Transitions in finite-state based NLP

Many NLP applications based on (weighted) finite-state automata (WFSM) create a lot of ε-transitions during processing

Examples:

Applying bracketing rules (NE recognition, local grammars)

Corpus processing

These ε-transitions have to be removed for reasons of speed and efficiency.

In many cases, the finite-state automata containing the ε-transitions are acyclic.

Example: N-gram counting in corpora

A corpus is a disjunction of sentences.

The corpus whose N-grams are to be counted is represented as an acyclic WFSM over the real semiring.

That means: the weights along a path in the corpus WFSM are multiplied to compute the absolute frequency of a given sentence.

The N-gram counter is represented as a special cyclic weighted finite-state transducer.


A corpus C as a WFSM

For example, the absolute frequency of the sentence bbcd is 4 · 0.25 · 1 · 1 · 1 = 1
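The arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch: the weight list is copied from the slide's product, and the helper name `path_weight` is an assumption, not part of the original material.

```python
from math import prod

# Weights along the path for "bbcd", copied from the product on the slide.
path_weights = [4, 0.25, 1, 1, 1]

def path_weight(weights):
    """Multiply the transition weights along one path (real semiring)."""
    return prod(weights)

print(path_weight(path_weights))  # 1.0
```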


A corpus as a WFSM

Counting is basically composition of the corpus with the counting transducer and taking the lower tape of the result: π₂(C ∘ T)

A bigram counting transducer T


π₂(C ∘ T)

ε-removal in WFSMs: Mohri's algorithm

1. For each state p compute the ε-distance to any other reachable state q.

2. For each ε-path with distance w from p to q and a single transition from q to r labeled with a and weight w′, add a transition from p to r with label a and weight w ⊗ w′ to the FSA. If q is a final state, p will also become a final state. If p already was a final state, the final weights of q and p are additively combined.

3. Remove all ε-transitions, non-reachable states and non-contributing transitions.
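Step 2 for a single state p might look as follows in the real semiring. This is a minimal sketch under stated assumptions: transitions are plain tuples `(source, label, weight, target)`, ε is represented by `None`, and the function name `apply_removal_pattern` and the toy automaton are hypothetical.

```python
EPS = None  # hypothetical marker for the ε label

def apply_removal_pattern(p, eps_dist, transitions, final):
    """Apply step 2 to state p (real semiring: ⊕ is +, ⊗ is *).

    eps_dist:    {q: w}, the ε-distance w from p to each state q
    transitions: list of (source, label, weight, target)
    final:       {state: final weight}, updated in place
    """
    new_transitions = []
    for q, w in eps_dist.items():
        # Copy every non-ε transition q --a/w2--> r to p --a/(w*w2)--> r.
        for (src, label, w2, r) in transitions:
            if src == q and label is not EPS:
                new_transitions.append((p, label, w * w2, r))
        # p becomes final if q is final; existing final weights are added up.
        if q in final:
            final[p] = final.get(p, 0.0) + w * final[q]
    return new_transitions

transitions = [(0, EPS, 0.5, 1), (1, "a", 0.5, 2)]
final = {1: 1.0}
added = apply_removal_pattern(0, {1: 0.5}, transitions, final)
print(added)  # [(0, 'a', 0.25, 2)]
print(final)  # {1: 1.0, 0: 0.5}
```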


General ε-removal pattern:

The states for which the pattern is applied can be visited in any order

The ε-distance between p and q is w

Weights w and w′ are combined by multiplication


If the ε-subgraph of the WFSM is acyclic, it is possible to process the states in reverse topological order:

Example:

Reverse topological order

Two transitions attached to non-reachable states are superfluous and have to be removed in step 3

Nevertheless, they preserve the weights associated with ε-transitions earlier in the reverse topological order.

An improved algorithm: Idea

The attachment of newly created transitions to inaccessible states must somehow be avoided

But, when applying the reverse topological order strategy, these transitions are necessary even if they are deleted in step 3 of the algorithm

Thus, the reverse topological order strategy can no longer be used

Simple idea: keep track of reachable states

I will focus on the special case of acyclic WFSMs

Some formal definitions

Semiring

A structure <K, ⊕, ⊗, 0, 1> is a semiring if it fulfils the following conditions:

1) <K, ⊕, 0> is a commutative monoid with 0 as the identity element for ⊕

2) <K, ⊗, 1> is a monoid with 1 as the identity element for ⊗

3) ⊗ distributes over ⊕

4) 0 is an annihilator for ⊗: ∀w ∈ K, w ⊗ 0 = 0 ⊗ w = 0

Common semirings are the real semiring <R, +, ·, 0, 1> and the tropical semiring <R ∪ {∞}, min, +, ∞, 0>.
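The semiring conditions can be spot-checked on a handful of sample values. The sketch below is illustrative only: the `Semiring` class and the `check_axioms` helper are hypothetical names, and the tropical semiring is encoded with ∞ as the identity of min and 0 as the identity of +.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # abstract addition ⊕
    times: Callable[[float, float], float]  # abstract multiplication ⊗
    zero: float                             # identity of ⊕, annihilator of ⊗
    one: float                              # identity of ⊗

REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)

def check_axioms(sr, samples):
    """Spot-check the four semiring conditions on sample values."""
    for a in samples:
        assert sr.plus(a, sr.zero) == a                               # identity of ⊕
        assert sr.times(a, sr.one) == a == sr.times(sr.one, a)        # identity of ⊗
        assert sr.times(a, sr.zero) == sr.zero == sr.times(sr.zero, a)  # annihilator
        for b in samples:
            assert sr.plus(a, b) == sr.plus(b, a)                     # ⊕ commutes
            for c in samples:
                # ⊗ distributes over ⊕
                assert sr.times(a, sr.plus(b, c)) == sr.plus(sr.times(a, b), sr.times(a, c))
    return True

samples = [0.0, 0.5, 1.0, 2.0]  # dyadic values, so float arithmetic is exact
print(check_axioms(REAL, samples), check_axioms(TROPICAL, samples))
```

Such a check is no proof, of course; it only catches a wrongly chosen identity element quickly.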

Some formal definitions

ε-distance between two states p and q

ε-dist(p, q) = ⊕_{π ∈ Π(p, ε, q)} w(π)

Π(p, ε, q): set of all paths between p and q labeled with ε

All ε-path weights are abstractly added

A path π = t1 t2 … tk

w(π) = w[t1] ⊗ w[t2] ⊗ ... ⊗ w[tk]
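For a small acyclic example the definition can be evaluated literally, by enumerating every ε-path and summing the path products. This is a deliberately naive sketch in the real semiring; the tuple layout, the `None` marker for ε, and the function names are assumptions made for illustration.

```python
from math import prod

EPS = None  # hypothetical marker for the ε label

def eps_paths(transitions, p, q):
    """Yield the weight list of every ε-path from p to q (acyclic ε-subgraph assumed)."""
    if p == q:
        yield []  # the empty path, whose weight is the multiplicative identity
        return
    for (src, label, w, dst) in transitions:
        if src == p and label is EPS:
            for rest in eps_paths(transitions, dst, q):
                yield [w] + rest

def eps_distance(transitions, p, q):
    """Sum (⊕) over all ε-paths of the product (⊗) of their weights."""
    return sum(prod(weights) for weights in eps_paths(transitions, p, q))

# Two ε-paths lead from 0 to 2: 0 -> 1 -> 2 and 0 -> 2.
transitions = [(0, EPS, 0.5, 1), (1, EPS, 0.5, 2), (0, EPS, 0.25, 2)]
print(eps_distance(transitions, 0, 2))  # 0.5
```

Path enumeration is exponential in general, which is why the distances are normally computed by relaxation instead.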

An improved algorithm: example

[Figure: example WFSA with states numbered in topological order. Annotations: ε-distance(0) = { <1, 0.1>, <2, 0.3>, <3, 0.6> }; ε-reachable state sets {0} and {0, 4}]

An improved algorithm

Input: An acyclic WFSA A = <Σ, Q, q0, F, E, ρ>
Output: An equivalent ε-free WFSA A′

R ← ε-reachable({q0})
for all p ∈ Q in ascending order do
    if p ∈ R then
        D ← compute-shortest-ε-distances(A, p)
        R′ ← ∅
        for all <q, w> ∈ D do
            for all t ∈ E[q] do
                E ← E ∪ { <p, l[t], w ⊗ w[t], n[t]> }
                R′ ← R′ ∪ {n[t]}
            end for
            adjust-final-state(A, p, q)
        end for
        R ← R ∪ ε-reachable(R′)
    end if
end for
delete-ε-transitions(A)
delete-states(Q − R)
connect(A)
return A
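A compact Python rendering of this procedure for the real semiring could look like the sketch below. Several points are assumptions rather than part of the slides: states are the integers 0..n-1 in topological order, ε is `None`, the reachability helper follows non-ε transitions only (this reading is inferred from the example, where the set grows from {0} to {0, 4}), the final `connect` step is omitted, and the toy automaton is hypothetical apart from the three ε-weights 0.1, 0.3 and 0.6.

```python
from collections import defaultdict

EPS = None  # hypothetical marker for the ε label

def reachable(states, out):
    """States reachable from `states` via non-ε transitions (assumed reading)."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for (label, _w, dst) in out[s]:
            if label is not EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def eps_distances(p, out, n_states):
    """ε-distances from p by relaxation in ascending (topological) state order."""
    d = {p: 1.0}
    for q in range(p, n_states):
        if q in d:
            for (label, w, dst) in out[q]:
                if label is EPS:
                    d[dst] = d.get(dst, 0.0) + d[q] * w
    return d

def remove_epsilons(n_states, start, transitions, final):
    out = defaultdict(list)
    for (src, label, w, dst) in transitions:
        out[src].append((label, w, dst))
    final = dict(final)
    R = reachable({start}, out)
    for p in range(n_states):          # ascending order
        if p not in R:
            continue                   # never attach transitions to useless states
        R_new = set()
        for q, w in eps_distances(p, out, n_states).items():
            if q == p:
                continue               # p's own transitions already exist
            for (label, w2, r) in out[q]:
                if label is not EPS:
                    out[p].append((label, w * w2, r))
                    R_new.add(r)
            if q in final:             # adjust-final-state
                final[p] = final.get(p, 0.0) + w * final[q]
        R |= reachable(R_new, out)
    result = [(p, l, w, r) for p in sorted(R) for (l, w, r) in out[p] if l is not EPS]
    return result, {s: f for s, f in final.items() if s in R}

T = [(0, EPS, 0.1, 1), (0, EPS, 0.3, 2), (0, EPS, 0.6, 3),
     (1, "a", 1.0, 4), (2, "b", 1.0, 4), (3, "c", 1.0, 4)]
print(remove_epsilons(5, 0, T, {4: 1.0}))
```

In this toy run, states 1, 2 and 3 are never processed because they are not in R, so no transitions are ever attached to them.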

Improved algorithm: ε-distances

ε-distances are usually computed with a generalized shortest-distance algorithm

For cyclic WFSMs, this algorithm may be optimized by letting it operate on the strongly connected components of the WFSM

For acyclic WFSMs, relaxation in topological order is the most efficient algorithm

Improved algorithm: Computing ε-distances

There are at least 4 approaches to compute acyclic ε-distances:

1. Topologically sort the input WFSM and use this order for computing ε-distances

2. Construct an embedded topological order for every ε-subautomaton (two-pass strategy)

3. As 2., but cache already computed distances

4. Topologically sort the input WFSM and make use of a priority queue which is ordered by state number

Improved algorithm: Computing ε-distances in an acyclic WFSM

Example:

The global topological order is 0 1 2 3 4 5 6

There are two ε-subgraphs rooted at states 1 and 2, respectively. The topological orders are:

1 3 4 5

2 4 5

In a topologically ordered WFSM, whenever you have a transition p → q, the state number of q is strictly greater than the state number of p.
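Because of this property, a topological order for any ε-subgraph falls out of the global numbering: collect the states that are ε-reachable from the root and sort them by state number. The sketch below illustrates this; the ε-edges are hypothetical (the figure's actual transitions are not recoverable here) and were merely chosen so that the two orders from the example come out.

```python
def eps_subgraph_order(root, eps_edges):
    """Topological order of the ε-subgraph rooted at `root`.

    eps_edges maps a state to the targets of its ε-transitions. States are
    assumed to be numbered in a global topological order, so sorting the
    ε-reachable states by number is already a valid topological order.
    """
    seen, stack = {root}, [root]
    while stack:
        s = stack.pop()
        for dst in eps_edges.get(s, ()):
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return sorted(seen)

# Hypothetical ε-edges consistent with the orders given in the example.
eps_edges = {1: [3], 3: [4], 4: [5], 2: [4]}
print(eps_subgraph_order(1, eps_edges))  # [1, 3, 4, 5]
print(eps_subgraph_order(2, eps_edges))  # [2, 4, 5]
```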

Improved algorithm: ε-distances with a priority queue

Input:
Output:

S ← ∅
PQ ← ∅
enqueue(PQ, p)
while PQ ≠ ∅ do
    q ← pop(PQ)
    if q ∉ S then
        S ← S ∪ {q}
        if q = p then dq ← 1
        else dq ← d[q]
        end if
        for all t ∈ E[q] do
            d[n[t]] ← d[n[t]] ⊕ (dq ⊗ w[t])
            enqueue(PQ, n[t])
        end for
    end if
end while
return d
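The same loop can be written with Python's `heapq` module as the priority queue. This is a sketch for the real semiring under a few assumptions: states are integers numbered in topological order, `eps_out` (a hypothetical name) maps a state to its outgoing ε-transitions as `(weight, target)` pairs, and only ε-transitions are relaxed.

```python
import heapq

def eps_distances_pq(p, eps_out):
    """ε-distances from p; the queue is ordered by state number.

    With topological numbering, the smallest pending state has already
    received the contributions of all its ε-predecessors when it is popped.
    """
    d = {}
    done = set()                  # S in the pseudocode
    queue = [p]                   # PQ in the pseudocode
    while queue:
        q = heapq.heappop(queue)
        if q in done:
            continue              # the same state may be enqueued several times
        done.add(q)
        dq = 1.0 if q == p else d[q]
        for (w, target) in eps_out.get(q, ()):
            d[target] = d.get(target, 0.0) + dq * w   # ⊕ is +, ⊗ is *
            heapq.heappush(queue, target)
    return d

# Two ε-paths lead from 0 to 2: 0 -> 1 -> 2 and 0 -> 2.
eps_out = {0: [(0.5, 1), (0.25, 2)], 1: [(0.5, 2)]}
print(eps_distances_pq(0, eps_out))  # {1: 0.5, 2: 0.5}
```

As in the pseudocode, the source state p itself gets no entry in `d`; its distance is taken to be 1 on the fly.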

Improved algorithm: Complexity

Of course, in the worst case the algorithm presented here has the same complexity as Mohri's algorithm

So, the complexity is:

In the acyclic case: O(|Q||E| + |Q|²)

In the cyclic case: O(|Q||E| + |Q|² log |Q|)

The memory complexity is in O(|Q|)

As the experiments will show, there is a clear improvement in practical cases

Experiments: Input data

Input data: 50,000 sentences of the German TiGer corpus

Compiled into an optimised WFSM over the real semiring with 681,689 states and 730,175 transitions, with an alphabet size of |Σ| = 89,418

To that, a trigram counter was applied

This resulted in a WFSM with 2,724,212 states and 3,615,890 transitions (1,429,530 ε-transitions)

The out-degree, that is, the maximum number of outgoing transitions for a state, was 14,044

Experiments

Algorithm | Total time (s) | Max. memory usage (MB) | # transitions (before connect)
Mohri's algorithm with reverse topological order strategy | 3.48 | 409 | 13,306,056
Algorithm with reachability enforcement: processing the ε-subautomata in topological order | 8.46 | 116 | 2,912,740
Algorithm with reachability enforcement: using a priority queue | 8.21 | 106 | 2,912,740

The experiments were run on an Intel quad-core CPU at 2.5 GHz (one core used)

Transition labels and weights both use 4 bytes

Experiments: Conclusions

Mohri's original algorithm is very fast, since in the acyclic case it only requires a single traversal of the state sequence. But 83.5 % of the added transitions were useless

Its memory usage depends crucially on the out-degree of the input WFSM, which in turn depends on the size of the alphabet

That is, for bigger corpora with alphabet sizes of several hundred thousand symbols, the non-optimized approach may become infeasible

The revised algorithm is slower in both of its variants, since they compute ε-distances

But their memory requirements are much lower

Appendix

adjust-final-state(A, p, q)
if q ∈ F then
    if p ∈ F then
        ρ(p) ← ρ(p) ⊕ (w ⊗ ρ(q))
    else
        F ← F ∪ {p}
        ρ(p) ← w ⊗ ρ(q)
    end if
end if
