A MEMORY-EFFICIENT ε-REMOVAL ALGORITHM FOR WEIGHTED FINITE-STATE AUTOMATA
Thomas Hanneforth, Universität Potsdam
Overview
ε-transitions in finite-state based NLP
Removing ε-transitions in weighted finite-state automata: an algorithm by M. Mohri
Some formal definitions
An improved algorithm demonstrated in the case of acyclic automata
Experiments
ε-Transitions in finite-state based NLP
Many NLP applications based on (weighted) finite-state automata (WFSM) create a lot of ε-transitions during processing.
Examples:
Applying bracketing rules (NE-recognition, local grammars)
Corpus processing
These ε-transitions have to be removed for reasons of speed and efficiency.
In many cases, the finite-state automata containing the ε-transitions are acyclic.
Example: N-gram counting in corpora
A corpus is a disjunction of sentences.
The corpus of which the N-grams are to be counted is represented as an acyclic WFSM over the real semiring.
That means: the weights along a path in the corpus WFSM are multiplied to compute the absolute frequency of a given sentence.
The N-gram counter is represented as a special cyclic weighted finite-state transducer.
Example: N-gram counting in corpora
A corpus C as a WFSM
For example, the absolute frequency of the sentence bbcd is 4 · 0.25 · 1 · 1 · 1 = 1
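The arithmetic on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `path_weight` is hypothetical, and the weight list is simply the sequence of factors quoted above, not a reconstruction of the automaton in the figure.

```python
from math import prod

def path_weight(weights):
    """Multiply the transition weights along one path (real semiring, ⊗ = ·)."""
    return prod(weights)

# The factors quoted on the slide for the sentence "bbcd".
bbcd_weights = [4, 0.25, 1, 1, 1]
print(path_weight(bbcd_weights))
```

In the real semiring the weight of a path is an ordinary product, so a sentence that occurs once in the corpus ends up with absolute frequency 1.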
Example: N-gram counting in corpora
A corpus as a WFSM
Counting is basically composition of the corpus with the counting transducer and taking the lower tape of the result: π₂(C ∘ T)
A bigram counting transducer T
Example: N-gram counting in corpora
π₂(C ∘ T)
ε-removal in WFSMs: Mohri's algorithm
1. For each state p, compute the ε-distance to every other reachable state q.
2. For each ε-path with distance w from p to q and a single transition from q to r labeled with a and weight w', add a transition from p to r with label a and weight w ⊗ w' to the FSA. If q is a final state, p also becomes a final state. If p already was a final state, the final weights of q and p are additively combined (⊕).
3. Remove all ε-transitions, non-reachable states and non-contributing transitions.
ε-removal in WFSMs: Mohri's algorithm
General ε-removal pattern:
The states for which the pattern is applied can be visited in any order
The ε-distance between p and q is w
Weights w and w' are combined by ⊗ (semiring multiplication)
ε-removal in WFSMs: Mohri's algorithm
If the ε-subgraph of the WFSM is acyclic, it is possible to process the states in reverse topological order.
Example:
Reverse topological order
Two transitions attached to non-reachable states are superfluous and have to be removed in step 3
Nevertheless, they preserve the weights associated with ε-transitions earlier in the reversed topological order.
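The effect described on this slide can be seen in a small Python sketch of the reverse topological order strategy. It is one reading of the slide, not Mohri's reference implementation. The assumptions are the real semiring, state numbers that already form a topological order, and a hypothetical representation in which `trans` maps a state to a list of `(label, weight, next_state)` triples with `EPS` as the ε label.

```python
EPS = None  # hypothetical marker for an ε label

def remove_eps_reverse_topological(trans, final):
    """trans: {state: [(label, weight, next_state), ...]}, final: {state: weight}."""
    trans = {s: list(ts) for s, ts in trans.items()}  # work on copies
    final = dict(final)
    states = set(trans) | set(final) | {n for ts in trans.values() for _, _, n in ts}
    for p in sorted(states, reverse=True):            # reverse topological order
        kept, added = [], []
        for label, w, q in trans.get(p, []):
            if label is not EPS:
                kept.append((label, w, q))
                continue
            # q > p was processed earlier, so trans[q] is already ε-free
            for label2, w2, r in trans.get(q, []):
                added.append((label2, w * w2, r))
            if q in final:
                final[p] = final.get(p, 0.0) + w * final[q]
        trans[p] = kept + added
    return trans, final

# Chain 0 -ε/0.5-> 1 -ε/0.5-> 2 -a/2-> 3, with 3 final.
trans = {0: [(EPS, 0.5, 1)], 1: [(EPS, 0.5, 2)], 2: [("a", 2.0, 3)]}
out, fin = remove_eps_reverse_topological(trans, {3: 1.0})
print(out)
```

In this toy input, states 1 and 2 are no longer reachable once the ε-transitions are gone, yet both still carry an `a` transition. The one at state 1 had to be created so that state 0 could pick up the accumulated weight, which is exactly the kind of transition that step 3 deletes afterwards.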
An improved algorithm: Idea
The attachment of newly created transitions to inaccessible states must somehow be avoided
But when applying the reverse topological order strategy, these transitions are necessary even if they are deleted in step 3 of the algorithm
Thus, the reverse topological order strategy can no longer be used
Simple idea: keep track of reachable states
I will focus on the special case of acyclic WFSMs
Some formal definitions
Semiring
A structure ⟨K, ⊕, ⊗, 0, 1⟩ is a semiring if it fulfils the following conditions:
1) ⟨K, ⊕, 0⟩ is a commutative monoid with 0 as the identity element for ⊕
2) ⟨K, ⊗, 1⟩ is a monoid with 1 as the identity element for ⊗
3) ⊗ distributes over ⊕
4) 0 is an annihilator for ⊗: ∀w ∈ K, w ⊗ 0 = 0 ⊗ w = 0
Common semirings are the real semiring ⟨R, +, ·, 0, 1⟩ and the tropical semiring ⟨R ∪ {∞}, min, +, ∞, 0⟩.
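The four conditions can be spot-checked numerically. The following Python sketch is illustrative only: the class `Semiring`, the constants `REAL` and `TROPICAL`, and the helper `check_axioms` are hypothetical names, and the check runs over a handful of sample values rather than proving anything.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # ⊕
    times: Callable[[float, float], float]  # ⊗
    zero: float                             # identity of ⊕, annihilator of ⊗
    one: float                              # identity of ⊗

REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)

def check_axioms(s, samples):
    """Spot-check the semiring conditions on a few sample values."""
    for a in samples:
        assert s.plus(a, s.zero) == a                              # 0 is identity of ⊕
        assert s.times(a, s.one) == a == s.times(s.one, a)         # 1 is identity of ⊗
        assert s.times(a, s.zero) == s.zero == s.times(s.zero, a)  # 0 annihilates
        for b in samples:
            assert s.plus(a, b) == s.plus(b, a)                    # ⊕ is commutative
            for c in samples:
                # ⊗ distributes over ⊕ (left distributivity shown)
                assert s.times(a, s.plus(b, c)) == s.plus(s.times(a, b), s.times(a, c))
    return True

print(check_axioms(REAL, [0.0, 0.5, 1.0, 2.0]))
print(check_axioms(TROPICAL, [0.0, 1.0, 2.5, math.inf]))
```

Note that in the tropical semiring the roles are shifted: ∞ plays the part of the semiring 0 and the number 0 plays the part of the semiring 1.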
Some formal definitions
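ε-distance between two states p and q:
$$\varepsilon\text{-dist}(p,q) = \bigoplus_{\pi \in \Pi(p,\varepsilon,q)} w(\pi)$$
Π(p, ε, q) is the set of all paths between p and q labeled with ε
All ε-path weights are abstractly added
The weight of a path π = t1 t2 … tk is
$$w(\pi) = w[t_1] \otimes w[t_2] \otimes \dots \otimes w[t_k]$$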
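To make the definition concrete, here is a minimal Python sketch that evaluates the ε-distance directly from the definition in the real semiring (⊕ is +, ⊗ is ·). The function name `eps_distance` and the toy graph are hypothetical. The sketch assumes an acyclic ε-graph and treats the empty path from a state to itself as having weight 1.

```python
def eps_distance(eps_edges, p, q):
    """⊕-sum over all ε-paths from p to q of the ⊗-product of their weights.

    eps_edges: {state: [(next_state, weight), ...]}, ε-transitions only.
    """
    if p == q:
        return 1.0  # assumption: the empty path counts, with weight 1
    total = 0.0     # 0 is the identity of ⊕
    for nxt, w in eps_edges.get(p, []):
        # every ε-path from p starts with one ε-transition (p, w, nxt)
        total += w * eps_distance(eps_edges, nxt, q)
    return total

# Two ε-paths lead from 0 to 3: 0-1-3 and 0-2-3, each with weight 0.5 * 0.5.
eps_edges = {0: [(1, 0.5), (2, 0.5)], 1: [(3, 0.5)], 2: [(3, 0.5)]}
print(eps_distance(eps_edges, 0, 3))
```

Enumerating paths like this is exponential in the worst case, so it only serves to illustrate the definition.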
An improved algorithm: example
ε-Reachable
Topological order
ε-distance(0) = { ⟨1, 0.1⟩, ⟨2, 0.3⟩, ⟨3, 0.6⟩ }
ε-distance(4) =
= {0,4}= {0}
An improved algorithm
Input: An acyclic WFSA A = ⟨Σ, Q, q0, F, E, ρ⟩
Output: An equivalent ε-free WFSA A'
R ← ε-reachable({q0})
for all p ∈ Q in ascending order do
    if p ∈ R then
        D ← compute-shortest-ε-distances(A, p)
        R' ← ∅
        for all ⟨q, w⟩ ∈ D do
            for all t ∈ E[q] do
                E ← E ∪ { ⟨p, l[t], w ⊗ w[t], n[t]⟩ }
                R' ← R' ∪ {n[t]}
            end for
            adjust-final-state(A, p, q)
        end for
        R ← R ∪ ε-reachable(R')
    end if
end for
delete-ε-transitions(A)
delete-states(Q − R)
connect(A)
return A
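The pseudocode can be turned into a small runnable Python sketch. Several points are assumptions rather than statements from the slides. Weights live in the real semiring, and state numbers already form a topological order. `trans` maps a state to `(label, weight, next_state)` triples, with `EPS` as the ε label. ε-reachable(S) is interpreted as the set of states reachable from S without traversing an ε-transition. The final `connect` step, which removes states that cannot reach a final state, is omitted.

```python
EPS = None  # hypothetical marker for an ε label

def non_eps_reachable(trans, seeds):
    """States reachable from `seeds` using only non-ε transitions (seeds included)."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        s = stack.pop()
        for label, _, nxt in trans.get(s, []):
            if label is not EPS and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def eps_distances(trans, p):
    """ε-distances from p to every state q != p on an ε-path (acyclic case)."""
    d, pending = {}, {p}
    while pending:
        q = min(pending)              # smallest state number = topological order
        pending.remove(q)
        dq = 1.0 if q == p else d[q]
        for label, w, nxt in trans.get(q, []):
            if label is EPS:
                d[nxt] = d.get(nxt, 0.0) + dq * w
                pending.add(nxt)
    return d

def remove_epsilons(trans, final, start=0):
    trans = {s: list(ts) for s, ts in trans.items()}   # work on copies
    final = dict(final)
    states = {start} | set(trans) | set(final) | {
        n for ts in trans.values() for _, _, n in ts}
    reachable = non_eps_reachable(trans, [start])       # R
    for p in sorted(states):                            # ascending order
        if p not in reachable:
            continue                                    # never attach to dead states
        targets = set()                                 # R'
        for q, w in eps_distances(trans, p).items():
            for label, w2, nxt in trans.get(q, []):
                if label is not EPS:
                    trans.setdefault(p, []).append((label, w * w2, nxt))
                    targets.add(nxt)
            if q in final:                              # adjust-final-state
                final[p] = final.get(p, 0.0) + w * final[q]
        reachable |= non_eps_reachable(trans, targets)
    # delete ε-transitions and all states that were never marked reachable
    new_trans = {s: [t for t in trans.get(s, []) if t[0] is not EPS]
                 for s in reachable}
    new_final = {s: wt for s, wt in final.items() if s in reachable}
    return new_trans, new_final

trans = {0: [(EPS, 0.5, 1), ("a", 1.0, 2)], 1: [("b", 2.0, 3)],
         2: [(EPS, 1.0, 3)], 3: []}
print(remove_epsilons(trans, {3: 1.0}))
```

In the toy input, state 1 is only entered through an ε-transition, so it is never processed and no transition is ever attached to it. This is the memory saving the slides aim at.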
Improved algorithm: ε-distances
ε-distances are usually computed with a generalized shortest-distance algorithm
For cyclic WFSMs, this algorithm may be optimized by letting it operate on the strongly connected components of the WFSM
For acyclic WFSMs, relaxation in topological order is the most efficient algorithm
Improved algorithm: Computing ε-distances
There are at least 4 approaches to compute acyclic ε-distances:
1. Topologically sort the input WFSM and use this order for computing ε-distances
2. Construct an embedded topological order for every ε-subautomaton (two-pass strategy)
3. As 2., but cache already computed distances
4. Topologically sort the input WFSM and make use of a priority queue which is ordered by state number
Improved algorithm: Computing ε-distances in an acyclic WFSM
Example:
The global topological order is 0 1 2 3 4 5 6
There are two ε-subgraphs rooted at states 1 and 2, respectively. The topological orders are:
1 3 4 5
2 4 5
In a topologically ordered WFSM, whenever you have a transition p → q, the state number of q is strictly greater than the state number of p.
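The numbering property makes the local orders easy to obtain: the ε-closure of a root, sorted by state number, is already a topological order of its ε-subgraph. The Python sketch below illustrates this. The ε-edge list is hypothetical, chosen only so that it reproduces the two orders quoted above, since the figure itself is not part of this transcript.

```python
def is_topologically_numbered(edges):
    """True if every transition p -> q goes to a strictly larger state number."""
    return all(p < q for p, q in edges)

def eps_subgraph_order(eps_edges, root):
    """Topological order of the ε-subgraph rooted at `root`.

    Assumes the state numbers of the whole automaton are a topological order.
    """
    seen, stack = {root}, [root]
    while stack:
        s = stack.pop()
        for p, q in eps_edges:
            if p == s and q not in seen:
                seen.add(q)
                stack.append(q)
    return sorted(seen)

# Hypothetical ε-edges consistent with the orders shown on the slide.
eps_edges = [(1, 3), (3, 4), (4, 5), (2, 4)]
print(is_topologically_numbered(eps_edges))
print(eps_subgraph_order(eps_edges, 1))
print(eps_subgraph_order(eps_edges, 2))
```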
Improved algorithm: ε-distances with a priority queue
Input:
Output:
S ← ∅
PQ ← ∅
enqueue(PQ, p)
while PQ ≠ ∅ do
    q ← pop(PQ)
    if q ∉ S then
        S ← S ∪ {q}
        if q = p then dq ← 1
        else dq ← d[q]
        end if
        for all t ∈ E[q] do
            d[n[t]] ← d[n[t]] ⊕ (dq ⊗ w[t])
            enqueue(PQ, n[t])
        end for
    end if
end while
return d
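A direct Python transcription of this pseudocode, using the standard `heapq` module as the priority queue, might look as follows. It is a sketch under several assumptions: the real semiring, topologically numbered states, and the hypothetical `(label, weight, next_state)` transition representation with `EPS` as the ε label. It also assumes that only ε-transitions are relaxed and that missing entries of d start at the semiring 0.

```python
import heapq

EPS = None  # hypothetical marker for an ε label

def eps_distances_pq(trans, p):
    """ε-distances from p, visiting states in ascending state number."""
    d = {}
    done = set()   # S in the pseudocode
    pq = [p]       # priority queue ordered by state number
    while pq:
        q = heapq.heappop(pq)
        if q in done:
            continue          # a state may be enqueued several times
        done.add(q)
        dq = 1.0 if q == p else d[q]
        for label, w, nxt in trans.get(q, []):
            if label is EPS:  # assumption: only ε-transitions are followed
                d[nxt] = d.get(nxt, 0.0) + dq * w   # d[n[t]] ⊕ (dq ⊗ w[t])
                heapq.heappush(pq, nxt)
    return d

trans = {0: [(EPS, 0.5, 1), (EPS, 0.25, 2)], 1: [(EPS, 0.5, 3)],
         2: [(EPS, 1.0, 3)], 3: [("a", 1.0, 4)]}
print(eps_distances_pq(trans, 0))
```

Because every transition leads to a larger state number, a state is popped only after all of its ε-predecessors have been processed, so d[q] is complete at the moment it is read.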
Improved algorithm: Complexity
Of course, in the worst case the algorithm presented here has the same complexity as Mohri's algorithm
So, the complexity is:
In the acyclic case: O(|Q||E| + |Q|²)
In the cyclic case: O(|Q||E| + |Q|² log |Q|)
The memory complexity is in O(|Q|)
As the experiments will show, there is a clear improvement in practical cases
Experiments: Input data
Input data: 50,000 sentences of the German TiGer corpus
Compiled into an optimised WFSM over the real semiring with 681,689 states and 730,175 transitions with |Σ| = 89,418
To that, a trigram counter was applied
This resulted in a WFSM with 2,724,212 states and 3,615,890 transitions (1,429,530 ε-transitions)
The maximum out-degree, that is, the maximum number of outgoing transitions of a state, was 14,044
Experiments
Algorithm | Total time (s) | Max. memory usage (MB) | # transitions (before connect)
Mohri's algorithm with reverse topological order strategy | 3.48 | 409 | 13,306,056
Algorithm with reachability enforcement: processing the ε-subautomata in topological order | 8.46 | 116 | 2,912,740
Algorithm with reachability enforcement: using a priority queue | 8.21 | 106 | 2,912,740
The experiments were run on an Intel Quadcore CPU with 2.5 GHz (one core used)
Transition labels and weights each use 4 bytes
Experiments: Conclusions
Mohri's original algorithm is very fast, since in the acyclic case it only requires a single traversal of the state sequence
But 83.5 % of the added transitions were useless
Its memory usage depends crucially on the out-degree of the input WFSM which in turn depends on the size of the alphabet
That is, for bigger corpora with alphabet sizes of several hundred thousand symbols, the non-optimized approach may become infeasible
The revised algorithm in its two variants performs more slowly, since they compute ε-distances
But their memory requirements are much lower
Appendix
adjust-final-state(A, p, q)
if q ∈ F then
    if p ∈ F then
        ρ(p) ← ρ(p) ⊕ (w ⊗ ρ(q))
    else
        F ← F ∪ {p}
        ρ(p) ← w ⊗ ρ(q)
    end if
end if