Finite State Transducers
Mark Stamp

Finite State Automata
FSA states and transitions
  o Represented as labeled directed graphs
  o FSA has one label per edge
States are circles
  o Double circles for end states
Beginning state
  o Denoted by arrowhead
  o Or, sometimes a bold circle is used

FSA Example
Nodes are states
Transitions are (labeled) arrows
For example…
[Figure: FSA with states 1, 2, 3 and edges labeled a, c, y, z]

Finite State Transducer
FST has input and output labels on each edge
  o That is, 2 labels per edge
  o Can be more labels (e.g., edge weights)
  o Recall, FSA has one label per edge
FST represented as directed graph
  o And same symbols used as for FSA
  o FSTs may be useful in malware analysis…

Finite State Transducer
FST has input and output “tapes”
  o Transducer, i.e., can map input to output
  o Often viewed as “translating” machine
  o But somewhat more general
FST is a finite automaton with output
  o Usual finite automaton only has input
  o Used in natural language processing (NLP)
  o Also used in many other applications

FST Graphically
Edges/transitions are (labeled) arrows
  o Of the form i : o, that is, input:output
Nodes labeled numerically
For example…
[Figure: FST with states 1, 2, 3 and edges labeled a:b, c:d, y:q, z:x]

FST Modes
FST usually viewed as translating machine
But FST can operate in several modes
  o Generation
  o Recognition
  o Translation (left-to-right or right-to-left)
Examples of modes considered next…

FST Modes
Consider this simple example:
[Figure: FST with a single state 1 and a self-loop labeled a:b]
Generation mode
  o Write equal number of a and b to first and second tape, respectively
Recognition mode
  o “Accept” when 1st tape has same number of a as 2nd tape has b
Translation mode: next slide

FST Modes
Consider this simple example:
[Figure: FST with a single state 1 and a self-loop labeled a:b]
Translation mode
  o Left-to-right: for every a read from 1st tape, write b to 2nd tape
  o Right-to-left: for every b read from 2nd tape, write a to 1st tape
Translation is the mode we usually want to consider
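
Translation mode can be sketched in a few lines of Python. The dictionary layout and the names `TRANSITIONS`, `translate`, and `invert` are illustrative assumptions, not part of the slides. The machine is the single-state example with one a:b loop, assumed to be deterministic, with state 1 serving as both start and final state.

```python
# Transitions: (state, input symbol) -> (output symbol, next state).
# This is the one-state example: a self-loop on state 1 labeled a:b.
TRANSITIONS = {(1, "a"): ("b", 1)}
FINAL_STATES = {1}  # assumption: state 1 is also the final state


def translate(tape, transitions=TRANSITIONS, start=1, finals=FINAL_STATES):
    """Left-to-right translation; returns None if the input is rejected."""
    state, out = start, []
    for sym in tape:
        if (state, sym) not in transitions:
            return None  # no edge for this symbol: reject
        o, state = transitions[(state, sym)]
        out.append(o)
    return "".join(out) if state in finals else None


def invert(transitions):
    """Swap input and output labels to get right-to-left translation."""
    return {(s, o): (i, t) for (s, i), (o, t) in transitions.items()}


print(translate("aaa"))                       # left-to-right
print(translate("bbb", invert(TRANSITIONS)))  # right-to-left
```

Recognition mode corresponds to checking that `translate(first_tape)` equals the second tape.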

WFST
WFST == Weighted FST
  o Include a “weight” on each edge
  o That is, edges of the form i : o / w
Often, probabilities serve as weights…
[Figure: WFST with states 1, 2, 3 and edges labeled a:b/1, y:q/1, c:d/0.6, z:x/0.4]
FST Example
Homework…

Operations on FSTs
Many well-defined operations on FSTs
  o Union, intersection, composition, etc.
  o These also apply to WFSTs
Composition is especially interesting
In malware context, might want to…
  o Compose detectors for same family
  o Compose detectors for different families
Why might this be useful?

FST Composition
Compose 2 FSTs (or WFSTs)
  o Suppose 1st WFST has nodes 1,2,…,n
  o Suppose 2nd WFST has nodes 1,2,…,m
  o Possible nodes in composition labeled (i,j), for i = 1,2,…,n and j = 1,2,…,m
  o Generally, not all of these will appear
Edge from (i1,j1) to (i2,j2) only when composed labels “match” (next slide…)

FST Composition
Suppose we have following labels
  o In 1st WFST, edge from i1 to i2 is x:y/p
  o In 2nd WFST, edge from j1 to j2 is w:z/q
Consider nodes (i1,j1) and (i2,j2) in composed WFST
  o Edge between nodes provided y == w
  o I.e., output from 1st matches input for 2nd
  o And, resulting edge label is x:z/pq
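
The matching rule can be sketched as a small Python function. This is a minimal sketch under stated assumptions: edges are stored as `(src, inp, out, weight, dst)` tuples, both machines start in state 1, there are no epsilon labels, and only pair states reachable from the start pair are generated. The function name `compose` and the two toy machines `T1` and `T2` are hypothetical and are not the example from the slides.

```python
from collections import deque


def compose(edges1, edges2, start1=1, start2=1):
    """Compose two WFSTs given as lists of (src, inp, out, weight, dst).

    Returns the edges of the composition reachable from (start1, start2).
    """
    start = (start1, start2)
    seen, queue, composed = {start}, deque([start]), []
    while queue:
        i1, j1 = queue.popleft()
        for s1, x, y, p, i2 in edges1:
            if s1 != i1:
                continue
            for s2, w, z, q, j2 in edges2:
                # Edge exists only when output of 1st matches input of 2nd.
                if s2 != j1 or y != w:
                    continue
                dst = (i2, j2)
                # Resulting label is x:z with weight p*q.
                composed.append(((i1, j1), x, z, p * q, dst))
                if dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return composed


# Hypothetical toy machines (not the example from the slides).
T1 = [(1, "a", "b", 0.5, 2), (2, "b", "b", 0.2, 2)]
T2 = [(1, "b", "c", 0.4, 2), (1, "a", "c", 0.3, 2), (2, "b", "a", 0.1, 1)]

for edge in compose(T1, T2):
    print(edge)
```

In this toy case only three of the four possible pair states are reached, which illustrates why not every (i,j) appears in a composition.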

WFST Composition
Consider composition of WFSTs
[Figure: first WFST on states 1 through 4, with edge labels a:b/0.1, a:b/0.2, b:b/0.3, a:b/0.5, a:a/0.6, b:b/0.4]
And…
[Figure: second WFST on states 1 through 4, with edge labels b:b/0.1, a:b/0.3, b:a/0.5, a:b/0.4, b:a/0.2]

WFST Composition Example
[Figure: the two WFSTs from the previous slide and their composition, with nodes (1,1), (1,2), (2,2), (3,2), (4,2), (4,3), (4,4) and edge labels a:b/.01, a:a/.04, a:a/.02, b:a/.08, b:a/.06, a:a/.1, a:b/.24, a:b/.18]

WFST Composition
In previous example, composition is…
[Figure: composed WFST with nodes (1,1), (1,2), (2,2), (3,2), (4,2), (4,3), (4,4), as on the previous slide]
But (4,3) node is useless
  o Must always end in a final state
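
Removing useless nodes amounts to keeping only states from which a final state can be reached. The sketch below shows one way to do that. The edge list and the choice of (4,4) as the final state are hypothetical, since the topology of the slide's figure is not reproduced here, and the function names are illustrative.

```python
def coaccessible(edges, finals):
    """Return the states from which some final state can be reached."""
    keep = set(finals)
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            if dst in keep and src not in keep:
                keep.add(src)
                changed = True
    return keep


def trim(edges, finals):
    """Drop every edge touching a state that cannot reach a final state."""
    keep = coaccessible(edges, finals)
    return [(s, d) for s, d in edges if s in keep and d in keep]


# Hypothetical edge list and final state, for illustration only.
EDGES = [((1, 1), (2, 2)), ((2, 2), (4, 4)), ((2, 2), (4, 3))]
FINALS = {(4, 4)}
print(trim(EDGES, FINALS))
```

Here the dead-end node (4,3) is dropped because no path leads from it to the final state.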

FST Approximation of HMM
Why would we want to approximate an HMM by FST?
  o Faster scoring using FST
  o Easier to correct misclassification in FST
  o Possible to compose FSTs
  o Most important, it’s really cool and fun…
Down side?
  o FST may be less accurate than the HMM

FST Approximation of HMM
How to approximate HMM by FST?
We consider 2 methods known as
  o n-type approximation
  o s-type approximation
These usually focused on “problem 2”
  o That is, uncovering the hidden states
  o This is the usual concern in NLP, such as “part of speech” tagging

n-type Approximation
Let V be distinct observations in HMM
  o Let λ = (A,B,π) be a trained HMM
  o Recall, A is N x N, B is N x M, π is 1 x N
Let (input : output / weight) = (Vi : Sj / p)
  o Where i ∈ {1,2,…,M} and j ∈ {1,2,…,N}
  o And Sj are hidden states (rows of B)
  o And weight is max probability (from λ)
Examples later…

More n-type Approximations
Range of n-type approximations
  o n0-type only use the B matrix
  o n1-type see previous slide
  o n2-type for 2nd order HMM
  o n3-type for 3rd order HMM, and so on
What is 2nd order HMM?
  o Transitions depend on 2 consecutive states
  o In 1st order, only depend on previous state

s-type Approximation
“Sentence type” approximation
Use sequences and/or natural breaks
  o In n-type, max probability over one transition using A and B matrices
  o In s-type, all sequences up to some length
Ideally, break at boundaries of some sort
  o In NLP, sentence is such a boundary
  o For malware, not so clear where to break
  o So in malware, maybe just use a fixed length

HMM to FST
Exact representation also possible
  o That is, resulting FST is “same” as HMM
Given model λ = (A,B,π)
Nodes for each (input : output) = (Vi : Sj)
  o Edge from each node to all other nodes…
  o …including loop to same node
  o Edges labeled with target node
  o Weights computed from probabilities in λ

HMM to FST
Note that some probabilities may be 0
  o Remove edges with 0 probabilities
A lot of probabilities may be small
  o So, maybe approximate by removing edges with “small” probabilities?
  o Could be an interesting experiment…
  o A reasonable way to approximate HMM that does not seem to have been studied

HMM Example
Suppose we have 2 coins
  o 1 coin is fair and 1 unfair
  o Roll a die to decide which coin to flip
  o We see resulting sequence of H and T
  o We do not know which coin was flipped…
  o …and we do not see the roll of the die
Observations?
Hidden states?
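
HMM Example
Suppose probabilities are as given
  o Then what is λ = (A,B,π) ?
[Figure: hidden states fair and unfair, observations H and T. Transitions: fair to fair 0.9, fair to unfair 0.1, unfair to fair 0.8, unfair to unfair 0.2. Emissions: fair gives H 0.5 and T 0.5; unfair gives H 0.7 and T 0.3.]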
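
HMM Example
HMM is given by λ = (A,B,π), where
$$A = \begin{pmatrix} 0.9 & 0.1 \\ 0.8 & 0.2 \end{pmatrix}, \quad B = \begin{pmatrix} 0.5 & 0.5 \\ 0.7 & 0.3 \end{pmatrix}, \quad \pi = \begin{pmatrix} 1.0 & 0.0 \end{pmatrix}$$
This π implies we start in F (fair) state
  o Also, state 1 is F and state 2 is U (unfair); columns of B are H, T
Suppose we observe HHTHT
  o Then probability of, say, FUFFU is
$$\pi_F\, b_F(H)\, a_{FU}\, b_U(H)\, a_{UF}\, b_F(T)\, a_{FF}\, b_F(H)\, a_{FU}\, b_U(T) = 1.0(0.5)(0.1)(0.7)(0.8)(0.5)(0.9)(0.5)(0.1)(0.3) = 0.000189$$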
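
The product for FUFFU can be checked with a short Python sketch. The parameter values are the factors that appear in the worked product (for example a_FU = 0.1 and b_U(H) = 0.7); the dictionary layout and the name `path_score` are illustrative assumptions.

```python
# HMM parameters read off the worked product for FUFFU.
A = {"F": {"F": 0.9, "U": 0.1}, "U": {"F": 0.8, "U": 0.2}}   # transitions
B = {"F": {"H": 0.5, "T": 0.5}, "U": {"H": 0.7, "T": 0.3}}   # emissions
PI = {"F": 1.0, "U": 0.0}                                    # initial


def path_score(states, obs):
    """Joint probability of a state sequence and an observation sequence."""
    p = PI[states[0]] * B[states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= A[prev][cur] * B[cur][o]
    return p


print(round(path_score("FUFFU", "HHTHT"), 6))  # 0.000189
```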
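
HMM Example
We have
$$A = \begin{pmatrix} 0.9 & 0.1 \\ 0.8 & 0.2 \end{pmatrix}, \quad B = \begin{pmatrix} 0.5 & 0.5 \\ 0.7 & 0.3 \end{pmatrix}, \quad \pi = \begin{pmatrix} 1.0 & 0.0 \end{pmatrix}$$
And observe HHTHT
  o Probabilities in table

| state | score | probability |
|-------|---------|---------|
| FFFFF | .020503 | .664086 |
| FFFFU | .001367 | .044272 |
| FFFUF | .002835 | .091824 |
| FFFUU | .000425 | .013774 |
| FFUFF | .001215 | .039353 |
| FFUFU | .000081 | .002624 |
| FFUUF | .000378 | .012243 |
| FFUUU | .000057 | .001836 |
| FUFFF | .002835 | .091824 |
| FUFFU | .000189 | .006122 |
| FUFUF | .000392 | .012697 |
| FUFUU | .000059 | .001905 |
| FUUFF | .000378 | .012243 |
| FUUFU | .000025 | .000816 |
| FUUUF | .000118 | .003809 |
| FUUUU | .000018 | .000571 |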

HMM Example
So, most likely state sequence is
  o FFFFF
  o Solves problem 2
Problem 1, scoring?
  o Next slide
Problem 3?
  o Not relevant here
[Table: state sequence scores and probabilities, as on the previous slide]
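
For a sequence this short, the most likely state sequence can be found by brute force. The sketch below enumerates all 32 state sequences and normalizes each score by the total. It assumes the 2-coin parameters implied by the FUFFU product earlier in the deck; the names `path_score`, `scores`, and `best` are illustrative. Sequences starting in U score 0 because the initial distribution puts all its mass on F, which is why the table lists only 16 rows.

```python
from itertools import product

# 2-coin HMM parameters (assumed from the worked FUFFU product).
A = {"F": {"F": 0.9, "U": 0.1}, "U": {"F": 0.8, "U": 0.2}}
B = {"F": {"H": 0.5, "T": 0.5}, "U": {"H": 0.7, "T": 0.3}}
PI = {"F": 1.0, "U": 0.0}
OBS = "HHTHT"


def path_score(states, obs):
    """Joint probability of a state sequence and an observation sequence."""
    p = PI[states[0]] * B[states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= A[prev][cur] * B[cur][o]
    return p


# Score every possible hidden state sequence of the right length.
scores = {"".join(s): path_score(s, OBS)
          for s in product("FU", repeat=len(OBS))}
total = sum(scores.values())           # P(HHTHT)
best = max(scores, key=scores.get)     # highest-scoring state sequence

print(best, round(scores[best], 6), round(scores[best] / total, 6))
```

Enumeration grows exponentially with the sequence length, so it is only practical for small examples like this one.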

HMM Example
How to score sequence HHTHT ?
Sum over all state sequences
  o Sum the “score” column in table: P(HHTHT) = .030874
  o Forward algorithm is way more efficient
[Table: state sequence scores and probabilities, as on the previous slides]
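
The forward algorithm computes the same sum without enumerating state sequences. The following is a minimal sketch, again assuming the 2-coin parameters implied by the FUFFU product; the function name `forward` and the dictionary layout are illustrative.

```python
# 2-coin HMM parameters (assumed from the worked FUFFU product).
STATES = ("F", "U")
A = {"F": {"F": 0.9, "U": 0.1}, "U": {"F": 0.8, "U": 0.2}}
B = {"F": {"H": 0.5, "T": 0.5}, "U": {"H": 0.7, "T": 0.3}}
PI = {"F": 1.0, "U": 0.0}


def forward(obs):
    """Return P(obs) by the forward (alpha) recursion."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: PI[s] * B[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * A[r][s] for r in STATES) * B[s][o]
                 for s in STATES}
    return sum(alpha.values())


print(round(forward("HHTHT"), 6))  # 0.030874
```

The work is proportional to N²T for N states and T observations, compared with N^T sequences for the brute-force sum.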

n-type Approximation
Consider the 2-coin HMM with
[Matrices A, B, π shown on slide]
For each observation, only include the most probable hidden state
  o So, only possible FST labels in this case… H:F/w1, H:U/w2, T:F/w3, T:U/w4
  o Where weights wi are probabilities

n-type Approximation
Consider example
[Matrices A, B, π shown on slide]
For each observation, most probable state
  o Weight is probability
[Figure: FST with states 1, 2, 3 and edge labels H:F/0.5, T:F/0.5, H:F/0.45 (twice), T:F/0.45 (twice)]
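
One reading of this example is that, for each previous state and each observation, the n1-type approximation keeps only the hidden state that maximizes the one-step probability. The sketch below follows that reading. It assumes the example uses the 2-coin parameters implied by the FUFFU product earlier in the deck, which is consistent with the edge weights 0.5 and 0.45 in the figure; the name `n1_type` and the `"start"` key are illustrative, and this is not presented as the construction from the references.

```python
# 2-coin HMM parameters (assumed; consistent with weights 0.5 and 0.45).
STATES = ("F", "U")
SYMBOLS = ("H", "T")
A = {"F": {"F": 0.9, "U": 0.1}, "U": {"F": 0.8, "U": 0.2}}
B = {"F": {"H": 0.5, "T": 0.5}, "U": {"H": 0.7, "T": 0.3}}
PI = {"F": 1.0, "U": 0.0}


def n1_type(A, B, PI):
    """Map (previous state or 'start', observation) -> (best state, weight)."""
    table = {}
    for o in SYMBOLS:
        # First step: weight is pi_s * b_s(o), maximized over s.
        best = max(STATES, key=lambda s: PI[s] * B[s][o])
        table[("start", o)] = (best, PI[best] * B[best][o])
    for prev in STATES:
        for o in SYMBOLS:
            # Later steps: weight is a_{prev,s} * b_s(o), maximized over s.
            best = max(STATES, key=lambda s: A[prev][s] * B[s][o])
            table[(prev, o)] = (best, A[prev][best] * B[best][o])
    return table


for (prev, o), (state, w) in n1_type(A, B, PI).items():
    print(f"after {prev}: {o}:{state}/{w:.2f}")
```

With these parameters the output state is always F, so the rows for previous state U are never reached from the start, which matches a figure containing only H:F and T:F edges.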

n-type Approximation
Suppose instead…
[Matrices A, B, π shown on slide]
Most probable state for each observation?
  o Weight is probability
[Figure: FST with states 1, 2, 3, 4 and edge labels H:U/0.42, T:F/0.30, H:U/0.35, T:F/0.25, T:F/0.20, H:F/0.30, H:F/0.30, T:F/0.30]

HMM as FST
Consider 2-coin HMM where
[Matrices A, B, π shown on slide]
Then FST nodes correspond to…
  o Initial state
  o Heads from fair coin (H:F)
  o Tails from fair coin (T:F)
  o Heads from unfair coin (H:U)
  o Tails from unfair coin (T:U)

HMM as FST
Suppose HMM is specified by
[Matrices A, B, π shown on slide]
Then FST is…
[Figure: FST with states 1 through 5 and edges labeled H:F, T:F, H:U, T:U]

HMM as FST
This FST is boring and not very useful
  o Weights make it a little more interesting
Computing the weights is homework…
[Figure: FST with states 1 through 5 and edges labeled H:F, T:F, H:U, T:U, as on the previous slide]

Why Consider FSTs?
FST used as “translating machine”
Well-defined operations on FSTs
  o Composition is an interesting example
Can convert HMM to FST
  o Either exact or approximation
  o Approximations may be much simplified, but might not be as accurate
Advantages of FST over HMM?

Why Consider FSTs?
Scoring/translating faster with FST
Able to compose multiple FSTs
  o Where FSTs may be derived from HMMs
One idea…
  o Multiple HMMs trained on malware (same family and/or different families)
  o Convert each HMM to FST
  o Compose resulting FSTs

Bottom Line
Can we get best of both worlds?
  o Fast scoring, composition with FSTs
  o Simplify/approximate HMMs via FSTs
  o Tweak FST to improve scoring
  o Efficient training using HMMs
Other possibilities?
  o Directly compute an FST without HMM
  o Or FST as first pass (e.g., disassembly?)