Boyer Moore Searches on Binary Texts
DESCRIPTION
Accelerating Boyer Moore Searches on Binary Texts. Shmuel Tomi Klein and Miri Kopel Ben-Nissan, Bar Ilan University, ISRAEL. Outline: background and motivation, Boyer Moore algorithm, new binary variant, analysis, experiments, summary.
TRANSCRIPT
Accelerating Boyer Moore Searches on Binary Texts
Shmuel Tomi Klein, Miri Kopel Ben-Nissan
Bar Ilan University, ISRAEL
Outline
Background and motivation
Boyer Moore algorithm
New binary variant
Analysis
Experiments
Summary
Important application of automata: pattern matching
KMP, BDM, BM
Boyer & Moore:
this-is-a-sample-text---
pattern
Match backwards: the pattern is aligned with the text and compared from right to left.
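A minimal Python sketch of the backward comparison at a single alignment. The function name `match_backwards` is hypothetical, and the caller is assumed to keep the window inside the text.

```python
def match_backwards(text: str, pattern: str, start: int) -> int:
    """Compare pattern with text[start:start + len(pattern)] from right to left.

    Returns -1 if the whole pattern matches, otherwise the 0-based index in
    the pattern of the rightmost mismatching character.
    Assumes start + len(pattern) <= len(text).
    """
    j = len(pattern) - 1
    # Walk leftwards while the characters agree.
    while j >= 0 and text[start + j] == pattern[j]:
        j -= 1
    return j


# The very first comparison fails: text "s" against pattern "n".
print(match_backwards("this-is-a-sample-text---", "pattern", 0))   # prints 6
```

Only one character is inspected before the mismatch here, which is the situation Boyer & Moore exploit to skip ahead.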
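Boyer – Moore Algorithm
Mismatch – case 1: delta1
The pattern x is aligned with the text y and compared backwards: a suffix u has matched, then the text character b differs from the pattern character a.
b does not occur in x: shift x past b, so that the shifted x contains no b.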
Mismatch – case 2: delta1
b occurs in x: shift x so that its rightmost b is aligned with the b in the text; the part of x to the right of that b contains no b.
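Cases 1 and 2 together define delta1. A minimal Python sketch, assuming the original convention in which delta1 is the amount added to the text pointer (which sits on the mismatched text character) and characters absent from the pattern get the pattern length m:

```python
def delta1_table(pattern: str) -> dict[str, int]:
    """Original Boyer-Moore delta1: how far to advance the text pointer
    when the text character c causes a mismatch.

    delta1(c) = m - (rightmost 1-based position of c in the pattern);
    characters absent from the pattern get m (look them up with .get(c, m)).
    """
    m = len(pattern)
    table = {}
    for pos, c in enumerate(pattern, start=1):
        table[c] = m - pos          # later (rightmost) occurrences overwrite earlier ones
    return table


d1 = delta1_table("example")
print(d1, d1.get("s", len("example")))
# e=0, x=5, a=4, m=3, p=2, l=1; any absent character such as "s": 7
```

Case 1 corresponds to the default value m, case 2 to the entry of the rightmost occurrence.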
Mismatch – case 3: delta2
u reoccurs in x preceded by c ≠ a: shift x so that the rightmost such occurrence cu is aligned with the matched u in the text.
Mismatch – case 4: delta2
Only a suffix v of u reoccurs in x, as a prefix of x: shift x so that this prefix v is aligned with the suffix v of u in the text.
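Cases 3 and 4 can both be computed from the original definition via the rightmost plausible reoccurrence rpr(j) of the matched suffix. The Python below is an unoptimized sketch (cubic time) written for illustration; positions left of the pattern act as wildcards, which is what produces case 4. For the pattern example it returns the delta2 row 12, 11, 10, 9, 8, 7, 1.

```python
def delta2_table(pattern: str) -> list[int]:
    """Original Boyer-Moore delta2, indexed by the 1-based mismatch position j
    (returned as a 0-based list: entry j-1 is delta2(j)).

    delta2(j) = m + 1 - rpr(j), where rpr(j) is the rightmost k <= j such that
    pattern[k..k+m-j-1] "unifies" with the matched suffix pattern[j+1..m]
    (positions < 1 match anything) and either k <= 1 or pattern[k-1] != pattern[j].
    """
    m = len(pattern)

    def at(pos: int):                       # 1-based access; None = wildcard
        return pattern[pos - 1] if pos >= 1 else None

    table = []
    for j in range(1, m + 1):
        suffix = pattern[j:]                # pattern[j+1..m] in 1-based terms
        k = j
        while True:
            unifies = all(at(k + t) in (None, ch) for t, ch in enumerate(suffix))
            plausible = k <= 1 or at(k - 1) != at(j)
            if unifies and plausible:
                break
            k -= 1                          # terminates: for small k everything is a wildcard
        table.append(m + 1 - k)
    return table


print(delta2_table("example"))   # [12, 11, 10, 9, 8, 7, 1]
```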
Boyer – Moore Example
Pattern: example
char        a    e    l    m    p    x rest
delta1      4    0    1    3    2    5    7
j           1    2    3    4    5    6    7
Pat[j]      e    x    a    m    p    l    e
delta2     12   11   10    9    8    7    1
Searching the text "here is a simple example"; after each mismatch the text pointer advances by max(delta1, delta2):
here is a simple example
example                    mismatch at s, +7
       example             mismatch at p, +2
         example           mismatch at i, +10 (delta2)
               example     mismatch at p, +2
                 example   match
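The trace can be replayed with the original Boyer-Moore loop, hard-coding the delta1 and delta2 values for example. This is a sketch; the function name `bm_search` is ours.

```python
def bm_search(text: str, pattern: str, delta1: dict[str, int], delta2: list[int]):
    """Original Boyer-Moore loop: on a mismatch at 1-based pattern position j,
    the text pointer i moves right by max(delta1(text[i]), delta2(j)).

    Returns (index of the first match or -1, list of alignment starts tried).
    """
    m, n = len(pattern), len(text)
    i = m - 1                                   # 0-based text pointer, under the last pattern char
    starts = []
    while i < n:
        starts.append(i - m + 1)
        j = m                                   # 1-based position in the pattern
        while j > 0 and text[i] == pattern[j - 1]:
            i -= 1
            j -= 1
        if j == 0:
            return i + 1, starts                # i stopped one position before the match
        i += max(delta1.get(text[i], m), delta2[j - 1])
    return -1, starts


DELTA1 = {"a": 4, "e": 0, "l": 1, "m": 3, "p": 2, "x": 5}   # every other character: 7
DELTA2 = [12, 11, 10, 9, 8, 7, 1]                            # j = 1..7
print(bm_search("here is a simple example", "example", DELTA1, DELTA2))
# (17, [0, 7, 9, 15, 17])
```

The five alignment starts are exactly the five pattern positions of the trace.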
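Problems of Binary Boyer & Moore
delta1 useless: with only the symbols 0 and 1, both usually occur near the end of the pattern, so delta1 yields tiny shifts
Most work done by delta2
0100101101011101000100110101001
1101100
this-is-a-sample-text---
pattern
Bit-level processing: each comparison handles a single bit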
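A quick check of why delta1 is nearly useless on bits, using the same delta1 definition as for characters. This is a sketch; the 16-bit pattern is only an illustration.

```python
def delta1_table(pattern: str) -> dict[str, int]:
    """Original Boyer-Moore delta1: m minus the rightmost 1-based position of c."""
    m = len(pattern)
    # later (rightmost) occurrences overwrite earlier ones
    return {c: m - pos for pos, c in enumerate(pattern, start=1)}


for bits in ("1101100", "1010101011101101"):
    print(bits, delta1_table(bits))
# 1101100 {'1': 2, '0': 0}
# 1010101011101101 {'1': 0, '0': 1}
```

Whichever bit causes the mismatch, delta1 advances the text pointer by at most 2 here, so almost all of the progress has to come from delta2.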
Need for Binary Boyer & Moore
Compressed matching: given the encoded text E(T) and the pattern P, look for the encoded pattern E(P) in E(T) rather than for P in the decoded text D(E(T))
Suggested solution:
BBBMM: Blocked Binary Boyer Moore Matching
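A toy illustration of compressed matching. The prefix code `CODE` below is hypothetical and chosen only for the example. The sketch uses `str.find` where BBBMM would do the searching, and it does not check that a match in E(T) begins on a codeword boundary, which a real compressed matcher has to handle.

```python
# Hypothetical prefix code, chosen only for illustration (not from the slides).
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(s: str) -> str:
    """E: concatenate the codewords of the characters of s."""
    return "".join(CODE[ch] for ch in s)


def decode(bits: str) -> str:
    """D: decode greedily; works because no codeword is a prefix of another."""
    inverse = {w: ch for ch, w in CODE.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


T, P = "abacabad", "cab"
ET, EP = encode(T), encode(P)
print(ET, EP, ET.find(EP))    # 01001100100111 110010 4
print(decode(ET) == T)        # True
```

The search runs directly on the bit string E(T); decoding is never needed to locate E(P).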
BBBMM
(figure: the text is read in blocks of k bits, Text[i], and compared with the pattern blocks Pat[sh, j]; sh and sl are marked)
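One possible reading of the figure, sketched in Python under our own assumptions: the pattern is stored in k shifted copies, copy sh being preceded by sh don't-care bits so that it lines up with the k-bit text blocks, and sl is the number of pattern bits in the last block. The function names and the (value, mask) block representation are hypothetical.

```python
def shifted_copy(pattern_bits: str, sh: int, k: int = 8) -> list[tuple[int, int]]:
    """Hypothetical Pat[sh, *] as a list of k-bit blocks (value, mask).

    The pattern is preceded by sh don't-care bits and padded at the end to a
    multiple of k; mask bits are 1 exactly where the block holds pattern bits.
    """
    padded = "x" * sh + pattern_bits
    padded += "x" * (-len(padded) % k)
    blocks = []
    for start in range(0, len(padded), k):
        chunk = padded[start:start + k]
        value = int(chunk.replace("x", "0"), 2)
        mask = int("".join("0" if c == "x" else "1" for c in chunk), 2)
        blocks.append((value, mask))
    return blocks


def last_block_bits(m: int, sh: int, k: int = 8) -> int:
    """sl: number of pattern bits in the last block, 1 <= sl <= k."""
    return (sh + m - 1) % k + 1


def blocks_match(text_blocks: list[int], i: int, copy: list[tuple[int, int]]) -> bool:
    """Compare the copy, whose last block is aligned with text_blocks[i],
    right to left, one whole k-bit block per comparison (assumes i >= len(copy) - 1)."""
    n = len(copy)
    for j in range(n - 1, -1, -1):
        value, mask = copy[j]
        if (text_blocks[i - (n - 1 - j)] & mask) != value:
            return False
    return True


# 1101100 starts at bit 3 of the text 10111011 00011010
text = [0b10111011, 0b00011010]
print(blocks_match(text, 1, shifted_copy("1101100", 3)))   # True
print(blocks_match(text, 1, shifted_copy("1101100", 2)))   # False
```

Each comparison checks up to k bits at once, which is the point of blocking.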
More information in binary case
ASCII: ffghabdgttiocbsbgghj
BINARY: 0110001001101010
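Extended delta1
(figure: text T and pattern P around bit positions i − 1, i, i + 1)
delta1 is indexed by a block B of sl bits instead of a single bit:
$\delta_1[sl, B]$ for $1 \le sl \le k$ and $0 \le B < 2^{sl}$, with $\delta_1[sl, B] \le m$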
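A sketch of one way such a table could be filled. We assume that delta1[sl, B] is the smallest shift (in bits) that makes the sl text bits B, which were facing the last sl pattern bits, agree with the pattern, with bits that fall left of the pattern's start matching anything. This definition is our assumption, a block-wide generalization of the bad-character rule; it always yields values between 0 and m.

```python
def consistent(pattern_bits: str, block: str, s: int) -> bool:
    """True if the text bits `block`, which faced the last len(block) pattern
    bits, agree with the pattern after shifting the pattern right by s bits.
    Text bits that fall left of the pattern's first bit match anything."""
    m, sl = len(pattern_bits), len(block)
    for t, bit in enumerate(block):
        p = m - sl + t - s          # 0-based pattern index now under text bit t
        if p >= 0 and pattern_bits[p] != bit:
            return False
    return True


def extended_delta1(pattern_bits: str, sl: int) -> list[int]:
    """Hypothetical delta1[sl, B] for B = 0 .. 2**sl - 1: the smallest shift
    s >= 0 consistent with the block; it never exceeds m."""
    table = []
    for B in range(2 ** sl):
        block = format(B, f"0{sl}b")    # sl bits, most significant first
        s = 0
        while not consistent(pattern_bits, block, s):
            s += 1
        table.append(s)
    return table


print(extended_delta1("1101100", 3))   # [7, 6, 7, 2, 0, 3, 1, 5]
```

With three text bits instead of one, shifts of 5 to 7 positions become possible for this 7-bit pattern, whereas the single-bit delta1 never exceeds 2 for it.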
Total size of delta1 tables:
$\sum_{sl=1}^{k} 2^{sl} = 2^{k+1} - 2$
If too large, use a limit value $K < k$
(figure: text T and pattern P, with sl, k and K marked)
Size of delta1 tables reduced to $2^{K+1}$
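There is one table with $2^{sl}$ entries for each $sl$ from 1 to $k$, so the total is a geometric sum. With the block size $k = 8$ used in the experiments:

$$\sum_{sl=1}^{8} 2^{sl} = 2 + 4 + \dots + 256 = 2^{9} - 2 = 510$$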
Original delta1: increase of the text pointer
BBBMM delta1: shift size
(figure: text T and pattern P)
Mismatch not in the last block: Correct[sh, j]
delta2
(figure: text T and pattern P)
Shift after a mismatch: $\delta_2[m - \mathit{matchlen}]$
j          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Pat[j]     1  0  1  0  1  0  1  0  1  1  1  0  1  1  0  1
delta2[j] 13 13 13 13 13 13 13 13 13 13 13  3  7 15  2  1
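The table for the 16-bit pattern 1010101011101101 can be reproduced with the rightmost-plausible-reoccurrence rule of the original algorithm, expressed as a shift of the pattern rather than an increase of the text pointer. The sketch below is our reconstruction, not code from the slides. Entry j is the shift after a mismatch at position j, that is, after matchlen = m − j matched bits.

```python
def delta2_shifts(pattern: str) -> list[int]:
    """Good-suffix shift sizes: entry j-1 is the shift of the pattern after a
    mismatch at 1-based position j, i.e. j + 1 - rpr(j), with rpr(j) as in the
    original Boyer-Moore delta2 (which equals this shift plus m - j)."""
    m = len(pattern)

    def at(pos):                          # 1-based; positions < 1 are wildcards
        return pattern[pos - 1] if pos >= 1 else None

    shifts = []
    for j in range(1, m + 1):
        suffix = pattern[j:]              # the m - j matched symbols
        k = j
        while not (all(at(k + t) in (None, b) for t, b in enumerate(suffix))
                   and (k <= 1 or at(k - 1) != at(j))):
            k -= 1
        shifts.append(j + 1 - k)
    return shifts


print(delta2_shifts("1010101011101101"))
# [13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 3, 7, 15, 2, 1]
```

The repeated 13 comes from the longest proper prefix of the pattern that is also a suffix (101, length 3): once enough bits have matched, the pattern can only move by 16 − 3 = 13.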
Analysis
Assumption: random input, reasonable for compressed text
Expected number of comparisons until a mismatch:
Bit-wise: $\sum_{j=1}^{m} \frac{j}{2^j} < 2$
Blocked: $1 + \frac{1}{k}\sum_{sl=1}^{k}\sum_{t=1}^{m/k} \frac{1}{2^{sl+(t-1)k}}$
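A short derivation of why both quantities stay small. Under the random-input assumption, take the expected number of comparisons to be $\sum_{j=1}^{m} j/2^j$ bit-wise and $1 + \frac{1}{k}\sum_{sl=1}^{k}\sum_{t=1}^{m/k} 2^{-(sl+(t-1)k)}$ blocked, where averaging over $sl$ assumes (our assumption) that each value $1, \dots, k$ is equally likely:

$$\sum_{j=1}^{m} \frac{j}{2^{j}} = 2 - \frac{m+2}{2^{m}} < 2$$

$$\frac{1}{k}\sum_{sl=1}^{k}\sum_{t=1}^{m/k} \frac{1}{2^{sl+(t-1)k}} < \frac{1}{k}\sum_{n=1}^{\infty} \frac{1}{2^{n}} = \frac{1}{k}$$

The second bound holds because every $n \ge 1$ equals $sl + (t-1)k$ for exactly one pair with $1 \le sl \le k$ and $t \ge 1$. Hence the blocked variant needs fewer than $1 + 1/k$ block comparisons on average, that is, fewer than 1.125 for $k = 8$.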
Expected number of bits shifted after a mismatch:
Bit-wise: M
Blocked: M’
$E(M) = \sum_{j=1}^{m} 2^{-j}\min(2^{j}, m) \approx \log m$
$M' \ge M$
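A small exact evaluation, reading the bit-wise expected shift as $E(M) = \sum_{j=1}^{m} 2^{-j}\min(2^j, m)$. The function name is hypothetical, and exact fractions avoid any rounding in the sum.

```python
from fractions import Fraction
import math


def expected_bitwise_shift(m: int) -> Fraction:
    """E(M) = sum_{j=1}^{m} 2^-j * min(2^j, m), computed exactly."""
    return sum((Fraction(min(2 ** j, m), 2 ** j) for j in range(1, m + 1)), Fraction(0))


for m in (16, 128, 1024):
    print(m, float(expected_bitwise_shift(m)), math.log2(m))
```

The first log2 m terms contribute 1 each, and the remaining terms form a geometric tail adding just under 1, so the expected bit-wise shift grows only logarithmically with the pattern length.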
Experiments
Texts: English Bible (2.5 MB) and World Factbook (1.5 MB), Huffman encoded
Patterns: random substrings of lengths 10 to 500
k = 8
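A small sketch of the experimental setup, with a short sample sentence standing in for the real corpora (the sentence and all names are ours). It builds a Huffman code from character counts, encodes text and pattern, and confirms that the encoded pattern occurs in the encoded text at the bit offset where the pattern's encoding starts.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman code (character -> bit string) from character counts."""
    tiebreak = count()                        # avoids comparing dicts on equal counts
    heap = [(f, next(tiebreak), {ch: ""}) for ch, f in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:                        # one distinct character: give it "0"
        return {ch: "0" for ch in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + w for ch, w in left.items()}
        merged.update({ch: "1" + w for ch, w in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(s: str, code: dict[str, str]) -> str:
    return "".join(code[ch] for ch in s)


sample = "in the beginning god created the heaven and the earth"
code = huffman_code(sample)
start = sample.index("beginning")
pat = sample[start:start + 9]                 # a substring of the text
enc_text, enc_pat = huffman_encode(sample, code), huffman_encode(pat, code)
offset = len(huffman_encode(sample[:start], code))   # bit offset of E(pat) in E(text)
print(enc_text[offset:offset + len(enc_pat)] == enc_pat)   # True
```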
Experiments: Average number of comparisons between shifts
(plot: bit-wise vs. blocked, as a function of the length of the pattern)
Experiments: Average size of shifts
(plot: bit-wise vs. blocked, as a function of the length of the pattern)
Experiments: Average number of comparisons per 1000 bits
(plot: bit-wise, blocked and BDM, as a function of the length of the pattern)
Experiments: Time to locate the first occurrence (ms)
(plot: bit-wise, blocked, BDM and Turbo-BDM, as a function of the length of the pattern)
Summary
Blocked variant of BM
Faster than alternatives, overhead 1-10 K
Extensions: ASCII, words instead of characters
Thank you!