Faster 2-Dimensional Scaled Matching
Amihood Amir and Eran Chencinski
Real Scaling
Given an n x n text T and an m x m pattern P, find all occurrences of P in T, scaled to any real scale.
Best known algorithm [Amir et al.]: Time: O(nm^3 + n^2 m log m), Space: O(nm^3 + n^2)
Our algorithm: Time: O(n^2 m), Space: O(n^2)
Scaling – Geometric Definition
Scaling – Algebraic Definition
Rounding Function:
Given a pattern P of size m x m and a scale r:
The first row would be scaled to || 1*r || rows
The first 2 rows would be scaled to || 2*r || rows
...
The first m rows would be scaled to || m*r || rows
Similarly on the columns
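The algebraic definition above can be sketched in Python. This is a minimal sketch, not the authors' code: the rounding formula did not survive in these slides, so `rnd` assumes that ||x|| rounds to the nearest integer with halves rounded up, and `scale_pattern` is a hypothetical helper name. Under that assumption, row (or column) k is repeated ||k*r|| - ||(k-1)*r|| times.

```python
import math


def rnd(x: float) -> int:
    """Assumed rounding ||x||: nearest integer, halves rounded up."""
    return math.floor(x + 0.5)


def scale_pattern(pattern, r):
    """Scale a square pattern (list of rows) by the real scale r.

    The first k rows occupy ||k*r|| rows of the result, so row k
    (1-based) is repeated ||k*r|| - ||(k-1)*r|| times; columns likewise.
    """
    m = len(pattern)
    # counts[k] = number of copies of 0-based row/column k
    counts = [rnd((k + 1) * r) - rnd(k * r) for k in range(m)]
    scaled = []
    for i, row in enumerate(pattern):
        # Stretch the row horizontally, then repeat it vertically.
        scaled_row = [sym for j, sym in enumerate(row) for _ in range(counts[j])]
        scaled.extend(list(scaled_row) for _ in range(counts[i]))
    return scaled
```

For a 2 x 2 pattern and r = 1.5, the first row and column are doubled and the second are kept once, so the result is 3 x 3.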
Inverse Rounding Function: suppose we know that K rows were scaled to L rows:
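The formula for the inverse is missing from these slides, so the following is only a sketch under an assumption: if ||x|| rounds to the nearest integer with halves rounded up, then ||K*r|| = L holds exactly when L - 1/2 <= K*r < L + 1/2. The hypothetical helper `scale_range` returns that half-open interval of scales, using exact fractions to avoid floating-point boundary errors.

```python
from fractions import Fraction


def scale_range(k: int, l: int):
    """Return (lo, hi) such that ||k*r|| == l exactly for lo <= r < hi.

    Assumes ||x|| = floor(x + 1/2), which is an assumption made for
    this sketch and is not stated in the slides.
    """
    lo = Fraction(2 * l - 1, 2 * k)  # (l - 1/2) / k
    hi = Fraction(2 * l + 1, 2 * k)  # (l + 1/2) / k
    return lo, hi
```

For example, if 2 rows were scaled to 3 rows, the scale must lie in [5/4, 7/4).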
Subrow/column Repetition Query
Query time: O(1), preprocessing time: O(n^2)
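The slides do not define the query itself. One plausible reading, assumed here, is: "does the length-k subrow starting at T[i][j] consist of a single repeated symbol?" Under that assumption, a table of run lengths gives the stated bounds. The function names are hypothetical, and the subcolumn version would be the same code applied to columns.

```python
def preprocess_runs(text):
    """run[i][j] = length of the maximal run of equal symbols that
    starts at text[i][j] and extends to the right. O(n^2) for n x n."""
    run = [[1] * len(row) for row in text]
    for i, row in enumerate(text):
        # Fill right to left so run[i][j + 1] is ready when needed.
        for j in range(len(row) - 2, -1, -1):
            if row[j] == row[j + 1]:
                run[i][j] = run[i][j + 1] + 1
    return run


def is_subrow_repetition(run, i, j, k):
    """O(1): True iff text[i][j : j + k] is one symbol repeated k times."""
    return run[i][j] >= k
```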
Algorithm Layout
The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition
Each stage takes O(n^2 m) time and O(n^2) space
Scale Elimination Stage
[Figures: pivot; text location (i,j)]
O(m) time for each location, O(n^2 m) total, O(n^2) space
Candidate Consistency Stage
[Figure: Case (a), Case (b)]
Witness Table Construction
For each suffix: O(m^2) time and O(m) space
Pre-Dueling Step
For each candidate c in T:
  For each suffix s of P:
    Compare c's borders with the witness table borders of suffix s
    If the borders are not the same – c is eliminated
Can be done in O(m) time for each candidate
Performing a Duel
The Dueling Order
Each candidate performs at most O(m) successful duels
Witness Table construction: O(m^3) time, O(m^2) space
Pre-Dueling Step: O(n^2 m) time, O(m^2) space
Number of duels: at most O(n) unsuccessful, at most O(n^2 m) successful, where each duel takes O(1) time
Total: O(n^2 m) time, O(n^2) space
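To illustrate why a single duel costs O(1) once a witness table exists, here is a one-dimensional, unscaled analogue. It is not the two-dimensional scaled duel of the talk, only the classic idea behind it, and the names `build_witness` and `duel` are hypothetical. For two overlapping candidates a < b, the witness for shift d = b - a is a position k with P[k] != P[k + d], so the single text symbol at b + k contradicts at least one of the two candidates.

```python
def build_witness(pattern):
    """witness[d] = some k with pattern[k] != pattern[k + d], or None if
    the pattern agrees with itself at shift d. Naive O(m^2) construction."""
    m = len(pattern)
    witness = [None] * m
    for d in range(1, m):
        for k in range(m - d):
            if pattern[k] != pattern[k + d]:
                witness[d] = k
                break
    return witness


def duel(text, pattern, a, b, witness):
    """Return the set of candidates in {a, b} eliminated by one duel.

    Requires a < b < a + len(pattern) and b + len(pattern) <= len(text).
    One table lookup and at most two symbol comparisons: O(1).
    """
    d = b - a
    k = witness[d]
    if k is None:
        return set()  # the candidates are consistent; no duel is needed
    sym = text[b + k]
    losers = set()
    if sym != pattern[k]:       # candidate b expects pattern[k] here
        losers.add(b)
    if sym != pattern[k + d]:   # candidate a expects pattern[k + d] here
        losers.add(a)
    return losers
```

Because the two expected symbols differ, every duel with a non-empty witness eliminates at least one candidate.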
Candidate Verification Stage
For each location, find the maximal containing interval
Can be solved in O(n) time per row using a solution to the Maximal Interval Problem
Once we find the largest interval we:
Verify each row in O(m) time, using subcolumn repetition queries
Save the longest matching length
For each candidate, run a Range Minimum Query (RMQ) on the lengths
The pattern appears iff pattern size >= RMQ
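The range minimum step can be sketched with a sparse table. This is a swapped-in technique chosen for brevity: it answers queries in O(1) but needs O(n log n) preprocessing, not the linear preprocessing the talk relies on. `build_sparse_table` and `range_min` are hypothetical names, and `values` stands for the saved matching lengths.

```python
def build_sparse_table(values):
    """table[p][i] = min(values[i : i + 2**p]). O(n log n) time and space."""
    n = len(values)
    table = [list(values)]
    span = 1
    while 2 * span <= n:
        prev = table[-1]
        # A block of length 2*span is the union of two blocks of length span.
        table.append([min(prev[i], prev[i + span])
                      for i in range(n - 2 * span + 1)])
        span *= 2
    return table


def range_min(table, lo, hi):
    """Minimum of values[lo..hi], inclusive, in O(1).

    Covers the range with two (possibly overlapping) power-of-two blocks.
    """
    level = (hi - lo + 1).bit_length() - 1
    width = 1 << level
    return min(table[level][lo], table[level][hi - width + 1])
```

A candidate spanning positions lo..hi would then compare `range_min(table, lo, hi)` against the pattern size, as described above.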
Finding largest intervals: O(n) time per row, O(n^2) total
Verifying columns: O(nm) time per row, O(n^2 m) total
RMQ: Preprocessing: O(n) time per row, O(n^2) total; Querying: O(1) time per candidate, O(n^2) total
Total: O(n^2 m) time, O(n^2) space
Occurrence Recognition Stage
Recall: Scale elimination stage returned
At most O(m) steps per candidate
Total: O(n^2 m) time
Conclusions
The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition
Each stage takes O(n^2 m) time and O(n^2) space