
Page 1: DFA minimization algorithms in map reduce

DFA Minimization Algorithms in Map-Reduce

Iraj Hedayati Somarin

Master Thesis Defense – January 2016

Computer Science and Software Engineering
Faculty of Engineering and Computer Science

Concordia University

Supervisor: Gösta K. Grahne
Examiner: Brigitte Jaumard
Examiner: Hovhannes A. Harutyunyan
Chair: Rajagopalan Jayakumar

Page 2: DFA minimization algorithms in map reduce

2

Outline
• Introduction
• DFA Minimization in Map-Reduce
• Cost Analysis
• Experimental Results
• Conclusion

Page 3: DFA minimization algorithms in map reduce

3

INTRODUCTION

An introduction to the problem and the related work done so far

Page 4: DFA minimization algorithms in map reduce

4

DFA, Big-Data and our Motivation
• Finite Automata
• Deterministic Finite Automata (DFA): A = ⟨Q, Σ, δ, s, F⟩
• DFA minimization is the process of:
  • Removing unreachable states
  • Merging non-distinguishable states
• What is Big-Data? (e.g., peta = 2^50 or 10^15)
• DFA minimization is insufficiently studied for data-intensive applications and parallel environments
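The first minimization step above can be sketched in code. The dict-based encoding of A = ⟨Q, Σ, δ, s, F⟩ below is an illustrative assumption, not the thesis's representation; removing unreachable states is a plain BFS over δ:

```python
from collections import deque

# Hypothetical DFA A = <Q, Sigma, delta, s, F> as plain Python data.
Q = {0, 1, 2, 3}                 # states
SIGMA = {"a", "b"}               # alphabet
DELTA = {                        # delta: (state, symbol) -> state
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 2,
    (3, "a"): 0, (3, "b"): 3,    # state 3 is unreachable from s
}
S = 0                            # start state
F = {2}                          # final states

def reachable_states(start, delta):
    """BFS from the start state: the 'remove unreachable states' step."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        q = frontier.popleft()
        for (p, _), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                frontier.append(r)
    return seen

reach = reachable_states(S, DELTA)
print(sorted(reach))  # [0, 1, 2] -- state 3 is dropped
```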

Page 5: DFA minimization algorithms in map reduce

5

DFA Minimization Methods (Watson, 1993)

[Taxonomy figure: Equivalence of States · Equivalence Relation · Bottom-Up · Top-Down · Layer-wise · Unordered · State Pairs · Point-Wise · Brzozowski]

Denote a partition on the state set; then: (formula lost in extraction)

Page 6: DFA minimization algorithms in map reduce

6

Moore’s Algorithm (Moore, 1956)
• Input is a DFA A = ⟨Q, Σ, δ, s, F⟩ with n = |Q| states and k = |Σ| symbols
• Initialize a partition over Q separating final from non-final states: P⁰ = {F, Q ∖ F}
• Iteratively refine the partition using an equivalence relation in each iteration
• Complexity: O(k·n²)
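A minimal single-machine sketch of Moore's refinement, assuming a dict-based DFA encoding (illustrative, not the thesis's representation):

```python
def moore_minimize_blocks(Q, sigma, delta, F):
    """Moore's refinement: start from {F, Q \\ F}; each round, two states stay
    in the same block only if their successors agree on every symbol.
    Worst case O(k * n^2): up to n rounds of O(k * n) work."""
    block = {q: 1 if q in F else 0 for q in Q}   # initial partition {F, Q \ F}
    while True:
        # signature: current block plus the successor blocks for each symbol
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in sorted(sigma))
               for q in Q}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_block = {q: ids[sig[q]] for q in Q}
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block                      # no block was split: done
        block = new_block

# Hypothetical 4-state DFA over {a}: states 0 and 1 are non-distinguishable.
Q, SIGMA, F = {0, 1, 2, 3}, {"a"}, {3}
DELTA = {(0, "a"): 2, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3}
b = moore_minimize_blocks(Q, SIGMA, DELTA, F)
print(b[0] == b[1], b[1] == b[2])  # True False
```

The signature always contains the state's current block, so each round can only split blocks, never merge them; the loop stops at the first fixpoint.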

Page 7: DFA minimization algorithms in map reduce

7

Hopcroft’s Algorithm (Hopcroft, 1971)

• The idea is to avoid some of the unnecessary operations of Moore's method
• Input is a DFA A = ⟨Q, Σ, δ, s, F⟩ with n = |Q| states and k = |Σ| symbols
• Initialize the partition over Q as {F, Q ∖ F}
• Keep a list of splitters
• Iteratively split blocks using a splitter ⟨B, a⟩, where B is a block and a ∈ Σ
• Update the list of splitters
• Complexity: O(k·n·log n)
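The splitter loop can be sketched as follows. This is a simplified single-machine version (stale splitters for already-split blocks remain in the queue, which is harmless), not the Map-Reduce variant developed later in the thesis:

```python
def hopcroft_blocks(Q, sigma, delta, F):
    """Hopcroft's splitter-queue refinement; with the 'enqueue the smaller
    half' rule the full algorithm runs in O(k * n * log n)."""
    # preimage[a][q] = set of states p with delta(p, a) == q
    preimage = {a: {} for a in sigma}
    for (p, a), q in delta.items():
        preimage[a].setdefault(q, set()).add(p)
    P = [B for B in (set(F), set(Q) - set(F)) if B]   # partition {F, Q \ F}
    work = [(frozenset(B), a) for B in P for a in sigma]
    while work:
        B, a = work.pop()
        X = set()                                     # states entering B on a
        for q in B:
            X |= preimage[a].get(q, set())
        for Y in list(P):
            inter, diff = Y & X, Y - X
            if inter and diff:                        # splitter <B, a> splits Y
                P.remove(Y)
                P += [inter, diff]
                for c in sigma:                       # enqueue the smaller half
                    work.append((frozenset(min(inter, diff, key=len)), c))
    return P

# Hypothetical 4-state DFA over {a}: the minimal DFA merges states 0 and 1.
Q, SIGMA, F = {0, 1, 2, 3}, {"a"}, {3}
DELTA = {(0, "a"): 2, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3}
blocks = hopcroft_blocks(Q, SIGMA, DELTA, F)
print(sorted(sorted(B) for B in blocks))  # [[0, 1], [2], [3]]
```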

Page 8: DFA minimization algorithms in map reduce

8

Hopcroft’s Algorithm (Example)

[Example figure: blocks P, P1, P2 with QUE = {⟨P, a⟩, ⟨P1, a⟩, ⟨P2, a⟩}; after block B is split into B1 and B2, QUE = QUE ∪ {⟨B1, a⟩}]

Page 9: DFA minimization algorithms in map reduce

9

Map-Reduce Model

[Diagram: original data blocks (Data 1–4) on the DFS are read by Mapper 1 and Mapper 2 in the Mapping phase; the mapped data is shuffled to Reducer 1–3 in the Reduce phase, and the output (Data 1–3) is written back to the DFS]
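The map/shuffle/reduce cycle in the diagram can be simulated on one machine. The transition-counting job below is an invented example for illustration, not one of the thesis's algorithms:

```python
from itertools import groupby
from operator import itemgetter

def map_reduce(records, mapper, reducer):
    """Toy single-machine model: map each input record to (key, value)
    pairs, shuffle (sort + group) by key, then reduce each key group."""
    mapped = [kv for rec in records for kv in mapper(rec)]        # Mapping
    mapped.sort(key=itemgetter(0))                                # Shuffle
    return [reducer(k, [v for _, v in group])                     # Reduce
            for k, group in groupby(mapped, key=itemgetter(0))]

# Hypothetical job: count outgoing transitions per source state of a DFA.
transitions = [(0, "a", 1), (0, "b", 2), (1, "a", 1), (2, "b", 2)]
counts = map_reduce(transitions,
                    mapper=lambda t: [(t[0], 1)],
                    reducer=lambda k, vals: (k, sum(vals)))
print(counts)  # [(0, 2), (1, 1), (2, 1)]
```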

Page 10: DFA minimization algorithms in map reduce

10

Related Works in Parallel DFA Minimization

1) Employing the EREW-PRAM model (Moore's method) (Ravikumar and Xiong 1996)

2) Employing the CRCW-PRAM model (Moore's method) (Tewari et al. 2002)

3) Employing the Map-Reduce model (Moore's method) [Moore-MR] (Harrafi 2015)

• The challenge is how to store block numbers; the three works handle it by:
  1) Parallel in-block sorting, renaming blocks serially
  2) A parallel perfect hashing function and partial sums
  3) Taking no action

Page 11: DFA minimization algorithms in map reduce

11

Cost Model
• Communication Complexity (Yao 1979 & Kushilevitz 1997)
• The Lower Bound Recipe for Replication Rate (Afrati et al. 2013)
• Computational Complexity of Map-Reduce (Turan 2015)

Page 12: DFA minimization algorithms in map reduce

12

Cost Model – Communication Complexity

• Yao’s two-party model: Alice holds x ∈ {0,1}ⁿ and Bob holds y ∈ {0,1}ⁿ; they want to compute f : {0,1}ⁿ × {0,1}ⁿ → {0,1}. How much communication is required?

• Upper bound (worst case): n + 1 bits

[Figure: the communication matrix over A ⊂ {0,1}ⁿ and B ⊂ {0,1}ⁿ partitioned into monochromatic rectangles Rec 1 … Rec 6]

• Lower bound: log₂ t, where t is the number of f-monochromatic rectangles needed to partition the matrix

• The fooling set is a well-known method for lower-bounding the number of f-monochromatic rectangles
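As a standard illustration of the fooling-set method (not taken from the slides), consider the equality function EQ(x, y) = 1 iff x = y:

```latex
% Fooling set for EQ:
S = \{ (x,x) : x \in \{0,1\}^n \}, \qquad |S| = 2^n.
% For distinct x \ne x', \mathrm{EQ}(x,x') = 0 while
% \mathrm{EQ}(x,x) = \mathrm{EQ}(x',x') = 1, so no 1-monochromatic
% rectangle contains two elements of S. Hence at least 2^n rectangles
% are needed, and
D(\mathrm{EQ}) \;\ge\; \log_2 2^n \;=\; n.
```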

Page 13: DFA minimization algorithms in map reduce

13

Cost Model – Lower Bound Recipe(Afrati et al. 2013)

• Reducer i (for i = 1, …, n) receives ρᵢ input records, bounded by the reducer capacity q, and covers g(ρᵢ) of the outputs

• Input I, output O; covering all outputs requires Σᵢ₌₁ⁿ g(ρᵢ) ≥ |O|

• Replication rate:

R = (Σᵢ₌₁ⁿ ρᵢ) / |I|
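A toy computation of the replication rate R = (Σᵢ ρᵢ)/|I|; the schema and numbers below are invented for illustration:

```python
def replication_rate(reducer_inputs, input_size):
    """R = (sum of inputs received by all reducers) / |I|  (Afrati et al. 2013)."""
    return sum(reducer_inputs) / input_size

# Hypothetical schema: |I| = 6 input records, each sent to 2 reducers,
# spread over 3 reducers of capacity 4.
r = replication_rate([4, 4, 4], 6)
print(r)  # 2.0
```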

Page 14: DFA minimization algorithms in map reduce

14

Cost Model – Computational Complexity(Turan 2015)

• Let us denote a Turing machine M where:
  • a flag indicates whether it is a mapper task or a reducer task
  • a parameter indicates the round number
  • a parameter indicates the input size
  • a parameter indicates the reducer size

• There is a bounded-space and bounded-time Turing machine for each task (bounds lost in extraction)

Page 15: DFA minimization algorithms in map reduce

15

DFA MINIMIZATION IN MAP-REDUCE

The proposed algorithms for minimizing a DFA in the Map-Reduce model

Page 16: DFA minimization algorithms in map reduce

16

Enhancement to Moore-MR
• Moore-MR (Harrafi 2015):
  • Input: the DFA's transition records
  • Pre-processing: generate auxiliary records from the input
  • Mapping schema: map every transition record to reducers according to its states
  • Reducer task: compute the new block number using Moore's method
• Note that to accomplish its task, a reducer requires the block number of every state it has a transition to; transitions are responsible for carrying this data
• The challenge: new block numbers are concatenations of other block numbers, so the size of each block number grows with every round

Page 17: DFA minimization algorithms in map reduce

17

Enhancement to Moore-MR: PPHF-MR

• Given a one-to-one (perfect) hash function on block numbers, renaming can be done in parallel
• Mapping: map every record to a reducer by its hashed block number
• Reducer task: assign new block numbers from a dedicated range, determined by the reducer number

Moore-MR-PPHF is obtained by applying PPHF-MR after each iteration of Moore-MR
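The exact hash function is elided on the slide. One common one-to-one (pairing) function on pairs of numbers below n, shown purely as an assumption and not necessarily the thesis's PPHF, is h(x, y) = x·n + y:

```python
def pair_hash(x, y, n):
    """One-to-one on {0..n-1} x {0..n-1}: distinct pairs get distinct
    values, so a pair of block numbers can be renamed to one small integer."""
    assert 0 <= x < n and 0 <= y < n
    return x * n + y

def unpair(h, n):
    """Inverse of pair_hash, showing that it is injective."""
    return divmod(h, n)

n = 10
# All n*n pairs map to n*n distinct values, and the mapping inverts cleanly.
assert len({pair_hash(x, y, n) for x in range(n) for y in range(n)}) == n * n
print(pair_hash(3, 7, n), unpair(37, n))  # 37 (3, 7)
```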

Page 18: DFA minimization algorithms in map reduce

18

Hopcroft-MR

• Pre-Processing (Mapper → Reducer)
• Iterate while QUE is not empty:
  • PartitionDetect (Mapper → Reducer)
  • BlockUpdate (Mapper → Reducer)
  • PPHF-MR (Mapper → Reducer)
• Construct the minimal DFA

Record types (keyed by hashes h(q), h(p), h(πₚ) of states and blocks):

• Transition tuple: Δ, blocks[a, Bi]
• Block tuple: blocks[a, Bi]
• Update tuple: new block number, blocks[a, Bi]

Page 19: DFA minimization algorithms in map reduce

19

Hopcroft-MR vs. Hopcroft-MR-PAR

• In Hopcroft-MR we pick one splitter at a time, while in Hopcroft-MR-PAR we pick all the splitters from QUE at once

• In Hopcroft-MR, (formula lost in extraction)

• In Hopcroft-MR-PAR, A (formula lost in extraction)

• where A is a bit vector

Page 20: DFA minimization algorithms in map reduce

20

COST ANALYSIS

Analyzing the cost measures of the proposed algorithms, and finding lower and upper bounds for each

Page 21: DFA minimization algorithms in map reduce

21

Communication Cost Bounds
• Upper bound for the DFA minimization problem in parallel environments
• Lower bound for the DFA minimization problem in parallel environments

Page 22: DFA minimization algorithms in map reduce

22

Lower Bound on Replication Rate

• For every input record (transition), a reducer produces exactly one record of output

• The output is exactly equal in size to the input, containing the updated transitions; hence the replication rate is bounded below by 1

Page 23: DFA minimization algorithms in map reduce

23

Moore-MR-PPHF

• The bound depends on the number of Map-Reduce rounds (formula lost in extraction)

Page 24: DFA minimization algorithms in map reduce

24

Hopcroft-MR

Page 25: DFA minimization algorithms in map reduce

25

Hopcroft-MR-PAR

Page 26: DFA minimization algorithms in map reduce

26

Comparison of Complexity Measures

Algorithm                 | Replication Rate | Communication Cost | Sensitive to Skewness
Lower Bound               | 1                | –                  |
Moore-MR (Harrafi 2015)   |                  |                    | No
Moore-MR-PPHF             |                  |                    | No
Hopcroft-MR               |                  |                    | Yes
Hopcroft-MR-PAR           |                  |                    | Yes

Page 27: DFA minimization algorithms in map reduce

27

EXPERIMENTAL RESULTS

Plotting the results gathered from running the proposed algorithms on different data sets

Page 28: DFA minimization algorithms in map reduce

28

Data Generator – Circular
[Figures: input DFA and minimized DFA]

Page 29: DFA minimization algorithms in map reduce

29

Data Generator – Duplicated Random
[Figures: input DFA and minimized DFA]

Page 30: DFA minimization algorithms in map reduce

30

Data Generator – Linear

Page 31: DFA minimization algorithms in map reduce

31

Moore-MR vs. Moore-MR-PPHF

Page 32: DFA minimization algorithms in map reduce

32

Circular DFA

Page 33: DFA minimization algorithms in map reduce

33

Replicated Random DFA

Page 34: DFA minimization algorithms in map reduce

34

Number of Rounds

Page 35: DFA minimization algorithms in map reduce

35

CONCLUSION

Concluding the work done in this thesis and suggesting future work and open questions

Page 36: DFA minimization algorithms in map reduce

36

Conclusion
• In this work we studied DFA minimization algorithms in Map-Reduce and PRAM
• Proposed an enhancement to a DFA minimization algorithm in Map-Reduce by introducing a PPHF in Map-Reduce
• Proposed a new Map-Reduce algorithm based on Hopcroft's method
• Found a lower bound on the replication rate in Map-Reduce and on the communication cost in parallel environments for the DFA minimization problem
• Studied different measures of Map-Reduce algorithms
• Found that two critical measures are missing: sensitivity to skewness and horizontal growth of data

Page 37: DFA minimization algorithms in map reduce

37

Future Work
• Reducer capacity vs. number of rounds trade-off
• Investigating other minimization methods
• Extending the complexity model and class
• Is it possible to compare Map-Reduce algorithms with algorithms in different models (PRAM, serial, etc.)?

Page 38: DFA minimization algorithms in map reduce

38

Thank you

Questions & Answers