a parallel pattern mining algorithm for multi-core …...frequent itemset mining apriori eclat...
TRANSCRIPT
![Page 1: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/1.jpg)
Laboratoire d’Informatiquede Grenoble
A Parallel Pattern Mining Algorithm forMulti-Core Architectures
Benjamin Negrevergne
Jury:
• Marie-Christine Roussetlig/Grenoble University
• Alexandre Termierlig/Grenoble University
• Jean-Francois Mehautlig/Grenoble University
• Hiroki ArimuraGraduate School of Science and
Technology/Hokkaido University
• Bruno Cremilleuxgreyc/University of Caen
• Anne Laurentlirmm/University of Montpellier
Laboratoire d’Informatique de Grenoble 1
![Page 2: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/2.jpg)
Pattern mining
Pattern mining is a sub-topic of Data MiningKnowledge discovery in databases [Fayyad, 96]
”Identifying valid, novel, potentially useful, and ultimatelyunderstandable patterns in data”
Pattern mining:
• Extract knowledge as patterns representing regularities (orirregularities) in data.
Laboratoire d’Informatique de Grenoble 2
![Page 3: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/3.jpg)
Broad range of applications
� Mining frequent set of items purchased together
.
� Mining frequent sub molecules in a moleculardatabase
.
� Mining correlations in sensors data
.
Laboratoire d’Informatique de Grenoble 3
![Page 4: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/4.jpg)
Broad range of applications
� Mining frequent set of items purchased together
O
Possible frequent pattern: {milk, cereals}
• Dataset: set of receipts
• Patterns: set of items
� Mining frequent sub molecules in a moleculardatabase
.
� Mining correlations in sensors data
.
Laboratoire d’Informatique de Grenoble 3
![Page 5: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/5.jpg)
Broad range of applications
� Mining frequent set of items purchased together
.
� Mining frequent sub molecules in a moleculardatabase
.
� Mining correlations in sensors data
.
Laboratoire d’Informatique de Grenoble 3
![Page 6: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/6.jpg)
Broad range of applications
� Mining frequent set of items purchased together
.
� Mining frequent sub molecules in a moleculardatabase
O
Krammer [2001] was able to identify the followingmolecular pattern in a set of anti-HIV molecules
OO
OO
N
N
N+
N
N-
HH
H
HHH
HH
H
H
H H
• Dataset: set of molecules (represented as graphs)
• Patterns: graphs
� Mining correlations in sensors data
.
Laboratoire d’Informatique de Grenoble 3
![Page 7: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/7.jpg)
Broad range of applications
� Mining frequent set of items purchased together
.
� Mining frequent sub molecules in a moleculardatabase
.
� Mining correlations in sensors data
.
Laboratoire d’Informatique de Grenoble 3
![Page 8: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/8.jpg)
Broad range of applications
� Mining frequent set of items purchased together
.
� Mining frequent sub molecules in a moleculardatabase
.
� Mining correlations in sensors data
O
Records from meteorological sensorsFrequent pattern:Temperature ↓ Pressure ↓ Windspeed ↑
• Dataset: list of numerical sensor records
• Patterns: set of variations
Laboratoire d’Informatique de Grenoble 3
![Page 9: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/9.jpg)
Broad range of applications
� Mining frequent set of items purchased together
.
� Mining frequent sub molecules in a moleculardatabase
.
� Mining correlations in sensors data
.
Laboratoire d’Informatique de Grenoble 3
![Page 10: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/10.jpg)
The broad range of solutions
Nowadays: many different algorithms for each pattern miningproblem• Frequent itemset mining
Apriori Eclat FP-Growth DCI-Closed LCM[Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]
• Graph MiningAGM gSpan FSG CloseGraph Gaston[Inokuchi 2000] [Yan 2002] [Kuramochi 2001] [Yan 2003] [Nijssen 2004]
• Tree MiningFreqT TreeMiner Dryade CMTreeMiner[Asai 2002] [Zaki 2005] [Termier 2004] [Chi 2004]
• Gradual itemset miningGrite PGP-mc PGLCM[Di Jorio 2009] [Laurent 2010] [Do 2010]
• SequenceSpade PrefixSpan CloSpan LCM seq[Zaki 2000] [Pei 2000] [Yan 2003] [Uno]
Laboratoire d’Informatique de Grenoble 4
![Page 11: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/11.jpg)
Challenge #1: Genericity
This algorithmic diversity
1. data owners refrain using pattern mining techniques
2. use inadequate algorithms
Challenge #1:
Use a generic approach to address different pattern miningproblems with a unique algorithm.
Laboratoire d’Informatique de Grenoble 5
![Page 12: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/12.jpg)
Challenge #2: Build an efficient and generic patternmining algorithm
Pattern mining is inherently combinatorial:I very large number of possible patterns
Example
basket market analysis: 1000 items → 21000 (∼ 10300) possiblepatterns.
To be efficient algorithms exploit problem properties to reduce thesearch space I efficient but non-generic
Challenge #2
Design an efficient generic and pattern mining algorithm
Laboratoire d’Informatique de Grenoble 6
![Page 13: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/13.jpg)
Contributions of this thesis
We proposed ParaMiner: both generic and efficient algorithmHow:
• Extend theoretical work on pattern enumeration [Boley 2007]and [Arimura 2009]
• Tackle large real world datasets through dataset reduction
• Exploit parallelism for multi-core architectures
� Results
ParaMiner:
• solves various pattern mining problems
• is time-efficient (compete with ad-hoc algorithms)
Laboratoire d’Informatique de Grenoble 7
![Page 14: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/14.jpg)
Contributions of this thesis
We proposed ParaMiner: both generic and efficient algorithmHow:
• Extend theoretical work on pattern enumeration [Boley 2007]and [Arimura 2009]
• Tackle large real world datasets through dataset reduction
• Exploit parallelism for multi-core architectures
� Results
ParaMiner:
• solves various pattern mining problems
• is time-efficient (compete with ad-hoc algorithms)
Laboratoire d’Informatique de Grenoble 7
![Page 15: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/15.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 8
![Page 16: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/16.jpg)
Illustration: frequent itemset mining
butter eggsmilk milk milk
breadbeer
receipt 1 receipt 2 receipt 3
cereals cereals
minimum frequency threshold = 50%
• P1 ={milk} is frequent (100%): t1, t2, t3
→ closed
• P2 ={cereals} is frequent (66%): t1, t2• P3 ={milk , cereals} is frequent (66%): t1, t2
→ closed
Closed pattern: maximal pattern in a set of transactions
Laboratoire d’Informatique de Grenoble 9
![Page 17: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/17.jpg)
Illustration: frequent itemset mining
items
transactions:
butter eggsmilk milk milk
breadbeer
receipt 1 receipt 2 receipt 3
cereals cereals
t1 t2 t3
minimum frequency threshold = 50%
• P1 ={milk} is frequent (100%): t1, t2, t3
→ closed
• P2 ={cereals} is frequent (66%): t1, t2• P3 ={milk , cereals} is frequent (66%): t1, t2
→ closed
Closed pattern: maximal pattern in a set of transactions
Laboratoire d’Informatique de Grenoble 9
![Page 18: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/18.jpg)
Illustration: frequent itemset mining
items
transactions:
butter eggsmilk milk milk
breadbeer
receipt 1 receipt 2 receipt 3
cereals cereals
t1 t2 t3
minimum frequency threshold = 50%
• P1 ={milk} is frequent (100%): t1, t2, t3
→ closed
• P2 ={cereals} is frequent (66%): t1, t2• P3 ={milk , cereals} is frequent (66%): t1, t2
→ closed
Closed pattern: maximal pattern in a set of transactions
Laboratoire d’Informatique de Grenoble 9
![Page 19: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/19.jpg)
Illustration: frequent itemset mining
items
transactions:
butter eggsmilk milk milk
breadbeer
receipt 1 receipt 2 receipt 3
cereals cereals
t1 t2 t3
minimum frequency threshold = 50%
• P1 ={milk} is frequent (100%): t1, t2, t3
→ closed
• P2 ={cereals} is frequent (66%): t1, t2
• P3 ={milk , cereals} is frequent (66%): t1, t2
→ closed
Closed pattern: maximal pattern in a set of transactions
Laboratoire d’Informatique de Grenoble 9
![Page 20: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/20.jpg)
Illustration: frequent itemset mining
items
transactions:
butter eggsmilk milk milk
breadbeer
receipt 1 receipt 2 receipt 3
cereals cereals
t1 t2 t3
minimum frequency threshold = 50%
• P1 ={milk} is frequent (100%): t1, t2, t3
→ closed
• P2 ={cereals} is frequent (66%): t1, t2• P3 ={milk , cereals} is frequent (66%): t1, t2
→ closed
Closed pattern: maximal pattern in a set of transactions
Laboratoire d’Informatique de Grenoble 9
![Page 21: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/21.jpg)
Illustration: frequent itemset mining
items
transactions:
butter eggsmilk milk milk
breadbeer
receipt 1 receipt 2 receipt 3
cereals cereals
t1 t2 t3
minimum frequency threshold = 50%
• P1 ={milk} is frequent (100%): t1, t2, t3 → closed
• P2 ={cereals} is frequent (66%): t1, t2• P3 ={milk , cereals} is frequent (66%): t1, t2 → closed
Closed pattern: maximal pattern in a set of transactions
Laboratoire d’Informatique de Grenoble 9
![Page 22: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/22.jpg)
Generic framework
I Every pattern represented as a set
I A pattern mining problem defined
• A ground set E
• A dataset DE
• A selection criterion Select
Laboratoire d’Informatique de Grenoble 10
![Page 23: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/23.jpg)
Ground set
� Definition
• Set of all possible elements
• Every candidate pattern is a subset of the ground set
� Examples
Frequent itemsets I E : all possible itemsE = {milk, cereals, eggs, . . .} e.g. candidate: {milk , eggs}
Connected relational graphs I E : all the possible arcsE = {(G1,G2), (G1,G3), . . . , (G5,G4)}(Gx ,Gy ) represent the arc Gx Gy
eg. candidate:Gx
Gy Gz
Gradual itemsets I E : all the possible variationsE = {T ↑,T ↓,P ↑, . . . ,W ↓} eg. candidate: {T ↑,W ↑}
Laboratoire d’Informatique de Grenoble 11
![Page 24: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/24.jpg)
Ground set
� Definition
• Set of all possible elements
• Every candidate pattern is a subset of the ground set
� Examples
Frequent itemsets I E : all possible itemsE = {milk, cereals, eggs, . . .} e.g. candidate: {milk , eggs}
Connected relational graphs I E : all the possible arcsE = {(G1,G2), (G1,G3), . . . , (G5,G4)}(Gx ,Gy ) represent the arc Gx Gy
eg. candidate:Gx
Gy Gz
Gradual itemsets I E : all the possible variationsE = {T ↑,T ↓,P ↑, . . . ,W ↓} eg. candidate: {T ↑,W ↑}
Laboratoire d’Informatique de Grenoble 11
![Page 25: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/25.jpg)
Dataset
� Definition
• Sequence of transactions DE
• Each transaction is a subset of E
� Examples
Frequent itemsets I in DE each transaction = receipt
Relational graphs I in DE each transaction = input graph
[
G2
G1
G3 G4 ,
G2
G1
G3 G4 ] ⇒[{(G1,G2), (G3,G1), (G3,G2), (G4,G1)},{(G1,G2), (G3,G2), (G4,G1)}]
Laboratoire d’Informatique de Grenoble 12
![Page 26: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/26.jpg)
Dataset
� Definition
• Sequence of transactions DE
• Each transaction is a subset of E
� Examples
Frequent itemsets I in DE each transaction = receipt
Relational graphs I in DE each transaction = input graph
[
G2
G1
G3 G4 ,
G2
G1
G3 G4 ] ⇒[{(G1,G2), (G3,G1), (G3,G2), (G4,G1)},{(G1,G2), (G3,G2), (G4,G1)}]
Laboratoire d’Informatique de Grenoble 12
![Page 27: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/27.jpg)
Dataset
� Definition
• Sequence of transactions DE
• Each transaction is a subset of E
� Examples
Frequent itemsets I in DE each transaction = receipt
Relational graphs I in DE each transaction = input graph
[
G2
G1
G3 G4 ,
G2
G1
G3 G4 ] ⇒[{(G1,G2), (G3,G1), (G3,G2), (G4,G1)},{(G1,G2), (G3,G2), (G4,G1)}]
Laboratoire d’Informatique de Grenoble 12
![Page 28: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/28.jpg)
Dataset (2)
Gradual itemsets I in DE each transaction = variation ofattributes between pairs of records
Date T P WDay 1 17.6 1021.20 57Day 2 18.5 1021.30 56Day 3 20.4 1018.20 51
⇒
(d1, d2) [{T ↑,P ↑,W ↓}(d1, d3) {T ↑,P ↓,W ↓}(d2, d1) {T ↓,P ↓,W ↑}(d2, d3) {T ↑,P ↓,W ↓}(d3, d1) {T ↓,P ↑,W ↑}(d3, d2) {T ↓,P ↑,W ↑}]
Laboratoire d’Informatique de Grenoble 13
![Page 29: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/29.jpg)
Selection criterion
� Definition
• User-defined predicate 2E → {true, false}• Select(P,DE ) ≡ P is a pattern of interest in DE
� Examples
Frequent itemsets I Select(P,DE ) ≡ P is frequent in DE
Connected relational graphs I Select(P,DE ) ≡ P is aconnected graph and P is frequent in DE
Gradual itemsets I Select(P,DE ) ≡ there exists a path[(dx1 , dx2), . . . , (dxn−1 , dxn)] with n ≥ minsup
Laboratoire d’Informatique de Grenoble 14
![Page 30: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/30.jpg)
Selection criterion
� Definition
• User-defined predicate 2E → {true, false}• Select(P,DE ) ≡ P is a pattern of interest in DE
� Examples
Frequent itemsets I Select(P,DE ) ≡ P is frequent in DE
Connected relational graphs I Select(P,DE ) ≡ P is aconnected graph and P is frequent in DE
Gradual itemsets I Select(P,DE ) ≡ there exists a path[(dx1 , dx2), . . . , (dxn−1 , dxn)] with n ≥ minsup
Laboratoire d’Informatique de Grenoble 14
![Page 31: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/31.jpg)
General pattern mining problem
� Closed patterns
Closed pattern: Any subset P of E such that
• P occurs in DE (DE [P] 6= ∅)• P satisfies the selection criterion
• P is the maximal pattern in DE [P]
� Problem statement
Given a ground set E , a dataset DE and a selection criterionI extract all the closed patterns
Laboratoire d’Informatique de Grenoble 15
![Page 32: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/32.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 16
![Page 33: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/33.jpg)
The generate and test principle
� The standard approach
generate and test.
1. Generate candidate patterns
2. Test if the candidate pattern is a pattern
� . . . is not naively applicable
• Too many candidate patterns
• Each test requires costly database accesses(e.g. to count frequency)
I Challenges
• Generic strategy to explore the set of candidate patterns
• Generic method to simplify the testing
Laboratoire d’Informatique de Grenoble 17
![Page 34: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/34.jpg)
The generate and test principle
� The standard approach
generate and test.
1. Generate candidate patterns
2. Test if the candidate pattern is a pattern
� . . . is not naively applicable
• Too many candidate patterns
• Each test requires costly database accesses(e.g. to count frequency)
I Challenges
• Generic strategy to explore the set of candidate patterns
• Generic method to simplify the testing
Laboratoire d’Informatique de Grenoble 17
![Page 35: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/35.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 18
![Page 36: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/36.jpg)
Efficient exploration of the set of candidate patterns
� Structured search space
Most algorithms define a pattern augmentation relationI search space is explored by repeatedly augmenting patterns
ParaMiner’s augmentation relation
P and Q, two patterns:
I Q is the augmentation of P ⇔ Q = P ∪ {e}
ACE → ABCE
Set of patterns + Augmentation relation = DAG1 structure
1directed acyclic graph
Laboratoire d’Informatique de Grenoble 19
![Page 37: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/37.jpg)
Structured search space
DAG-structure of a set of patterns
⊥
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
non-patternpattern
I several arcs lead to the same pattern
Laboratoire d’Informatique de Grenoble 20
![Page 38: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/38.jpg)
Discover the structured search space
� An enumeration strategy
Is required to:
• discover all the patterns
• avoid duplicates generation
Which augmentation path must be followed ?
BC
ABC
?
AB×!
I AB is the first parent of ABC
Laboratoire d’Informatique de Grenoble 21
![Page 39: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/39.jpg)
Discover the structured search space
� An enumeration strategy
Is required to:
• discover all the patterns
• avoid duplicates generation
Which augmentation path must be followed ?
BC
ABC
AB
×ABC
×!
I AB is the first parent of ABC
Laboratoire d’Informatique de Grenoble 21
![Page 40: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/40.jpg)
Discover the structured search space
� An enumeration strategy
Is required to:
• discover all the patterns
• avoid duplicates generation
Which augmentation path must be followed ?
BC
ABC
AB×!
I BC is the first parent of ABC
Laboratoire d’Informatique de Grenoble 21
![Page 41: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/41.jpg)
Enumeration strategy
Follow the augmentation P → Q if:
first parent( Q ) = P
DAG-structure following a tree search
⊥
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
non-patternsPattern
Laboratoire d’Informatique de Grenoble 22
![Page 42: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/42.jpg)
Enumeration strategy
Follow the augmentation P → Q if:
first parent( Q ) = P
DAG-structure following a tree search
⊥
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
non-patternsPattern
Laboratoire d’Informatique de Grenoble 22
![Page 43: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/43.jpg)
How to compute the first parent of a pattern
� Requirement
A method that is:
• GenericI Sound for all our pattern mining problems
• Adequate to parallel exploration of the search spaceI Computable without global computations
� State of the art:
Theoretical work closed pattern enumeration strategies
• [Arimura et al. 2009]
• [Boley et al. 2010]
I Poly-space enumeration strategiesI first parent do not rely on a global representation of the
search space
Laboratoire d’Informatique de Grenoble 23
![Page 44: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/44.jpg)
How to compute the first parent of a pattern
� Requirement
A method that is:
• GenericI Sound for all our pattern mining problems
• Adequate to parallel exploration of the search spaceI Computable without global computations
� State of the art:
Theoretical work closed pattern enumeration strategies
• [Arimura et al. 2009]
• [Boley et al. 2010]
I Poly-space enumeration strategiesI first parent do not rely on a global representation of the
search space
Laboratoire d’Informatique de Grenoble 23
![Page 45: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/45.jpg)
Structural properties of the set of patterns
[Boley 2010] Strong accessibility:Idea: We can reach every pattern by augmenting any subset thatis a pattern
We can detect first parent by memorizing only the root of theforking branches:
⊥
D
DA DE
DEB DEF
?
...
...
...
EL = {∅}
EL = {A}
EL = {A,B}
• Branch can be explored independently → Parallel
• FIM, CRG and GRI satisfy this property (proved) → Generic
Laboratoire d’Informatique de Grenoble 24
![Page 46: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/46.jpg)
Structural properties of the set of patterns
[Boley 2010] Strong accessibility:Idea: We can reach every pattern by augmenting any subset thatis a patternWe can detect first parent by memorizing only the root of theforking branches:
⊥
D
DA DE
DEB DEF
?
...
...
...
EL = {∅}
EL = {A}
EL = {A,B}
• Branch can be explored independently → Parallel
• FIM, CRG and GRI satisfy this property (proved) → Generic
Laboratoire d’Informatique de Grenoble 24
![Page 47: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/47.jpg)
Conclusion on pattern enumeration
• Pattern augmentation
• Structured search space
• Enumeration strategy required
• Problem of detecting the first parent
• Strong accessibility property
• Exclusion list
I generic and parallelizable enumeration strategy
Next: Identifying patterns
Laboratoire d’Informatique de Grenoble 25
![Page 48: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/48.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 26
![Page 49: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/49.jpg)
Simplifying candidate pattern testing
� Candidate pattern testing
• Check candidate pattern occurrence in the dataset
• Computing the selection criterion
• Computing pattern closure
I Intensive access to the dataset
Problem: How to reduce dataset accesses ?I dataset reduction [Han 2000, Uno 2004]
Laboratoire d’Informatique de Grenoble 27
![Page 50: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/50.jpg)
Dataset reduction: principle
Motivation: Every element in the dataset are not required to testeach candidate pattern
Principle:
1. build a dataset for each new pattern P
2. filter unnecessary elements to P and its descendants
3. use it to test P and its descendants
In ParaMiner : EL-based filter I generic
Laboratoire d’Informatique de Grenoble 28
![Page 51: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/51.jpg)
EL-based reduction (Proved)
Dataset:DE
Pattern:P = {D, E}EL:EL = {A, B, C}
I DreducedP ?
DE
∈ EL P augm. (/∈ EL)
t1 A, B, D, E, F, G,
}P1
t2 A, D, D, E, F, G,t3 A, B, D, E, F, H,
}P2
⇓Dreduced
P
⋂= =
t ′1 ⊕ t ′2 A, D, E, F, G,t ′3 A, B, D, E, F, H,
• Element required: Any e that can belongs to a closed patternincluding P I ensure that first parent is sound
How to detect them:
• grouping transactions per augmentations supported
• Apply reduction
Laboratoire d’Informatique de Grenoble 29
![Page 52: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/52.jpg)
EL-based reduction (Proved)
Dataset:DE
Pattern:P = {D, E}EL:EL = {A, B, C}
I DreducedP ?
DE ∈ EL P augm. (/∈ EL)
t1 A, B, D, E, F, G,
}P1
t2 A, D, D, E, F, G,t3 A, B, D, E, F, H,
}P2
⇓Dreduced
P
⋂= =
t ′1 ⊕ t ′2 A, D, E, F, G,t ′3 A, B, D, E, F, H,
• Element required: Any e that can belongs to a closed patternincluding P I ensure that first parent is sound
How to detect them:
• grouping transactions per augmentations supported
• Apply reduction
Laboratoire d’Informatique de Grenoble 29
![Page 53: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/53.jpg)
EL-based reduction (Proved)
Dataset:DE
Pattern:P = {D, E}EL:EL = {A, B, C}
I DreducedP ?
DE ∈ EL P augm. (/∈ EL)
t1 A, B, D, E, F, G,
}P1
t2 A, D, D, E, F, G,t3 A, B, D, E, F, H,
}P2
⇓Dreduced
P
⋂= =
t ′1 ⊕ t ′2 A, D, E, F, G,t ′3 A, B, D, E, F, H,
• Element required: Any e that can belongs to a closed patternincluding P I ensure that first parent is sound
How to detect them:
• grouping transactions per augmentations supported
• Apply reduction
Laboratoire d’Informatique de Grenoble 29
![Page 54: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/54.jpg)
EL-based reduction (Proved)
Dataset:DE
Pattern:P = {D, E}EL:EL = {A, B, C}
I DreducedP ?
DE ∈ EL P augm. (/∈ EL)
t1 A, B, D, E, F, G,}P1t2 A, D, D, E, F, G,
t3 A, B, D, E, F, H, }P2
⇓Dreduced
P
⋂= =
t ′1 ⊕ t ′2 A, D, E, F, G,t ′3 A, B, D, E, F, H,
• Element required: Any e that can belongs to a closed patternincluding P I ensure that first parent is sound
How to detect them:
• grouping transactions per augmentations supported
• Apply reduction
Laboratoire d’Informatique de Grenoble 29
![Page 55: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/55.jpg)
EL-based reduction (Proved)
Dataset:DE
Pattern:P = {D, E}EL:EL = {A, B, C}
I DreducedP ?
DE ∈ EL P augm. (/∈ EL)
t1 A, B, D, E, F, G,}P1t2 A, D, D, E, F, G,
t3 A, B, D, E, F, H, }P2
⇓Dreduced
P
⋂= =
t ′1 ⊕ t ′2 A, D, E, F, G,t ′3 A, B, D, E, F, H,
• Element required: Any e that can belongs to a closed patternincluding P I ensure that first parent is sound
How to detect them:
• grouping transactions per augmentations supported
• Apply reduction
Laboratoire d’Informatique de Grenoble 29
![Page 56: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/56.jpg)
Evaluation of dataset reduction
� Metric
reduction factor =|DE |
|DreducedP |
average reduction factor =
∑P∈C |DE |/|Dreduced
P ||C|
� Frequent itemset mining, mushroom dataset
dataset name ground set size # transactions dataset size
Mushroom 119 8,124 186,852
Laboratoire d’Informatique de Grenoble 30
![Page 57: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/57.jpg)
Evaluation of dataset reduction
� Metric
reduction factor =|DE |
|DreducedP |
average reduction factor =
∑P∈C |DE |/|Dreduced
P ||C|
0
50
100
150
200
250
300
350
20 30 40 50 60 70 80 90
Red
uctio
n fa
ctor
Relative frequency threshold (%)
Problem: FIM, dataset: Mushroom
Reduction factor
0.1
1
10
100
1000
10000
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14
time
(s, l
ogsc
ale)
Relative frequency threshold (%)
Problem: FIM, dataset: Mushroom
PARAMINERNODSRPARAMINER
Laboratoire d’Informatique de Grenoble 30
![Page 58: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/58.jpg)
ParaMiner: algorithm
1: procedure expand(P, DreducedP , EL)
2: for all e such that e occurs in DreducedP do
3: if Select(P ∪ {e},DreducedP ) then
4: Q ← Clo(P ∪ {e},DreducedP )
5: if is first parent(P,EL,Q) then6: output Q7: Dreduced
Q ← reduce(DreducedP , e,EL)
8: spawn expand(Q,DreducedQ ,EL)
9: EL← EL ∪ {e}10: end if11: end if12: end for13: end procedure
Laboratoire d’Informatique de Grenoble 31
![Page 59: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/59.jpg)
ParaMiner: main achievements
• Proved sound and complete under some constraints:I the set of patterns satisfying the strong accessibility propertyI the selection criterion is decomposable
Problems FIM, CRG and GRI satisfy these properties
• Parallelized: space exploration is divided into independenttasksI non-trivial for closed pattern mining
• Melinda a parallel execution engine for data-drivenalgorithms
I can be extended with task scheduling strategiesI execute tasks spawned by ParaMinerI also used in PLCM [Negrevergne 2010], PGLCM [Do 2010]
Laboratoire d’Informatique de Grenoble 32
![Page 60: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/60.jpg)
ParaMiner: main achievements
• Proved sound and complete under some constraints:I the set of patterns satisfying the strong accessibility propertyI the selection criterion is decomposable
Problems FIM, CRG and GRI satisfy these properties
• Parallelized: space exploration is divided into independenttasksI non-trivial for closed pattern mining
• Melinda a parallel execution engine for data-drivenalgorithms
I can be extended with task scheduling strategiesI execute tasks spawned by ParaMinerI also used in PLCM [Negrevergne 2010], PGLCM [Do 2010]
Laboratoire d’Informatique de Grenoble 32
![Page 61: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/61.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 33
![Page 62: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/62.jpg)
Melinda
� Melinda: Main concepts
• Tuples (A,B,C ) I block of data with a fixed structure
• Tuple Space I synchronized memory spaceI can contain tuplesI supports put() and get()
� Parallelizing ParaMiner with Melinda
ParaMiner parallelization scheme: A new tuple for each recursivecallTuple structure: Parameters of recursive calls(Pattern,Reduce dataset,Exclusion list)
• Tuples are produced when a pattern is discovered
• Tuples are consumed when a core is idle
Laboratoire d’Informatique de Grenoble 34
![Page 63: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/63.jpg)
Melinda
� Melinda: Main concepts
• Tuples (A,B,C ) I block of data with a fixed structure
• Tuple Space I synchronized memory spaceI can contain tuplesI supports put() and get()
� Parallelizing ParaMiner with Melinda
ParaMiner parallelization scheme: A new tuple for each recursivecallTuple structure: Parameters of recursive calls(Pattern,Reduce dataset,Exclusion list)
• Tuples are produced when a pattern is discovered
• Tuples are consumed when a core is idle
Laboratoire d’Informatique de Grenoble 34
![Page 64: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/64.jpg)
Parallel execution
⊥
a
ab
abdabe
ad
bcd
df
def
e
Core #1
Core #2
Core #3
Tuple Space
Laboratoire d’Informatique de Grenoble 35
![Page 65: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/65.jpg)
Parallel execution
⊥
a
ab
abdabe
ad
bcd
df
def
e
Core #1
Core #2
Core #3
Tuple Space
Laboratoire d’Informatique de Grenoble 35
![Page 66: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/66.jpg)
Parallel execution
⊥
a
ab
abdabe
ad
bcd
df
def
e
Core #1
Core #2
Core #3
Tuple Space
Laboratoire d’Informatique de Grenoble 35
![Page 67: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/67.jpg)
Parallel execution
⊥
a
ab
abdabe
ad
bcd
df
def
e
Core #1
Core #2
Core #3
Tuple Space
Laboratoire d’Informatique de Grenoble 35
![Page 68: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/68.jpg)
Parallel execution
⊥
a
ab
abdabe
ad
bcd
df
def
e
Core #1
Core #2
Core #3
Tuple Space
Laboratoire d’Informatique de Grenoble 35
![Page 69: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/69.jpg)
Parallel execution
⊥
a
ab
abdabe
ad
bcd
df
def
e
Core #1
Core #2
Core #3
Tuple Space
Laboratoire d’Informatique de Grenoble 35
![Page 70: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/70.jpg)
Parallel execution
⊥
a
ab
abdabe
ad
bcd
df
def
e
Core #1
Core #2
Core #3
Tuple Space
Laboratoire d’Informatique de Grenoble 35
![Page 71: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/71.jpg)
Parallel execution
⊥
a
ab
abdabe
ad
bcd
df
def
e
Core #1
Core #2
Core #3
Tuple Space
Laboratoire d’Informatique de Grenoble 35
![Page 72: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/72.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 36
![Page 73: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/73.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 37
![Page 74: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/74.jpg)
Parallel performance evaluation
� ParaMiner instances
• frequent itemset mining• frequent relational graph mining• gradual itemset mining
� Computing platforms
Laptop:
• 4-cores (4 core i7)
• 8 GB memory
Server:
• 32-cores (4 core i7 witheach core each)
• 64 GB memory
� Metric
speedup =time using one core
time using n coresLaboratoire d’Informatique de Grenoble 38
![Page 75: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/75.jpg)
Speedup measurements
� 1. Laptop
I speedup on 4 cores: 3 to 4
Laboratoire d’Informatique de Grenoble 39
![Page 76: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/76.jpg)
Speedup measurements
� 2. Server
Laboratoire d’Informatique de Grenoble 39
![Page 77: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/77.jpg)
Performance analysis
In [Buerhrer 2006, Tatikonda 2008], authors exhibit two importantissues:
• load imbalance
• memory issues
� Bus contention
Memory bus: connect cores to memory
too many memory access I subject to contention
Core #1
Core #2
Core #3
Memory
Laboratoire d’Informatique de Grenoble 40
![Page 78: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/78.jpg)
Performance analysis
In [Buerhrer 2006, Tatikonda 2008], authors exhibit two importantissues:
• load imbalance
• memory issues
� Bus contention
Memory bus: connect cores to memory
too many memory access I subject to contention
Core #1
Core #2
Core #3
Memory
Laboratoire d’Informatique de Grenoble 40
![Page 79: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/79.jpg)
Performance analysis
In [Buerhrer 2006, Tatikonda 2008], authors exhibit two importantissues:
• load imbalance
• memory issues
� Bus contention
Memory bus: connect cores to memorytoo many memory access I subject to contention
Core #1
Core #2
Core #3
Memory!
Laboratoire d’Informatique de Grenoble 40
![Page 80: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/80.jpg)
Bus contention in ParaMiner
� % hi-lat. memory operation VS # of cores
gradual itemset mining(good speedup)
100
95
90
85 1 2 4 8 16 32%
of m
emor
y op
erat
ions
#cores
frequent itemset mining(bounded speedup)
100
95
90
85 1 2 4 8 16 32%
of m
emor
y op
erat
ions
#cores
dark slices = high latencies memory operations
• GRI: more cores I % hi-lat. memory operations remainsconstant
• FIM: more cores I increasing % of hi-lat. memory operations
Laboratoire d’Informatique de Grenoble 41
![Page 81: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/81.jpg)
Melinda’s approach
[Tatikonda 2008] proposed architecture conscious pattern miningalgorithm
Requires deep modifications in the algorithm
Inadequate to ParaMiner:I could penalize executions that perform well
� Melinda’s tuple distribution strategies
• distribute tuples to available cores
• user-definedI can exploit algorithmic level information (in tuples)I can exploit architectural level information (available in
Melinda)
Laboratoire d’Informatique de Grenoble 42
![Page 82: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/82.jpg)
Tuple distribution strategies
� Structured tuplespace
Tuple space is divided into internalsI Internals can be used to classify tuples in the tuple space
� Defining new strategies
• distribute() defines in which internal to put the tuple
• retrieve() from which internal the tuple has to be taken
I supports heavy usage
Laboratoire d’Informatique de Grenoble 43
![Page 83: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/83.jpg)
Architecture conscious tuple distribution strategy
1. Distribute: Tuples sharing datastructures go in the sameinternal
2. Retrieve: Cores of the same processor retrieve tuples fromthe same internal
I Improve cache usage and reduce bus contention
� Frequent itemset mining, BMS-WebView/Accidentsdataset
dataset # items # transactions dataset size Density (%)BMS-WebView-2 3,340 77,512 320,601 0.14Accidents 468 340,184 11,500,870 7.22
Laboratoire d’Informatique de Grenoble 44
![Page 84: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/84.jpg)
Architecture conscious tuple distribution strategy
1. Distribute: Tuples sharing datastructures go in the sameinternal
2. Retrieve: Cores of the same processor retrieve tuples fromthe same internal
I Improve cache usage and reduce bus contention
� Frequent itemset mining, BMS-WebView/Accidentsdataset
dataset # items # transactions dataset size Density (%)BMS-WebView-2 3,340 77,512 320,601 0.14Accidents 468 340,184 11,500,870 7.22
Laboratoire d’Informatique de Grenoble 44
![Page 85: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/85.jpg)
Architecture conscious tuple distribution strategy
1. Distribute: Tuples sharing datastructures go in the sameinternal
2. Retrieve: Cores of the same processor retrieve tuples fromthe same internal
1
2
3
4
5
6
7
8
9
10
0 4 8 12 16 20 24 28 32
spee
dup
#cores
SdefaultSarch
y=x
1
2
3
4
5
6
7
8
9
10
0 4 8 12 16 20 24 28 32sp
eedu
p
#cores
SdefaultSarch
y=x
No source code modification I 30% performance gain
Laboratoire d’Informatique de Grenoble 44
![Page 86: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/86.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 45
![Page 87: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/87.jpg)
Comparative experiments: frequent itemset mining
� Algorithms
• PLCM [Negrevergne 2010] parallel implementation of LCM[Uno 2004] FIMI’04 Award
• MT-Closed [Lucchese 2007] parallel implementation ofDCI-Closed [Lucchese 2004]
� Datasets
dataset # items # transactions dataset size Density (%)BMS-WebView-2 3,340 77,512 320,601 0.14Accidents 468 340,184 11,500,870 7.22
density(DE ) =#items ×#transactions
|DE |(×100)
Laboratoire d’Informatique de Grenoble 46
![Page 88: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/88.jpg)
Comparative experiments frequent itemset mining (2)
� BMS-WebView-2 (sparse)
0.1
1
10
100
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
time
(s, l
ogsc
ale)
Relative frequency threshold (%)
Problem: FIM, dataset: BMS-WebView-2
ParaMiner 32 coresPLCM 32 coresMT-Closed 32 cores
Laboratoire d’Informatique de Grenoble 47
![Page 89: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/89.jpg)
Comparative experiments frequent itemset mining (2)
� Accidents (dense)
1
10
100
1000
10 20 30 40 50 60 70 80 90
time
(s, l
ogsc
ale)
Relative frequency threshold (%)
Problem: FIM, dataset: Accidents
ParaMiner 32 coresPLCM 32 coresMT-Closed 32 cores
Laboratoire d’Informatique de Grenoble 47
![Page 90: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/90.jpg)
Comparative experiments: gradual itemset mining
� Algorithms
• PGLCM [Do 2010]
• PGP-mc (non-closed) [Laurent 2010]
� Datasets
dataset name ground set size # transactions dataset size Density (%)I4408 8824 11,556 50,985,072 50
Laboratoire d’Informatique de Grenoble 48
![Page 91: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/91.jpg)
Comparative experiments gradual itemset mining (2)
� I4408 (Real)
100
1000
10000
100000
70 75 80 85 90 95
time
(s, l
ogsc
ale)
Relative frequency threshold (%)
Gradual Itemset Mining : I4408
ParaMiner 32 coresPGLCM
Laboratoire d’Informatique de Grenoble 49
![Page 92: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/92.jpg)
Outline
Generic framework and problem statement
ParaMinerEfficient exploration of the set of candidate patternsSpeeding up candidate pattern testingParallel execution of ParaMiner
ExperimentsParallel performance evaluationComparative experiments
Conclusion and future work
Laboratoire d’Informatique de Grenoble 50
![Page 93: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/93.jpg)
Conclusion
• ParaMinerI based on state of the art on pattern enumerationI useful to parallel pattern mining
I designed an efficient generic dataset reductionI exploit multi-core architectures
• MelindaI parallel engine of ParaMiner (and other algorithms)I extensible through strategies
� Novelty
• ParaMiner: an efficient solution for new pattern miningproblemsNew pattern mining problem can benefit from 15 years of research
• ParaMiner/Melinda: tool to learn about parallel patternmining
Laboratoire d’Informatique de Grenoble 51
![Page 94: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/94.jpg)
Conclusion
• ParaMinerI based on state of the art on pattern enumerationI useful to parallel pattern mining
I designed an efficient generic dataset reductionI exploit multi-core architectures
• MelindaI parallel engine of ParaMiner (and other algorithms)I extensible through strategies
� Novelty
• ParaMiner: an efficient solution for new pattern miningproblemsNew pattern mining problem can benefit from 15 years of research
• ParaMiner/Melinda: tool to learn about parallel patternmining
Laboratoire d’Informatique de Grenoble 51
![Page 95: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/95.jpg)
Future works
� Extend ParaMiner’s genericity
• Generalize strong accessibility properties to other structuresI handle sequences, and general graphs
• Study partial strong accessibility of set of patternsI bridge with constraint pattern mining
� Extend ParaMiner/Melinda’s efficiency
• Improve Melinda’s strategies I More flexible and expressivequerying tuples
• Target larger computing platforms I cluster of computers
Laboratoire d’Informatique de Grenoble 52
![Page 96: A Parallel Pattern Mining Algorithm for Multi-Core …...Frequent itemset mining Apriori Eclat FP-Growth DCI-Closed LCM [Agrawal 94] [Zaki 1997] [Han 2004] [Lucchese 2004] [Uno 2004]](https://reader035.vdocuments.net/reader035/viewer/2022071412/6109bbf36d57bd106934d5cd/html5/thumbnails/96.jpg)
Thank you !
Laboratoire d’Informatique de Grenoble 53