a new method for vertical parallelisation of tan learning based on balanced incomplete block designs
TRANSCRIPT
A New Method for Vertical Parallelisation of TAN Learning Based on
Balanced Incomplete Block Designs
Anders L. Madsen12 Frank Jensen1 Antonio Salmerón3 MartinKarlsen1 Helge Langseth4 Thomas D. Nielsen2
HUGIN EXPERT A/S, Aalborg, Denmark
Department of Computer Science, Aalborg University, Denmark
Department of Mathematics, University of Almería , Spain
Department of Computer and Information Science, Norwegian University of Science andTechnology, Norway
PGM, Sept 17-19, 2014 1
Outline
1 Introduction
2 Learning TAN Using BIB Designs
3 Experimental Analysis
4 Conclusion & Future Work
PGM, Sept 17-19, 2014 2
Introduction
A Bayesian network (BN) is an excellent knowledge representation,but learning a BN from data can be a challenge
I Data sets are increasing in size w.r.t. both variables and casesI The computational power of computers is increasing
Research ProblemLearning the structure of a Tree-Augmented Naive Bayes modelfrom massive amounts of data in parallel
This work is performed as part of the AMIDST project (no. 619209)
PGM, Sept 17-19, 2014 3
Learning a TAN model
Let F be a set of features and C be the class variable
C
F1 F2 · · · Fn
The structure of a TAN can be constructed as ([11] and [3]):1. Compute mutual information I (Fi ,Fj |C ) for each pair, i 6= j .2. Build a complete graph G over F with edges annotated by
I (Fi ,Fj |C ).3. Build a maximal spanning tree T from G .4. Select a vertex and recursively direct edges outward from it.5. Add C as parent of each F ∈ F .
PGM, Sept 17-19, 2014 4
Parallel TAN learning
I Step 1 (scoring all pairs w.r.t. MI) is the most time-consumingelement
I Both horizontal and vertical parallelization of learning ispossible
I Data should be distributed, we assume each variable has itsown file
ChallengeGiven a set of features F we need to distribute MI-computationssuch that each pair is scored exactly once
We use Balanced Incomplete Block (BIB) Designs to achievevertical parallelisation
PGM, Sept 17-19, 2014 5
Balanced Incomplete Block (BIB) Designs[23]
Definition (Design)A design is a pair (X ,A) s. t. the following properties are satisfied:1. X is a set of elements called points, and2. A is a collection of nonempty subsets of X called blocks.
Definition (BIB design)Let v , k and λ be positive integers s. t. v > k ≥ 2. A (v , k , λ)-BIBdesign is a design (X ,A) s. t. the following properties are satisfied:1. |X | = v,2. each block contains exactly k points, and3. every pair of distinct points is contained in exactly λ blocks.
If v = b where b = |A|, then the design is symmetricPGM, Sept 17-19, 2014 6
BIB Design Intuition
Let F be the features with |F| = 7 and let (v = 7, k = 3, λ = 1)be a symmetric BIB design, i.e., v = b, with blocks
{013}, {124}, {235}, {346}, {450}, {561}, {602}.
We observeI λ = 1 - we compute mutual information exactly onceI a symmetric BIB design, i.e., v = b
I k = 3 is the number of (subset) of features in each blockWe take the following approach
I create a process for each block to compute mutual informationI a point p represents Fp when |F| > v .I each process generates its block based on its rank
PGM, Sept 17-19, 2014 7
Difference Sets for Symmetric BIB designs
BIB design Difference set k/v
(3,2,1) (0,1) 0.67(7,3,1) (0,1,3) 0.43(13,4,1) (0,1,3,9) 0.31(21,5,1) (0,1,4,14,16) 0.24(31,6,1) (0,1,3,8,12,18) 0.19(57,8,1) (0,1,3,13,32,36,43,52) 0.14(73,9,1) (0,1,3,7,15,31,36,54,63) 0.12(91,10,1) (0,1,3,9,27,49,56,61,77,81) 0.11(133,12,1) (0,9,10,12,26,30,67,74,82,109,114,120) 0.09(183,14,1) (0,12,19,20,22,43,60,71,76,85,89,115,121,168) 0.08
Difference sets for (v , k, λ)-BIB designs where v is the number of points and k is the
number of points in a block
PGM, Sept 17-19, 2014 8
BIB Design and TAN Learning
All processes should compute the same number of I (Fi ,Fj |C )scores
I There is(n2
)= n(n − 1)/2 pairs where n = |F|
I We assume each variable is in one fileIf we have p processes and each process reads data from kvariables, then the following must be satisfied
p
(k
2
)≥
(n
2
).
Solving for k produces k ≥ n/√
p, and equality holds only whenp = 1.
PGM, Sept 17-19, 2014 9
BIB Design Example
Consider the (7, 3, 1)-BIB design and assume |F| = 14I Each point represents two featuresI Each process is assigned six features
The seven blocks (b = 7) are:
{013}, {124}, {235}, {346}, {450}, {561}, {602}
The pairwise scoring is performed as
p = 1
0 1 3
X1 X2 X3 X4 X7 X8
x1 x2 x3 x4 x7 x8
· · ·
p = 7
6 0 2
X13 X14 X1 X2 X5 X6
x13 x14 x1 x2 x5 x6
Intra
Inter
PGM, Sept 17-19, 2014 10
Experimental Setup
I Software implementation based on HUGIN software and usingMessage Passing Interface (MPI)
I Client-server architecture; a master and worker processesI Average run time (wall-clock) is computed over ten runs of the
algorithm on each dataset with specific number of processesI Clock starts before reading data and ends after all results are
received by master process
data set |X | N
Munin1 189 750,000Munin2 1,003 750,000Bank 1,823 1,140,000
PGM, Sept 17-19, 2014 11
Three Different Test Computers
1. Ubuntu: a linux server running - 1 Intel Xeon(TM) E3-1270Processor (4 cores) and 32 GB RAM
2. Fyrkat: cluster with 80 nodes - each has 2 Intel Xeon (TM)X5260 Processors (2 cores) and 16GB RAM
3. Vilje: cluster with 1404 nodes - each has 2 Intel Xeon E5-2670Processors (8 cores) and 32GB
Test computers have different resource management systemsFor 2. and 3. we allocate one worker node for each process
PGM, Sept 17-19, 2014 12
Ubuntu - Linux Server One Node
0
5
10
15
20
25
30
35
40
45
50
0 5 10 15 20 25
Av
erag
e ru
n t
ime
in s
eco
nd
s
Number of processors
Munin1 on Ubuntu
0
100
200
300
400
500
600
700
800
900
1000
0 5 10 15 20 25
Av
erag
e ru
n t
ime
in s
eco
nd
s
Number of processors
Munin2 on Ubuntu
Ubuntu: A linux server running Ubuntu - Intel Xeon(TM) E3-1270 Processor (4 cores) and 32 GB RAM
PGM, Sept 17-19, 2014 13
Fyrkat - Cluster 80 Nodes
0
200
400
600
800
1000
1200
1400
0 5 10 15 20 25 30 35
Av
erag
e ru
n t
ime
in s
eco
nd
s
Number of processors
Munin2 on Fyrkat
0
1000
2000
3000
4000
5000
6000
7000
0 5 10 15 20 25 30 35
Av
erag
e ru
n t
ime
in s
eco
nd
s
Number of processors
Bank on Fyrkat
Fyrkat: 80 nodes - each has 2 Intel Xeon (TM) X5260 Processors (2 cores) and 16GB RAM
PGM, Sept 17-19, 2014 14
Vilje - Cluster 1404 Nodes
0
200
400
600
800
1000
1200
1400
1600
1800
0 10 20 30 40 50 60 70 80 90 100
Av
erag
e ru
n t
ime
in s
eco
nd
s
Number of processors
Munin2 on Vilje
0
1000
2000
3000
4000
5000
6000
7000
0 20 40 60 80 100 120 140 160 180 200
Av
erag
e ru
n t
ime
in s
eco
nd
s
Number of processors
Bank on Vilje
Vilje: 1404 nodes - each has 2 Intel Xeon E5-2670 Processors (8 cores) and 32GB
PGM, Sept 17-19, 2014 15
Speedup Factors - Vilje & Bank
Relative performance compared to case of using one processor
0
5
10
15
20
25
0 20 40 60 80 100 120 140 160 180 200
Sp
eed
up
fac
tor
Number of processors
Time
Files
Optimal
Time speed-up factors for reading data
Files theoretical number of files read byeach process (k/v · |X |)
Optimal square root of the number ofprocessors
0
20
40
60
80
100
120
140
160
180
0 20 40 60 80 100 120 140 160 180 200
Sp
eed
up
fac
tor
Number of processors
Score
Score speed-up factor for computingmutual information
PGM, Sept 17-19, 2014 16
Conclusion & Future Work
ConclusionI Introduced a new algorithm for learning TAN structure using
parallel computingI Empirical evaluation shows that significant performance
improvements are possible
Future WorkI Apply principles to structure learning in general <> TANI Multithread implementation <> MultiprocessI Horizontal parallelization <> Vertical parallelization
PGM, Sept 17-19, 2014 17
This project has received funding from the European Union’sSeventh Framework Programme for research, technologicaldevelopment and demonstration under grant agreement no 619209
PGM, Sept 17-19, 2014 18