A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs

Anders L. Madsen (1,2), Frank Jensen (1), Antonio Salmerón (3), Martin Karlsen (1), Helge Langseth (4), Thomas D. Nielsen (2)

(1) HUGIN EXPERT A/S, Aalborg, Denmark
(2) Department of Computer Science, Aalborg University, Denmark
(3) Department of Mathematics, University of Almería, Spain
(4) Department of Computer and Information Science, Norwegian University of Science and Technology, Norway

PGM, Sept 17-19, 2014 1

Outline

1 Introduction

2 Learning TAN Using BIB Designs

3 Experimental Analysis

4 Conclusion & Future Work


Introduction

A Bayesian network (BN) is an excellent knowledge representation, but learning a BN from data can be a challenge

- Data sets are increasing in size w.r.t. both variables and cases
- The computational power of computers is increasing

Research Problem: Learning the structure of a Tree-Augmented Naive Bayes (TAN) model from massive amounts of data in parallel

This work is performed as part of the AMIDST project (no. 619209)


Learning a TAN model

Let F be a set of features and C be the class variable

[Figure: class variable C and features F1, F2, ..., Fn]

The structure of a TAN can be constructed as ([11] and [3]):

1. Compute mutual information I(Fi, Fj | C) for each pair, i ≠ j.
2. Build a complete graph G over F with edges annotated by I(Fi, Fj | C).
3. Build a maximal spanning tree T from G.
4. Select a vertex and recursively direct edges outward from it.
5. Add C as parent of each F ∈ F.
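The five steps can be sketched sequentially as follows. This is a minimal illustration, not the HUGIN implementation: the function names are hypothetical, the conditional mutual information is estimated from raw counts without smoothing, and Prim's algorithm is one possible choice for the spanning tree. Growing the tree from the chosen root directs the edges outward as a side effect.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats from three equally long sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log( p(x,y|c) / (p(x|c) p(y|c)) ), written with counts
        ratio = count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)])
        total += (count / n) * math.log(ratio)
    return total


def learn_tan_structure(features, cs, root=0):
    """Return the directed feature-to-feature edges (parent, child) of a TAN.

    `features` is a list of columns, one per feature; `cs` is the class column.
    Step 5 (C as parent of every feature) is implicit and not returned.
    """
    m = len(features)
    # Steps 1-2: score every pair once; the scores are the edge weights of G.
    weights = {
        (i, j): conditional_mutual_information(features[i], features[j], cs)
        for i, j in combinations(range(m), 2)
    }
    # Steps 3-4: Prim's algorithm started at `root` builds a maximal spanning
    # tree whose edges already point away from the root.
    in_tree = {root}
    edges = []
    while len(in_tree) < m:
        parent, child = max(
            ((u, w) for u in in_tree for w in range(m) if w not in in_tree),
            key=lambda e: weights[(min(e), max(e))],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges


# Toy data: F0 and F1 are identical, F2 is independent of both given C.
cs = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = [0, 0, 1, 1, 0, 0, 1, 1]
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
tan_edges = learn_tan_structure([f0, f1, f2], cs)
print(tan_edges)
```

The dictionary comprehension over all pairs is step 1, the part that the rest of the talk distributes across processes.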


Parallel TAN learning

- Step 1 (scoring all pairs w.r.t. MI) is the most time-consuming element
- Both horizontal and vertical parallelization of learning are possible
- Data should be distributed; we assume each variable has its own file

Challenge: Given a set of features F, we need to distribute MI-computations such that each pair is scored exactly once

We use Balanced Incomplete Block (BIB) Designs to achieve vertical parallelisation


Balanced Incomplete Block (BIB) Designs [23]

Definition (Design): A design is a pair (X, A) s.t. the following properties are satisfied:
1. X is a set of elements called points, and
2. A is a collection of nonempty subsets of X called blocks.

Definition (BIB design): Let v, k and λ be positive integers s.t. v > k ≥ 2. A (v, k, λ)-BIB design is a design (X, A) s.t. the following properties are satisfied:
1. |X| = v,
2. each block contains exactly k points, and
3. every pair of distinct points is contained in exactly λ blocks.

If v = b where b = |A|, then the design is symmetric

BIB Design Intuition

Let F be the features with |F| = 7 and let (v = 7, k = 3, λ = 1) be a symmetric BIB design, i.e., v = b, with blocks

{0,1,3}, {1,2,4}, {2,3,5}, {3,4,6}, {4,5,0}, {5,6,1}, {6,0,2}.

We observe
- λ = 1: we compute mutual information for each pair exactly once
- the BIB design is symmetric, i.e., v = b
- k = 3 is the number of features (or subsets of features) in each block

We take the following approach
- create a process for each block to compute mutual information
- a point p represents a subset Fp of the features when |F| > v
- each process generates its block based on its rank
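As a small illustration of the last point, a block can be derived from a process rank by shifting a difference set modulo v. The sketch below assumes 0-based ranks (as in MPI) and uses the difference set (0, 1, 3) for the (7, 3, 1) design; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def block_for_rank(rank, difference_set, v):
    """Block obtained by shifting the difference set by `rank` modulo v."""
    return [(d + rank) % v for d in difference_set]


v, difference_set = 7, (0, 1, 3)
blocks = [block_for_rank(r, difference_set, v) for r in range(v)]

# Count how often each unordered pair of points shares a block.
pair_counts = Counter(
    pair for block in blocks for pair in combinations(sorted(block), 2)
)

print(blocks)
print(len(pair_counts), set(pair_counts.values()))
```

Each process only needs its own rank, v and the difference set, so no communication is required to decide which pairs it scores.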


Difference Sets for Symmetric BIB designs

BIB design    Difference set                                     k/v

(3,2,1)       (0,1)                                              0.67
(7,3,1)       (0,1,3)                                            0.43
(13,4,1)      (0,1,3,9)                                          0.31
(21,5,1)      (0,1,4,14,16)                                      0.24
(31,6,1)      (0,1,3,8,12,18)                                    0.19
(57,8,1)      (0,1,3,13,32,36,43,52)                             0.14
(73,9,1)      (0,1,3,7,15,31,36,54,63)                           0.12
(91,10,1)     (0,1,3,9,27,49,56,61,77,81)                        0.11
(133,12,1)    (0,9,10,12,26,30,67,74,82,109,114,120)             0.09
(183,14,1)    (0,12,19,20,22,43,60,71,76,85,89,115,121,168)      0.08

Difference sets for (v, k, λ)-BIB designs where v is the number of points and k is the number of points in a block
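The defining property of such a difference set can be checked mechanically: every nonzero residue modulo v must occur exactly λ times as a difference of two distinct elements. The sketch below applies this check to the five smallest entries of the table; the function name is hypothetical, and the same check applies unchanged to the larger designs.

```python
from collections import Counter


def is_difference_set(points, v, lam=1):
    """True if every nonzero residue mod v arises exactly `lam` times
    as a difference a - b of two distinct elements of `points`."""
    diffs = Counter((a - b) % v for a in points for b in points if a != b)
    return all(diffs[r] == lam for r in range(1, v))


designs = {
    (3, 2, 1): (0, 1),
    (7, 3, 1): (0, 1, 3),
    (13, 4, 1): (0, 1, 3, 9),
    (21, 5, 1): (0, 1, 4, 14, 16),
    (31, 6, 1): (0, 1, 3, 8, 12, 18),
}

for (v, k, lam), points in designs.items():
    print((v, k, lam), is_difference_set(points, v, lam), round(k / v, 2))
```

The ratio k/v printed last is the fraction of the points, and hence of the variable files, that a single process has to read.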


BIB Design and TAN Learning

All processes should compute the same number of I(Fi, Fj | C) scores

- There are $\binom{n}{2} = n(n-1)/2$ pairs where n = |F|
- We assume each variable is in one file

If we have p processes and each process reads data from k variables, then the following must be satisfied

$$p \binom{k}{2} \geq \binom{n}{2}.$$

Solving for k produces $k \geq n/\sqrt{p}$, and equality holds only when p = 1.
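A quick numeric check of this bound: the sketch below searches for the smallest integer k that satisfies the inequality and compares it with n/√p. The function name is hypothetical, and the values n = 14, p = 7 are chosen only as an illustration.

```python
import math


def min_variables_per_process(n, p):
    """Smallest integer k >= 2 with p * C(k, 2) >= C(n, 2)."""
    k = 2
    while p * math.comb(k, 2) < math.comb(n, 2):
        k += 1
    return k


n, p = 14, 7
k = min_variables_per_process(n, p)
print(k, n / math.sqrt(p))
```

With these values, five variables per process give 7 · 10 = 70 pairs, fewer than the 91 required, while six give 7 · 15 = 105, so k = 6 is the smallest feasible integer and lies above n/√p ≈ 5.29.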


BIB Design Example

Consider the (7, 3, 1)-BIB design and assume |F| = 14
- Each point represents two features
- Each process is assigned six features

The seven blocks (b = 7) are:

{0,1,3}, {1,2,4}, {2,3,5}, {3,4,6}, {4,5,0}, {5,6,1}, {6,0,2}

The pairwise scoring is performed as

p = 1: block {0,1,3}, features X1 X2 X3 X4 X7 X8
· · ·
p = 7: block {6,0,2}, features X13 X14 X1 X2 X5 X6

[Figure: pairwise scoring within each process; legend: Intra, Inter]
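The example can be reproduced with a short sketch. Pairs of features from two different points of a block are scored by the process owning that block. For the pair of features inside a single point, the sketch assumes a convention not stated on the slide: only the process whose rank equals the point scores it. Ranks are 0-based here (the slide counts p = 1, ..., 7), and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations

V, DIFFERENCE_SET = 7, (0, 1, 3)
FEATURES_PER_POINT = 2  # |F| = 14 spread over v = 7 points


def features_of_point(q):
    """Point q stands for features X_{2q+1} and X_{2q+2} (1-based indices)."""
    return [FEATURES_PER_POINT * q + i + 1 for i in range(FEATURES_PER_POINT)]


def pairs_for_process(rank):
    """Feature pairs scored by the process with 0-based rank `rank`."""
    block = [(d + rank) % V for d in DIFFERENCE_SET]
    pairs = []
    # Pairs of features taken from two different points of the block.
    for q1, q2 in combinations(block, 2):
        pairs += [(a, b) for a in features_of_point(q1) for b in features_of_point(q2)]
    # Pairs inside one point: assumed convention, only the block's first
    # point (equal to `rank`, because the difference set contains 0) counts.
    pairs += list(combinations(features_of_point(block[0]), 2))
    return [tuple(sorted(pair)) for pair in pairs]


coverage = Counter(pair for r in range(V) for pair in pairs_for_process(r))
print(len(coverage), set(coverage.values()))
print([len(pairs_for_process(r)) for r in range(V)])
```

Under this convention every process scores 13 pairs (12 across points plus 1 inside a point), which adds up to the 91 pairs of 14 features.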


Experimental Setup

- Software implementation based on HUGIN software and using Message Passing Interface (MPI)
- Client-server architecture; a master and worker processes
- Average run time (wall-clock) is computed over ten runs of the algorithm on each dataset with a specific number of processes
- Clock starts before reading data and ends after all results are received by the master process

data set    |X|      N

Munin1      189      750,000
Munin2      1,003    750,000
Bank        1,823    1,140,000


Three Different Test Computers

1. Ubuntu: a Linux server running Ubuntu - 1 Intel Xeon(TM) E3-1270 Processor (4 cores) and 32 GB RAM
2. Fyrkat: cluster with 80 nodes - each has 2 Intel Xeon(TM) X5260 Processors (2 cores) and 16 GB RAM
3. Vilje: cluster with 1404 nodes - each has 2 Intel Xeon E5-2670 Processors (8 cores) and 32 GB

Test computers have different resource management systems
For 2. and 3. we allocate one worker node for each process


Ubuntu - Linux Server One Node

[Figures: average run time in seconds vs. number of processors. Panels: Munin1 on Ubuntu; Munin2 on Ubuntu]

Ubuntu: A Linux server running Ubuntu - Intel Xeon(TM) E3-1270 Processor (4 cores) and 32 GB RAM

Fyrkat - Cluster 80 Nodes

[Figures: average run time in seconds. Panels: Munin2 on Fyrkat; Bank on Fyrkat]

Fyrkat: 80 nodes - each has 2 Intel Xeon(TM) X5260 Processors (2 cores) and 16 GB RAM

Vilje - Cluster 1404 Nodes

[Figures: average run time in seconds. Panels: Munin2 on Vilje; Bank on Vilje]

Vilje: 1404 nodes - each has 2 Intel Xeon E5-2670 Processors (8 cores) and 32 GB

Speedup Factors - Vilje & Bank

Relative performance compared to the case of using one processor

[Figure: speed-up factor; curves: Time, Files, Optimal]

- Time: speed-up factors for reading data
- Files: theoretical number of files read by each process (k/v · |X|)
- Optimal: square root of the number of processors

[Figure: speed-up factor; curve: Score]

- Score: speed-up factor for computing mutual information
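The Files and Optimal quantities can be worked out for the Bank data set (|X| = 1,823) from the symmetric designs listed earlier. The sketch below assumes one process per block (p = v) and reads the file-based speed-up as |X| divided by the number of files per process, i.e. v/k. That reading and the function name are assumptions, not taken from the slides.

```python
import math

NUM_VARIABLES = 1823  # |X| for the Bank data set
DESIGNS = [(7, 3), (13, 4), (21, 5), (31, 6), (57, 8),
           (73, 9), (91, 10), (133, 12), (183, 14)]  # (v, k) with lambda = 1


def file_statistics(v, k, num_variables=NUM_VARIABLES):
    """Files read per process, implied speed-up (assumed v/k), and sqrt(p)."""
    files_per_process = k / v * num_variables
    files_speedup = num_variables / files_per_process  # equals v / k
    optimal = math.sqrt(v)  # p = v processes, one per block
    return files_per_process, files_speedup, optimal


for v, k in DESIGNS:
    files, speedup, optimal = file_statistics(v, k)
    print(f"p={v:3d}  files={files:7.1f}  v/k={speedup:5.2f}  sqrt(p)={optimal:5.2f}")
```

For every design in the list, v/k stays slightly below √p, which is consistent with the bound k ≥ n/√p on the number of variables each process must read.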


Conclusion & Future Work

Conclusion
- Introduced a new algorithm for learning TAN structure using parallel computing
- Empirical evaluation shows that significant performance improvements are possible

Future Work
- Apply principles to structure learning in general (vs. TAN)
- Multithread implementation (vs. multiprocess)
- Horizontal parallelization (vs. vertical parallelization)


This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 619209
