
[IEEE 2013 Second International Conference on Informatics & Applications (ICIA 2013) - Lodz, Poland (2013.09.23-2013.09.25)]

Improving Genetic Process Mining using Honey Bee Algorithm

Yahia Z. Seleem

Faculty of computer and information

Department of Information Systems

Assiut University

Assiut, Egypt

[email protected]

Marghny H. Mohamed

Faculty of computer and information

Department of Computer Science

Assiut University

Assiut, Egypt

[email protected]

Khaled F. Hussain

Faculty of computer and information

Department of Information Technology

Assiut University

Assiut, Egypt

[email protected]

Abstract— Process mining refers to the extraction of process

models from event logs. This paper presents a new process mining approach that combines the Honey Bee algorithm with the Genetic Algorithm. It uses the neighborhood search of the Honey Bee algorithm, in which a solution emerges from the intelligent behavior of honeybees, together with the diversity of the Genetic Algorithm, to find the global optimum. The approach is implemented as a plug-in for the process mining framework available at http://www.processmining.org. Computational experiments show that the approach presented in this paper gives a significant improvement over the basic Genetic Algorithm.

Keywords— Process Mining, Genetic Algorithm, Honey Bee

Algorithm, Data mining, Petri nets.

I. INTRODUCTION

Process mining is a relatively new research area that combines data mining with process modeling. Its main aims are to discover, improve and analyze existing processes by extracting process models from the event logs generated by information systems. Process mining includes three main perspectives: (1) the process perspective, which aims to find a good characterization of all possible paths; (2) the organizational perspective, which focuses on the performers of the activities and how they are related; and (3) the case perspective, which focuses on the properties of cases [1].

A sample log file is given in Table 1. It involves 8 activities and 9 events that describe the defect life cycle in a real software development project.

TABLE 1. SAMPLE PROCESS LOG

Task ID  Action          Time Stamp           Originator ID
1001     Submit          2012-02-13 21:23:10  Q00617
1002     Submit          2012-02-14 11:12:45  S00414
1002     Assign          2012-02-14 12:23:20  T00405
1002     Accept          2012-02-14 12:30:16  V00390
1001     Cancel          2012-02-14 15:16:04  Q00617
1002     Request Review  2012-02-14 16:08:15  V00390
1002     Reject          2012-02-15 09:15:10  S00414
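As an illustration of how such a log is typically prepared for mining, the rows of Table 1 can be grouped by Task ID and ordered by time stamp to obtain one activity trace per case. The sketch below is a minimal Python example; the names `LOG` and `build_traces` are hypothetical and are not part of the paper or of ProM.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from Table 1: (task_id, action, time_stamp, originator_id).
LOG = [
    ("1001", "Submit", "2012-02-13 21:23:10", "Q00617"),
    ("1002", "Submit", "2012-02-14 11:12:45", "S00414"),
    ("1002", "Assign", "2012-02-14 12:23:20", "T00405"),
    ("1002", "Accept", "2012-02-14 12:30:16", "V00390"),
    ("1001", "Cancel", "2012-02-14 15:16:04", "Q00617"),
    ("1002", "Request Review", "2012-02-14 16:08:15", "V00390"),
    ("1002", "Reject", "2012-02-15 09:15:10", "S00414"),
]


def build_traces(log):
    """Group events by task ID and order each group by time stamp."""
    cases = defaultdict(list)
    for task_id, action, stamp, _originator in log:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        cases[task_id].append((when, action))
    # Sorting the (time, action) pairs orders each case chronologically.
    return {task: [a for _, a in sorted(events)] for task, events in cases.items()}


traces = build_traces(LOG)
for task_id, trace in traces.items():
    print(task_id, trace)
```

Each resulting trace is the sequence of actions of one case, which is the form of input that the mining algorithms discussed in the following sections work on.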

A. The classical genetic algorithm

Genetic algorithms are inspired by nature and rely on Darwin’s principle of “survival of the fittest”. The goal is to reach the optimum solution by starting with a random initial population and producing successive generations from it. Each new generation is produced with the genetic operators crossover and mutation. A fitness function measures the quality of each newly generated individual and determines the probability that it is used in the next generation (i.e. survival of the fittest). The algorithm ends when a termination condition is met [2]. The basic GA is described in Fig. 1.

1) Choose the initial population of individuals.
2) Evaluate the fitness of each individual in that population.
3) Repeat on this generation until termination (time limit, sufficient fitness achieved, etc.):
    4) Select the best-fit individuals for reproduction.
    5) Breed new individuals through crossover and mutation operations to give birth to offspring.
    6) Evaluate the individual fitness of new individuals.
    7) Replace least-fit population with new individuals.

Fig. 1. Pseudo code of the basic genetic algorithm
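The steps of Fig. 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: individuals are bit strings, the fitness function `one_max` is a toy stand-in (not the process mining fitness), selection keeps the better half of the population, and all names and default parameter values are hypothetical.

```python
import random


def one_max(bits):
    """Toy fitness (an assumption for illustration): fraction of 1-bits."""
    return sum(bits) / len(bits)


def basic_ga(fitness, length=20, pop_size=30, generations=200,
             mutation_rate=0.05, target=1.0, seed=0):
    rng = random.Random(seed)
    # 1) Choose the initial population of individuals.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    # 2) Evaluate the fitness of each individual.
    scored = sorted(((fitness(ind), ind) for ind in population), reverse=True)
    evaluations = pop_size
    # 3) Repeat until termination (generation limit or sufficient fitness).
    for _ in range(generations):
        if scored[0][0] >= target:
            break
        # 4) Select the best-fit individuals for reproduction.
        parents = [ind for _, ind in scored[: pop_size // 2]]
        # 5) Breed offspring through one-point crossover and bit-flip mutation.
        offspring = []
        for _ in range(pop_size // 2):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            offspring.append(child)
        # 6) Evaluate the fitness of the new individuals.
        new_scored = [(fitness(ind), ind) for ind in offspring]
        evaluations += len(offspring)
        # 7) Replace the least-fit individuals with the offspring.
        scored = sorted(scored[: pop_size - len(offspring)] + new_scored, reverse=True)
    best_fitness, best = scored[0]
    return best, best_fitness, evaluations


best, best_fitness, evaluations = basic_ga(one_max)
print(best_fitness, evaluations)
```

The function also returns the number of fitness evaluations, because that count is the main cost measure used later in the experiments.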

B. The classical honey bee algorithm

The Bees Algorithm is an optimization algorithm proposed by Pham et al. that is based on the food foraging behavior of honey bees in nature. It has become one of the most successful optimization algorithms owing to its successful application to optimization problems such as neural networks, data clustering, and data classification [3].

C. Honey Bee Algorithm

Fig. 2 shows the pseudo code for the algorithm in its simplest form. The algorithm requires a number of parameters to be set, namely: the number of scout bees (n), the number of sites selected out of the n visited sites (m), the number of best sites out of the m selected sites (e), the number of bees recruited for the best e sites (nep), the number of bees recruited for the other (m-e) selected sites (nsp), the initial size of the patches (ngh), where a patch comprises a site and its neighbourhood, and the stopping criterion.

ISBN: 978-1-4673-5256-7/13/$31.00 ©2013 IEEE 59

1. Initialize population with random solutions (n).
2. Evaluate the fitness of each individual in the initial population.
3. Loop: While (stopping criterion not met)
4.     Select sites for neighborhood search (m), (m-e).
5.     Recruit new bees for best selected sites (more bees for best e sites) and evaluate fitness (nep, nsp).
6.     Select the fittest bee from each patch.
7.     Assign remaining bees to search randomly and evaluate their fitness.
8. End Loop.

Fig. 2. Pseudo code of the basic honey bee algorithm.
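The pseudo code of Fig. 2 can be illustrated with the short Python sketch below. It is a simplified reading of the basic Bees Algorithm, not a reference implementation: the search space is a box of real-valued vectors, the patch size ngh is kept fixed, the stopping criterion is a fixed iteration budget, and the objective `neg_sphere` is a toy function chosen only for illustration.

```python
import random


def neg_sphere(x):
    """Toy objective (an assumption): maximal value 0 at the origin."""
    return -sum(v * v for v in x)


def bees_algorithm(fitness, dim=2, bounds=(-5.0, 5.0), n=20, m=6, e=2,
                   nep=7, nsp=3, ngh=0.5, iterations=50, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds

    def random_site():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def neighbour(site):
        # Sample inside the patch of half-width ngh around the site, clipped to bounds.
        return [min(hi, max(lo, x + rng.uniform(-ngh, ngh))) for x in site]

    # Steps 1-2: initialise n scout bees at random and evaluate them.
    scouts = sorted(((fitness(s), s) for s in (random_site() for _ in range(n))),
                    reverse=True)
    # Step 3: loop until the stopping criterion (here a fixed iteration budget).
    for _ in range(iterations):
        next_scouts = []
        # Step 4: the m best sites are selected; the first e of them are elite.
        for rank, (site_fit, site) in enumerate(scouts[:m]):
            recruits = nep if rank < e else nsp  # Step 5: more bees for elite sites.
            patch = [(site_fit, site)]
            for _ in range(recruits):
                cand = neighbour(site)
                patch.append((fitness(cand), cand))
            next_scouts.append(max(patch))  # Step 6: fittest bee of each patch.
        # Step 7: the remaining n - m bees scout at random.
        for _ in range(n - m):
            cand = random_site()
            next_scouts.append((fitness(cand), cand))
        scouts = sorted(next_scouts, reverse=True)
    return scouts[0]


best_fitness, best_site = bees_algorithm(neg_sphere)
print(best_fitness, best_site)
```

Because each patch keeps its current site as a candidate, the best fitness found never decreases from one iteration to the next.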

II. RELATED WORK

Many researchers have contributed to the field of process mining. In this section, we give a brief overview of a few representative works. In general, all process mining approaches take an event log as input and use it as the starting point for the discovery of the underlying processes. Some of the algorithms used for process mining are listed below [4].

• The α-Algorithm: this approach uses the ordering relations found in the event log to create a Petri net. The algorithm assumes that the log contains the process instance identifier (case id) and that the log is rather complete, in the sense that all ordering relations are present in it.

• Genetic Algorithm: an algorithm in which initial random solutions are generated and new generations are created using the genetic operators crossover and mutation. The search space is the set of all possible solutions with different combinations of the activities that appear in the event log. A fitness function is used to measure the completeness and accuracy of the new individuals. The log should contain a relatively high number of execution traces.

• Instance graphs: an instance graph is generated, using Event-driven Process Chains (EPCs), for each execution trace found in the log. Each instance graph is constructed using the dependencies found in the entire log. Several instance graphs can then be aggregated in order to obtain the overall model for that log.
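To make the notion of "ordering relations found in the event log" concrete, the following Python sketch derives the direct-succession pairs and the usual relation symbols (causality, parallelism, no relation) from a list of traces. It covers only this first step, not the construction of the Petri net, and the function name `ordering_relations` and the two example traces (taken from Table 1) are illustrative assumptions.

```python
from itertools import product


def ordering_relations(traces):
    """Return the direct-succession pairs and a footprint of relation symbols."""
    activities = sorted({a for trace in traces for a in trace})
    # (a, b) is in follows when b occurs directly after a in some trace.
    follows = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    footprint = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            footprint[(a, b)] = "->"   # causality: a is followed by b, never the reverse
        elif ba and not ab:
            footprint[(a, b)] = "<-"   # reverse causality
        elif ab and ba:
            footprint[(a, b)] = "||"   # parallel: both orders were observed
        else:
            footprint[(a, b)] = "#"    # no direct succession observed
    return follows, footprint


traces = [
    ["Submit", "Cancel"],
    ["Submit", "Assign", "Accept", "Request Review", "Reject"],
]
follows, footprint = ordering_relations(traces)
print(footprint[("Submit", "Assign")], footprint[("Cancel", "Reject")])
```

The completeness assumption mentioned above matters here: a relation that never appears in the log is reported as "#", even if the real process allows it.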

III. LIMITATIONS OF CURRENT APPROACHES

The main challenge in mining workflow models lies in the detection of dependencies between activities. Existing approaches for mining the process perspective have problems dealing with issues such as duplicate activities, hidden activities, non-free-choice constructs, noise, and incompleteness. Duplicate activities occur when the same activity can appear at multiple places in the process; this is a problem because it is no longer clear to which activity an event refers. Hidden activities are a problem because essential routing decisions are not logged but still impact the routing of cases. Non-free-choice constructs are problematic because it is not possible to separate choice from synchronization. Reducing the computation time also remains a challenge [1].

Although the GA is able to mine models with all structural constructs except duplicate tasks and is robust to noise, it has a drawback that cannot be neglected: the computational time [8].

IV. PROPOSED APPROACH

The Honey Bee Colony (HBC) algorithm is a relatively new algorithm that is widely used to search for optimum solutions. Its distinguishing feature is a problem-solving method in which the solution to a problem emerges from the intelligent behaviour of honey bee swarms.

Previous research [4][5][6] shows that the Bees algorithm has been used for optimization in many fields. This paper therefore applies the Honey Bee algorithm to process mining. We propose a new process mining approach (HBGA) that combines the Honey Bee algorithm with the Genetic Algorithm: it uses the neighbourhood search of the Honey Bee algorithm, in which a solution emerges from the intelligent behaviour of honeybees, together with the diversity of the Genetic Algorithm, to find the global optimum. The performance of the HBGA is compared with that of the original genetic process mining algorithm.

The main concept of the proposed approach is to search for the fittest solution while minimizing the number of executions of the fitness function. This is done by applying the neighbourhood search concept of the Honey Bee algorithm to find the best individuals, which differs from the traditional approach of selecting both parents randomly from the whole pool. The basic steps of the HBGA are given in Fig. 3 and its flowchart is shown in Fig. 4.

0. Initialize parameters (max number of generations, n, m, e, crossover rate, mutation rate).
1. Initialize population with n random solutions.
2. Evaluate the fitness of the population.
3. Select m sites for the neighborhood search.
4. While (stop criterion is not met)
5.     Find the e elite bees from the m sites.
6.     Find (m-e) bees from the m sites.
7.     Loop for each of the e sites
8.         Crossover e_i with n_i
9.         Mutate e_i with n_i
10.    End Loop
11.    Loop for each of the m-e sites
12.        Crossover (m-e)_i with n_i
13.        Mutate (m-e)_i with n_i
14.    End Loop
15.    Select the fittest bee from each of the m sites for the new population.

Fig. 3. Pseudo code of the proposed HBGA.

Fig. 4. Flow chart of the proposed HBGA.

The HBGA benefits from the search capabilities of both the Honey Bee algorithm and the Genetic Algorithm. More specifically, the task is to search for the fittest individual while minimizing the execution time and the number of executions of the fitness function. The steps of the proposed algorithm are described further below.

The algorithm requires a number of parameters to be set: the number of scout bees (n), the number of sites selected for neighborhood search (m), the number of best “elite” sites out of the m selected sites (e), the number of bees recruited for the best e sites (n1), the number of bees recruited for the other (m-e) selected sites (n2), the mutation and crossover probabilities of the GA, and a stopping criterion (a specific fitness value is reached, the maximum number of generations is reached, or the fittest individual has not changed for n/2 generations in a row).

Step 1: the algorithm starts with an initial population of n scout bees. Each bee represents a potential solution (process model).

Step 2: the algorithm calculates the fitness value of each individual.

Step 3: the m sites with the highest fitness are designated as “selected sites” and chosen for neighbourhood search.

Steps 4 to 15: the generation of a new population of solutions is repeated.

Step 5: the best e sites are selected.

Steps 7 to 10: more bees are sent to the best e sites. This is where the genetic algorithm comes in: its crossover and mutation operations create the new individuals.

Steps 11 to 14: fewer bees are sent to the remaining m-e selected sites, again using the crossover and mutation operations of the genetic algorithm.

Step 15: the new individuals generated by the crossover and mutation operations are used to form the new population.

The loop continues until the stop criterion is met (a specific fitness value is reached, the maximum number of generations is reached, or the fittest individual has not changed for n/2 generations in a row).
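The steps above can be sketched in Python as shown below. This is one possible reading of the HBGA, not the authors' Java plug-in: individuals are bit strings rather than process models, `one_max` is a toy fitness, the partner n_i of each site is assumed to be a randomly chosen member of the current population, and the n - m slots not covered by step 15 are assumed to be refilled with random scouts. All names and default values are hypothetical.

```python
import random


def one_max(bits):
    """Toy fitness (an assumption for illustration): fraction of 1-bits."""
    return sum(bits) / len(bits)


def hbga(fitness, length=20, n=30, m=10, e=4, n1=6, n2=3, max_generations=50,
         crossover_rate=0.8, mutation_rate=0.05, target=1.0, seed=0):
    rng = random.Random(seed)
    evaluations = 0

    def evaluate(ind):
        nonlocal evaluations
        evaluations += 1
        return fitness(ind)

    def random_bee():
        return [rng.randint(0, 1) for _ in range(length)]

    def neighbour(site, mate):
        # The GA operators play the role of the neighbourhood search.
        child = list(site)
        if rng.random() < crossover_rate:
            cut = rng.randrange(1, length)
            child = site[:cut] + mate[cut:]
        return [1 - g if rng.random() < mutation_rate else g for g in child]

    # Steps 1-2: n random scout bees, each evaluated once.
    population = sorted(((evaluate(b), b) for b in (random_bee() for _ in range(n))),
                        reverse=True)
    stale = 0
    for _ in range(max_generations):                    # step 4
        best_before = population[0][0]
        if best_before >= target or stale >= n // 2:    # stopping criteria
            break
        survivors = []
        for rank, (site_fit, site) in enumerate(population[:m]):  # steps 3, 5, 6
            recruits = n1 if rank < e else n2           # steps 7-14
            patch = [(site_fit, site)]
            for _ in range(recruits):
                mate = rng.choice(population)[1]
                cand = neighbour(site, mate)
                patch.append((evaluate(cand), cand))
            survivors.append(max(patch))                # step 15
        # Assumption: refill the remaining n - m slots with fresh random scouts.
        survivors += [(evaluate(b), b) for b in (random_bee() for _ in range(n - m))]
        population = sorted(survivors, reverse=True)
        stale = stale + 1 if population[0][0] == best_before else 0
    return population[0][1], population[0][0], evaluations


best, best_fitness, evaluations = hbga(one_max)
print(best_fitness, evaluations)
```

Counting calls inside `evaluate` gives the number of executions of the fitness function, the quantity that the experiments in the next section use to compare the HBGA with the basic GA.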

V. EXPERIMENTS AND RESULTS

This section presents the results of implementing and testing the HBGA and compares them with the results of the original genetic algorithm described in the literature. The configuration of the computer system used for the experiments is shown in Table 2.

TABLE 2. COMPUTER CONFIGURATION

CPU               Intel® Core™2 Duo CPU T6670 @ 2.20GHz
Memory            3.00 GB
Hard-Disk         320GB (5400RPM)
Operating System  Windows 7 Ultimate 64 bit

To compare and evaluate the two algorithms, we used well-known real data sets of different sizes from the process mining website (http://www.processmining.org/event_logs_and_models_used_in_book). Table 3 describes the datasets used in the experiments: two small datasets with 42 and 71 events, and a larger dataset with 7539 events. The parameter values used for each algorithm are listed in Tables 4, 7 and 10.

TABLE 3. DATASETS USED IN THE EXPERIMENTS

Data source                     Cases  Events  Event classes  Event types  Originators
running-example                 6      42      8              1            6
running-example non confirming  10     71      8              1            6
Bigger-example                  1391   7539    8              1            1

Both the genetic process mining algorithm and the HBGA were run in the ProM tool to mine the process logs in Table 3, and the performance of both algorithms was measured. Two factors are used to assess performance: the number of executions of the fitness function and the execution time needed to reach the target fitness value. Execution time is not used as the main measure because it may vary among operating systems and code implementations, so the main factor in the comparison is the number of executions of the fitness function. Both algorithms were executed 10 times for each dataset with different parameter values. Table 13 shows the average number of fitness calculations and the average execution time.

[Fig. 4 flow chart boxes: 1. Initialize parameters. 2. Initialize a random solution population of size n. 3. Evaluate the fitness of the population. 4. Select the best m sites. 5. Use crossover and mutation operators to create n1 neighbours for the best e sites. 6. Select the best m-e sites. 7. Use crossover and mutation operators to create n2 neighbours for the best m-e sites. 9. Use the n1 and n2 neighbours to generate the new population.]

A. Dataset 1 (running-example none confirming.xml)

Table 4 shows the parameters used to test the first dataset

(running-example none confirming.xml) for both the basic

genetic algorithm and the HBGA.

TABLE 4. PARAMETERS USED IN THE EXPERIMENTS

Table 5 shows the results obtained for the basic Genetic Algorithm: it takes 30035 ms and 23000 executions of the fitness function to reach the target fitness value of 0.98. Table 6 shows the results obtained for the HBGA when executed in the ProM tool: it takes 8163 ms and 6100 executions of the fitness function to find an individual with the target fitness value of 0.98.

Fig. 5 shows a screenshot of the HBGA plug-in in the ProM framework, indicating the parameters used in the experiments. Fig. 6 shows a chart that compares the basic Genetic Algorithm and the HBGA in terms of the number of fitness calculations and the fitness values.

TABLE 5. RESULTS FOR THE BASIC GENETIC MINING ALGORITHM (RUNNING-EXAMPLE NONE CONFIRMING.XML).

Fitness Value  Number of Fitness Calculations  Time (ms)
0.78579        1300                            861
0.8968343      1900                            1658
0.9177675      2500                            2491
0.9169595      3100                            4600
0.92113739     3700                            4846
0.91817572     4300                            5596
0.916356938    4900                            6358
0.91736474     5500                            7214
0.91826426     6100                            7940
0.9177925      7000                            9026
0.92002162     10300                           13361
0.96501968     22500                           29315
0.97969391     23000                           30035

TABLE 6. RESULTS FOR THE HBGA MINING ALGORITHM FOR THE DATASET (RUNNING-EXAMPLE NONE CONFIRMING.XML).

Fitness Value  Number of Fitness Calculations  Time (ms)
0.714259       1300                            1772
0.783327       1900                            2514
0.915081       2500                            3359
0.955277       3100                            4108
0.961742       3700                            4879
0.965702       4300                            5784
0.9651         4900                            6545
0.965774       5500                            7333
0.9804506      6100                            8163

Fig. 5. A screenshot of the HBGA plug-in in the ProM framework

Fig. 6. A Comparison between the basic GA and the HBGA in terms of

number of fitness calculations and the fitness value for the dataset

(running-example none confirming.xml).

Parameter values (Table 4):

Algorithm  Max number of generations  Population size N  m   e   N1  N2
GA         230                        100                -   -   -   -
HBGA       10                         50                 50  30  20  10

B. Dataset 2 (running-example.xml)

Table 7 shows the parameters used to test the second dataset (running-example.xml) for both the basic genetic algorithm and the HBGA. Table 8 shows the results obtained for the basic Genetic Algorithm: it takes 26777 ms and 30000 executions of the fitness function to reach a fitness value of 0.91. Table 9 shows the results obtained for the HBGA when executed in the ProM tool: it takes 6656 ms and 4700 executions of the fitness function to find an individual with a fitness value of 0.987. Fig. 7 compares the genetic algorithm and the HBGA in terms of the number of executions of the fitness function and the fitness values reached.

TABLE 7. PARAMETERS USED IN THE EXPERIMENTS FOR THE DATA SET

(RUNNING-EXAMPLE.XML)

TABLE 8. RESULTS FOR THE BASIC GENETIC MINING ALGORITHM (RUNNING-EXAMPLE.XML).

Fitness Value  Number of Fitness Calculations  Time (ms)
0.80289        650                             647
0.806428       1100                            1074
0.876798       1550                            1454
0.8783744      2000                            2140
0.8774159      2450                            2405
0.885432       2900                            3009
0.901339       3350                            11382
0.88746        3800                            3693
0.8875496      4250                            3944
0.886828       4700                            4290
0.902574       5250                            4754
0.91418        30000                           26777

TABLE 9. RESULTS FOR THE HBGA ALGORITHM FOR THE DATA SET (RUNNING-EXAMPLE.XML).

Fitness Value  Number of Fitness Calculations  Time (ms)
0.6278965      650                             1222
0.7138262      1100                            1714
0.93573        1550                            2221
0.936836       2000                            2840
0.937789       2450                            3387
0.984541       2900                            4087
0.98433        3350                            4679
0.98488        3800                            5263
0.98521        4250                            6006
0.98772        4700                            6656

Fig. 7. Comparison between the basic GA and the HBGA in terms of fitness

function for the dataset (running-example.xml).

C. Dataset 3 (bigger-example.xml)

Tables 10, 11 and 12 show the parameters and the results obtained for both the basic genetic algorithm and the HBGA, and Fig. 8 shows a chart that compares the basic Genetic Algorithm and the HBGA in terms of the number of executions of the fitness function and the fitness values reached.

TABLE 10. PARAMETERS USED IN THE EXPERIMENTS FOR THE DATA SET

(BIGGER-EXAMPLE.XML).

TABLE 11. RESULTS FOR THE BASIC GENETIC MINING ALGORITHM (BIGGER-EXAMPLE.XML)

Fitness Value  Number of Fitness Calculations  Time (ms)
0.4113516      300                             44612
0.5734449      500                             67729
0.86284727     700                             90731
0.8714939      900                             90734
0.87251332     1100                            114441
0.86919456     1300                            214750
0.90599751     1500                            162180
0.91268862     1700                            214767
0.91103113     1900                            186025
0.9323182      2100                            231724
0.9429656      2300                            231726
0.96310089     2500                            231728
0.964984783    3177                            324901
0.9813097      3214                            300634

Parameter values for Tables 7 and 10, in their order of appearance on this page:

Algorithm  Max number of generations  Population size N  m    e   N1  N2
GA         1000                       200                -    -   -   -
HBGA       10                         200                50   30  15  10

Algorithm  Max number of generations  Population size N  m    e   N1  N2
GA         200                        100                -    -   -   -
HBGA       10                         100                100  50  20  10

TABLE 12. RESULTS FOR THE HBGA ALGORITHM (BIGGER-EXAMPLE.XML)

Fitness Value  Number of Fitness Calculations  Time (ms)
0.851192078    300                             28567
0.951407673    500                             48506
0.950046223    700                             68507
0.95873926     900                             87893
0.960695308    1100                            107508
0.968393951    1300                            129293
0.970666059    1500                            151641
0.975384609    1700                            172994
0.975102523    1900                            193383
0.980810988    2100                            214455
0.983229058    2300                            235525
0.985469131    2500                            257107

Fig. 8. Comparison between the basic GA and the HBGA in terms of fitness

function for the dataset (Bigger-example.xml).

In these experiments each algorithm was run 10 times on each dataset. Table 13 shows the mean number of executions of the fitness function and the mean execution time over the 10 runs of the basic genetic algorithm and the HBGA, for the three data sets used in this paper.

TABLE 13. AVERAGE NUMBER OF EXECUTIONS OF THE FITNESS FUNCTION AND AVERAGE EXECUTION TIME OVER 10 RUNS OF THE HBGA AND THE BASIC GENETIC ALGORITHM WITH DIFFERENT PARAMETER VALUES.

VI. DISCUSSION

The experiments in the previous section show that, on the three event logs tested, the HBGA reaches the target fitness with fewer executions of the fitness function than the basic genetic algorithm.

In this paper we use two factors for the comparison between the basic genetic algorithm and the HBGA:

1- Number of executions of the fitness function.

2- Execution time of the algorithm.

The results show that the HBGA performs better than the basic genetic algorithm on both the small and the large event logs. The results in Table 13 allow us to compare the performance of the HBGA with that of the basic genetic algorithm for the three data sets used in this paper.

To quantify the enhancement of the HBGA over the basic Genetic algorithm in terms of the number of executions of the fitness function, let Pfitness be the ratio of the HBGA's mean number of fitness calculations to that of the GA (a value below 100% means the HBGA needs fewer calculations):

$$P_{fitness} = \frac{\text{Number of fitness calculations for HBGA}}{\text{Number of fitness calculations for GA}}$$

For the dataset (running example non confirming.xml): Pfitness = 7023/22700 ≈ 31%.

For the dataset (running example.xml): Pfitness = 10756/49833 ≈ 22%.

For the dataset (bigger example.xml): Pfitness = 2026/3000 ≈ 68%.

To quantify the enhancement of the HBGA over the basic Genetic algorithm in terms of execution time, let Ptime be the ratio of the HBGA's mean execution time to that of the GA:

$$P_{time} = \frac{\text{Execution time of HBGA}}{\text{Execution time of GA}}$$

For the dataset (running example non confirming.xml): Ptime = 10450/30530 ≈ 34%.

For the dataset (running example.xml): Ptime = 7240/44114 ≈ 16%.

For the dataset (bigger example.xml): Ptime = 150995/227792 ≈ 66%.

Mean values over 10 runs (Table 13):

Dataset                              Algorithm  Number of fitness calculations  Time (ms)
running-example.xml                  HBGA       10756                           7240
running-example.xml                  GA         49833                           44114
running-example none confirming.xml  HBGA       7023                            10450
running-example none confirming.xml  GA         22700                           30530
bigger-example.xml                   HBGA       2026                            150995
bigger-example.xml                   GA         3000                            227792

[Fig. 8 chart: fitness value (y-axis) against number of fitness calculations (x-axis) for the GA and the HBGA.]
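The ratios in this section can be recomputed directly from the Table 13 means. The short Python sketch below does so; the dictionary layout and the names `TABLE_13` and `ratios` are illustrative assumptions, while the numbers are copied from Table 13.

```python
# Mean values over 10 runs, copied from Table 13:
# dataset -> algorithm -> (number of fitness calculations, time in ms).
TABLE_13 = {
    "running-example none confirming.xml": {"HBGA": (7023, 10450), "GA": (22700, 30530)},
    "running-example.xml": {"HBGA": (10756, 7240), "GA": (49833, 44114)},
    "bigger-example.xml": {"HBGA": (2026, 150995), "GA": (3000, 227792)},
}


def ratios(table):
    """Return (P_fitness, P_time) per dataset as HBGA / GA ratios."""
    out = {}
    for name, rows in table.items():
        (h_fit, h_time), (g_fit, g_time) = rows["HBGA"], rows["GA"]
        out[name] = (h_fit / g_fit, h_time / g_time)
    return out


for name, (p_fit, p_time) in ratios(TABLE_13).items():
    print(f"{name}: P_fitness = {p_fit:.1%}, P_time = {p_time:.1%}")
```

Since both quantities are HBGA-to-GA ratios, smaller values are better; the corresponding saving is one minus the ratio.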

These results show that the HBGA outperforms the basic genetic algorithm on all three datasets, in terms of both execution time and number of executions of the fitness function.

The test results indicate that the proposed HBGA process mining algorithm handles the process mining task successfully and obtains better results than the basic genetic process mining algorithm.

VII. CONCLUSION

In this paper, we presented a new approach to process mining, the HBGA, which improves the performance of the basic Genetic algorithm by combining it with the Bees Algorithm. Although the basic Genetic algorithm is able to mine models with all structural constructs except duplicate tasks and is robust to noise, it has a drawback that cannot be neglected: the computational time. The main purpose of this paper is therefore to improve the performance of genetic process mining by decreasing the number of executions of the fitness function. We focused especially on the performance differences between the HBGA and the basic Genetic algorithm.

The HBGA approach presented in this paper uses the concept of neighborhood search from the Honey Bee algorithm to improve the performance of genetic process mining. The HBGA was implemented in the Java programming language and tested in the ProM tool on three data sets from http://www.processmining.org/event_logs_and_models_used_in_book.

Experimental results show that the HBGA performs better than the basic genetic process mining approach in both execution time and number of executions of the fitness function: across the three datasets it needs roughly 16% to 68% of the cost of the basic GA (the ratios computed in Section VI). We therefore conclude that the proposed HBGA obtains competitive results against the basic genetic process mining algorithm and can be considered a useful and accurate process mining algorithm.

One of the drawbacks of the HBGA is the number of parameters that have to be tuned. In addition, the execution time of the algorithm needs further improvement when dealing with very large event logs. Helping users choose appropriate parameter values and decreasing the execution time of the algorithm are therefore of great value and are left as future work.

VIII. REFERENCES

[1] F. Daniel, K. Barkaoui, S. Dustdar (Eds.), “IEEE Task Force on Process Mining”, BPM 2011 Workshops, Part I, LNBIP 99, Springer-Verlag (ISBN 978-3-642-28107-5), 2012, pp. 169–194.

[2] Jan Claes, Geert Poels, “Integrating Computer Log Files for Process Mining a Genetic Algorithm Inspired Technique”, 1st Workshop on Integration of IS Engineering Tools (INISET 2011), 2011, pp. 5-6.

[3] F. Daniel, K. Barkaoui, S. Dustdar (Eds.), “IEEE Task Force on Process Mining”, BPM 2011 Workshops, Part I, LNBIP 99, Springer-Verlag (ISBN 978-3-642-28107-5), 2012, pp. 169–194.

[4] Diogo Ferreira, Marielba Zacarias, Miguel Malheiros, Pedro Ferreira, “Approaching Process Mining with Sequence Clustering: Experiments and Findings”, BPM'07 Proceedings of the 5th international conference, pp. 2-3.

[5] Mohd Afizi Mohd Shukran, Yuk Ying Chung, Wei-Chang Yeh, Noorhaniza Wahid, Ahmad Mujahid Ahmad Zaidi, “Artificial Bee Colony based Data Mining Algorithms for Classification Tasks”, Modern Applied Science, ISSN 1913-1852, 2011, pp. 219-220.

[6] Dusan Teodorovic, Tatjana Davidovic, Milica Selmic, “Bee Colony Optimization The Applications Survey”, ACM Transactions on Computational Logic, 2011, pp. 7-9.