Process Mining: Discovering processes from event logs

All truths are easy to understand once they are discovered; the point is to discover them.
Galileo Galilei (1564 - 1642)

Prof.dr.ir. Wil van der Aalst
Eindhoven University of Technology
Department of Information and Technology
P.O. Box 513, 5600 MB Eindhoven
The Netherlands
[email protected]


Page 2

Outline

• Process Mining
  – overview
  – alpha algorithm
  – genetic mining
• ProM
  – Architecture
  – Converters (e-mail, Staffware, InConcert, SAP, etc.)
  – Process mining plug-ins
    • Alpha-algorithm
    • Multi-phase mining
    • Genetic mining
  – Analysis plug-ins
  – Conformance testing plug-in
  – LTL checker plug-in
  – Social network plug-in
• Conclusion

Page 3

Process Mining

[Figure labels: process design, implementation/configuration, process enactment, diagnosis]

Page 4

Motivation: Reversing the process

• Process mining can be used for:
  – Process discovery (What is the process?)
  – Delta analysis (Are we doing what was specified?)
  – Performance analysis (How can we improve?)

[Figure labels: process mining; tasks Register order, Prepare shipment, Ship goods, Receive payment, (Re)send bill, Contact customer, Archive order]

Page 5

Overview

1) basic performance metrics
2) process model
3) organizational model
4) social network
5) performance characteristics
6) auditing/security

[Figure labels: process model from Start to End with the tasks Register order, Prepare shipment, Ship goods, (Re)send bill, Receive payment, Contact customer, Archive order; rule "If … then …"]

www.processmining.org

Page 6

Let us focus on mining process models …

1) basic performance metrics
2) process model
3) organizational model
4) social network
5) performance characteristics
6) auditing/security

[Figure labels: process model from Start to End with the tasks Register order, Prepare shipment, Ship goods, (Re)send bill, Receive payment, Contact customer, Archive order; rule "If … then …"]

... and a very simple approach: The alpha algorithm

Page 7

Alpha algorithm

α

Page 8

Process log

• Minimal information in log: case id's and task id's.
• Additional information: event type, time, resources, and data.
• In this log there are three possible sequences:
  – ABCD
  – ACBD
  – EF

case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
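The three sequences can be recovered by grouping the interleaved events by case id. The following is a minimal sketch, not part of the original slides; the names `EVENTS` and `traces_by_case` are made up for illustration.

```python
from collections import defaultdict

# Events in log order as (case id, task), copied from the slide.
EVENTS = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]


def traces_by_case(events):
    """Group events by case id, keeping the log order within each case."""
    tasks_per_case = defaultdict(list)
    for case_id, task in events:
        tasks_per_case[case_id].append(task)
    return {case_id: "".join(tasks) for case_id, tasks in tasks_per_case.items()}


traces = traces_by_case(EVENTS)
distinct = sorted(set(traces.values()))
print(traces)    # one trace per case
print(distinct)  # the distinct sequences observed in the log
```

Only the case id and the task id are needed here, which is the minimal information the slide asks for.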

Page 9

>, →, ||, # relations

• Direct succession: x>y iff for some case x is directly followed by y.
• Causality: x→y iff x>y and not y>x.
• Parallel: x||y iff x>y and y>x.
• Choice: x#y iff not x>y and not y>x.

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

Direct succession: A>B, A>C, B>C, B>D, C>B, C>D, E>F
Causality: A→B, A→C, B→D, C→D, E→F
Parallel: B||C, C||B
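The four relations follow mechanically from the traces. Below is a minimal sketch, not taken from the slides; the function name `footprint` and the use of strings as traces are illustrative choices.

```python
from itertools import product

# The three distinct traces of the example log.
TRACES = ["ABCD", "ACBD", "EF"]


def footprint(traces):
    """Derive the relations >, ->, || and # from a list of traces."""
    tasks = sorted(set("".join(traces)))
    # x > y: x is directly followed by y in at least one trace.
    direct = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    # x -> y: x > y and not y > x.
    causal = {(x, y) for (x, y) in direct if (y, x) not in direct}
    # x || y: x > y and y > x.
    parallel = {(x, y) for (x, y) in direct if (y, x) in direct}
    # x # y: neither x > y nor y > x.
    choice = {
        (x, y)
        for x, y in product(tasks, repeat=2)
        if (x, y) not in direct and (y, x) not in direct
    }
    return direct, causal, parallel, choice


direct, causal, parallel, choice = footprint(TRACES)
print(sorted(direct))
print(sorted(causal))
print(sorted(parallel))
```

Every pair of tasks ends up in exactly one of the relations →, ←, || or #, which is what the alpha algorithm builds on.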

Page 10

Basic idea (1)

x→y

[Figure: Petri net fragment with transitions x and y]

Page 11

Basic idea (2)

x→y, x→z, and y||z

[Figure: Petri net fragment with transitions x, y, z (AND-split)]

Page 12

Basic idea (3)

x→y, x→z, and y#z

[Figure: Petri net fragment with transitions x, y, z (XOR-split)]

Page 13

Basic idea (4)

x→z, y→z, and x||y

[Figure: Petri net fragment with transitions x, y, z (AND-join)]

Page 14

Basic idea (5)

x→z, y→z, and x#y

[Figure: Petri net fragment with transitions x, y, z (XOR-join)]

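Page 15

It is not that simple: Basic alpha algorithm

Let $W$ be a workflow log over $T$. $\alpha(W)$ is defined as follows.

1. $T_W = \{ t \in T \mid \exists_{\sigma \in W}\; t \in \sigma \}$,
2. $T_I = \{ t \in T \mid \exists_{\sigma \in W}\; t = \mathit{first}(\sigma) \}$,
3. $T_O = \{ t \in T \mid \exists_{\sigma \in W}\; t = \mathit{last}(\sigma) \}$,
4. $X_W = \{ (A,B) \mid A \subseteq T_W \wedge B \subseteq T_W \wedge \forall_{a \in A} \forall_{b \in B}\; a \rightarrow_W b \wedge \forall_{a_1,a_2 \in A}\; a_1 \#_W a_2 \wedge \forall_{b_1,b_2 \in B}\; b_1 \#_W b_2 \}$,
5. $Y_W = \{ (A,B) \in X_W \mid \forall_{(A',B') \in X_W}\; A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \}$,
6. $P_W = \{ p_{(A,B)} \mid (A,B) \in Y_W \} \cup \{ i_W, o_W \}$,
7. $F_W = \{ (a, p_{(A,B)}) \mid (A,B) \in Y_W \wedge a \in A \} \cup \{ (p_{(A,B)}, b) \mid (A,B) \in Y_W \wedge b \in B \} \cup \{ (i_W, t) \mid t \in T_I \} \cup \{ (t, o_W) \mid t \in T_O \}$, and
8. $\alpha(W) = (P_W, T_W, F_W)$.

The alpha algorithm has been proven to be correct for a large class of free-choice nets.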
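The eight steps translate almost line by line into code. The sketch below is an illustration, not the ProM implementation: it enumerates all subsets of tasks (exponential, fine only for tiny logs), and it assumes that A and B are non-empty, which the definition above does not state explicitly. The names `alpha`, `i_W`, `o_W` and the tuple encoding of places are made up for this example.

```python
from itertools import chain, combinations


def alpha(traces):
    """Return (places, transitions, flow) following steps 1-8 of the definition."""
    tasks = sorted(set(chain.from_iterable(traces)))   # step 1: T_W
    first = {t[0] for t in traces}                     # step 2: T_I
    last = {t[-1] for t in traces}                     # step 3: T_O
    direct = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}

    def causal(a, b):   # a -> b
        return (a, b) in direct and (b, a) not in direct

    def choice(a, b):   # a # b
        return (a, b) not in direct and (b, a) not in direct

    # Assumption: only non-empty sets are considered for A and B.
    subsets = [frozenset(c) for n in range(1, len(tasks) + 1)
               for c in combinations(tasks, n)]
    unrelated = [s for s in subsets if all(choice(a, b) for a in s for b in s)]

    # Step 4: pairs (A, B) with a -> b for all a in A, b in B.
    x_w = [(a, b) for a in unrelated for b in unrelated
           if all(causal(x, y) for x in a for y in b)]
    # Step 5: keep only the maximal pairs.
    y_w = [(a, b) for (a, b) in x_w
           if not any((a, b) != (a2, b2) and a <= a2 and b <= b2
                      for (a2, b2) in x_w)]
    # Step 6: one place per maximal pair, plus source and sink place.
    places = {"i_W", "o_W"} | {("p", a, b) for a, b in y_w}
    # Step 7: arcs into and out of every place.
    flow = set()
    for a, b in y_w:
        flow |= {(x, ("p", a, b)) for x in a}
        flow |= {(("p", a, b), y) for y in b}
    flow |= {("i_W", t) for t in first} | {(t, "o_W") for t in last}
    return places, set(tasks), flow                    # step 8


places, transitions, flow = alpha(["ABCD", "ACBD", "EF"])
print(len(places), len(transitions), len(flow))
```

For the example log the maximal pairs are exactly the five causal pairs, because B and C are parallel and therefore cannot share a set.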

Page 16

Example

case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

[Figure: the log W and the mined Petri net α(W) with transitions A, B, C, D, E, F]

Page 17

DEMO Alpha algorithm

[Figure: Petri net with transitions labeled A to M; task names: invite reviewers, get review 1, time-out 1, get review 2, time-out 2, get review 3, time-out 3, collect reviews, decide, accept, reject, invite additional reviewer, get review X, time-out X]

48 cases, 16 performers

Page 18

Challenges

• Refining the existing algorithm (control-flow/process perspective)
  – Hidden tasks
  – Duplicate tasks
  – Non-free-choice constructs
  – Loops
  – Detecting concurrency (implicit or explicit)
  – Mining and exploiting time
  – Dealing with noise
  – Dealing with incompleteness
• Mining other perspectives (data, resources, roles, …)
• Gathering data from heterogeneous sources
• Visualization of results
• Delta analysis

Page 19

Genetic mining

Page 20

Approach

Page 21

Genetic mining: The two main questions

• How to represent an individual? (Petri net?)
• How to define the genetic operators? (e.g., crossover)

Page 22

How to represent an individual?

• Problems with Petri nets:
  – Places do not exist in the log
  – Difficulties defining mutation and crossover
  – Problems describing subtle rules without adding transitions

[Figure: Petri net fragment with transitions A, B, C, D]

Page 23

Representation of the goal process

INPUT   true   A   A   A   D   D   E∧F   B∨C∨G
→       A      B   C   D   E   F   G     H       OUTPUT
A       0      1   1   1   0   0   0     0       B∨C∨D
B       0      0   0   0   0   0   0     1       H
C       0      0   0   0   0   0   0     1       H
D       0      0   0   0   1   1   0     0       E∧F
E       0      0   0   0   0   0   1     0       G
F       0      0   0   0   0   0   1     0       G
G       0      0   0   0   0   0   0     1       H
H       0      0   0   0   0   0   0     0       true

[Figure: Petri net of the goal process with transitions A, B, C, D, E, F, G, H]

Page 24

A more compact representation

ACTIVITY   INPUT        OUTPUT
A          {}           {{B,C,D}}
B          {{A}}        {{H}}
C          {{A}}        {{H}}
D          {{A}}        {{E},{F}}
E          {{D}}        {{G}}
F          {{D}}        {{G}}
G          {{E},{F}}    {{H}}
H          {{B,C,G}}    {}

[Figure: Petri net of the goal process with transitions A, B, C, D, E, F, G, H]
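One way to see how the compact table encodes behaviour is to replay a trace against it. The sketch below is a simplified illustration and not the fitness measure of the genetic miner. It assumes that the subsets of an INPUT or OUTPUT entry are AND-ed and the activities inside a subset are OR-ed, which matches the E∧F and B∨C∨D labels of the matrix form. The names `GOAL` and `replays` are made up, and tokens are matched greedily.

```python
# Causal matrix of the goal process, copied from the table above:
# activity -> (INPUT, OUTPUT), each a list of subsets.
GOAL = {
    "A": ([], [{"B", "C", "D"}]),
    "B": ([{"A"}], [{"H"}]),
    "C": ([{"A"}], [{"H"}]),
    "D": ([{"A"}], [{"E"}, {"F"}]),
    "E": ([{"D"}], [{"G"}]),
    "F": ([{"D"}], [{"G"}]),
    "G": ([{"E"}, {"F"}], [{"H"}]),
    "H": ([{"B", "C", "G"}], []),
}


def replays(matrix, trace):
    """Return True if the trace can be replayed with no tokens missing or left over."""
    pending = []  # tokens as (producing activity, output subset)
    for activity in trace:
        inputs, outputs = matrix[activity]
        for subset in inputs:
            # Each input subset needs one token produced by one of its
            # activities and addressed to the current activity.
            token = next(
                (p for p in pending if p[0] in subset and activity in p[1]),
                None,
            )
            if token is None:
                return False
            pending.remove(token)
        # Executing the activity produces one token per output subset.
        pending.extend((activity, subset) for subset in outputs)
    return not pending


for trace in ["ABH", "ACH", "ADEFGH", "ADFEGH", "ADEGH"]:
    print(trace, replays(GOAL, trace))
```

The trace ADEGH fails because G has the input {{E},{F}} and therefore needs a token from F as well as from E.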

Page 25

Any Petri net can be mapped onto a causal matrix:

ACTIVITY   INPUT          OUTPUT
A          {...}          {{C,D},...}
B          {...}          {{C,D},...}
C          {{A,B},...}    {...}
D          {{A,B},...}    {...}

[Figure: Petri net fragment with transitions A, B, C, D]

but ...

Page 26

Mapping a causal matrix onto a Petri net?

ACTIVITY   INPUT                            OUTPUT
A          {{i11,i12,i13},{i21,i22,i23}}    {{o11,o12,o13},{o21,o22,o23}}

[Figure: transition A with input groups (i11, i12, i13) and (i21, i22, i23) and output groups (o11, o12, o13) and (o21, o22, o23)]

Page 27

Wiring based on input and output sets

[Figure: the output groups (o11, o12, o13), (o21, o22, o23) and the input groups (i11, i12, i13), (i21, i22, i23) between transitions A and B, with a "?" marking how A and B are connected]

Using place fusion or silent transitions.

Page 28

Example: Event log

case id   activity id   originator   timestamp
case 1    activity A    John         9-3-2004:15.01
case 2    activity A    John         9-3-2004:15.12
case 3    activity A    Sue          9-3-2004:16.03
case 3    activity D    Carol        9-3-2004:16.07
case 1    activity B    Mike         9-3-2004:18.25
case 1    activity H    John         10-3-2004:9.23
case 2    activity C    Mike         10-3-2004:10.34
case 4    activity A    Sue          10-3-2004:10.35
case 2    activity H    John         10-3-2004:12.34
case 3    activity E    Pete         10-3-2004:12.50
case 3    activity F    Carol        11-3-2004:10.12
case 4    activity D    Pete         11-3-2004:10.14
case 3    activity G    Sue          11-3-2004:10.44
case 3    activity H    Pete         11-3-2004:11.03
case 4    activity F    Sue          11-3-2004:11.18
case 4    activity E    Clare        11-3-2004:12.22
case 4    activity G    Mike         11-3-2004:14.34
case 4    activity H    Clare        11-3-2004:14.38

Page 29

Goal

[Figure: Petri net of the goal process with transitions A, B, C, D, E, F, G, H]

Page 30

Example: Starting point

case id   activity id
case 1    activity A
case 2    activity A
case 3    activity A
case 3    activity D
case 1    activity B
case 1    activity H
case 2    activity C
case 4    activity A
case 2    activity H
case 3    activity E
case 3    activity F
case 4    activity D
case 3    activity G
case 3    activity H
case 4    activity F
case 4    activity E
case 4    activity G
case 4    activity H

+ 500 randomly generated initial individuals

Page 31

Two individuals

ACTIVITY   INPUT        OUTPUT
A          {}           {{B,C,D}}
B          {{A}}        {{H}}
C          {{A}}        {{H}}
D          {{A}}        {{E}}
E          {{D}}        {{G}}
F          {}           {{G}}
G          {{E},{F}}    {{H}}
H          {{C,B,G}}    {}

ACTIVITY   INPUT            OUTPUT
A          {}               {{B,C,D}}
B          {{A}}            {{H}}
C          {{A}}            {{H}}
D          {{A}}            {{E,F}}
E          {{D}}            {{G}}
F          {{D}}            {{G}}
G          {{E},{F}}        {{H}}
H          {{C},{B},{G}}    {}

Page 32

Crossover

ACTIVITY   INPUT        OUTPUT
A          {}           {{B,C,D}}
B          {{A}}        {{H}}
C          {{A}}        {{H}}
D          {{A}}        {{E,F}}
E          {{D}}        {{G}}
F          {{D}}        {{G}}
G          {{E},{F}}    {{H}}
H          {{C,B,G}}    {}

ACTIVITY   INPUT            OUTPUT
A          {}               {{B,C,D}}
B          {{A}}            {{H}}
C          {{A}}            {{H}}
D          {{A}}            {{E}}
E          {{D}}            {{G}}
F          {}               {{G}}
G          {{E},{F}}        {{H}}
H          {{C},{B},{G}}    {}
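Comparing the tables before and after crossover, the rows for D and F have changed places between the two individuals while all other rows are untouched. The sketch below reproduces exactly that exchange. It is a simplification for illustration: the slides do not spell out the operator, and the real genetic miner may recombine parts of INPUT and OUTPUT sets rather than whole rows. The names `PARENT_1`, `PARENT_2` and `crossover` are made up.

```python
import copy

# Rows shared by both parents: activity -> (INPUT, OUTPUT).
SHARED = {
    "A": ([], [{"B", "C", "D"}]),
    "B": ([{"A"}], [{"H"}]),
    "C": ([{"A"}], [{"H"}]),
    "E": ([{"D"}], [{"G"}]),
    "G": ([{"E"}, {"F"}], [{"H"}]),
}

# The two individuals from the "Two individuals" slide.
PARENT_1 = {
    **copy.deepcopy(SHARED),
    "D": ([{"A"}], [{"E"}]),
    "F": ([], [{"G"}]),
    "H": ([{"C", "B", "G"}], []),
}
PARENT_2 = {
    **copy.deepcopy(SHARED),
    "D": ([{"A"}], [{"E", "F"}]),
    "F": ([{"D"}], [{"G"}]),
    "H": ([{"C"}, {"B"}, {"G"}], []),
}


def crossover(parent_a, parent_b, activities):
    """Exchange the rows of the given activities (assumed, simplified operator)."""
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    for activity in activities:
        child_a[activity], child_b[activity] = child_b[activity], child_a[activity]
    return child_a, child_b


child_1, child_2 = crossover(PARENT_1, PARENT_2, ["D", "F"])
print(child_1["D"], child_1["F"])
print(child_2["D"], child_2["F"])
```

The parents are deep-copied first, so the original population is left unchanged and can still take part in later selection rounds.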

Page 33

Resulting CM with fitness 1.0

INPUT   true   A   A   A   D   D   E∧F   B∨C∨G
→       A      B   C   D   E   F   G     H       OUTPUT
A       0      1   1   1   0   0   0     0       B∨C∨D
B       0      0   0   0   0   0   0     1       H
C       0      0   0   0   0   0   0     1       H
D       0      0   0   0   1   1   0     0       E∧F
E       0      0   0   0   0   0   1     0       G
F       0      0   0   0   0   0   1     0       G
G       0      0   0   0   0   0   0     1       H
H       0      0   0   0   0   0   0     0       true

Page 34

Mapping

[Figure: the resulting causal matrix (same entries as on the previous slide) mapped onto a Petri net with transitions A, B, C, D, E, F, G, H]

Page 35

ProM framework

Page 36

ProM

[Figure labels]
• workflow management systems: Staffware, InConcert, MQ Series
• case handling / CRM systems: FLOWer, Vectus, Siebel
• ERP systems: SAP R/3, BaaN, Peoplesoft
• common XML format for storing/exchanging workflow logs
• ProM framework: core and plug-ins (input/output, visualization, analysis)
• plug-ins: alpha algorithm, genetic algorithm, Tsinghua alpha algorithm, multi-phase algorithms, social network miner, case data extraction, property verifier, ...
• external tools: NetMiner, Viscovery, ...

Page 37

Converter plug-in: EMailAnalyzer

Page 38

XML format

Page 39

ProM architecture

[Figure labels]
• User Interface + User Interaction
• Log sources: Staffware, Flower, SAP, InConcert, ...
• XML Log, Log Filter
• Plug-in types: Mining Plugin, Import Plugin, Export Plugin, Analysis Plugin, Conversion Plugin
• Formats (first group): Heuristic Net, Aris Graph Format (Aris AML Format), PNML, TPN, ...
• Formats (second group): Heuristic Net, PNML, Aris Graph format, TPN, NetMiner file, Agna file, Aris PPM Instances, DOT, Comma Separated Values, ...
• Visualisation Engine, Result Frame

Page 40

Mining plug-in: Alpha algorithm

Page 41

Mining plug-in: Genetic Miner

Page 42

Mining plug-in: Multi-phase mining

Page 43

Step 1: Get instances

[Figure: instance graphs over the tasks A, B, C, D, E, F, G, H, I]

Page 44

Step 2: Project

[Figure: projected instance graphs over the tasks A to I with the nodes ts and tf and a count on every node and arc]

Page 45

Step 3: Aggregate

[Figure: aggregated graph over the tasks A to I with the nodes ts and tf and summed counts on nodes and arcs]

Page 46

Step 4: Map onto EPC

[Figure: EPC with the functions ts, A, B, C, D, E, F, G, H, I, tf and the events ets, eA, eB, eC, eD, eE, eF, eG, eH, eI, etf]

Page 47

Step 5: Map onto Petri net (or other language)

[Figure: Petri net with the transitions A, B, C, D, E, F, G, H, I and ts]

Page 48

Mining plug-in: Social network miner

Page 49

Page 50

Cliques

Page 51

SN based on hand-over of work metric

density of network is 0.225
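The network behind the 0.225 figure is not reproduced in this transcript, so the sketch below uses the small example event log with originators shown earlier. It makes two assumptions: a hand-over of work is counted whenever two consecutive events of the same case have different originators, and density is the fraction of possible directed edges that occur at least once. The names `EVENTS` and `handover_network` are made up; this is not the ProM social network miner.

```python
from collections import Counter, defaultdict

# (case id, originator) in log order, copied from the "Example: Event log" slide.
EVENTS = [
    (1, "John"), (2, "John"), (3, "Sue"), (3, "Carol"), (1, "Mike"),
    (1, "John"), (2, "Mike"), (4, "Sue"), (2, "John"), (3, "Pete"),
    (3, "Carol"), (4, "Pete"), (3, "Sue"), (3, "Pete"), (4, "Sue"),
    (4, "Clare"), (4, "Mike"), (4, "Clare"),
]


def handover_network(events):
    """Count hand-overs (a, b): b directly follows a within the same case."""
    per_case = defaultdict(list)
    for case_id, originator in events:
        per_case[case_id].append(originator)
    counts = Counter()
    for originators in per_case.values():
        for giver, taker in zip(originators, originators[1:]):
            if giver != taker:  # assumption: self hand-overs are ignored
                counts[(giver, taker)] += 1
    performers = {originator for _, originator in events}
    possible = len(performers) * (len(performers) - 1)
    density = len(counts) / possible if possible else 0.0
    return counts, density


counts, density = handover_network(EVENTS)
for (giver, taker), n in sorted(counts.items()):
    print(f"{giver} -> {taker}: {n}")
print(f"density: {density:.3f}")
```

Other social network metrics, such as working together, would replace the pairing rule inside the loop while keeping the same overall structure.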

Page 52

SN based on working together (and ego network)

Page 53

Analysis plug-in: LTL checker

Page 54

Page 55

Analysis plug-in: Conformance checker

Do they agree?

Page 56

Page 57

Fitness is not enough

Page 58

Screenshot

(Also runs on Mac.)

Page 59

Other analysis plug-ins

Page 60

More demos?

Page 61

Conclusion

• Process mining provides many interesting challenges for scientists, customers, users, managers, consultants, and tool developers.
• Involves multiple perspectives (process, data, resources, etc.)
• Get ProM-ed!
• You can contribute by applying ProM and developing plug-ins

[Figure labels: process design, implementation/configuration, process enactment, diagnosis]

Page 62

More information

http://www.workflowcourse.com

http://www.workflowpatterns.com

http://www.processmining.org