33 process cubes - pa.win.tue.nl
TRANSCRIPT
33
PAGE 0
Process Cubes
Wil van der Aalst
AIS Meeting 13-2-2014
DSC/e's Motto
Turning Data into Value
What
happened?
Why did
it happen?
What will
happen?
What is the
best that
can happen?
Data Science
aims to
answer such
questions
Process Mining
Motivation
PAGE 7
4.56 compute average
create model
(freq., sum, variance, mode, …)
(alpha, heuristic, fuzzy, social,
dotted, compliance, etc., etc.)
2 dimensions: model = number
PAGE 8
hours: 50.5
number: 32 passed
failed
TU/e HSE QUT
hours: 48.5
number: 55
hours: 23.2
number: 49
hours: 23.5
number: 23
hours: 10.5
number: 4
hours: 24.5
number: 8
2 dimensions: higher-level models
PAGE 9
passed
failed
TU/e HSE QUT
Process Cubes
PAGE 10
W.M.P. van der Aalst.
Process Cubes: Slicing, Dicing, Rolling Up
and Drilling Down Event Data for Process
Mining. In M. Song, M. Wynn, and J. Liu, editors, Asia Pacific
Conference on Business Process Management (AP-
BPM 2013), volume 159 of Lecture Notes in Business
Information Processing, pages 1-22. Springer-Verlag,
Berlin, 2013.
10.1007/978-3-319-02922-1_1
Process Cube: Dimensions and Cells
PAGE 11
time
win
dow
event class
case
typ
e
process cell
event
New behaviorprocess cube
cell model
cell sublog
case
s
time
ct
ec
tw
dimension
caseactivitytimestampresourcetype….
event
Example Dimensions
PAGE 12
time location
group
concept
drift
analysis cross-
organizational
process
mining
clustering and
classification
acbe abce ade
acbe abce ade
acbe abce ade
cases may
be
distributed
over
multiple
cells, the
same event
may belong
to multiple
cells
Example
PAGE 13
gold
silver
normal
Another Example
PAGE 14
>100k
50k
>50k & 100k
Another Example
PAGE 15
Another Example
PAGE 16
{1,2,3,4}
{5,6,7}
{8,9,10}
Roll up and drill down
PAGE 17
time
win
dow
event class
case
typ
e A
G
B
C
D
E
F
1: ACDEG2: BCFG3: BCFG
4: ACEDG5: ACFG
6: ACEDG7: BCEDG8: BCDEG
1: AC4: AC5: AC6: AC
1: CDEG4: CEDG5: CFG
6: CEDG
2: BC3: BC7: BC8: BC
2: CFG3: CFG
7: CEDG8: CDEG
A
C
B
C
GC
D
E
F
GC
D
E
F
time
win
dow
event class
case
typ
e
drill down
gold
cu
sto
mer
silv
er c
ust
om
er
2012
2013
2012
2013
sales delivery
roll up
Two foundational ways of spitting event
data: horizontal or vertical
PAGE 18
A
start complete
G
B
C
D
E
F
1: ACDEG2: BCFG3: BCFG
4: ACEDG5: ACFG
6: ACEDG7: BCEDG8: BCDEG
1: AC2: BC3: BC4: AC5: AC6: AC7: BC8: BC
A
G
B
C
D
E
FC
1: CDEG2: CFG3: CFG
4: CEDG5: CFG
6: CEDG7: CEDG8: CDEG
split
horizontally
A C F G
B C F G
B C
D
G
E
A C
D
G
E
1: ACDEG4: ACEDG6: ACEDG
7: BCEDG8: BCDEG
5: ACFG
2: BCFG3: BCFG
merge
horizontally
split
vertically
merge
vertically
Ingredients
PAGE 19
Event Base (EB) with
(raw) event data
Process Cube Structure (PCS)
defining dimensions
independent of content
Process Cube View (PCV)
based on PCS that can be
linked to real events
Event Base (E,P, π)
• Set of events E
• Set of properties P
• Partial mapping π
(πp(e)=v means event e has property p with value v)
PAGE 20
Slightly more formal
PAGE 21
Process Cube Structure (D,type,hier)
• D is the set of dimensions
• type defines the type of each dimension (string, int,
etc.)
• hier defines the hierarchy per dimension (does not
need to be a tree, elements at the same level may be
partially overlapping)
PAGE 22
caseactivityresourcetypetotaltime
Process Cube Dimensions
Compatible: D P and correct types
PAGE 23
reso
urce
activity
case
typ
e
Event Base (EB) with
(raw) event data
Process Cube Structure
(PCS) defining
dimensions D
independent of content
Slightly more formal
PAGE 24
Process Cube View (Dsel , sel)
• View on a process cube structure (D,type,hier).
• Dsel D are the selected dimensions.
• sel is a function determining the structure of each
dimension in D .
PAGE 25
reso
urce
activity
case
typ
e
reso
urce
activity
Slightly more formal
PAGE 26
Process Cube View Example (1/3):
Dimension resource is not selected
PAGE 27
reso
urce
activity
case
typ
e
activity
case
typ
e
PCV = (Dsel , sel) PCS = (D,type,hier)
Process Cube View Example (2/3):
Dimension activity is coarsened
PAGE 28
reso
urce
activityca
se t
ype
e.g., per subprocess rather than per activity
reso
urce
activity
case
typ
e
PCV = (Dsel , sel) PCS = (D,type,hier)
Process Cube View Example (3/3):
Dimension case type is filtered
PAGE 29
reso
urce
activity
case
typ
e
part of the case type dimension is not selected
reso
urce
activity
case
typ
e
PCV = (Dsel , sel) PCS = (D,type,hier)
Materialized Process Cube View
PAGE 30
activity
case
typ
e
reso
urce
activity
case
typ
e
reso
urce
activity
case
typ
e
3 x 2 x 1 = 6 event logs
2 x 2 x 3 = 12 event logs
3 x 1 x 3 = 9 event logs
An event may be part of multiple cells!!
• departments may
overlap
• subprocesses share
interfaces
• contracts involving
multiple parties
• ….
PAGE 31
activity
case
typ
e
• events are unique,
• but may be part of multiple cells
Slightly more formal
PAGE 32
OLAP (Online Analytical Processing)
PAGE 33
OLAP operations can be formalized (see W.M.P. van der Aalst. Process Cubes: Slicing, Dicing, Rolling Up and Drilling Down Event Data
for Process Mining. In AP-BPM 2013, volume 159 of Lecture Notes in Business Information
Processing, pages 1-22. Springer-Verlag, Berlin, 2013.)
PAGE 34
Initial Implementation
(work of Tatiana Mamaliga)
PAGE 35 http://www.processmining.org/_media/blogs/pub2013/masterthesistatianamamaliga.pdf
Architecture
PAGE 36
Results Generated Using ProCube
PAGE 37
PAGE 38
problem 1
performance
Sparsity is a problem when unloading
cells to ProM
PAGE 39
Process cubes tend to
be sparse. Palo does
not handle this well.
Ongoing work of Shengnan Guo.
PAGE 40
problem 2
interpretation
PAGE 41
Two variants of the same "model" …
PAGE 42
comparing models is closely related to configurable process models
PAGE 43
link to decomposed
process mining
Remember
PAGE 44
A
start complete
G
B
C
D
E
F
1: ACDEG2: BCFG3: BCFG
4: ACEDG5: ACFG
6: ACEDG7: BCEDG8: BCDEG
1: AC2: BC3: BC4: AC5: AC6: AC7: BC8: BC
A
G
B
C
D
E
FC
1: CDEG2: CFG3: CFG
4: CEDG5: CFG
6: CEDG7: CEDG8: CDEG
split
horizontally
A C F G
B C F G
B C
D
G
E
A C
D
G
E
1: ACDEG4: ACEDG6: ACEDG
7: BCEDG8: BCDEG
5: ACFG
2: BCFG3: BCFG
merge
horizontally
split
vertically
merge
vertically
both process
cubes and
decomposed
process mining
split event logs
Decomposing Conformance Checking
PAGE 45
SNprocessmodel
Levent log
decomposition technique
conformancechecking
technique
M1
submodel
L1sublog
conformance check
M2
submodel
L2sublog
conformance check
Mn
submodel
Lnsublog
conformance check
conformance diagnostics
decompose model
decompose event log
e.g., maximal decomposition, passage-based decomposition, or SESE/RPST-based decomposition
e.g., A* based alignments, token-based replay, or simple replay until first deviation
yields a (valid) activity partitioning
See "divide and conquer"
framework by Eric Verbeek.
Example of a valid decomposition
PAGE 46
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
Log can be split in the same way!
Example of an alignment for observed
trace a,b,c,d,e,c,d,g,f
PAGE 47
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
Etc.
a,b,c,d,e,c,d,g,f
Conformance checking can be
decomposed !!!
• General result for any valid decomposition: Any
event log or trace is perfectly fitting the overall
model if and only if it is also fitting all the individual
fragments
PAGE 48
a
start
b
c
d
g
h
e
end
c1
c2
c3
c4
c5t1
f
t2
t3
t4
t5
t6
t7 t8
t9
t10
t11
c6
c7 c8
c9
a
start t1
SN1
a
c
e
c2
t1
t4
t6SN3
ab
d
e
c1 c3
t1
t2
t3
t5
t6
SN2
d
g
h
e
c5
f
t5
t6
t7 t8
t9
t10
c6
c7
SN5
c
d
c4
t4
t5
SN4
g
h
end
f
t8
t9
t10
t11c8
c9
SN6
Wil van der Aalst, Decomposing Petri nets for process mining:
A generic approach. Distributed and Parallel Databases,
Volume 31, Issue 4, pp 471-507, 2013
Decomposing Process Discovery
PAGE 49
Mprocessmodel
Levent log
decomposition technique
process discoverytechnique
M1
submodel
L1sublog
discovery
M2
submodel
L2sublog
Mn
submodel
Lnsublog
compose model
decompose event log
discovery discovery
e.g., causal graph based on frequencies is decomposed using passages or SESE/RPST
e.g., language/state-based region discovery, variants of alpha algorithm, genetic process mining
yields a (valid) activity partitioning
See "divide and conquer"
framework by Eric Verbeek.
Learn more about decomposing process
mining problems?
• W.M.P. van der Aalst. Decomposing Petri Nets for Process Mining: A Generic Approach. Distributed and Parallel
Databases, 31(4):471-507, 2013.
• W.M.P. van der Aalst. A General Divide and Conquer Approach for Process Mining. In M. Ganzha, L. Maciaszek,
and M. Paprzycki, editors, Federated Conference on Computer Science and Information Systems (FedCSIS
2013), pages 1-10. IEEE Computer Society, 2013.
• W.M.P. van der Aalst and E. Verbeek. Process Discovery and Conformance Checking Using Passages.
Fundamenta Informaticae, 2014.
• W.M.P. van der Aalst. Decomposing Process Mining Problems Using Passages. In S. Haddad and L. Pomello,
editors, Applications and Theory of Petri Nets 2012, volume 7347 of Lecture Notes in Computer Science, pages
72-91. Springer-Verlag, Berlin, 2012.
• J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Hierarchical Conformance Checking of Process Models
Based on Event Logs. In J.M. Colom and J. Desel, editors, Applications and Theory of Petri Nets 2013, volume
7927 of Lecture Notes in Computer Science, pages 291-310. Springer-Verlag, Berlin, 2013.
• J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Conformance Checking in the Large: Partitioning and
Topology. In F. Daniel, J. Wang, and B. Weber, editors, International Conference on Business Process
Management (BPM 2013), volume 8094 of Lecture Notes in Computer Science, pages 130-145. Springer-Verlag,
Berlin, 2013.
• J. Munoz-Gama, J. Carmona, and W.M.P. van der Aalst. Single-Entry Single-Exit Decomposed Conformance
Checking. Information Systems, 2014.
• E. Verbeek and W.M.P. van der Aalst. Decomposing Replay Problems: A Case Study. In D. Moldt and H. Roelke,
editors, Proceedings of the International Workshop on Petri Nets in Software Engineering (PNSE 2013), volume
989 of CEUR Workshop Proceedings, pages 213-232. CEUR-WS.org, 2013.
PAGE 50
PAGE 51
Challenges
PAGE 52
PAGE 53
formal
(not just a
picture)
fast
(should not
take years) sound
(result should
at least be free
of deadlocks,
etc.)
ability to balance
all conformance
dimensions
(fitness, precision,
generalization, and
simplicity) incl.
noise
provide
guarantees
(not just a best
effort)
1
2
3 4
5
process
discovery
challenge
Selected challenges in process mining
• Process mining in the Large (DeLiBiDa project)
− decomposition & distribution (SESE, passages, etc.)
− streaming data
− not just discovery (conformance, repair, ext., pred.!)
• Mining with/for stochastic models
− likelihood of models, alignments, delays
• Operational support
− cases!
• Data-aware and resource-aware process mining
− decision points, work distribution, etc.
− exploiting more informative event logs
− low-level logs versus higher-level processes
PAGE 54
Selected challenges in process mining
• Process of process mining
− workflow support, reporting
− parameter tuning, experimentation
− links to other types of mining (e.g., text mining)
− repositories (models, data, and analysis techniques)
− guidelines
• Log extraction
− correlation, n:m relations, etc.
− continuous to discrete ("mine your own body")
• Comparative process mining
− process cubes, concept drift, etc.
• Process improvement
− optimization (improve your past), simulation, cigar box
PAGE 55
Conclusion
• Challenges in process mining:
concept drift, comparing processes,
cases, and organizations, need for
contextual analysis, limited
scalability, performance problems.
• Unifying view: Process Cubes.
• Similar to OLAP but also many
differences.
• Related to decomposing process
mining problems.
• Initial implementations, but just the
starting point.
PAGE 56