process mining chapter_04_getting_the_data

24
Chapter 4 Getting the Data prof.dr.ir. Wil van der Aalst www.processmining.org

Upload: muhammad-ajmal

Post on 25-Dec-2014

83 views

Category:

Documents


2 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Process mining chapter_04_getting_the_data

Chapter 4Getting the Data

prof.dr.ir. Wil van der Aalstwww.processmining.org

Page 2: Process mining chapter_04_getting_the_data

Overview

PAGE 1

Part I: Preliminaries

Chapter 2 Process Modeling and Analysis

Chapter 3Data Mining

Part II: From Event Logs to Process Models

Chapter 4 Getting the Data

Chapter 5 Process Discovery: An Introduction

Chapter 6 Advanced Process Discovery Techniques

Part III: Beyond Process Discovery

Chapter 7 Conformance Checking

Chapter 8 Mining Additional Perspectives

Chapter 9 Operational Support

Part IV: Putting Process Mining to Work

Chapter 10 Tool Support

Chapter 11 Analyzing “Lasagna Processes”

Chapter 12 Analyzing “Spaghetti Processes”

Part V: Reflection

Chapter 13Cartography and Navigation

Chapter 14Epilogue

Chapter 1 Introduction

Page 3: Process mining chapter_04_getting_the_data

Goal of process mining

• What really happened in the past?• Why did it happen?• What is likely to happen in the future?• When and why do organizations and people deviate?• How to control a process better?• How to redesign a process to improve its

performance?

PAGE 2

Page 4: Process mining chapter_04_getting_the_data

Getting the data

PAGE 3

software system

(process)model

eventlogs

modelsanalyzes

discovery

records events, e.g., messages,

transactions, etc.

specifies configures implements

analyzes

supports/controls

enhancement

conformance

“world”

people machines

organizationscomponents

businessprocesses

Page 5: Process mining chapter_04_getting_the_data

From heterogeneous data sources to process mining results

PAGE 4

data source

data source

data source

data source

data source

data warehouse

extract

ELT

ELT

unfiltered event logs

filtered event logs (process) models answers

discovery conformance enhancement

process mining

ELT

filter

coarse-grained scoping

fine-grained scoping

optional

Extract, Transform, and Load (ELT)

XES, MXML, or similar

Page 6: Process mining chapter_04_getting_the_data

Example log

PAGE 5

• A process consists of cases.• A case consists of events such that each event relates to precisely one case.• Events within a case are ordered.• Events can have attributes. • Examples of typical attribute names are activity, time, costs, and resource.

Page 7: Process mining chapter_04_getting_the_data

Another view

PAGE 6

process cases events

...

attributesactivity= register requesttime = 30-12-2010:11.02 resource = Pete costs = 50

activity= reject requesttime = 07-01-2011:14.24resource = Pete costs = 200

35654423

35654427

35654424

...

activity= register requesttime = 30-12-2010:11.32 resource = Mike costs = 50

activity= pay compensationtime = 08-01-2011:12.05resource = Ellen costs = 200

35654483

35654489

35654485

...

...

1

2

...

activity= register requesttime = 30-12-2010:11.32 resource = Pete costs = 50

activity= pay compensationtime = 15-01-2011:10.45resource = Ellen costs = 200

35654521

35654533

35654522 ...

3

... ... ...

Page 8: Process mining chapter_04_getting_the_data

Standard transactional life-cycle model

PAGE 7

schedule assign reassignsuspend

resumestart

autoskipmanualskip complete

abort_case

abort_activity

withdraw

successful termination

unsuccessful termination

Page 9: Process mining chapter_04_getting_the_data

Five activity instances

PAGE 8

schedule assign start complete

schedule assign reassign start suspend resume complete

a:

b:start suspend resume

c:suspend abort_activity

d:

e:

start complete

complete

schedule assign reassignsuspend

resumestart

autoskipmanualskip complete

abort_case

abort_activity

withdraw

successful termination

unsuccessful termination

Page 10: Process mining chapter_04_getting_the_data

Overlapping activity instances

PAGE 9

completea:

start start complete completea:

start start complete

5 hours

6 hours 2 hours

9 hours

schedule assign reassignsuspend

resumestart

autoskipmanualskip complete

abort_case

abort_activity

withdraw

successful termination

unsuccessful termination

Page 11: Process mining chapter_04_getting_the_data

Using attributes

PAGE 10

astart register

request

bexamine

thoroughly

cexamine casually

d

check ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

c1

c2

c3

c4

c5

A

A

A

A

A

E

M

M

Pete

Mike

Ellen

Sue

Sean

Sara

social network showing how work flows from one person to another

Activity bFrequency: 456Waiting time: 15.6 +/- 2.5 hoursService time: 1.2 +/- 0.5 hoursCosts: 412 +/- 55 euros

performance indicators per activity

Activity gFrequency: 311Waiting time: 12.4 +/- 2.1 hoursService time: 0.5 +/- 0.2 hoursCosts: 198 +/- 35 euros

Activity hFrequency: 407Waiting time: 7.4 +/- 1.8 hoursService time: 1.1 +/- 0.3 hoursCosts: 209 +/- 38 euroscontrol flow

role

Page 12: Process mining chapter_04_getting_the_data

XES (eXtensible Event Stream)

• See www.xes-standard.org.• Adopted by the IEEE Task Force on Process Mining.• Predecessor: MXML and SA-MXML.• The format is supported by tools such as ProM (as of

version 6), Nitro, XESame, and OpenXES.• ProMimport supports MXML.

PAGE 11

Page 13: Process mining chapter_04_getting_the_data

PAGE 12

Page 14: Process mining chapter_04_getting_the_data

PAGE 13

XES (compatible with MXML)

Event log consists of:• traces (process instances)− events

• Standard extensions:• concept (for naming)• lifecycle (for transactional properties)• org (for the organizational perspective)• time (for timestamps)• semantic (for ontology references)

Page 15: Process mining chapter_04_getting_the_data

PAGE 14

extensions loaded

every trace has a name

every event has a name and a transition

classifier = name + transitionstart of trace (i.e. process instance)

name of trace

name of event (activity name)

resource

transition

timestamp

Page 16: Process mining chapter_04_getting_the_data

PAGE 15PAGE 15

start of trace

name of trace

name of event (activity name)

resource

data associated to event

timestamp

end of trace (i.e. process instance)

Page 17: Process mining chapter_04_getting_the_data

Challenges when extracting event logs

• Correlation: Events in an event log are grouped per case. This simple requirement can be quite challenging as it requires event correlation, i.e., events need to be related to each other.

• Timestamps: Events need to be ordered per case. Typical problems: only dates, different clocks, delayed logging.

• Snapshots. Cases may have a lifetime extending beyond the recorded period, e.g., a case was started before the beginning of the event log.

• Scoping. How to decide which tables to incorporate?• Granularity: the events in the event log are at a different

level of granularity than the activities relevant for end users. PAGE 16

Page 18: Process mining chapter_04_getting_the_data

Flattening reality into event logs

PAGE 17

Order

Customer : CustID

Amount : Euro

Created : DateTime

Paid : DateTime

Completed : DateTime

Orderline

Product : ProdID

NofItems : PosInt

TotalWeight : Weight

Entered : DateTime

BackOrdered : DateTime

Secured : DateTime

Delivery

DelAddress : Address

Contact : PhoneNo

Attempt

When : DateTime

Successful : Bool

1

1..*

0..* 1

0..1

1..*OrderID : OrderID OrderID : OrderID

OrderLineID : OrderLineID

DelID : DelID

DelID : DelID

DelID : DelID

Page 19: Process mining chapter_04_getting_the_data

Tables

PAGE 18

Order

Customer : CustID

Amount : Euro

Created : DateTime

Paid : DateTime

Completed : DateTime

Orderline

Product : ProdID

NofItems : PosInt

TotalWeight : Weight

Entered : DateTime

BackOrdered : DateTime

Secured : DateTime

Delivery

DelAddress : Address

Contact : PhoneNo

Attempt

When : DateTime

Successful : Bool

1

1..*

0..* 1

0..1

1..*OrderID : OrderID OrderID : OrderID

OrderLineID : OrderLineID

DelID : DelID

DelID : DelID

DelID : DelID

Page 20: Process mining chapter_04_getting_the_data

Order instance

PAGE 19

Case id: 91245Activity: create order

Timestamp: 28-11-2011:08.12Customer: John

Amount: 100

Order:91245

Case id: 91245Activity: pay order

Timestamp: 02-12-2011:13.45Customer: John

Amount: 100

Case id: 91245Activity: complete order

Timestamp: 05-12-2011:11.33Customer: John

Amount: 100

OrderLine:112345

Case id: 91245Activity: enter order line

Timestamp: 28-11-2011:08.13OrderLineID: 112345Product: iPhone 4G

NofItems: 1TotalWeight: 0.250

DellID: 882345

Case id: 91245Activity: secure order line

Timestamp: 28-11-2011:08.55OrderLineID: 112345Product: iPhone 4G

NofItems: 1TotalWeight: 0.250

DellID: 882345

Delivery:882345

Attempt:882345-1

Case id: 91245Activity: delivery attempt

Timestamp: 05-12-2011:08.55DellID: 882345

Successful: falseDelAddress: 5513VJ-22aContact: 0497-2553660

Attempt:882345-2

Case id: 91245Activity: delivery attempt

Timestamp: 06-12-2011:09.12DellID: 882345

Successful: falseDelAddress: 5513VJ-22aContact: 0497-2553660

Attempt:882345-3

Case id: 91245Activity: delivery attempt

Timestamp: 07-12-2011:08.56DellID: 882345Successful: true

DelAddress: 5513VJ-22aContact: 0497-2553660

OrderLine:112346

Case id: 91245Activity: enter order line

Timestamp: 28-11-2011:08.14OrderLineID: 112346Product: iPod nano

NofItems: 2TotalWeight: 0.300

DellID: 882346

Case id: 91245Activity: create backorder

Timestamp: 28-11-2011:08.55OrderLineID: 112346Product: iPod nano

NofItems: 2TotalWeight: 0.300

DellID: 882346

Case id: 91245Activity: secure order line

Timestamp: 30-11-2011:09.06OrderLineID: 112346Product: iPod nano

NofItems: 2TotalWeight: 0.300

DellID: 882346

OrderLine:112347

Case id: 91245Activity: enter order line

Timestamp: 28-11-2011:08.15OrderLineID: 112347Product: iPod classic

NofItems: 1TotalWeight: 0.200

DellID: 882345

Case id: 91245Activity: secure order line

Timestamp: 29-11-2011:10.06OrderLineID: 112347Product: iPod classic

NofItems: 1TotalWeight: 0.200

DellID: 882345

Delivery:882346

Attempt:882346-1

Case id: 91245Activity: delivery attempt

Timestamp: 05-12-2011:08.43DellID: 882346Successful: true

DelAddress: 5513XG-45Contact: 040-2298761

Order

Customer : CustID

Amount : Euro

Created : DateTime

Paid : DateTime

Completed : DateTime

Orderline

Product : ProdID

NofItems : PosInt

TotalWeight : Weight

Entered : DateTime

BackOrdered : DateTime

Secured : DateTime

Delivery

DelAddress : Address

Contact : PhoneNo

Attempt

When : DateTime

Successful : Bool

1

1..*

0..* 1

0..1

1..*OrderID : OrderID OrderID : OrderID

OrderLineID : OrderLineID

DelID : DelID

DelID : DelID

DelID : DelID

Page 21: Process mining chapter_04_getting_the_data

PAGE 20

Case id: 91245Activity: create order

Timestamp: 28-11-2011:08.12Customer: John

Amount: 100

Order:91245

Case id: 91245Activity: pay order

Timestamp: 02-12-2011:13.45Customer: John

Amount: 100

Case id: 91245Activity: complete order

Timestamp: 05-12-2011:11.33Customer: John

Amount: 100

OrderLine:112345

Case id: 91245Activity: enter order line

Timestamp: 28-11-2011:08.13OrderLineID: 112345Product: iPhone 4G

NofItems: 1TotalWeight: 0.250

DellID: 882345

Case id: 91245Activity: secure order line

Timestamp: 28-11-2011:08.55OrderLineID: 112345Product: iPhone 4G

NofItems: 1TotalWeight: 0.250

DellID: 882345

Delivery:882345

Attempt:882345-1

Case id: 91245Activity: delivery attempt

Timestamp: 05-12-2011:08.55DellID: 882345

Successful: falseDelAddress: 5513VJ-22aContact: 0497-2553660

Attempt:882345-2

Case id: 91245Activity: delivery attempt

Timestamp: 06-12-2011:09.12DellID: 882345

Successful: falseDelAddress: 5513VJ-22aContact: 0497-2553660

Attempt:882345-3

Case id: 91245Activity: delivery attempt

Timestamp: 07-12-2011:08.56DellID: 882345Successful: true

DelAddress: 5513VJ-22aContact: 0497-2553660

OrderLine:112346

Case id: 91245Activity: enter order line

Timestamp: 28-11-2011:08.14OrderLineID: 112346Product: iPod nano

NofItems: 2TotalWeight: 0.300

DellID: 882346

Case id: 91245Activity: create backorder

Timestamp: 28-11-2011:08.55OrderLineID: 112346Product: iPod nano

NofItems: 2TotalWeight: 0.300

DellID: 882346

Case id: 91245Activity: secure order line

Timestamp: 30-11-2011:09.06OrderLineID: 112346Product: iPod nano

NofItems: 2TotalWeight: 0.300

DellID: 882346

OrderLine:112347

Case id: 91245Activity: enter order line

Timestamp: 28-11-2011:08.15OrderLineID: 112347Product: iPod classic

NofItems: 1TotalWeight: 0.200

DellID: 882345

Case id: 91245Activity: secure order line

Timestamp: 29-11-2011:10.06OrderLineID: 112347Product: iPod classic

NofItems: 1TotalWeight: 0.200

DellID: 882345

Delivery:882346

Attempt:882346-1

Case id: 91245Activity: delivery attempt

Timestamp: 05-12-2011:08.43DellID: 882346Successful: true

DelAddress: 5513XG-45Contact: 040-2298761

Page 22: Process mining chapter_04_getting_the_data

Orderline instance

PAGE 21

Case id: 112345Activity: create order

Timestamp: 28-11-2011:08.12Customer: John

Amount: 100

Order:91245

Case id: 112345Activity: pay order

Timestamp: 02-12-2011:13.45Customer: John

Amount: 100

Case id: 112345Activity: complete order

Timestamp: 05-12-2011:11.33Customer: John

Amount: 100

OrderLine:112345

Case id: 112345Activity: enter order line

Timestamp: 28-11-2011:08.13OrderLineID: 112345Product: iPhone 4G

NofItems: 1TotalWeight: 0.250

DellID: 882345

Case id: 112345Activity: secure order line

Timestamp: 28-11-2011:08.55OrderLineID: 112345Product: iPhone 4G

NofItems: 1TotalWeight: 0.250

DellID: 882345

Delivery:882345

Attempt:882345-1

Case id: 112345Activity: delivery attempt

Timestamp: 05-12-2011:08.55DellID: 882345

Successful: falseDelAddress: 5513VJ-22aContact: 0497-2553660

Attempt:882345-2

Case id: 112345Activity: delivery attempt

Timestamp: 06-12-2011:09.12DellID: 882345

Successful: falseDelAddress: 5513VJ-22aContact: 0497-2553660

Attempt:882345-3

Case id: 112345Activity: delivery attempt

Timestamp: 07-12-2011:08.56DellID: 882345Successful: true

DelAddress: 5513VJ-22aContact: 0497-2553660

Order

Customer : CustID

Amount : Euro

Created : DateTime

Paid : DateTime

Completed : DateTime

Orderline

Product : ProdID

NofItems : PosInt

TotalWeight : Weight

Entered : DateTime

BackOrdered : DateTime

Secured : DateTime

Delivery

DelAddress : Address

Contact : PhoneNo

Attempt

When : DateTime

Successful : Bool

1

1..*

0..* 1

0..1

1..*OrderID : OrderID OrderID : OrderID

OrderLineID : OrderLineID

DelID : DelID

DelID : DelID

DelID : DelID

Page 23: Process mining chapter_04_getting_the_data

Other examples

• The life cycles of reviewers, authors, papers, reviews, PC chairs, etc.

• The life cycles of job applications and vacancies.• X-ray machine logs: machine, machine day, patient,

treatment, routine, etc.?

• Therefore, the selection and scoping of instances is needed.

• Like making deciding on the elements to be put on map; there may be many maps covering partially overlapping areas.

PAGE 22

Page 24: Process mining chapter_04_getting_the_data

Extracting event logs

• Not just a syntactical issue.• Different views are possible.• Important:

− Selecting the right instance notion.− Ordering of events.− Selection of events.

• Proclets: the true fabric of real-life processes.

PAGE 23