process mining for erp systems

Process Mining for ERP Systems Erik Nooijen, Boudewijn v. Dongen, Dirk Fahland

Upload: dirk-fahland

Post on 01-Dec-2014


DESCRIPTION

Presentation held at the 1st Workshop for Data- and Artifact-Centric Processes, co-located with BPM 2012, September 2012.

TRANSCRIPT

Page 1: Process Mining for ERP Systems

Process Mining for ERP Systems

Erik Nooijen,

Boudewijn v. Dongen, Dirk Fahland

Page 2: Process Mining for ERP Systems

Process Discovery

event log → process discovery algorithm → process model

c1: A B C D E
c2: A C B D E
c3: A F D E

assumptions
• case = sequence of events of this case
• cases are isolated: event A in c1 happens only in c1 (and not in c2)
• cases of the same process
• one unique case id
• each event associated to exactly one case id
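The slide does not name a specific discovery algorithm. As a minimal sketch of what such an algorithm consumes, the following computes the directly-follows relation of the three example traces, which many discovery algorithms use as a first step. The function name `directly_follows` is an assumption made for illustration.

```python
from collections import Counter

# The three example traces from the slide, keyed by case id.
log = {
    "c1": ["A", "B", "C", "D", "E"],
    "c2": ["A", "C", "B", "D", "E"],
    "c3": ["A", "F", "D", "E"],
}

def directly_follows(log):
    """Count how often activity x is directly followed by y within one case."""
    counts = Counter()
    for trace in log.values():
        # Pairs never span two cases, because cases are assumed to be isolated.
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += 1
    return counts

dfg = directly_follows(log)
for (x, y), n in sorted(dfg.items()):
    print(f"{x} -> {y}: {n}")
```

The computation relies on exactly the assumptions listed above: every event belongs to one case, and the order of events inside a case is known.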

Page 3: Process Mining for ERP Systems

Typical Process in an ERP System

Build to Order

[figure: the customers Alice and Bob order products X and Y from the Manufacturer; the Manufacturer orders Materials A, B, and C from the suppliers ACME Inc. and Mega Corp.]

Page 4: Process Mining for ERP Systems

n-to-m relations database

ProductOrder
poID  cust.  …  created      processed    built        shipped
po1   Alice  …  30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
po2   Bob    …  30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

OrderedMaterial
poID  moID  type  added
po1   mo3   B     30-08 13:13
po1   mo4   A     30-08 13:14
po2   mo3   B     30-08 13:15
po2   mo4   C     30-08 13:16

MaterialOrder
moID  suppl.  …  completed    sent         received
mo3   ACME    …  30-08 13:15  30-08 14:15  01-09 9:05
mo4   MEGA    …  30-08 13:17  30-08 16:12  01-09 10:13

Customer
cust.  address  …
Alice  …        …
Bob    …        …

annotations in the figure: id attributes, time-stamp attributes, relations, data attributes

process discovery algorithm → process model

Page 5: Process Mining for ERP Systems

Process Discovery for ERP Systems

process discovery algorithm → process model

reality: data in a relational DB
• events stored as time-stamped attributes in tables
• multiple primary keys → multiple notions of case
• tables are related → one event related to multiple cases

schema
Customer: cust, …
ProductOrder: poID, cust, created, processed, built, shipped
OrderedMat.: poID, moID, type, added
MaterialOrder: moID, supplier, completed, sent, received

relation multiplicities shown in the figure: 1 to 0..*, 1 to 1..*, 1 to 1..*
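To make "multiple notions of case" concrete, here is a small sketch that groups the "added" events of the OrderedMaterial table once by poID and once by moID. The helper `group_events` is a hypothetical name; the rows are the ones from the example database.

```python
from collections import defaultdict

# Rows of the OrderedMaterial table: (poID, moID, type, added).
ordered_material = [
    ("po1", "mo3", "B", "30-08 13:13"),
    ("po1", "mo4", "A", "30-08 13:14"),
    ("po2", "mo3", "B", "30-08 13:15"),
    ("po2", "mo4", "C", "30-08 13:16"),
]

def group_events(rows, key_index):
    """Group the 'added' events by the id column found at key_index."""
    cases = defaultdict(list)
    for row in rows:
        cases[row[key_index]].append(("added", row[3]))
    return dict(cases)

by_po = group_events(ordered_material, 0)  # case notion: product order
by_mo = group_events(ordered_material, 1)  # case notion: material order

print(by_po["po1"])  # events of product order po1
print(by_mo["mo3"])  # events of material order mo3
```

The event added at 30-08 13:13 shows up both in case po1 and in case mo3, so the classical assumption that each event belongs to exactly one case no longer holds.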


Page 7: Process Mining for ERP Systems

Outline

• decompose by primary keys: log for PO, log for MO
• discovery: model for PO, model for MO
• process model: models related by primary/foreign-key relations

Page 8: Process Mining for ERP Systems

Find Artifact Schemas

[figure: outline diagram (decompose by primary keys; logs and models for PO and MO; related by primary/foreign-key relations)]

Page 9: Process Mining for ERP Systems

Step 0: discover database schema

documented schema vs. actual schema

identify
• column types (esp. time-stamped columns)
• primary keys
• foreign keys

various (non-trivial) techniques available
key discovery is NP-complete in the size of the table(s)

result: [figure: discovered schema]
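The slides do not say which key discovery techniques the prototype uses. The following is only a brute-force sketch under that caveat: a column combination is a candidate primary key if its values are unique, and a column is a candidate foreign key if all its values occur in the referenced column. The names `candidate_keys` and `is_foreign_key` and the `max_size` parameter are assumptions.

```python
from itertools import combinations

def candidate_keys(rows, columns, max_size=2):
    """Return minimal column combinations whose values are unique over all rows."""
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(columns, size):
            # Skip supersets of a key that was already found (not minimal).
            if any(set(k) <= set(combo) for k in keys):
                continue
            values = [tuple(row[c] for c in combo) for row in rows]
            if len(set(values)) == len(values):
                keys.append(combo)
    return keys

def is_foreign_key(child_rows, child_col, parent_rows, parent_col):
    """Inclusion check: every child value must occur in the parent column."""
    parent_values = {row[parent_col] for row in parent_rows}
    return all(row[child_col] in parent_values for row in child_rows)

# Example rows taken from the ProductOrder and OrderedMaterial tables.
product_order = [
    {"poID": "po1", "cust": "Alice"},
    {"poID": "po2", "cust": "Bob"},
]
ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B"},
    {"poID": "po1", "moID": "mo4", "type": "A"},
    {"poID": "po2", "moID": "mo3", "type": "B"},
    {"poID": "po2", "moID": "mo4", "type": "C"},
]

om_keys = candidate_keys(ordered_material, ["poID", "moID", "type"])
po_keys = candidate_keys(product_order, ["poID", "cust"])
fk_ok = is_foreign_key(ordered_material, "poID", product_order, "poID")
print(om_keys, po_keys, fk_ok)
```

On this tiny sample the check also reports (poID, type) and cust as keys, which is an artifact of having only a few rows. Purely data-driven discovery can produce such spurious keys, and the number of column combinations to test grows quickly with table width.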

Page 10: Process Mining for ERP Systems

Step 1: decompose schema into processes
= schema summarization

find:
1. sets of corresponding tables
2. links between those

[figure: two groups of tables, ProductOrder and MaterialOrder]

Page 11: Process Mining for ERP Systems

Automatic Schema Summarization
= group similar tables through clustering

define a distance between any 2 tables
• by relations
• by information content

tables that are close to each other → same cluster

# of clusters: user input

Page 12: Process Mining for ERP Systems

Automatic Schema Summarization

1. structural distance between tables

fanout ~ avg. # of child records related to the same parent record

example: parent table with column A and records 1, 2

child table (A, B)              fanout
(1, X), (2, Y)                  1
(1, X), (1, Y), (2, Z), (2, U)  2
(1, X), (1, Y)                  1 = (2+0)/2

Page 13: Process Mining for ERP Systems

Automatic Schema Summarization

1. structural distance between tables

fanout ~ avg. # of child records related to the same parent record

matched fraction ~ 1 / (fraction of records in parent with matching child record)

example: parent table with column A and records 1, 2

child table (A, B)              fanout  m.fr
(1, X), (2, Y)                  1       1
(1, X), (1, Y), (2, Z), (2, U)  2       1
(1, X), (1, Y)                  1       2 = 1 / (1/2)
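The two structural measures can be checked against the slide's example with a few lines of code. This is a sketch that follows the definitions as stated on the slide; the function names are assumptions, and how the prototype turns these values into a distance is not shown in the slides.

```python
from collections import Counter

def fanout(parent_keys, child_refs):
    """Average number of child records related to the same parent record."""
    per_parent = Counter(child_refs)
    return sum(per_parent[k] for k in parent_keys) / len(parent_keys)

def matched_fraction(parent_keys, child_refs):
    """1 / (fraction of parent records that have a matching child record).

    Assumes at least one parent record has a matching child record.
    """
    refs = set(child_refs)
    matched = sum(1 for k in parent_keys if k in refs)
    return len(parent_keys) / matched

parent = [1, 2]             # column A of the parent table
child_one_each = [1, 2]     # child rows (1, X), (2, Y)
child_two_each = [1, 1, 2, 2]  # child rows (1, X), (1, Y), (2, Z), (2, U)
child_only_first = [1, 1]   # child rows (1, X), (1, Y)

for refs in (child_one_each, child_two_each, child_only_first):
    print(fanout(parent, refs), matched_fraction(parent, refs))
```

For the third child table, parent record 1 has two children and parent record 2 has none, which gives the fanout (2+0)/2 = 1 and the matched fraction 1/(1/2) = 2 from the slide.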

Page 14: Process Mining for ERP Systems

Grouping by Clustering

1. structural distance
2. information distance
   importance of each table = entropy (is maximal if all records are different)
   distance: 2 tables with high entropies → large distance
3. weighted distance by structure + information
4. k-means clustering: k clusters based on weighted distance

most important table of cluster = table with least distance to all other tables
→ key attribute of the cluster
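The slides give no exact formulas for the information distance or the weighting, so the sketch below covers only the two parts that are stated: an entropy that is maximal when all records differ, and the choice of the most important table as the one with the least distance to all others. Shannon entropy over whole records is an assumption, and the distance values in the example are hypothetical.

```python
import math
from collections import Counter

def table_entropy(rows):
    """Shannon entropy (in bits) of the distribution of distinct records."""
    counts = Counter(rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def most_important_table(tables, distance):
    """Table with the least summed distance to all other tables of the cluster."""
    return min(tables, key=lambda t: sum(distance[t][u] for u in tables if u != t))

all_different = [(1, "X"), (2, "Y"), (3, "Z"), (4, "U")]
with_repeats = [(1, "X"), (1, "X"), (2, "Y"), (2, "Y")]
print(table_entropy(all_different))  # log2(4) for four distinct records
print(table_entropy(with_repeats))   # lower, because records repeat

# Hypothetical pairwise distances inside one cluster (not from the slides).
distance = {
    "ProductOrder": {"OrderedMaterial": 1.0, "Customer": 2.0},
    "OrderedMaterial": {"ProductOrder": 1.0, "Customer": 4.0},
    "Customer": {"ProductOrder": 2.0, "OrderedMaterial": 4.0},
}
center = most_important_table(list(distance), distance)
print(center)
```

With these made-up distances the ProductOrder table has the smallest total distance, so its primary key would serve as the key attribute of the cluster.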

Page 15: Process Mining for ERP Systems

Artifact Schema → Artifact Log

[figure: outline diagram (decompose by primary keys; logs and models for PO and MO; related by primary/foreign-key relations)]

Page 16: Process Mining for ERP Systems

Log Extraction: log for PO

cluster = set of related tables
+ primary key of most important table → case id

OrderedMaterial
poID  moID  type  added
po1   mo3   B     30-08 13:13
po1   mo4   A     30-08 13:14
po2   mo3   B     30-08 13:15
po2   mo4   C     30-08 13:16

ProductOrder
poID  cust.  …  created      processed    built        shipped
po1   Alice  …  30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
po2   Bob    …  30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

case ids:
po1:
po2:




Page 20: Process Mining for ERP Systems

Log Extraction: log for PO

cluster = set of related tables
+ primary key of most important table → case id
time-stamped attribute → event
related attributes → event attributes

OrderedMaterial
poID  moID  type  added
po1   mo3   B     30-08 13:13
po1   mo4   A     30-08 13:14
po2   mo3   B     30-08 13:15
po2   mo4   C     30-08 13:16

ProductOrder
poID  cust.  …  created      processed    built        shipped
po1   Alice  …  30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
po2   Bob    …  30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

po1:
(created, poID=po1, time=30-08 9:22, cust.=Alice, …)
(processed, poID=po1, time=30-08 13:12, …)
(added, poID=po1, time=30-08 13:13, moID=mo3, …)
    moID refers to artifact “MaterialOrder”
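The extraction rules can be sketched on the two example tables. This is a simplified illustration rather than the prototype's implementation: the function names are assumptions, the tables are plain Python dictionaries instead of database queries, and the time stamps are ordered by (month, day, hour, minute) because the slides give no year.

```python
from collections import defaultdict

product_order = [
    {"poID": "po1", "cust": "Alice", "created": "30-08 9:22",
     "processed": "30-08 13:12", "built": "01-09 15:12", "shipped": "03-09 10:15"},
    {"poID": "po2", "cust": "Bob", "created": "30-08 10:15",
     "processed": "30-08 13:14", "built": "01-09 16:13", "shipped": "03-09 17:18"},
]
ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]

def sort_key(stamp):
    """Turn 'DD-MM H:MM' into (month, day, hour, minute) for ordering."""
    date, clock = stamp.split()
    day, month = date.split("-")
    hour, minute = clock.split(":")
    return (int(month), int(day), int(hour), int(minute))

def extract_events(rows, time_columns):
    """One event per time-stamped attribute; other columns become event attributes."""
    events = []
    for row in rows:
        attrs = {k: v for k, v in row.items() if k not in time_columns}
        for name in time_columns:
            events.append({"activity": name, "time": row[name], **attrs})
    return events

def build_log(events, case_column):
    """Group events by case id and order each trace by time stamp."""
    log = defaultdict(list)
    for event in events:
        log[event[case_column]].append(event)
    for trace in log.values():
        trace.sort(key=lambda e: sort_key(e["time"]))
    return dict(log)

events = (extract_events(product_order, ["created", "processed", "built", "shipped"])
          + extract_events(ordered_material, ["added"]))
log = build_log(events, "poID")
for event in log["po1"]:
    print(event)
```

The "added" events keep their moID attribute, which is the reference to the MaterialOrder artifact mentioned on the slide.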

Page 21: Process Mining for ERP Systems

Outline

• decompose by primary keys: log for quote, log for order
• discovery: model for order, model for quote
• process model: compose by primary/foreign-key relations

Page 22: Process Mining for ERP Systems

Resulting Model(s)

Product Order: create, processed, added, built, shipped
Material Order: added, completed, sent, received

relation between the two models: 1..* to 1..*
(added, poID=po1, …, moID=mo3)

Page 23: Process Mining for ERP Systems

Implementation & Evaluation

prototype tool
• input: relational database (via JDBC), .csv tables
• steps
  − discover database schema (types, keys, relations)
  − discover artifact schema
    − by k-means clustering
    − by user picking tables
  − extract logs → ProM

Page 24: Process Mining for ERP Systems

Evaluation: SAP System of Sligro

> 300 tables, > 40 GiB of data

schema extraction
• time-stamp attributes: 15 hrs
• primary keys: 4 hrs
• foreign keys: 5 hrs (single col.) / 6 days (double col.)

clustering
• entropies: 17 hrs
• table distances: 5 hrs
• clustering: a few seconds
• ~20 different artifacts found
• largest: 47 tables, 869 columns

log extraction
• extract 1000 traces of > 246,000 events
• query database: 1 hr
• write log file: 32 hrs

Page 25: Process Mining for ERP Systems


Sligro: Artikel lifecycle model

Page 26: Process Mining for ERP Systems

Open issues

performance
• key discovery: NP-complete in R (# of records)
• foreign key discovery: NP-complete in R^2
• problem is in the “hard part” of NP
• sampling of data, domain knowledge, semi-automatic

requires good database structure
• proper relations, proper keys
• otherwise wrong clusters are formed
• events don’t get right attributes
• semi-automatic approach

events shared by multiple cases… working on it…
