
The Application Study of ERP Data Quality Assessment and Improvement Methodology

Zhao Xiaosong He Zhen Zhang Meng Yu Dainuan Zhang Ting

Department of Industry Engineering, Tianjin University, Tianjin 300072 China

Abstract-The problem of ERP data quality is studied, and a model for ERP data quality assessment and improvement is established. By measuring ERP data quality, its problems can be detected and improved effectively. Finally, the ERP data quality assessment and improvement methodology is verified by a case study.

Keywords-ERP, data quality, assessment and improvement, fuzzy assessment

I. INTRODUCTION

As theoretical researchers and practitioners pay increasing attention to ERP (Enterprise Resource Planning) data quality, many scholars are continuously developing and refining ERP management concepts and structures to fit the actual conditions of various industries and regions. Because ERP systems carry massive amounts of data, guaranteeing ERP data quality has become an important issue for researchers and enterprises alike.

Hongjiang Xu, Jeretta Horn Nord et al. [1] studied the importance of data quality when implementing ERP; Yu Jinlong [2] established a method based on the principles of ERP data management; Chen Yuan et al. [3] studied data quality in information systems and analyzed the causes of data quality problems; and Liu Xia et al. [4] studied the data in PDM systems and proposed a planning method for data acquisition quality and a plan for data quality assurance.

Based on these accumulated research efforts, several problems remain to be solved: the lack of systematic research on ERP data quality, the lack of a system of data quality evaluation methods tailored to ERP systems, and the lack of specific guidance for improving the quality of ERP data as a whole.

In this paper, a method for resolving data quality problems specific to ERP systems is proposed by combining IP-MAP, and an integrated ERP data quality management system is established that covers quality assessment, quality improvement, and quality assurance.

II. MODEL BUILDING

To reach a higher ERP data quality level, data quality should be evaluated both before and after the system goes online. Only in this way can data quality be kept consistent between the data source and the transport process.

A. Assessment Model of Data Quality before ERP Online

Data prepared before go-live serve the implementation of the ERP system, and their attributes are determined by the needs of each ERP module. Static data and initial data can therefore be described by four major elements: accuracy, completeness, uniqueness, and consistency. The following assessment model is built from these attributes of pre-go-live data quality:

1. Suppose a tetrad $H = \langle B, D, Q, G \rangle$ [5], in which:

(1) $B$ is the set of functional modules the system will implement, denoted $B = \{B_1, B_2, \ldots, B_n\}$, where $B_i$ ($i = 1, 2, \ldots, n$) is the corresponding module.

(2) $D$ is the set of datasets prepared by each department before go-live, denoted $D = \{D_1, D_2, \ldots, D_m\}$, where $D_k$ ($k = 1, 2, \ldots, m$) is the corresponding dataset, with $D_t \cap D_s = \varnothing$ for $t \neq s$ and $D_1 \cup D_2 \cup \cdots \cup D_m = D$.

All departments collect their pre-go-live data with tool software such as MS Excel, Access, and SQL Server. This makes it possible to confirm that no two sub-datasets of $D$ intersect, and to identify from these sets the sub-datasets that support each module.

(3) $Q$ is the set of ERP data quality elements, denoted $Q = \{Q_1, Q_2, Q_3, Q_4\}$, where $Q_1$ is accuracy, $Q_2$ is completeness, $Q_3$ is uniqueness, and $Q_4$ is consistency.

(4) $G$ is the set of rules made for the quality situation of each quality element, denoted $G = \{G_{i,k,j,g}\}$. $G_{i,k,j,g}$ denotes rule $G_g$, made for quality element $Q_j$ of dataset $D_k$, which supports module $B_i$; it is written $G_{i,k,j,g} = G_g(B_i, D_k, Q_j)$, $g = 1, 2, \ldots$.

Suppose that an enterprise is collecting and arranging data in preparation for implementing ERP, and $D_1$ is a sub-dataset prepared for module $B_1$. Two rules are made for consistency $Q_4$: the length of the value of field x must be greater than 3 and at most 20, and the value of field z must lie within the value range Z. This gives the rule set

$$G = \{G_1(B_1, D_1, Q_4),\; G_2(B_1, D_1, Q_4)\}.$$

$WG(G_{i,k,j,g})$ is the weight assigned to rule $G_{i,k,j,g}$. Rules differ in importance because each applies to a different module and dataset. For example, a rule made for the field “name” weighs more than a rule made for the field “sex”, because “name” characterizes a record better than “sex” does. The weights are given by experts or project members.

978-1-4244-1718-6/08/$25.00 ©2008 IEEE Pg 1036


$M(G_{i,k,j,g})$ is the assessment result of rule $G_{i,k,j,g}$. It can be obtained by a program, by calculation, or by expert judgment, and it is the prerequisite for calculating the current quality level.

(5) $RQB(B_i, D_k, Q_j)$ is the DQ (data quality) quantification level that module $B_i$ must achieve on quality element $Q_j$ in dataset $D_k$. $RQ(D_k, Q_j)$ is the quantification level that dataset $D_k$ must achieve on quality element $Q_j$; the assessment is made against it.

(6) $WQ(B_i, D_k, Q_j)$ is the weight assigned to $RQB(B_i, D_k, Q_j)$ (and also to $CQB(B_i, D_k, Q_j)$), that is, the influence of quality element $Q_j$ of dataset $D_k$ on module $B_i$; it can be given by experts or project members.

(7) $CQB(B_i, D_k, Q_j)$ is the DQ quantification level that module $B_i$ currently achieves on quality element $Q_j$ in dataset $D_k$, and $CQ(D_k, Q_j)$ is the quantification level that dataset $D_k$ currently achieves on quality element $Q_j$.
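Item (2) requires the departmental datasets to partition $D$: no two of them may share a record, and together they must make up $D$. The Python sketch below illustrates that check under the assumption that every record can be identified by a unique key; the function name `partition_problems` and the record keys are hypothetical and not taken from the paper.

```python
from itertools import combinations


def partition_problems(subsets, whole):
    """Report violations of D_t ∩ D_s = ∅ (t != s) and D_1 ∪ ... ∪ D_m = D."""
    problems = []
    # Pairwise disjointness: no record key may appear in two sub-datasets.
    for (t, d_t), (s, d_s) in combinations(subsets.items(), 2):
        shared = d_t & d_s
        if shared:
            problems.append(f"{t} and {s} share {sorted(shared)}")
    # Coverage: together the sub-datasets must make up D exactly.
    union = set().union(*subsets.values())
    if union != whole:
        problems.append(
            f"union differs from D: missing {sorted(whole - union)}, "
            f"extra {sorted(union - whole)}"
        )
    return problems


# Hypothetical record keys for D and for each department's sub-dataset.
D = {"M001", "M002", "F001", "P001", "P002"}
datasets = {
    "D1": {"M001", "M002"},
    "D2": {"F001"},
    "D3": {"P001", "P002", "M002"},  # M002 is also collected in D1
}
print(partition_problems(datasets, D))
```

An empty list means the sub-datasets satisfy both conditions of item (2).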

2. Calculate $RQ(D_k, Q_j)$ and $CQ(D_k, Q_j)$

$RQ(D_k, Q_j)$ can be calculated once $B$, $D$, and $Q$ are confirmed. Usually, $RQB(B_i, D_k, Q_j)$ is obtained by experts through a demand analysis of the data users. Because one quality aspect of a dataset may correspond to many functional modules, $RQ(D_k, Q_j)$ can be calculated only when the required levels on $(D_k, Q_j)$ are known for all modules:

$$RQ(D_k, Q_j) = \frac{\sum_{B_i \in B} WQ(B_i, D_k, Q_j) \cdot RQB(B_i, D_k, Q_j)}{\sum_{B_i \in B} WQ(B_i, D_k, Q_j)} \qquad (1)$$

A method is then needed to calculate the current quality $CQ(D_k, Q_j)$ and judge whether the current DQ is acceptable.

Every rule in rule set $G$ can be implemented programmatically (for example, as an SQL statement or a VB or VC program segment). For the rule set $G = \{G_1(B_1, D_1, Q_4), G_2(B_1, D_1, Q_4)\}$, the two rules are calculated as follows, where $|D_k|$ is the number of records in $D_k$:

$$M(G_{1,1,4,1}) = \frac{\text{select count}(x) \text{ from } D_k \text{ where } \mathrm{len}(x) \le 3 \text{ or } \mathrm{len}(x) > 20}{|D_k|}$$

$$M(G_{1,1,4,2}) = \frac{\text{select count}(z) \text{ from } D_k \text{ where } z \text{ not in } Z}{|D_k|}$$
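Because the rules are stated as SQL queries, they can be tried out directly. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory table; the table `d1`, its contents, the helper `rule_ratio`, and the value range `Z` are hypothetical. SQLite spells the paper's `len` as `length`. As written, each $M$ is the share of records that violate the rule; if $M$ is meant to be read on the same higher-is-better scale as $RQ$, one would presumably use $1 - M$, which the paper does not state.

```python
import sqlite3

# Hypothetical in-memory stand-in for dataset D_1 with the fields x and z.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d1 (x TEXT, z TEXT)")
conn.executemany(
    "INSERT INTO d1 (x, z) VALUES (?, ?)",
    [("ABCD", "A"), ("AB", "B"), ("A" * 25, "Q"), ("ABCDE", "C")],
)
Z = ("A", "B", "C")  # hypothetical value range Z for field z


def rule_ratio(sql, params=()):
    """M(G) = (rows matched by the rule query) / |D_1|."""
    matched = conn.execute(sql, params).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM d1").fetchone()[0]
    return matched / total


# G_{1,1,4,1}: rows whose x length falls outside 3 < length(x) <= 20.
m1 = rule_ratio("SELECT COUNT(x) FROM d1 WHERE length(x) <= 3 OR length(x) > 20")

# G_{1,1,4,2}: rows whose z is not in Z (one placeholder per allowed value).
in_list = ",".join("?" * len(Z))
m2 = rule_ratio(f"SELECT COUNT(z) FROM d1 WHERE z NOT IN ({in_list})", Z)

print(m1, m2)
```

Binding the members of Z as parameters, rather than pasting them into the SQL string, keeps the rule query safe when the value range comes from user-maintained reference data.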

Assign weights $WG(G_{1,1,4,1})$ and $WG(G_{1,1,4,2})$ to the two rules respectively; in general,

$$CQB(B_i, D_k, Q_j) = \frac{\sum_{G_{i,k,j,g} \in G} WG(G_{i,k,j,g}) \cdot M(G_{i,k,j,g})}{\sum_{G_{i,k,j,g} \in G} WG(G_{i,k,j,g})} \qquad (2)$$

$CQ(D_k, Q_j)$ can then be calculated with the weights $WQ(B_i, D_k, Q_j)$:

$$CQ(D_k, Q_j) = \frac{\sum_{B_i \in B} WQ(B_i, D_k, Q_j) \cdot CQB(B_i, D_k, Q_j)}{\sum_{B_i \in B} WQ(B_i, D_k, Q_j)} \qquad (3)$$
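Equations (1) to (3) all share one form, a weighted mean $\sum w v / \sum w$. A minimal Python sketch with hypothetical weights and levels (none of these numbers come from the paper), assuming all levels are on a higher-is-better scale; `weighted_level` is an illustrative name:

```python
def weighted_level(pairs):
    """Weighted mean sum(w * v) / sum(w), the common form of Eqs. (1)-(3)."""
    total_w = sum(w for w, _ in pairs)
    if total_w == 0:
        raise ValueError("weights must not all be zero")
    return sum(w * v for w, v in pairs) / total_w


# Eq. (2): CQB(B_1, D_1, Q_4) from (WG, M) pairs of two rules (hypothetical values).
cqb_b1 = weighted_level([(0.6, 0.90), (0.4, 0.80)])

# Hypothetically, modules B_1 and B_2 both use (D_1, Q_4), with WQ = 2 and 1.
# Eq. (1): required level from (WQ, RQB) pairs.
rq = weighted_level([(2, 0.95), (1, 0.90)])
# Eq. (3): current level from (WQ, CQB) pairs; B_2's CQB of 0.70 is hypothetical.
cq = weighted_level([(2, cqb_b1), (1, 0.70)])

print(round(cqb_b1, 3), round(rq, 3), round(cq, 3), round(rq - cq, 3))
```

The last printed value is the gap $RQ - CQ$ that the model uses to decide which datasets and quality aspects need optimization.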

The datasets and quality aspects that need to be optimized can be identified by comparing $RQ(D_k, Q_j)$ with $CQ(D_k, Q_j)$. An enterprise should evaluate its initial data with this assessment model before ERP implementation to establish the initial data quality level, because high-quality elementary data before go-live are the foundation of data quality after ERP implementation.

B. Assessment Model of Data Quality after ERP Online

Once the ERP system is running, more factors influence data quality, both qualitative and quantitative. Because these factors are fuzzy, this paper uses a fuzzy assessment method to evaluate data quality after ERP goes online.

1. Build an ordered hierarchical structure

An ordered hierarchical structure is built with the analytic hierarchy process (AHP), based on the major factors that influence ERP data quality, as shown in Fig. 1.

C: ERP data quality grade of each functional module

C1: accuracy; C2: timeliness; C3: consistency; C4: security; C5: completeness

C11: data accuracy degree; C12: data qualification degree; C13: data reliability degree

C21: system module complexity; C22: system operation response rate

C31: coding rule conformity degree; C32: code update degree; C33: operation authority reasonableness

C41: system maintenance degree; C42: system inherent vulnerability; C43: system design and detection capability

C51: data missing degree; C52: data coverage degree

Fig. 1. ERP data quality level of each functional module

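2. Build the comparison judgment matrix

Pairwise comparison is carried out with a scaling method. A judgment matrix expresses the relative importance of the elements in one level with respect to a given element in the level above. Suppose element $Q_k$ in level $Q$ is related to elements $A_1, A_2, \ldots, A_n$ in the next lower level; the judgment matrix is built as follows:

$$\begin{array}{c|cccc} Q_k & A_1 & A_2 & \cdots & A_n \\ \hline A_1 & a_{11} & a_{12} & \cdots & a_{1n} \\ A_2 & a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ A_n & a_{n1} & a_{n2} & \cdots & a_{nn} \end{array}$$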


Here $a_{ij}$ is the numerical value of the importance of $A_i$ relative to $A_j$. The judgment matrix has the following properties: $a_{ij} > 0$; $a_{ij} = 1/a_{ji}$ for $i \neq j$; and $a_{ii} = 1$.

3. Calculate the eigenvalue and eigenvector

$$A W = \lambda_{\max} W \qquad (4)$$

where $\lambda_{\max}$ is the maximum eigenvalue of the judgment matrix $A$ and $W$ is the normalized eigenvector corresponding to $\lambda_{\max}$. To check the consistency of the judgment matrix, the consistency index C.I. is calculated:

$$C.I. = \frac{\lambda_{\max} - n}{n - 1} \qquad (5)$$
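Steps 2 and 3 can be sketched in plain Python with the judgment matrix of Table I. The eigenvector of Eq. (4) is found by power iteration, which converges to the principal eigenvector for a matrix with all positive entries. The sketch also computes the consistency ratio C.R. = C.I./R.I. reported in Table I, using Saaty's standard random index R.I.; the paper does not state which R.I. table it used, so that part is an assumption. The function names are illustrative, and the weights obtained this way may differ somewhat from the W column printed in Table I.

```python
from fractions import Fraction as F

# Judgment matrix of Table I (rows and columns C1..C5), as exact fractions.
A = [
    [F(1), F(6), F(5), F(7), F(4)],
    [F(1, 6), F(1), F(1, 2), F(1), F(1, 3)],
    [F(1, 5), F(2), F(1), F(2), F(1, 2)],
    [F(1, 7), F(1), F(1, 2), F(1), F(1, 3)],
    [F(1, 4), F(3), F(2), F(3), F(1)],
]

# Saaty's average random consistency index R.I. by matrix order.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def is_reciprocal(a):
    """a_ij > 0 and a_ij * a_ji = 1 for all i, j (which also forces a_ii = 1)."""
    n = len(a)
    return all(a[i][j] > 0 and a[i][j] * a[j][i] == 1 for i in range(n) for j in range(n))


def principal_eigen(a, iters=200):
    """Power iteration for A W = lambda_max W (Eq. 4), with W normalized to sum 1."""
    n = len(a)
    m = [[float(x) for x in row] for row in a]
    w = [1.0 / n] * n
    for _ in range(iters):
        aw = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(aw)
        w = [x / total for x in aw]
    aw = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n  # average of (A W)_i / W_i
    return lam, w


n = len(A)
lam, w = principal_eigen(A)
ci = (lam - n) / (n - 1)  # Eq. (5)
cr = ci / RI[n]           # consistency ratio; C.R. < 0.1 is the usual acceptance threshold
print([round(x, 3) for x in w], round(lam, 3), round(cr, 3))
```

Exact fractions make the reciprocal check free of rounding error; the eigenvector itself is computed in floating point.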

4. Build the fuzzy synthetic assessment model

Divide the index set $C$ into $p$ sub-factor sets $C_1, C_2, \ldots, C_p$ satisfying

$$\bigcup_{i=1}^{p} C_i = C, \qquad C_i \cap C_j = \varnothing \ (i \neq j),$$

where $C_i = (C_{i1}, C_{i2}, \ldots, C_{in_i})$, $i = 1, 2, \ldots, p$, and $n_i$ is the number of elements in $C_i$. A synthetic assessment is made for each factor set $C_i$. Suppose the assessment set is $V = (v_1, v_2, \ldots, v_m)$ with corresponding assessment scale set $E = \{e_1, e_2, \ldots, e_m\}$. Assign the weight vector $W_i = [w_{i1}\; w_{i2}\; \cdots\; w_{in_i}]$ to $C_i$, where the weights must satisfy

$$\sum_{j=1}^{n_i} w_{ij} = 1 \qquad (6)$$

Make a single-factor fuzzy assessment of $C_i$ and determine the fuzzy relation matrix $R_i$ from $C_i$ to $V$:

$$R_i = (r_{ijk})_{n_i \times m}, \quad j = 1, 2, \ldots, n_i;\ k = 1, 2, \ldots, m;\ i = 1, 2, \ldots, p,$$

where $r_{ijk}$ is the membership degree of $C_{ij}$ in grade $v_k$. The fuzzy synthesis $W_i \times R_i$ then gives the first-level synthetic appraisal

$$b_i = W_i \times R_i = [b_{i1}\; b_{i2}\; \cdots\; b_{im}], \quad i = 1, 2, \ldots, p \qquad (7)$$

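After normalization, the grade of the appraised object is determined by the maximum membership degree, and its score is

$$S_i = E\, b_i^{T} \qquad (8)$$

Treating each $C_i$ as an element with single-factor judgment $b_i$, the judgment matrix is

$$R = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1m} \\ b_{21} & b_{22} & \cdots & b_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ b_{p1} & b_{p2} & \cdots & b_{pm} \end{bmatrix}$$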

$R$ is the single-factor judgment matrix of $(C_1, C_2, \ldots, C_p)$; each $C_i$ is a constituent index of $C$, and $W = [w_1\; w_2\; \cdots\; w_p]$ is the weight distribution of the $C_i$ according to their importance.

The second-level synthetic judgment is then

$$B = W \times R = [B_1\; B_2\; \cdots\; B_m] \qquad (9)$$

$B_j$ is the membership degree of the ERP data quality in grade $v_j$. The grade is determined by the maximum membership degree, and the score is

$$S = E B^{T} \qquad (10)$$
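The paper does not name the fuzzy composition operator behind $W_i \times R_i$; the numbers in the numerical example (for example $b_2$) agree with the ordinary weighted sum $b_{ik} = \sum_j w_{ij} r_{ijk}$, which is what this minimal Python sketch assumes. The factor sets, weights, and the scale set $E$ below are hypothetical, chosen only to show Eqs. (7) to (10) and the maximum-membership principle; `compose` and `grade` are illustrative names.

```python
def compose(w, r):
    """Fuzzy synthesis W x R with the weighted-sum operator: b_k = sum_j w_j * r_jk."""
    return [sum(wj * row[k] for wj, row in zip(w, r)) for k in range(len(r[0]))]


def grade(b, labels):
    """Maximum-membership principle: label of the largest membership degree."""
    return labels[max(range(len(b)), key=b.__getitem__)]


V = ["superior", "good", "general", "poor"]
E = [95, 85, 70, 50]  # hypothetical assessment scale set E

# Hypothetical factor sets: C_1 with two indices, C_2 with one index.
W1 = [0.7, 0.3]
R1 = [[0.1, 0.6, 0.3, 0.0],
      [0.0, 0.2, 0.5, 0.3]]
W2 = [1.0]
R2 = [[0.4, 0.4, 0.2, 0.0]]

b1 = compose(W1, R1)                     # Eq. (7) for C_1
b2 = compose(W2, R2)                     # Eq. (7) for C_2
B = compose([0.6, 0.4], [b1, b2])        # Eq. (9) with hypothetical W = [0.6, 0.4]
S = sum(e * bk for e, bk in zip(E, B))   # Eq. (10): S = E B^T
print(grade(b1, V), grade(B, V), round(S, 2))
```

With normalized weights and membership rows that each sum to 1, the weighted sum already yields normalized $b_i$ and $B$, so no separate normalization step is needed in this sketch.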

III. NUMERICAL EXAMPLE

A. Background and Problem Description

An enterprise has decided to implement an ERP system across the whole company. While preparing and arranging the data, it found that its data management is currently done by hand. To run the system normally and effectively after go-live and to maintain ERP data quality both before and after go-live, we carried out data collection, arrangement, assessment, analysis, and improvement [6].

B. Assessment

1. Assessment of ERP data quality before online

Based on IP-MAP, the ERP modules that the company will implement are the marketing module $B_1$, the finance module $B_2$, the production module $B_3$, and the purchasing module $B_4$.

The data group arranges and classifies the data before go-live. The data collected from the operating departments include the marketing department's data $D_1$, the finance department's data $D_2$, and the production department's data $D_3$. Datasets $D_1$, $D_2$, and $D_3$ support modules $B_1$, $B_2$, $B_3$, and $B_4$.

According to the definition of the assessment model above, the quality elements are accuracy $Q_1$, completeness $Q_2$, uniqueness $Q_3$, and consistency $Q_4$. Taking $Q_2$ as an example, the quality level of $D_1$ on $Q_2$ is

$$CQ(D_1, Q_2) = \frac{\sum_{B_i \in B} WQ(B_i, D_1, Q_2) \cdot CQB(B_i, D_1, Q_2)}{\sum_{B_i \in B} WQ(B_i, D_1, Q_2)} = 0.85$$

Similarly, $CQ(D_2, Q_2) = 0.92$ and $CQ(D_3, Q_2) = 0.89$.

The quality level differences of $D_1$, $D_2$, and $D_3$ on $Q_2$ are then

$$RQ(D_1, Q_2) - CQ(D_1, Q_2) = 0.12$$

$$RQ(D_2, Q_2) - CQ(D_2, Q_2) = 0.06$$

$$RQ(D_3, Q_2) - CQ(D_3, Q_2) = 0.03$$

Based on the analysis, any data quality difference greater than 0.10 is marked for improvement. Therefore, $D_1$ should be improved on $Q_2$.
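The screening rule is simple arithmetic. The sketch below uses the values reported above; the required levels it prints are only implied by $RQ = CQ + (RQ - CQ)$ and are not reported in the paper.

```python
# Reported values for quality element Q_2 (completeness).
cq = {"D1": 0.85, "D2": 0.92, "D3": 0.89}   # CQ(D_k, Q_2)
gap = {"D1": 0.12, "D2": 0.06, "D3": 0.03}  # RQ(D_k, Q_2) - CQ(D_k, Q_2)
THRESHOLD = 0.10  # differences larger than this are marked for improvement

# Required levels implied by RQ = CQ + gap (not reported directly in the paper).
rq = {k: round(cq[k] + gap[k], 2) for k in cq}
to_improve = [k for k, d in gap.items() if d > THRESHOLD]
print(rq, to_improve)
```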

2. Assessment of ERP data quality after online

Taking the data quality level of the marketing module as an example: first, the weights of $C_i$ are calculated with the model in this paper. The results are shown in Table I.


TABLE I

JUDGMENT MATRIX C1-C5

C     C1    C2    C3    C4    C5    W
C1    1     6     5     7     4     0.518
C2    1/6   1     1/2   1     1/3   0.073
C3    1/5   2     1     2     1/2   0.138
C4    1/7   1     1/2   1     1/3   0.071
C5    1/4   3     2     3     1     0.201

C.R. = 0.018 < 0.1

The weights of the next-level factors under each quality element are then calculated with the analytic hierarchy process in the same way, and every judgment matrix passes the consistency check.

Second, the first-level synthetic judgment. Here $m = 4$ and the assessment set is V = (superior, good, general, poor). Random sampling gives a data accuracy (C11) of 0.717 and a non-missing rate (C51) of 0.756. The judgment results for the other factors, given by experts, are as follows:

TABLE II

FUZZY ASSESSMENT RESULT OF MARKETING MODULE DATA QUALITY

                                Membership degree
Ci (weight)        Cij    superior  good   general  poor    Single-level weight  Composite weight
C1 (w1 = 0.518)    C11    0         0      0.78     0.22    0.429                0.236
                   C12    0         0.43   0.5      0.07    0.429                0.236
                   C13    0.2       0.71   0.09     0       0.142                0.079
C2 (w2 = 0.073)    C21    0.3       0.7    0        0       0.75                 0.052
                   C22    0.1       0.72   0.18     0       0.25                 0.017
C3 (w3 = 0.138)    C31    0.2       0.6    0.2      0       0.582                0.069
                   C32    0.1       0.5    0.3      0.1     0.309                0.037
                   C33    0         0.6    0.3      0.1     0.109                0.013
C4 (w4 = 0.071)    C41    0.25      0.53   0.22     0       0.6                  0.041
                   C42    0         0.48   0.52     0       0.2                  0.014
                   C43    0         0.5    0.39     0.11    0.2                  0.014
C5 (w5 = 0.201)    C51    0         0.06   0.94     0       0.667                0.129
                   C52    0         0.73   0.27     0       0.333                0.064

$$b_1 = W_1 \times R_1 = [0.429\;\, 0.429\;\, 0.142] \times \begin{bmatrix} 0 & 0 & 0.78 & 0.22 \\ 0 & 0.43 & 0.5 & 0.07 \\ 0.2 & 0.71 & 0.09 & 0 \end{bmatrix} = [0.0284\;\, 0.2865\;\, 0.5606\;\, 0.125]$$

By the maximum membership principle, the accuracy of the module is graded “general”.

Similarly, $b_2 = [0.25\;\, 0.705\;\, 0.045\;\, 0]$, $b_3 = [0.1472\;\, 0.5691\;\, 0.2519\;\, 0.0418]$, $b_4 = [0.1505\;\, 0.5141\;\, 0.3135\;\, 0.0219]$, and $b_5 = [0\;\, 0.283\;\, 0.717\;\, 0]$.

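Third, the second-level fuzzy synthetic judgment:

$$B = W \times R = [0.518\;\, 0.073\;\, 0.138\;\, 0.071\;\, 0.201] \cdot \begin{bmatrix} 0.0284 & 0.2865 & 0.5606 & 0.125 \\ 0.25 & 0.705 & 0.045 & 0 \\ 0.1472 & 0.5691 & 0.2519 & 0.0418 \\ 0.1505 & 0.5141 & 0.3135 & 0.0219 \\ 0 & 0.283 & 0.717 & 0 \end{bmatrix} = [0.064\;\, 0.372\;\, 0.493\;\, 0.071]$$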

The computed result shows that the data quality of the marketing module is evaluated as “general”; accuracy C1 and completeness C5 are also evaluated as “general”, while the other three quality elements are evaluated as “good”. Therefore, the data quality of the marketing module should be improved, especially in accuracy and completeness.
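The marketing-module evaluation can be recomputed from Table II and the Table I weights with the weighted-sum operator (an assumption, since the paper does not name the operator). In this minimal Python sketch, the recomputed $B$ agrees with the printed result to within about 0.002, and the grades match the conclusions above; the one visible discrepancy is the third component of $b_3$, which comes out near 0.242 rather than the printed 0.2519. The names `TABLE_II`, `compose`, and `grade` are illustrative.

```python
V = ["superior", "good", "general", "poor"]

# Table II: for each quality element C_i, the local weights w_ij and membership rows.
TABLE_II = {
    "C1 accuracy": ([0.429, 0.429, 0.142],
                    [[0, 0, 0.78, 0.22], [0, 0.43, 0.5, 0.07], [0.2, 0.71, 0.09, 0]]),
    "C2 timeliness": ([0.75, 0.25],
                      [[0.3, 0.7, 0, 0], [0.1, 0.72, 0.18, 0]]),
    "C3 consistency": ([0.582, 0.309, 0.109],
                       [[0.2, 0.6, 0.2, 0], [0.1, 0.5, 0.3, 0.1], [0, 0.6, 0.3, 0.1]]),
    "C4 security": ([0.6, 0.2, 0.2],
                    [[0.25, 0.53, 0.22, 0], [0, 0.48, 0.52, 0], [0, 0.5, 0.39, 0.11]]),
    "C5 completeness": ([0.667, 0.333],
                        [[0, 0.06, 0.94, 0], [0, 0.73, 0.27, 0]]),
}
W = [0.518, 0.073, 0.138, 0.071, 0.201]  # Table I weights of C1..C5


def compose(w, r):
    """Weighted-sum fuzzy synthesis: b_k = sum_j w_j * r_jk."""
    return [sum(wj * row[k] for wj, row in zip(w, r)) for k in range(len(r[0]))]


def grade(b):
    """Maximum-membership principle."""
    return V[max(range(len(b)), key=b.__getitem__)]


b = {name: compose(wi, ri) for name, (wi, ri) in TABLE_II.items()}  # Eq. (7)
B = compose(W, list(b.values()))                                    # Eq. (9)

for name, bi in b.items():
    print(name, [round(x, 4) for x in bi], grade(bi))
print("marketing module", [round(x, 3) for x in B], grade(B))
```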

Based on this analysis, the enterprise drew up a revised plan for ERP data quality improvement and established a corresponding assurance system. Data quality continued to be monitored and controlled after the improvement plan was implemented: a check of 200 sales plans and 200 invoices found an error rate of 0.

ACKNOWLEDGMENT

This paper is supported by the Tianjin Natural Science Foundation (No. 06YFGZGGX06100). We thank all the members of our group and all the authors of the references.

REFERENCES

[1] Hongjiang Xu, Jeretta Horn Nord, Noel Brown et al., “Data quality issues in implementing an ERP,” Industrial Management & Data Systems, 2002, 2(1):47-58

[2] Yu Jinlong, “Data Management Research of ERP,” Sichuan: Southwest Petroleum University, 2005

[3] Chen Yuan, Luo Lin, Shen Xiangxing, “A Study of Data Quality in Information System,” The Journal of the Library Science in China, 2004, 30(149): 48-50

[4] Liu Xia, Liu Feng, Zhang Ping, “The Research on the Data Quality of PDM,” Machinery Design & Manufacture, 2006, 6:166-167

[5] Yang Qingyun, Zhao Peiying, Yang Dongqing, “Research on Data Quality Assessment Methodology,” Computer Engineering and Applications, 2004, 40(9):3-4

[6] Zhang Ting, “The Application Study of ERP Data Quality Assessment and Improvement Methodology,” Tianjin: Tianjin University, 2007
