
1

Presentation Outline

Introduction
Objective
Sorting Probability and Loss Matrices
The Proposed Model
Analysis of Some Loss Functions
Case Study
Redundancy-Based Methods
Illustrative Example
Summary

2

Introduction

A. Accuracy and Precision

B. Types of data

C. Binary Situation

3

A. Accuracy and Precision

Accuracy: the closeness of agreement between the result of measurement and the true (reference) value of the product being sorted.

Precision: an estimate of both the variation in repeated measurements obtained under the same conditions (repeatability) and the variation in repeated measurements obtained under different conditions (reproducibility).

4

B. Two types of data characterizing products or processes

Variables (results of measurement; interval or ratio scales)

Attributes (results of testing; nominal or ordinal scales)

5

Four types of data

The four levels of measurement were proposed by Stanley Smith Stevens in his 1946 article. Different mathematical operations on variables are possible, depending on the level at which a variable is measured.

6

Categorical Variables

1. Nominal scale:

gender, race, religious affiliation, the number of a bus.

Possible operations: =, ≠

7

Categorical Variables (cont.)

2. Ordinal scale:

results of internet page rank, alphabetic order, Mohs hardness scale (10 levels from talc to diamond), customer satisfaction grade, quality sort, customer importance (QFD), vendor’s priority, severity of failure or RPN (FMECA), the power of linkage (QFD).

Possible operations: =, ≠, <, >

8

Numerical data.

3. Interval scale: temperature on the Celsius or Fahrenheit scale, object coordinate, electric potential.

Possible operations: =, ≠, <, >, +, −

9

Numerical data (cont.)

4. Ratio scale: most physical quantities, such as mass or energy; temperature, when it is measured in kelvins; number of children in a family; age.

Possible operations: =, ≠, <, >, +, −, ×, /

10

C. The sorting probability matrix for the binary situation

Accuracy is characterized by:

1. Type I errors (a non-defective item is reported as defective): alpha risk (α)

2. Type II errors (a defective item is reported as non-defective): beta risk (β)

11

The sorting probability matrix for the binary situation (cont.)

E. Bashkansky, S. Dror, R. Ravid, P. Grabov

ICPR-18, Salerno August, 2, Session 19 6

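Rows: actual state (+, −). Columns: factual (reported) result (+, −).

$$
\begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}
$$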

12

Objective

Developing a new statistical procedure for evaluating the accuracy and effectiveness of measurement systems applicable to Attribute Data based on the Taguchi approach.

13

Sorting Probability Matrix

The sorting matrix $\hat{P}$ is an $m \times m$ matrix.

Its components $P_{ij}$ are the conditional probabilities that an item will be classified as quality level $j$, given that its quality level is $i$.

It is a stochastic matrix:

$$
\sum_{j=1}^{m} P_{ij} = 1, \qquad i = 1, \dots, m
$$

14

Four Interesting Sorting Matrices

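(a) The most exact sorting: $P_{ij} = \delta_{ij}$

(b) The uniform sorting (designated as MDS, the most disordered sorting): $P_{ij} = 1/m$

(c) The “worst case” sorting. For example, if $m = 4$:

$$
\hat{P} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}
$$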

15

Four Interesting Sorting Matrices (cont.)

(d) Absence of any sorting. For example, if $m = 2$:

$$
\hat{P} = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}
$$

16

Indicator of the classification system inexactness

$$
G = \frac{D^2(\hat{P})}{2m} = \frac{1}{2m} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( P_{ij} - \delta_{ij} \right)^2
$$
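A minimal Python sketch of the inexactness indicator, assuming it is the normalized squared distance between the sorting matrix and the identity matrix, G = (1/(2m)) Σ Σ (P_ij − δ_ij)². The function name `inexactness` is illustrative and does not come from the presentation. It is applied to three of the reference matrices from the preceding slides, for m = 4.

```python
def inexactness(P):
    """G = (1 / (2m)) * sum_ij (P_ij - delta_ij)^2 for an m-by-m sorting matrix."""
    m = len(P)
    total = 0.0
    for i, row in enumerate(P):
        for j, p_ij in enumerate(row):
            delta = 1.0 if i == j else 0.0
            total += (p_ij - delta) ** 2
    return total / (2 * m)

m = 4
exact = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
uniform = [[1.0 / m] * m for _ in range(m)]
worst = [[0, 0, 0, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 0]]

g_exact = inexactness(exact)      # 0: nothing is misclassified
g_uniform = inexactness(uniform)  # (m - 1) / (2m) = 0.375 for m = 4
g_worst = inexactness(worst)      # 1: the largest value G can take
```

Under this assumed definition, G depends only on the sorting matrix and not on the losses or the incoming proportions.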

17

Loss Matrix Definition

Let $L_{ij}$ be the loss incurred by classifying sort $i$ as sort $j$.

18

The Proposed Model

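Expected Loss Definition:

$$
EL = \sum_{i=1}^{m} \sum_{j=1}^{m} p_i \, P_{ij} \, L_{ij}
$$

Effectiveness Measure:

$$
Eff = 1 - \frac{EL}{EL_{MDS}}
$$

where $p_i$ is the proportion of incoming items of sort $i$ and $EL_{MDS}$ is the expected loss for the MDS.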
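The two definitions can be sketched in Python as follows, assuming EL = Σ Σ p_i P_ij L_ij and Eff = 1 − EL / EL for the MDS (the uniform sorting, P_ij = 1/m). The function names and the input values are hypothetical and serve only as a sanity check: exact sorting should give Eff = 1 and uniform sorting Eff = 0.

```python
def expected_loss(p, P, L):
    """EL = sum_i sum_j p_i * P_ij * L_ij."""
    m = len(p)
    return sum(p[i] * P[i][j] * L[i][j] for i in range(m) for j in range(m))

def effectiveness(p, P, L):
    """Eff = 1 - EL / EL_MDS, where the MDS has P_ij = 1/m."""
    m = len(p)
    mds = [[1.0 / m] * m for _ in range(m)]
    return 1.0 - expected_loss(p, P, L) / expected_loss(p, mds, L)

# Hypothetical inputs: three sorts, equal loss for every misclassification.
p = [0.5, 0.3, 0.2]
equal_loss = [[0 if i == j else 1 for j in range(3)] for i in range(3)]
exact = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
uniform = [[1.0 / 3] * 3 for _ in range(3)]

eff_exact = effectiveness(p, exact, equal_loss)      # 1.0
eff_uniform = effectiveness(p, uniform, equal_loss)  # 0.0
```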

19

Analysis Of Some Loss Functions

Equal loss: $L_{ij} = 1 - \delta_{ij}$

Quadratic loss: $L_{ij} = (i - j)^2$

Entropy loss: $L_{ij} = -\log_2 P_{ij}$

Linear loss: $L_{ij} = (i - j)$

20

Equal loss

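If there is no preliminary information about $p_i$ (taking $p_i = 1/m$):

$$
EL = 1 - \frac{\mathrm{Trace}(\hat{P})}{m}, \qquad Eff = 1 - \frac{m}{m-1} \left( 1 - \frac{\mathrm{Trace}(\hat{P})}{m} \right) = \frac{\mathrm{Trace}(\hat{P}) - 1}{m - 1}
$$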

21

Equal loss (Cont.)

If any preliminary information about $p_i$ is available:

$$
EL = \sum_{i=1}^{m} p_i \, (1 - P_{ii}), \qquad Eff = 1 - \frac{m}{m-1} \sum_{i=1}^{m} p_i \, (1 - P_{ii})
$$

22

Quadratic Loss

For ordinal data, the total accuracy of the rating could be defined as the expected value of $L_{ij}$:

$$
\mathrm{Accuracy} = \sum_{i=1}^{m} \sum_{j=1}^{m} p_i \, (i - j)^2 \, P_{ij}
$$

23

Quadratic Loss (Cont.)

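If there is no preliminary information about $p_i$:

$$
EL_{MDS} = \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} (i - j)^2 = \frac{m^2 - 1}{6}
$$

If there is any preliminary information about $p_i$:

$$
EL_{MDS} = E(X^2) - (m + 1) \, E(X) + \frac{(m + 1)(2m + 1)}{6}
$$

where $E(X) = \sum_i i \, p_i$ and $E(X^2) = \sum_i i^2 \, p_i$.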
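Both closed forms can be checked numerically. The sketch below assumes the quadratic loss L_ij = (i − j)², levels numbered 1 to m, and X denoting the actual level distributed according to p_i; the prior `skewed` is a hypothetical example.

```python
def el_mds_quadratic(p):
    """Expected quadratic loss of the MDS: sum_i p_i * (1/m) * sum_j (i - j)^2."""
    m = len(p)
    levels = range(1, m + 1)
    return sum(p[i - 1] * sum((i - j) ** 2 for j in levels) / m for i in levels)

def closed_form(p):
    """E(X^2) - (m + 1) * E(X) + (m + 1)(2m + 1) / 6, with X the actual level."""
    m = len(p)
    ex = sum(i * p[i - 1] for i in range(1, m + 1))
    ex2 = sum(i * i * p[i - 1] for i in range(1, m + 1))
    return ex2 - (m + 1) * ex + (m + 1) * (2 * m + 1) / 6

m = 5
flat = [1.0 / m] * m                  # no preliminary information
skewed = [0.4, 0.3, 0.15, 0.1, 0.05]  # hypothetical prior

no_info = el_mds_quadratic(flat)      # equals (m^2 - 1) / 6 = 4 for m = 5
with_info = el_mds_quadratic(skewed)  # equals closed_form(skewed)
```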

24

Entropy loss

If level $i$ is systematically related to level $j$ ($P_{ij} = 1$), there is no “entropy loss”.

$$
EL = -\sum_{i=1}^{m} \sum_{j=1}^{m} p_i \, P_{ij} \log_2 P_{ij}
$$

The above loss function leads us directly to Eff = Theil’s uncertainty index.
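A short Python sketch of the entropy loss, assuming EL = −Σ Σ p_i P_ij log2 P_ij with the usual convention that a zero probability contributes nothing. The matrices are hypothetical; `shifted` sends every level to a fixed but wrong level, which illustrates that a systematic relation carries no entropy loss.

```python
import math

def entropy_loss(p, P):
    """EL = -sum_i sum_j p_i * P_ij * log2(P_ij), skipping zero entries."""
    total = 0.0
    for p_i, row in zip(p, P):
        for p_ij in row:
            if p_ij > 0:
                total -= p_i * p_ij * math.log2(p_ij)
    return total

m = 4
p = [1.0 / m] * m
exact = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
shifted = [[1.0 if j == (i + 1) % m else 0.0 for j in range(m)] for i in range(m)]
uniform = [[1.0 / m] * m for _ in range(m)]

el_exact = entropy_loss(p, exact)      # 0
el_shifted = entropy_loss(p, shifted)  # 0: wrong but fully predictable
el_uniform = entropy_loss(p, uniform)  # log2(m) = 2 for m = 4
```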

25

Linear Loss

Could be useful for bias evaluation:

$$
\mathrm{Bias} = \sum_{i=1}^{m} \sum_{j=1}^{m} p_i \, P_{ij} \, (i - j)
$$

This measure characterizes the dominant tendency: overgrading if Bias > 0, or undergrading if Bias < 0.

26

Case Study (nectarines sorting)

Proportions of the sort types: Type 1: 0.860, Type 2: 0.098, Type 3: 0.042

Actual \ Classification    Type 1    Type 2    Type 3    Total
Type 1                     446       91        7         544
Type 2                     12        307       33        352
Type 3                     0         11        93        104
Total                      458       409       133       1000

27

The Classification Matrix

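$$
\hat{P} = (P_{ij}) =
\begin{pmatrix}
446/544 & 91/544 & 7/544 \\
12/352 & 307/352 & 33/352 \\
0/104 & 11/104 & 93/104
\end{pmatrix}
=
\begin{pmatrix}
0.820 & 0.167 & 0.013 \\
0.034 & 0.872 & 0.094 \\
0 & 0.106 & 0.894
\end{pmatrix}
$$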
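The classification matrix follows from the count table by dividing each row by its total. A minimal Python sketch, using the counts from the nectarine case study; the function name `sorting_matrix` is illustrative.

```python
counts = [
    [446, 91, 7],    # actual Type 1
    [12, 307, 33],   # actual Type 2
    [0, 11, 93],     # actual Type 3
]

def sorting_matrix(counts):
    """Estimate P_ij as the share of actual sort i classified as sort j."""
    return [[c / sum(row) for c in row] for row in counts]

P = sorting_matrix(counts)
rounded = [[round(x, 3) for x in row] for row in P]
# rounded rows: (0.82, 0.167, 0.013), (0.034, 0.872, 0.094), (0.0, 0.106, 0.894)
```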

28

The Loss matrix

$$
(L_{ij}) =
\begin{pmatrix}
0 & 0.98 & 1.76 \\
5.42 & 0 & 0.78 \\
12.42 & 7.00 & 0
\end{pmatrix}
$$

29

Effectiveness Evaluations

According to the proposed approach: Eff = 80%. Compare the above to the measures of effectiveness for different loss functions:

Equal loss: 74%. Taguchi loss: 87%. Disorder entropy loss: 62%. Linear penalty function: 79%. Traditional kappa measure: 79%.

It can be seen that the effectiveness estimate strongly depends on the loss metric model.
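Two of these figures can be recomputed from the case-study counts. The Python sketch below assumes that the three values listed with the count table (0.860, 0.098, 0.042) are the incoming proportions p_i, that Eff = 1 − EL / EL for the uniform sorting, and that the "Taguchi loss" entry refers to the quadratic loss (i − j)². The other entries are not recomputed here.

```python
p = [0.860, 0.098, 0.042]
counts = [[446, 91, 7], [12, 307, 33], [0, 11, 93]]
P = [[c / sum(row) for c in row] for row in counts]
m = len(p)

def effectiveness(p, P, L):
    """Eff = 1 - EL / EL_MDS, with EL_MDS computed for P_ij = 1/m."""
    m = len(p)
    el = sum(p[i] * P[i][j] * L[i][j] for i in range(m) for j in range(m))
    el_mds = sum(p[i] * L[i][j] / m for i in range(m) for j in range(m))
    return 1.0 - el / el_mds

equal = [[0 if i == j else 1 for j in range(m)] for i in range(m)]
quadratic = [[(i - j) ** 2 for j in range(m)] for i in range(m)]

eff_equal = effectiveness(p, P, equal)          # about 0.74
eff_quadratic = effectiveness(p, P, quadratic)  # about 0.87
```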

30

REPEATED SORTING

$$
\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}
$$

31

Independency vs. Correlation

$$
\rho = \frac{\sigma^2_{\text{product}}}{\sigma^2_{\text{product}} + \sigma^2_{\text{measurement}}}
$$

32

Case A: Two Independent Repeated Ratings

Real improvement may be obtained if, in the case of disagreement, the final decision is made in favor of the inferior sort (one rater can see a defect that the other has not detected).

Usually, the loss that results from overrating is greater than the loss due to underrating, thus such a redistribution of probabilities seems to be legitimate. Nevertheless, to verify improvement in sorting effectiveness, we need a new expected loss calculation.
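The redistribution of probabilities can be sketched in Python under simplifying assumptions that are not stated in the presentation: both raters share the same sorting matrix, their ratings are independent given the actual sort, and a higher sort index means an inferior sort. The function name and the 2-by-2 matrix are hypothetical.

```python
def combine_inferior(P):
    """Sorting matrix when two independent raters with the same matrix P
    rate each item and the inferior (higher-index) rating is reported."""
    combined = []
    for row in P:
        new_row, cum_prev = [], 0.0
        for p_ij in row:
            cum = cum_prev + p_ij
            # P(both ratings <= j) = cum^2 under independence
            new_row.append(cum ** 2 - cum_prev ** 2)
            cum_prev = cum
        combined.append(new_row)
    return combined

# Hypothetical single-rater matrix: sort 1 = good, sort 2 = defective.
P = [[0.9, 0.1], [0.2, 0.8]]
P2 = combine_inferior(P)
# Good items: 0.81 pass, 0.19 are rejected; defective items: 0.04 pass, 0.96 are caught.
```

In this example the detection of defective items improves while more good items are downgraded, which is why the expected loss has to be recalculated before the combined procedure is judged better.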

33

Case B: Three Repeated Ratings

We add a third rater only if the first two raters do not agree. For most industrial applications this means that a product is passed through a scrupulous laboratory inspection or, for the purpose of our analysis, through an MRB (material review board). The resulting decision could be considered as an etalon (reference) measurement.

The probability of correct decisions increases, and the probability of wrong decisions decreases.

34

Conclusions

To decide whether a double or triple rating procedure is expedient, the total expenditures have to be compared.

35

Case C: A Hierarchical Classification System

The classification procedure is built on more than one level. To characterize such a hierarchical classification system, G can be utilized as a “pure” indicator of the classification system’s inexactness.

Usually, the cost of classification (COC) has an inverse relationship to the value of G. In contrast to the COC, the expected loss usually decreases as the exactness of the judgment improves.

36

Optimization of HCS

If one decides to pass through the hierarchical classification system from the lowest level (1) up to level K, the total expenditure can be optimized by looking for the level that minimizes it.
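As a rough illustration, the search can be sketched in Python under the assumption that the total expenditure at level k is the cost of classification up to that level plus the remaining expected loss. All numbers below are hypothetical and the function name is illustrative.

```python
# Hypothetical per-level figures: the cost of classification rises and the
# expected loss falls as more levels of the hierarchy are used.
coc = [1.0, 2.0, 4.0, 8.0]            # cost of classification up to level k
expected_loss = [9.0, 5.0, 2.5, 2.0]  # expected loss after level k

def best_level(coc, expected_loss):
    """Return (level, total) minimizing COC + EL; levels are numbered from 1."""
    totals = [c + el for c, el in zip(coc, expected_loss)]
    k = min(range(len(totals)), key=totals.__getitem__)
    return k + 1, totals[k]

level, total = best_level(coc, expected_loss)  # level 3 with total 6.5
```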

37

A CASE STUDY AND ILLUSTRATIVE EXAMPLE

The proportions of the sort types were: Type 1: 0.53, Type 2: 0.27, and Type 3: 0.20. The same loss matrix was considered. From an R&R study of the results of two independent raters, the joint probability matrices were estimated.

38

Summary table

                    One rater           Two independent raters (case A)    Three raters (case B)

Sorting matrix      0.89, 0.08, 0.03    0.84, 0.10, 0.06                   0.96, 0.04, 0.00
                    0.07, 0.85, 0.08    0.04, 0.80, 0.16                   0.04, 0.96, 0.00
                    0.06, 0.14, 0.80    0.01, 0.06, 0.93                   0.01, 0.02, 0.97

Exp. loss           EL = 0.534          EL' = 0.309                        EL'' = 0.132

Inexact. ind.       G = 0.0194          G' = 0.0192                        G'' = 0.0013
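The first two columns of the summary table can be reproduced with a short Python sketch. It assumes that the loss matrix of the case study has rows (0, 0.98, 1.76), (5.42, 0, 0.78) and (12.42, 7.00, 0), that EL = Σ Σ p_i P_ij L_ij, and that G = (1/(2m)) Σ Σ (P_ij − δ_ij)²; the function names are illustrative.

```python
p = [0.53, 0.27, 0.20]
L = [[0.0, 0.98, 1.76], [5.42, 0.0, 0.78], [12.42, 7.00, 0.0]]
one_rater = [[0.89, 0.08, 0.03], [0.07, 0.85, 0.08], [0.06, 0.14, 0.80]]
two_raters = [[0.84, 0.10, 0.06], [0.04, 0.80, 0.16], [0.01, 0.06, 0.93]]

def expected_loss(p, P, L):
    """EL = sum_i sum_j p_i * P_ij * L_ij."""
    m = len(p)
    return sum(p[i] * P[i][j] * L[i][j] for i in range(m) for j in range(m))

def inexactness(P):
    """G = (1 / (2m)) * sum_ij (P_ij - delta_ij)^2."""
    m = len(P)
    return sum(
        (P[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(m) for j in range(m)
    ) / (2 * m)

el_one = round(expected_loss(p, one_rater, L), 3)   # 0.534
el_two = round(expected_loss(p, two_raters, L), 3)  # 0.309
g_one = round(inexactness(one_rater), 4)            # 0.0194
g_two = round(inexactness(two_raters), 4)           # 0.0192
```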

39

Summary

The proposed procedure for evaluating product quality classifiers takes into account a priori knowledge about the incoming product, errors of sorting, and losses due to under- or overgrading.

40

Summary (Cont.1)

The appropriate choice of the loss function (matrix) provides the opportunity to fit the quality sorting process model to the real situation.

The effectiveness of quality classifying can be improved by different redundancy based methods. However, the advantages of redundancy based methods are not unequivocal, as is the case in the usual measurement processes, and corresponding calculations according to the technique being used are required.

41

Summary (Cont.2)

The conclusion concerning the selection of the preferred case depends on the losses due to misclassification, as well as on the incoming quality sort distribution.

Possible applications of the proposed approach are not limited to quality sorting. The approach can be extended to other QA processes concerned with classification.

42

Thank You
