Towards Performance Evaluation of Symbol Recognition & Spotting Systems in a Localization Context Mathieu Delalandre CVC, Barcelona, Spain EuroMed Meeting LORIA, Nancy city, France Monday 18th of May 2009


Page 1: Towards Performance Evaluation of Symbol Recognition & Spotting Systems  in a Localization Context

Towards Performance Evaluation of Symbol Recognition & Spotting Systems in a Localization Context

Mathieu Delalandre
CVC, Barcelona, Spain

EuroMed Meeting
LORIA, Nancy city, France
Monday 18th of May 2009

Page 2

Introduction

[Figure: a graphical document made of symbol, background and text parts. Recognition relies on a learning database and returns labels (sofa, skin, tub, door, door). Spotting relies on a document database queried by example (Query By Example, QBE) and returns ranks (r1, r2, r3).]

Symbol spotting: “a way to efficiently localize possible symbols and limit the computational complexity, without using full recognition methods” [Tombre2003] [Dosch2004] [Tabbone2004] [Zuwala2006] [Locteau2007] [Qureshi2007] [Rusinol2007]

Symbol recognition: “a particular application of the general problem of pattern recognition, in which an unknown input pattern (i.e. input image) is classified as belonging to one of the relevant classes (i.e. predefined symbols) in the application domain” [Chhabra1998] [Cordella1999] [Llados2002] [Tombre2005]

[Figure: example documents (electrical diagram, mechanical drawing, utility map) and their sources (scanned, CAD file, Web image).]

Page 3

Introduction

[Figure: performance evaluation. Groundtruthing produces the groundtruth, the system produces the results, and characterisation compares the two.]

Performance evaluation: Information Retrieval [Salton1992], Computer Vision [Thacker2005], CBIR [Muller2001], DIA [Haralick2000]

Case of symbol recognition & spotting: [Ezra2008][Delalandre2008]

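[Figure: performance evaluation of a spotting/recognition system. Training data (learning) and test data feed the system, which returns regions of interest with labels (sofa, skin, tub, door, door) or QBE ranks (r1, r2, r3). A groundtruth mapping matches the truth with the results before characterization.]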

Page 4

Plan
1. Groundtruth and test documents
2. Performance characterization
3. Conclusions and perspectives

Page 5

Groundtruth and test documents: Overview of approaches
1. Overview of approaches
2. Existing datasets

Real approach: existing documents are groundtruthed to obtain their groundtruth.

[Table residue: comparison of the real approach and the synthetic approach, rated from "- -" (weak) to "++" (good); criteria and ratings are not recoverable.]

[Figure, Dosch et al. 2006: ground-truthing (GT) and validation of drawings with alerts, then evaluation of the recognition results obtained on test images against the groundtruthed drawings.]

[Figure, Yan et al. 2004: connected, parallel and overlapped cases.]

[Figure, Rusinol et al. 2009.]

Page 6

Groundtruth and test documents: Overview of approaches

Synthetic approach: documents and their groundtruth are produced together from a setting.

[Figures: Aksoy 2000; Valveny et al. 2007; Zhai et al. 2003. Panel labels: binary noise, vectorial noise.]

Page 7

Groundtruth and test documents: Overview of approaches

Delalandre2008

Graphical documents are composed of two layers: symbol and background.

Idea: use the same background layer with different symbol layers.
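The two-layer idea can be illustrated with a minimal sketch. It assumes tiny binary rasters (1 = ink, 0 = blank) and a pixel-wise OR as the composition rule; the function `overlay` and the sample layers are hypothetical and are not taken from [Delalandre2008].

```python
# Minimal sketch: one background layer reused with several symbol layers.
# Assumption: layers are equally sized binary rasters (1 = ink, 0 = blank).

def overlay(background, symbols):
    """Return the pixel-wise OR of two equally sized binary layers."""
    return [[b | s for b, s in zip(b_row, s_row)]
            for b_row, s_row in zip(background, symbols)]

# Hypothetical background: a rectangular room outline.
background = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

# Hypothetical symbol layers: the same room with a symbol at two positions.
symbol_layers = {
    "doc_a": [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
    "doc_b": [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]],
}

# Each document shares the background; only the symbol layer changes.
documents = {name: overlay(background, layer)
             for name, layer in symbol_layers.items()}
```

Because each symbol layer is built by the generator, the position of every symbol is known in advance, which is what makes the groundtruth available without manual annotation.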

Page 8

Groundtruth and test documents: Overview of approaches

Delalandre2008

[Figure: positioning of a loaded symbol from its symbol model, using a bounding box and control point, and an alignment. Labels in the figure: c1, c2, M1 to M4, C1 to C4, L1, L2, θ1, θ2, p, p1, p2.]

Page 9

Groundtruth and test documents: Overview of approaches

Delalandre2008

[Figure: document generation. Symbol models, positioning constraints and a background image feed a building engine, which positions the symbols and outputs documents with their groundtruth (GT). User loop: (1) edit, (2) run, (3) display.]
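A building engine of this kind can be sketched as follows. This is only an illustration under stated assumptions: a constraint is modelled as a rectangular zone for the control point, a set of allowed symbol models and a set of allowed rotations. The `Constraint` class, its fields and `build_document` are hypothetical names, not the data model of [Delalandre2008].

```python
import random
from dataclasses import dataclass


@dataclass
class Constraint:
    """Hypothetical positioning constraint for one symbol slot."""
    zone: tuple    # (x0, y0, x1, y1): where the control point may fall
    models: tuple  # names of the symbol models allowed in this slot
    angles: tuple  # allowed rotations, in degrees


def build_document(constraints, rng):
    """Place one symbol per constraint and return the groundtruth records."""
    groundtruth = []
    for c in constraints:
        x0, y0, x1, y1 = c.zone
        groundtruth.append({
            "label": rng.choice(c.models),   # which symbol model is loaded
            "x": rng.uniform(x0, x1),        # control point position
            "y": rng.uniform(y0, y1),
            "angle": rng.choice(c.angles),   # rotation applied to the model
        })
    return groundtruth


constraints = [
    Constraint(zone=(10, 10, 50, 40), models=("sofa", "tub"), angles=(0, 90)),
    Constraint(zone=(60, 0, 80, 5), models=("door",), angles=(0, 180)),
]
groundtruth = build_document(constraints, random.Random(0))
```

Running the engine several times with the same constraints and background yields different documents, each delivered with its own groundtruth.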

Page 10

Groundtruth and test documents: Existing datasets

Group   Collection   Datasets  Images  Symbols  Degradations  Models
GREC    GREC’03      30        3000    3000     10            5-50
GREC    GREC’05      16        1000    1000     6             25-150
GREC    GREC’07      6         2100    2100     6             50-150
ICPR    ICPR’00      9         450     11250    9             25
SESYD   bags         16        1600    15046    none          25-150
SESYD   floorplans   10        1000    26830    none          16
SESYD   diagrams     10        1000    14100    none          21
SESYD   queries      6         6000    6000     none          16-21
Others  Rusinol’09   1         42      344      none          38


Page 17

Groundtruth and test documents: Existing datasets

Generator of queries, with groundtruth:
1. Random selection of a document
2. Random selection of a symbol
3. Random crop

[Plot and equation residue: axes x and y; variables s ∈ [0,1], v0, v and vmax. The equation involves the error function, $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{n!\,(2n+1)}$; the other terms are not recoverable.]
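The three steps of the query generator can be sketched as follows. The crop model used on the slide is not recoverable from the transcript, so this sketch assumes a simple one: the symbol bounding box is enlarged by a random margin on each side and clipped to the page. The function `generate_query`, the `max_margin` parameter and the document structure are hypothetical.

```python
import random


def generate_query(documents, rng, max_margin=10):
    """Pick a document, pick one of its symbols, and crop around it."""
    doc = rng.choice(documents)            # 1. random selection of a document
    symbol = rng.choice(doc["symbols"])    # 2. random selection of a symbol
    x0, y0, x1, y1 = symbol["box"]
    # 3. random crop (assumed model: random margin per side, clipped to page)
    left, top, right, bottom = (rng.randint(0, max_margin) for _ in range(4))
    crop = (max(0, x0 - left), max(0, y0 - top),
            min(doc["width"], x1 + right), min(doc["height"], y1 + bottom))
    # The label and source document form the groundtruth of the query.
    return {"document": doc["name"], "label": symbol["label"], "crop": crop}


documents = [
    {"name": "floorplan_01", "width": 200, "height": 100,
     "symbols": [{"label": "sofa", "box": (20, 30, 60, 50)},
                 {"label": "door", "box": (0, 90, 15, 100)}]},
]
query = generate_query(documents, random.Random(1))
```

Since the symbol is drawn from a groundtruthed document, the query comes with its label at no extra annotation cost.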


Page 19

Plan
1. Groundtruth and test documents
2. Performance characterization
3. Conclusions and perspectives

Page 20

Performance characterization: Introduction

Performance characterisation (segmented symbols) [Valveny2004] [Dosch2006] [Valveny2007,2008a,2008b]
– Recognition rate
– Precision/Recall
– Homogeneity
– Separability

Performance characterisation (real context)

[Figure, as in the introduction: a spotting/recognition system returns regions of interest with labels (sofa, skin, tub, door, door) or QBE ranks (r1, r2, r3); a groundtruth mapping matches the truth with the results before characterization.]

Page 21

Performance characterization: About mapping

[Figures: groundtruth against segmentation, for layout analysis [Antonacopoulos1999] and for text/graphics separation [Wenyin1997].]

Mapping cases (truth against results):
– Single: a model line matches with only one detected line.
– Split: two model lines match with one detected line.
– Merge: a model line matches with two detected lines.
– False alarm: a detected line doesn't match with any model line.
– Miss: a model line doesn't match with any detected line.
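The five mapping cases can be told apart from a list of matched pairs. The sketch below follows the definitions exactly as worded on the slide and assumes the matching itself (which model line matches which detected line) is already available; `classify_mapping` and the identifiers are hypothetical.

```python
from collections import defaultdict


def classify_mapping(model_ids, detected_ids, matches):
    """Sort lines into the slide's five mapping cases.

    `matches` is an iterable of (model_id, detected_id) pairs.
    A many-to-many configuration is reported both as merge and as split.
    """
    by_model = defaultdict(set)
    by_detected = defaultdict(set)
    for m, d in matches:
        by_model[m].add(d)
        by_detected[d].add(m)

    cases = {"single": [], "split": [], "merge": [],
             "false_alarm": [], "miss": []}
    for m in model_ids:
        partners = by_model.get(m, set())
        if not partners:
            cases["miss"].append(m)                      # no detected line
        elif len(partners) >= 2:
            cases["merge"].append((m, tuple(sorted(partners))))
        else:
            d = next(iter(partners))
            if len(by_detected[d]) == 1:
                cases["single"].append((m, d))           # one-to-one match
    for d in detected_ids:
        partners = by_detected.get(d, set())
        if not partners:
            cases["false_alarm"].append(d)               # no model line
        elif len(partners) >= 2:
            cases["split"].append((tuple(sorted(partners)), d))
    return cases


cases = classify_mapping(
    model_ids=["m1", "m2", "m3", "m4", "m5"],
    detected_ids=["d1", "d2", "d3", "d4", "d5"],
    matches=[("m1", "d1"), ("m2", "d2"), ("m3", "d2"),
             ("m4", "d3"), ("m4", "d4")],
)
```

In this example m1 is a single, m2 and m3 share d2, m4 is matched with d3 and d4, m5 is missed and d5 is a false alarm.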

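Symbol spotting [Rusinol2009]

[Figure: mapping between the groundtruth regions g1, g2 and a result region r; c1 and c2 are the parts that r has in common with g1 and g2.]

$\mathrm{Precision} = \frac{c_1 + c_2}{r}$

$\mathrm{Recall} = \frac{c_1 + c_2}{g_1 + g_2}$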
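One reading of the precision and recall on this slide is area based: c1 and c2 are the areas that the result r shares with the groundtruth regions g1 and g2. The sketch below works under that assumption and further assumes axis-aligned boxes (x0, y0, x1, y1) and non-overlapping groundtruth regions; the helper names are hypothetical and this is not presented as the exact protocol of [Rusinol2009].

```python
def area(box):
    """Area of an axis-aligned box; empty boxes have area 0."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def overlap(a, b):
    """Intersection box of a and b (possibly empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), min(a[3], b[3]))


def precision_recall(result, groundtruths):
    """Precision = common area / result area; recall = common / truth area."""
    common = sum(area(overlap(result, g)) for g in groundtruths)
    result_area = area(result)
    truth_area = sum(area(g) for g in groundtruths)
    precision = common / result_area if result_area else 0.0
    recall = common / truth_area if truth_area else 0.0
    return precision, recall


r = (0, 0, 10, 10)    # result region, area 100
g1 = (0, 0, 5, 10)    # groundtruth region, area 50, fully covered by r
g2 = (5, 0, 15, 10)   # groundtruth region, area 100, half covered by r
precision, recall = precision_recall(r, [g1, g2])
```

Here c1 = 50 and c2 = 50, so the precision is 100/100 and the recall is 100/150: the result contains only symbol area but misses half of g2.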

Page 22

Performance characterization: Mapping, application to symbols

Which representation? How to define the regions?
– Wrapper (box, ellipse): could be of weak precision
– Convex polygon: the precision will depend on the model
– Concave polygon: precise, but the comparison is time consuming
– Point: how to define local thresholds? Is it compatible with recognition systems?

Groundtruth
– Does the polarized part of the capacitor belong to the symbol? Same question for the moving area of the door.

Segmentation
– A lot of systems use sliding windows to detect symbols, providing only points [Adam2001] [Dosch2004] [Rusinol2007].
– Systems providing regions of interest can “tune” their results; how to limit the over-segmentation cases?
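When a system returns only points, one possible answer to the local threshold question is to scale the tolerance with the size of each groundtruth symbol. The sketch below is an assumption for illustration, not a rule proposed in the talk: a detected point is accepted if it lies within a fraction of the bounding-box diagonal from the box centre. The function `point_matches` and the `tolerance` parameter are hypothetical.

```python
import math


def point_matches(point, box, tolerance=0.25):
    """Accept a detected point close enough to the centre of a truth box.

    The threshold is local: it grows with the diagonal of the box, so large
    symbols tolerate larger localization errors than small ones.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    threshold = tolerance * math.hypot(x1 - x0, y1 - y0)
    return math.hypot(point[0] - cx, point[1] - cy) <= threshold


sofa = (0, 0, 40, 30)                    # diagonal 50, threshold 12.5
near = point_matches((25, 15), sofa)     # 5 away from the centre (20, 15)
far = point_matches((40, 30), sofa)      # 25 away from the centre
```

A fixed global threshold would be too strict for large symbols or too loose for small ones, which is why the tolerance is tied to the symbol size here.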

Page 23

Performance characterization: Work in progress

Comparison of some criteria: system of [Qureshi’08], 100 floorplans (2521 symbols)

Signature based characterization

[Figure: domain definition of the ROI (region size dx×dy) and orientation sampling over [0-2π]; plot reporting rates (%) over [0-2π] for results and groundtruth.]

Page 24

Plan
1. Groundtruth and test documents
2. Performance characterization
3. Conclusions and perspectives

Page 25

Conclusions and perspectives

• Conclusions
– Large databases of segmented symbol images exist: “GREC”
– Synthetic databases in real context exist: “SESYD”
– True-life documents and groundtruth are around the corner: “EPEIRES”
– Characterization tools have been proposed: “SymbolRec”

• Perspectives
– Continue to produce other databases, using existing platforms
– Mapping is the key problem today for achieving a performance evaluation in real context

Page 26

Thanks

All the referenced papers can be found in:

[1] M. Delalandre, E. Valveny and J. Lladós. Performance Evaluation of Symbol Recognition and Spotting Systems: An Overview. Workshop on Document Analysis Systems (DAS), pp. 497-505, 2008.