
Top-down and Bottom-up Cues for Scene Text Recognition

Anand Mishra¹, Karteek Alahari² and C. V. Jawahar¹

¹ CVIT, IIIT Hyderabad, India   ² INRIA-WILLOW, ENS, Paris, France

Method                      SVT-Word   ICDAR(50)
PICT                        59         -
PLEX+ICDAR                  56         72
ABBYY 9.0                   35         56
Proposed (bi-gram)          70.03      76.96
Proposed (node specific)    73.26      81.78

Character detection

• Sliding window
• SVM classifier trained on ICDAR'03
• HoG features
• Some windows are pruned based on aspect ratio (see the sketch below)
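As a rough illustration of this detection step (not the authors' code), the sketch below slides windows over a cropped word image, prunes windows by aspect ratio, and scores each window with a character SVM on dense HOG features. scikit-image and scikit-learn are assumed stand-ins for the feature and classifier implementations; the poster only specifies HOG features and an SVM trained on ICDAR'03.

```python
# Sketch of sliding-window character detection (library choices are assumptions).
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def detect_characters(word_image, svm, window_sizes, step=4,
                      min_aspect=0.1, max_aspect=2.5):
    """Slide windows over a grayscale word image and score them with an SVM.

    Windows whose aspect ratio (width / height) falls outside
    [min_aspect, max_aspect] are pruned, as on the poster.
    """
    H, W = word_image.shape[:2]
    detections = []
    for (wh, ww) in window_sizes:
        aspect = ww / float(wh)
        if not (min_aspect <= aspect <= max_aspect):
            continue  # prune implausible window shapes
        for y in range(0, H - wh + 1, step):
            for x in range(0, W - ww + 1, step):
                patch = word_image[y:y + wh, x:x + ww]
                # Descriptor as on the poster: dense HOG on a 22x20 patch,
                # 4x4 cells, 10 orientation bins.
                patch = resize(patch, (22, 20), anti_aliasing=True)
                feat = hog(patch, orientations=10, pixels_per_cell=(4, 4),
                           cells_per_block=(1, 1), feature_vector=True)
                scores = svm.decision_function([feat])[0]  # one score per class
                detections.append({'box': (x, y, ww, wh), 'scores': scores})
    return detections
```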

• Set of labels $L = \{0, 1, \ldots, 9, a, b, \ldots, z, A, B, \ldots, Z, \epsilon\}$
• Minimize an energy of the following form:

$E(X) = \sum_{i=1}^{n} E_i(x_i) + \sum_{(i,j) \in \mathcal{E}} E_{ij}(x_i, x_j)$
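For concreteness, a tiny sketch of evaluating this energy for a candidate labelling; the dictionary-based layout of the costs is an illustrative assumption, not the paper's data structures.

```python
# Illustrative evaluation of E(X) = sum_i E_i(x_i) + sum_{(i,j) in E} E_ij(x_i, x_j).
def crf_energy(labels, unary, pairwise, edges):
    """labels[i] : label assigned to node i (a character or epsilon)
       unary[i]  : dict mapping label -> unary cost E_i
       pairwise  : dict mapping (i, j) -> dict mapping (label_i, label_j) -> E_ij
       edges     : list of (i, j) node pairs in the graph
    """
    energy = sum(unary[i][labels[i]] for i in range(len(labels)))
    energy += sum(pairwise[(i, j)][(labels[i], labels[j])] for (i, j) in edges)
    return energy
```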

• Sign Evaluation [Weinman et al., PAMI'09]
• ICDAR 2003
• Street View Text [Wang et al., ECCV'10]

Text Detection

• AdaBoost based [CVPR'04, ICDAR'11]
• SWT [CVPR'09]
• Multi-oriented text detection [CVPR'12]

Word Recognition

• IP based [CVPR'11]
• Sparse BP [TPAMI'09]
• PLEX and PICT [ECCV'10, ICCV'11]

Many Applications

• Multimedia indexing
• Mobile apps
• Automatic navigation
• Assistance for the visually impaired

Detection and Recognition

• Real-time text localization and recognition [CVPR'12]
• PLEX [ICCV'11]

Text is important

• Information rich
• Useful cues
• Viewers fixate more on text [ICCV'09]

Unary cost: $E_i(x_i = c_j) = 1 - P(c_j \mid x_i)$

The unary cost of $\epsilon$ is computed from the SVM score and an aspect ratio prior.

Pairwise cost:

• Lexicon based: $E_{ij}(x_i = c_i, x_j = c_j) = \lambda_l \, (1 - P(c_i, c_j))$ (sketched below)
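To make these terms concrete, a minimal sketch that assembles unary costs and the lexicon-based pairwise cost. It assumes SVM scores have already been mapped to probabilities P(c_j | x_i) (e.g. via Platt scaling, which is an assumption; the poster only says the cost is derived from the SVM score), and it omits the aspect-ratio-based cost for ε.

```python
import numpy as np

def unary_costs(class_probs):
    """class_probs: (n_windows, n_labels) array of P(c_j | x_i).
    Unary cost is 1 - P(c_j | x_i), as defined above.
    (The epsilon label's cost from the aspect-ratio prior is not handled here.)"""
    return 1.0 - class_probs

def lexicon_pairwise_cost(pair_prob, lambda_l=1.0):
    """pair_prob: dict {(c_i, c_j): P(c_i, c_j)} estimated from the lexicon.
    Returns a cost function E_ij = lambda_l * (1 - P(c_i, c_j))."""
    def cost(ci, cj):
        return lambda_l * (1.0 - pair_prob.get((ci, cj), 0.0))
    return cost
```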

Poster sections: The Goal, Datasets, Challenges, Prior Computation, Results, Failure cases

Center for Visual Information Technology

International Institute of Information Technology, Hyderabad, INDIA

http://cvit.iiit.ac.in/projects/SceneTextUnderstanding

[Figure: example word images — PUFF, CASBAH, PIZZA, FUJI, PARC, WESTERN, CAPOGIRO]

Toy example: bi-gram prior vs. node-specific prior

Lexicon = {CVPR, ICPR}
Possible character pairs = {CV, IC, VP, CP, PR, PR}

Pair   Bi-gram prior   Node-specific, edge (1,2)   edge (2,3)   edge (3,4)
CV     1/6             1/2                         0            0
IC     1/6             1/2                         0            0
VP     1/6             0                           1/2          0
CP     1/6             0                           1/2          0
PR     1/3             0                           0            1
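A hedged sketch of how the two priors in this toy example can be computed from a small lexicon; the function names are illustrative, not from the paper.

```python
from collections import Counter

def bigram_prior(lexicon):
    """P(c_i, c_j) over all adjacent character pairs in the lexicon,
    independent of position (the bi-gram prior)."""
    pairs = Counter()
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    return {p: c / total for p, c in pairs.items()}

def node_specific_prior(lexicon, length):
    """P(c_i, c_j) computed separately for each edge position
    (the node-specific prior), for lexicon words of a given length."""
    words = [w for w in lexicon if len(w) == length]
    priors = []
    for k in range(length - 1):
        pairs = Counter((w[k], w[k + 1]) for w in words)
        total = sum(pairs.values())
        priors.append({p: c / total for p, c in pairs.items()})
    return priors

# Toy example from the poster:
lex = ["CVPR", "ICPR"]
print(bigram_prior(lex))            # {('C','V'): 1/6, ..., ('P','R'): 1/3}
print(node_specific_prior(lex, 4))  # [{('C','V'): 0.5, ('I','C'): 0.5}, ...]
```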

The CRF Energy

Graph construction

• Character windows = nodes
• Unary cost = $1 - f(\text{SVM score})$
• Pairwise cost = lexicon based + overlap based

[Figure: candidate windows for the word "door" with per-class SVM scores, e.g. d:0.5, o:0.4, o:0.3, r:0.4 for the correct characters vs. lower scores for confusions such as c:0.1, b:0.1, x:0.2]

Implementation Details

• Descriptor: dense HOG, cell size = 4 × 4, 10 orientation bins, after resizing the image to 22 × 20.
• Inference: tree-reweighted message passing (TRW-S) [Kolmogorov, TPAMI'06] (a simple chain-DP stand-in is sketched below).
• The method is directly applicable to near-frontal text datasets such as the Sign Evaluation data.

[Figure: per-window SVM scores for the word "BANK" illustrating character confusions, e.g. E:0.1, 3:0.2, 1:0.2, L:0.3 and E:0.1, K:0.1, S:0.2, X:0.1]
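The inference step above uses TRW-S, which is beyond a short sketch. As a hedged stand-in, the dynamic program below computes the exact MAP labelling when the detection graph reduces to a simple chain; this is not the authors' solver, only an illustration of minimizing the energy.

```python
import numpy as np

def chain_map_inference(unary, pairwise):
    """Exact MAP labelling of a chain CRF by dynamic programming.
    unary   : (n, L) array of unary costs
    pairwise: (n-1, L, L) array of pairwise costs between consecutive nodes
    Returns the minimum-energy label sequence (a stand-in for TRW-S)."""
    n, L = unary.shape
    cost = unary[0].copy()
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        # total[prev, cur] = best cost ending at node i with label `cur`
        # having come from label `prev` at node i-1.
        total = cost[:, None] + pairwise[i - 1] + unary[i][None, :]
        back[i] = total.argmin(axis=0)
        cost = total.min(axis=0)
    labels = [int(cost.argmin())]
    for i in range(n - 1, 0, -1):
        labels.append(int(back[i][labels[-1]]))
    return labels[::-1]
```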

• Inter- and intra-character confusion

• Large number of classes

• Poor isolated character recognition

Need strong cues

• Top-down: prior computed from the lexicon
• Bottom-up: sliding-window based character detections
• The CRF model infers the true characters and the word as a whole.

Top-down and Bottom-up cues

[Pipeline diagram: lexicon-based prior (top-down cue) and character detection (bottom-up cue) feed into the CRF]

• Overlap based: $E_{ij}(x_i = c_i, x_j = c_j) = \lambda_o \exp\!\big(-(100 - \text{overlap}(x_i, x_j))\big)$ (sketched below)
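A small sketch of the overlap term as written above. The exact overlap measure is not spelled out on the poster; the percentage horizontal overlap of the two boxes used here is an assumption, and `lambda_o` stands for the weight λ_o.

```python
import math

def overlap_pairwise_cost(box_i, box_j, lambda_o=1.0):
    """Overlap-based cost E_ij = lambda_o * exp(-(100 - overlap(x_i, x_j))).
    Boxes are (x, width); overlap() is taken as percentage horizontal overlap
    (an assumption). Heavily overlapping windows incur a high cost."""
    xi, wi = box_i
    xj, wj = box_j
    inter = max(0, min(xi + wi, xj + wj) - max(xi, xj))
    overlap_pct = 100.0 * inter / min(wi, wj)
    return lambda_o * math.exp(-(100.0 - overlap_pct))
```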

Recognize a cropped word, e.g. the word image "CAPOGIRO".

Lexicons

• State-of-the-art commercial OCR: low accuracy
• Sign Evaluation (60.5%), ICDAR (56%), Street View Text (35%)

Weak detections and low SVM scores

Method Overview

Summary

• A general framework for scene text recognition

• Improves accuracies significantly on ICDAR and SVT

• Joint probabilistic inference with lexicon priors, unlike [ICCV'11]

• The method deals with poor character detections unlike [TPAMI’09]

[Figure: further example word images — SAIGON, SUPER, PUFF, TRIPLE]

Travel grant supported by Google Inc. India
