
A Discriminative Semi-Markov Model for Robust Scene Text Recognition
Jerod J. Weinman, Erik Learned-Miller, and Allen R. Hanson

Computer Vision Laboratory, Dept. of Computer Science, University of Massachusetts Amherst

Problem and Motivation
Our goal: Read text from images of the everyday world.

Applications include:
•Aid to visually impaired
•Photo annotation
•Portable translation

Many new challenges are atypical of OCR page readers:
•Small sample (<50 chars)
•Perspective distortion
•Uneven lighting
•Unusual fonts
•Word segmentation
•Character segmentation

Previous work has assumed word boundaries, character segmentations, and/or known lexicon words. The conditions above make these assumptions problematic. We propose a model that integrates lexical decision and both word and character segmentation with recognition.

Semi-Markov Model

$$p(y \mid x; \vec{\theta}) = \frac{1}{Z(x)} \exp\left\{ U(y, x; \vec{\theta}) \right\}$$

We use a discriminatively trained semi-Markov model for recognition. Like a CRF, it models dependencies between states. It has the additional property of modeling the duration of a state, i.e., a character.

Model parameters $\vec{\theta}$ can be learned from labeled training data.

The exponent (U) is composed of functions that score a segmentation and labeling (y) of a text image (x) using:

•Appearance

•Bigrams

•Lexicon

$$P(\mathrm{TH} \mid \mathrm{English}) = \tfrac{39}{1000} \qquad P(\mathrm{QU} \mid \mathrm{English}) = \tfrac{1.4}{1000} \qquad P(\mathrm{QA} \mid \mathrm{English}) = \tfrac{0.0001}{1000}$$
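Such bigram statistics can enter the model as log-probability scores. A minimal sketch, where the table values are the poster's examples and the function name and smoothing floor are our own illustrative choices:

```python
import math

# Character-bigram frequencies per 1000 (example values from the poster).
BIGRAM_PER_1000 = {"TH": 39.0, "QU": 1.4, "QA": 0.0001}

def bigram_score(a, b, floor=1e-9):
    """Log-probability bigram score in the spirit of U_B.

    `floor` guards against unseen bigrams; it is an illustrative
    smoothing choice, not taken from the poster.
    """
    p = BIGRAM_PER_1000.get(a + b, 0.0) / 1000.0
    return math.log(max(p, floor))

print(bigram_score("T", "H"))  # approx. -3.2
print(bigram_score("Q", "A"))  # approx. -16.1
```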

Lexicon $\mathcal{L}$: a, Aaberg, …, zymurgy, Zyuganov

Parse Scoring
A text image is parsed, or divided into labeled segments. Parses may differ in the number of segments, and the states $y_i$ may differ in width (e.g., one parse with segments $y_1, \dots, y_4$ and another with $y_1, \dots, y_3$).
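Concretely, a parse can be represented as a list of labeled segments with explicit widths. A sketch; the `Segment` class and the example values are ours, not the poster's:

```python
from typing import NamedTuple

class Segment(NamedTuple):
    label: str   # character hypothesis for this state y_i
    start: int   # leftmost pixel column of the segment
    width: int   # duration of the state, i.e., character width in pixels

# Two competing parses of the same image: four narrow segments vs.
# three segments, one of them wider.
parse_a = [Segment("F", 0, 10), Segment("l", 10, 6),
           Segment("e", 16, 9), Segment("e", 25, 9)]
parse_b = [Segment("F", 0, 10), Segment("E", 10, 15), Segment("e", 25, 9)]
```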

Appearance
Every possible parse segment is scored by a discriminant $U_A(y_i, x)$ for character appearance and width.

Bigrams
Each pair of neighboring segments gets a bigram score $U_B(y_i, y_{i+1})$.

Lexicon
A character-invariant score $U_L$ replaces $U_B$ in sequences forming lexicon words, to promote known-word recognition.
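One simplified way to view this mixed lexical decision (our own reduction; the real system makes the choice inside the parse dynamic program) is to score a candidate word under both hypotheses and keep the better one:

```python
def word_score(chars, bigram_score, lexicon, u_l):
    """Score a candidate character sequence as a word.

    `u_l` is the character-invariant per-pair lexicon score; lexicon
    words may use it in place of the bigram scores.
    """
    open_score = sum(bigram_score(a, b) for a, b in zip(chars, chars[1:]))
    if "".join(chars) in lexicon:
        return max(open_score, u_l * (len(chars) - 1))
    return open_score
```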

Parse Gap/Overlap
Neighboring states may have a gap or an overlap, as in the case of ligatures. These are also scored, using image features, by $U_P(y_i, y_{i+1}, x)$.
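Collecting the components (a hedged reconstruction; the poster lists the terms above but we did not recover a printed combined formula), the exponent for a parse $y = (y_1, \dots, y_n)$ is a sum of per-segment and per-transition scores:

$$U(y, x; \vec{\theta}) = \sum_{i=1}^{n} U_A(y_i, x) + \sum_{i=1}^{n-1} \Big[ U_B(y_i, y_{i+1}) + U_P(y_i, y_{i+1}, x) \Big],$$

with $U_L$ substituted for $U_B$ along segment sequences that spell lexicon words.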

Interpretation Graph
All possible parses can be represented by a weighted graph. Dynamic programming can be used to find the optimal path, or the best interpretation of the image.
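The optimal path can be found with a semi-Markov Viterbi recursion over segment end positions and widths. A minimal sketch, assuming hypothetical scoring callbacks `u_a`, `u_b`, and `u_p` and a maximum character width; none of these names come from the poster:

```python
def best_parse(T, labels, u_a, u_b, u_p, max_width):
    """Semi-Markov Viterbi over a text image of width T pixels.

    u_a(label, start, width)     -> appearance/width score (U_A)
    u_b(prev_label, label)       -> bigram or lexicon score (U_B / U_L)
    u_p(prev_label, label, pos)  -> boundary score at position pos (U_P)

    For simplicity, segments here tile the image exactly; the full
    model also allows scored gaps and overlaps between neighbors.
    """
    NEG = float("-inf")
    # best[t][y]: score of the best parse of pixels [0, t) ending in label y
    best = [dict.fromkeys(labels, NEG) for _ in range(T + 1)]
    back = [dict.fromkeys(labels, None) for _ in range(T + 1)]
    for t in range(1, T + 1):                           # segment end
        for y in labels:                                # segment label
            for w in range(1, min(max_width, t) + 1):   # segment width
                start = t - w
                appearance = u_a(y, start, w)
                if start == 0:                          # first segment
                    if appearance > best[t][y]:
                        best[t][y], back[t][y] = appearance, (None, 0, w)
                    continue
                for yp in labels:                       # previous label
                    if best[start][yp] == NEG:
                        continue
                    s = (best[start][yp] + appearance
                         + u_b(yp, y) + u_p(yp, y, start))
                    if s > best[t][y]:
                        best[t][y], back[t][y] = s, (yp, start, w)
    # Trace back the optimal path through the interpretation graph.
    y = max(labels, key=lambda q: best[T][q])
    segments, t = [], T
    while t > 0:
        yp, start, w = back[t][y]
        segments.append((y, start, w))
        t, y = start, yp
    return segments[::-1]
```

Enumerating widths as well as labels is what lets character segmentation be deferred to recognition time; the recursion costs O(T · W · |labels|²) for maximum width W.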

Experiments
Training
•Appearance: Synthetic images generated from 934 fonts (see the sketch below)
•Bigrams: 82 books from Project Gutenberg
•Lexicon: 50th frequency percentile words from SCOWL
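For the appearance training data, character images can be rendered directly from font files. A minimal sketch using Pillow (our assumption; the poster does not name its rendering tools, and the font path is a placeholder):

```python
from PIL import Image, ImageDraw, ImageFont

def render_char(ch, font_path, size=32):
    """Render one character as a grayscale training image."""
    font = ImageFont.truetype(font_path, size)
    img = Image.new("L", (2 * size, 2 * size), color=255)  # white canvas
    ImageDraw.Draw(img).text((size // 2, size // 2), ch, font=font, fill=0)
    return img

# One (image, label) training pair per character per font, e.g.:
# render_char("A", "/path/to/font.ttf")
```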

Evaluation
85 sign images of 1,144 characters at ~12 pixel x-height.

(Chart: character error rate with no lexicon, a forced lexicon, and the mixed model.) Allowing words both in and out of the lexicon reduces error by 13%.

System                 Char. Err.
OmniPage               23.5%
OmniPage + Binarized   16.6%
Semi-Markov            15.0%

Above: Our model reduces error by 10%. Left: Examples.

Low Resolution
Integrated segmentation and recognition allow our model to recognize images at lower resolutions than it was trained on.

End-to-End
Input Image → Multiscale Detection → Text Region Candidates → Multiple Recognitions → Top-Scoring Output
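In code, the pipeline composes a detector with the recognizer and ranks interpretations by score. A hedged sketch; `detect_text_regions` and `recognize` are stand-in names, not the system's API:

```python
def read_scene(image, detect_text_regions, recognize):
    """Detect candidate text regions at multiple scales, recognize
    each with the semi-Markov model, and return outputs best-first."""
    results = []
    for region in detect_text_regions(image):   # multiscale detection
        score, text = recognize(region)         # multiple recognitions
        results.append((score, text, region))
    results.sort(key=lambda r: r[0], reverse=True)  # top-scoring output first
    return results
```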

(Figure: end-to-end examples comparing OmniPage detection and output with our top-scoring output on signs such as "LLOYDS BANK", "MILL ANTIQUES", "Fleet", and "FREE DELIVERY"; the OmniPage output is largely garbled, e.g., "MILL ANTIQUM" and "FIREE DELIVEp".)

With a text detector, we have a complete reading system.


(Examples: original image, binary image, OmniPage output, and Semi-Markov model output for several signs.)
OmniPage output: uucL,ass · LIBRFIRY · A),1HERbT TAVF · Free ehe in · EMI JUONEYFLO .010NIK EY · COFFEE Maus · Ella
Semi-Markov model output: Douglass · LIBRARY · AMHERST TAVERN · Free checking · CHURCH MONEYFLO MONKEY · COFFEE HOUSE · First · Fleet

(Figure: low-resolution examples, columns Image / Semi-Markov / OmniPage. As resolution decreases, the Semi-Markov model continues to read "Resumes" and "Jeffery Amherst" correctly, while OmniPage output degrades into garble such as "WM/ I WS", "krauts 11.-4her", "JONryAmhent", and "GkilaryAmhent".)