
A Discriminative Semi-Markov Model for Robust Scene Text Recognition
Jerod J. Weinman, Erik Learned-Miller, and Allen R. Hanson

Computer Vision Laboratory, Dept. of Computer Science, University of Massachusetts Amherst

Problem and Motivation
Our goal: Read text from images of the everyday world.

Applications include:
•Aid to visually impaired
•Photo annotation
•Portable translation

Many new challenges are atypical of OCR page readers:
•Small sample (<50 chars)
•Perspective distortion
•Uneven lighting
•Unusual fonts
•Word segmentation
•Character segmentation

Previous work has assumed word boundaries, character segmentations, and/or known lexicon words. The conditions above make these assumptions problematic. We propose a model that integrates lexical decision and both word and character segmentation with recognition.

Semi-Markov Model

$$p(y \mid x; \vec{\theta}) = \frac{1}{Z(x)} \exp\left\{ U(y, x; \vec{\theta}) \right\}$$

We use a discriminatively trained semi-Markov model for recognition. Like a CRF, it models dependencies between states. It has the additional property of modeling the duration of a state, i.e., a character.

Model parameters $\vec{\theta}$ can be learned from labeled training data.

The exponent (U) is composed of functions that score a segmentation and labeling (y) of a text image (x) using:

•Appearance

•Bigrams

•Lexicon

$$P(\mathrm{TH} \mid \mathrm{English}) = \tfrac{39}{1000} \qquad P(\mathrm{QU} \mid \mathrm{English}) = \tfrac{1.4}{1000} \qquad P(\mathrm{QA} \mid \mathrm{English}) = \tfrac{0.0001}{1000}$$
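Such bigram statistics can enter the model as log-probability scores. A minimal sketch, where the table values are the poster's examples and the function name and smoothing floor are our own illustrative choices:

```python
import math

# Character-bigram frequencies per 1000 (example values from the poster).
BIGRAM_PER_1000 = {"TH": 39.0, "QU": 1.4, "QA": 0.0001}

def bigram_score(a, b, floor=1e-9):
    """Log-probability bigram score in the spirit of U_B.

    `floor` guards against unseen bigrams; it is an illustrative
    smoothing choice, not taken from the poster.
    """
    p = BIGRAM_PER_1000.get(a + b, 0.0) / 1000.0
    return math.log(max(p, floor))

print(bigram_score("T", "H"))  # approx. -3.2
print(bigram_score("Q", "A"))  # approx. -16.1
```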

Lexicon $\mathcal{L}$: a, Aaberg, …, zymurgy, Zyuganov

Parse Scoring
A text image is parsed, or divided into labeled segments. Parses may differ in the number of segments, and the states $y_i$ may differ in width (e.g., one parse with segments $y_1, \dots, y_4$ and another with $y_1, \dots, y_3$).
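Concretely, a parse can be represented as a list of labeled segments with explicit widths. A sketch; the `Segment` class and the example values are ours, not the poster's:

```python
from typing import NamedTuple

class Segment(NamedTuple):
    label: str   # character hypothesis for this state y_i
    start: int   # leftmost pixel column of the segment
    width: int   # duration of the state, i.e., character width in pixels

# Two competing parses of the same image: four narrow segments vs.
# three segments, one of them wider.
parse_a = [Segment("F", 0, 10), Segment("l", 10, 6),
           Segment("e", 16, 9), Segment("e", 25, 9)]
parse_b = [Segment("F", 0, 10), Segment("E", 10, 15), Segment("e", 25, 9)]
```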

Appearance
Every possible parse segment is scored by a discriminant $U_A(y_i, x)$ for character appearance and width.

Bigrams
Each pair of neighboring segments gets a bigram score $U_B(y_i, y_{i+1})$.

Lexicon
A character-invariant score $U_L$ replaces $U_B$ in sequences forming lexicon words, to promote known-word recognition.
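One simplified way to view this mixed lexical decision (our own reduction; the real system makes the choice inside the parse dynamic program) is to score a candidate word under both hypotheses and keep the better one:

```python
def word_score(chars, bigram_score, lexicon, u_l):
    """Score a candidate character sequence as a word.

    `u_l` is the character-invariant per-pair lexicon score; lexicon
    words may use it in place of the bigram scores.
    """
    open_score = sum(bigram_score(a, b) for a, b in zip(chars, chars[1:]))
    if "".join(chars) in lexicon:
        return max(open_score, u_l * (len(chars) - 1))
    return open_score
```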

Parse Gap/Overlap
Neighboring states may have a gap or an overlap, as in the case of ligatures. These are also scored, using image features, by $U_P(y_i, y_{i+1}, x)$.
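Collecting the components (a hedged reconstruction; the poster lists the terms above but we did not recover a printed combined formula), the exponent for a parse $y = (y_1, \dots, y_n)$ is a sum of per-segment and per-transition scores:

$$U(y, x; \vec{\theta}) = \sum_{i=1}^{n} U_A(y_i, x) + \sum_{i=1}^{n-1} \Big[ U_B(y_i, y_{i+1}) + U_P(y_i, y_{i+1}, x) \Big],$$

with $U_L$ substituted for $U_B$ along segment sequences that spell lexicon words.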

Interpretation Graph
All possible parses can be represented by a weighted graph. Dynamic programming can be used to find the optimal path, or the best interpretation of the image.
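The optimal path can be found with a semi-Markov Viterbi recursion over segment end positions and widths. A minimal sketch, assuming hypothetical scoring callbacks `u_a`, `u_b`, and `u_p` and a maximum character width; none of these names come from the poster:

```python
def best_parse(T, labels, u_a, u_b, u_p, max_width):
    """Semi-Markov Viterbi over a text image of width T pixels.

    u_a(label, start, width)     -> appearance/width score (U_A)
    u_b(prev_label, label)       -> bigram or lexicon score (U_B / U_L)
    u_p(prev_label, label, pos)  -> boundary score at position pos (U_P)

    For simplicity, segments here tile the image exactly; the full
    model also allows scored gaps and overlaps between neighbors.
    """
    NEG = float("-inf")
    # best[t][y]: score of the best parse of pixels [0, t) ending in label y
    best = [dict.fromkeys(labels, NEG) for _ in range(T + 1)]
    back = [dict.fromkeys(labels, None) for _ in range(T + 1)]
    for t in range(1, T + 1):                           # segment end
        for y in labels:                                # segment label
            for w in range(1, min(max_width, t) + 1):   # segment width
                start = t - w
                appearance = u_a(y, start, w)
                if start == 0:                          # first segment
                    if appearance > best[t][y]:
                        best[t][y], back[t][y] = appearance, (None, 0, w)
                    continue
                for yp in labels:                       # previous label
                    if best[start][yp] == NEG:
                        continue
                    s = (best[start][yp] + appearance
                         + u_b(yp, y) + u_p(yp, y, start))
                    if s > best[t][y]:
                        best[t][y], back[t][y] = s, (yp, start, w)
    # Trace back the optimal path through the interpretation graph.
    y = max(labels, key=lambda q: best[T][q])
    segments, t = [], T
    while t > 0:
        yp, start, w = back[t][y]
        segments.append((y, start, w))
        t, y = start, yp
    return segments[::-1]
```

Enumerating widths as well as labels is what lets character segmentation be deferred to recognition time; the recursion costs O(T · W · |labels|²) for maximum width W.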

Experiments
Training
•Appearance: Synthetic images generated from 934 fonts (see the sketch below)
•Bigrams: 82 books from Project Gutenberg
•Lexicon: 50th frequency percentile words from SCOWL
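For the appearance training data, character images can be rendered directly from font files. A minimal sketch using Pillow (our assumption; the poster does not name its rendering tools, and the font path is a placeholder):

```python
from PIL import Image, ImageDraw, ImageFont

def render_char(ch, font_path, size=32):
    """Render one character as a grayscale training image."""
    font = ImageFont.truetype(font_path, size)
    img = Image.new("L", (2 * size, 2 * size), color=255)  # white canvas
    ImageDraw.Draw(img).text((size // 2, size // 2), ch, font=font, fill=0)
    return img

# One (image, label) training pair per character per font, e.g.:
# render_char("A", "/path/to/font.ttf")
```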

Evaluation
85 sign images of 1,144 characters at ~12 pixel x-height.

(Chart: character error rate with no lexicon, a forced lexicon, and the mixed model.) Allowing words both in and out of the lexicon reduces error by 13%.

System                 Char. Err.
OmniPage               23.5%
OmniPage + Binarized   16.6%
Semi-Markov            15.0%

Above: Our model reduces error by 10%. Left: Examples.

Low Resolution
Integrated segmentation and recognition allow our model to recognize images at lower resolutions than it was trained on.

End-to-End
Input Image → Multiscale Detection → Text Region Candidates → Multiple Recognitions → Top-Scoring Output
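In code, the pipeline composes a detector with the recognizer and ranks interpretations by score. A hedged sketch; `detect_text_regions` and `recognize` are stand-in names, not the system's API:

```python
def read_scene(image, detect_text_regions, recognize):
    """Detect candidate text regions at multiple scales, recognize
    each with the semi-Markov model, and return outputs best-first."""
    results = []
    for region in detect_text_regions(image):   # multiscale detection
        score, text = recognize(region)         # multiple recognitions
        results.append((score, text, region))
    results.sort(key=lambda r: r[0], reverse=True)  # top-scoring output first
    return results
```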

(Figure: end-to-end examples comparing OmniPage detection and output with our top-scoring output on signs such as "LLOYDS BANK", "MILL ANTIQUES", "Fleet", and "FREE DELIVERY"; the OmniPage output is largely garbled, e.g., "MILL ANTIQUM" and "FIREE DELIVEp".)

With a text detector, we have a complete reading system.


(Examples: original image, binary image, OmniPage output, and Semi-Markov model output for several signs.)
OmniPage output: uucL,ass · LIBRFIRY · A),1HERbT TAVF · Free ehe in · EMI JUONEYFLO .010NIK EY · COFFEE Maus · Ella
Semi-Markov model output: Douglass · LIBRARY · AMHERST TAVERN · Free checking · CHURCH MONEYFLO MONKEY · COFFEE HOUSE · First · Fleet

(Figure: low-resolution examples, columns Image / Semi-Markov / OmniPage. As resolution decreases, the Semi-Markov model continues to read "Resumes" and "Jeffery Amherst" correctly, while OmniPage output degrades into garble such as "WM/ I WS", "krauts 11.-4her", "JONryAmhent", and "GkilaryAmhent".)