From Anthrax to ZIP Codes - The Handwriting is on the Wall. Venu Govindaraju, Dept. of Computer Science & Engineering, University at Buffalo, venu@cedar.buffalo.edu

Post on 24-Jan-2016


TRANSCRIPT

Page 1: From Anthrax to ZIP Codes- The Handwriting is on the Wall Venu Govindaraju Dept. of Computer Science & Engineering University at Buffalo venu@cedar.buffalo.edu

From Anthrax to ZIP Codes - The Handwriting is on the Wall

Venu Govindaraju
Dept. of Computer Science & Engineering
University at Buffalo
venu@cedar.buffalo.edu

Page 2:

Outline

• Success in Postal Application
• Role of Handwriting Recognition
• Recognition Models
• Interactive Cognitive Models
• New Research Areas
• Other Applications

Page 3:

USPS HWAI Background

• Postal Sponsorship Started – 1984
• 370 Academic Articles Published
• Millions of Letters Examined
• Many Experimental Systems Built and Tested
• Migrated from Hardware to Software System
• Only Postal Research Continuously Funded

Page 4:

Pattern Recognition Tasks

Items to be Recognized, Read, and Evaluated (Machine Printed and Script)

• Delivery address, sender's address, endorsements
• Linear Codes, Mail Class Indicia (2D-Codes, Meter Marks)

[Figure: envelope annotated with Meter Mark, Sender's Address, Delivery Address, Linear Code, Digital Post Mark, and Endorsement ("In Case of Undeliverable as Addressed Return to Sender")]

Page 5:

Deployed...

USA
• 250 P&DC sites
• 27 Remote Encoding Centers
• 25 Billion Images Processed Annually
• 89% Automated Bar-coding

UK
• 67 Processing Centers
• 27 Million Pieces Per Day, 9.7 Million Pieces Per Hour Peak

Australia

Page 6:

RCR Overview

[Figure: mail-flow diagram with the labels Advanced Facer Canceler, Multi-Line OCR, Image, RCR, Remote Encoding, and Bar Code Sorter]

Page 7:

At the Right Price

Processing Type    Cost/1000 Pieces
Manual             $47.78
Mechanized         $27.46
Automated          $5.30

Page 8:

80% encode rate and counting!

[Chart: Handwriting Encode Rate; x-axis: Date; y-axis: Encode Rate, 0% to 80%]

Page 9:

Impact

• Applications of CEDAR research are helping to automate tasks at IRS and USPS
• In the 1st year that USPS used CEDAR-developed software to read handwritten addresses on envelopes, it saved $100 million
• With the 1997-1999 deployment of CEDAR-developed RCRs, USPS saved 12 million work hours and over $340 million
• 500 scientific publications and 10 patents

Page 10:

Outline

• Success in Postal Application
• Role of Handwriting Recognition
• Recognition Models
• Interactive Cognitive Models
• New Research Areas
• Other Applications

Page 11:

Role of Handwriting Recognition in Address Interpretation

Page 12:

Context Provided by Postal Directories

• <ZIP Code, Primary Number>
  – Create street name lexicon
• Example: <06478, 110>
  – DPF yields 8 street names
  – ZIP+4 yields 31 street names (on average about 5 times more)

HAWLEY RD 1034
NEWGATE RD 1533
BEE MOUNTAIN RD 1615
DORMAN RD 1642
BOWERS HILL RD 1757
FREEMAN RD 1781
PUNKUP RD 1784
PARK RD 6124
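The lookup described on this slide can be sketched as an index from <ZIP Code, primary number> to street names. This is only an illustration: the `DeliveryPoint` type, its field names, and the sample rows are hypothetical, since the slide does not describe the real DPF record layout.

```python
from collections import defaultdict
from typing import NamedTuple


class DeliveryPoint(NamedTuple):
    """Hypothetical, simplified directory record (field names are assumptions)."""
    zip_code: str
    primary_number: str
    street_name: str


def build_index(records):
    """Map (ZIP Code, primary number) to the set of matching street names."""
    index = defaultdict(set)
    for rec in records:
        index[(rec.zip_code, rec.primary_number)].add(rec.street_name)
    return index


def street_lexicon(index, zip_code, primary_number):
    """Return the sorted street-name lexicon for one query key."""
    return sorted(index.get((zip_code, primary_number), set()))


# Illustrative rows only: the street names appear on the slide, but their
# pairing with primary numbers is invented for this example.
records = [
    DeliveryPoint("06478", "110", "HAWLEY RD"),
    DeliveryPoint("06478", "110", "PARK RD"),
    DeliveryPoint("06478", "110", "PARK RD"),   # second unit at the same address
    DeliveryPoint("06478", "12", "DORMAN RD"),
]
index = build_index(records)
lexicon = street_lexicon(index, "06478", "110")
```

The resulting list is the lexicon handed to the word recognizer, so a smaller list means an easier recognition problem.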

Page 13:

Context (CEDAR)

• One record per delivery point in USA
• Provided weekly by USPS, San Mateo
• Raw DPF
  – 138 million records
  – 15 GB (114 bytes per record); 41,889 ZIP Code files
• Fields of interest to HWAI
  – ZIP Code, street name, primary number, secondary number, add-on

Page 14:

Power of Context (CEDAR)

• ZIP Code
  – 30% of ZIP Codes contain a single street name
  – 5% of ZIP Codes contain a single primary number
  – 2% of ZIP Codes contain a single add-on
• <ZIP Code, primary number>
  – Maximum number of records returned is 3,071
• <ZIP Code, add-on>
  – Maximum number of records returned is 3,070

Page 15:

Outline

• Success in Postal Application
• Role of Handwriting Recognition
• Recognition Models
• Interactive Cognitive Models
• New Research Areas
• Other Applications

Page 16:

Handwriting Recognition

Context Ranked Lexicon

Page 17:

Multiple Choice Question

Context Ranked Lexicon

Page 18:

Lexicon Driven Model

[Figure: word image cut at segmentation points 1 to 9, with a lattice of character-to-image distances for the lexicon entry 'word', such as w[7.6], w[7.2], w[5.0], o[6.6], r[3.8], d[4.4]]

Find the best way of accounting for the characters 'w', 'o', 'r', 'd' by consuming all segments 1 to 8 in the process.

Distance between the first character 'w' of the lexicon entry 'word' and the image between:
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
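The matching step described above can be sketched as a small dynamic program. This is one plausible reading of the slide, not the deployed recognizer: it assumes cut points numbered 1 to 9 delimiting 8 segments, at most 4 cut-point steps per character, and a total cost equal to the sum of per-character distances. Only the three 'w' distances come from the slide; all other table entries and the default penalty are hypothetical.

```python
import math


def match_word(word, n_segments, dist, max_span=4):
    """Minimum total distance for matching `word` against all segments.

    dist(ch, start, end) is the distance between character `ch` and the
    image between cut points start and end (start < end). Cut points are
    numbered 1 .. n_segments + 1.
    """
    last = n_segments + 1
    # best[i][j]: cheapest way to explain the first i characters
    # with the image from cut point 1 up to cut point j.
    best = [[math.inf] * (last + 1) for _ in range(len(word) + 1)]
    best[0][1] = 0.0
    for i, ch in enumerate(word, start=1):
        for end in range(2, last + 1):
            for start in range(max(1, end - max_span), end):
                if best[i - 1][start] < math.inf:
                    cost = best[i - 1][start] + dist(ch, start, end)
                    best[i][end] = min(best[i][end], cost)
    return best[len(word)][last]


TOY_DIST = {
    ("w", 1, 4): 5.0, ("w", 1, 3): 7.2, ("w", 1, 2): 7.6,  # values from the slide
    ("o", 4, 6): 6.0, ("o", 3, 5): 7.2,                    # hypothetical
    ("r", 6, 7): 3.8, ("r", 5, 7): 6.3,                    # hypothetical
    ("d", 7, 9): 4.4,                                      # hypothetical
}


def toy_dist(ch, start, end):
    """Look up a toy distance; unlisted spans get a hypothetical penalty."""
    return TOY_DIST.get((ch, start, end), 20.0)


best_cost = match_word("word", 8, toy_dist)
```

Running the same routine for every lexicon entry and sorting by `best_cost` would give a ranked lexicon.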

Page 19:

Lexicon Free Model

[Figure: segmentation graph over segments 1 to 8; edges carry character hypotheses with confidences, such as i[.8], l[.8], u[.5], v[.2], w[.7], w[.6], m[.3], r[.4], o[.5], d[.8]]

- Image from segment 1 to 3 is a 'u' with 0.5 confidence
- Image from segment 1 to 4 is a 'w' with 0.7 confidence
- Image from segment 1 to 5 is a 'w' with 0.6 confidence and an 'm' with 0.3 confidence

Find the best path in the graph from segment 1 to 8: w o r d
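A best-path search over such a graph might look like the sketch below. It assumes that a path is scored by the product of its edge confidences, which the slide does not state. The edges leaving segment 1 use confidences quoted on the slide; the positions of the remaining edges are hypothetical.

```python
import math


def best_path(edges, source, target):
    """Most confident character string from `source` to `target`.

    edges: iterable of (start, end, char, confidence) with start < end.
    A path is scored by the product of its confidences (an assumption).
    Returns (log_score, text), or None if the target is unreachable.
    """
    best = {source: (0.0, "")}
    # Sorting by start node processes every edge into a node before any
    # edge out of it, because all edges point forward.
    for start, end, ch, conf in sorted(edges):
        if start not in best:
            continue
        score = best[start][0] + math.log(conf)
        if end not in best or score > best[end][0]:
            best[end] = (score, best[start][1] + ch)
    return best.get(target)


edges = [
    (1, 4, "w", 0.7), (1, 5, "w", 0.6), (1, 5, "m", 0.3),  # from the slide
    (4, 6, "o", 0.5), (5, 6, "o", 0.3),                    # hypothetical positions
    (6, 7, "r", 0.4), (7, 8, "d", 0.8),                    # hypothetical positions
]
result = best_path(edges, 1, 8)
```

Unlike the lexicon-driven model, no word list is consulted here; a lexicon could still be applied afterwards to the decoded string.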

Page 20:

Holistic Features

[Figure labels: Slant Norm, Turn Points, Position Grid and gaps, Ascender, Descender, Reference Lines]

Page 21:

Lexicon Reduction and Verification

Page 22:

Outline

• Success in Postal Application
• Role of Handwriting Recognition
• Recognition Models
• Interactive Cognitive Models
• New Research Areas
• Other Applications

Page 23:

Grapheme Models

Page 24:

Structural Features

BAG

[Figure labels: Junction, Loops, Loop, Turns, End, End]

Page 25:

Feature Extraction and Ordering

Critical node: a node whose removal disconnects a connected component.

Critical nodes of degree 2 keep the feature ordering from left to right.

[Figure labels: Left Component, Right Component, Loop, End, Turns, Junction, Loops, End, Turns]
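The critical nodes defined on this slide correspond to what graph theory calls articulation points. A minimal sketch of finding them with a depth-first search follows. The graph representation and the toy stroke graph are hypothetical, and the original feature extractor may work differently.

```python
def critical_nodes(adj):
    """Return the articulation points of a simple undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    """
    order, low, critical = {}, {}, set()

    def dfs(node, parent):
        order[node] = low[node] = len(order)
        children = 0
        for nxt in adj[node]:
            if nxt == parent:
                continue
            if nxt in order:                      # back edge to an ancestor
                low[node] = min(low[node], order[nxt])
            else:                                 # tree edge
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # The subtree below nxt cannot reach above node without it.
                if parent is not None and low[nxt] >= order[node]:
                    critical.add(node)
        if parent is None and children > 1:       # root with several subtrees
            critical.add(node)

    for start in adj:
        if start not in order:
            dfs(start, None)
    return critical


# Hypothetical stroke graph: a loop a-b-c followed by a tail c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
critical = critical_nodes(adj)
two_degree_critical = {n for n in critical if len(adj[n]) == 2}
```

In this toy graph, removing `d` splits the stroke into a left part and a right part, which is the property the slide uses to order features.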

Page 26:

Continuous Attributes

Attributes of a grapheme: pos, orientation, angle

[Figure labels: Down cusp (3.0, -90°), Up loop, Down arc]

Page 27:

Stochastic Model

Page 28:

Observations

Page 29:

Results

Lex size   Top   WMR %   SM CA %
10         1     96.86   96.56
           2     98.80   98.77
100        1     91.36   89.12
           2     95.30   94.06
1000       1     79.58   75.38
           2     88.29   86.29
20000      1     62.43   58.14
           2     71.07   66.49

Page 30:

Interactive Models [McClelland and Rumelhart, Psychological Review, 1981]

[Figure: three interacting levels]
Words: ABLE, TRIP, TRAP
Letters: A, T, N
Features

Page 31:

Interactive Recognition

Features: T-crossings, loops, ascenders, descenders, length

Lexicon 1: West Central Street, West Main Street, Sunset Avenue
Lexicon 2: West Central Street, East Central Street, Sunset Avenue
Lexicon 3: West Central Street, West Central Avenue, Sunset Avenue

[Figure labels: Interactive Model, features, image]

Page 32:

Adaptive Character Recognition [Park and Govindaraju, IEEE CVPR 2000]

• Adaptive selection of features
• Adaptive number of features
• Adaptive resolutions
• Adaptive sequencing of features
• Adaptive termination conditions

Page 33:

Features

4 gradient features

5 moment features

Vector code book

Page 34:

Feature Space

|V| x |Nc| x |Ixy|

2^9 x 10 x 85 (quad tree, 4 levels)

Recognition rate and feature |V|; GSC: |V| = 2^512

Tradeoffs: space vs accuracy

Hierarchical space with additional resolution and features as needed
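The factor 85 matches the number of nodes in a full quad tree with 4 levels (1 + 4 + 16 + 64). A short sketch that counts and enumerates those sub-image cells follows. The `(level, row, col)` addressing is an assumption made for the illustration.

```python
def quadtree_node_count(levels):
    """Number of nodes in a full quad tree with the given number of levels."""
    return sum(4 ** level for level in range(levels))


def quadtree_cells(levels):
    """Yield (level, row, col) for every sub-image cell, coarse to fine."""
    for level in range(levels):
        side = 2 ** level          # the image is split into side x side cells
        for row in range(side):
            for col in range(side):
                yield (level, row, col)


cells = list(quadtree_cells(4))
```

Level 0 is the whole character image, and each deeper level quarters every cell of the level above it.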

Page 35:

Active Recognition Using Quad Trees

Page 36:

Experimental Results

Page 37:

Page 38:

Results

Classifier      Active Model   Neural Net   KNN
Top 1 (%)       95.7           96.4         95.7
Templates       612            976          3,777
Msec/char       1.45           11.5         384
Training hrs    1              24           1

25,656 training and 12,242 test (Postal + NIST)

Page 39:

Outline

• Success in Postal Application
• Role of Handwriting Recognition
• Recognition Models
• Interactive Cognitive Models
• New Research Areas
• Other Applications

Page 40:

Fast Recognition

- Reuse matched characters
- Reuse matched sub-strings
- Parallel processing
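One way to picture the reuse of matched sub-strings is prefix sharing: lexicon entries that begin the same way need their common prefix matched only once, as in a trie. The sketch below only counts the character matches saved. It assumes that the cost of matching a character depends on its prefix alone, and it is not the speed-up method of the original system.

```python
def count_character_matches(lexicon):
    """Compare matching work with and without prefix sharing.

    Returns (naive, shared): `naive` matches every character of every
    entry; `shared` matches each distinct prefix once (one trie node each).
    """
    naive = sum(len(entry) for entry in lexicon)
    prefixes = {entry[:i] for entry in lexicon for i in range(1, len(entry) + 1)}
    return naive, len(prefixes)


street_names = ["WEST CENTRAL STREET", "WEST MAIN STREET", "SUNSET AVENUE"]
naive, shared = count_character_matches(street_names)
```

Here the only shared prefix is `WEST ` (five characters), so the saving is small. Lexicons with many entries that start alike would save more.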

Page 41:

Combination and Dynamic Selection [Govindaraju and Ianakiev, MCS 2000]

[Figure: image and lexicon passing through word recognizers WR 1, WR 2, and WR 3, with lexicon reduction between stages; labels: Top 50, <55, Top 5, 1]

• Optimization problem
• Combinatorial explosion in
  – arrangement of recognizers
  – lexicon reduction levels
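A serial arrangement with lexicon reduction can be sketched as a cascade in which each recognizer keeps only its best few candidates for the next stage. The two toy recognizers and the use of a string as a stand-in for the image are hypothetical. Choosing the order of recognizers and the `keep` values is the optimization problem named on the slide, and the sketch does not attempt it.

```python
def cascade(image, lexicon, stages):
    """Run recognizers in sequence, shrinking the lexicon after each stage.

    stages: list of (recognizer, keep) pairs. recognizer(image, word)
    returns a distance (lower is better); `keep` entries survive the stage.
    """
    candidates = list(lexicon)
    for recognizer, keep in stages:
        candidates.sort(key=lambda word: recognizer(image, word))
        candidates = candidates[:keep]
    return candidates


def length_gap(image, word):
    """Toy fast recognizer: compares lengths only."""
    return abs(len(image) - len(word))


def mismatches(image, word):
    """Toy slow recognizer: position-wise mismatches plus the length gap."""
    return sum(a != b for a, b in zip(image, word)) + abs(len(image) - len(word))


words = ["me", "memo", "memory", "memoirs", "mellon"]
survivors = cascade("memory", words, [(length_gap, 3), (mismatches, 1)])
```

The cheap recognizer prunes the list so that the expensive one runs on three entries instead of five.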

Page 42:

Lexicon Density [Govindaraju, Slavik, and Xue, IEEE PAMI 2002]

Lexicon 1   Lexicon 2
Me          Me
He          Memo
So          Memory
To          Memoirs
In          Mellon
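To make the idea of distance between lexicon entries concrete, the sketch below computes the average pairwise edit distance of the two lexicons. Plain Levenshtein distance is used only as a stand-in. The density measure of the cited paper is not reproduced here and may weigh differences according to the recognizer.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def average_distance(lexicon):
    """Mean edit distance over all unordered pairs of entries."""
    pairs = list(combinations(lexicon, 2))
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)


lexicon_1 = ["Me", "He", "So", "To", "In"]
lexicon_2 = ["Me", "Memo", "Memory", "Memoirs", "Mellon"]
avg_1 = average_distance(lexicon_1)
avg_2 = average_distance(lexicon_2)
```

Both lexicons have five entries, so lexicon size alone cannot tell them apart. A distance-based measure can.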

Page 43:

Classifier Performance Prediction [Xue and Govindaraju, IEEE PAMI 2002]

q: probability that the recognizer makes a unit-distance error

D: average distance between any two words in the lexicon

n: lexicon size; p: performance; a, k: model parameters

ln(-ln p) = (ln q) D + a ln ln n + ln k
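Solving the relation for p gives p = exp(-k q^D (ln n)^a). The sketch below evaluates it. The parameter values in the example are arbitrary placeholders, not fitted values from the cited paper.

```python
import math


def predicted_performance(q, D, n, a, k):
    """Solve ln(-ln p) = (ln q) D + a ln ln n + ln k for p.

    Requires 0 < q, k > 0 and n > 1 so that every logarithm is defined.
    """
    log_neg_log_p = math.log(q) * D + a * math.log(math.log(n)) + math.log(k)
    return math.exp(-math.exp(log_neg_log_p))


# Placeholder parameters, chosen only to exercise the formula.
p_small = predicted_performance(q=0.5, D=2.0, n=100, a=1.0, k=1.0)
p_large = predicted_performance(q=0.5, D=2.0, n=20000, a=1.0, k=1.0)
```

With a > 0, the predicted performance falls as the lexicon grows, and with q < 1 it rises as the average distance D between entries increases.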

Page 44:

Outline

• Success in Postal Application
• Role of Handwriting Recognition
• Recognition Models
• Interactive Cognitive Models
• New Research Areas
• Other Applications

Page 45:

Bank Check Recognition

Page 46:

PCR Trend Analysis

Page 47:

NYS EMS PCR Form

NYS PCR Example

Thousands are filed a day. Passed from EMS to Hospital.

PCR Purpose:
– Medical care/diagnosis
– Legal Documentation
– Quality Assurance

EMS Abbreviations
COPD   Chronic Obstructive Pulmonary Disease
CHF    Congestive Heart Failure
D/S    Dextrose in Saline
PID    Pelvic Inflammatory Disease
GSW    Gunshot Wound
NKA    No known allergies
KVO    Keep vein open
NaCl   Sodium Chloride

Page 48:

Medical Text Recognition and Data Mining

Page 49:

Reading Census Forms

Lexicon Anomalies

Space: “sales man” and “salesman”

Morphology: “acct manager” and “account management”

Abbreviation

Plural: “school” and “schools”

Typographical: “managar” and “manager”
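Some of these anomalies can be absorbed by normalizing entries before lookup. The sketch below handles the space, plural, and typographical cases with Python's standard `difflib`. The normalization rules and the 0.8 cutoff are assumptions, and the morphology and abbreviation cases are left out.

```python
import difflib


def normalize(entry):
    """Fold case and drop spaces so 'sales man' and 'salesman' coincide."""
    return entry.lower().replace(" ", "")


def match_entry(text, lexicon, cutoff=0.8):
    """Return the lexicon entry that best explains `text`, or None."""
    by_key = {normalize(word): word for word in lexicon}
    key = normalize(text)
    if key in by_key:                                  # space anomaly
        return by_key[key]
    if key.endswith("s") and key[:-1] in by_key:       # naive plural handling
        return by_key[key[:-1]]
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None         # typographical anomaly


occupations = ["salesman", "school", "manager"]
```

For example, `match_entry("sales man", occupations)` returns `"salesman"` and `match_entry("managar", occupations)` returns `"manager"`.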

Page 50:

Binarization

Page 51:

Historic Manuscripts

Page 52:

Summary

• Handwriting recognition technology
• Pattern recognition task
• Lexicon holds domain specific knowledge
• Adaptive methods
• Classifier combination methods
• Many applications