image understanding & web security henry baird joint work with: richard fateman, allison coates,...

Image Understanding & Web Security

Henry Baird

Joint work with:

Richard Fateman, Allison Coates, Kris Popat,

Monica Chew, Tom Breuel, & Mark Luk

DIAR, Madison, WI – June 21, 2003 (HSB)

A fast-emerging research topic

Human Interactive Proofs (HIPs; definition later):

– first instance in 1999

– research took hold in CS security theory field first

– intersects image understanding, cog sci, etc.

– fast attracting researchers, engineers, & users

This talk:

– A brief history of HIPs

– Existing systems, w/ my critiques

– Professional activities, so far, incl. the 1st Int’l Workshop

– In detail: PARC’s PessimalPrint & BaffleText

H. Baird & K. Popat, “Web Security & Document Image Analysis,” in J. Hu & A. Antonacopoulos (Eds.), Web Document Analysis, World Scientific, 2003 (in press).


Straws in the wind…

90’s: spammers trolling for email addresses

– in defense, people disguise them, e.g.

“baird AT parc DOT com”

1997: abuse of ‘Add-URL’ feature at AltaVista

– some write programs to add their URL many times

– skewed the search rankings

Andrei Broder et al (then at DEC SRC)

– a user action which is legitimate when performed once becomes abusive when repeated many times

– no effective legal recourse

– how to block or slow down these programs …


The first known instance…

Altavista’s AddURL filter

1999: “ransom note filter”

– randomly pick letters, fonts, rotations

– render as an image

– every user is required to read and type it in correctly

– reduced “spam add_URL” by “over 95%”
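The generate-and-grade loop of such a filter can be sketched as follows. This is a minimal illustration, not the AltaVista implementation: the font names, the rotation range, and the function names are assumptions, and the rendering step (drawing each glyph into an image) is omitted because it would need an imaging library.

```python
import random
import string

# Hypothetical font names, for illustration only.
FONTS = ["serif", "sans", "mono", "script"]


def make_challenge(rng, length=6, max_rotation_deg=30.0):
    """Pick random letters, each with its own font and rotation."""
    glyphs = []
    for _ in range(length):
        glyphs.append({
            "char": rng.choice(string.ascii_uppercase),
            "font": rng.choice(FONTS),
            "rotation_deg": rng.uniform(-max_rotation_deg, max_rotation_deg),
        })
    return glyphs


def answer_of(glyphs):
    """The string a human is expected to type."""
    return "".join(g["char"] for g in glyphs)


def grade(glyphs, typed):
    """Accept only an exact transcription (case-insensitive)."""
    return typed.strip().upper() == answer_of(glyphs)


challenge = make_challenge(random.Random())
print(answer_of(challenge), grade(challenge, answer_of(challenge)))
```

Only the rendered image would be shown to the user; the answer string stays on the server for grading.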

Weaknesses: isolated chars, filterable noise, affine deformations

M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z. Broder, “Method for Selectively Restricting Access to Computer Systems,” U.S. Patent No. 6,195,698, Filed April 13, 1998, Issued February 27, 2001.

An image of text, not ASCII


Yahoo!’s “Chat Room Problem”

September 2000

Udi Manber asked Prof. Manuel Blum’s group at CMU:

– programs impersonate people in chat rooms, then hand out ads – ugh!

– how can all machines be denied access to a Web site without inconveniencing any human users?

I.e., how to distinguish between machines and people on-line

… a kind of ‘Turing test’ !


Alan Turing (1912-1954)

1936 a universal model of computation

1940s helped break Enigma (U-boat) cipher

1949 first serious uses of a working computer

including plans to read printed text

(he expected it would be easy)

1950 proposed a test for machine intelligence


Turing’s Test for AI

How to judge that a machine can ‘think’:

– play an ‘imitation game’ conducted via teletypes

– a human judge & two invisible interlocutors: a human, and a machine ‘pretending’ to be human

– after asking any questions (challenges) he/she wishes, the judge decides which is human

– failure to decide correctly would be convincing evidence of machine intelligence (Turing asserted)

Modern GUIs invite richer challenges than teletypes….

A. Turing, “Computing Machinery & Intelligence,” Mind, Vol. 59(236), 1950.


“CAPTCHAs”: Completely Automated Public Turing Tests to Tell Computers & Humans Apart

challenges can be generated & graded automatically (i.e. the judge is a machine)

accepts virtually all humans, quickly & easily

rejects virtually all machines

resists automatic attack for many years (even assuming that its algorithms are known?)

NOTE: the machine administers, but cannot pass the test!

(M. Blum, L. A. von Ahn, J. Langford, et al, CMU-SCS)

L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw, Poland, May 4-8, 2003 [to appear].


CMU’s ‘Gimpy’ CAPTCHA

Randomly pick: English words, deformations, occlusions, backgrounds, etc

Challenge user to type in any three of the words

Designed by CMU team: tried out by Yahoo!

Problem: users hated it; Yahoo! withdrew it

L. von Ahn, M. Blum, N. J. Hopper, J. Langford, The CAPTCHA Web Page, http://www.captcha.net.


Yahoo!’s present CAPTCHA: “EZ-Gimpy”

Randomly pick: one English word, deformations, degradations, occlusions, colored backgrounds, etc

Better tolerated by users

Now used on a large scale to protect various services

Weaknesses: a single typeface, English lexicon


PayPal’s CAPTCHA

Nothing published

Seems to use a single typeface

Picks, at random: letters, overlain pattern

Weaknesses: single typeface, simple grid, no image degradations, characters spaced apart


Cropping up everywhere…

In use today, to defend against:

– skewing search-engine rankings (Altavista, 1999)

– infesting chat rooms, etc (Yahoo!, 2000)

– gaming financial accounts (PayPal, 2001)

– robot spamming (MailBlocks, SpamArrest 2002)

– In the last few months: Overture, Chinese website, HotMail, CD-rebate, TicketMaster, MailFrontier, Qurb, Madonnarama, …

…have you seen others?

On the horizon:

– ballot stuffing, password guessing, denial-of-service attacks

– ‘blunt force’ attacks (e.g. UT Austin break-in, Mar ’03)

– …many others

Similar problems w/ scrapers; also, likely on Intranets.

D. P. Baron, “eBay and Database Protection,” Case No. P-33, Case Writing Office, Stanford Graduate School of Business, Stanford Univ., 2001.


The Known Limits of Image Understanding Technology

There remains a large gap in ability between human and machine vision systems, even when reading printed text

Performance of OCR machines has been systematically studied: 7 year olds can consistently do better!

This ability gap has been mapped quantitatively

S. Rice, G. Nagy, T. Nartker, OCR: An Illustrated Guide to the Frontier, Kluwer Academic Publishers: 1999.


Image Degradation Modeling

Effects of printing & imaging:

[Figure: example degraded images; panels labeled blur, thrs, sens, thrs x blur]

We can generate challenging images pseudorandomly

H. Baird, “Document Image Defect Models,” in H. Baird, H. Bunke, & K. Yamamoto (Eds.), Structured Document Image Analysis, Springer-Verlag: New York, 1992.


Machine Accuracy is a Smooth Monotonic Function of Parameters

T. K. Ho & H. S. Baird, “Large Scale Simulation Studies in Image Pattern Recognition,” IEEE Trans. on PAMI, Vol. 19, No. 10, p. 1067-1079, October 1997.


Can You Read These Degraded Images?

Of course you can …. but OCR machines cannot!


Experiments by PARC & UCB-CS

Pick words at random:

– 70 words commonly used on the Web

– w/out ascenders or descenders (cf. Spitz)

Vary physics-based image degradation parameters:

blur, threshold, x-scale -- within certain ranges

Pick fonts at random from a large set:

Times Roman (TR), Times Italic (TI), Palatino Roman (PR), Palatino Italic (PI), Courier Roman (CR), Courier Oblique (CO), etc

Test legibility on:

– ten human volunteers (UC Berkeley CS Dept grad students)

– three OCR machines:

Expervision TR (E), ABBYY FineReader (A), IRIS Reader (I)


Results: OCR Accuracy, by machine

[Bar chart: fraction of words correct (0 to 1) for each OCR machine (Expervision, ABBYY, IRIS), broken down by typeface (Times R, Times I, Courier O, Palatino R, Palatino I) and in total]

Each machine has its peculiar blind spots


OCR Accuracy: varying blur & threshold

[Plot: fraction correct (0 to 1) vs. threshold (0.02 to 0.08), for Expervision (E) and ABBYY (A) at blur = 0.0, 0.4, and 0.8]

The machines share some blind spots


PessimalPrint: exploiting image degradations

Three OCR machines fail when:

– blur = 0.0 & threshold 0.02 - 0.08

– threshold = 0.02 & any value of blur

OCR outputs shown on the slide for these cases: ~~~.I~~~, ~~i1~~, N/A, N/A, N/A, ~~I~~

A. Coates, H. Baird, R. Fateman, “Pessimal Print: A Reverse Turing Test,” Proc. 6th IAPR Int’l Conf. On Doc. Anal. & Recogn. (ICDAR’01), Seattle, WA, Sep 10-13, 2001.

… but people find all these easy to read


High Time for a Workshop!

Manuel Blum proposes it, rounds up some key speakers

Henry Baird offers PARC as venue; Kris Popat helps run it

Goals:

Invite all known principals: theory, systems, engineers, users

Describe the state of the art

Plan next steps for the field

Organization:

– ~30 attendees

– abstracts only, 1-5 pages, no refereeing, no archival publication

– 100% participation: everyone gives a (short) talk

– “mixing it up”: panel & working group discussions

– 2-1/2 days, lots of breaks for informal socializing

– plenary talk by John McCarthy ‘Father of AI’


1st NSF Int’l Workshop on Human Interactive Proofs

PARC, Palo Alto, CA, January 9-11, 2002


HIP’2002 Participants

CMU - SCS, Aladdin Center: Manuel Blum, Lenore Blum, Luis von Ahn, John Langford, Guy Blelloch, Nick Hopper, Ke Yang, Brighten Godfrey, Bartosz Przydatek, Rachel Rue

PARC - SPIA/Security/Theory: Henry Baird, Kris Popat, Tom Breuel, Prateek Sarkar, Tom Berson, Dirk Balfanz, David Goldberg

UCB - CS & SIMS: Richard Fateman, Allison Coates, Jitendra Malik, Doug Tygar, Alma Whitten, Rachna Dhamija, Monica Chew, Adrian Perrig, Dawn Song

RPI: George Nagy

Stanford: John McCarthy

NSF: Robert Sloan

Altavista: Andrei Broder

Yahoo!: Udi Manber

Bell Labs: Dan Lopresti

IBM T.J. Watson: Charles Bennett

InterTrust Star Labs: Stuart Haber

City Univ. of Hong Kong: Nancy Chan

Weizmann Institute: Moni Naor

RSA Security Laboratories: Ari Juels

Document Recognition Techs, Inc: Larry Spitz


Variations & Generalizations

CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart

HUMANOID: Text-based dialogue which an individual can use to authenticate that he/she is himself/herself (‘naked in a glass bubble’)

PHONOID: Individual authentication using spoken language

Human Interactive Proof (HIP): An automatically administered challenge/response protocol allowing a person to authenticate him/herself as belonging to a certain group over a network without the burden of passwords, biometrics, mechanical aids, or special training.


Highlights of HIP’2002

Theory

– some text-based CAPTCHAs are provably breakable

Ability Gaps

– vision: gestalt, segmentation, noise immunity, style consistency

– speech: noise of many kinds, clutter (cocktail party effect)

– intelligence: puzzles, analogical reasoning, weak logic

– gestures, reflexes, common knowledge, …

Applications

– subtle system-level vulnerabilities

– aggressive arms race with shadowy enemies

http://www.parc.com/istl/groups/did/HIP2002


Funding & Partnerships

NSF

– Robert Sloan, Dir, Theory of Computing Pgm

– strongly supportive of this newborn field

– encouraged grant proposals

Yahoo!

– willing to run field trials

– user acceptance laboratory

– able to detect intrusion


Disciplines

Participating now: Cryptography, Security, Pattern Recognition, Computer Vision, Artificial Intelligence, eCommerce

Needed: Cognitive Science, Psychophysics (esp. of Reading), Biometrics, Business, Law, …?


Weaknesses of Existing Reading-Based CAPTCHAs

English lexicon is too predictable:

– dictionaries are too small

– only 1.2 bits of entropy per character (cf. Shannon)

Physics-based image degradations vulnerable to well-studied image restoration attacks, e.g.

Complex images irritate people

– even when they can read them

– need user-tolerance experiments
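A back-of-envelope calculation shows why the lexicon weakness matters. The sketch below applies the 1.2 bits/character figure quoted above to a hypothetical 8-letter word and compares it with 8 uniformly random lowercase letters. The word length is an assumption. The entropy rate is a long-run average for English text, so applying it to one short word is only a rough guide.

```python
import math

BITS_PER_CHAR_ENGLISH = 1.2            # figure quoted on the slide (cf. Shannon)
BITS_PER_CHAR_UNIFORM = math.log2(26)  # uniformly random lowercase letters
WORD_LENGTH = 8                        # hypothetical challenge length


def effective_search_space(bits_per_char, length):
    """Number of equally likely strings carrying the same total entropy."""
    return 2 ** (bits_per_char * length)


english_guesses = effective_search_space(BITS_PER_CHAR_ENGLISH, WORD_LENGTH)
uniform_guesses = effective_search_space(BITS_PER_CHAR_UNIFORM, WORD_LENGTH)

print(f"English-like word: about {english_guesses:.0f} candidates")
print(f"Random letters:    about {uniform_guesses:.3g} candidates")
```

Under these assumptions an English word offers an attacker a search space of under a thousand candidates, against roughly 2 x 10^11 for random letters.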


Strengths of Human Reading

Literature on the psychophysics of reading is relevant:

familiarity helps, e.g. English words

optimal word-image size (subtended angle) is known (0.3-2 degrees)

optimal contrast conditions known

other factors measured for the best performance: to achieve and sustain “critical reading speed”

BUT gives no answer to: where’s the optimal comfort zone?

G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M. Schleske, “Psychophysics of Reading: I. normal vision,” Vision Research 25(2), 1985.

A. J. Grainger & J. Segui, “Neighborhood Frequency Effects in Visual Word Recognition,” Perception & Psychophysics 47, 1990.


Designing a Stronger CAPTCHA: BaffleText principles

Nonsense words.

– generate ‘pronounceable’ – not ‘spellable’ – words using a variable-length character n-gram Markov model

– they look familiar, but aren’t in any lexicon, e.g.

ablithan wouquire quasis

Gestalt perception.

– force inference of a whole word-image from fragmentary or occluded characters, e.g.

– using a single familiar typeface also helps

M. Chew & H. S. Baird, “BaffleText: A Human Interactive Proof,” Proc., SPIE/IS&T Conf. on Document Recognition & Retrieval X, Santa Clara, CA, January 23-24, 2003.
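The nonsense-word principle can be illustrated with a small character-level Markov sampler. This is a simplified sketch, not the BaffleText generator: it uses a fixed-order model (two characters of context) rather than the variable-length model named above, and the training list `WORDS` is a tiny hypothetical stand-in for a real lexicon.

```python
import random
from collections import defaultdict

START, END = "^", "$"

# Hypothetical toy training set; a real system would train on a large lexicon.
WORDS = ["question", "quality", "require", "acquire", "within", "without",
         "although", "another", "habitat", "ability", "wounded", "basis",
         "quasar", "blithe", "enquire", "would"]


def train(words, order=2):
    """Record which character follows each `order`-character context."""
    model = defaultdict(list)
    for w in words:
        padded = START * order + w.lower() + END
        for i in range(order, len(padded)):
            model[padded[i - order:i]].append(padded[i])
    return model


def generate(model, rng, order=2, max_len=10):
    """Sample characters until the end marker or max_len is reached."""
    context, out = START * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)


def nonsense_word(model, lexicon, rng, order=2, min_len=5, max_len=10, tries=1000):
    """Resample until the word is long enough and absent from the lexicon."""
    for _ in range(tries):
        w = generate(model, rng, order, max_len)
        if len(w) >= min_len and w not in lexicon:
            return w
    return None  # no suitable word found within the try budget


model = train(WORDS)
print(nonsense_word(model, set(WORDS), random.Random()))
```

Because every character is drawn from contexts seen in real words, the output tends to look familiar, while the lexicon check rejects any sample that happens to be a real word.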


Mask Degradations

Parameters of pseudorandom mask generator:

– shape type: square, circle, ellipse, mixed

– density: black-area / whole-area

– range of radii of shapes
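A mask generator with these three parameters might look like the sketch below. It is an illustration under assumptions, not the PARC code: the function name, the grid representation, and the stopping rule (stamp shapes until the requested density is reached) are choices made here. How the mask is then combined with the word image is not covered.

```python
import random


def make_mask(width, height, rng, shape="circle", density=0.2, radii=(2, 5)):
    """Stamp random shapes on a grid until black-area / whole-area >= density.

    shape: "square", "circle", "ellipse", or "mixed".
    radii: inclusive (min, max) radius in pixels; min must be at least 1.
    density: target black fraction; must be below 1.
    """
    mask = [[False] * width for _ in range(height)]
    black, target = 0, density * width * height
    while black < target:
        kind = rng.choice(["square", "circle", "ellipse"]) if shape == "mixed" else shape
        cx, cy = rng.randrange(width), rng.randrange(height)
        rx = rng.randint(*radii)
        ry = rng.randint(*radii) if kind == "ellipse" else rx
        # Visit only the bounding box of the shape, clipped to the grid.
        for y in range(max(0, cy - ry), min(height, cy + ry + 1)):
            for x in range(max(0, cx - rx), min(width, cx + rx + 1)):
                if kind == "square":
                    inside = True
                else:
                    inside = ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
                if inside and not mask[y][x]:
                    mask[y][x] = True
                    black += 1
    return mask


mask = make_mask(40, 12, random.Random(), shape="mixed", density=0.25)
print("\n".join("".join("#" if px else "." for px in row) for row in mask))
```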


BaffleText Experiments at PARC

Goal: map the margins of accurate & comfortable human reading on this family of images

Metrics:

– objective difficulty: accuracy

– subjective difficulty: rating

– response time

– exit survey: how tolerable overall

Participation:

– 41 individual sessions

– >1200 challenge/response trials

– 18 exit surveys


BaffleText challenge webpage


BaffleText user ratings


User Acceptance

% Subjects willing to solve a BaffleText…

17% every time they send email

39% … if it cut spam by 10x

89% every time they register for an e-commerce site

94% … if it led to more trustworthy recommendations

100% every time they register for an email account

Out of 18 responses to the exit survey.


Subjective difficulty tracks objective difficulty


How to engineer BaffleText

When we generate a challenge,

– need to estimate its difficulty

– throw away if too easy or too hard

Apply an idea from the psychophysics of reading:

– image “complexity” metric: how hard to read

– simple to compute: perimeter² / black-area
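The complexity metric and the keep-or-discard step can be sketched as below, reading the metric as perimeter squared divided by black area. The perimeter estimate (counting pixel edges between black and non-black) is an assumption made here and may scale differently from the authors' measurement, so the default bounds, taken from the 50-100 range given on the engineering-guidelines slide, are illustrative rather than calibrated.

```python
def perimetric_complexity(img):
    """Return perimeter**2 / black_area for a binary image (rows of 0/1).

    Perimeter is estimated as the number of pixel edges separating a black
    pixel from a white pixel or from the image border.
    """
    h, w = len(img), len(img[0])
    area = perimeter = 0
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            area += 1
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not img[ny][nx]:
                    perimeter += 1
    if area == 0:
        raise ValueError("image has no black pixels")
    return perimeter ** 2 / area


def acceptable(img, low=50.0, high=100.0):
    """Keep a generated challenge only if its complexity is within bounds."""
    return low <= perimetric_complexity(img) <= high


solid_square = [[1] * 4 for _ in range(4)]   # compact blob: low complexity
thin_stroke = [[1] * 13]                     # long thin stroke: higher complexity
print(perimetric_complexity(solid_square), acceptable(solid_square))
print(perimetric_complexity(thin_stroke), acceptable(thin_stroke))
```

Compact blobs score low and thin or fragmented shapes score high, which is why the metric can serve as a cheap proxy for reading difficulty.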


Image complexity predicts objective difficulty


Image complexity predicts subjective difficulty


Engineering guidelines

For high performance, image complexity should fall in the range 50-100; e.g.

[Figure: example images labeled 50 and 100]

Within this regime, BaffleText performs well:

– 100% human subjects willing to try to read it

– 89% accuracy by humans

– 0% accuracy by commercial OCR

– 3.3 difficulty rating, out of 10 (on average)

– 8.7 seconds / trial on average


The latest serious (known or published) attack…

Greg Mori & Jitendra Malik (UCB-CS)

– Generalized Shape Context CV method

– requires known lexicon – else, fails completely

– expects known font (or fonts) – else, does worse

Results of Mori-Malik attacks (Dec 2002) given perfect foreknowledge of both lexicon and font:

CAPTCHA                        Attack success rate
EZ-GIMPY (Yahoo! + CMU)        83%
PessimalPrint (PARC + UCB)     40%
BaffleText (PARC + UCB)        25%

G. Mori & J. Malik, “Recognizing Objects in Adversarial Clutter,” submitted to CVPR’03, Madison, WI, June 16-22, 2003.


BaffleText: the strongest known CAPTCHA?

Resists many known algorithmic attacks:

– physics-based image restoration

– recognizing into a lexicon

– known-typeface targeting

– segmenting then recognizing

Exploits hard-to-automate human cognition powers:

– Gestalt perception

– “semi-linguistic” familiarity

– within-typeface “style consistency”


Recent Microsoft CAPTCHA

• Random strings, local space-warping; plus meaningless curving strokes, both black (overlaid) and white (erasing)

• Fielded Dec 2002 on Passport (HotMail, etc)

• Immediate reduction in new Hotmail accounts, with virtually no user complaints

P. Y. Simard, R. Szeliski, J. Benaloh, J. Couvreur, I. Calinov, “Using Character Recognition and Segmentation to Tell Computer from Humans,” Proc., Int’l Conf. on Document Analysis & Recognition, Edinburgh, Scotland, August, 2003 [to appear].


PARC’s Leadership in R&D on Reading-based CAPTCHAs

First refereed article on CAPTCHAs:

A. L. Coates, H. S. Baird, R. Fateman, “Pessimal Print: a Reverse Turing Test,” Proc., 6th IAPR Int’l Conf. On Document Analysis & Recognition, Seattle, WA, Sept. 10-13, 2001.

First professional HIP event, organized by PARC:

1st NSF Int’l Workshop on HIPs, Jan. 9-11, 2002, PARC, Palo Alto, CA.

First to ‘play both offense & defense’:

– builds high-performance OCR systems; attacks CAPTCHAs

– builds strong CAPTCHAs

First to validate using human-factors research:

– human-subject trials measuring both accuracy & tolerance

– PARC’s interdisciplinary tradition: social + computer sciences


The Arms Race

When will serious technical attacks be launched?

– ‘spam kings’ make $$ millions

– two spam-blocking e-commerce firms now use CAPTCHAs

How long can a CAPTCHA withstand attack?

– especially if its algorithms are published or guessed

Strategy: keep a pipeline of defenses in reserve:

– continuing partnership between R&D & users


Lots of Open Research Questions

What are the most intractable obstacles to machine vision?

segmentation, occlusion, degradations, …?

Under what conditions is human reading most robust?

linguistic & semantic context, Gestalt, style consistency…?

Where are ‘ability gaps’ located?

quantitatively, not just qualitatively

How to generate challenges strictly within ability gaps?

fully automatically

an indefinitely long sequence of distinct challenges


HIP Research Community

PARC CAPTCHA website

www.parc.com/istl/projects/captcha

HIP’2002 Workshop

www.parc.com/istl/groups/did/HIP2002

HIP Website at Aladdin Center, CMU-SCS

www.captcha.net

Volunteers for a PARC CAPTCHA usability test?

A 2nd HIP Workshop soon?


Alan Turing might have enjoyed the irony …

A technical problem – machine reading –

which he thought would be easy,

has resisted attack for 50 years, and

now allows the first widespread

practical use of variants of

his test for artificial intelligence.


Contact

Henry S. Baird

[email protected]

www.parc.com/baird
