Writer identification through information retrieval

Writer identification through information retrieval. Ralph Niels, Franc Grootjen & Louis Vuurpijl. August 21st, 2008, ICFHR, Montreal.



Page 1: Writer identification through information retrieval

Writer identification through information retrieval

Ralph Niels, Franc Grootjen & Louis Vuurpijl

August 21st, 2008, ICFHR, Montreal

Page 2: Writer identification through information retrieval

A search engine for forensic experts

Writer identification through information retrieval

Ralph Niels, Franc Grootjen, Louis Vuurpijl

Page 3: Writer identification through information retrieval

Overview
• Forensic writer identification
• Prototypical shapes in handwriting
• Information retrieval (IR)
  • Traditional
  • Writer identification using prototypes
• Experiments
  • Method
  • Results
• Conclusions & future work


Page 4: Writer identification through information retrieval

Forensic writer identification


Page 5: Writer identification through information retrieval

Forensic information retrieval
• Web search: query of words to search in documents containing words
• Forensic search: query of characters to search in documents containing characters
• Previous work*: sub-character level, binary features
• Based on characters: improves justification possibilities


* A. Bensefia, T. Paquet, and L. Heutte. A writer identification and verification system. Pattern Recognition Letters, 26(13):2080–2092, 2005.

Page 6: Writer identification through information retrieval

Forensic information retrieval
• Dictionary of character shapes: prototypes
  – Experts use prototypes
  – Describe query & documents by prototype usage

[Figure: prototypes, each shown with instances of that prototype]

Page 7: Writer identification through information retrieval

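Character to prototype matcher
• Find most similar prototype for each character

Example (each character labelled with its most similar prototype): W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12 (…)

[Figure: prototypes for the letter a: a5, a9, a16, a52 (…)]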
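The labelled sequence on this slide can be reduced to a count of how often each prototype is used, which is the kind of usage vector the later slides call af. A minimal Python sketch, assuming the matcher output is available as whitespace-separated labels; the function name `prototype_frequencies` is hypothetical and not taken from the slides:

```python
from collections import Counter

def prototype_frequencies(labels):
    """Count how often each prototype label occurs in a matched text."""
    return Counter(labels)

# Labels copied from the slide's example; the elided tail "(…)" is left out.
matched = "W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12".split()
af = prototype_frequencies(matched)
```

A `Counter` returns 0 for prototypes that never occur, which is convenient when vectors of different writers are compared later.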


Page 8: Writer identification through information retrieval

Prototypes
• Averaged shapes of real handwritten characters
• Dynamic Time Warping distance to find most similar prototype
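Nearest-prototype lookup with Dynamic Time Warping can be sketched as below. This is the textbook DTW recurrence with Euclidean point distances, not necessarily the variant used by the authors; the trajectories, the labels in the toy dictionary, and the function names are illustrative assumptions.

```python
import math

def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two 2-D point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = step + min(cost[i - 1][j],      # repeat point of b
                                    cost[i][j - 1],      # repeat point of a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def nearest_prototype(character, prototypes):
    """Return the label of the prototype with the smallest DTW distance."""
    return min(prototypes, key=lambda label: dtw_distance(character, prototypes[label]))

# Toy pen trajectories with made-up coordinates, for illustration only.
prototypes = {
    "a5": [(0, 0), (1, 1), (2, 0)],
    "a9": [(0, 0), (1, 0), (2, 0)],
}
sample = [(0, 0), (0.5, 0.1), (1, 0), (2, 0.1)]
best = nearest_prototype(sample, prototypes)
```

The quadratic table keeps the sketch short; a production matcher would likely add a warping window or other constraints to limit cost.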


R. Niels, L. Vuurpijl, and L. Schomaker. Automatic allograph matching in forensic writer identification. International Journal of Pattern Recognition and Artificial Intelligence, 21(1):61–81, February 2007.


Page 9: Writer identification through information retrieval

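The IR model for writer identification

[Diagram: writer input and query input each pass through a character to prototype matcher, which uses the prototype list. Writer input yields af(w), which indexing turns into aw(w); query input yields af(q). Matching compares af(q) with aw(w) and produces a ranked list and a justification.]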


Page 10: Writer identification through information retrieval

Indexing: create weighted vectors
• Vector of prototype usage for each writer: af(w)
• Adjust weight of prototypes in that vector:
  • Prototypes used by many writers: not distinctive -> lower weight
  • wf(p) = number of writers using prototype p
  • $iwf(p) = \log_2\left(\frac{n}{wf(p)}\right)$
• Weighted vector of prototype use for each writer:
  $aw(w)_p = af(w)_p \cdot iwf(p)$
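The indexing step can be sketched in a few lines of Python. The sketch assumes that n is the number of indexed writers (the slide does not define n) and that usage vectors are stored as dictionaries from prototype label to count; the writer names and counts are made up.

```python
import math
from collections import Counter

def inverse_writer_frequency(af_by_writer):
    """iwf(p) = log2(n / wf(p)), with wf(p) the number of writers using p."""
    n = len(af_by_writer)  # assumption: n is the number of indexed writers
    wf = Counter()
    for counts in af_by_writer.values():
        wf.update(p for p, c in counts.items() if c > 0)
    return {p: math.log2(n / wf[p]) for p in wf}

def weighted_vectors(af_by_writer):
    """aw(w)_p = af(w)_p * iwf(p) for every writer w and prototype p."""
    iwf = inverse_writer_frequency(af_by_writer)
    return {w: {p: c * iwf.get(p, 0.0) for p, c in counts.items()}
            for w, counts in af_by_writer.items()}

# Made-up usage counts for three hypothetical writers.
af_by_writer = {
    "w1": {"a5": 4, "t1": 2},
    "w2": {"a9": 3, "t1": 1},
    "w3": {"a9": 1, "t1": 5, "d12": 2},
}
aw = weighted_vectors(af_by_writer)
```

In this toy data every writer uses t1, so its weight is log2(3/3) = 0 and it drops out of all weighted vectors, while a5 (used by one writer only) keeps the highest weight.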


Page 11: Writer identification through information retrieval

The IR model for writer identification

[Diagram repeated from page 9, with the added label: Prototype frequency in query]


Page 12: Writer identification through information retrieval

The IR model for writer identification

[Diagram repeated from page 9]


Page 13: Writer identification through information retrieval

Matching
• Input
  • ‘Database writers’: indexed writer vectors aw(w)
  • ‘Query writer’: vector af(q)
• Match:
  • Calculate cosine of angle between af(q) and each aw(w)
• Output
  • Ranked list of writers (similarity to query)
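The matching step amounts to a cosine ranking, which might look as follows. Vectors are assumed to be sparse dictionaries from prototype label to value; the writer names, weights, and the zero-vector convention are assumptions made for this sketch.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)

def rank_writers(af_q, aw_by_writer):
    """Return (writer, similarity) pairs, most similar writer first."""
    scores = [(w, cosine(af_q, aw)) for w, aw in aw_by_writer.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Made-up weighted writer vectors and query counts.
aw_by_writer = {
    "w1": {"a5": 6.3, "d12": 0.0},
    "w2": {"a9": 1.8},
    "w3": {"a9": 0.6, "d12": 3.2},
}
af_q = {"a9": 2, "d12": 1}
ranking = rank_writers(af_q, aw_by_writer)
```

Because the cosine normalises both vectors, a short query and a long database sample can be compared directly even though their raw counts differ in scale.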


Page 14: Writer identification through information retrieval

The IR model for writer identification

[Diagram repeated from page 9]


Page 15: Writer identification through information retrieval

Justification
• Similarity value (cosine of angle)
• Prototype contribution to retrieval result
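One plausible way to obtain a per-prototype contribution is to split the cosine into its additive terms, since the dot product is a sum over prototypes. The slides do not say how the contribution is computed, so the decomposition below is an assumption, as are the function name and the toy vectors.

```python
import math

def prototype_contributions(af_q, aw_w):
    """Split the cosine similarity into one additive term per prototype."""
    norm_q = math.sqrt(sum(x * x for x in af_q.values()))
    norm_w = math.sqrt(sum(x * x for x in aw_w.values()))
    if norm_q == 0.0 or norm_w == 0.0:
        return {}
    terms = {p: x * aw_w.get(p, 0.0) / (norm_q * norm_w) for p, x in af_q.items()}
    # Largest contributions first, so the decisive shapes appear on top.
    return dict(sorted(terms.items(), key=lambda item: item[1], reverse=True))

# Made-up query counts and one weighted writer vector.
af_q = {"a9": 2, "d12": 1, "t1": 4}
aw_w = {"a9": 0.6, "d12": 3.2}
contrib = prototype_contributions(af_q, aw_w)
similarity = sum(contrib.values())  # equals the cosine of the two vectors
```

Listing the terms in descending order gives an expert a starting point: the top entries are the prototypes that pushed this writer up the ranked list.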


Page 16: Writer identification through information retrieval

Justification
• Forensic expert can further inspect justification


Page 17: Writer identification through information retrieval

Experiment
• 43 writers from plucoll database
  • Online data
  • Segmented into characters
• How well does our technique perform given a certain amount of data (characters)?
  • Number of characters in database (d)
  • Number of characters in query (q)


Page 18: Writer identification through information retrieval

Experiment
• Pick d random letters from each database writer
• Pick q random other letters from one writer, and use those as query
• Find most similar writer
  • Prototypes
  • iwf(p), aw(w)
  • Matching
• Vary d and q

Repeat 10 times for each writer
Repeat 10 times for each combination of d and q
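The sampling procedure on this slide could be simulated roughly as follows. This is a sketch under several assumptions: characters are already mapped to prototype labels, n in the iwf formula is the number of writers, performance is measured as the fraction of queries whose top-ranked writer is correct, and the data are deliberately easy synthetic labels rather than the plucoll data. All function and variable names are hypothetical.

```python
import math
import random
from collections import Counter

def cosine(u, v):
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def run_trial(labels_by_writer, query_writer, d, q, rng):
    """One trial: d database labels per writer, q other labels as the query."""
    af, query = {}, None
    for w, labels in labels_by_writer.items():
        if w == query_writer:
            picked = rng.sample(labels, d + q)  # database and query parts are disjoint
            af[w], query = Counter(picked[:d]), Counter(picked[d:])
        else:
            af[w] = Counter(rng.sample(labels, d))
    n = len(af)
    wf = Counter(p for counts in af.values() for p in counts)  # writers per prototype
    aw = {w: {p: c * math.log2(n / wf[p]) for p, c in counts.items()}
          for w, counts in af.items()}
    best = max(aw, key=lambda w: cosine(query, aw[w]))
    return best == query_writer

def accuracy(labels_by_writer, d, q, repeats=10, seed=0):
    """Fraction of correct top-1 identifications over all writers and repeats."""
    rng = random.Random(seed)
    hits = total = 0
    for w in labels_by_writer:
        for _ in range(repeats):
            hits += run_trial(labels_by_writer, w, d, q, rng)
            total += 1
    return hits / total

# Synthetic stand-in data: each writer has one distinctive label plus a shared one.
labels_by_writer = {
    "w1": ["a5"] * 55 + ["t1"] * 5,
    "w2": ["a9"] * 55 + ["t1"] * 5,
    "w3": ["a16"] * 55 + ["t1"] * 5,
}
score = accuracy(labels_by_writer, d=20, q=15)
```

Calling `accuracy` for each combination of d and q would fill a grid like the one on the results slide; with real handwriting the labels overlap far more between writers than in this toy set.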

Page 19: Writer identification through information retrieval

Results

Values in percent; rows: q (characters in query), columns: d (characters per database writer)

q \ d    100    300    500   1000
10        59     79     83     88
30        86     97     99    100
50        94     99    100    100
70        96    100    100    100
100       98    100    100    100
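The table can be queried for the settings that reach 100. The sketch assumes that rows are query sizes q and columns are database sizes d, a reading that agrees with the figure of 70 query and 300 database characters on the conclusions slide.

```python
# Rows are query sizes q, columns are database sizes d (values in percent).
d_values = [100, 300, 500, 1000]
results = {
    10: [59, 79, 83, 88],
    30: [86, 97, 99, 100],
    50: [94, 99, 100, 100],
    70: [96, 100, 100, 100],
    100: [98, 100, 100, 100],
}

# All (q, d) settings that reached 100.
perfect = [(q, d) for q, row in results.items()
           for d, v in zip(d_values, row) if v == 100]

# Keep only settings not dominated by another perfect setting that needs
# no more query characters and no more database characters.
minimal = [(q, d) for q, d in perfect
           if not any((q2, d2) != (q, d) and q2 <= q and d2 <= d
                      for q2, d2 in perfect)]
```

Under that reading, `minimal` holds three trade-offs between query size and database size, and the pair (70, 300) is the one among them with the smallest database.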


Page 20: Writer identification through information retrieval

Conclusions & future work
• Needed for 100%: 70 chars (q), 300 chars (d)
  • Average English sentence: 75-100 characters
• No black box: results are justified
• Online data: forensic practice?
  • Extract semi-automatically with the help of an expert
  • Use offline matching technique
• Just 43 writers
  • Bigger (n writers & n techniques) experiments planned
• Promising results


Page 21: Writer identification through information retrieval


A search engine for forensic experts