Automatic indexing and retrieval of crime-scene photographs

Katerina Pastra, Horacio Saggion, Yorick Wilks

NLP group, University of Sheffield

Scene of Crime Information System (SOCIS)

Cambridge 2002

Outline

• Application Scenario
• Project Overview
• SOCIS features
• Text-based approaches
• Using NLP:
  • The Indexing mechanism
  • The Retrieval mechanism
• Preliminary system evaluation
• Links


Crime Scene Documentation: Current Practices

Scene of Crime Officers:
• attend crime scene
• photograph the scene
• collect evidence (package and label items)
• write reports and create indexed photo-album(s)

Case files piled in storage rooms


Examples

Ref: 6007898
Scenes of Crime Department Photographic Index
Subject: SUD - Mary Smyth
Date: 24-1-00
Photographer: Jim Davis

1 Shows Ford Escort motor car A97BAK in woods off Winney Hill, Harthill
2-3 Show close-up views of same vehicle
4 Shows the exhaust pipe
5 Shows the offside
6 Shows the front interior
7 Shows the front passenger seat (bag)
8 Shows a knife on the rear nearside seat
9-10 Show the position of the deceased rear offside
11 Shows the face of deceased at Rotherham District


IT support for CSI

Crime Investigation requires:
• Fast and accurate retrieval of case-related info (and therefore efficient classification of this info)
• Identification of “patterns” among cases

IT support for Crime Investigation:
• Governmental agencies’ systems (HOLMES)
• Commercial systems (LOCARD, SOCRATES)
(Crime Management and Administration Systems)

Needed: “Intelligent” support for Crime Investigation


Project Overview

Domain: Scene of Crime Investigation (SOC)
Scenario: Use of digital photography and speech to populate a central police database with case-related information
Objective: Creation of a prototype system that allows for intelligent indexing and retrieval of crime photographs

2000 - 2003


SOCIS features

Access through the web (JSP application)

Storage of case documentation & meta-information in central database

Automatic indexing of photographs

Automatic retrieval of photographs

Automatic population of official forms

“view of deceased with computer cable removed”


Text-based image indexing & retrieval: approaches

• Manual assignment of keywords
• Automatic extraction of keywords (statistics +/- semantic expansion) [Smeaton’96, Sable’99, Rose’00]
• Extraction of logical form representations (syntactic relations and concept classification) [Rowe’99]

Precision and recall increase as indexing terms go beyond keywords, capturing relational info


Text-based image indexing & retrieval: problems

Consider:
• “view to the loft” vs. “view into loft”
• “position of baby with no bedding”
• “position of baby with bedding removed”

Problems:
• keyword barrier
• syntactic relations need to be complemented with semantic information
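The keyword barrier in the caption pairs on this slide can be illustrated with a minimal sketch. The stop-word list and the `keywords` helper are assumptions made for this illustration only; they are not part of SOCIS.

```python
# Hypothetical stop-word list, chosen for this illustration only.
STOP_WORDS = {"the", "to", "into", "of", "with", "no", "a"}


def keywords(caption):
    """Return the lowercase content words of a caption as a set."""
    return {w for w in caption.lower().split() if w not in STOP_WORDS}


loft_a = keywords("view to the loft")
loft_b = keywords("view into loft")
baby_a = keywords("position of baby with no bedding")
baby_b = keywords("position of baby with bedding removed")

# Both loft captions reduce to the same keyword set, so the difference
# between "to" and "into" is invisible to a keyword index.
print(loft_a == loft_b)

# Both baby captions contain the keyword "bedding", although both describe
# a baby without bedding; recognising this needs semantic information.
print("bedding" in baby_a and "bedding" in baby_b)
```

Under these assumptions, a keyword index cannot tell the two loft captions apart, and a keyword query for "bedding" would match both baby captions.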


Indexing-Retrieval Mechanism

Pipeline of processing resources: tokeniser, sentence splitter, POS tagger, lemmatizer, NE recognizer, parser, discourse interpreter (+ triple extraction layer)

Resources: OntoCrime + KB

[Diagram: captions → indexing terms (ARG1 REL ARG2); free text query → query triples (ARG1 REL ARG2); matching of query triples against indexing terms]


Corpus and Domain Model

• 1200 captions from 350 different crime cases dealt with by South Yorkshire Police (text files)
• 65 captions (transcribed speech experiment)

Different lengths but same characteristics: phrasal constructions, named entities, meta-info, what and where references

Domain model = OntoCrime and knowledge base

Role = selection restrictions for triples’ arguments and semantic expansion for retrieval


Triple Extraction

17 Relations: AND, AROUND, MADE-OF, OF, ON, WITHOUT etc.

Form of triples: ARG1 REL ARG2

Restrictions and filters for arguments

Rules for captions with multiple relations

Inferences restricted to certain cases


Triple Extraction examples

“body on floor surrounded by blood”

“shot of footprint on top of bar”

“photograph from behind bar of body on floor”

“bottle, gun and ashtray on table”

“footprint with zigzag and target on chair”

blood AROUND floor

blood AROUND body

Body ON floor
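As a rough illustration of how the caption “body on floor surrounded by blood” could yield the three triples listed here, the following sketch uses a single surface pattern. This is an assumption for illustration only: SOCIS extracts triples from parser and discourse-interpreter output, not from regular expressions, and the slides do not say which of the triples the real system extracts directly and which it infers. `PATTERN` and `extract_triples` are hypothetical names.

```python
import re

# Hypothetical surface pattern: "<a> on <b>" with an optional
# "surrounded by <c>" tail. The real system relies on a parser instead.
PATTERN = re.compile(
    r"^(?P<a>\w+) on (?P<b>\w+)(?: surrounded by (?P<c>\w+))?$"
)


def extract_triples(caption):
    """Return a list of (ARG1, REL, ARG2) triples, or [] if nothing matches."""
    m = PATTERN.match(caption.lower())
    if not m:
        return []
    a, b, c = m.group("a", "b", "c")
    triples = [(a, "ON", b)]
    if c:
        # Both AROUND triples are derived from the one "surrounded by" phrase.
        triples += [(c, "AROUND", a), (c, "AROUND", b)]
    return triples


print(extract_triples("body on floor surrounded by blood"))
```

Captions the pattern does not cover, such as “bottle, gun and ashtray on table”, return an empty list in this sketch; handling them is the job of the rules for captions with multiple relations.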


Retrieval Mechanism

• Allow for free text query
• Extract relational facts from the query
• Match the query triples with the indexing triples of each captioned photograph
• Allow for exact match of arguments or class info: ARG1 [Class:], RELATION, ARG2 [Class:]
• If no triples can be extracted, keyword matching takes place, with semantic expansion if needed
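The matching steps above can be sketched as follows. Everything in this block is an assumption for illustration: the `CLASS_OF` table is a made-up stand-in for OntoCrime, the captions in `INDEX` are hypothetical, and semantic expansion in the keyword fallback is omitted.

```python
# Hypothetical term -> class table; not taken from OntoCrime.
CLASS_OF = {
    "gun": "weapon",
    "knife": "weapon",
    "table": "furniture",
    "chair": "furniture",
}


def arg_matches(query_arg, index_arg):
    """Exact match, or the query names the class of the indexed argument."""
    return query_arg == index_arg or CLASS_OF.get(index_arg) == query_arg


def triple_matches(query, indexed):
    """Relations must be identical; arguments may match by class."""
    return (
        query[1] == indexed[1]
        and arg_matches(query[0], indexed[0])
        and arg_matches(query[2], indexed[2])
    )


def retrieve(query_triples, query_text, index):
    """index maps a photo id to (caption, list of indexing triples)."""
    if query_triples:
        return [
            pid
            for pid, (_, triples) in index.items()
            if all(any(triple_matches(q, t) for t in triples) for q in query_triples)
        ]
    # Fallback when no triples could be extracted: plain keyword overlap.
    words = set(query_text.lower().split())
    return [
        pid
        for pid, (caption, _) in index.items()
        if words & set(caption.lower().split())
    ]


# Hypothetical index of two captioned photographs.
INDEX = {
    "photo-1": ("gun on table", [("gun", "ON", "table")]),
    "photo-2": ("knife on chair", [("knife", "ON", "chair")]),
}

print(retrieve([("weapon", "ON", "table")], "weapon on table", INDEX))
print(retrieve([], "knife", INDEX))
```

In this sketch a class name in the query matches any indexed argument of that class, while two different terms of the same class (for example “knife” and “gun”) do not match each other. Whether SOCIS makes the same choice is not stated in the slides.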


Preliminary Evaluation

• Indexing mechanism evaluation run on 600 captions indicated refinements on the rules (80% accuracy in extracting and inferring triples)
• Preliminary usability evaluation with real users: relational information considered to be an intuitive way for forming queries for image retrieval
• Future work: overall evaluation of free text query for image retrieval


Conclusions

Could the SOCIS approach be ported to other domains?
• Thorough testing and experimentation needed
• However, it is a corpus-driven approach: not just an alternative image indexing/retrieval approach, but the one dictated by a real application

For more information on SOCIS:

http://www.dcs.shef.ac.uk/nlp/socis