Evaluating XML retrieval: The INEX initiative

Mounia Lalmas

Queen Mary University of London

http://qmir.dcs.qmul.ac.uk

Outline

Information retrieval

(Content-oriented) XML retrieval

Evaluating information retrieval

Evaluating XML retrieval: INEX

Information retrieval

Example of a user information need:

“Find all documents about sailing charter agencies that (1) offer sailing boats in the Greek islands, and (2) are registered with the RYA. The documents should contain boat specification, price per week, e-mail and other contact details.”

A formal representation of an information need constitutes a query.

Information retrieval

IR is concerned with the representation, storage and organisation of, and access to, repositories of information, usually in the form of documents.

Primary goal of an IR system: “Retrieve all the documents which are relevant (useful) to a user query, while retrieving as few non-relevant documents as possible.”

Conceptual model for IR

[Figure: documents are indexed into a document representation and the query is formulated into a query representation; a retrieval function matches the two to produce the retrieval results, with relevance feedback feeding back into the query.]
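To make the conceptual model concrete, here is a minimal Python sketch of its main boxes. The choices are our own assumptions, not part of the slides: lower-cased whitespace tokenisation, raw term frequencies as the document representation, a bag of terms as the query representation, and a tf-idf dot product as the retrieval function. Relevance feedback is left out, and the documents are made up.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lower-cased whitespace tokenisation, no stemming or stopwords
    return text.lower().split()

def index(docs):
    """Indexing: map each document id to its term-frequency vector."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

def formulate(need):
    """Query formulation: reduce the information need to a bag of terms."""
    return Counter(tokenize(need))

def retrieve(doc_reps, query_rep):
    """Retrieval function: rank documents by a tf-idf dot product."""
    n = len(doc_reps)
    df = Counter(term for rep in doc_reps.values() for term in rep)
    scores = {}
    for doc_id, rep in doc_reps.items():
        score = sum(
            q_tf * rep[term] * math.log(n / df[term])
            for term, q_tf in query_rep.items()
            if term in rep
        )
        if score > 0:
            scores[doc_id] = score
    # Highest score first; documents with no matching term are not returned
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical mini-collection
docs = {
    "d1": "sailing charter agencies in the greek islands",
    "d2": "greek cooking recipes",
    "d3": "charter flights to london",
}
print(retrieve(index(docs), formulate("sailing charter greek islands")))
```

Real systems replace each function with something far richer, but the division into indexing, formulation and a retrieval function mirrors the figure.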

Structured Document Retrieval

Traditional IR is about finding documents relevant to a user’s information need, e.g. an entire book.

Structured document retrieval (SDR) allows users to retrieve document components that are more focused on their information needs, e.g. a chapter of a book instead of an entire book.

The structure of documents is exploited to identify which document components to retrieve.

Structured Documents

Linear order of words, sentences, paragraphs …

Hierarchy or logical structure of a book’s chapters, sections …

Links (hyperlinks), cross-references, citations …

Temporal and spatial relationships in multimedia documents

[Figure: a book's hierarchy of chapters, sections and paragraphs; documents on the World Wide Web; and a sample page of running text.]

Structured Documents

Explicit structure formalised through document representation standards (mark-up languages):

Layout: LaTeX (publishing), HTML (Web publishing)

Structure: SGML, XML (Web publishing, engineering), MPEG-7 (broadcasting)

Content/semantic: RDF, DAML + OIL, OWL (semantic web)


HTML (layout): <b><font size=+2>SDR</font></b><img src="qmir.jpg" border=0>

XML (structure): <section> <subsection> <paragraph>… </paragraph> <paragraph>… </paragraph> </subsection></section>

RDF (content/semantic): <Book rdf:about="book"> <rdf:author=".."/> <rdf:title="…"/></Book>

XML: eXtensible Mark-up Language

Meta-language (user-defined tags) currently being adopted as the document format language by W3C

Used to describe content and structure (and not layout)

Grammar described in a DTD (used for validation)

<lecture>
  <title> Structured Document Retrieval </title>
  <author> <fnm> Smith </fnm> <snm> John </snm> </author>
  <chapter>
    <title> Introduction into XML retrieval </title>
    <paragraph> …. </paragraph>
    …
  </chapter>
  …
</lecture>

<!ELEMENT lecture (title, author+, chapter+)>
<!ELEMENT author (fnm*, snm)>
<!ELEMENT fnm (#PCDATA)>
…
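Python's standard library parses XML but does not validate it against a DTD. To illustrate what validation checks, the sketch below hand-translates the two content models for `lecture` and `author` into regular expressions over the sequence of child tags. The `CONTENT_MODELS` table, the function names and the sample documents are hypothetical; a real system would use a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of two DTD content models into regexes over child tags.
# Each child tag is written followed by a comma, e.g. "title,author,chapter,".
CONTENT_MODELS = {
    "lecture": r"title,(author,)+(chapter,)+",  # (title, author+, chapter+)
    "author": r"(fnm,)*snm,",                    # (fnm*, snm)
}

def conforms(elem):
    """Check one element's children against its content model, if we have one."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True  # element not covered by this sketch
    children = "".join(child.tag + "," for child in elem)
    return re.fullmatch(model, children) is not None

def validate(root):
    """True if every element in the tree (root included) conforms."""
    return all(conforms(elem) for elem in root.iter())

good = ET.fromstring(
    "<lecture><title>SDR</title>"
    "<author><fnm>Smith</fnm><snm>John</snm></author>"
    "<chapter><title>Intro</title></chapter></lecture>"
)
bad = ET.fromstring(  # author before title, and no chapter at all
    "<lecture><author><snm>John</snm></author><title>SDR</title></lecture>"
)
print(validate(good), validate(bad))  # True False
```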

XML: eXtensible Mark-up Language

Use of XPath notation to refer to the XML structure:

chapter/title: title is a direct sub-component of chapter

//title: any title

chapter//title: title is a direct or indirect sub-component of chapter

chapter/paragraph[2]: the second paragraph that is a direct sub-component of a chapter

chapter/*: all direct sub-components of a chapter

<lecture>
  <title> Structured Document Retrieval </title>
  <author> <fnm> Smith </fnm> <snm> John </snm> </author>
  <chapter>
    <title> Introduction into SDR </title>
    <paragraph> …. </paragraph>
    …
  </chapter>
  …
</lecture>
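Python's `xml.etree.ElementTree` supports a limited subset of XPath that covers the path forms listed above. In this sketch the document is a made-up, fully spelled-out variant of the lecture example, with a nested section added to show the difference between `/` and `//`. ElementTree does not accept absolute paths on an element, so `//title` is written `.//title`.

```python
import xml.etree.ElementTree as ET

# Hypothetical lecture document; the nested <section> is our addition
DOC = """
<lecture>
  <title>Structured Document Retrieval</title>
  <author><fnm>Smith</fnm><snm>John</snm></author>
  <chapter>
    <title>Introduction into SDR</title>
    <paragraph>First paragraph</paragraph>
    <paragraph>Second paragraph</paragraph>
    <section><title>A nested section</title></section>
  </chapter>
</lecture>
"""

root = ET.fromstring(DOC.strip())  # paths below are relative to <lecture>

def texts(path):
    """Text content of every element selected by an ElementTree path."""
    return [elem.text for elem in root.findall(path)]

print(texts("chapter/title"))         # ['Introduction into SDR']
print(texts(".//title"))              # every title, at any depth
print(texts("chapter//title"))        # ['Introduction into SDR', 'A nested section']
print(texts("chapter/paragraph[2]"))  # ['Second paragraph']
print([elem.tag for elem in root.findall("chapter/*")])
```

The positional predicate `[2]` counts only siblings with the same tag, as in XPath.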

Querying XML documents

Content-only (CO) queries

'open standards for digital video in distance learning'

Content-and-structure (CAS) queries

//article[about(., 'formal methods verify correctness aviation systems')]/body//section[about(., 'case study application model checking theorem proving')]

Structure-only (SA) queries

/article//*section/paragraph[2]
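How `about()` is interpreted is left to each retrieval system. One crude reading, assumed here purely for illustration, scores an element by the fraction of query terms that occur anywhere in its text. The sketch applies it to a CAS query of the form `//article[about(., ...)]//section[about(., ...)]` over a made-up article; the function names, the zero threshold and the data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical article
ARTICLE = """
<article>
  <title>Formal methods for aviation systems</title>
  <body>
    <section>A case study applying model checking to flight software</section>
    <section>Related work on user interfaces</section>
  </body>
</article>
"""

def about(elem, query):
    """Hypothetical about(): share of query terms found in the element's text."""
    words = set(" ".join(elem.itertext()).lower().split())
    terms = query.lower().split()
    return sum(term in words for term in terms) / len(terms)

def cas(root, article_query, section_query):
    """Rank sections of matching articles, as in //article[about]//section[about]."""
    articles = [root] if root.tag == "article" else root.findall(".//article")
    results = []
    for article in articles:
        if about(article, article_query) == 0:
            continue  # the article-level condition is not satisfied
        for section in article.findall(".//section"):
            score = about(section, section_query)
            if score > 0:
                results.append((score, section.text))
    return sorted(results, reverse=True)

root = ET.fromstring(ARTICLE.strip())
for score, text in cas(root, "formal methods aviation", "case study model checking"):
    print(f"{score:.2f}  {text}")
```

Treating the structural conditions as hard filters, as here, is only one option; a system may also treat them as vague hints.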

Conceptual model for XML retrieval

[Figure: the IR conceptual model (documents and query; indexing and formulation; document and query representations; retrieval function; retrieval results; relevance feedback), annotated for XML retrieval with: structured documents (content + structure); inverted file + structure index; tf, idf, acc; matching content + structure; presentation of related components.]
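One simple way to realise an "inverted file + structure index" is sketched below under our own assumptions: each posting records the element where a term occurs directly, identified by a hypothetical path such as `/article[1]/sec[2]`, and a separate structure index records each element's parent so that scores can later be propagated to ancestors. The path format, tag names and tokenisation are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def build_indexes(root):
    """Return (inverted file, structure index) for one XML document."""
    inverted = defaultdict(Counter)  # term -> Counter{element path: tf}
    parent_of = {}                   # element path -> parent path (None at root)

    def visit(elem, path, parent):
        parent_of[path] = parent
        # Index only the text that belongs directly to this element
        for term in (elem.text or "").lower().split():
            inverted[term][path] += 1
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]", path)
            # Tail text follows the child but belongs to this element
            for term in (child.tail or "").lower().split():
                inverted[term][path] += 1

    visit(root, f"/{root.tag}[1]", None)
    return inverted, parent_of

doc = ET.fromstring(
    "<article><title>XML retrieval</title>"
    "<sec>XML authoring</sec><sec>retrieval evaluation</sec></article>"
)
inverted, parent_of = build_indexes(doc)
print(dict(inverted["retrieval"]))      # {'/article[1]/title[1]': 1, '/article[1]/sec[2]': 1}
print(parent_of["/article[1]/sec[2]"])  # /article[1]
```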

Content-oriented XML retrieval

Return document components of varying granularity (e.g. a book, a chapter, a section, a paragraph, a table, a figure, etc.) that are relevant to the user’s information need with regard to both content and structure.

Content-oriented XML retrieval

Retrieve the best components according to content and structure criteria:

INEX: the most specific component that satisfies the query, while being exhaustive to the query

Shakespeare study: best entry points, which are components from which many relevant components can be reached through browsing

???

Challenges

[Figure: an article with a title, Section 1 and Section 2. The components carry term weights such as 0.9, 0.5 and 0.2 for XML, 0.4 for retrieval and 0.7 for authoring, and further weights between 0.2 and 0.6 are shown; the article's own weights for XML, retrieval and authoring are unknown (?).]

No fixed retrieval unit + nested document components + different types of document components:

How to obtain document and collection statistics?

Which component is a good retrieval unit?

Which components contribute best to the content of the article?

How to estimate?

How to aggregate?
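The "how to aggregate?" question can be illustrated with one common family of answers, sketched here as an assumption rather than as the approach behind the figure: a component's term weight is a weighted sum of its children's term weights, where each child's contribution expresses how much it is assumed to count towards the parent. The term weights are loosely modelled on the figure, but their assignment to components and the contribution values are made up.

```python
def aggregate(children):
    """Combine child term weights into parent term weights.

    children: list of (contribution, {term: weight}) pairs, one per child.
    """
    parent = {}
    for contribution, weights in children:
        for term, weight in weights.items():
            parent[term] = parent.get(term, 0.0) + contribution * weight
    return parent

# Hypothetical article built from a title and two sections
article = aggregate([
    (0.5, {"xml": 0.9}),                    # title
    (0.3, {"xml": 0.5, "retrieval": 0.4}),  # section 1
    (0.2, {"xml": 0.2, "authoring": 0.7}),  # section 2
])
print({term: round(w, 2) for term, w in article.items()})
```

Estimating good contribution values is exactly the open "how to estimate?" question; here they are simply fixed by hand.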

Approaches …

vector space model

probabilistic model

bayesian network

language model

extending DB model

boolean model

natural language processing

cognitive model

ontology

parameter estimation

tuning

smoothing

fusion

phrase

term statistics

collection statistics

component statistics

proximity search

logistic regression

belief model

relevance feedback

Evaluation

The goal of an IR system: retrieve as many relevant documents as possible and as few non-relevant documents as possible

Comparative evaluation of the technical performance of IR systems = effectiveness:

the ability of the IR system to retrieve relevant documents and suppress non-relevant documents

Effectiveness: a combination of recall and precision

Relevance

A document is relevant if it “has significant and demonstrable bearing on the matter at hand”.

Common assumptions: objectivity, topicality, binary nature, independence

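Recall / Precision

[Figure: Venn diagram of the document collection, showing the retrieved documents, the relevant documents, and their overlap (retrieved and relevant).]

$$\text{recall} = \frac{\text{number of relevant documents retrieved}}{\text{number of relevant documents}}$$

$$\text{precision} = \frac{\text{number of relevant documents retrieved}}{\text{number of documents retrieved}}$$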
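The two definitions translate directly into set operations. The sketch assumes non-empty sets of document identifiers (the sets are made up) and also shows the F-measure, the harmonic mean of precision and recall, as one common way of combining the two into a single effectiveness value; the slides do not prescribe it.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

def f_measure(p, r):
    """Harmonic mean of precision and recall (one common combination)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Hypothetical sets of document identifiers
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d7", "d8", "d9"}
p, r = precision(retrieved, relevant), recall(retrieved, relevant)
print(p, r, round(f_measure(p, r), 3))  # 0.4 0.5 0.444
```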

Recall / Precision

Relevant documents for a given query: {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}

Ranked results, with precision and recall given at each rank where a relevant document is retrieved:

rank   doc    precision   recall
1      d123   1/1         1/10
2      d84
3      d56    2/3         2/10
4      d6
5      d8
6      d9     3/6         3/10
7      d511
8      d129
9      d187
10     d25    4/10        4/10
11     d48
12     d250
13     d113
14     d3     5/14        5/10
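The precision and recall values in the table can be reproduced by walking down the ranking and counting relevant documents seen so far. This is a sketch using the ranking and relevant set from the table; `fractions.Fraction` keeps the values exact, although it prints them in lowest terms (e.g. 3/6 appears as 1/2).

```python
from fractions import Fraction

relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511",
           "d129", "d187", "d25", "d48", "d250", "d113", "d3"]

def pr_at_relevant_ranks(ranking, relevant):
    """Yield (rank, doc, precision, recall) wherever a relevant doc appears."""
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            yield rank, doc, Fraction(hits, rank), Fraction(hits, len(relevant))

for rank, doc, p, r in pr_at_relevant_ranks(ranking, relevant):
    print(rank, doc, p, r)
```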

[Figure: precision (0 to 100) plotted against recall (0 to 100) for two systems, s1 and s2.]
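Curves like these are commonly drawn at 11 standard recall levels (0.0, 0.1, ..., 1.0) using interpolated precision, i.e. the highest precision observed at any recall at or above each level. This TREC-style convention is an assumption here, not something the slide specifies; the sketch applies it to the example ranking from the table.

```python
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511",
           "d129", "d187", "d25", "d48", "d250", "d113", "d3"]

def interpolated_11pt(ranking, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        hits += doc in relevant
        points.append((hits / len(relevant), hits / rank))  # (recall, precision)
    levels = [i / 10 for i in range(11)]
    # Recall levels this ranking never reaches get precision 0
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]

print([round(p, 2) for p in interpolated_11pt(ranking, relevant)])
```

Because only 5 of the 10 relevant documents are retrieved, the curve drops to 0 from recall 0.6 onwards.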

Test collection

Document collection = the documents themselves; these depend on the task, e.g. evaluating web retrieval requires a collection of HTML documents.

Queries / requests: simulate real user information needs.

Relevance judgements: state, for each query, which documents are relevant.

See TREC, CLEF, etc.

Evaluation of XML retrieval: INEX

Evaluating the effectiveness of content-oriented XML retrieval approaches

Collaborative effort: participants contribute to the development of the collection (queries, relevance assessments)

Similar methodology to TREC, but adapted to XML retrieval

40+ participants worldwide

Workshop in Schloss Dagstuhl in December (20+ institutions)

INEX Test Collection

Documents (~500MB), which consist of 12,107 articles in XML format from the IEEE Computer Society; 8 million elements

INEX 2002: 30 CO and 30 CAS queries

inex_eval metric

INEX 2003: 36 CO and 30 CAS queries

CAS queries are defined according to an enhanced subset of XPath

inex_eval and inex_eval_ng metrics

INEX 2004 is just starting

Relevance in XML

An element is relevant if it “has significant and demonstrable bearing on the matter at hand”.

Common assumptions in IR: objectivity, topicality, binary nature, independence

[Figure: an article tree with sections 1 and 2 and paragraphs 1, 2 and 3.]

Relevance in INEX

Exhaustivity: how exhaustively a document component discusses the query: 0, 1, 2, 3

Specificity: how focused the component is on the query: 0, 1, 2, 3

Relevance: (exhaustivity, specificity) pairs such as (3,3), (2,3), (1,1), (0,0), …

[Figure: an article and its sections.]

If all sections are relevant, the article is very relevant and is better than the sections.

If only one section is relevant, the article is less relevant and the section is better than the article.

…

Relevance assessment task

Completeness: when an element is assessed, its parent element and its child elements must be assessed too

Consistency: the parent of a relevant element must also be relevant, although to a different extent; exhaustivity increases going up the tree, and specificity decreases going up the tree

Use of an online interface; assessing a query takes a week! On average 2 topics per participant

Only participants that complete the assessment task have access to the collection
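The consistency rules can be checked mechanically. The sketch below is our own simplified reading, not the INEX assessment tool: it only constrains the parent of each relevant element, treats unassessed elements as (0, 0), and reads "increases" and "decreases" as non-strict. The path format and function names are hypothetical.

```python
def parent_path(path):
    """'/article[1]/sec[2]' -> '/article[1]'; the root has no parent."""
    head, _, _ = path.rpartition("/")
    return head or None

def check_consistency(assessments):
    """Flag violations of the consistency rules for (E, S) assessments.

    assessments: dict mapping element path -> (exhaustivity, specificity).
    """
    problems = []
    for path, (e, s) in assessments.items():
        parent = parent_path(path)
        if e == 0 or parent is None:
            continue  # the rules only constrain parents of relevant elements
        pe, ps = assessments.get(parent, (0, 0))  # unassessed counts as (0, 0)
        if pe == 0:
            problems.append(f"{parent} must be relevant because {path} is")
            continue
        if pe < e:
            problems.append(f"{parent} is less exhaustive than its child {path}")
        if ps > s:
            problems.append(f"{parent} is more specific than its child {path}")
    return problems

ok = {"/article[1]": (3, 2), "/article[1]/sec[1]": (2, 3)}
bad = {"/article[1]": (1, 3), "/article[1]/sec[1]": (2, 2)}
print(check_consistency(ok))        # []
print(len(check_consistency(bad)))  # 2
```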


Metrics

Recall/precision-based:

quantisation functions to obtain one relevance value

expected search length

penalise overlap; consider size

Others: expected ratio of relevant, cumulated gain-based metrics, tolerance to irrelevance
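A quantisation function maps each (exhaustivity, specificity) pair to a single relevance value so that recall/precision-based metrics can be applied. The sketch shows a strict function (only (3,3) counts) and a generalised one that gives partial credit; the generalised values are quoted from memory of the INEX 2002/2003 evaluation and should be treated as an assumption to be checked against the INEX documentation.

```python
def strict(e, s):
    """Strict quantisation: only (3,3) components count as relevant."""
    return 1.0 if (e, s) == (3, 3) else 0.0

# Generalised quantisation (values as we recall them; verify before use)
GENERALISED = {
    (3, 3): 1.00,
    (2, 3): 0.75, (3, 2): 0.75, (3, 1): 0.75,
    (1, 3): 0.50, (2, 2): 0.50, (2, 1): 0.50,
    (1, 2): 0.25, (1, 1): 0.25,
    (0, 0): 0.00,
}

def generalised(e, s):
    """Map an (exhaustivity, specificity) pair to a single relevance value."""
    try:
        return GENERALISED[(e, s)]
    except KeyError:
        # e.g. (0, 2): a non-exhaustive component cannot be specific
        raise ValueError(f"inconsistent assessment ({e}, {s})") from None

print([generalised(e, s) for e, s in [(3, 3), (2, 3), (1, 1), (0, 0)]])
```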


Lessons learnt

Good definition of relevance

Expressing CAS queries was not easy

Relevance assessment process must be “improved”

Further development on metrics needed

User studies required

Conclusion

XML retrieval is not just about the effective retrieval of XML documents, but also about how to evaluate effectiveness

INEX 2004 tracks: Relevance feedback, Interactive, Heterogeneous collection, Natural language query

http://inex.is.informatik.uni-duisburg.de:2004/

