XML Keyword Search Refinement


Post on 31-Dec-2015






郭青松



Outline

Introduction
Query Refinement in Traditional IR
XML Keyword Query Refinement
My work

Why do we need query refinement?

Users express their query intention with keywords, but they don't know how to formulate a good query:
  Lack of experience
  Too many forms of expression
  Unfamiliarity with the system
  No idea about the data

Query refinement: refine the query and get good results

What is Query Refinement?

Query expansion (query reformulation)
Given an ill-formed query from the user, we refine the query and help the user to better retrieve documents.
The goal is to improve precision and/or recall.
Example: cars → car, automobile, auto

XML Search

Tag + keyword search
  book: xml
Path expression + keyword search (CAS queries)
  /book[./title about "xml db"]
Structure query
  XPath, XQuery
Keyword search (CO queries)
  xml

XML Keyword Search vs. IR

IR
  Flat HTML pages
  Whole page returned
XML
  Model (tree, graph)
  Structural (semi-structural)
  Semantic-based query (LCA, SLCA)
  Information fragment returned

Need for XML Keyword Query Refinement

Hard to know the XML content
  Especially for big XML documents
Information fragments (LCA/SLCA)
  Easily affect the results (precision)
  Huge differences between query results
IR-style refinement methods are not suitable for XML
  Only content is considered
  Structure information is needed to form a good query

Outline

Introduction
Query Refinement in Traditional IR
XML Keyword Query Refinement
My work

Tasks

Spelling correction
Word splitting / word merging
Phrase segmentation
Word stemming
Acronym expansion
Add/delete terms
Substitution

Classes of Query Refinement

Relevance feedback
  Users mark documents (relevant, nonrelevant)
  Reweight the terms in the query
Automatic query refinement
  The system analyzes the relevance between the documents and the query and produces a refined query automatically
  Global analysis
  Local analysis

Relevance Feedback

Began in the 1960s
Improves recall and precision
Basic process:
  1. The user issues an initial query.
  2. The system returns an initial result set.
  3. The user marks some returned documents as relevant or nonrelevant.
  4. The system re-weights the terms and refines the query results.

Relevance Feedback Models

Boolean: terms appearing in a document indicate relevance
Vector space: q = (t1, t2, ..., tn), d = (w1, w2, ..., wn)
Probabilistic: the relevance of a query and a document is evaluated as a probability
  Probabilistic ranking principle

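Rocchio algorithm for the vector-space model

$$\vec{q}_m = \alpha\,\vec{q}_0 + \beta\,\frac{1}{|D_r|}\sum_{\vec{d}_j \in D_r}\vec{d}_j - \gamma\,\frac{1}{|D_{nr}|}\sum_{\vec{d}_j \in D_{nr}}\vec{d}_j$$

  qm: refined query vector
  q0: original query vector
  Dr: relevant documents; Dnr: nonrelevant documents
  α, β, γ: weights attached to each term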
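A minimal sketch of the Rocchio update in plain Python may make the re-weighting step concrete. The function name `rocchio`, the dense list representation, and the default weights (1.0, 0.75, 0.15) are assumptions for illustration; the slides do not give weight values.

```python
def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the refined query vector qm as a list of floats.

    q0          -- original query vector
    relevant    -- document vectors marked relevant (Dr)
    nonrelevant -- document vectors marked nonrelevant (Dnr)
    alpha, beta, gamma -- illustrative default weights (assumed, not from the slides)
    """
    def centroid(docs):
        # Average document vector; a zero vector if no document was marked.
        if not docs:
            return [0.0] * len(q0)
        return [sum(column) / len(docs) for column in zip(*docs)]

    avg_rel = centroid(relevant)
    avg_nonrel = centroid(nonrelevant)
    # Move the query toward relevant documents and away from nonrelevant ones.
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(q0, avg_rel, avg_nonrel)]


refined = rocchio([1, 0, 0],
                  relevant=[[1, 1, 0], [1, 0, 0]],
                  nonrelevant=[[0, 0, 1]],
                  alpha=1.0, beta=0.5, gamma=0.25)
print(refined)  # [1.5, 0.25, -0.25]
```

Negative components such as the last one are often clipped to zero in practice; the sketch leaves them untouched so the arithmetic stays visible.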

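In the Rocchio update, relevant documents enter through the average relevant-document vector, and nonrelevant documents through the average non-relevant-document vector.

Global analysis (1)

Use all documents to compute the similarity between the query q and the terms in the documents
Similarity-thesaurus based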
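Thesaurus-style global analysis can be sketched as follows. This is a simplified illustration, not necessarily the formulas used in the deck: terms are represented by raw occurrence counts across all documents, term similarity is cosine similarity, and a candidate term's similarity to the query is the unweighted sum of its similarities to the query terms. The names `term_vectors`, `cosine`, and `expand_query` are hypothetical.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Map each term to its vector of raw counts across all documents."""
    vocabulary = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    return {term: [c[term] for c in counts] for term in vocabulary}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query, docs, r=2):
    """Append the r collection terms most similar to the query as a whole."""
    vectors = term_vectors(docs)
    scores = {}
    for term, vector in vectors.items():
        if term in query:
            continue
        # Assumed scoring: sum of similarities to every known query term.
        scores[term] = sum(cosine(vectors[q], vector)
                           for q in query if q in vectors)
    best = sorted(scores, key=lambda t: (-scores[t], t))[:r]
    return list(query) + best


docs = [["car", "auto", "engine"], ["car", "auto"], ["boat", "sail"]]
print(expand_query(["car"], docs, r=2))  # ['car', 'auto', 'engine']
```

Because every document in the collection contributes to the term vectors, the similarities can be computed once and reused for all queries, which is what distinguishes global from local analysis.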

Global analysis (2)

Select the r terms with the highest sim value, add them to the initial query, and reformulate the new query
[Formulas on this slide: similarity of terms; query vector; similarity of query and terms]

Local analysis

Use the initial query results (especially the top-ranked, or local, documents) to refine the query
Local clustering
  Cluster the terms of the local documents
  Refine the query with the relevant cluster
  Similarity between terms in the query and terms in the documents
Local context analysis (LCA)
  Take the terms in the local documents that are most similar to the query q and use them for expansion
  Similarity between q and terms in the documents

Outline

Introduction
Query Refinement in Traditional IR
XML Keyword Query Refinement
My work

XML Refinement Manner (1)

Query refined form
  Keyword query → new keyword query
    Treat as a traditional IR problem
    IR with XML keyword search semantics
  Keyword query → structural query
User participation
  Manual (user interactive)
    Structural feedback
  Automatic

XML Refinement Manner (2)

Manually refined to a new keyword query
  IR (considering the structure of the XML)
Manually transformed to a structural query
  Relevance feedback
Automatically refined to a new keyword query
  Lu Jiaheng:
Automatically transformed to a structural query
  NLP

Automatically Refined to a New Keyword Query (1)

Query → refined query
Rule based
Operations:
  Term merging
  Term splitting
  Term substitution
  Term deletion
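The four rule-based operations can be sketched as a generator of one-step refinement candidates. This is only an illustration under assumptions: `vocabulary` stands for the set of terms that occur in the XML data, `substitutions` is a hypothetical lookup of replacement terms, and deletion simply proposes dropping each keyword in turn.

```python
def refine_once(query, vocabulary, substitutions):
    """Return (operation, refined_query) pairs reachable in one step.

    query         -- list of keywords
    vocabulary    -- set of terms known to occur in the data (assumed input)
    substitutions -- dict mapping a term to replacement terms (assumed input)
    """
    candidates = []
    # Term merging: join two adjacent keywords if the result is a known term.
    for i in range(len(query) - 1):
        merged = query[i] + query[i + 1]
        if merged in vocabulary:
            candidates.append(("merge", query[:i] + [merged] + query[i + 2:]))
    # Term splitting: cut one keyword into two known terms.
    for i, term in enumerate(query):
        for k in range(1, len(term)):
            left, right = term[:k], term[k:]
            if left in vocabulary and right in vocabulary:
                candidates.append(
                    ("split", query[:i] + [left, right] + query[i + 1:]))
    # Term substitution: replace a keyword using the lookup table.
    for i, term in enumerate(query):
        for replacement in substitutions.get(term, []):
            candidates.append(
                ("substitute", query[:i] + [replacement] + query[i + 1:]))
    # Term deletion: drop one keyword, keeping at least one.
    if len(query) > 1:
        for i in range(len(query)):
            candidates.append(("delete", query[:i] + query[i + 1:]))
    return candidates


for operation, refined in refine_once(["hobby", "news", "paper"],
                                      {"newspaper"}, {}):
    print(operation, refined)
```

Each candidate here is one operation away from the original query, so counting the operations applied along a chain of such steps gives a simple notion of refinement cost.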

Original query            Refined query
IR, 2003, Mike            Information Retrieval, 2003, Mike
Mike, publication         Mike, publications
Database, paper           Database, in-proceedings
XML, John, 2003           XML, John
machin, learn             machine, learning
Hobby, news, paper        Hobby, newspaper
On, line, data, base      Online, database

Automatically Refined to a New Keyword Query (2)

Ranking
  Refined query candidate set S(RQ)
  Refinement cost
    Cost: the number of operation steps from Q to RQ
    Dynamic programming
Efficient refinement algorithms
  Avoid scanning the inverted lists multiple times
  Stack-based and short-list-eager approaches
RQ candidates can have the same refinement cost
  Q = {XML, Jim, 2001} → {XML, 2001}, {Jim, 2001} or {XML, Jim}

NLPX

Natural Language Query (NLQ) → NEXI
NEXI (Narrowed Extended XPath I)
  //A[about(//B,C)]
  A: path expression; B: relative path expression to A
  C: the content requirement
  An about clause represents an individual information request

NLPX: Lexical and Semantic Tagging

Structural words: content requirements
Boundary words: path expression
Instruction words
  R: return request; S: support request
Example: Find sections about compression in articles about information retrieval
Tagged: Find/XIN sections/XST about/XBD compression/NN in/IN articles/XST about/XBD information/NN retrieval/NN

NLPX: Template Matching

Most queries correspond to a small set of patterns
Formulate grammar templates with patterns

Grammar Templates
  Query: Request+
  Request: CO_Request | CAS_Request
  CO_Request: NounPhrase+
  CAS_Request: SupportRequest | ReturnRequest
  SupportRequest: Structure [Bound] NounPhrase+
  ReturnRequest: Instruction Structure [Bound] NounPhrase+

Information Requests
                Request 1       Request 2
  Structural:   /article/sec    /article
  Content:      compression     information retrieval
  Instruction:  R               S

NLPX: NEXI Query Production

Merge the information requests into a NEXI query
A[about(.,C)], where A is the request's structural attribute and C is the request's content attribute
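The merge step can be sketched in a few lines, assuming each information request carries a structure path, a content string, and an instruction flag (R or S). The `InformationRequest` class and the `to_nexi` function are hypothetical names, and the ordering rule (support requests first, return request last, each path written relative to the previous one) is an assumption that fits the deck's example rather than a statement of how NLPX itself is implemented.

```python
from dataclasses import dataclass


@dataclass
class InformationRequest:
    structure: str    # e.g. "/article/sec"
    content: str      # e.g. "compression"
    instruction: str  # "R" (return request) or "S" (support request)


def to_nexi(requests):
    """Merge information requests into one NEXI query string."""
    # Support requests come first so the return request ends the path.
    ordered = sorted(requests, key=lambda r: r.instruction == "R")
    query = ""
    previous = []
    for request in ordered:
        steps = [s for s in request.structure.split("/") if s]
        # Write the path relative to the previous request when it extends it.
        if previous and steps[:len(previous)] == previous:
            relative = steps[len(previous):]
        else:
            relative = steps
        path = "".join("//" + step for step in relative)
        query += f"{path}[about(.,{request.content})]"
        previous = steps
    return query


requests = [
    InformationRequest("/article/sec", "compression", "R"),
    InformationRequest("/article", "information retrieval", "S"),
]
print(to_nexi(requests))
# //article[about(.,information retrieval)]//sec[about(.,compression)]
```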

Example: //article[about(.,information retrieval)]//sec[about(.,compression)]

Query generation process

Create target components
  Break up the query into units
Generate initial target combinations from the input target components
Generate queries
  Modify a target component
  Combine two components

Initialization

Break up the input query into terms
  Structure terms (XML tags or attributes)
  Content terms (referring to text)
Create components
  Structure term → unbound target
  Content term → binding to a bound target
  Probability enumeration
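One way to picture the enumeration of target combinations is to pick one target from each target set and score the combination by the product of the individual probabilities. Treating the sets as independent is an assumption of this sketch, and `combine_targets` is a hypothetical helper; the probabilities are the ones the deck lists for the query "Papers by jennifer widom".

```python
from itertools import product


def combine_targets(*target_sets):
    """Enumerate one target per set, scored by the product of probabilities.

    Each target set is a list of (target, probability) pairs. Multiplying
    the probabilities assumes the sets are independent (an assumption).
    """
    combinations = []
    for choice in product(*target_sets):
        targets = tuple(target for target, _ in choice)
        probability = 1.0
        for _, p in choice:
            probability *= p
        combinations.append((targets, probability))
    # Highest-probability combinations first; ties keep enumeration order.
    return sorted(combinations, key=lambda c: -c[1])


papers = [("//article", 0.5000), ("//inproceedings", 0.5000)]
widom = [
    ("//author[~jennifer widom]", 0.6842),
    ("//editor[~jennifer widom]", 0.3150),
    ("//title[~jennifer widom]", 0.0004),
]
for targets, probability in combine_targets(papers, widom):
    print(targets, round(probability, 4))
```

The top-ranked combinations pair //article and //inproceedings with the author target at 0.5 × 0.6842 = 0.3421 each.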

Target components and target sets

Query: Papers by jennifer widom

Targets for "Jennifer widom":
  {//author[~jennifer widom]}    0.6842
  {//editor[~jennifer widom]}    0.3150
  {//title[~jennifer widom]}     0.0004
Targets for "papers":
  {//article}                    0.5000
  {//inproceedings}              0.5000
Target sets:
  {//article} {//author[jennifer widom]}          0.3421
  {//inproceedings} {//author[jennifer widom]}    0.3421
  {//inproceedings} {//editor[jennifer widom]}    0.1577
  {//article} {//editor[jennifer widom]}          0.1577
  {//inproceedings} {//title[jennifer widom]}     0.0002
  {//article} {//title[jennifer widom]}           0.0002

Transformation Operators (1)

Aggregation: merge targets with the same tag
  {//a}, {//a[~x]} → {//a[~x]}
  {//a[~x]}, {//a[~y]} → {//a[~x y]}
Prefix expansion: add an ancestor condition
  {//b} → {//a//b}
  {//b[~x]} → {//a//b[~x]}
Ordering: combine targets
  {//a}, {//b} → {//a//b} or {//a[//b]}
  {//a}, {//b[~x]} → {//a//b[~x]} or {//a[//b[~x]]}

Conclusion

Two strong assumptions
  Keyword query non-ambiguity
  Availability of an XML thesaurus
Accuracy: term classification did not consider the specific XML context
Time-costly:
  Term classification
  Target creation scans the XML documents

Outline

Introduction
Query Refinement in Traditional IR
XML Keyword Query Refinement
My work

Thank You!
