Data Frame Augmentation of Free Form Queries for Constraint Based Document Filtering Andrew Zitzelberger

Post on 22-Dec-2015

Problem

Constraint Based Queries

Queries

Test Queries
    1) Find me a Wii game.
    2) Find me a Honda for under 15 thousand dollars.
    3) Roller Coaster more than 150 feet high
    4) mountains at least 15K feet
    5) games under $25
    6) mountains less than 4 km
    7) ps games < $40
    8) coasters longer than 1000 feet
    9) car for under 5 grand newer than 1990 with less than 115K miles
   10) more than 15K miles under 5 grand newer than 2004

Keywords + Semantics

• Semantic queries are computationally expensive

• Keyword queries are fast and simple
    o People are used to keyword queries

• Synergistic solution:
    o extract numerical constraints from the query
    o use keywords to quickly narrow the search space
    o use constraints as a filter

Data Frames

Price
    internal representation: Double
    external representation: \$[1-9]\d{0,2}(,\d{3})*|...
    ...
    right units: (K)?\s*(cents|dollars|[Gg]rand|...)
    canonicalization method: toUSDollars
    comparison methods:
        LessThan(p1: Price, p2: Price) returns (Boolean)
        external representation: (less than|<|under|...)\s*{p2}|...
        ...
    end
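
The Price data frame above could be approximated in Python roughly as follows. This is a minimal sketch, not the author's implementation: the regular expression is a simplified stand-in for the recognizers elided ("...") on the slide, and the names PRICE_RE, to_us_dollars, and less_than are hypothetical, chosen only to mirror toUSDollars and LessThan.

```python
import re
from typing import Optional

# Simplified, assumed recognizer: optional "$", a number (with or without
# thousands separators), an optional "K", and an optional unit word.
PRICE_RE = re.compile(
    r"(?P<dollar>\$)?\s*(?P<num>\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)"
    r"\s*(?P<k>K)?\s*(?P<unit>cents|dollars|[Gg]rand)?"
)


def to_us_dollars(text: str) -> Optional[float]:
    """Canonicalize a price string to a float number of US dollars."""
    m = PRICE_RE.fullmatch(text.strip())
    # A bare number with neither "$" nor a unit is not treated as a price.
    if m is None or not (m.group("dollar") or m.group("unit")):
        return None
    value = float(m.group("num").replace(",", ""))
    if m.group("k"):
        value *= 1000
    unit = (m.group("unit") or "dollars").lower()
    if unit == "grand":
        value *= 1000
    elif unit == "cents":
        value /= 100
    return value


def less_than(p1: float, p2: float) -> bool:
    """Comparison method over canonicalized prices."""
    return p1 < p2
```

Under these assumptions, "$4,500" and "6 grand" canonicalize to comparable values, while a bare "1990" is rejected because it carries neither a dollar sign nor a unit.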

Data Frame Library

Free Form Query

• Car under 6 grand newer than 1990 with less than 115K miles

Step 1: Condition Extraction

• Car under 6 grand newer than 1990 with less than 115K miles

• Extracted Conditions
    o (Price < 6000)
    o (Year > 1990)
    o (Distance < 115000)
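
The extraction step might be sketched in Python as below. The three rules are hypothetical and far narrower than a real data frame library; they are written only to cover the example query, and the names Condition, RULES, and extract_conditions are assumptions made for illustration.

```python
import re
from typing import List, NamedTuple


class Condition(NamedTuple):
    field: str
    op: str
    value: float


# Assumed (field, operator, pattern) rules covering only the example query.
RULES = [
    ("Price", "<", re.compile(
        r"(?:under|less than|<)\s*\$?(?P<num>\d+(?:\.\d+)?)"
        r"\s*(?P<k>K)?\s*(?P<unit>grand|dollars)", re.IGNORECASE)),
    ("Year", ">", re.compile(
        r"newer than\s*(?P<num>\d{4})\b", re.IGNORECASE)),
    ("Distance", "<", re.compile(
        r"(?:under|less than|<)\s*(?P<num>\d+(?:\.\d+)?)"
        r"\s*(?P<k>K)?\s*miles", re.IGNORECASE)),
]


def extract_conditions(query: str) -> List[Condition]:
    """Return every condition recognized by any rule, with canonical values."""
    found = []
    for field, op, pattern in RULES:
        for m in pattern.finditer(query):
            groups = m.groupdict()
            value = float(groups["num"])
            if groups.get("k"):                       # "115K" -> 115000
                value *= 1000
            if (groups.get("unit") or "").lower() == "grand":
                value *= 1000                         # "6 grand" -> 6000
            found.append(Condition(field, op, value))
    return found


conditions = extract_conditions(
    "Car under 6 grand newer than 1990 with less than 115K miles"
)
```

Because the unit word is part of each pattern, "under 6 grand" is read as a price and "less than 115K miles" as a distance, even though both use a less-than phrase.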

Step 2: Remove Condition Values

• Car under newer than with less than

Step 3: Remove Stopwords

• Car
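
Steps 2 and 3 can be illustrated with a short Python sketch. Both the value pattern and the stopword list are assumptions picked so that the example query reduces the way the slides show; the slides do not say which stopword list was actually used.

```python
import re

# Assumed pattern for a condition value: a number, an optional "K",
# and an optional unit word.
VALUE_RE = re.compile(
    r"\$?\d+(?:\.\d+)?(?:\s*K\b)?(?:\s*(?:grand|dollars|miles|feet)\b)?",
    re.IGNORECASE,
)

# Hypothetical stopword list; it includes the comparison words that are
# left behind once the values are removed.
STOPWORDS = {"a", "at", "for", "least", "less", "me", "more",
             "newer", "than", "under", "with"}


def remove_condition_values(query: str) -> str:
    """Step 2: drop the numeric values (and their units) from the query."""
    return " ".join(VALUE_RE.sub(" ", query).split())


def remove_stopwords(query: str) -> str:
    """Step 3: drop stopwords, leaving the keywords for the search engine."""
    return " ".join(w for w in query.split() if w.lower() not in STOPWORDS)


query = "Car under 6 grand newer than 1990 with less than 115K miles"
step2 = remove_condition_values(query)
step3 = remove_stopwords(step2)
```

With these assumptions, step2 is "Car under newer than with less than" and step3 is "Car", matching the two slides.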

Step 4: Perform Keyword Search

Step 5: Filter Document on Constraints

• Keep page if every constraint is satisfied by at least one extracted value
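
The filtering rule can be written as a small Python predicate. This is a sketch under assumptions: each page is represented as a hypothetical mapping from data frame name to the canonical values extracted from it, and conditions are (field, operator, bound) triples.

```python
import operator
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]

OPS = {"<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def keep_page(extracted: Dict[str, List[float]],
              conditions: List[Condition]) -> bool:
    """Keep the page if every condition is met by at least one value."""
    return all(
        any(OPS[op](value, bound) for value in extracted.get(field, []))
        for field, op, bound in conditions
    )


conditions = [("Price", "<", 6000.0), ("Year", ">", 1990.0),
              ("Distance", "<", 115000.0)]

# Made-up pages for illustration only.
page_ok = {"Price": [4500.0], "Year": [1998.0], "Distance": [98000.0]}
page_too_expensive = {"Price": [7200.0], "Year": [2003.0],
                      "Distance": [60000.0]}
```

Here keep_page(page_ok, conditions) is true and keep_page(page_too_expensive, conditions) is false. Because a single matching value is enough, any stray number recognized as a Distance below 115000 would also satisfy that constraint.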

Experimental Setup

• 300 web documents
    o 100 car+trucks pages from http://provo.craigslist.org
    o 100 video gaming pages from http://provo.craigslist.org
    o 50 mountain pages from http://en.wikipedia.org
    o 50 roller coaster pages from http://en.wikipedia.org

• 10 queries
    o 8 with usable conditions

• 2 data sets
    o test-development
    o blind test

Results Summary

• Precision increase for 56% of queries
    o 75% for test-dev, 50% for blind-test

• Precision never worse than keyword query

• Most effective for short, focused documents

Precision@3 by query type

Query Type           Keyword Queries   Reduced Queries   Data Frame Augmented Queries
Dev-Test Queries     33%               40%               60%
Blind-Test Queries   50%               46%               63%
Overall              42%               43%               62%
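
For reference, Precision@3 can be computed as in the Python sketch below. Dividing by k even when fewer than k documents are returned is an assumption; the slides do not say how short result lists were scored. The function name and document identifiers are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str],
                   k: int = 3) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k


# Made-up ranking: two of the top three documents are relevant.
score = precision_at_k(["d1", "d2", "d3", "d4"], {"d1", "d3", "d4"})
```

With k = 3 the only possible per-query scores are 0, 1/3, 2/3, and 1, which is consistent with the 0.00, 0.33, 0.67, and 1.00 values reported per query.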

Discussion

• Issues:
    1. inadequate narrowing or ranking of search space
    2. noise caused by other numbers

Distance < 115000

Future Work

• Scalability
    o Indexing data frame extracted terms

• Precision vs Recall trade-offs

• Pay-as-you-go search construction

Related Work

• Question-Answering Systems

• Keyword search over databases and semantic stores

Questions?

Results (Test-Dev Set)

Query                                                             Keyword   Condition Removed Keyword   Data Frame Augmentation
Find me a Wii game.                                               0.33      0.33                        0.33
Find me a Honda for under 15 thousand dollars.                    0.67      1.00                        1.00
roller coaster more than 150ft high                               0.33      0.33                        0.67
mountains at least 15K ft                                         1.00      0.67                        1.00
games under $25                                                   0.00      0.33                        0.67
mountains less than 4 km                                          0.00      0.00                        0.33
ps games < 40 bucks                                               0.33      0.00                        0.33
coasters longer than 1000 feet                                    0.33      1.00                        1.00
car for under 6 grand newer than 1990 with less than 115K miles   0.33      0.33                        0.67
more than 15K miles under 10 grand newer than 2000                0.00      0.00                        0.00

Results (Blind Test Set)

Query                                                             Keyword   Condition Removed Keyword   Data Frame Augmentation
Find me a Wii game.                                               0.67      0.67                        0.67
Find me a Honda for under 15 thousand dollars.                    0.67      1.00                        1.00
roller coaster more than 150ft high                               0.67      0.67                        0.67
mountains at least 5K ft                                          0.33      0.33                        0.67
games under $25                                                   0.67      0.67                        1.00
mountains less than 4 km                                          0.00      0.00                        0.00
ps games < 40 bucks                                               0.33      0.33                        0.33
coasters longer than 1000 feet                                    0.67      0.67                        0.67
car for under 6 grand newer than 1990 with less than 115K miles   0.67      0.00                        1.00
more than 15K miles under 10 grand newer than 2000                0.33      0.33                        0.33