
Data Frame Augmentation of Free Form Queries for Constraint Based Document Filtering Andrew Zitzelberger


Post on 20-Mar-2016








Page 1: Data Frame Augmentation of Free Form Queries for Constraint Based Document Filtering

Data Frame Augmentation of Free Form Queries for Constraint Based Document Filtering

Andrew Zitzelberger


Constraint Based Queries



Test Queries

 1) Find me a Wii game.
 2) Find me a Honda for under 15 thousand dollars.
 3) Roller Coaster more than 150 feet high
 4) mountains at least 15K feet
 5) games under $25
 6) mountains less than 4 km
 7) ps games < $40
 8) coasters longer than 1000 feet
 9) car for under 5 grand newer than 1990 with less than 115K miles
10) more than 15K miles under 5 grand newer than 2004


Keywords + Semantics

• Semantic queries are computationally expensive

• Keyword queries are fast and simple
    o People are used to keyword queries

• Synergistic solution:
    o extract numerical constraints from the query
    o use keywords to quickly narrow the search space
    o use constraints as a filter


Data Frames

Price
    internal representation: Double
    external representation: \$[1-9]\d{0,2}(,\d{3})*|...
    ...
    right units: (K)?\s*(cents|dollars|[Gg]rand|...)
    canonicalization method: toUSDollars
    comparison methods:
        LessThan(p1: Price, p2: Price) returns (Boolean)
        external representation: (less than|<|under|...)\s*{p2}|...
        ...
    end
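To make the parts of the Price data frame concrete, here is a minimal Python sketch of a recognizer, a canonicalization method, and one comparison method. It is an illustration under assumptions, not the presenter's implementation: the names `PRICE_RE`, `to_us_dollars`, and `less_than` are hypothetical, the unit list (including "bucks" and "thousand") is assumed, and the alternatives elided with "..." on the slide are not reconstructed.

```python
import re

# Simplified stand-in for the slide's "external representation" and
# "right units". The alternatives elided on the slide are NOT filled in;
# the unit words below are assumptions for illustration.
PRICE_RE = re.compile(
    r"(?P<dollar>\$)?"
    r"(?P<num>\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)"
    r"\s*(?P<mult>(?:K|thousand)\b)?"
    r"\s*(?P<unit>cents|dollars|grand|bucks)?",
    re.IGNORECASE,
)


def to_us_dollars(text):
    """Canonicalize the first price found in text to a float, or None."""
    for m in PRICE_RE.finditer(text):
        # A bare number with neither "$" nor a currency unit is not a price.
        if not (m.group("dollar") or m.group("unit")):
            continue
        value = float(m.group("num").replace(",", ""))
        if m.group("mult"):
            value *= 1000
        unit = (m.group("unit") or "dollars").lower()
        if unit == "grand":
            value *= 1000
        elif unit == "cents":
            value /= 100
        return value
    return None


def less_than(p1, p2):
    """Comparison method LessThan(p1, p2) applied to canonical values."""
    return p1 < p2
```

Canonicalizing first and comparing second is what lets "6 grand" in a query be compared with "$4,500" on a page: both become plain dollar amounts before `less_than` is applied. A number followed by a non-currency unit, such as "115K miles", is deliberately left for a different data frame.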


Data Frame Library


Free Form Query

• Car under 6 grand newer than 1990 with less than 115K miles


Step 1: Condition Extraction

• Car under 6 grand newer than 1990 with less than 115K miles

• Extracted Conditions
    o (Price < 6000)
    o (Year > 1990)
    o (Distance < 115000)
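Condition extraction can be sketched as matching each data frame's comparison phrases followed by a value that the frame recognizes. The Python below is a rough illustration that covers only the running example: the `FRAMES` table, its phrase lists, its unit multipliers, and the function `extract_conditions` are all assumptions, and a real data frame library would carry far richer recognizers.

```python
import re

NUMBER = r"(?P<num>\d+(?:\.\d+)?)\s*(?P<k>K\b)?"
COMPARE = {"<": r"less than|under|<", ">": r"more than|over|>"}

# (frame name, operator phrases, value pattern, unit multipliers).
# Every pattern and unit here is an illustrative assumption.
FRAMES = [
    ("Price", COMPARE, NUMBER + r"\s*(?P<unit>grand|dollars|bucks)",
     {"grand": 1000.0, "dollars": 1.0, "bucks": 1.0}),
    ("Year", {">": r"newer than|after", "<": r"older than|before"},
     r"(?P<num>(?:19|20)\d{2})\b(?P<k>)(?P<unit>)", {"": 1.0}),
    ("Distance", COMPARE, NUMBER + r"\s*(?P<unit>miles|km)",
     {"miles": 1.0, "km": 0.621371}),
]


def extract_conditions(query):
    """Return (frame, operator, canonical value) triples in query order."""
    found = []
    for name, ops, value_re, units in FRAMES:
        for op, phrases in ops.items():
            # The lookbehind keeps "over" from matching inside a longer word.
            pattern = re.compile(
                rf"(?<!\w)(?:{phrases})\s*{value_re}", re.IGNORECASE
            )
            for m in pattern.finditer(query):
                value = float(m.group("num"))
                if m.group("k"):
                    value *= 1000
                value *= units[m.group("unit").lower()]
                found.append((m.start(), name, op, value))
    found.sort()
    return [(name, op, value) for _, name, op, value in found]
```

Because the unit decides which frame claims a value, "less than 115K miles" is extracted as a Distance condition and not as a Price, even though both frames share the phrase "less than".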


Step 2: Remove Condition Values

• Car under newer than with less than


Step 3: Remove Stopwords

• Car
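Steps 2 and 3 together reduce the query to its keywords. The sketch below shows one way this could look in Python; `VALUE_RE`, the `STOPWORDS` set, and both function names are hypothetical, and the stopword list is an assumption chosen so that the leftover comparison words ("under", "newer", "than", and so on) are dropped as on the slides.

```python
import re

# Hypothetical value recognizer: a number with optional "$", optional
# K/thousand multiplier, and optional unit word.
VALUE_RE = re.compile(
    r"\$?\d+(?:,\d{3})*(?:\.\d+)?\s*(?:K\b|thousand\b)?"
    r"\s*(?:grand|dollars|bucks|miles|feet|ft|km)?",
    re.IGNORECASE,
)

# Assumed stopword list; it includes the comparison words that remain
# once the condition values have been removed.
STOPWORDS = {
    "a", "an", "the", "me", "find", "for", "with", "at", "than",
    "under", "over", "less", "more", "least", "newer", "older",
}


def remove_condition_values(query):
    """Step 2: delete recognized values, leaving the surrounding words."""
    return " ".join(VALUE_RE.sub(" ", query).split())


def remove_stopwords(query):
    """Step 3: keep only the words that are not stopwords."""
    words = re.findall(r"[A-Za-z0-9]+", query)
    return [w for w in words if w.lower() not in STOPWORDS]
```

For the running example, `remove_condition_values` yields "Car under newer than with less than" and `remove_stopwords` then leaves only "Car", which becomes the keyword query for Step 4.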


Step 4: Perform Keyword Search


Step 5: Filter Document on Constraints

• Keep page if every constraint is satisfied by at least one extracted value
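The filtering rule is an "all constraints, any value" test, which the following Python sketch spells out. It assumes each page returned by the keyword search has already been run through the data frame recognizers, producing a dictionary from frame name to the canonical values found on that page; the names `satisfies` and `filter_pages` and this data layout are assumptions for illustration.

```python
import operator

OPS = {"<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def satisfies(constraints, extracted):
    """True if every constraint is met by at least one extracted value.

    constraints: list of (frame, operator, bound) triples.
    extracted:   dict mapping frame name to values found on the page.
    """
    return all(
        any(OPS[op](value, bound) for value in extracted.get(frame, []))
        for frame, op, bound in constraints
    )


def filter_pages(ranked_pages, constraints):
    """Keep pages that pass, preserving the keyword-search ranking."""
    return [page_id for page_id, extracted in ranked_pages
            if satisfies(constraints, extracted)]
```

Note that a page with no extracted value for a frame fails any constraint on that frame, and that a single matching number anywhere on the page is enough to pass, so unrelated numbers on a long page can let it through.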


Experimental Setup

• 300 web documents
    o 100 car+trucks pages from http://provo.craigslist.org
    o 100 video gaming pages from http://provo.craigslist.org
    o 50 mountain pages from http://en.wikipedia.org
    o 50 roller coaster pages from

• 10 queries
    o 8 with usable conditions

• 2 data sets
    o test-development
    o blind test


Results Summary

• Precision increase for 56% of queries
    o 75% for test-dev, 50% for blind-test

• Precision never worse than keyword query

• Most effective for short, focused documents

Precision@3 / Query Type   Keyword Queries   Reduced Queries   Data Frame Augmented Queries
Dev-Test Queries           33%               40%               60%
Blind-Test Queries         50%               46%               63%
Overall                    42%               43%               62%
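For reference, Precision@3 is the fraction of the top three returned documents that are relevant. A minimal Python sketch follows; the function name and the choice to divide by k even when fewer than k documents are returned are assumptions, although dividing by 3 is consistent with the per-query values of 0.00, 0.33, 0.67, and 1.00 reported in the detailed results.

```python
def precision_at_k(ranked_ids, relevant_ids, k=3):
    """Fraction of the top-k ranked documents that are relevant.

    Divides by k even if fewer than k documents were returned
    (an assumption of this sketch).
    """
    top = ranked_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    return hits / k
```

For example, a ranking whose first and third results are relevant scores 2/3, which the tables round to 0.67.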



• Issues:
    1. inadequate narrowing or ranking of search space
    2. noise caused by other numbers

Distance < 115000


Future Work

• Scalability
    o Indexing data frame extracted terms

• Precision vs Recall trade-offs

• Pay-as-you-go search construction


Related Work

• Question-Answering Systems

• Keyword search over databases and semantic stores


Results (Test-Dev Set)

Query                                                             Keyword   Condition Removed Keyword   Data Frame Augmentation
Find me a Wii game.                                               0.33      0.33                        0.33
Find me a Honda for under 15 thousand dollars.                    0.67      1.00                        1.00
roller coaster more than 150ft high                               0.33      0.33                        0.67
mountains at least 15K ft                                         1.00      0.67                        1.00
games under $25                                                   0.00      0.33                        0.67
mountains less than 4 km                                          0.00      0.00                        0.33
ps games < 40 bucks                                               0.33      0.00                        0.33
coasters longer than 1000 feet                                    0.33      1.00                        1.00
car for under 6 grand newer than 1990 with less than 115K miles   0.33      0.33                        0.67
more than 15K miles under 10 grand newer than 2000                0.00      0.00                        0.00


Results (Blind Test Set)

Query                                                             Keyword   Condition Removed Keyword   Data Frame Augmentation
Find me a Wii game.                                               0.67      0.67                        0.67
Find me a Honda for under 15 thousand dollars.                    0.67      1.00                        1.00
roller coaster more than 150ft high                               0.67      0.67                        0.67
mountains at least 5K ft                                          0.33      0.33                        0.67
games under $25                                                   0.67      0.67                        1.00
mountains less than 4 km                                          0.00      0.00                        0.00
ps games < 40 bucks                                               0.33      0.33                        0.33
coasters longer than 1000 feet                                    0.67      0.67                        0.67
car for under 6 grand newer than 1990 with less than 115K miles   0.67      0.00                        1.00
more than 15K miles under 10 grand newer than 2000                0.33      0.33                        0.33