stop stupid fuzzy searches


Upload: aurelien-saint-requier

Post on 13-Apr-2017


TRANSCRIPT

Page 1: Stop stupid fuzzy searches

Stop stupid fuzzy searches!

Page 2: Stop stupid fuzzy searches

Table of contents

/01 Fuzzy search

/02 Google Did You Mean

/03 Did You Mean by Deezer

/04 Conclusion

Page 3: Stop stupid fuzzy searches

/01 Fuzzy search

Page 4: Stop stupid fuzzy searches

Fuzzy search: here’s how it works.


Page 8: Stop stupid fuzzy searches

Fuzzy search: is it fast or not?

fuzziness=2 is devastatingly slow

https://www.elastic.co/blog/elasticsearch-queries-or-term-queries-are-really-fast
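For reference, this kind of search is usually requested through the `fuzziness` parameter of an Elasticsearch `match` query. The sketch below only builds the request body as a Python dictionary. The field name `title` and the misspelled query are hypothetical, and no cluster is contacted.

```python
import json


def fuzzy_match_body(field: str, text: str, fuzziness: int) -> dict:
    """Build the body of an Elasticsearch `match` query with fuzziness enabled."""
    if fuzziness not in (0, 1, 2):
        # Elasticsearch accepts a maximum edit distance of 0, 1 or 2
        # (or the string "AUTO", which this sketch does not handle).
        raise ValueError("fuzziness must be 0, 1 or 2")
    return {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}


# Hypothetical field name and user input, for illustration only.
body = fuzzy_match_body("title", "daft pnuk", 2)
print(json.dumps(body, indent=2))
```

With `fuzziness` set to 2, every indexed term within two edits of each query token becomes a candidate, which is where the extra work comes from.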

Page 9: Stop stupid fuzzy searches

Fuzzy search: observations

Pros

● Searches for all possible mistakes allowed by the fuzziness parameter

● Natively implemented in Elasticsearch and Lucene

Cons

● Increases CPU usage and query response time

● Results are not always relevant

● The system does not try to guess what the user needs
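To make the first pro and the relevance con concrete, here is a small sketch that lists which terms of a toy vocabulary fall within a given edit distance of a query term. The vocabulary is invented, and the distance is plain Levenshtein (Elasticsearch additionally counts a transposition as one edit by default), so treat it as an illustration rather than a model of Lucene's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def fuzzy_candidates(term: str, vocabulary: set, fuzziness: int) -> list:
    """Return every vocabulary term within `fuzziness` edits of `term`."""
    return sorted(w for w in vocabulary if levenshtein(term, w) <= fuzziness)


# Hypothetical index vocabulary.
vocabulary = {"muse", "mule", "mute", "must", "house", "mouse", "nurse"}

print(fuzzy_candidates("muse", vocabulary, 1))
# ['mouse', 'mule', 'muse', 'must', 'mute']
print(fuzzy_candidates("muse", vocabulary, 2))
# ['house', 'mouse', 'mule', 'muse', 'must', 'mute', 'nurse']
```

Even on seven words, moving from one to two allowed edits pulls in terms such as "house" and "nurse" that are unlikely to be what someone typing "muse" wanted.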

Page 10: Stop stupid fuzzy searches

/02 Google Did You Mean


Page 12: Stop stupid fuzzy searches

Did You Mean? By Google

Based on the patent "Query revision using known highly-ranked queries":

http://www.google.com/patents/US20060224554

Page 13: Stop stupid fuzzy searches

Did You Mean? Here’s how it works.

1. Assign a rank to all search queries.

2. Identify the highest-ranked queries as known highly-ranked queries (KHRQs).

3. Identify queries with a strong probability of being revised to a KHRQ as nearby queries (NQs). KHRQs and NQs are indexed.

4. Determine a revision probability for a given query with respect to each indexed query.

5. Calculate a revision score (RS) for the indexed query from the revision probability and the query rank.

6. Retrieve the indexed queries with the highest RS as alternative queries.

7. Provide the alternative queries that are KHRQs, or the corresponding KHRQ for alternative queries that are NQs.
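Steps 4 to 7 can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the toy index, the use of `difflib.SequenceMatcher` as a stand-in for the revision probability, and the choice of RS = probability × rank. The slides do not say how the patent combines the two values.

```python
from difflib import SequenceMatcher

# Hypothetical index: indexed query -> (rank, KHRQ it maps to).
# A KHRQ maps to itself; an NQ maps to its corresponding KHRQ.
INDEX = {
    "daft punk": (0.9, "daft punk"),
    "daft pnuk": (0.1, "daft punk"),
    "david guetta": (0.8, "david guetta"),
}


def revision_probability(query: str, indexed: str) -> float:
    """Stand-in only: plain string similarity in [0, 1]."""
    return SequenceMatcher(None, query, indexed).ratio()


def alternatives(query: str, index: dict, top_n: int = 1) -> list:
    """Score every indexed query and return the best KHRQs, without duplicates."""
    scored = []
    for indexed, (rank, khrq) in index.items():
        # Assumed combination: RS = revision probability * rank.
        rs = revision_probability(query, indexed) * rank
        scored.append((rs, khrq))
    scored.sort(reverse=True)
    result = []
    for _, khrq in scored:
        if khrq not in result:  # an NQ and its KHRQ yield the same suggestion
            result.append(khrq)
    return result[:top_n]


print(alternatives("daft punck", INDEX))  # ['daft punk']
```

Note that the suggestion returned is always a KHRQ, even when the best-scoring indexed entry is an NQ, which mirrors step 7.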


Page 15: Stop stupid fuzzy searches

/03 Did You Mean by Deezer

Page 16: Stop stupid fuzzy searches

Did You Mean at Deezer: assign a rank to Deezer search queries

Exploit user search actions:

Compute top-ranked queries:
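The slide's figures are not in the transcript, so the following is only a guess at the general shape of this step: count how often each query is followed by a click, and treat the most-clicked queries as the top-ranked ones. The log format and the "rank = number of clicked searches" rule are assumptions.

```python
from collections import Counter

# Hypothetical search log: (query, did the user click on a result?)
search_log = [
    ("daft punk", True), ("daft punk", True), ("daft pnuk", False),
    ("stromae", True), ("daft punk", True), ("stromae", False),
]


def rank_queries(log: list) -> list:
    """Rank queries by the number of searches that led to a click."""
    clicks = Counter(query for query, clicked in log if clicked)
    return clicks.most_common()  # [(query, click_count), ...] by decreasing count


def top_ranked(log: list, n: int) -> list:
    """Return the n best-ranked queries, i.e. the KHRQ candidates."""
    return [query for query, _ in rank_queries(log)[:n]]


print(rank_queries(search_log))   # [('daft punk', 3), ('stromae', 1)]
print(top_ranked(search_log, 1))  # ['daft punk']
```

A misspelled query such as "daft pnuk" never earns a click in this toy log, so it never becomes a KHRQ.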

Page 17: Stop stupid fuzzy searches

Did You Mean at Deezer: identify nearby queries

Use a behavioral similarity, based on the work of Elisa Gilles, which allows us to:

● group user search queries that express the same need (temporal analysis and Levenshtein distance between the tokens of two queries)

● flag reformulated queries (operations such as insertion in the middle, substitution, or deletion)

● for each NQ, keep the most frequent NQ-KHRQ pair
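As a rough sketch of the three bullets above: pair each query with the next query from the same user when the two are close in time and close in token-level Levenshtein distance, then keep the most frequent KHRQ for each NQ. The log format, the 30-second window, the distance threshold of 2 and the requirement that both queries have the same number of tokens are all assumptions, not details taken from the slides.

```python
from collections import Counter, defaultdict


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def token_distance(q1: str, q2: str):
    """Sum of edit distances between aligned tokens; None if token counts differ."""
    t1, t2 = q1.split(), q2.split()
    if len(t1) != len(t2):
        return None
    return sum(levenshtein(x, y) for x, y in zip(t1, t2))


def nq_khrq_pairs(log, khrqs, max_gap=30, max_dist=2) -> dict:
    """Map each NQ to the KHRQ it is most often reformulated into."""
    by_user = defaultdict(list)
    for user, ts, query in sorted(log):
        by_user[user].append((ts, query))
    counts = Counter()
    for events in by_user.values():
        for (t1, q1), (t2, q2) in zip(events, events[1:]):
            dist = token_distance(q1, q2)
            if (q2 in khrqs and q1 not in khrqs and t2 - t1 <= max_gap
                    and dist is not None and 0 < dist <= max_dist):
                counts[(q1, q2)] += 1
    best = {}
    for (nq, khrq), _ in counts.most_common():
        best.setdefault(nq, khrq)  # most_common() is sorted, so the first pair wins
    return best


# Hypothetical log: (user_id, timestamp in seconds, query)
log = [
    (1, 0, "daft pnuk"), (1, 8, "daft punk"),
    (2, 0, "daft pnuk"), (2, 5, "daft punk"),
    (3, 0, "daft pnuk"), (3, 400, "daft funk"),  # too far apart in time
    (4, 0, "stromae"), (4, 6, "daft punk"),      # different need
]
KHRQS = {"daft punk", "daft funk"}

print(nq_khrq_pairs(log, KHRQS))  # {'daft pnuk': 'daft punk'}
```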

Page 18: Stop stupid fuzzy searches

Did You Mean at Deezer: query revision system

● Use Elasticsearch to store pairs of NQs and KHRQs

○ id: MD5 of the NQ

○ nq: the nearby query

○ khrq: the known highly-ranked query corresponding to the nearby query

○ rank: rank of the KHRQ

○ frequency: frequency of the NQ-KHRQ pair

● Nearly 1 million NQs paired with 50,000 KHRQs are stored

● Learn the query revision model from the last month of user web search clicks and the last 3 months of user search queries
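A document with the five fields listed above could be built as follows. The field names come from the slide; the value types, the UTF-8 encoding before hashing and the sample values are assumptions.

```python
import hashlib


def revision_doc(nq: str, khrq: str, rank: float, frequency: int) -> dict:
    """Build one NQ-KHRQ document using the field names listed on the slide."""
    return {
        # MD5 of the NQ, hex-encoded (encoding the text as UTF-8 is an assumption)
        "id": hashlib.md5(nq.encode("utf-8")).hexdigest(),
        "nq": nq,
        "khrq": khrq,
        "rank": rank,            # rank of the KHRQ (numeric type assumed)
        "frequency": frequency,  # how often this NQ-KHRQ pair was observed
    }


# Hypothetical values, for illustration only.
doc = revision_doc("daft pnuk", "daft punk", 0.9, 42)
print(doc["id"], doc["khrq"])
```

Because the id depends only on the NQ text, the same misspelling always maps to the same document, so re-running the learning step would overwrite a pair rather than duplicate it.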


Page 21: Stop stupid fuzzy searches

Did You Mean at Deezer: first results


Page 23: Stop stupid fuzzy searches

/04 Conclusion

Page 24: Stop stupid fuzzy searches

Conclusion

Pros

● For users:

○ get more relevant results

○ show what the system is really searching for

● For servers:

○ save CPU usage

○ improve query response time

○ save memcache space

Cons

● Learning from user search actions: potentially subject to “bombing”

Page 25: Stop stupid fuzzy searches

Open questions

● How many KHRQs does the system need?

● When should we automatically replace the user's query with a revised query, and when should we leave the choice to the user?

● How many mistakes should we allow the user to make?

● Could we combine other types of similarity to pair NQs with KHRQs:

○ semantic similarity?

○ syntactic similarity?

● Is Elasticsearch the best tool to serve as a query revision server?
