Efficient Interactive Fuzzy Keyword Search

Shengyue Ji, Guoliang Li, Jianhua Feng, Chen Li. University of California, Irvine. WWW 2009. 1 Dec 2011, Presentation @ IDB Lab. Seminar. Presented by Jee-bum Park.


Page 1: Efficient Interactive Fuzzy Keyword Search

Efficient Interactive Fuzzy Keyword Search
Shengyue Ji, Guoliang Li, Jianhua Feng, Chen Li
University of California, Irvine
WWW 2009

1 Dec 2011
Presentation @ IDB Lab. Seminar

Presented by Jee-bum Park

Page 2: Efficient Interactive Fuzzy Keyword Search

2

Outline
– Introduction
– Indexing Methods
– Single Keyword
– Multiple Keywords
– Experiments
– Conclusions

Page 3: Efficient Interactive Fuzzy Keyword Search

3

Introduction
http://searchenginewatch.com/article/2128218/Google-Searchers-Use-Autocomplete-Most-Ignore-Google-Instant-Eye-Tracking-Study


Page 12: Efficient Interactive Fuzzy Keyword Search

12

Introduction
A typical directory-search form

Page 13: Efficient Interactive Fuzzy Keyword Search

13

Introduction
Interactive fuzzy search

Page 14: Efficient Interactive Fuzzy Keyword Search

14

Introduction
“Interactive, fuzzy search”

– Interactive
  The system searches for the best answers on the fly as the user types in a keyword query
– Fuzzy
  The system tries to find relevant records that include words similar to the keywords in the query, even if they do not match exactly

Page 15: Efficient Interactive Fuzzy Keyword Search

15

Outline
– Introduction
– Indexing Methods
– Single Keyword
– Multiple Keywords
– Experiments
– Conclusions

Page 16: Efficient Interactive Fuzzy Keyword Search

16

Indexing Methods
List

Prefix query, inverted index

Keyword   Record IDs
li        1
lin       3, 4
liu       5
lu        4
luis      7

Page 17: Efficient Interactive Fuzzy Keyword Search

17

Indexing Methods
List
– Typed “li”

Prefix query, inverted index

Keyword   Record IDs
li        1
lin       3, 4
liu       5
lu        4
luis      7

Page 18: Efficient Interactive Fuzzy Keyword Search

18

Indexing Methods
List
– Typed “lu”

Prefix query, inverted index

Keyword   Record IDs
li        1
lin       3, 4
liu       5
lu        4
luis      7
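
The list-based lookup on these slides can be sketched as follows. This is only an illustration, assuming the keywords are kept in sorted order so that all completions of a prefix are contiguous; the names `INVERTED`, `KEYWORDS`, and `prefix_query` are hypothetical and do not come from the paper.

```python
from bisect import bisect_left

# Inverted index from the slide: keyword -> sorted record IDs.
INVERTED = {"li": [1], "lin": [3, 4], "liu": [5], "lu": [4], "luis": [7]}
KEYWORDS = sorted(INVERTED)  # completions of a prefix are contiguous here


def prefix_query(prefix):
    """Return (keywords starting with prefix, union of their record IDs)."""
    start = bisect_left(KEYWORDS, prefix)  # first keyword >= prefix
    matches = []
    for word in KEYWORDS[start:]:
        if not word.startswith(prefix):
            break  # past the contiguous run of completions
        matches.append(word)
    records = sorted({rid for w in matches for rid in INVERTED[w]})
    return matches, records


print(prefix_query("li"))  # (['li', 'lin', 'liu'], [1, 3, 4, 5])
print(prefix_query("lu"))  # (['lu', 'luis'], [4, 7])
```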

Page 19: Efficient Interactive Fuzzy Keyword Search

19

Indexing Methods
Trie

Node ID: character [record IDs of the keyword ending at that node]

0: \0 (root)
  10: l
    11: i  [1]
      12: n  [3, 4]
      13: u  [5]
    14: u  [4]
      15: i
        16: s  [7]

Page 20: Efficient Interactive Fuzzy Keyword Search

20

Indexing Methods
Trie
– Typed “li”

Node ID: character [record IDs of the keyword ending at that node]

0: \0 (root)
  10: l
    11: i  [1]
      12: n  [3, 4]
      13: u  [5]
    14: u  [4]
      15: i
        16: s  [7]
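
A minimal sketch of the trie lookup for a typed prefix such as “li”: walk down one node per character, then collect the record IDs stored in the subtree. The class and function names are hypothetical, and a real implementation would likely store node IDs and compressed lists rather than Python dictionaries.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.records = []   # record IDs if a keyword ends at this node


def build_trie(inverted):
    """Build a trie from a keyword -> record-ID-list mapping."""
    root = TrieNode()
    for word, rids in inverted.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.records = list(rids)
    return root


def trie_prefix_query(root, prefix):
    """Return sorted record IDs of all keywords that start with prefix."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []  # no keyword has this prefix
    found, stack = set(), [node]
    while stack:  # depth-first walk over the subtree below the prefix node
        cur = stack.pop()
        found.update(cur.records)
        stack.extend(cur.children.values())
    return sorted(found)


INVERTED = {"li": [1], "lin": [3, 4], "liu": [5], "lu": [4], "luis": [7]}
root = build_trie(INVERTED)
print(trie_prefix_query(root, "li"))  # [1, 3, 4, 5]
```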


Page 27: Efficient Interactive Fuzzy Keyword Search

27

Outline
– Introduction
– Indexing Methods
– Single Keyword
– Multiple Keywords
– Experiments
– Conclusions

Page 28: Efficient Interactive Fuzzy Keyword Search

28

Single Keyword

Page 29: Efficient Interactive Fuzzy Keyword Search

29

Single Keyword
Example
– Query = “nlis”, edit distance threshold = 2

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]
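
For reference, the edit distance behind the threshold can be computed with the standard dynamic program over insertions, deletions, and substitutions. The sketch below is a generic Levenshtein implementation, not code from the paper; the function name is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


print(edit_distance("nlis", "luis"))  # 2: within the threshold
print(edit_distance("nlis", "lu"))    # 3: exceeds the threshold of 2
```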

Page 30: Efficient Interactive Fuzzy Keyword Search

30

Single Keyword
Initial state: “”
– Query = “nlis”, edit distance threshold = 2

Φε        Delete    Substitute        Match     Insert
<0,0>
<10,1>
<11,2>
<14,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 31: Efficient Interactive Fuzzy Keyword Search

31

Single Keyword
Typed: “n”
– Query = “nlis”, edit distance threshold = 2

Φε        Delete    Substitute        Match     Insert
<0,0>     <0,1>     <10,1>
<10,1>    <10,2>    <11,2>, <14,2>
<11,2>                                <12,2>
<14,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 32: Efficient Interactive Fuzzy Keyword Search

32

Single Keyword
Typed: “n”
– Query = “nlis”, edit distance threshold = 2

Φε        Delete    Substitute        Match     Insert
<0,0>     <0,1>     <10,1>
<10,1>    <10,2>    <11,2>, <14,2>
<11,2>                                <12,2>
<14,2>

Φn: <0,1>, <10,1>, <11,2>, <12,2>, <14,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 33: Efficient Interactive Fuzzy Keyword Search

33

Single Keyword
Typed: “n”
– Query = “nlis”, edit distance threshold = 2

Φn        Delete    Substitute        Match     Insert
<0,1>
<10,1>
<11,2>
<12,2>
<14,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 34: Efficient Interactive Fuzzy Keyword Search

34

Single Keyword
Typed: “nl”
– Query = “nlis”, edit distance threshold = 2

Φn        Delete    Substitute        Match     Insert
<0,1>     <0,2>                       <10,1>    <11,2>, <14,2>
<10,1>    <10,2>    <11,2>, <14,2>
<11,2>
<12,2>
<14,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 35: Efficient Interactive Fuzzy Keyword Search

35

Single Keyword
Typed: “nl”
– Query = “nlis”, edit distance threshold = 2

Φn        Delete    Substitute        Match     Insert
<0,1>     <0,2>                       <10,1>    <11,2>, <14,2>
<10,1>    <10,2>    <11,2>, <14,2>
<11,2>
<12,2>
<14,2>

Φnl: <10,1>, <0,2>, <11,2>, <14,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 36: Efficient Interactive Fuzzy Keyword Search

36

Single Keyword
Typed: “nl”
– Query = “nlis”, edit distance threshold = 2

Φnl       Delete    Substitute        Match     Insert
<10,1>
<0,2>
<11,2>
<14,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 37: Efficient Interactive Fuzzy Keyword Search

37

Single Keyword
Typed: “nli”
– Query = “nlis”, edit distance threshold = 2

Φnl       Delete    Substitute        Match     Insert
<10,1>    <10,2>    <14,2>            <11,1>    <12,2>, <13,2>
<0,2>
<11,2>
<14,2>                                <15,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 38: Efficient Interactive Fuzzy Keyword Search

38

Single Keyword
Typed: “nli”
– Query = “nlis”, edit distance threshold = 2

Φnl       Delete    Substitute        Match     Insert
<10,1>    <10,2>    <14,2>            <11,1>    <12,2>, <13,2>
<0,2>
<11,2>
<14,2>                                <15,2>

Φnli: <11,1>, <10,2>, <12,2>, <13,2>, <14,2>, <15,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 39: Efficient Interactive Fuzzy Keyword Search

39

Single Keyword
Typed: “nli”
– Query = “nlis”, edit distance threshold = 2

Φnli      Delete    Substitute        Match     Insert
<11,1>
<10,2>
<12,2>
<13,2>
<14,2>
<15,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 40: Efficient Interactive Fuzzy Keyword Search

40

Single Keyword
Typed: “nlis”
– Query = “nlis”, edit distance threshold = 2

Φnli      Delete    Substitute        Match     Insert
<11,1>    <11,2>    <12,2>, <13,2>
<10,2>
<12,2>
<13,2>
<14,2>
<15,2>                                <16,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]

Page 41: Efficient Interactive Fuzzy Keyword Search

41

Single Keyword
Typed: “nlis”
– Query = “nlis”, edit distance threshold = 2

Φnli      Delete    Substitute        Match     Insert
<11,1>    <11,2>    <12,2>, <13,2>
<10,2>
<12,2>
<13,2>
<14,2>
<15,2>                                <16,2>

Φnlis: <11,2>, <12,2>, <13,2>, <16,2>

[Figure: the keyword trie with nodes 0 and 10–16; legend: edit distance 0, 1, 2]
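
The keystroke-by-keystroke walkthrough can be reproduced with a short sketch of the incremental step. It assumes the trie is given as a table of node IDs (the `CHILDREN` mapping below mirrors the node numbers on the slides); the function names and the dictionary representation of an active-node set Φ are hypothetical, and the rules are my reading of the Delete, Substitute, Match, and Insert columns rather than the authors' code.

```python
TAU = 2  # edit distance threshold from the slides

# Trie from the slides: node ID -> {character: child node ID}.
CHILDREN = {
    0: {"l": 10}, 10: {"i": 11, "u": 14}, 11: {"n": 12, "u": 13},
    12: {}, 13: {}, 14: {"i": 15}, 15: {"s": 16}, 16: {},
}


def add(active, node, dist):
    """Record <node, dist> if within TAU and better than what is stored."""
    if dist <= TAU and dist < active.get(node, TAU + 1):
        active[node] = dist


def add_with_descendants(active, node, dist):
    """Add the node, then its descendants at one extra insertion per level."""
    add(active, node, dist)
    if dist < TAU:
        for child in CHILDREN[node].values():
            add_with_descendants(active, child, dist + 1)


def initial_active():
    """Active nodes for the empty query: root plus nodes within depth TAU."""
    active = {}
    add_with_descendants(active, 0, 0)
    return active


def step(prev, ch):
    """Compute the active-node set after typing one more character ch."""
    new = {}
    for node, dist in prev.items():
        add(new, node, dist + 1)  # Delete: ch is dropped from the query
        for label, child in CHILDREN[node].items():
            if label == ch:
                add_with_descendants(new, child, dist)  # Match, then Insert
            else:
                add(new, child, dist + 1)  # Substitute
    return new


active = initial_active()  # {0: 0, 10: 1, 11: 2, 14: 2}
for ch in "nlis":
    active = step(active, ch)
print(sorted(active.items()))  # [(11, 2), (12, 2), (13, 2), (16, 2)]
```

The final set matches Φnlis on the slide. The predicted keywords are those stored at or below the active nodes, here li, lin, liu (under node 11) and luis (node 16).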


Page 43: Efficient Interactive Fuzzy Keyword Search

43

Outline
– Introduction
– Indexing Methods
– Single Keyword
– Multiple Keywords
– Experiments
– Conclusions

Page 44: Efficient Interactive Fuzzy Keyword Search

44

Multiple Keywords
Challenges in multiple keywords

– Intersection of multiple lists of keywords
  Each prefix query keyword has multiple predicted complete keywords
  The union of the lists of predicted keywords includes potential answers
  The union lists of multiple query keywords need to be intersected in order to compute the answers to the query
– Cache-based incremental intersection

Page 45: Efficient Interactive Fuzzy Keyword Search

45

Multiple Keywords
HYB (H. Bast, I. Weber. Type Less, Find More: Fast Autocompletion Search with a Succinct Index. In SIGIR 2006)

The intersections can be computed in

The union can be computed in

Total time complexity

D.id    D.content
21      apple iphone
33      php programming
64      apple juice
91      iphone programming
172     iphone galaxy tab
308     application iphone
759     difference ipv4 ipv6

W         New Data Structure (HYB)
ipho      950(ipho)
iph       900(iph), 1000, ...
juice     64, 128, 256, 900(juice), 950(juice), ...
iphone    1, 5, 21, 91, 172, 300, 308, 3000, 3001, ...
ipv4      759(ipv4), 760, ...
ipv6      400, 759(ipv6), 800(ipv6), ...
ipv       5(ipv), 6, 1100, 1200, ...
tab       5(tab), 172, 272, 800(tab), ...
iphon     NULL
ip        5, 3000, 5123, ...

W’ = { iphone, ipv4, ipv6 }
D ∩ Dw = D’ = { 21, 172, 308, 759 }

Page 46: Efficient Interactive Fuzzy Keyword Search

46

Multiple Keywords
Forward lists

Page 47: Efficient Interactive Fuzzy Keyword Search

47

Outline
– Introduction
– Indexing Methods
– Single Keyword
– Multiple Keywords
– Experiments
– Conclusions

Page 48: Efficient Interactive Fuzzy Keyword Search

48

Experiments
DBLP
– It included about one million computer science publication records
  Authors, title, conference or journal name, year, page numbers, URL

MEDLINE
– It had about 4 million latest publication records related to life sciences and biomedical information
  Authors, their affiliations, article title, journal name, journal issue

Page 49: Efficient Interactive Fuzzy Keyword Search

49

Experiments
Computing prefixes similar to a keyword

Page 50: Efficient Interactive Fuzzy Keyword Search

50

Experiments
List intersection of multiple keywords

Page 51: Efficient Interactive Fuzzy Keyword Search

51

Experiments
Scalability (MEDLINE)

Page 52: Efficient Interactive Fuzzy Keyword Search

52

Outline
– Introduction
– Indexing Methods
– Single Keyword
– Multiple Keywords
– Experiments
– Conclusions

Page 53: Efficient Interactive Fuzzy Keyword Search

53

Conclusions
They proposed an efficient incremental algorithm to answer single-keyword fuzzy queries

They studied various algorithms for computing the answers to a query with multiple keywords that are treated as fuzzy, prefix conditions

Page 54: Efficient Interactive Fuzzy Keyword Search

Thank You!
Any Questions or Comments?