indexed search tree (trie)

21
Indexed Search Tree (Trie) Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park

Upload: aitana

Post on 11-Feb-2016

43 views

Category:

Documents


0 download

DESCRIPTION

Indexed Search Tree (Trie). Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park. Indexed Search Tree ( Trie). Special case of tree Applicable when Key C can be decomposed into a sequence of subkeys C 1 , C 2 , … C n - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Indexed Search Tree (Trie)

Indexed Search Tree (Trie)

Fawzi EmadChau-Wen Tseng

Department of Computer ScienceUniversity of Maryland, College Park

Page 2: Indexed Search Tree (Trie)

Indexed Search Tree (Trie)

Special case of treeApplicable when

Key C can be decomposed into a sequence of subkeys C1, C2, … Cn

Redundancy exists between subkeys

ApproachStore subkey at each nodePath through trie yields full key

ExampleHuffman tree

C3

C1

C2

C4C3

Page 3: Indexed Search Tree (Trie)

Tries

Useful for searching strings String decomposes into sequence of lettersExample

“ART” “A” “R” “T”

Can be very fastLess overhead than hashing

May reduce memoryExploiting redundancy

May require more memoryExplicitly storing substrings

S

A

R

TE

“ART”

Page 4: Indexed Search Tree (Trie)

Types of Tries

StandardSingle character per node

CompressedEliminating chains of nodes

CompactStores indices into original string(s)

SuffixStores all suffixes of string

Page 5: Indexed Search Tree (Trie)

Standard Tries

ApproachEach node (except root) is labeled with a characterChildren of node are ordered (alphabetically)Paths from root to leaves yield all input strings

Trie for Morse Code

Page 6: Indexed Search Tree (Trie)

Standard Trie Example

For strings{ a, an, and, any, at }

Page 7: Indexed Search Tree (Trie)

Standard Trie Example

For strings{ bear, bell, bid, bull, buy, sell, stock, stop }

a

e

b

r

l

l

s

u

l

l

y

e t

l

l

o

c

k

p

i

d

Page 8: Indexed Search Tree (Trie)

Standard Tries

Node structureValue between 1…mReference to m children

Array or linked list

ExampleClass Node {

Letter value; // Letter V = { V1, V2, … Vm }Node child[ m ];

}

Page 9: Indexed Search Tree (Trie)

Standard Tries

EfficiencyUses O(n) space Supports search / insert / delete in O(dm) timeFor

n total size of strings indexed by tried length of the parameter stringm size of the alphabet

Page 10: Indexed Search Tree (Trie)

Word Matching Trie

Insert words into trieEach leaf stores occurrences of word in the text

s e e b e a r ? s e l l s t o c k !

s e e b u l l ? b u y s t o c k !

b i d s t o c k !

a

a

h e t h e b e l l ? s t o p !

b i d s t o c k !

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46

47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68

69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86a r

87 88

a

e

b

l

s

u

l

e t

e

0, 24

o

c

i

l

r

6

l

78

d

47, 58l

30

y

36l

12 k

17, 40,51, 62

p

84

h

e

r

69

a

Page 11: Indexed Search Tree (Trie)

Compressed Trie

ObservationInternal node v of T is redundant if v has one child and is not the root

ApproachA chain of redundant nodes can be compressed

Replace chain with single node Include concatenation of labels from chain

ResultInternal nodes have at least 2 childrenSome nodes have multiple characters

Page 12: Indexed Search Tree (Trie)

Compressed Trie

e

b

ar ll

s

u

ll y

ell to

ck p

id

a

e

b

r

l

l

s

u

l

l

y

e t

l

l

o

c

k

p

i

d

Example

Page 13: Indexed Search Tree (Trie)

Compact Tries

Compact representation of a compressed trieApproach

For an array of strings S = S[0], … S[s-1]Store ranges of indices at each node

Instead of substringRepresent as a triplet of integers (i, j, k)

Such that X = s[i][j..k]Example: S[0] = “abcd”, (0,1,2) = “bc”

PropertiesUses O(s) space, where s = # of strings in the arrayServes as an auxiliary index structure

Page 14: Indexed Search Tree (Trie)

Compact Representation

Example

s e eb e a rs e l ls t o c k

b u l lb u yb i d

h eb e l ls t o p

0 1 2 3 4a rS[0] =

S[1] =

S[2] =

S[3] =

S[4] =

S[5] =

S[6] =

S[7] =

S[8] =

S[9] =

0 1 2 3 0 1 2 3

1, 1, 1

1, 0, 0 0, 0, 0

4, 1, 1

0, 2, 2

3, 1, 2

1, 2, 3 8, 2, 3

6, 1, 2

4, 2, 3 5, 2, 2 2, 2, 3 3, 3, 4 9, 3, 3

7, 0, 3

0, 1, 1

Page 15: Indexed Search Tree (Trie)

Suffix Trie

Compressed trie of all suffixes of textExample: “IPDPS”

SuffixesIPDPSPDPSDPSPSS

Useful for finding pattern in any part of textOccurrence prefix of some suffixExample: find PDP in IPDPS

DP

S

P

IP

D

S

P

S

DP

SS

Page 16: Indexed Search Tree (Trie)

Suffix Trie

PropertiesFor

String X with length nAlphabet of size mPattern P with length d

Uses O(n) spaceCan be constructed in O(n) timeFind pattern P in X in O(dm) time

Proportional to length of pattern, not text

Page 17: Indexed Search Tree (Trie)

Suffix Trie Example

e nimize

nimize ze

zei mi

mize nimize ze

m i n i z em i0 1 2 3 4 5 6 7

7, 7 2, 7

2, 7 6, 7

6, 7

4, 7 2, 7 6, 7

1, 1 0, 1

Page 18: Indexed Search Tree (Trie)

Tries and Web Search Engines

Search engine indexCollection of all searchable wordsStored in compressed trie

Each leaf of trie Associated with a word List of pages (URLs) containing that word

Called occurrence list

Trie is kept in memory (fast)Occurrence lists kept in external memory

Ranked by relevance

Page 19: Indexed Search Tree (Trie)

Computational Biology

DNASequence of 4 different nucleotides (ATCG)Portions of DNA sequence produce proteins (genes)

GenomeMaster DNA sequence for organismFor Human

46 chromosomes3 billion nucleotides

Page 20: Indexed Search Tree (Trie)
Page 21: Indexed Search Tree (Trie)

Tries and Computational Biology

ESTsFragments of expressed DNA Indicator for genes (& location)5.5 million sequences at NIH

ESTmapperBuild suffix trie of genome

8 hours, 60 GbytesSearch for ESTs in suffix trie

11 hours w/ 8 processor Sun

Search genome w/ BLAST 5+ years (predicted)

Genome

ESTsSuffix tree

Mapping

Gene