260 wheeler graph2 pub - department of computer science · wheeler graphs a graph is path coherent...



Page 1

Wheeler graphs, part 2

Ben Langmead

Department of Computer Science

Please sign guestbook (www.langmead-lab.org/teaching-materials) to tell me briefly how you are using the slides. For original Keynote files, email me ([email protected]).

Page 2

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9 and edges labeled a, c, g, t]

Can we show Wheeler Graphs are “special,” giving us analogs to FM index queries?

We start with “path coherence,” the graph version of consecutivity

Page 3

Wheeler graphs

A graph is path coherent if there is a total order of the nodes such that:

For any consecutive range $[i, j]$ of nodes and string $\alpha$, the nodes reached by following edges matching $\alpha$ also form a consecutive range

[Figure: example Wheeler graph shown twice while matching the string aga; the highlighted node sets are {1, 2, 3} in the first copy and {5, 6} in the second]
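The definition can be tried out on the example graph with a small brute-force sketch. The edge list below is an assumption: it is not given explicitly on the slides and was reconstructed from the O, I and L encoding shown later in the deck. The helper names (`step`, `follow`, `is_consecutive`, `coherent_up_to`) are hypothetical.

```python
from itertools import product

# Edge list (source, target, label) for the 10-node example graph.
# ASSUMPTION: reconstructed from the O, I, L encoding later in the deck.
EDGES = [(0, 7, "t"), (1, 8, "t"), (2, 4, "c"), (2, 5, "g"), (3, 6, "g"),
         (4, 1, "a"), (5, 1, "a"), (7, 2, "a"), (8, 9, "t"), (9, 3, "a")]

def step(nodes, c):
    """Nodes reached from `nodes` by following one edge labeled c."""
    return {v for (u, v, a) in EDGES if u in nodes and a == c}

def follow(nodes, alpha):
    """Follow the string alpha, one character at a time."""
    for c in alpha:
        nodes = step(nodes, c)
    return nodes

def is_consecutive(nodes):
    """True if the set is empty or equals {min, ..., max}."""
    return not nodes or nodes == set(range(min(nodes), max(nodes) + 1))

def coherent_up_to(max_len, n=10, alphabet="acgt"):
    """Brute-force check over every range [i, j] and every short string."""
    for i in range(n):
        for j in range(i, n):
            start = set(range(i, j + 1))
            for k in range(1, max_len + 1):
                for alpha in product(alphabet, repeat=k):
                    if not is_consecutive(follow(start, alpha)):
                        return False
    return True

all_nodes = set(range(10))
print(sorted(follow(all_nodes, "a")))    # [1, 2, 3]
print(sorted(follow(all_nodes, "ag")))   # [5, 6]
print(sorted(follow(all_nodes, "aga")))  # [1]
print(coherent_up_to(3))                 # True
```

The brute-force check only samples strings up to a fixed length; it illustrates the property rather than proving it.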

Page 4

Wheeler graphs

Consider a single step where our initial set of nodes are in consecutive range $[i, j]$ and, after advancing on a single character $c \in \Sigma$, $[i', j']$ is the smallest range containing our next set of nodes

[Figure: range $[i, j]$ connected to range $[i', j']$ by edges labeled $c$]

We want to show that the nodes in $[i', j']$ consist only of the $c$-successors of nodes $[i, j]$

Page 5

Wheeler graphs

[Figure: ranges $[i, j]$ and $[i', j']$; node $x$ lies between $i'$ and $j'$ and has an incoming edge labeled $c' \neq c$]

As defined, $i'$ is reachable via an edge labeled $c$ from a node in $[i, j]$

Same for $j'$

Consider node $x$, where $i' < x < j'$, with incoming edge labeled $c'$. Suppose $c' \neq c$.

Recall: $a \prec a' \implies v < v'$

Since $x \nless i'$, we have $c' \nprec c$

Since $j' \nless x$, we have $c \nprec c'$

We have $c' \succeq c$, $c \succeq c'$, and $c' \neq c$, giving a contradiction

Page 6

Wheeler graphs

[Figure: ranges $[i, j]$ and $[i', j']$ joined by edges labeled $c$; node $x$ outside $[i, j]$ and node $y$ inside $[i', j']$]

Could node $x \notin [i, j]$ be a $c$-predecessor of a node $y$, $i' < y < j'$?

No. Proof is by contradiction, similar to previous argument and using:

$(a = a') \land (u < u') \implies v \leq v'$
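The two edge conditions recalled in these proofs can be checked mechanically on an edge list. This is a minimal sketch, assuming nodes are already numbered in the candidate total order and labels compare as ordinary characters. The edge list is an assumption reconstructed from the bitvector encoding later in the deck, and `is_wheeler_order` is a hypothetical name. It covers only the two conditions shown on the slides.

```python
# Edge list (source, target, label).
# ASSUMPTION: reconstructed from the O, I, L encoding later in the deck.
EDGES = [(0, 7, "t"), (1, 8, "t"), (2, 4, "c"), (2, 5, "g"), (3, 6, "g"),
         (4, 1, "a"), (5, 1, "a"), (7, 2, "a"), (8, 9, "t"), (9, 3, "a")]

def is_wheeler_order(edges):
    """Check both edge conditions for every ordered pair of edges."""
    for (u, v, a) in edges:
        for (u2, v2, a2) in edges:
            # a < a'  implies  v < v'
            if a < a2 and not v < v2:
                return False
            # (a = a') and (u < u')  implies  v <= v'
            if a == a2 and u < u2 and not v <= v2:
                return False
    return True

print(is_wheeler_order(EDGES))  # True

# Swapping the targets of the two g-edges breaks the second condition.
BAD = [e for e in EDGES if e[2] != "g"] + [(2, 6, "g"), (3, 5, "g")]
print(is_wheeler_order(BAD))    # False
```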

Page 7

Wheeler graphs

Proof takes previous arguments and extends them to string $\alpha$ inductively

For any consecutive range $[i, j]$ of nodes and string $\alpha$, the nodes reached by following edges matching $\alpha$ also form a consecutive range

Page 8

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

How would we represent a Wheeler graph with bitvectors?

Need to represent structure as well as node and edge labels

Page 9

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

Idea 1: Encode in- and outdegree of each node in unary (degree d becomes d 0s followed by a 1)

Idea 2: Concatenate in order by node

Here, O = 01010010101011010101

Per-node codes, nodes 0 to 9: 01 01 001 01 01 01 1 01 01 01

Page 10

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

I = 10010101010101010101

Per-node codes, nodes 0 to 9: 1 001 01 01 01 01 01 01 01 01

Page 11

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

Idea 3: Encode edge labels corresponding to 0s in O

O = 01010010101011010101

L = t t c g g a a a t a

Page 12

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

I = 10010101010101010101

How long is I? (# edges) + (# nodes) bits

How long is O? (# edges) + (# nodes) bits

How long is L? (# edges) chars
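The three ideas can be combined in a short sketch that builds O, I and L from an edge list and confirms the lengths above. The edge list and the function name `encode` are assumptions made for illustration. Listing a node's outgoing labels in sorted order is also an assumption, chosen because it reproduces the L shown on the slides.

```python
# Edge list (source, target, label).
# ASSUMPTION: reconstructed from the O, I, L values on the slides.
EDGES = [(0, 7, "t"), (1, 8, "t"), (2, 4, "c"), (2, 5, "g"), (3, 6, "g"),
         (4, 1, "a"), (5, 1, "a"), (7, 2, "a"), (8, 9, "t"), (9, 3, "a")]

def encode(edges, n):
    """Return (O, I, L) for a graph with nodes 0..n-1 in Wheeler order."""
    out_deg = [0] * n
    in_deg = [0] * n
    for u, v, _ in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    # Unary code per node: degree d becomes d zeros followed by a one.
    O = "".join("0" * d + "1" for d in out_deg)
    I = "".join("0" * d + "1" for d in in_deg)
    # Outgoing labels grouped by source node (sorted within a node).
    L = "".join(a for (u, v, a) in sorted(edges, key=lambda e: (e[0], e[2])))
    return O, I, L

O, I, L = encode(EDGES, 10)
print(O)  # 01010010101011010101
print(I)  # 10010101010101010101
print(L)  # ttcggaaata
print(len(O), len(I), len(L))  # 20 20 10
```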

Page 13

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

I = 10010101010101010101

How to find in- or outdegree of a node?

Page 14

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

I = 10010101010101010101

G.indegree(i) = I.select1(i) − I.select1(i − 1) − 1

G.indegree(1) = I.select1(1) − I.select1(0) − 1 = 3 − 0 − 1 = 2

(let I.select1(−1) = −1, so that with 0-based positions the formula also gives indegree 0 for node 0)

Page 15

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

G.outdegree(i) = O.select1(i) − O.select1(i − 1) − 1

G.outdegree(2) = O.select1(2) − O.select1(1) − 1 = 6 − 3 − 1 = 2

(let O.select1(−1) = −1, so that with 0-based positions the formula also gives outdegree 1 for node 0)
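The degree formulas can be checked with a naive linear-time select. This is a sketch for illustration only; a practical implementation would use a succinct bitvector with constant-time select. The names `select1` and `degree` are hypothetical, positions and ranks are 0-based as in the worked examples, and the sentinel `select1(-1) = -1` is this sketch's own choice so that node 0 comes out right.

```python
O = "01010010101011010101"  # out-degrees in unary, nodes 0..9
I = "10010101010101010101"  # in-degrees in unary, nodes 0..9

def select1(bits, k):
    """Position of the k-th 1 (k counted from 0); -1 when k is -1."""
    if k < 0:
        return -1  # sentinel (assumption of this sketch)
    pos = -1
    for _ in range(k + 1):
        pos = bits.index("1", pos + 1)
    return pos

def degree(bits, i):
    """Number of 0s between the 1 of node i-1 and the 1 of node i."""
    return select1(bits, i) - select1(bits, i - 1) - 1

print(degree(I, 1))  # 2, indegree of node 1
print(degree(O, 2))  # 2, outdegree of node 2
print([degree(O, i) for i in range(10)])  # [1, 1, 2, 1, 1, 1, 0, 1, 1, 1]
print([degree(I, i) for i in range(10)])  # [0, 2, 1, 1, 1, 1, 1, 1, 1, 1]
```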

Page 16

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

I = 10010101010101010101

How to access the labels of the edges outgoing from a node?

Page 17

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

num = G.outdegree(i)
off = O.rank0(O.select1(i − 1))
for j in {0, 1, …, num − 1}:
    L.access(off + j)


Page 19

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

i = 2

num = G.outdegree(2)
off = O.rank0(O.select1(2 − 1))
for j in {0, 1, …, num − 1}:
    L.access(off + j)

Page 20

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

num = 2
off = O.rank0(O.select1(2 − 1) = 3)
for j in {0, 1, …, num − 1}:
    L.access(off + j)

Page 21

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

num = 2
off = O.rank0(3) = 2
for j in {0, 1, …, num − 1}:
    L.access(off + j)

Page 22

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

num = 2
off = 2
for j in {0, 1}:
    L.access(off + j)

Page 23

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

num = 2
off = 2
for j in {0, 1}:
    L.access(2 + 0) = c

Page 24

Wheeler graphs

[Figure: example Wheeler graph with nodes 0 to 9]

O = 01010010101011010101

L = ttcggaaata

num = 2
off = 2
for j in {0, 1}:
    L.access(2 + 1) = g
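The walkthrough for node 2 can be reproduced with a runnable sketch of the same loop. The rank and select helpers below are naive linear scans written for illustration, not a succinct data structure. Their names are hypothetical, `rank0(bits, p)` counts 0s at positions up to and including p, and `select1(-1) = -1` is a sentinel chosen by this sketch.

```python
O = "01010010101011010101"  # out-degrees in unary, nodes 0..9
L = "ttcggaaata"            # edge labels, one per 0 in O

def select1(bits, k):
    """Position of the k-th 1 (k counted from 0); -1 when k is -1."""
    if k < 0:
        return -1  # sentinel (assumption of this sketch)
    pos = -1
    for _ in range(k + 1):
        pos = bits.index("1", pos + 1)
    return pos

def rank0(bits, p):
    """Number of 0s at positions 0..p inclusive (0 when p is -1)."""
    return bits[:p + 1].count("0")

def out_labels(i):
    """Labels of the edges leaving node i, following the slide's loop."""
    num = select1(O, i) - select1(O, i - 1) - 1   # G.outdegree(i)
    off = rank0(O, select1(O, i - 1))             # edges owned by nodes < i
    return [L[off + j] for j in range(num)]       # L.access(off + j)

print(out_labels(2))  # ['c', 'g']
print(out_labels(6))  # []
print(out_labels(9))  # ['a']
```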

Page 25

Wheeler graphs

FM Index match query loop:

C[c], C[c + 1]

L.rank_c(…) + C[c]

F                L
$   a b a a b    a0
a0  $ a b a a    b0
a1  a b a $ a    b1
a2  b a $ a b    a1
a3  b a a b a    $
b0  a $ a b a    a2
b1  a a b a $    a3
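For comparison, here is a minimal sketch of the FM index loop on the text from this slide, abaaba$. It builds the BWT by sorting rotations and uses plain string counting in place of a real rank structure, so it only illustrates the control flow. The function names `bwt` and `count` are hypothetical, and the row range is kept half-open.

```python
def bwt(text):
    """Last column of the sorted-rotations matrix of text."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def count(text, pattern):
    """Number of occurrences of pattern, found by backward matching."""
    L = bwt(text)
    # C[c] = number of characters in the text strictly smaller than c
    C = {c: sum(1 for x in text if x < c) for c in set(text)}
    lo, hi = 0, len(L)                 # half-open range of matrix rows
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + L[:lo].count(c)    # L.rank_c(lo) + C[c]
        hi = C[c] + L[:hi].count(c)    # L.rank_c(hi) + C[c]
        if lo >= hi:
            return 0
    return hi - lo

print(bwt("abaaba$"))           # abba$aa
print(count("abaaba$", "aba"))  # 2
print(count("abaaba$", "aa"))   # 1
print(count("abaaba$", "bb"))   # 0
```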

Page 26

Wheeler graphs

Wheeler graph match query:

C[c], C[c + 1]

O.rank0(O.select1(…))

L.rank_c(…)

I.rank1(I.select0(… + C[c]))

O: 01010010101011010101

L: ttcggaaata

I: 10010101010101010101

[Figure: example Wheeler graph with nodes 0 to 9]
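One way to read the three expressions is as a pipeline: turn a node boundary into an edge offset with O, count matching labels with L, then turn the resulting incoming-edge index back into a node with I. The sketch below follows that reading. It is an interpretation rather than the slides' own code, it uses naive scans for rank and select, and the helper names (`select`, `edges_before`, `target_of`, `match`) are hypothetical. Node ranges are half-open and the pattern is matched left to right along the edges.

```python
O = "01010010101011010101"  # out-degrees in unary, nodes 0..9
I = "10010101010101010101"  # in-degrees in unary, nodes 0..9
L = "ttcggaaata"            # labels of outgoing edges, in node order

# C[c] = number of edge labels strictly smaller than c
C = {c: sum(1 for x in L if x < c) for c in set(L)}

def select(bits, bit, k):
    """Position of the k-th occurrence of bit (k from 0); -1 if k < 0."""
    if k < 0:
        return -1  # sentinel (assumption of this sketch)
    pos = -1
    for _ in range(k + 1):
        pos = bits.index(bit, pos + 1)
    return pos

def edges_before(node):
    """O.rank0(O.select1(node - 1)): out-edges owned by nodes < node."""
    return O[:select(O, "1", node - 1) + 1].count("0")

def target_of(k):
    """I.rank1(I.select0(k)): node receiving the k-th incoming edge."""
    return I[:select(I, "0", k)].count("1")

def match(alpha):
    """Half-open node range reached by alpha from all nodes, or None."""
    lo, hi = 0, O.count("1")
    for c in alpha:
        if c not in C:
            return None
        r_lo = C[c] + L[:edges_before(lo)].count(c)
        r_hi = C[c] + L[:edges_before(hi)].count(c)
        if r_lo >= r_hi:
            return None
        lo, hi = target_of(r_lo), target_of(r_hi - 1) + 1
    return lo, hi

print(match("a"))    # (1, 4), i.e. nodes {1, 2, 3}
print(match("ag"))   # (5, 7), i.e. nodes {5, 6}
print(match("aga"))  # (1, 2), i.e. node {1}
print(match("cc"))   # None
```

Each step touches only the two ends of the current range, which is what path coherence makes possible.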