scalable visual comparison of biological trees and sequences

73
1 Scalable Visual Comparison of Biological Trees and Sequences Tamara Munzner University of British Columbia Department of Computer Science Imager

Upload: pandora-trevino

Post on 30-Dec-2015

29 views

Category:

Documents


0 download

DESCRIPTION

Scalable Visual Comparison of Biological Trees and Sequences. Tamara Munzner University of British Columbia Department of Computer Science. Imager. Outline. Accordion Drawing information visualization technique TreeJuxtaposer tree comparison SequenceJuxtaposer sequence comparison - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Scalable Visual Comparison of Biological Trees and Sequences

1

Scalable Visual Comparison of Biological Trees and Sequences

Tamara Munzner

University of British Columbia

Department of Computer Science

Imager

Page 2: Scalable Visual Comparison of Biological Trees and Sequences

2

Outline

• Accordion Drawing– information visualization technique

• TreeJuxtaposer– tree comparison

• SequenceJuxtaposer– sequence comparison

• PRISAD– generic accordion drawing framework

Page 3: Scalable Visual Comparison of Biological Trees and Sequences

3

Accordion Drawing• rubber-sheet navigation

– stretch out part of surface, the rest squishes

– borders nailed down– Focus+Context technique

• integrated overview, details

– old idea• [Sarkar et al 93],

[Robertson et al 91]

• guaranteed visibility– marks always visible– important for scalability – new idea

• [Munzner et al 03]

Page 4: Scalable Visual Comparison of Biological Trees and Sequences

44

Guaranteed Visibility

• marks are always visible• easy with small datasets

Page 5: Scalable Visual Comparison of Biological Trees and Sequences

5

Guaranteed Visibility Challenges

• hard with larger datasets

• reasons a mark could be invisible

Page 6: Scalable Visual Comparison of Biological Trees and Sequences

6

Guaranteed Visibility Challenges

• hard with larger datasets

• reasons a mark could be invisible– outside the window

• AD solution: constrained navigation

Page 7: Scalable Visual Comparison of Biological Trees and Sequences

7

Guaranteed Visibility Challenges

• hard with larger datasets

• reasons a mark could be invisible– outside the window

• AD solution: constrained navigation

– underneath other marks• AD solution: avoid 3D

Page 8: Scalable Visual Comparison of Biological Trees and Sequences

8

Guaranteed Visibility Challenges

• hard with larger datasets

• reasons a mark could be invisible– outside the window

• AD solution: constrained navigation

– underneath other marks• AD solution: avoid 3D

– smaller than a pixel• AD solution: smart culling

Page 9: Scalable Visual Comparison of Biological Trees and Sequences

9

Guaranteed Visibility: Small Items

• Naïve culling may not draw all marked items

GV no GV

Guaranteed visibilityof marks

No guaranteed visibility

Page 10: Scalable Visual Comparison of Biological Trees and Sequences

10

Guaranteed Visibility: Small Items

• Naïve culling may not draw all marked items

GV no GV

Guaranteed visibilityof marks

No guaranteed visibility

Page 11: Scalable Visual Comparison of Biological Trees and Sequences

11

Outline

• Accordion Drawing– information visualization technique

• TreeJuxtaposer

– tree comparison

• SequenceJuxtaposer– sequence comparison

• PRISAD– generic accordion drawing framework

Page 12: Scalable Visual Comparison of Biological Trees and Sequences

12

Phylogenetic/Evolutionary Tree

M Meegaskumbura et al., Science 298:379 (2002)

Page 13: Scalable Visual Comparison of Biological Trees and Sequences

13

Common Dataset Size Today

M Meegaskumbura et al., Science 298:379 (2002)

Page 14: Scalable Visual Comparison of Biological Trees and Sequences

14

Future Goal: 10M node Tree of Life

David Hillis, Science 300:1687 (2003)

Plants

Protists

Fungi

Animals

You arehere

Page 15: Scalable Visual Comparison of Biological Trees and Sequences

15

Paper Comparison: Multiple Trees

focus

context

Page 16: Scalable Visual Comparison of Biological Trees and Sequences

16

TreeJuxtaposer• side by side comparison of evolutionary trees • [video]

– video/software downloadable from http://olduvai.sf.net/tj

Page 17: Scalable Visual Comparison of Biological Trees and Sequences

17

TJ Contributions• first interactive tree comparison system

– automatic structural difference computation– guaranteed visibility of marked areas

• scalable to large datasets– 250,000 to 500,000 total nodes– all preprocessing subquadratic– all realtime rendering sublinear

• scalable to large displays (4000 x 2000)

• introduced – guaranteed visibility, accordion drawing

Page 18: Scalable Visual Comparison of Biological Trees and Sequences

18

Structural Comparison

rayfinned fish

lungfish

salamander

frog

mammal

turtle

bird

crocodile

lizard

snake

rayfinned fish

bird

lungfish

salamander

frog

mammal

turtle

snake

lizard

crocodile

Page 19: Scalable Visual Comparison of Biological Trees and Sequences

19

Matching Leaf Nodes

rayfinned fish

lungfish

salamander

frog

mammal

turtle

bird

crocodile

lizard

snake

rayfinned fish

bird

lungfish

salamander

frog

mammal

turtle

snake

lizard

crocodile

Page 20: Scalable Visual Comparison of Biological Trees and Sequences

20

Matching Leaf Nodes

rayfinned fish

lungfish

salamander

frog

mammal

turtle

bird

crocodile

lizard

snake

rayfinned fish

bird

lungfish

salamander

frog

mammal

turtle

snake

lizard

crocodile

Page 21: Scalable Visual Comparison of Biological Trees and Sequences

21

Matching Leaf Nodes

rayfinned fish

lungfish

salamander

frog

mammal

turtle

bird

crocodile

lizard

snake

rayfinned fish

bird

lungfish

salamander

frog

mammal

turtle

snake

lizard

crocodile

Page 22: Scalable Visual Comparison of Biological Trees and Sequences

22

Matching Interior Nodes

rayfinned fish

lungfish

salamander

frog

mammal

turtle

bird

crocodile

lizard

snake

rayfinned fish

bird

lungfish

salamander

frog

mammal

turtle

snake

lizard

crocodile

Page 23: Scalable Visual Comparison of Biological Trees and Sequences

23

Matching Interior Nodes

rayfinned fish

lungfish

salamander

frog

mammal

turtle

bird

crocodile

lizard

snake

rayfinned fish

bird

lungfish

salamander

frog

mammal

turtle

snake

lizard

crocodile

Page 24: Scalable Visual Comparison of Biological Trees and Sequences

24

Matching Interior Nodes

rayfinned fish

lungfish

salamander

frog

mammal

turtle

bird

crocodile

lizard

snake

rayfinned fish

mammal

lungfish

salamander

frog

bird

turtle

snake

lizard

crocodile

Page 25: Scalable Visual Comparison of Biological Trees and Sequences

25

Matching Interior Nodes

rayfinned fish

lungfish

salamander

frog

mammal

turtle

bird

crocodile

lizard

snake

rayfinned fish

bird

lungfish

salamander

frog

mammal

turtle

snake

lizard

crocodile

?

Page 26: Scalable Visual Comparison of Biological Trees and Sequences

26

Previous Work

• tree comparison– RF distance [Robinson and Foulds 81]– perfect node matching [Day 85]– creation/deletion [Chi and Card 99]– leaves only [Graham and Kennedy 01]

Page 27: Scalable Visual Comparison of Biological Trees and Sequences

27

Similarity Score: S(m,n)

{ }{ } 3

2)()(

)()(),( ==

∪∩

=FE,D,

FE,

nm

nmnm

LL

LLS

{ }FE,D,n =)(L{ }FE,m =)(L

T1 T2A

B

C

D

E

F

A

C

B

D

F

Em n

Page 28: Scalable Visual Comparison of Biological Trees and Sequences

28

Best Corresponding Node

• – computable in O(n log2 n)– linked highlighting

T1 T2A

B

C

D

E

F

A

C

B

D

F

Em BCN(m) = n

1/32/3

2/6

00

0

0

0

0

1/2

1/2

)),(( vSv margmax)mBCN(2T∈=

Page 29: Scalable Visual Comparison of Biological Trees and Sequences

29

• – Matches intuition

1≠))BCN(,( whichfor Nodes vvS

Marking Structural DifferencesT1 T2A

B

C

D

E

F

A

C

B

D

F

Em n

Page 30: Scalable Visual Comparison of Biological Trees and Sequences

30

Outline

• Accordion Drawing– information visualization technique

• TreeJuxtaposer– tree comparison

• SequenceJuxtaposer– sequence comparison

• PRISAD– generic accordion drawing framework

Page 31: Scalable Visual Comparison of Biological Trees and Sequences

31

Genomic Sequences• multiple aligned sequences of DNA• now commonly browsed with web apps

– zoom and pan with abrupt jumps– previous work

• Ensembl [Hubbard 02], UCSC Genome Browser [Kent 02], NCBI [Wheeler 02]

• investigate benefits of accordion drawing– showing focus areas in context– smooth transitions between states– guaranteed visibility for globally visible

landmarks

Page 32: Scalable Visual Comparison of Biological Trees and Sequences

32

SequenceJuxtaposer

• comparing multiple aligned gene sequences• provides searching, difference calculation• [video]

– video/software downloadable from http://olduvai.sf.net/tj

Page 33: Scalable Visual Comparison of Biological Trees and Sequences

33

Searching

• search for motifs– protein/codon search– regular expressions supported

• results marked with guaranteed visibility

Page 34: Scalable Visual Comparison of Biological Trees and Sequences

34

Differences

• explore differences between aligned pairs– slider controls difference threshold in realtime

• results marked with guaranteed visibility

Page 35: Scalable Visual Comparison of Biological Trees and Sequences

35

SJ Contributions• fluid tree comparison system

– showing multiple focus areas in context– guaranteed visibility of marked areas

• thresholded differences, search results

• scalable to large datasets– 2M nucleotides– all realtime rendering sublinear

Page 36: Scalable Visual Comparison of Biological Trees and Sequences

36

Outline

• Accordion Drawing– information visualization technique

• TreeJuxtaposer– tree comparison

• SequenceJuxtaposer– sequence comparison

• PRISAD– generic accordion drawing framework

Page 37: Scalable Visual Comparison of Biological Trees and Sequences

37

Goals of PRISAD

• generic AD infrastructure– tree and sequence applications

• PRITree is TreeJuxtaposer using PRISAD• PRISeq is SequenceJuxtaposer using PRISAD

• efficiency– faster rendering: minimize overdrawing– smaller memory footprint

• correctness– rendering with no gaps: eliminate overculling

Page 38: Scalable Visual Comparison of Biological Trees and Sequences

38

PRISAD Navigation

• generic navigation infrastructure– application independent– uses deformable grid– split lines

• Grid lines define object boundaries

– horizontal and vertical separate• Independently movable

Page 39: Scalable Visual Comparison of Biological Trees and Sequences

39

Split line hierarchy

• data structure supports navigation, picking, drawing• two interpretations

– linear ordering

– hierarchical subdivision

A B C D E F

Page 40: Scalable Visual Comparison of Biological Trees and Sequences

40

PRISAD Architecture

world-space discretization• preprocessing

• initializing data structures• placing geometry

screen-space rendering• frame updating

• analyzing navigation state• drawing geometry

Page 41: Scalable Visual Comparison of Biological Trees and Sequences

41

World-space Discretization

interplay between infrastructure and application

Page 42: Scalable Visual Comparison of Biological Trees and Sequences

42

• application-specific layout of dataset– non-overlapping objects

• initialize PRISAD split line hierarchies– objects aligned by split lines

Laying Out & Initializing

A A C C

A T T T68

12

34 5

97

Page 43: Scalable Visual Comparison of Biological Trees and Sequences

43

4

2

• application-specific layout of dataset– non-overlapping objects

• initialize PRISAD split line hierarchies– objects aligned by split lines

Laying Out & Initializing

A A C C

A T T T68

12

34 5

97

4

5

Page 44: Scalable Visual Comparison of Biological Trees and Sequences

44

Gridding

• each geometric object assigned its four encompassing split line boundaries

A A C C

A T T T

Page 45: Scalable Visual Comparison of Biological Trees and Sequences

45

Mapping

• PRITree mapping initializes leaf references– bidirectional O(1) reference between leaves and

split lines

13

468

25

973

4

12

5

84

95

63

52

21 Map

68

25

9

Split line Leaf index

Page 46: Scalable Visual Comparison of Biological Trees and Sequences

46

Screen-space Rendering

control flow to draw each frame

Page 47: Scalable Visual Comparison of Biological Trees and Sequences

47

Partitioning

• partition object set into bite-sized ranges– using current split line screen-space positions

• required for every frame

– subdivision stops if region smaller than 1 pixel• or if range contains only 1 object

1234

5

[1,2]

[3,4]

[5]

{ [1,2], [3,4], [5] }

Queue of ranges

Page 48: Scalable Visual Comparison of Biological Trees and Sequences

48

Seeding

• reordering range queue result from partition– marked regions get priority in queue

• drawn first to provide landmarks

1234

5

[1,2]

[3,4]

[5]

{ [1,2], [3,4], [5] }

{ [3,4], [5], [1,2] }

ordered queue

Page 49: Scalable Visual Comparison of Biological Trees and Sequences

49

Drawing Single Range

• each enqueued object range drawn according to application geometry– selection for trees– aggregation for sequences

Page 50: Scalable Visual Comparison of Biological Trees and Sequences

50

PRITree Range Drawing

• select suitable leaf in each range

• draw path from leaf to the root– ascent-based tree drawing

– efficiency: minimize overdrawing • only draw one path per range

1234

5

[3,4]{ [3,4], [5], [1,2] }

Page 51: Scalable Visual Comparison of Biological Trees and Sequences

51

Rendering Dense Regions

– correctness: eliminate overculling• bad leaf choices would result in misleading gaps

– efficiency: maximize partition size to reduce rendering• too much reduction would result in gaps

Intended rendering Partition size too big

Page 52: Scalable Visual Comparison of Biological Trees and Sequences

52

Rendering Dense Regions

– correctness: eliminate overculling• bad leaf choices would result in misleading gaps

– efficiency: maximize partition size to reduce rendering• too much reduction would result in gaps

Intended rendering Partition size too big

Page 53: Scalable Visual Comparison of Biological Trees and Sequences

53

PRITree Skeleton

• guaranteed visibility of marked subtrees during progressive rendering

first frame: one path per marked group

full scene: entire marked subtrees

Page 54: Scalable Visual Comparison of Biological Trees and Sequences

54

PRISeq Range Drawing: Aggregation

• aggregate range to select box color for each sequence– random select to break ties

A A C C

A T T T

[1,4]

A

T

[1,4]

T T T C T

Page 55: Scalable Visual Comparison of Biological Trees and Sequences

55

PRISeq Range Drawing

• collect identical nucleotides in column– form single box to represent identical objects

• attach to split line hierarchy cache• lazy evaluation

• draw vertical column

A

T

T

T

A

{ A:[1,1], T:[2,3] }

1

2

3

Page 56: Scalable Visual Comparison of Biological Trees and Sequences

56

PRISAD Performance

• PRITree vs. TreeJuxtaposer (TJ)

• synthetic and real datasets

– complete binary trees• lowest branching factor• regular structure

– star trees• highest possible branching factor

Page 57: Scalable Visual Comparison of Biological Trees and Sequences

57

InfoVis Contest Benchmarks• two 190K node trees• directly compare TJ and PT

Page 58: Scalable Visual Comparison of Biological Trees and Sequences

58

OpenDirectory benchmarks• two 480K node trees• too large for TJ, PT results only

Page 59: Scalable Visual Comparison of Biological Trees and Sequences

59

PRITree Rendering Time Performance

TreeJuxtaposer renders all nodes for star trees• branching factor k leads to O(k) performance

Page 60: Scalable Visual Comparison of Biological Trees and Sequences

60

PRITree Rendering Time Performance

TreeJuxtaposer renders all nodes for star trees• branching factor k leads to O(k) performance

Page 61: Scalable Visual Comparison of Biological Trees and Sequences

61

PRITree Rendering Time Performance

InfoVis 2003 Contest dataset• 5x rendering speedup

Page 62: Scalable Visual Comparison of Biological Trees and Sequences

62

PRITree Rendering Time Performance

a closer look at the fastest rendering times

Page 63: Scalable Visual Comparison of Biological Trees and Sequences

63

PRITree Rendering Time Performance

Page 64: Scalable Visual Comparison of Biological Trees and Sequences

64

PRITree handles 4 million nodes in under 0.4 seconds• TreeJuxtaposer takes twice as long to render 1 million nodes

Detailed Rendering Time Performance

Page 65: Scalable Visual Comparison of Biological Trees and Sequences

65

Detailed Rendering Time PerformanceTreeJuxtaposer valley from overculling

Page 66: Scalable Visual Comparison of Biological Trees and Sequences

66

Memory Performancelinear memory usage for both applications

• 4-5x more efficient for synthetic datasets

Page 67: Scalable Visual Comparison of Biological Trees and Sequences

67

Memory Performance1GB difference for InfoVis contest comparison

• marked range storage changes improve scalability

Page 68: Scalable Visual Comparison of Biological Trees and Sequences

68

Performance Comparison

• PRITree vs. TreeJuxtaposer– detailed benchmarks against identical TJ

functionality• 5x faster, 8x smaller footprint• handles over 4M node trees

• PRISeq vs. SequenceJuxtaposer– 15x faster rendering, 20x smaller memory size– 44 species * 17K nucleotides = 770K items– 6400 species * 6400 nucleotides = 40M items

Page 69: Scalable Visual Comparison of Biological Trees and Sequences

69

Future Work

• future work– editing and annotating datasets– PRISAD support for application specific actions

• logging, replay, undo, other user actions

– develop process or template for building applications

Page 70: Scalable Visual Comparison of Biological Trees and Sequences

70

PRISAD Contributions

• infrastructure for efficient, correct, and generic accordion drawing

• efficient and correct rendering – screen-space partitioning tightly bounds overdrawing and

eliminates overculling

• first generic AD infrastructure– PRITree renders 5x faster than TJ– PRISeq renders 20x larger datasets than SJ

Page 71: Scalable Visual Comparison of Biological Trees and Sequences

71

Joint Work• TreeJuxtaposer

– François Guimbretière, Serdar Taşiran, Li Zhang, Yunhong Zhou• SIGGRAPH 2003

• SequenceJuxtaposer– James Slack, Kristian Hildebrand, Katherine St.John

• German Conference on Bioinformatics 2004

• TJC/TJC-Q– Dale Beermann, Greg Humphreys

• EuroVis 2005

• PRISAD– James Slack, Kristian Hildebrand

• IEEE InfoVis Symposium 2005• Information Visualization journal, to appear

Page 72: Scalable Visual Comparison of Biological Trees and Sequences

72

Open Source

• software freely available from http://olduvai.sourceforge.net– SequenceJuxtaposer

olduvai.sf.net/sj– TreeJuxtaposer

olduvai.sf.net/tj– requires Java and OpenGL

• JOGL bindings for TJ, GL4Java for SJ (JOGL coming soon)

• papers, talks, videos also from http://www.cs.ubc.ca/~tmm

Page 73: Scalable Visual Comparison of Biological Trees and Sequences

73

Other Projects

• Focus+Context evaluation– high-level user studies of systems– low-level visual search and memory

• graph drawing

• dimensionality reduction