Visual Compression of Workflow Visualizations with Automated Detection of Macro Motifs

Post on 14-Dec-2014


DESCRIPTION

VIS 2013 Presentation

Paper is available here: http://www.oerc.ox.ac.uk/personal-pages/emaguire/AutoMacron.pdf

Code is available here: http://github.com/isa-tools/automacron

TRANSCRIPT

Visual Compression of Workflow Visualizations with Automated Detection of Macro Motifs

Eamonn Maguire, Philippe Rocca-Serra, Susanna-Assunta Sansone, Jim Davies and Min Chen

University of Oxford e-Research Centre

University of Oxford Department of Computer Science

VIS 2013, 13th-18th October 2013

Some terminology

Motif: Commonly observed subgraphs. Very commonly used in biology (protein-protein interaction, transcription/regulation networks), chemistry, and even visualization (e.g. VisComplete).

Macro: A single instruction that expands automatically into a more complex set of instructions.

Workflow: Literally a flow of work showing the processes enacted from start to finish in, say, business processes, software execution, analysis procedures or, in our case, biological experiments.

They are used to enable reproducibility.

e.g. VisTrails in our VIS community - 40,000 downloads

Roadmap

Workflow

Automatically detect motifs

Substitute motifs with ‘macros’

Blockades

Current Motif Detection Algorithm Limitations

No semantics

Limited motif sizes (Max 10)

Deciding what should be a Macro

Macros in electronic circuit diagrams are the product of years of refinement.

Macros in biological workflows, for instance, are new... how do we determine what should be a macro?

Example case

Biology


Taxonomy-based Glyph Design

Maguire et al., 2012, IEEE TVCG

Visualizing (ISA based) workflows of biological experiments

Extension on Previous Work


A Typical Biological Experiment

Hypothesis, Experiment, Results & Analysis, Paper

KEY: material, protocol, chemical, data

Source name

Sampling Protocol

Sample name

Chemical Label

Labeling Protocol

Labeled Extract

Hybridisation Protocol

Assay Name

Scanning Protocol

Raw Data File

Feature Extraction Protocol

Processed Data File

Describe the flow of work from a biological sample to the data file.

Workflow varies between technologies, but there is a large commonality in steps.

For example, the labeling step is very common in DNA microarray experiments.

Representing an Experiment - Workflows!

Reproducibility!
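To make the idea of a typed workflow concrete, here is a minimal sketch that encodes the microarray example above as a directed graph whose nodes carry one of the four kinds from the key. The kind assigned to each node is guessed from its name (the slide's colour coding is not recoverable here), the edge from Chemical Label into Labeling Protocol is an assumption, and `Node`, `build_workflow` and `kind_sequence` are hypothetical names, not part of ISAcreator or Automacron.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of "material", "protocol", "chemical", "data"


# Main chain of the example workflow. Kinds are assumptions based on the names.
STEPS = [
    Node("Source name", "material"),
    Node("Sampling Protocol", "protocol"),
    Node("Sample name", "material"),
    Node("Labeling Protocol", "protocol"),
    Node("Labeled Extract", "material"),
    Node("Hybridisation Protocol", "protocol"),
    Node("Assay Name", "material"),  # assumption: the kind is not stated
    Node("Scanning Protocol", "protocol"),
    Node("Raw Data File", "data"),
    Node("Feature Extraction Protocol", "protocol"),
    Node("Processed Data File", "data"),
]
CHEMICAL_LABEL = Node("Chemical Label", "chemical")


def build_workflow():
    """Return an adjacency mapping: node -> list of successor nodes."""
    edges = {node: [] for node in STEPS + [CHEMICAL_LABEL]}
    for src, dst in zip(STEPS, STEPS[1:]):
        edges[src].append(dst)
    # Assumption: the chemical label feeds into the labeling step.
    edges[CHEMICAL_LABEL].append(STEPS[3])
    return edges


def kind_sequence(edges, start):
    """Follow the first successor from `start` and collect node kinds."""
    kinds = []
    node = start
    while True:
        kinds.append(node.kind)
        if not edges[node]:
            return kinds
        node = edges[node][0]


if __name__ == "__main__":
    workflow = build_workflow()
    print(" -> ".join(kind_sequence(workflow, STEPS[0])))
```

Keeping the kind on every node is what later allows motifs to be compared by meaning and not only by shape.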


Our Process

[Figure: process overview. Labels: Biological Workflow Repository; Motif Extraction Algorithm; Motifs; Ranking Algorithm; Macro Selection via UI; Domain Expert; Selected Macros; Glyph Design; Macro Annotation; Macro Insertion in Graph; Branch & Merge. Each motif card lists a score with Occurrence, Workflows and Compression counts (e.g. 2.87, 1092, 476, 3276).]


Workflow Repository

9,670 Biological Experiment Workflows

Why such a large number?

We can statistically make suggestions to users about which motifs could be macros, based on a number of metrics (detailed later)

+ we can robustly test our algorithm's performance across a huge cross-section of experiments...



Motif Extraction Algorithm


The Problem... Current Motif Extraction Algorithms

FANMOD, mFinder etc.

The Current Weaknesses

No semantics (edge or node)

Small node limit, normally <10

Imagine n-grams with no information other than topology

e.g. bi-grams of DNA ‘motifs’ where instead of A-T, T-C, T-G we get x-x, x-x, x-x

What’s up?

We can’t infer function from these results

Ah, and you can’t have macros without function...

Exactly!

Unable to infer function, so unable to produce a macro
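The n-gram analogy can be shown in a few lines. This is only an illustration of the point on the slide, not how FANMOD or mFinder work internally: once node labels are dropped, every bi-gram of a sequence collapses to the same x-x pattern. The helper names `bigrams` and `strip_labels` are made up for this sketch.

```python
from collections import Counter


def bigrams(sequence):
    """Count adjacent pairs in a sequence."""
    return Counter(zip(sequence, sequence[1:]))


def strip_labels(sequence):
    """Keep only the topology: every element becomes an anonymous 'x'."""
    return ["x"] * len(sequence)


dna = list("ATCTG")

labelled = bigrams(dna)
topology_only = bigrams(strip_labels(dna))

print(labelled)       # four distinct bi-grams: A-T, T-C, C-T, T-G
print(topology_only)  # Counter({('x', 'x'): 4})
```

With labels, the four bi-grams are distinguishable; without them, nothing about function can be read back from the counts.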


Solution

[Figure: state-transition model with states s0-s4 and transitions A-H.]

Legend:

a starting state

a normal state, with a ‘legal’ motif

a holding state, with a pseudo-motif

a transition that generates a motif

a transition that does not generate a motif

More detail about each individual case, A-H, is available in the paper.

Resulting In...

From our algorithm, running over 9,670 workflows, we retrieved ~12,000 motifs up to depth 12

Semantically aware

Limited by depth, not node count - we have motifs with > 80 nodes

Essentially, more complicated topologically sensitive n-grams
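One way to picture "topologically sensitive n-grams" is the sketch below. It is not the state-transition algorithm from the paper: it simply builds a canonical signature for the typed subgraph below each node, limited by depth instead of node count, and counts how often each signature occurs across a set of workflows. Merges are unfolded as trees, which is a simplification, and the names `signature` and `count_motifs` are hypothetical.

```python
from collections import Counter


def signature(graph, kinds, node, depth):
    """Nested tuple describing the typed subgraph below `node`, `depth` levels deep."""
    if depth == 1 or not graph.get(node):
        return (kinds[node],)
    children = sorted(
        signature(graph, kinds, child, depth - 1) for child in graph[node]
    )
    return (kinds[node], *children)


def count_motifs(workflows, max_depth):
    """Count depth-limited typed signatures over (graph, kinds) pairs."""
    counts = Counter()
    for graph, kinds in workflows:
        for node in kinds:
            seen = set()  # avoid recounting when a deeper cut adds nothing new
            for depth in range(2, max_depth + 1):
                sig = signature(graph, kinds, node, depth)
                if len(sig) > 1 and sig not in seen:
                    seen.add(sig)
                    counts[sig] += 1
    return counts


# Two identical toy workflows: a -> b -> c
graph = {"a": ["b"], "b": ["c"], "c": []}
kinds = {"a": "material", "b": "protocol", "c": "data"}
motif_counts = count_motifs([(graph, kinds), (graph, kinds)], max_depth=3)

for sig, n in motif_counts.most_common():
    print(n, sig)
```

Because child signatures are sorted, two branches that differ only in the order their children were listed produce the same signature.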


Ranking Algorithm... because 12,000 is just too many.

Ranking Algorithm

M1 - Occurrences in data repository: 1,043

M2 - Workflow Presence: 640...

M3 - Compression Potential

For At, Aw and Ac, we map each to a fixed range [−1, 1] using a linear mapping based on the min-max range of each indicator, yielding three normalized metrics M1, M2 and M3

No algorithm would be complete without a weighting element. So each metric can be weighted. We use a default weight of 1.
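The min-max mapping and weighting can be sketched as follows. The linear mapping to [−1, 1] follows the description above; combining the three weighted metrics by summation is an assumption made for this sketch (the slides do not give the formula), so the scores it prints are not meant to match those shown in the talk. The motif names and counts are illustrative only.

```python
def normalize(values):
    """Linearly map values onto [-1, 1] using their min-max range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate range: no spread to rank on
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def rank(motifs, weights=(1.0, 1.0, 1.0)):
    """Rank motifs given as (name, occurrences, workflow_presence, compression).

    Assumption: the final score is the weighted sum of the normalized metrics.
    """
    columns = [normalize([m[i] for m in motifs]) for i in (1, 2, 3)]
    scored = []
    for row, motif in enumerate(motifs):
        score = sum(w * col[row] for w, col in zip(weights, columns))
        scored.append((motif[0], score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative counts only.
motifs = [
    ("motif-1", 1092, 476, 3276),
    ("motif-2", 600, 240, 2400),
    ("motif-3", 20, 10, 200),
]
ranked = rank(motifs)
for name, score in ranked:
    print(name, round(score, 3))
```

Passing a different `weights` tuple, for example `(1.0, 1.0, 2.0)`, would favour motifs with higher compression potential.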

Ranking Algorithm

[Figure: macro selection UI. Figure annotations: Subset of; 1000; 1200; 200.]

3 Normalized metrics

Motif subgraph

3 Glyph representations

Filter by pattern presence: linear, branching and merging

Filter by min/max depth

Motifs arranged by depth

Depth 6 motifs with magnified view in B and detailed popup of selected motif in D

Popup fields: Occurrences, Workflow presence, Compression Potential, Score, Adjusted Score, Downgrade Icon


Glyph Design

Things we’d like to see...

Topology/structure within a macro

Node type

Density

Annotation

[Figure: three glyph representations. Labels: annotation; node type (colour/shape); topology (arrangement, overall); length; breadth.]

[Figure: state-transition model examples (states s0-s4, transitions A-H) shown alongside the glyph encodings for annotation, node type, topology, length and breadth.]


Macro Insertion for Workflow Compression

[Figure: macro insertion shown step by step in panels A, B, C and D.]
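For the simplest case, a linear motif in a linear workflow, macro insertion amounts to replacing each occurrence of the motif with a single macro node. The sketch below covers only that case and is not the Automacron implementation; branching and merging motifs need a graph-level substitution. The function name `compress` and the macro label are hypothetical.

```python
def compress(workflow, motif, macro_name):
    """Replace every non-overlapping occurrence of `motif` with `macro_name`."""
    compressed = []
    i = 0
    n = len(motif)
    while i < len(workflow):
        if n > 0 and workflow[i:i + n] == motif:
            compressed.append(macro_name)
            i += n  # skip past the nodes the macro now stands for
        else:
            compressed.append(workflow[i])
            i += 1
    return compressed


workflow = [
    "Sample name",
    "Labeling Protocol",
    "Labeled Extract",
    "Hybridisation Protocol",
    "Assay Name",
]
labeling_motif = ["Labeling Protocol", "Labeled Extract", "Hybridisation Protocol"]

result = compress(workflow, labeling_motif, "MACRO:labeling")
print(result)  # ['Sample name', 'MACRO:labeling', 'Assay Name']
```

Here five nodes become three; the saving grows with the size of the motif and the number of times it occurs.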

Evaluation

User Testing

Performance

Community Dissemination


Dissemination of macros to community

Automacron API available as an OSGi plugin for ISAcreator

Roadmap

Workflow

Automatically detect motifs

Substitute motifs with ‘macros’

Overcoming the blockades

Current Motif Detection Algorithm Limitations

No semantics

Limited motif sizes (Max 10)

New semantically enabled algorithm

Deciding what should be a Macro

Macros in electronic circuit diagrams are the product of years of refinement.

Macros in biological workflows, for instance, are new... how do we determine what should be a macro?

Statistically informed selection from analysis of a large corpus of workflows

Summary

New semantically enabled motif discovery algorithm

Statistically informed selection of macro candidates for use in biological workflow visualizations

Automated macro image generation, inferred from algorithm states

Integration of final selections, and the utility to compress workflows, in the ISAcreator tool for curators and biologists alike

Open source - we want you to extend!


github.com/isa-tools/automacron

Co-authors: Philippe Rocca-Serra, Susanna-Assunta Sansone, Jim Davies, Min Chen

Also: Alejandra Gonzalez Beltran, for many useful discussions

Bye.

You can download this software now!

And yes.

It is open source!
