How to Reconstruct a Large Genetic Network from n Gene Perturbations in fewer than n^2 Easy Steps

Andreas Wagner, Bioinformatics, vol. 17, no. 12, 2001, pp. 1183-1187.

Speaker: Chuang Chieh Lin
Advisor: Professor R. C. T. Lee
CSIE, National Chi-Nan University



Outline

Introduction and basic definitions
Graph theoretical framework
Parsimonious network
Algorithm and complexity
Cycles in genetic networks
Conclusions
References


Introduction and basic definitions

Gene activity: whether a gene is expressed or not, for example as mRNA or as protein.

Gene network: In this paper, we define a genetic network as a group of genes in which individual genes can influence the activity of other genes.

The core task of reconstructing a genetic network is to identify the causal structure of the network.

To reconstruct a genetic network is to identify, for each gene in the network, which other genes' activity that gene influences directly.

Now, let's see an illustration of a genetic network.

[Figure: a strand of DNA carrying Gene 1 to Gene 5 and the products they encode (transcription factor, protein kinase, protein phosphatase, transcription factor, protein), drawn in active and inactive states.]

This is a hypothetical biochemical pathway involving two transcription factors, a protein kinase and a protein phosphatase, as well as the genes encoding them.

Genetic perturbation: an experimental manipulation of gene activity by manipulating either a gene itself or its product. It includes point mutations, gene deletions, or other interference with the activity of the product.

[Figure: the same hypothetical pathway (Gene 1 to Gene 5 with their products), now annotated with the effects of genetic perturbations.]

Genetic perturbation: gene deletion        Genetic perturbation: gene deletion
Aspect of gene activity: mRNA expression   Aspect of gene activity: phosphorylation state

G1: G2, G5                                 G1: G3, G4
G2: G5                                     G2: G3, G4
G3: G5                                     G3: G4
G4: G5                                     G4:
G5:                                        G5:


Graph theoretical framework

As the previous example indicated, we are concerned with qualitative information on gene interactions.

We use a "digraph", a graph representation of a genetic network, to capture this qualitative information.

A digraph is a directed graph consisting of nodes and directed edges.

Let's see an example.

We use a → b to mean that gene a influences the activity of gene b directly. For brevity, genes will be labeled by numbers from now on.

[Figure: an example digraph G with 21 nodes labeled 0 to 20.]

Adjacency list: for each gene i, it simply shows which genes' activity state the gene i influences directly.

We denote by Adj(G) the adjacency list of graph G and by Adj(i) the set of nodes (genes) adjacent to (directly influenced by) node i.

[Figure: the example digraph G with nodes 0 to 20.]

Adjacency list of G:

0: 16
1:
2:
3: 2 5 8
4:
5: 12
6: 5 12
7: 2 17
8:
9: 10 15
10: 1 20
11: 20
12: 14
13: 8 17
14: 0
15: 0
16: 2
17: 8
18:
19: 8
20: 6 18

Accessibility list: the list of perturbation effects, or the list of regulatory effects. It shows all nodes (genes) that can be accessed (influenced in their activity state) from a given gene by paths of direct interactions.

We denote by Acc(G) the accessibility list of the graph G and by Acc(i) the set of nodes that can be reached (influenced) from node (gene) i.

[Figure: the example digraph G with nodes 0 to 20.]

Accessibility list of G:

0: 2 16
1:
2:
3: 0 2 5 8 12 14 16
4:
5: 0 2 12 14 16
6: 0 2 5 12 14 16
7: 2 8 17
8:
9: 0 1 2 5 6 10 12 14 15 16 18 20
10: 0 1 2 5 6 12 14 16 18 20
11: 0 2 5 6 12 14 16 18 20
12: 0 2 14 16
13: 8 17
14: 0 2 16
15: 0 2 16
16: 2
17: 8
18:
19: 8
20: 0 2 5 6 12 14 16 18
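The relation between the two lists can be sketched in a few lines of Python: the accessibility list is what a depth-first search from each node reaches in the adjacency list. The function name `accessibility_list` and the dictionary `ADJ_G` are illustrative names that do not appear in the talk; the data is the adjacency list of G transcribed from the slide.

```python
def accessibility_list(adj):
    """Return Acc: for each node, the set of nodes reachable by a path of length >= 1."""
    acc = {}
    for start in adj:
        seen = set()
        stack = list(adj[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj.get(node, ()))
        acc[start] = seen
    return acc


# Adjacency list of the example graph G, transcribed from the slide.
ADJ_G = {
    0: [16], 1: [], 2: [], 3: [2, 5, 8], 4: [], 5: [12], 6: [5, 12],
    7: [2, 17], 8: [], 9: [10, 15], 10: [1, 20], 11: [20], 12: [14],
    13: [8, 17], 14: [0], 15: [0], 16: [2], 17: [8], 18: [], 19: [8],
    20: [6, 18],
}

ACC_G = accessibility_list(ADJ_G)
print(sorted(ACC_G[9]))  # [0, 1, 2, 5, 6, 10, 12, 14, 15, 16, 18, 20]
```

The printed row agrees with the entry for gene 9 in the accessibility list of G.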


Before proceeding with the algorithm, we first need some concepts and theorems.

The most parsimonious network

An acyclic digraph defines its accessibility list, but an accessibility list may have more than one corresponding acyclic digraph.

Let's see an example first.

(a) Accessibility list Acc:

0: 1 2 3 4 5
1: 2 3 4 5
2: 3 4 5
3:
4: 5
5:

[Figure: panels (b), (c) and (d) show three digraphs on the nodes 0 to 5.]

(d) is the most parsimonious network of Acc, i.e., of (a).

An accessibility list Acc and a digraph G are compatible if G has Acc as its accessibility list. Acc is the accessibility list induced by G.

Gpars is called the most parsimonious network compatible with Acc.

Why do we prefer the most parsimonious network?

We prefer the simplest, that is, the most parsimonious, gene network.

For any accessibility list Acc of a digraph G, there exists a most parsimonious network Gpars (this follows from a theorem). Therefore Gpars is the core of all the corresponding digraphs.

More complicated digraphs are harder to interpret.

Theorem 1

Let Acc be the accessibility list of an acyclic digraph. Then there exists exactly one graph Gpars that has Acc as its accessibility list and that has fewer edges than any other graph G with Acc as its accessibility list.

Before starting the proof, we need to introduce some terminology.

Range and shortcut

Consider two nodes i and j of a digraph that are connected by an edge e. The range r of the edge e is the length of the shortest path between i and j in the absence of e. If there is no other path connecting i and j, then r := ∞.

An edge e with range r ≥ 2 but r ≠ ∞ is called a shortcut.

Let's see an example.

[Figure: nodes i and j are joined by the edge e and, in addition, by a path through the intermediate nodes z1, z2, …, zk-2, zk-1, zk.]

r(e) = k + 1

e is a shortcut. When e is eliminated, i and j are still connected by a path of length k + 1, so r(e) = k + 1.
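A minimal sketch of the two definitions, assuming the digraph is given as a dictionary of adjacency lists: the range of an edge is a breadth-first shortest-path length computed while ignoring the edge itself. The names `edge_range` and `is_shortcut` are hypothetical helpers, and the test graph uses k = 3 intermediate nodes in the order i → z1 → z2 → z3 → j (the order is an assumption, since the figure did not survive).

```python
import math
from collections import deque


def edge_range(adj, i, j):
    """Length of the shortest path from i to j that avoids the edge (i, j).

    Returns math.inf when no other path exists.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if u == i and v == j:
                continue  # ignore the edge e itself
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == j:
                    return dist[v]
                queue.append(v)
    return math.inf


def is_shortcut(adj, i, j):
    """An edge is a shortcut when its range is at least 2 and finite."""
    r = edge_range(adj, i, j)
    return 2 <= r < math.inf


# The situation of the figure with k = 3: path i -> z1 -> z2 -> z3 -> j plus edge e = (i, j).
EXAMPLE = {"i": ["z1", "j"], "z1": ["z2"], "z2": ["z3"], "z3": ["j"], "j": []}
print(edge_range(EXAMPLE, "i", "j"))  # 4, i.e. k + 1
```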

Lemma 1

For any accessibility list Acc of a digraph, there exists a compatible graph Gpars that is free of shortcuts.

Proof of Lemma 1

Assume that there is no such graph Gpars.

[Figure: before deleting ei, the nodes xi and yi are joined by the edge ei and by a path Pi whose length is greater than 1; after deleting ei, only Pi remains.]

If there exists a shortcut ei between xi and yi, delete ei. Then, by the definition of a shortcut, xi and yi are still connected via Pi, whose length is greater than 1.

Suppose that there are n such pairs (xi, yi), i.e., (x1, y1), …, (xn, yn). After repeating the deletion for every pair (xi, yi), i = 1, …, n, we obtain a shortcut-free graph compatible with the accessibility list. This contradicts the assumption made at the beginning of this proof.
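The deletion argument of the proof can be sketched as a loop: as long as some edge (u, v) has an alternative path from u to v, delete it. This is only an illustration for acyclic digraphs stored as dictionaries of sets; `reachable` and `remove_shortcuts` are hypothetical names, and the input used here is the accessibility list of the earlier six-node example read as a (maximally redundant) adjacency list.

```python
def reachable(adj, start, skip_edge=None):
    """Nodes reachable from start, optionally without using one given edge."""
    seen = set()
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if (u, v) == skip_edge or v in seen:
                continue
            seen.add(v)
            stack.append(v)
    return seen


def remove_shortcuts(adj):
    """Delete shortcuts one at a time until none is left (acyclic input assumed)."""
    g = {u: set(vs) for u, vs in adj.items()}
    changed = True
    while changed:
        changed = False
        for u in g:
            for v in sorted(g[u]):
                # (u, v) is a shortcut if v is still reachable without it
                if v in reachable(g, u, skip_edge=(u, v)):
                    g[u].discard(v)
                    changed = True
    return g


DENSE = {0: {1, 2, 3, 4, 5}, 1: {2, 3, 4, 5}, 2: {3, 4, 5}, 3: set(), 4: {5}, 5: set()}
PRUNED = remove_shortcuts(DENSE)
print(PRUNED)
```

Each deletion leaves every accessibility relation intact, which is the point of the proof; the recursive algorithm presented later reaches the same graph with far less work.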

Lemma 2

Assume that Acc is the accessibility list of a digraph G. For each node x, the adjacency list Adj(x) of a shortcut-free graph Gpars compatible with Acc is a subset of the adjacency list Adj(x) of any graph compatible with Acc.

Proof of Lemma 2

Assume that Lemma 2 is false.

Without loss of generality, suppose that a shortcut-free graph Gpars and some other graph G induce Acc.

By assumption, Gpars contains at least one node x such that Adj(x) of Gpars contains at least one node y that is not in Adj(x) of G.

Because G and Gpars have the same accessibility list Acc, there must exist some path x → z1 → z2 → … → zk → y from x to y in G. For the same reason, z1 is accessible from x in Gpars, z2 from z1 in Gpars, …, zk from zk-1 in Gpars, and y from zk in Gpars.

Therefore we can find two paths (x → … → y) in Gpars:
(1) the edge e between x and y
(2) the path x → z1 → z2 → … → zk → y

This contradicts the assumption that Gpars is shortcut-free, because e is a shortcut.

Let's see an example!

Acc:

x: z1 z2 y
z1: z2 y
z2: y

Adj(Gpars):

x: z1 y
z1: z2
z2: y

Adj(G):

x: z1 z2
z1: z2
z2: y

[Figure: the digraphs G and Gpars on the nodes x, z1, z2, y; an edge is marked "A shortcut!".]

Corollary 1

The shortcut-free graph Gpars compatible with Acc is the unique graph with the fewest edges among all graphs G compatible with Acc.

This corollary follows immediately from Lemma 2.

Now, we can proceed to the algorithm.


 1: for all nodes i of G
 2:     Adj(i) = Acc(i)
 3: for all nodes i of G
 4:     if node i hasn't been visited
 5:         call PRUNE_ACC(i)
 6:     end if

 7: PRUNE_ACC(i)
 8:     for all nodes j ∈ Acc(i)
 9:         if Acc(j) = ∅
10:             declare j as visited.
11:         else
12:             call PRUNE_ACC(j)
13:         end if
14:     for all nodes j ∈ Acc(i)
15:         for all nodes k ∈ Adj(j)
16:             if k ∈ Acc(i)
17:                 delete k from Adj(i)
18:             end if
19:     declare node i as visited
20: end PRUNE_ACC(i)

A recursive pruning algorithm to reconstruct the most parsimonious graph from an accessibility list.
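The pseudocode can be turned into a short Python sketch. It is not the paper's code: the function name `most_parsimonious` is hypothetical, the accessibility list is assumed to be a dictionary with one key per node, and the recursion is guarded by a "not yet visited" test (an assumption made here so that no node is pruned twice). The comments refer to the line numbers of the pseudocode.

```python
def most_parsimonious(acc):
    """Prune the accessibility list of an acyclic digraph down to Adj(Gpars)."""
    acc = {i: set(targets) for i, targets in acc.items()}
    adj = {i: set(targets) for i, targets in acc.items()}  # lines 1-2
    visited = set()

    def prune_acc(i):
        for j in acc[i]:            # lines 8-13: finish everything below i first
            if j not in visited:
                prune_acc(j)
        for j in acc[i]:            # lines 14-18: drop nodes reached via some j
            adj[i] -= adj[j]
        visited.add(i)              # line 19

    for i in acc:                   # lines 3-6
        if i not in visited:
            prune_acc(i)
    return adj


ACC = {0: [1, 2, 3, 4, 5], 1: [2, 3, 4, 5], 2: [3, 4, 5], 3: [], 4: [5], 5: []}
print(most_parsimonious(ACC))
```

For very deep networks the recursion could exceed Python's default recursion limit, so an explicit stack may be preferable in practice.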

This algorithm is based on the following theorem, so we state the theorem first.

Theorem 2

Let Acc(G) be the accessibility list of an acyclic digraph, Gpars its most parsimonious graph, and V(Gpars) the set of all nodes of Gpars. Then the following identity holds:

$$\forall i \in V(G_{pars}):\quad Adj(i) = Acc(i) \setminus \bigcup_{j \in Acc(i)} Acc(j)$$

Instead of proving the theorem, we give an example later.

Original Acc(G):

0: 1 2 3 4 5
1: 2 3 4 5
2: 3 4 5
3:
4: 5
5:

[Figure: a possible corresponding G on the nodes 0 to 5.]

0 via 1, 2, 3, 4, 5:

0: 1
1: 2 3 4 5
2: 3 4 5
3:
4: 5
5:

1 via 2, 3, 4, 5:

0: 1
1: 2
2: 3 4 5
3:
4: 5
5:

Current list:

0: 1
1: 2
2: 3 4 5
3:
4: 5
5:

2 via 3, 4, 5:

0: 1
1: 2
2: 3 4
3:
4: 5
5:

4 via 5:

0: 1
1: 2
2: 3 4
3:
4: 5
5:

[Figure: the resulting digraph on the nodes 0 to 5.]

The most parsimonious network
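The same result can be checked non-recursively. The sketch below assumes that the identity of Theorem 2 reads "Adj(i) equals Acc(i) with every node removed that is accessible from some node in Acc(i)", which is how the worked example behaves; `parsimonious_by_identity` is a hypothetical name. Applying a set difference per node is simpler than the recursive pruning but does more work.

```python
def parsimonious_by_identity(acc):
    """Adj(i) = Acc(i) minus the union of Acc(j) over all j in Acc(i)."""
    adj = {}
    for i, direct_and_indirect in acc.items():
        indirect = set()
        for j in direct_and_indirect:
            indirect |= set(acc[j])  # everything reachable through j
        adj[i] = set(direct_and_indirect) - indirect
    return adj


ACC_EXAMPLE = {0: [1, 2, 3, 4, 5], 1: [2, 3, 4, 5], 2: [3, 4, 5], 3: [], 4: [5], 5: []}
for node, targets in parsimonious_by_identity(ACC_EXAMPLE).items():
    print(node, sorted(targets))
```

The printed rows are 0: 1, 1: 2, 2: 3 4, 3: (empty), 4: 5, 5: (empty), which is the final list of the example.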

Actually, the aforementioned example is an illustration of our algorithm.

From this theorem, we can derive Corollary 2.

Corollary 2

Let i, j and k be any three pairwise different nodes of an acyclic directed shortcut-free graph G. If j is accessible from i, then no node k accessible from j is adjacent to i.

[Figure: nodes i, j and k; an edge from i to k would be "A shortcut!!".]

Computational complexity

Assume that there are n genes, that is, n entries in the accessibility list.

Let k < n − 1 be the average number of entries in a node's accessibility list.

During execution, each node accessible from a node j induces one recursive call of PRUNE_ACC, after which the node accessed from j is declared as visited. Thus each entry of the accessibility list of a node is explored no more than once.

Line 15 of the algorithm loops over all nodes adjacent to a node j. Let a denote the average number of entries in Adj(j).

The overall computational complexity is therefore O(nka).

In practice, large-scale experimental gene perturbations in the yeast Saccharomyces cerevisiae (n ≈ 6300) suggest that k < 50 ([HMJRS2000]) and a ≤ 1 ([W2001a]), and thus nka << n^2.
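Plugging the quoted yeast figures into the bound gives a sense of scale. This is only a back-of-the-envelope check using the upper limits k = 50 and a = 1:

```latex
n k a \;<\; 6300 \cdot 50 \cdot 1 \;=\; 315{,}000
\;\ll\; n^{2} \;\approx\; 6300^{2} \;=\; 39{,}690{,}000
```

Under these assumptions the pruning needs at least a factor of about 100 fewer elementary steps than n^2.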

Storage complexity

The algorithm stores two copies of the accessibility list, as well as a list of the nodes that have been visited.

Because the graph is acyclic, the recursion depth can be no greater than n − 1.

Note that k < n − 1 is the average number of entries in a node's accessibility list.

The overall storage requirements are O(nk).


Dealing with cycles

Everything mentioned so far is restricted to acyclic graphs.

Now let us look at the problems caused by cyclic graphs.

Problems that single gene perturbation can't solve

[Figure: two different digraphs, each a single cycle through the nodes 0, 1, 2, 3, 4.]

0: 1 2 3 4
1: 0 2 3 4
2: 0 1 3 4
3: 0 1 2 4
4: 0 1 2 3

They have the same accessibility list. Therefore, we cannot reconstruct the gene network uniquely.

[Figure: the same two cyclic digraphs on the nodes 0, 1, 2, 3, 4.]

Note that the order of direct regulatory interactions in these two networks is different, as reflected in the adjacency lists.

Adjacency list of the first network:

0: 3
1: 4
2: 1
3: 2
4: 0

Adjacency list of the second network:

0: 1
1: 2
2: 3
3: 4
4: 0
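The ambiguity is easy to reproduce. The sketch below computes the accessibility list of both five-node cycles by depth-first search (excluding each node from its own list, as the slide does) and confirms that they coincide although the adjacency lists differ. The names `accessibility_list`, `CYCLE_A` and `CYCLE_B` are illustrative.

```python
def accessibility_list(adj):
    """Acc(i) = nodes other than i that are reachable from i."""
    acc = {}
    for start in adj:
        seen = set()
        stack = list(adj[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node])
        seen.discard(start)  # on a cycle a node reaches itself; leave it out
        acc[start] = seen
    return acc


CYCLE_A = {0: [3], 1: [4], 2: [1], 3: [2], 4: [0]}  # 0 -> 3 -> 2 -> 1 -> 4 -> 0
CYCLE_B = {0: [1], 1: [2], 2: [3], 3: [4], 4: [0]}  # 0 -> 1 -> 2 -> 3 -> 4 -> 0

print(accessibility_list(CYCLE_A) == accessibility_list(CYCLE_B))  # True
```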

Instead of solving this problem, we collapse the nodes which form a cycle into a single group of nodes with indistinguishable order of regulatory interactions.

Such a single group can also be called a strongly connected component or strong component of a directed graph G. Every two nodes in a strong component are mutually accessible.

Let us see an example.

[Figure: a digraph G on the nodes 0 to 15 and, next to it, the graph obtained by collapsing cycles. The nodes 1, 3, 4, 5, 15 form a single group and the nodes 6, 9, 12 form a single group; the nodes 0, 2, 7, 8, 10, 11, 13, 14 remain separate.]

This graph is called a condensation of G.

How do we construct a condensation of a gene network?

We need a theorem and a corollary before presenting the algorithm that constructs a condensation of a gene network.

Theorem 3

Let P be the accessibility matrix of a digraph G with n nodes, x1, …, xn. The strong component containing xi is determined by the unit entries of the i-th row in the matrix $P \times P^{T}$ (the element-wise product of P and its transpose).

Corollary 3

Let i and j (i ≠ j) be two nodes of a digraph G. i and j are in the same component iff i ∈ Acc(j) and j ∈ Acc(i).

We use Corollary 3 because we will work with accessibility lists, not matrices.

Now we are going to present the algorithm.


for all nodes i of G
    if component[i] has not been defined
        create new node x of G*
        component[i] = x
        for all nodes j ∈ Acc(i)
            if i ∈ Acc(j)
                component[j] = x
            end if
    end if

for all nodes i of G*
    Acc_{G*}(i) = ∅

for all nodes i of G
    for all nodes j ∈ Acc(i)
        if component[i] ≠ component[j]
            if component[j] ∉ Acc_{G*}(component[i])
                add component[j] to Acc_{G*}(component[i])
            end if
        end if
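The procedure above can be sketched in Python as follows. This is a minimal sketch, not the implementation from the paper: it assumes the accessibility list is stored as a dictionary mapping each node to the set of nodes accessible from it, and it labels the nodes of G* with consecutive integers instead of x1, x2, x3. The function name `condensation` is hypothetical. The input is the seven-node accessibility list used as the example in this talk.

```python
def condensation(acc):
    """Build the condensation G* of G from its accessibility lists.

    acc maps every node of G to the set of nodes accessible from it.
    Returns (component, acc_star): the G* node assigned to each node of G,
    and the accessibility lists of G*.
    """
    component = {}
    next_id = 0  # assumption: nodes of G* are labelled 0, 1, 2, ...
    for i in acc:
        if i not in component:
            x = next_id
            next_id += 1
            component[i] = x
            # Corollary 3: j shares i's component iff each is accessible from the other.
            for j in acc[i]:
                if i in acc[j]:
                    component[j] = x

    # Start every node of G* with an empty accessibility list.
    acc_star = {x: set() for x in range(next_id)}
    for i in acc:
        for j in acc[i]:
            if component[i] != component[j]:
                # A set ignores duplicates, which covers the membership test.
                acc_star[component[i]].add(component[j])
    return component, acc_star


acc = {
    1: {2, 3, 4, 5, 6, 7},
    2: {1, 3, 4, 5, 6, 7},
    3: {1, 2, 4, 5, 6, 7},
    4: {5, 6, 7},
    5: {6, 7},
    6: {5, 7},
    7: {5, 6},
}
component, acc_star = condensation(acc)
print(component)
print(acc_star)
```

Using sets for the accessibility lists makes each membership test cheap on average, so the running time is governed by the total number of accessibility-list entries that are scanned.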


[Figure: a digraph G on nodes 1 to 7 and its condensation G* on nodes x1, x2, x3.]

Accessibility list of G:

1: 2 3 4 5 6 7
2: 1 3 4 5 6 7
3: 1 2 4 5 6 7
4: 5 6 7
5: 6 7
6: 5 7
7: 5 6


Storage and time complexity

The graph G* has at most as many nodes and accessibility-list entries as G.

The algorithm generates only one copy of G* and its accessibility list.

Therefore both time and storage complexity are $O(k)$, where k is the average number of entries of the accessibility list ($k < n^2$).


Conclusions

Genetics is concerned with identifying gene interactions and their biological significance.

Functional genomics takes this concern to the next level, that is, identifying gene interactions among the thousands of genes in a genome.

There are other ways to simplify gene networks, such as Boolean logic design, reduction in symbolic logic, and graph theory.
