Network Biology Data Biological, Conceptual and Computational Issues around Network, System, and Pathway Data The Abstract and The Concrete

Upload: lazar

Post on 26-Jan-2016


Page 1: Topic Outline

Network Biology Data

Biological, Conceptual and Computational Issues around Network, System, and Pathway Data

The Abstract and The Concrete

Page 2: Topic Outline

Topic Outline

Lessons from the Genome Program and abstract ideas for transforming data into information when looking at systems data.

Two examples of concrete tools (ready for use):
• WebGestalt (for large sets of genes)
• Ingenuity (for networks)

A concrete thing: Bioinformatics Resource Center (under development)

Other tools under development

Page 3: Topic Outline

Human Genome Project (HGP): Past Lessons and Future Directions in Data…

Phenotype and System Data

Individualized Genotype data within populations

Genome Data

Genome-encoded “parts list” as data integrator: common data elements of genes and gene products (transcripts and proteins), enabling integration and comparison of data in NEW ways…

GeneKeyDB and related work as an integrative foundation that can help merge with other data.

Page 4: Topic Outline

Genome Data

HGP highlighted some ways to succeed or fail with large data sets.

Are the lessons learned applicable to systems biology data sets (expression, proteomics, genetics)? Yes.

But are some new approaches needed to understand SYSTEM data? Yes.

Page 5: Topic Outline

Biggest lesson: a biodata item has 2 questions attached to it… Mayr… HGP showed the importance of the “why” questions in thinking about and organizing data.

Other genotype, phenotype, system data

Genome Data

How? Why?

A datum…

Page 6: Topic Outline

Genotype + Environment + DEVELOPMENT ==> Phenotype

1) Astounding results: importance of network thinking in development and physiology for data to explain phenotype (e.g. PAX6)

2) Some relevance from HGP data approaches, but… need new bioinformatics tools for network data and thinking…

HGP results and future issues for new data…

Page 7: Topic Outline

Δ data in Regulatory networks

Δ data in Cellular signaling networks

Δ data in protein coding

Page 8: Topic Outline

Δ data in Regulatory networks

Δ data in Cellular signaling networks

Δ data in protein coding

A way of thinking about data…

Bioinformatics: finding the (genotypic, environmental data) difference that makes the (phenotypic data) difference.

(Many differences that make an interesting difference, NOT at protein coding, but at complex networks)

Page 9: Topic Outline

e.g. Alon U. 2003. Science 301: 1866; Barabasi AL. 2003. Linked. Plume Books; Barabasi AL, Oltvai ZN. 2004. Nat. Rev. Genetics 5: 101

What is a “Network” way of viewing data…

A Biological network can be expressed and manipulated in terms of “graph theory.”

Combinatorial algorithms are needed to analyze graphs.

Nodes or vertices may be:
• Genes
• Gene products
• Hormones, signals
• Metabolites
• Publications
• Functional sequence elements

Edges or lines may be:
• Undirected vs. directed
• Weighted vs. unweighted

[Figure: example network with edges labeled by weights (1.2, 1.7, 0.9) and by “+” signs]

Could be…
• Co-expression networks
• Gene regulatory networks
• Cell-cell communication and signal transduction networks
• Phylogenetic relationships among genes, species, networks: orthology, paralogy, etc. (trees, clades, etc.)
• Gene Ontology or other directed acyclic graphs
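The node and edge vocabulary above can be made concrete with a small data structure. The following is a minimal sketch, not a tool from this talk: the `Network` class, the gene names (`geneA`, `TF1`, etc.), and the use of +1/-1 for activating and repressing edges are all illustrative assumptions. Only the weights 1.2, 1.7, and 0.9 echo the figure.

```python
class Network:
    """Tiny adjacency-map graph: edges may be directed and may carry a weight."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adj = {}  # node -> {neighbor: weight}

    def add_edge(self, u, v, weight=1.0):
        self.adj.setdefault(u, {})[v] = weight
        back = self.adj.setdefault(v, {})  # make sure v is registered as a node
        if not self.directed:
            back[u] = weight               # undirected: store both directions

    def nodes(self):
        return sorted(self.adj)

    def out_degree(self, node):
        return len(self.adj.get(node, {}))


# Undirected, weighted: e.g. a co-expression network (hypothetical genes).
coexpression = Network(directed=False)
coexpression.add_edge("geneA", "geneB", 1.2)
coexpression.add_edge("geneB", "geneC", 1.7)
coexpression.add_edge("geneA", "geneC", 0.9)

# Directed, signed: e.g. a regulatory network (hypothetical regulator TF1).
regulation = Network(directed=True)
regulation.add_edge("TF1", "geneA", +1)   # assumed convention: +1 = activates
regulation.add_edge("TF1", "geneB", -1)   # assumed convention: -1 = represses

print(coexpression.nodes())
print(regulation.out_degree("TF1"), regulation.out_degree("geneA"))
```

An adjacency map keeps the same code usable for both cases: a correlation-style network stores each edge in both directions, while a mechanistic network stores only regulator to target.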

Page 10: Topic Outline

e.g. Alon U. 2003. Science 301: 1866; Barabasi AL. 2003. Linked. Plume Books; Barabasi AL, Oltvai ZN. 2004. Nat. Rev. Genetics 5: 101

What is a “Network” way of viewing data…

A Biological network can be expressed and manipulated in terms of “graph theory.”

Combinatorial algorithms are needed to analyze graphs.

Nodes or vertices may be:
• Genes
• Gene products
• Hormones, signals
• Metabolites
• Publications
• Functional sequence elements

Edges or lines may be:
• Undirected vs. directed
• Weighted vs. unweighted
• Experimental correlation (can be undirected) vs. mechanistic & directed

[Figure: example network with edges labeled by weights (1.2, 1.7, 0.9) and by “+” signs]

Tightly connected modules might be found… Might be loosely analogous to a protein sequence module that is conserved, duplicated, and diverged. Might see similarity across different tissues, species, etc.
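One way to make “tightly connected module” precise is a maximal clique: a set of nodes in which every pair is linked and to which no further node can be added. The sketch below uses the basic Bron-Kerbosch enumeration as an example of the kind of combinatorial graph algorithm mentioned earlier. It is an assumption chosen for illustration, not necessarily the method used by the tools in this talk, and the four-node graph is made up.

```python
def maximal_cliques(adj):
    """Yield each maximal clique of an undirected graph.

    adj maps every node to the set of its neighbors (symmetric, no self-loops).
    Basic Bron-Kerbosch without pivoting: fine for small graphs only.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield sorted(clique)          # nothing can extend it: maximal
            return
        for v in list(candidates):
            yield from expand(clique | {v},
                              candidates & adj[v],
                              excluded & adj[v])
            candidates = candidates - {v}  # v is done as a candidate
            excluded = excluded | {v}      # and must not be re-reported

    yield from expand(set(), set(adj), set())


# Hypothetical toy network: a, b, c form a triangle; d hangs off c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
modules = sorted(maximal_cliques(toy))
print(modules)  # [['a', 'b', 'c'], ['c', 'd']]
```

Clique enumeration grows exponentially in the worst case, so real networks typically need pivoting, size thresholds, or relaxed definitions of a module.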

Page 11: Topic Outline

[Diagram: Collaborative, Integrative, and Comparative Bioinformatics. Labels recovered from the figure:]

Data Storage & Collaborative Bioinformatics

Integrative Bioinformatics

Genotype & Phenotype Data Sets

Comparative Bioinformatics & Data Mining

Data Visualization & Stats

GeneKeyDB

Large Molecular Data Sets

Genetic Data

Existing Knowledge

Phenotype Data

Microarray data, proteome, etc.

MuTrack; WebQTL (Williams et al., UTHSC)

Gene-centered data integration (via GeneKeyDB, BioFoundation)

Comparative, Boolean, and other operations on gene sets & networks. WebGestalt and Ingenuity are two examples.

Comparative Cladistic Phylogenetic Analysis

Network Analysis

CS, Stats, Bio

Graph Algorithms

Sequence and Network Modularity

Network modules: duplicated, diverged, converged

Need to collaborate, integrate, and COMPARE to find differences in biological NETWORKS.

Page 12: Topic Outline

WebGestalt: Web-based Gene Set Analysis Toolkit

http://bioinfo.vanderbilt.edu/webgestalt

Bing Zhang

Page 13: Topic Outline

Can upload gene sets based on:

1) IDs (e.g. Affy, LocusLink, protein IDs from chip, proteome, etc.)

2) Genome location

Or…

3) Gene Ontology (common biological process, molecular function, cellular location)

Page 14: Topic Outline

Manipulate data as a set of genes or gene products.

RNA expression, proteome, genomics, statistical genetics, etc. all produce lists of genes that may function in a network.

Page 15: Topic Outline

1 of 3 things to do: Boolean operations on multiple sets or retrieving orthologs.
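Boolean operations on gene sets map directly onto set intersection, union, and difference. The following is a minimal sketch using Python's built-in sets. It does not call WebGestalt itself, and the set names, their memberships, and the small ortholog lookup table are made up for illustration.

```python
# Hypothetical gene lists from three different experiments.
array_hits = {"PAX6", "TP53", "EGFR", "MYC"}
proteome_hits = {"TP53", "EGFR", "KRAS"}
qtl_interval = {"EGFR", "MYC", "BRCA1"}

in_both = array_hits & proteome_hits                   # AND
in_any = array_hits | proteome_hits | qtl_interval     # OR
array_only = array_hits - proteome_hits - qtl_interval # AND NOT
in_all = array_hits & proteome_hits & qtl_interval

print(sorted(in_both))     # ['EGFR', 'TP53']
print(sorted(array_only))  # ['PAX6']
print(sorted(in_all))      # ['EGFR']

# Retrieving orthologs is a lookup per gene; this table is a stand-in
# for whatever ortholog resource is actually used.
human_to_mouse = {"PAX6": "Pax6", "TP53": "Trp53", "EGFR": "Egfr"}
mouse_orthologs = {human_to_mouse[g] for g in in_both if g in human_to_mouse}
print(sorted(mouse_orthologs))  # ['Egfr', 'Trp53']
```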

Page 16: Topic Outline

2 of 3 things to do: Retrieve data and other IDs

Page 17: Topic Outline

3 of 3 things to do: “Unusual” properties across the set

Page 18: Topic Outline

e.g. What GO categories (biological processes, molecular functions, and cellular locations) are in the set? Are there any that seem to occur more often than expected…
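“More often than expected” can be quantified by comparing the observed count of genes in a category with the count expected by chance. A common choice is the one-sided hypergeometric test. The sketch below assumes that test and uses made-up counts. The function name `enrichment_p` and its parameters are illustrative, not an API of any tool named here.

```python
from math import comb


def expected_count(N, K, n):
    """Expected genes in the category if n genes are drawn at random."""
    return n * K / N


def enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the reference set      K: reference genes in the category
    n: genes in the uploaded set       k: uploaded genes in the category
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Made-up example: 10 reference genes, 5 annotated to a category,
# 5 genes uploaded, all 5 of them in the category.
print(expected_count(10, 5, 5))   # 2.5
print(enrichment_p(10, 5, 5, 5))  # 1/252, about 0.004
```

When many categories are tested at once, the resulting p-values need a multiple-testing correction before any of them is called “unusual.”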

Page 19: Topic Outline
Page 20: Topic Outline
Page 21: Topic Outline

Co-occurrence of genes and publications (GRIF)

Page 22: Topic Outline

Protein Domains in set

Page 23: Topic Outline

Chromosome locations in set…

Page 24: Topic Outline

Pathways in set (1)

Page 25: Topic Outline

Pathways in set (2)

Page 26: Topic Outline

Ingenuity

A commercial tool for manipulating graphs (networks).

VU License: http://bioinfo.vanderbilt.edu/wiki/Ingenuity

(Also some open source tools: Cytoscape, GeNetViz, etc.)

Page 27: Topic Outline

Use of the commercial tool Ingenuity by Dr. N. Deanne and Dr. Beauchamp

Pathways (3)

Page 28: Topic Outline

Bioinformatics Resource Center

Developing a Bioinformatics Resource Center (BRC) that will consist of:

• Training infrastructure and applied workshops: support faculty using existing tools and databases (caBIG, custom statistical packages, NCBI genomics, imaging, molecular structure resources).
• Collaborative IT: establish accessible databases in shared cores and support faculty using these resources. …
• Integrative IT: web sites that integrate information from disparate data sets.
• Comparative IT: systems biology, comparing data across multiple platforms to identify new patterns (tissues and cells, molecular pathways, model organisms, toxins, etc.).

(taken from VUMC Strategic Plan)

Page 29: Topic Outline

Other systems…

Construction projects that can be further formed by your needs…
• CollabCore and lab blogs
• Genepedia, GeneKeyDB, BioFoundation
• Extensions to WebGestalt
• TFCAT, GeneCAT, CladeCAT, Pazar

Page 30: Topic Outline

Acknowledgments

Bing Zhang, Stefan Kirov, Leslie Galloway, Barbara Jackson, Betty Lou Alspaugh, Oakley Crawford, Suzanne Baktash, Xinxia Peng, Harold Shanafield, Sam Wang, Adam Tebbe, Shawn Ericson, Jeff Horner

A few collaborators…

Bonnie LaFleur, Shawn Levy, Phil Dexheimer, Michael Langston (CS collaborator), Wyeth Wasserman, Dan Goldowitz and the TMGC, Rob Williams et al. (WebQTL, etc.), Erich Baker, Dan Beauchamp, Natasha Deanne, Chad Johnson