Page 1: Identifying functional subnetworks in large-scale datasets

Identifying functional subnetworks in large-scale datasets

Benno Schwikowski
Institut Pasteur – Systems Biology Group

http://systemsbiology.fr

Page 2: Identifying functional subnetworks in large-scale datasets

The three levels of this talk

1. Discovery of pathways active in HepC infection
2. Cytoscape plug-ins
3. Cytoscape platform

Page 3: Identifying functional subnetworks in large-scale datasets

Hepatitis C infection

• One person out of 30 is infected
• No vaccine exists
• 20% of chronic infections lead to liver fibrosis and cirrhosis
• Frequently requires liver transplants

Page 4: Identifying functional subnetworks in large-scale datasets

Studying HepC infection mRNA changes

• 50% of transplant livers become re-infected with Hepatitis C
• Study expression of 7000 genes in re-infected livers after transplantation
  – 1-24 months post-transplant
  – Samples in 3-6 month intervals
• 28 biopsies from 11 patients
  – Mixture of hepatocytes, hepatic stellate cells, Kupffer cells, various types of blood cells
• Compare against pre-transplant reference pool

Page 5: Identifying functional subnetworks in large-scale datasets

Result of mRNA expression analysis

• Most genes (5968 of 7000) were significantly under- or overexpressed in one or more experiments
• High patient-to-patient variation

Page 6: Identifying functional subnetworks in large-scale datasets

Our approach

1. Construct seed network among known molecular players
2. Expand seed network to include differentially expressed genes
3. Identify putative pathways by the Active Modules approach

Page 7: Identifying functional subnetworks in large-scale datasets

Seed network

Types of interactions

• Protein-protein
• Protein-DNA
• Phosphorylation
• Activation
• Repression
• Covalent bond
• Methylation

Page 8: Identifying functional subnetworks in large-scale datasets

InteractionFetcher plug-in

Purpose
• Dynamically retrieves remote information for selected nodes
  – From SQL database
  – Requests data via XML-RPC protocol

Currently implemented types
• Protein/gene synonyms
• Orthologs
• Sequences (DNA, protein, DNA upstream)
  – Gene, protein,
• Interactions/associations

Options
• Cross-species queries
• Ortholog information from HomoloGene
• Inferred interactions (interologs)
• Interactive links to source Web pages

100% open-source (client and server)
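The XML-RPC request mentioned on this slide can be pictured with a short Python sketch. It only serializes a call using the standard `xmlrpc.client` module; the method name `getInteractions`, its parameters, and the gene identifiers are hypothetical placeholders, since the slides do not document the server's actual interface.

```python
import xmlrpc.client


def build_request(node_ids, species="Homo sapiens"):
    """Serialize an XML-RPC call for a set of selected node identifiers.

    'getInteractions' is a hypothetical method name; the real server
    interface is not described in the talk.
    """
    params = (list(node_ids), species)
    return xmlrpc.client.dumps(params, methodname="getInteractions")


# Placeholder identifiers standing in for the nodes selected in the network.
request_xml = build_request(["TP53", "STAT1"])

# Round-trip the payload to check that it is well-formed XML-RPC.
decoded_params, method = xmlrpc.client.loads(request_xml)
```

Against a live server, `xmlrpc.client.ServerProxy` would send such a call over HTTP; no endpoint is shown here because none is given in the slides.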

Page 9: Identifying functional subnetworks in large-scale datasets

2. Expand seed network

Purpose
• Bring significantly up-/downregulated genes “into the picture”

Approach
• Add interactions with differentially expressed genes (“in silico pull-down”)
  – Use BIND, HPRD databases
  – Only human-curated interactions
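The expansion step can be sketched in a few lines of Python. This is an illustration under assumptions, not the plug-in's code: curated interactions are taken to be simple pairs of identifiers, and an interaction is added when it links a seed node to a differentially expressed gene. All names and identifiers below are placeholders.

```python
def expand_seed_network(seed_nodes, curated_interactions, differential_genes):
    """Return curated interactions linking a seed node to a differentially
    expressed gene (an "in silico pull-down")."""
    seed = set(seed_nodes)
    hits = set(differential_genes)
    added = []
    for a, b in curated_interactions:
        # Keep the edge if one end is in the seed and the other is a hit.
        if (a in seed and b in hits) or (b in seed and a in hits):
            added.append((a, b))
    return added


# Toy data; every identifier is a placeholder.
seed = {"S1", "S2"}
curated = [("S1", "G1"), ("G2", "S2"), ("G3", "G4"), ("S1", "G5")]
differential = {"G1", "G2", "G3"}
new_edges = expand_seed_network(seed, curated, differential)
```

Here `("G3", "G4")` is skipped because neither end is a seed node, and `("S1", "G5")` is skipped because G5 is not differentially expressed.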

Page 10: Identifying functional subnetworks in large-scale datasets

Network after InteractionFetcher expansion

Page 11: Identifying functional subnetworks in large-scale datasets

Identifying putative pathways
Why clustering can be problematic

• Many clustering methods are not model-based, so the significance of clusters is unclear
• Any given cluster may not be supported by all experiments – noise problem
• Clusters tend to contain unrelated genes with vaguely similar profiles

Page 12: Identifying functional subnetworks in large-scale datasets

The three levels of this talk

1. Discovery of pathways active in HepC infection
2. Cytoscape plug-ins
3. Cytoscape platform

Page 13: Identifying functional subnetworks in large-scale datasets

How can the clustering issues be addressed? The ActiveModules plug-in

• Define “up-/downregulated” on the basis of a well-defined statistical model
• Also derive clusters from subsets of the input experiments
• Use additional evidence to focus on “plausible” clusters: protein interactions

Page 14: Identifying functional subnetworks in large-scale datasets

Interaction networks

Schwikowski, Uetz, Fields, Nature Biotechnology (2000)

Page 15: Identifying functional subnetworks in large-scale datasets

Modular organization of interaction networks

Page 16: Identifying functional subnetworks in large-scale datasets

A lot of interaction data is becoming available

Databases on...
• Protein-protein interactions
• Protein-DNA interactions
• Genetic interactions
• Metabolic pathways
• Cell signaling pathways, similarity relationships, literature-based relationships

Page 17: Identifying functional subnetworks in large-scale datasets

Multi-criteria detection of modules

1. Interaction network between genes/proteins
2. Differential gene/protein abundances/activities (genes × experiments)

Page 18: Identifying functional subnetworks in large-scale datasets

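Scoring a module candidate

[Figure: module scores across perturbations/conditions]

Rank adjustment: Binomial summation

$$P_z = 1 - \Phi\left(z_{A(j)}\right)$$

$$p_{A(j)} = \sum_{h=j}^{m} \binom{m}{h} P_z^{h} \left(1 - P_z\right)^{m-h}$$

$$r_{A(j)} = \Phi^{-1}\left(1 - p_{A(j)}\right)$$

m = total number of conditions
j = size of subset of conditions
Φ = standard normal cumulative distribution function

Final score

Ideker, Ozier, Schwikowski, Siegel (2002): Bioinformatics 18, S233-240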
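The rank adjustment by binomial summation can be sketched in Python. The per-condition z-scores of a candidate module are sorted from highest to lowest; for the j-th highest score, the binomial tail gives the probability that at least j of the m conditions score that high by chance, and that probability is converted back into a z-score. Taking the maximum adjusted score as the final score follows the cited paper and is an assumption here, as are the function names and the clamping constant.

```python
from math import comb
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def rank_adjusted_scores(z_scores):
    """Return r_A(j) for j = 1..m, given one module z-score per condition."""
    m = len(z_scores)
    ordered = sorted(z_scores, reverse=True)
    adjusted = []
    for j, z in enumerate(ordered, start=1):
        p_z = 1.0 - _STD_NORMAL.cdf(z)
        # Probability that at least j of the m conditions score above z.
        p_aj = sum(
            comb(m, h) * p_z**h * (1.0 - p_z) ** (m - h)
            for h in range(j, m + 1)
        )
        # Clamp so the inverse CDF stays defined for extreme scores.
        p_aj = min(max(p_aj, 1e-12), 1.0 - 1e-12)
        adjusted.append(_STD_NORMAL.inv_cdf(1.0 - p_aj))
    return adjusted


def final_score(z_scores):
    """Assumed final score: the best rank-adjusted score over all j."""
    return max(rank_adjusted_scores(z_scores))
```

With a single condition the adjustment leaves the score unchanged, which is a convenient sanity check.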

Page 19: Identifying functional subnetworks in large-scale datasets

Pathways in Rosetta’s compendium (300 conditions)

Page 20: Identifying functional subnetworks in large-scale datasets

The three levels of this talk

1. Discovery of pathways active in HepC infection
2. Cytoscape plug-ins
3. Cytoscape platform

Page 21: Identifying functional subnetworks in large-scale datasets

Active Modules plug-in applied to HCV re-infection data

• Iterative application results in four significant, highly overlapping subnetworks
• Repeat analysis only retaining “late-active” re-infection experiments
  – Eliminates pathways activated by transplant operation
  – Cutoff: 8 months
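Retaining only the late experiments amounts to a filter on sampling time. A minimal sketch, assuming each experiment carries its time in months post-transplant and that the 8-month cutoff is inclusive (the slide does not say which side the boundary falls on); the sample names are placeholders.

```python
def late_active(experiments, cutoff_months=8):
    """Keep experiments sampled at or after the cutoff (months post-transplant)."""
    return [name for name, months in experiments if months >= cutoff_months]


# Placeholder biopsies with their sampling time in months.
samples = [("biopsy_a", 3), ("biopsy_b", 8), ("biopsy_c", 14)]
kept = late_active(samples)
```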

Page 22: Identifying functional subnetworks in large-scale datasets

Which observations can we make locally?

Network after InteractionFetcher expansion

Bold: Differentially regulated subnetwork
Red/Green: Late-active subnetwork

Page 23: Identifying functional subnetworks in large-scale datasets

Cytotalk plug-in

• Analysis of overrepresentation of genes in Gene Ontology classes, using the Cytotalk plug-in and R
• Cytotalk enables interactive communication with
  – C/C++ programs
  – Java processes
  – Python
  – UNIX shell scripts
  – R, R scripts
• Can be run on same machine or any other Internet-connected machine
• Can function as Cytoscape plug-in
• 100% open-source
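An overrepresentation analysis of this kind is commonly done with a hypergeometric test. The slides do not state which test was run in R, so the choice below is an assumption, and the function and parameter names are illustrative. The sketch computes the probability of seeing at least `overlap` annotated genes in a selection drawn at random from the population.

```python
from math import comb


def overrepresentation_p(population, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: all genes measured
    annotated:  genes in the Gene Ontology class
    selected:   genes in the subnetwork of interest
    overlap:    selected genes that are also annotated
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: all 5 selected genes fall into a class of 5 among 10 genes.
p = overrepresentation_p(population=10, annotated=5, selected=5, overlap=5)
```

In practice the resulting p-values would also need correction for testing many Gene Ontology classes at once.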

Page 24: Identifying functional subnetworks in large-scale datasets

The three levels of this talk

1. Discovery of pathways active in HepC infection
2. Cytoscape plug-ins
3. Cytoscape platform

Page 25: Identifying functional subnetworks in large-scale datasets

Some Network Visualization Tools

• Pajek - Slovenia
• Osprey - SLRI, Toronto
• VisANT - BU
• Biolayout - EBI
• GraphViz
• PowerPoint
• Others
• Cytoscape (only open-source biology)

Page 26: Identifying functional subnetworks in large-scale datasets

Cytoscape

Page 27: Identifying functional subnetworks in large-scale datasets

Cytoscape Basic Concepts

• Objects: visualized as nodes
• Relationships: visualized as edges
• Attributes (name, sequence, source,...)
• Mapping: attributes → drawing, customizable through visual mapper

Page 28: Identifying functional subnetworks in large-scale datasets

Cytoscape file formats

Sample interaction file

YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W
[...]

Sample expression file

GENE DESC exp0.sig exp1.sig exp0.sig exp1.sig
GENE0 G0 0.0 0.0 23.2 11.5
GENE1 G1 0.0 0.0 34.6 5.2
GENE2 G2 0.0 0.0 10.0 28.0
GENE3 G3 0.0 0.0 1.64 4.77
[...]
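The interaction sample is simple enough to parse by hand. The sketch below assumes one interaction per line with three whitespace-separated fields (source, interaction type, target), which matches the sample shown but is not a full specification of the format; the function name is illustrative.

```python
def parse_interactions(text):
    """Parse 'source type target' lines into (source, type, target) tuples."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        # Lines without exactly three fields (blank lines, '[...]') are skipped.
        if len(fields) == 3:
            edges.append(tuple(fields))
    return edges


sample = """YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W
[...]"""
edges = parse_interactions(sample)
```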

Page 29: Identifying functional subnetworks in large-scale datasets

Cytoscape

Display
• gene & protein expression
• protein interactions (physical and non-physical)
• protein classifications

Analysis plug-in modules

Java: platform independent + web-start
• 100% open-source

http://www.cytoscape.org/

Page 30: Identifying functional subnetworks in large-scale datasets

Visual Styles

Display gene expression as clear text

Page 31: Identifying functional subnetworks in large-scale datasets

Visual Styles

Map expression values to node colors using a continuous mapper
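The idea of a continuous mapper can be illustrated with linear color interpolation. This is a sketch of the concept, not Cytoscape's implementation; the function name, the green-to-red color choice, and the value range are assumptions.

```python
def continuous_color(value, low, high, low_rgb=(0, 255, 0), high_rgb=(255, 0, 0)):
    """Linearly interpolate an RGB color for a value between low and high."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp the position to [0, 1] so out-of-range values take the end colors.
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    return tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))


# Example: color nodes by an expression value on an assumed -2..2 scale.
lowest = continuous_color(-2.0, low=-2.0, high=2.0)
highest = continuous_color(5.0, low=-2.0, high=2.0)
```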

Page 32: Identifying functional subnetworks in large-scale datasets

Visual Styles

Expression data mapped to node colors

Page 33: Identifying functional subnetworks in large-scale datasets

Multidimensional attributes

Cytoscape, pre-release plug-in
Data from Ideker et al., Science (2001)

Page 34: Identifying functional subnetworks in large-scale datasets

Layout

• 16 algorithms available through plug-ins

• Zooming, hide/show, alignment

Page 35: Identifying functional subnetworks in large-scale datasets

yFiles Circular

Page 36: Identifying functional subnetworks in large-scale datasets

Page 37: Identifying functional subnetworks in large-scale datasets

Cytoscape Core – Differences to most other approaches

• Emphasis on data analysis & integration
• No built-in semantics (added by plug-ins)
• Very simple concepts
• Human-readable input formats
• Extensibility

Page 38: Identifying functional subnetworks in large-scale datasets

Cytoscape extensibility

• Core: 100% open source Java
  – Plug-in API
  – Plug-ins are independently licensed
• “Just need to do the biology”
• Template code samples

Page 39: Identifying functional subnetworks in large-scale datasets

Biomodules plug-in

Prinz S, Avila-Campillo I, Aldridge C, Srinivasan A, Dimitrov K, Siegel AF, and Galitski T. Genome Res. 2004 14: 380-390

Page 40: Identifying functional subnetworks in large-scale datasets

Cytoscape Plugins

• Modules in Complex Networks: Iliana Avila-Campillo, Tim Galitski
• Discovering Regulatory and Signaling Circuits in Molecular Interaction Networks: Trey Ideker, Owen Ozier, Benno Schwikowski, Andrew Siegel
• Data Integration in Juvenile Diabetes Research: Marta Janer, Paul Shannon
• A network motif sampler: David Reiss, Benno Schwikowski

Page 41: Identifying functional subnetworks in large-scale datasets

Cytoscape Core Features

• Visualize and lay out networks
• Display network data using visual styles
• Easily organize multiple networks
• Bird’s eye view navigation of large networks
• Supports SIF and GML, molecular profiling formats, node/edge attributes
• Functional annotation from GO + KEGG
• Metanode support (hierarchical groupings)
• Extensible through plugins (20 developed)

Page 42: Identifying functional subnetworks in large-scale datasets

Baliga et al., Genome Research, June 2004

Page 43: Identifying functional subnetworks in large-scale datasets

Collaborators: HCV

Institute for Systems Biology, Seattle, WA

• David Reiss
• Iliana Avila-Campillo
• Vesteinn Thorsson
• Tim Galitski

Page 44: Identifying functional subnetworks in large-scale datasets

Page 45: Identifying functional subnetworks in large-scale datasets

Collaborators: Cytoscape

• ISB: Leroy Hood, Rowan Christmas
• UCSD: Trey Ideker, Chris Workman
• Memorial Sloan-Kettering Cancer Center: Chris Sander, Gary Bader, Ethan Cerami
• Pasteur: Melissa Cline, Andrea Splendiani, Tero Aittokallio
• Agilent Technologies
• Unilever PLC

• Long-term funding from NIH and participating institutions

Page 46: Identifying functional subnetworks in large-scale datasets

Shannon, P., et al. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-504.

Page 47: Identifying functional subnetworks in large-scale datasets

Collaborators: Active Networks

• Trey Ideker
• Owen Ozier
• Andrew Siegel
• Richard Karp

Page 48: Identifying functional subnetworks in large-scale datasets
Page 49: Identifying functional subnetworks in large-scale datasets

Levels of Biological Information

DNA
mRNA
Protein
Pathways
Networks
Cells
Tissues
Organs
Individuals
Populations
Ecologies