The STRING Database What it does and how it interfaces to other resources Christian von Mering, University of Zurich & SIB bigDATA Workshop

Post on 22-Dec-2015


Page 1:

The STRING Database

What it does and how it interfaces to other resources

Christian von Mering, University of Zurich & SIB

bigDATA Workshop

Page 2:

- viewers for all types of evidence

- focus on usability and speed

- integrated scoring scheme

- information transfer between species

Genomic Neighborhood

Genes/Species Co-occurrence

Gene Fusions

Database Imports

Exp. Interaction Data

Co-expression

Literature co-occurrence

STRING http://string-db.org/

Page 3:

http://string-db.org

Numbers:

• 630 organisms

• 2.6 million proteins

• 88 million interactions

• server footprint: 320 GB

Page 4:

Interaction prediction from genome information

“genomic context”

• Phylogenetic Profiles
• Conserved Neighborhood
• Gene-Fusions

quantify …

integrate …

networks

Page 5:

Other Interaction Sources

• Interaction Databases
• Pathway Databases (Reactome)
• Automated Textmining
• Interolog Transfer

Page 6:

The scoring system

final interaction score: protein A – protein B 0.856
(between 0 and 1, pseudoprobability, “likelihood of functional association”)

final score = 1 – (1 – nscore) * (1 – fscore) * (1 – pscore) * (1 – cscore) * (1 – escore) * (1 – tscore)

• nscore: neighborhood
• fscore: fusion
• pscore: cooccurrence
• cscore: coexpression
• escore: experimental
• tscore: textmining

evidence transfer between species: information transfer between species either via orthologs (COG database) or via homology

nscore = 1 – (1 – nscore_{query species}) * (1 – nscore_{transf.})

analogous for cscore, escore, tscore, ...

benchmarking: each predictor has its own raw-score regime

[Figure: benchmarking plot; x-axis: raw score; y-axis: KEGG performance (fraction on same map)]

Example - Neighborhood raw score:

gene A, gene B; intergenic distances: 100 bp, 6 bp, 20 bp

raw score: sum of intergenic distances
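The combination rule on this slide can be sketched in a few lines of Python. This is only an illustration of the formula as written on the slide, not the STRING implementation; the function name `combine` and all numeric channel scores below are hypothetical.

```python
from math import prod


def combine(scores):
    """Combine evidence scores as 1 - product of (1 - score).

    Each score is treated as an independent pseudoprobability in [0, 1],
    following the formula shown on the slide.
    """
    scores = list(scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


# Step 1: evidence transfer between species, per channel.
# Both input values are made up for illustration.
nscore = combine([0.25, 0.20])  # query-species evidence, transferred evidence

# Step 2: combine all channels into the final interaction score.
# The five other channel scores are hypothetical as well.
channels = {
    "nscore": nscore,
    "fscore": 0.0,
    "pscore": 0.2,
    "cscore": 0.3,
    "escore": 0.5,
    "tscore": 0.1,
}
final_score = combine(channels.values())
print(f"final interaction score: {final_score:.3f}")
```

One property of this rule is that a channel with score 0 leaves the result unchanged, and adding evidence can only raise the final score, never lower it.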

Page 7:

The raw score regimes

Neighborhood

gene A, gene B; intergenic distances: 100 bp, 6 bp, 20 bp

raw score: sum of intergenic distances

Phylogenetic profiles

• “similarity profiles”
• singular value decomposition

raw score: Euclidean distance

Fusion

filter: downweight scores for homologous pairs

raw score: constant (0.99)

Experimental interactions

• two-hybrid, TAP, annotated complexes, …
• topology-based analysis: who with whom, how many other partners?

raw score: various (usually ‘uniqueness’ of interaction).

Co-expression

• download all microarray datasets for a given species
• data normalization (spatial correction)

raw score: pairwise Pearson correlation coefficient
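As a rough illustration of the co-expression raw score, the sketch below computes a Pearson correlation coefficient between two expression profiles. It assumes the profiles are already normalized and of equal length; the function name `pearson` and the sample values are hypothetical, and the slide does not say how STRING itself computes or post-processes the coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values of two genes across five arrays.
gene_a = [1.0, 2.1, 2.9, 4.2, 5.0]
gene_b = [0.9, 2.0, 3.2, 3.9, 5.1]
print(f"raw co-expression score: {pearson(gene_a, gene_b):.3f}")
```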

Textmining

• download all PubMed abstracts
• identify proteins in the abstracts
• search for co-mentioned pairs

raw score: log-odds score
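The slide does not give the log-odds formula, so the following is only one plausible reading: the logarithm of the observed co-mention count over the count expected if the two proteins were mentioned independently. The function name `comention_log_odds` and the toy data are hypothetical, and protein identification in the abstracts is assumed to have happened already.

```python
from collections import Counter
from itertools import combinations
from math import log


def comention_log_odds(abstracts):
    """Score co-mentioned protein pairs by log(observed / expected).

    `abstracts` is a list of collections of protein names, one per abstract.
    The expected count assumes the two proteins are mentioned independently.
    """
    n_abstracts = len(abstracts)
    single = Counter()
    pair = Counter()
    for proteins in abstracts:
        names = sorted(set(proteins))
        single.update(names)
        pair.update(combinations(names, 2))

    scores = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n_abstracts
        scores[(a, b)] = log(observed / expected)
    return scores


# Toy corpus: protA and protB always appear together, protC appears alone.
corpus = [{"protA", "protB"}, {"protA", "protB"}, {"protC"}, {"protC"}]
for pair_names, score in comention_log_odds(corpus).items():
    print(pair_names, round(score, 3))
```

Pairs that are never co-mentioned get no score at all in this sketch; a positive value means the pair is co-mentioned more often than independence would predict.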

Page 8:

User-Experience: Aiming to be Visual and Intuitive

Page 9:

• 1’000 visits / day
• 800 users / day
• 9’000 pageviews / day
• > 10’000 DB-queries / day

Page 10:

Citations

2000 NAR Snel et al.: 80 citations

2003 NAR von Mering et al.: 215 citations

2005 NAR von Mering et al.: 183 citations

2007 NAR von Mering et al.: 189 citations

2009 NAR Jensen et al.: 47 citations

total: 714 citations

Page 11:

Cross-links

SMART: protein domain information

GENECARDS: info and products on human genes

SWISS-MODEL-REPOSITORY: homology models

CYTOSCAPE: access via plug-in architecture

SWISSPROT / UNIPROT: expert protein annotation

Page 12:

Cross-link example

launch SwissModel

Page 13:

Reciprocal View

popup: launch STRING

Page 14:

Example #1

A missing chaperone for Cytochrome C oxidase

Question: who inserts the copper atom into CcO?

Page 15:

Example #1

The missing chaperone for Cytochrome C oxidase

Initial observation:

Page 16:

Example #1

The missing chaperone for Cytochrome C oxidase

• gene expressed
• structure solved
• it binds copper!
• likely function: copper delivery

Page 17:

Example #2

Simplify discovery in genome-wide association screens?

Christian von Mering – UZH MolBio – SIB

Page 18:

In-House Use of STRING

a) download data in relational database schema

b) download data as compact flat-files

c) in-house installation of webserver

d) cross-link to server (version controlled, to network, protein, link, ...)

e) PSI-MI export

f) [ SOAP / webservices ]
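For the flat-file route, a small loader might look like the sketch below. The slides do not describe the file layout, so everything about it is an assumption here: a space-separated table with the columns `protein1`, `protein2` and `combined_score`, with the score stored as an integer scaled by 1000. The identifiers and values in the sample are made up, and `load_links` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample in the assumed layout (not real STRING data).
SAMPLE = """protein1 protein2 combined_score
1234.protA 1234.protB 856
1234.protA 1234.protC 310
1234.protB 1234.protC 720
"""


def load_links(handle, min_score=0.7):
    """Read links from a space-separated table and keep confident ones.

    Scores are assumed to be integers scaled by 1000 and are converted
    back to the 0..1 range before filtering.
    """
    reader = csv.DictReader(handle, delimiter=" ")
    links = []
    for row in reader:
        score = int(row["combined_score"]) / 1000.0
        if score >= min_score:
            links.append((row["protein1"], row["protein2"], score))
    return links


links = load_links(io.StringIO(SAMPLE))
for protein1, protein2, score in links:
    print(protein1, protein2, score)
```

For a real download, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, after checking the actual column names and score scaling in the file header.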

Page 19:

Version 9.0 – exceeding 1000 genomes

Core organisms:

• include all model organisms (annotated knowledge)

• non-redundant, each genus is covered

• include organisms with functional genomics data

Irrelevant Organisms

[future category]

Page 20:

More details & new features

Page 21:

“Payload Display” - Your Own STRING Server

=> “branding” STRING via remote-control: a call-back API


Page 22:

Acknowledgements

The STRING team:

Samuel Chaffron, Manuel Weiss, Michael Kuhn, Lars Juhl Jensen

Sean Hooper, Berend Snel, Martijn Huynen, Peer Bork

The STRING institutions:

SIB – Swiss Institute of Bioinformatics

University of Zurich

TU-Dresden, University of Copenhagen

European Molecular Biology Laboratory

Page 23:

Page 24:

“MySTRING”

users can register / login

using OpenID or similar for authentication

persistency of search results (“history”)

store lists / items of interest (“bag of genes”)

users can customize the interface

generate revenue (?)

Page 25:

Feature #2 (Finding Relevant Texts)

Page 26:

Example #2

The missing enzymes for uric acid degradation

Question: why can’t humans degrade uric acid?

Page 27:

Example #2

The missing enzymes for uric acid degradation

?

?

Page 28:

Example #2

The missing enzymes for uric acid degradation

initial observation:

Page 29:

Example #2

The missing enzymes for uric acid degradation

• genes cloned, expressed
• enzymatic activity demonstrated
• candidate short-term therapeutics!