Nikolaj Blom, Center for Biological Sequence Analysis, BioCentrum-DTU

Center for Biologisk Sekvensanalyse. Nikolaj Blom, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, nikob@cbs.dtu.dk. ”Resources of Biomolecular Data: Sequences, Structures and Functionality”, PhD course #27803


DESCRIPTION

”Resources of Biomolecular Data: Sequences, Structures and Functionality”, PhD course #27803. Nikolaj Blom, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, nikob@cbs.dtu.dk. Outline: Magnitudes and Scales; Resources: Data Sources & Tools.

TRANSCRIPT

Page 1: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Nikolaj Blom
Center for Biological Sequence Analysis
BioCentrum-DTU
Technical University of Denmark
nikob@cbs.dtu.dk

”Resources of Biomolecular Data: Sequences, Structures and Functionality”

PhD course #27803

Page 2: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Outline

Magnitudes and Scales
Resources: Data Sources & Tools
• Primary DNA sources
• Sequence Repositories
• Structure Repositories
• Functional Categorization
• Integration of Databases
• The Human Genome
• Genome Browsers
• Prediction Tools
• Evaluation of Prediction Servers
Starting points
• Link collections

Page 3: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Resources: Sources & Tools

There are A LOT OF biomolecular databases/sources
A LOT OF overlap of information/redundancy
A LOT OF TOOLS
Personal picks/preferences
• User-friendliness
• Update intervals
• Curation efforts / error correction
• Linkage to other DBs

Page 4: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Faster than Moore’s law...

Page 5: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Human Genome Published

HUGO: Nature, 15 Feb 2001
Celera: Science, 16 Feb 2001

Page 6: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Magnitudes and Scales

Human genome: 3,200,000,000 bp
• Single basepair to full genome is 9 orders of magnitude
Genome = Football field: ~3 billion leaves of grass
Single base A T G C (or SNP) = 1 leaf of grass
Genome browsing
• Zooming from whole stadium to single leaf
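
The "9 orders of magnitude" figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the tenfold zoom factor are assumptions made for the example, and only the genome size comes from the slide.

```python
import math

GENOME_BP = 3_200_000_000  # human genome size quoted on the slide


def orders_of_magnitude(n: int) -> float:
    """Return log10(n): the number of powers of ten between 1 and n."""
    return math.log10(n)


def zoom_steps(total_bp: int, factor: int = 10) -> int:
    """Count zoom-in steps of `factor` needed to go from total_bp to a 1 bp window."""
    steps = 0
    window = total_bp
    while window > 1:
        window = math.ceil(window / factor)  # shrink the visible window
        steps += 1
    return steps


if __name__ == "__main__":
    print(f"log10(genome size) = {orders_of_magnitude(GENOME_BP):.2f}")
    print(f"tenfold zoom steps to a single base: {zoom_steps(GENOME_BP)}")
```

The logarithm comes out between 9 and 10, which matches the slide's rounded figure for going from a single base to the whole genome.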

Page 7: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

How we got the sequence

Sanger chain termination method

Page 8: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Primary DNA sources

Trace files repositories
Single read: 500-1000 bp (~golf ball size / jigsaw puzzle)
Variable quality
• WashU-Merck Human EST Project / Trace files
• ”Base-calling” non-trivial
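
One common way to express the variable quality of base calls is a Phred-style quality score, Q = -10 log10(P), where P is the estimated probability that the call is wrong. The slides do not name a scoring scheme, so treating the quality values as Phred scores is an assumption here, and the function names are made up for the example.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-style quality score to an error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-style score."""
    return -10 * math.log10(p)


def expected_errors(qualities) -> float:
    """Expected number of miscalled bases in a read, given per-base scores."""
    return sum(phred_to_error_probability(q) for q in qualities)


if __name__ == "__main__":
    read_qualities = [30, 30, 20, 10]  # illustrative values, not real trace data
    print(expected_errors(read_qualities))
```

Under this convention a score of 20 corresponds to a 1 in 100 chance of a wrong base, which gives a feel for how much the ends of a 500-1000 bp read can degrade.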

Page 9: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Assembly is Non-trivial!
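
To give a feel for the jigsaw-puzzle problem, here is a toy sketch that joins two reads on their longest exact suffix/prefix overlap. It is not how production assemblers work: it ignores sequencing errors, reverse complements and repeats, and the function names and the minimum overlap of 3 are assumptions chosen for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):  # try longest first
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


if __name__ == "__main__":
    print(merge_reads("ATGCGT", "CGTTAA"))
```

Even in this toy, a repeated motif would let one read overlap several others equally well, which is one reason real assembly is hard.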

Page 10: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Sequence repositories - GenBank et al.

GenBank / EMBL / DDBJ
• Highly redundant (many versions of same gene)
• Cross-updated daily
• Version history is recorded
  • Previous sequence records can be retrieved
• Contigs/HTGS (100-200 kb) finishing at different stages
  • Draft → Finished
• Includes genomic DNA, cDNA, ESTs, translated peptides
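
Version history is usually visible in the identifier itself, written as accession.version. The sketch below splits such identifiers and keeps the highest version seen per accession. The identifiers in the example are made up, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class VersionedAccession(NamedTuple):
    accession: str
    version: int


def parse_accession(identifier: str) -> VersionedAccession:
    """Split 'ACCESSION.VERSION' into its two parts."""
    accession, sep, version = identifier.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return VersionedAccession(accession, int(version))


def latest_versions(identifiers: Iterable[str]) -> Dict[str, int]:
    """Map each accession to the highest version number seen."""
    latest: Dict[str, int] = {}
    for identifier in identifiers:
        acc, ver = parse_accession(identifier)
        latest[acc] = max(ver, latest.get(acc, 0))
    return latest


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(latest_versions(["AB123456.1", "AB123456.3", "XY000001.2"]))
```

Recording the version alongside the accession in notes and scripts makes it possible to retrieve exactly the sequence record that an analysis was based on.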

Page 11: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Non-redundant and Curated databases

Non-redundant
• Manual or automatic curation
• DNA
  • RefSeq (NCBI; semi-automated)
  • Ensembl gene index (automated)
• Protein
  • RefSeq (NCBI; semi-automated)
  • TrEMBL (EMBL; automated)

Page 12: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Curated database: UniProt/SwissProt

SIB - Swiss Institute of Bioinformatics
Protein Knowledgebase / Sequence Database
• Highly curated
• Experimental evidence evaluated (e.g. modifications)
• All 80,000 entries checked by Amos Bairoch himself ;-)
ExPASy - Expert Protein Analysis System
• Proteomics tools: links + local servers

Page 13: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Structure databases / Protein Data Bank (PDB)

X-ray, NMR biomolecular structures
Protein Data Bank (PDB)
>22,000 structures (April 2003)
http://www.rcsb.org/pdb/

Page 14: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Functional Categorization

Gene Ontology (GO)
• Hierarchical
• Controlled vocabulary

Page 15: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Functional Categorization

Gene Ontology (GO) http://www.geneontology.org/

• Molecular Function - the tasks performed by individual gene products; examples are transcription factor and DNA helicase

• Biological Process - broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

• Cellular Component - subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex
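
The hierarchical, controlled-vocabulary idea can be sketched as a small parent map that is walked upwards to collect all broader terms. The edges below are a toy hierarchy invented for illustration, not the actual GO structure, and `ancestors` is a hypothetical helper.

```python
# Toy parent map: each term lists its broader (parent) terms.
# These edges are invented for illustration and are NOT taken from GO itself.
PARENTS = {
    "purine metabolism": ["metabolism"],
    "metabolism": ["biological process"],
    "mitosis": ["cell cycle"],
    "cell cycle": ["biological process"],
    "biological process": [],
}


def ancestors(term: str, parents=PARENTS) -> set:
    """Collect every broader term reachable from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("purine metabolism")))
```

Because each term maps to a list of parents, the same walk also works when a term has more than one broader term.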

Page 16: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Integration of databases - Webs of web-sites

Links, links, links...
SRS = Sequence Retrieval System
• Powerful, complex query language
• http://srs.ebi.ac.uk/
BioDAS – Distributed Annotation System

Page 17: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

For ’my gene’, how do I:

Get an overview of the sequence information known? (GeneCards)
Examine the ’Genome Neighbourhood’? (Genome Browsers)
Predict protein post-translational modifications (PTMs)? (Prediction servers)
• (Evaluate the value of predicted features)

Page 18: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

GeneCards

http://nciarray.nci.nih.gov/cards/

Page 19: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

GeneCards-II

Page 20: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

GeneCards-III

Page 21: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

GeneCards-IV

Page 22: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

GeneCards-V

Page 23: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Genetic/Medical Information

OMIM, Online Mendelian Inheritance in Man (NCBI)
• The OMIM database is a catalog of human genes and genetic disorders
• >13,000 entries (April 2002)
• Examples: cystic fibrosis, prions, amyloid precursor protein
• Condensed, highly curated descriptions of genetics/disease/animal models/references

Page 24: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

OMIM-I (http://www3.ncbi.nlm.nih.gov/Omim/)

Page 25: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

OMIM-II

Page 26: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

OMIM-III

Page 27: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

For ’my gene’, how do I:

Get an overview of the sequence information known? (GeneCards)
Examine the ’Genome Neighbourhood’? (Genome Browsers)
Predict protein post-translational modifications (PTMs)? (Prediction servers)
• (Evaluate the value of predicted features)

Page 28: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Genome Browsing

Three public
• Open access
• Use same genome build/assembly
  • NCBI (U.S.)
  • UCSC (Santa Cruz, U.S.)
  • EnsEmbl (EBI, EU)
One private
• Restricted, commercial
• Academic, free usage: 1 Mbase/week
• Proprietary assembly
  • Celera Genomics (U.S.)

Page 29: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Celera Human/Mouse Genomes

Page 30: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Genome Browsers - Portals to the Genomic World

NCBI – National Center for Biotechnology Information (U.S.)
• http://www.ncbi.nlm.nih.gov/Genomes/index.html
UCSC – Univ. California – Santa Cruz (U.S.)
• http://genome.ucsc.edu/
EnsEmbl – European Molecular Biology Laboratory (E.U.)
• http://www.ensembl.org/
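
Genome browsers are typically navigated by region strings of the form chromosome:start-end. The sketch below parses such a string and widens the window around its midpoint, mimicking a zoom-out. The exact string format accepted differs between browsers, so the format, the example coordinates and the helper names are all assumptions for illustration.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_region(text: str) -> Region:
    """Parse 'chrom:start-end'; thousands separators (commas) are tolerated."""
    chrom, _, span = text.strip().replace(",", "").partition(":")
    start_text, _, end_text = span.partition("-")
    start, end = int(start_text), int(end_text)  # raises ValueError if malformed
    if not chrom or start < 1 or end < start:
        raise ValueError(f"invalid region: {text!r}")
    return Region(chrom, start, end)


def zoom_out(region: Region, factor: int = 10) -> Region:
    """Widen the region by `factor` around its midpoint, never going below base 1."""
    width = region.end - region.start + 1
    new_width = width * factor
    mid = (region.start + region.end) // 2
    start = max(1, mid - new_width // 2)
    # Note: the end is not clamped, since the chromosome length is not known here.
    return Region(region.chrom, start, start + new_width - 1)


if __name__ == "__main__":
    region = parse_region("chr17:7,500,000-7,600,000")  # arbitrary example window
    print(zoom_out(region))
```

Coordinates like these are only comparable between browsers that display the same genome build/assembly.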

Page 31: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

NCBI

Page 32: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

NCBI

Page 33: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

UCSC – Genome Browser

Page 34: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

UCSC – Genome Browser II

Page 35: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU


Page 36: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

EnsEmbl – Genome Browser

Page 37: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

EnsEmbl – Genome Browser

Page 38: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

EnsEmbl – Genome Browser

Page 39: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

EnsEmbl – Genome Browser

Page 40: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

EnsEmbl – Genome Browser

Page 41: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

EnsEmbl – Genome Browser

Page 42: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

For ’my gene’, how do I:

Get an overview of the sequence information known? (GeneCards)
Examine the ’Genome Neighbourhood’? (Genome Browsers)
Predict protein post-translational modifications (PTMs) or Gene Structure? (Prediction servers)
• ...and evaluate the reliability of prediction methods

Page 43: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

CBS Services/Toolbox

http://www.cbs.dtu.dk/services/

Page 44: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU


Page 45: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

NetPhos – a prediction server

http://www.cbs.dtu.dk/services/NetPhos/
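
NetPhos predicts phosphorylation at serine, threonine and tyrosine residues. The sketch below only lists those candidate residues and their sequence context in a protein. It is not the NetPhos method and makes no prediction; the helper names, the flank size and the example peptide are assumptions for illustration.

```python
def candidate_phosphosites(sequence: str):
    """Return (1-based position, residue) for every S, T or Y in the sequence."""
    residues = sequence.upper()
    return [(i + 1, aa) for i, aa in enumerate(residues) if aa in "STY"]


def context(sequence: str, position: int, flank: int = 4) -> str:
    """Return the residues within `flank` positions of a 1-based position."""
    start = max(0, position - 1 - flank)
    end = min(len(sequence), position + flank)
    return sequence[start:end]


if __name__ == "__main__":
    peptide = "MSTAYK"  # made-up peptide, for illustration only
    for pos, aa in candidate_phosphosites(peptide):
        print(pos, aa, context(peptide, pos, flank=1))
```

A prediction server scores each such candidate from its surrounding residues; a listing like this is only the set of positions that could receive a score.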

Page 46: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

NetPhos – a prediction server

Page 47: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Evaluating Prediction Servers

Performance on independent/cross-validated data presented?
Published in peer-reviewed journal?
Cited by others?
• Science Citation Index
Linked to from credible web sites?
• Google Page-rank
• ”link:URL” search
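
When a server does report performance on independent data, the numbers usually reduce to counts of true/false positives and negatives. The sketch below computes sensitivity, specificity and the Matthews correlation coefficient from such counts. The function name and the example counts are made up; the slides do not say which measures a given server reports.

```python
import math


def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Summarise a confusion matrix from an independent test set."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # fraction of real sites found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # fraction of non-sites rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(evaluate(tp=40, fp=10, tn=45, fn=5))
```

The correlation coefficient is useful when positives and negatives are unbalanced, since a predictor that labels everything negative can still show high specificity.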

Page 48: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Evaluating Prediction Servers

Page 49: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

2can Bioinformatics Education

At EBI – European Bioinformatics Institute
http://www.ebi.ac.uk/2can/index.html
Tutorials, resource links, etc.

Page 50: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Starting Points

General Bioinformatics
• NCBI, National Center for Biotechnology Information, U.S.
• EBI, European Bioinformatics Institute
Prediction Tools
• CBS, DK
• Expasy (Protein analysis), Switzerland

Page 51: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Dynamic Resources

Pros
• Includes most recent developments
• Updated regularly
• User interface improves (usually)
Cons
• Difficult to keep pace
• Tutorials and lectures hard to recycle ;-(
• Difficult to use at irregular intervals

Page 52: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Genome Browsers - Portals to the Genomic World

Three main entry points:
• NCBI, UCSC, EnsEmbl
• Essentially contain same information
• High degree of linking to secondary databases
• Advisable to become familiar with only one genome browser
• Learn to navigate and make queries
GeneCards and OMIM
• Well suited for getting a quick overview of a gene of interest

Page 53: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU

Prediction Servers

Evaluate scientific ’soundness’
• Look for indications of quality (citations, etc.)
Remember that prediction servers provide...well, predictions!

Page 54: Nikolaj Blom Center for Biological Sequence Analysis BioCentrum-DTU


The End