Annotation. Traditional genome annotation BLAST Similarities

Upload: emerald-carmella-harris

Post on 18-Jan-2016


Page 1: Annotation. Traditional genome annotation BLAST Similarities

Annotation

Traditional genome annotation

BLAST Similarities

Protein Families

Gene Ontology

Ontology
• A “hierarchy” of functions
• Does not need to be linear
  – Directed Acyclic Graph

Controlled Vocabulary
• Decides which words or phrases to use

GO

Gene Ontology
• A eukaryotic focus
  – Drosophila
  – Mus
  – Saccharomyces
  – Homo

GO

Cellular component
• The parts of a cell

Molecular function
• e.g. ligand binding

Biological process
• What things do

GO Terms

[GO ID, function], e.g.:

GO:0004743
Ontology: molecular function
Name: pyruvate kinase activity

Mainly assigned by BLAST, HMMER, etc.
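
A GO term like the one on this slide can be modelled as a small record. The sketch below is only an illustration, not an official GO API: the class name `GOTerm` and its field names are hypothetical, and the ID check assumes the usual identifier shape of `GO:` followed by seven digits.

```python
import re
from dataclasses import dataclass

# Assumption: GO identifiers are "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")

# The three ontologies named on the earlier GO slide.
ONTOLOGIES = {"cellular component", "molecular function", "biological process"}


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record for one GO term: ID, ontology, and name."""
    go_id: str
    ontology: str
    name: str

    def __post_init__(self) -> None:
        # Reject IDs and ontology names that fall outside the controlled vocabulary.
        if not GO_ID_PATTERN.fullmatch(self.go_id):
            raise ValueError(f"not a GO identifier: {self.go_id!r}")
        if self.ontology not in ONTOLOGIES:
            raise ValueError(f"unknown ontology: {self.ontology!r}")


# The example term from the slide.
pyruvate_kinase = GOTerm("GO:0004743", "molecular function", "pyruvate kinase activity")
```

Rejecting anything outside the fixed set of ontology names is a small-scale version of what a controlled vocabulary does: only agreed words and phrases are accepted.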

Directed Acyclic Graph

Molecular function

Catalytic activity

Transferase activity

Transferase activity, transferring phosphorus-containing groups

Kinase activity; Phosphotransferase activity, alcohol group as acceptor

Pyruvate kinase activity
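
The path on this slide can be written down as a small graph to show why the ontology is a directed acyclic graph and not a strict tree. This is a sketch under assumptions: the edge list is transcribed by hand from the slide, the truncated term is spelled out as "transferring phosphorus-containing groups", both terms that share a line are treated as parents of pyruvate kinase activity, and the names `PARENTS` and `ancestors` are hypothetical.

```python
from typing import Dict, List, Set

# Child -> parents ("is a" edges), transcribed by hand from the slide.
# Assumption: "kinase activity" and "phosphotransferase activity, alcohol group
# as acceptor" are both parents of "pyruvate kinase activity".
PHOSPHORUS = "transferase activity, transferring phosphorus-containing groups"
ALCOHOL = "phosphotransferase activity, alcohol group as acceptor"

PARENTS: Dict[str, List[str]] = {
    "catalytic activity": ["molecular function"],
    "transferase activity": ["catalytic activity"],
    PHOSPHORUS: ["transferase activity"],
    "kinase activity": [PHOSPHORUS],
    ALCOHOL: [PHOSPHORUS],
    "pyruvate kinase activity": ["kinase activity", ALCOHOL],
}


def ancestors(term: str) -> Set[str]:
    """Return every term reachable by following parent edges upward."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # a shared ancestor is visited only once
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen
```

Because a term can have more than one parent, a gene annotated with the most specific term is implicitly annotated with every ancestor as well, which is why the traversal keeps a `seen` set.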

Problems

• Annotation by committee
• Eukaryotic focus
  – Some efforts to counter that: Owen White, Ariane Toussaint
• Not very deep
• Strict controlled vocabulary

Alternatives

Basic biology

lacI lacZ lacY lacA

Jacob & Monod, 1961

Different types of clustering

[Figure: three comparisons, each labeled < 80%]

Purine metabolism

Heme / chlorophyll metabolism is conserved

They are both porphyrins

Occurrence of clustering in different genomes

[Figure: one group per phylum (Actinobacteria, Aquificae, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Spirochaetes, Thermotogae, Proteobacteria) plus the Average. Left axis: fraction of genes in clusters (0 to 1). Right axis: number of genomes (0 to 120). Series: clusters of genes w/ maximum 80% identity; genes in subsystems in clusters; total number of genomes in group.]

The Subsystems Approach to Annotation

Subsystem
• A generalization of “pathway”
• A collection of functional roles jointly involved in a biological process or complex

Functional Role
• The abstract biological function of a gene product
• Atomic, or user-defined
• Examples:
  – 6-phosphofructokinase (EC 2.7.1.11)
  – LSU ribosomal protein L31p
  – Streptococcal virulence factors
• Should not contain “putative”, “thermostable”, etc.

Populated subsystem
• The complete spreadsheet of functions and roles
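
The naming rule for functional roles can be checked mechanically. The sketch below is illustrative only: the slide names just “putative” and “thermostable” and then says “etc”, so the `DISALLOWED_QUALIFIERS` tuple is an assumed, incomplete list, and `is_valid_role_name` is a hypothetical helper, not part of any annotation tool.

```python
import re

# Assumption: only the two qualifiers named on the slide; the real list is longer ("etc").
DISALLOWED_QUALIFIERS = ("putative", "thermostable")


def is_valid_role_name(name: str) -> bool:
    """Return False if the role name contains a disallowed qualifier as a whole word."""
    lowered = name.lower()
    return not any(
        re.search(rf"\b{re.escape(qualifier)}\b", lowered)
        for qualifier in DISALLOWED_QUALIFIERS
    )
```

For example, `is_valid_role_name("6-phosphofructokinase (EC 2.7.1.11)")` passes, while a name such as "Putative 6-phosphofructokinase" is rejected because it describes confidence in the call and not the abstract function.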

Histidine Degradation

• Conversion of histidine to glutamate
• Functional roles defined in table
• Inclusion in subsystem is only by functional role
• Controlled vocabulary …

Subsystem: Histidine Degradation

#  Abbrev.  Functional role
1  HutH     Histidine ammonia-lyase (EC 4.3.1.3)
2  HutU     Urocanate hydratase (EC 4.2.1.49)
3  HutI     Imidazolonepropionase (EC 3.5.2.7)
4  GluF     Glutamate formiminotransferase (EC 2.1.2.5)
5  HutG     Formiminoglutamase (EC 3.5.3.8)
6  NfoD     N-formylglutamate deformylase (EC 3.5.1.68)
7  ForI     Formiminoglutamic iminohydrolase (EC 3.5.3.13)
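
Each functional role above is a name with an optional EC number in parentheses, so it can be split into its parts with a short parser. This is a sketch, assuming fully specified four-field EC numbers as they appear on the slide; `FunctionalRole` and `parse_role` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the EC number, when present, is the final "(EC a.b.c.d)" group
# with four numeric fields, as in the roles listed on the slide.
ROLE_PATTERN = re.compile(r"^(?P<name>.+?)\s*\(EC (?P<ec>\d+\.\d+\.\d+\.\d+)\)$")


class FunctionalRole(NamedTuple):
    name: str
    ec_number: Optional[str]


def parse_role(text: str) -> FunctionalRole:
    """Split 'Name (EC x.x.x.x)' into name and EC number; EC is None if absent."""
    cleaned = text.strip()
    match = ROLE_PATTERN.match(cleaned)
    if match:
        return FunctionalRole(match.group("name"), match.group("ec"))
    return FunctionalRole(cleaned, None)
```

Roles without an EC number, such as "LSU ribosomal protein L31p", come back unchanged with `ec_number` set to `None`.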

Subsystem Spreadsheet

• Column headers taken from table of functional roles
• Rows are selected genomes or organisms
• Cells are populated with specific, annotated genes
• Functional variants defined by the annotated roles
• Variant code -1 indicates subsystem is not functional
• Clustering shown by color

Organism                      Variant  HutH        HutU        HutI        GluF        HutG    NfoD    ForI
Bacteroides thetaiotaomicron  1        Q8A4B3      Q8A4A9      Q8A4B1      Q8A4B0      -       -       -
Desulfotalea psychrophila     1        gi51246205  gi51246204  gi51246203  gi51246202  -       -       -
Halobacterium sp.             2        Q9HQD5      Q9HQD8      Q9HQD6      -           Q9HQD7  -       -
Deinococcus radiodurans       2        Q9RZ06      Q9RZ02      Q9RZ05      -           Q9RZ04  -       -
Bacillus subtilis             2        P10944      P25503      P42084      -           P42068  -       -
Caulobacter crescentus        3        P58082      Q9A9M1      P58079      -           -       Q9A9M0  Q9A9L9
Pseudomonas putida            3        Q88CZ7      Q88CZ6      Q88CZ9      -           -       Q88D00  Q88CZ3
Xanthomonas campestris        3        Q8PAA7      P58988      Q8PAA6      -           -       Q8PAA8  Q8PAA5
Listeria monocytogenes        -1       -           -           -           -           -       -       -
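
The idea that a variant is defined by which roles a genome has can be sketched as a set comparison. The mapping from variant codes to role sets below is an assumption read off the spreadsheet (three shared roles plus GluF, HutG, or NfoD and ForI); it is not taken from any published variant definition, and `variant_code` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Iterable

# Roles present in every functional variant on the slide.
CORE_ROLES: FrozenSet[str] = frozenset({"HutH", "HutU", "HutI"})

# Assumption: role sets per variant code, inferred from the spreadsheet rows.
VARIANTS: Dict[int, FrozenSet[str]] = {
    1: CORE_ROLES | {"GluF"},
    2: CORE_ROLES | {"HutG"},
    3: CORE_ROLES | {"NfoD", "ForI"},
}


def variant_code(roles_present: Iterable[str]) -> int:
    """Return the first variant whose required roles are all present, else -1."""
    present = frozenset(roles_present)
    for code, required in VARIANTS.items():
        if required <= present:  # subset test: every required role was annotated
            return code
    return -1  # subsystem is not functional in this genome
```

A genome annotated with HutH, HutU, HutI, and HutG gets variant 2, while a genome with none of the roles, like the Listeria monocytogenes row, gets -1.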

“The Populated Subsystem”

Subsystems developed based on

• Wet lab
• Chromosomal context
• Metabolic context
• Phylogenetic context
• Microarray data
• Proteomics data

About 2,500 Subsystems

Three level “hierarchy”

• Amino Acids and Derivatives
  – Alanine, serine, and glycine
    • Serine Biosynthesis

• Amino Acids and Derivatives
  – Lysine, threonine, methionine, and cysteine
    • Methionine Biosynthesis

Make your own subsystems!
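
The three-level hierarchy maps naturally onto a nested dictionary: category, then subcategory, then a list of subsystems. The sketch below holds only the two examples from the slide; `HIERARCHY` and `classify` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Level 1 (category) -> level 2 (subcategory) -> level 3 (subsystems).
# Only the two examples shown on the slide are included.
HIERARCHY: Dict[str, Dict[str, List[str]]] = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def classify(subsystem: str) -> Optional[Tuple[str, str]]:
    """Return (category, subcategory) for a subsystem, or None if it is not listed."""
    for category, subcategories in HIERARCHY.items():
        for subcategory, subsystems in subcategories.items():
            if subsystem in subsystems:
                return category, subcategory
    return None
```

Adding a user-defined subsystem then amounts to appending its name to the list under the appropriate subcategory.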

Growth in Subsystems Over Time

Classification                                      # Subsystems
Experimental Subsystems                             498
Clustering-based subsystems                         352
Carbohydrates                                       160
Cofactors, Vitamins, Prosthetic Groups, Pigments    123
Amino Acids and Derivatives                         96
Protein Metabolism                                  95
Virulence, Disease, Defense                         70
Miscellaneous                                       70
RNA Metabolism                                      65
Membrane Transport                                  65
Respiration                                         62
Cell Wall and Capsule                               62
Fatty Acids, Lipids, and Isoprenoids                60
Regulation and Cell signaling                       51
Virulence                                           49
Stress Response                                     43
DNA Metabolism                                      41
Aromatic Compounds                                  38
Phages                                              36
Secondary Metabolism                                34
Iron acquisition and metabolism                     31
Nucleosides and Nucleotides                         24
Sulfur Metabolism                                   20
Dormancy and Sporulation                            17
Plant-prokaryote                                    12
Nitrogen Metabolism                                 12
Motility and Chemotaxis                             11
Plant cell walls and outer surfaces                 10
Phages                                              10
Cell Division and Cell Cycle                        10
Photosynthesis                                      9
Metabolite damage                                   8
Phosphorus Metabolism                               7
Potassium metabolism                                4
Transcriptional regulation                          2
Plasmids                                            2
Central metabolism                                  2
Autotrophy                                          2
Arabinose Transport                                 1

RAST usage grows...

RAST coverage...

RASTtk

• RAST2.0
• Customizable choice of pipelines to run
• Same behind the scenes infrastructure