Automated Annotation of Microbial Genomes, Opportunities and Pitfalls. Margie Romine, Pacific Northwest National Laboratory, Richland, Washington


Post on 31-Dec-2015


Page 1: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Automated Annotation of Microbial Genomes,

Opportunities and Pitfalls

Margie Romine

Pacific Northwest National Laboratory

Richland, Washington

Page 2: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Shewanella oneidensis MR-1

• Breathes Mn & Fe and other metals thereby changing their solubility

• Also reduces radionuclides and hence impacts their mobility at contaminated sites

• Genome sequenced by The Institute for Genomic Research (TIGR) in 2002 (funded by DOE-OBER)

• Can we now better determine how this organism interacts with metals and radionuclides?

Page 3: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Shewanella spp. Inhabit Many Niches

• Energy rich: fermentation is occurring and energy is continuously being deposited via sedimentation

• Rapidly changing redox conditions/dominant electron acceptors

• Microbial partners are present to remove the acetate produced via anaerobic respiration

2 more Shewanella genomes were sequenced by DOE’s Joint Genome Institute and 14 more are under way!

Page 4: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Bacterial Genome Sequencing Explodes

• 341 completed genomes, 976 ongoing

• Partial genome sequences released in just days now by JGI!

• How do we use sequence information to understand how all these organisms function in the environment?

• Annotation is the key, but is now largely automated and hence of lower quality

Page 5: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

What is Annotation?

Locate genes

AGCTTAACTGGGATACGACGACCAGTAGACAGGTRTACGATGAGATATATAT

Translate to proteins

MASDLKKIYTRPRPDSAWQECVAALFDGHSKDKLACNDDL

Gather evidence of function

Assign putative functions
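The first two annotation steps (locate genes, translate to proteins) can be illustrated with a minimal sketch. This is not the method used by any particular annotation pipeline: it only scans the forward strand for ATG-to-stop open reading frames and translates them with the standard genetic code. The function names `translate` and `find_orfs`, the `min_codons` parameter, and the example sequence are all assumptions made for illustration. Real gene finders also handle the reverse strand, alternative start codons, and statistical models of coding sequence.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(dna):
    """Translate codon by codon until a stop codon or the end of the sequence.

    Codons containing ambiguity codes (such as R) are reported as 'X'.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3].upper(), "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def find_orfs(dna, min_codons=3):
    """Return (start, end, protein) for ATG...stop ORFs in the three forward frames.

    start is 0-based, end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, translate(dna[start:i])))
                start = None
    return orfs


# Hypothetical toy sequence, not taken from the MR-1 genome.
print(find_orfs("CCATGGCTAGCGATTAAGG"))
```

A wrong start codon choice at this stage changes the predicted N-terminus of the protein, which matters for any downstream prediction that depends on the leader sequence.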

Page 6: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Annotation Drives Post-genomic Research

Annotation: gene predictions, protein predictions, function predictions

Methodologies: DNA microarrays, proteomics, targeted gene knock-outs, ChIP-chip

Data: mRNA expression, DNA binding sites, protein expression

Interpretation: metabolic modeling, hypothesis

Page 7: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Annotation with Gnare/Puma2

• Developed at Argonne National Laboratory by Natalia Maltsev, Mark D’Souza, Elizabeth Glass, Dina Sulakhe, Mustafa Syed, Pavan Anumula

• http://compbio.mcs.anl.gov/puma2/cgi-bin/index.cgi

• Gnare – Private genome sequences

• Puma2 – Public genome sequences

Page 8: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Types of Functional Descriptors

• Hypothetical protein

• Conserved hypothetical protein

• Conserved domain protein

• Function associated protein

• Class specific enzyme

• Specific function predicted

• Function validated

Page 9: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Checking Functions Where No Domain Hit Occurs

Go to Puma page for homolog

type IV secretion outer membrane protein, PilW?

Page 10: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Shewanella oneidensis MR-1

MKNCQKG

Domain identified

Align proteins

Page 11: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Clues in InterPro Domain Descriptor

This is a family of hypothetical proteins. A number of the sequence records state they are transmembrane proteins or putative permeases. It is not clear what source suggested that these proteins might be permeases and this information should be treated with caution.

2.A.86 The Autoinducer-2 Exporter (AI-2E) Family

The AI-2E family (UPF0118) is a large family of prokaryotic proteins derived from a variety of bacteria and archaea. Those examined are about 350 residues in length, and the couple that have been examined exhibit 7 putative transmembrane α-helical spanners (TMSs). E. coli, B. subtilis and several other prokaryotes have multiple paralogues encoded within their genomes. Herzberg et al. (2006) have presented strong evidence for a role of an AI-2E family homologue, YdgG (renamed TqsA), as an exporter of the E. coli autoinducer-2 (AI-2) (Camilli and Bassler, 2006; Chen et al., 2002). AI-2 is a proposed signalling molecule for interspecies communication in bacteria. It is a furanosyl borate diester (Chen et al., 2002).

autoinducer-2 transport protein, TqsA

Page 12: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Using Genome Context to Predict Function

No functional clues

Page 13: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Clusters with N-acetylglucosamine catabolic enzymes

Missing enzyme

Hypothesis experimentally validated!
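The genome-context reasoning on these slides (a gene with no functional clues sits among genes of a known pathway, which suggests it may supply the missing enzyme) can be sketched as a neighborhood tally. This is only an illustration under stated assumptions: the function name `neighborhood_pathways`, the `window` parameter, the locus tags, and the pathway labels are all hypothetical, and the input is assumed to be a list of genes already sorted in chromosomal order.

```python
from collections import Counter


def neighborhood_pathways(genes, query, window=3):
    """Tally the pathway labels of genes within `window` positions of `query`.

    genes: list of (locus_tag, pathway_or_None) tuples in chromosomal order.
    Genes without a pathway label (None) are skipped in the tally.
    """
    tags = [tag for tag, _ in genes]
    idx = tags.index(query)
    lo = max(0, idx - window)
    hi = min(len(genes), idx + window + 1)
    neighbors = [gene for i, gene in enumerate(genes[lo:hi], start=lo) if i != idx]
    return Counter(pathway for _, pathway in neighbors if pathway is not None)


# Hypothetical locus tags and labels, for illustration only.
genes = [
    ("g1", "transport"),
    ("g2", "GlcNAc catabolism"),
    ("g3", None),  # the unannotated query gene
    ("g4", "GlcNAc catabolism"),
    ("g5", "GlcNAc catabolism"),
    ("g6", "ribosome"),
]
print(neighborhood_pathways(genes, "g3", window=2).most_common(1))
```

A dominant pathway among the neighbors is a hypothesis, not a function assignment; as the slide notes, it still has to be validated experimentally.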

Page 14: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Precomputed text mining

General enzyme function

Page 15: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

sulfite dehydrogenase catalytic molybdopterin subunit, SorA

Relevant abstracts mentioning your query species (Shewanella oneidensis)

Page 16: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Mistake in InterPro Database found!

Domain hit does not match current annotation

Propagated in automated annotations!

Page 17: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

More Automation in Evidence Collecting Needed

Page 18: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Protein Location Linked to Function

Cell compartments shown: cytoplasm, inner membrane, periplasm, peptidoglycan, outer membrane, extracellular

Page 19: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Multiple Routes of Secretion

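Figure: signal peptide types and the peptidases that process them

Motif labels: ++ LXGC; +++ with AXA X cleavage site; ++ K/RRXFXK with AXA X cleavage site; G, F, E, P residues; GG

Peptidase labels: LepB, LepB, LspA, PilD, C39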

Page 20: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Bioinformatics Tools for Localization Prediction

Tools shown: LipoP, Psort, Sosui, TMHMM, Phobius, HMMTOP, SubLoc, Cello, Secretome, ProfTMB, Bomp, BBTM, PrediSi, SignalP, TatP

Other figure labels: IM TM, barrel, LspA, LepB

• Incorrect start sites have strong impact on predictions!

• Different tools have unique specialties

• No one tool provides good predictions for all proteins
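Because no single tool is reliable for every protein, one simple way to combine several predictors is to tally their calls and accept a location only when a clear majority agrees. The sketch below is an assumption for illustration, not a method from the talk: the function name `consensus_location`, the `min_agreement` threshold, and the tool names and outputs are hypothetical. Since the tools have different specialties, a scheme that weights each tool by what it is good at would likely be a better fit than a flat vote.

```python
from collections import Counter


def consensus_location(predictions, min_agreement=0.5):
    """Return (location, votes) from a dict of tool name -> predicted location.

    Tools that made no call are passed as None and ignored. The location is
    None when no prediction exceeds the `min_agreement` fraction of the votes.
    """
    votes = Counter(loc for loc in predictions.values() if loc is not None)
    total = sum(votes.values())
    if total == 0:
        return None, votes
    location, count = votes.most_common(1)[0]
    if count / total > min_agreement:
        return location, votes
    return None, votes


# Hypothetical tool names and outputs, for illustration only.
calls = {
    "tool_a": "periplasm",
    "tool_b": "periplasm",
    "tool_c": "outer membrane",
    "tool_d": None,
}
print(consensus_location(calls))
```

Proteins for which the function returns None are the ones worth a manual look, starting with whether the predicted start site is correct.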

Page 21: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Example: c type cytochromes

• Contain CXXCH motif for binding heme…so do some other proteins that are not c-type cytochromes

• All are secreted across the inner membrane and then assembled

• 60 proteins in MR-1 have CXXCH

• Only 43 have a leader peptide and are predicted to be c-type cytochromes
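The two-step filter described here (find the CXXCH motif, then keep only proteins with a leader peptide) can be sketched as follows. This is a minimal illustration, not the analysis actually run on MR-1: the names `count_cxxch` and `candidate_c_type_cytochromes` are assumptions, the leader-peptide flag is assumed to come from a separate signal peptide predictor, and the sequences are made up.

```python
import re

# Lookahead so that overlapping CXXCH occurrences are all counted.
HEME_MOTIF = re.compile(r"(?=C..CH)")


def count_cxxch(protein):
    """Count CXXCH heme-binding motifs in a protein sequence."""
    return len(HEME_MOTIF.findall(protein.upper()))


def candidate_c_type_cytochromes(proteins):
    """Return names of proteins that have both a CXXCH motif and a leader peptide.

    proteins: dict of name -> (sequence, has_leader_peptide), where the boolean
    is assumed to be supplied by an external signal peptide prediction.
    """
    return sorted(
        name
        for name, (sequence, has_leader) in proteins.items()
        if has_leader and count_cxxch(sequence) > 0
    )


# Made-up sequences and flags, for illustration only.
proteins = {
    "protA": ("MKKLLAACAACHGG", True),   # motif and leader peptide
    "protB": ("MSTCGGCHAA", False),      # motif but no leader peptide
    "protC": ("MKRLLAAGG", True),        # leader peptide but no motif
}
print(candidate_c_type_cytochromes(proteins))
```

The motif alone over-predicts, which is the point of the slide: the localization evidence is what narrows the candidates.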

Page 22: Automated Annotation of Microbial Genomes, Opportunities and Pitfalls

Future Needs in Annotation Automation

• Current methods of automated annotation will lead to propagation of annotation errors and burying of useful evidence

• But manual annotation cannot keep up with rate at which sequences are produced

• Additional automations are needed!

– Protein localization

– Specialty database mining (TCDB, MEROPS, etc.)

– Experimental data mining – appropriate databases don’t exist