Improving pan-genome annotation using whole genome multiple alignment

Raunak Shrestha, 27th October 2011. Source: Angiuoli SV, Hotopp JC, Salzberg SL, Tettelin H. Improving pan-genome annotation using whole genome multiple alignment. BMC Bioinformatics. 2011 Jun 30;12:272.


Page 1: Improving pan-genome annotation using whole genome multiple alignment

Raunak Shrestha

27th October 2011

Source: Angiuoli SV, Hotopp JC, Salzberg SL, Tettelin H. Improving pan-genome annotation using whole genome multiple alignment. BMC Bioinformatics. 2011 Jun 30;12:272.

Page 2: Improving pan-genome annotation using whole genome multiple alignment

Background

• Describing the genetic diversity of an organism is difficult on the basis of a single reference genome

• Pan-genomes

  • Greater intra-specific genetic variation, even in closely related strains

• To aid gene prediction and annotation, genome sequences of several closely related strains are required

Image source: http://en.wikipedia.org/wiki/File:Pan-genome-graphics.png

Page 3: Improving pan-genome annotation using whole genome multiple alignment

Background


Schnoes et al., 2009

The change in misannotation over time in the NR database for the 37 families investigated.

Page 4: Improving pan-genome annotation using whole genome multiple alignment

Mugsy-Annotator (http://mugsy.sf.net)

• Steps:

  1. Aligning multiple whole genomes

  2. Mapping orthologs among the genomes

  3. Identifying annotation anomalies


• Objectives:

  1. Identifying orthologs

  2. Evaluating the quality of annotated gene structures in prokaryotic genomes

Page 5: Improving pan-genome annotation using whole genome multiple alignment

Determining Orthologs

• Identifies orthologs on the basis of whole genome alignment (WGA), sequence position, and sequence length.

• Expects one segment per organism in the whole genome alignment.

• For segmental duplications:

  • It reports separate ortholog groups for each copy only if the whole genome alignment identifies orthologous copies in other genomes.

  • If not, it does not recognize the duplication and groups the copies under a single ortholog group.
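The position-based idea above can be illustrated with a minimal Python sketch. This is not Mugsy-Annotator's actual code: the function names, the `min_cov` threshold, and the assumption that alignment segments are ungapped and collinear (so a genome position maps to block coordinates by subtracting the segment start) are all simplifications made here for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def group_orthologs(blocks, genes, min_cov=0.5):
    """Group genes that occupy the same region of an alignment block.

    blocks: list of dicts mapping genome -> (start, end), one segment per genome.
    genes:  list of (genome, gene_id, start, end) tuples.
    Assumption (hypothetical simplification): segments are ungapped and
    collinear, so block coordinate = genome position - segment start.
    """
    groups = []
    for block in blocks:
        # Project every sufficiently covered gene into block coordinates.
        projected = []
        for genome, gene_id, g_start, g_end in genes:
            if genome not in block:
                continue
            s_start, s_end = block[genome]
            covered = overlap(g_start, g_end, s_start, s_end)
            if covered >= min_cov * (g_end - g_start):
                projected.append((g_start - s_start, g_end - s_start, genome, gene_id))
        projected.sort()

        # Sweep left to right, merging genes whose projections overlap enough.
        current, cur_start, cur_end = [], None, None
        for p_start, p_end, genome, gene_id in projected:
            if current:
                shared = overlap(p_start, p_end, cur_start, cur_end)
                shorter = min(p_end - p_start, cur_end - cur_start)
                if shared >= min_cov * shorter:
                    current.append((genome, gene_id))
                    cur_end = max(cur_end, p_end)
                    continue
                groups.append(current)
            current = [(genome, gene_id)]
            cur_start, cur_end = p_start, p_end
        if current:
            groups.append(current)
    return groups
```

Because a gene is only considered inside a block that contains its genome, a duplicated copy with no aligned counterpart elsewhere never forms its own group, which mirrors the limitation noted on this slide.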


Page 6: Improving pan-genome annotation using whole genome multiple alignment

Identification of annotation inconsistencies

• Evaluates start codons, stop codons, and translation initiation sites (TIS)
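One way to picture this check is a majority vote over the alignment columns at which each ortholog's annotated start and stop fall. The sketch below is an assumption for illustration only: the function name, the input layout, and the majority-vote rule are hypothetical and are not taken from Mugsy-Annotator.

```python
from collections import Counter


def flag_inconsistencies(group):
    """Flag genes whose annotated start or stop deviates from the group consensus.

    group: dict gene_id -> (start_col, stop_col), both given as alignment columns.
    Returns dict gene_id -> list of deviating features ("start" and/or "stop").
    The consensus is simply the most common column (hypothetical rule).
    """
    flagged = {}
    for idx, feature in enumerate(("start", "stop")):
        counts = Counter(cols[idx] for cols in group.values())
        consensus, _ = counts.most_common(1)[0]
        for gene_id, cols in group.items():
            if cols[idx] != consensus:
                flagged.setdefault(gene_id, []).append(feature)
    return flagged
```

For a group in which three of four orthologs share a start column and one starts 30 columns later, only the late gene is reported, with "start" as the inconsistent feature.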


Page 7: Improving pan-genome annotation using whole genome multiple alignment

Data set

• Neisseria meningitidis (Nmen) dataset of 20 genomes

  • Nmen verA contained 13 genomes

  • Nmen verB contained 7 genomes

  • The annotation pipeline differs between Nmen verA and Nmen verB

• A genome dataset of 9 other bacterial species from the RefSeq database


Page 8: Improving pan-genome annotation using whole genome multiple alignment

Comparison of the groups of orthologs for 20 Nmen genomes

• Among the genes reported exclusively by one method:

  • Intra-genome BLASTP matches predict most of the genes to be paralogs (40% for Mugsy-Annotator and 60% for OrthoMCL)

  • Some have functional names that indicate transposases

  • Some are hypothetical proteins

• The paper claims that OrthoMCL clusters paralogs and orthologs in a single group


Page 9: Improving pan-genome annotation using whole genome multiple alignment

Run Time Performance

• Nmen dataset of 20 genomes

• Mugsy plus Mugsy-Annotator: ~4 h on a single CPU

  • ~2 h for WGA with Mugsy

  • ~2 h for comparing annotations with Mugsy-Annotator

• OrthoMCL consumed ~32 CPU hours

• The WGA method is computationally efficient: ~4 CPU hours versus ~32 CPU hours for BLAST-based OrthoMCL on this dataset, roughly an 8-fold runtime advantage


Page 10: Improving pan-genome annotation using whole genome multiple alignment


Page 11: Improving pan-genome annotation using whole genome multiple alignment

Consistency of annotated gene structures in several species pan-genomes as reported by Mugsy-Annotator


Page 12: Improving pan-genome annotation using whole genome multiple alignment

Improving annotation consistency

• When TIS annotations are inconsistent, Mugsy-Annotator suggests alternative gene structures that improve annotation consistency

• Strategy: look for a conserved TIS in close proximity to the previously annotated TIS
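The search strategy can be sketched as a scan around the annotated TIS for an in-frame column where every aligned genome carries a start codon. This is a rough illustration under stated assumptions, not the tool's implementation: the function name, the `window` parameter, and the ungapped-alignment input are hypothetical, and only the common bacterial start codons ATG, GTG, and TTG are considered. A fuller version would also reject candidates separated from the gene body by an in-frame stop codon.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons


def conserved_tis_candidates(aligned, annotated, window=30):
    """Find conserved, in-frame start codons near an annotated TIS.

    aligned:   dict genome -> sequence; sequences share one coordinate system
               and are assumed ungapped (hypothetical simplification).
    annotated: alignment column of the currently annotated TIS.
    window:    how far (in columns) to search on either side.
    Returns candidate columns sorted by distance from the annotated TIS.
    """
    length = min(len(seq) for seq in aligned.values())
    lo = max(0, annotated - window)
    hi = min(length - 3, annotated + window)
    hits = []
    for col in range(lo, hi + 1):
        # Stay in the reading frame of the annotated TIS.
        if (col - annotated) % 3 != 0:
            continue
        # Require a start codon at this column in every genome.
        if all(seq[col:col + 3].upper() in START_CODONS for seq in aligned.values()):
            hits.append(col)
    return sorted(hits, key=lambda c: abs(c - annotated))
```

If the annotated column itself is conserved it is returned first, so an empty or shifted result is what signals that an alternative gene structure may be worth reviewing.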


Page 13: Improving pan-genome annotation using whole genome multiple alignment

Conclusion

• Aids in identifying and comparing gene content across a pan-genome

• Aids annotation and re-annotation of genes within a pan-genome rather than in a single genome

• The study demonstrates significant variation in annotation that is primarily due to the different bioinformatics approaches used rather than true biological variation

• Mugsy-Annotator: an efficient, accurate method for finding orthologs within a pan-genome

• Mugsy (WGA approach) is computationally efficient compared to BLAST-based approaches for finding orthologs


Page 14: Improving pan-genome annotation using whole genome multiple alignment

Critique

• Mugsy-Annotator requires pre-predicted annotation information and is therefore not an independent annotation tool

• Mugsy-Annotator still has difficulty resolving segmental duplications and paralogs

• It would have been better if the authors had measured the performance of Mugsy-Annotator on pan-genome datasets with larger evolutionary distances.


Page 15: Improving pan-genome annotation using whole genome multiple alignment


QUESTIONS?