
Uploaded by morgan-langille on 10-May-2015



Computational prediction and characterization of genomic islands: insights into bacterial pathogenicity

Morgan G.I. Langille

Department of Molecular Biology & Biochemistry, Simon Fraser University

http://tinyurl.com/genomic-islands


Genomic Island History

Early 1990s: clusters of virulence genes were found in E. coli (Hacker, et al., 1990)

Pathogenicity Islands (PAIs): clusters of genes that are associated with bacterial virulence

Genomic Islands (GIs) (Hacker, et al., 2000): segments of a genome that are thought to have originated from a horizontal transfer event


Genomic Island Interest

Pathogenicity Islands

Adhesins: fimbriae, intimin, etc.

Secretion Systems: Type III and Type IV

Toxins: hemolysins, pertussis toxin

Invasins, Modulins, and Effectors

Antibiotic Resistance Islands

Metabolic Islands


Methods for Predicting GIs

1. Sequence-based

Abnormal sequence composition: GC% bias, dinucleotide bias, codon bias, etc.

Genomic features associated with mobile genetic elements: direct repeats, IS elements, presence of tRNAs, and mobility genes (integrases, transposases, etc.)
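The compositional signals listed above can be illustrated with a minimal GC% scan. This is a sketch, not any of the published sequence-based tools: it simply flags sliding windows whose GC fraction lies far from the average over all windows. The function names and the window size, step, and z-score cutoff are hypothetical choices for illustration.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_gc_outlier_windows(genome: str, window: int = 5000, step: int = 2500,
                            z_cutoff: float = 2.0) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc) for windows whose GC fraction lies more than
    z_cutoff standard deviations from the mean over all windows.

    Hypothetical parameters; bases after the last full window are not scanned."""
    windows = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        end = min(start + window, len(genome))
        windows.append((start, end, gc_fraction(genome[start:end])))
    gcs = [gc for _, _, gc in windows]
    mu, sd = mean(gcs), pstdev(gcs)
    if sd == 0:
        return []  # uniform composition: no window stands out
    return [w for w in windows if abs(w[2] - mu) / sd > z_cutoff]
```

A real predictor would also need to handle amelioration and highly expressed genes, which a plain GC% threshold cannot distinguish.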

Methods for Predicting GIs

2. Comparative genomics based

Identify genomic regions with anomalous phylogenetic patterns

Requires multiple genomes


Previous state of GI identification

1. Sequence-based methods

Numerous methods, with constant improvement of algorithm design

Not very user-friendly, and the accuracy of the various methods is not well described

2. Comparative-based methods

Used by many researchers, but with no established method (only in-house scripts)

Limited access to user-friendly tools for this type of analysis


Outline

IslandPick: A comparative genomics approach for genomic island identification

Evaluating sequence composition based genomic island prediction methods

IslandViewer: An integrated interface for computational identification and visualization of genomic islands

The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain

CRISPRs and their association with genomic islands


IslandPick: A comparative genomics approach for genomic island identification

Mauve: whole genome aligner

Allows genome rearrangements and inversions

Fast: aligns two genomes in < 15 minutes

Command-line accessible

http://gel.ahabs.wisc.edu/mauve/

(Darling, et al., 2004)

IslandPick: Outline

[Flowchart: Query Genome A is aligned with Mauve against Genome B, Genome C, and Genome D (Mauve A & B, Mauve A & C, Mauve A & D) -> extract unique regions -> identify overlapping unique regions -> BLAST -> putative genomic islands]
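The "identify overlapping unique regions" step amounts to intersecting interval lists: a region of the query is kept only if it is unique with respect to every comparison genome. A minimal sketch, assuming the unique regions from each Mauve comparison have already been parsed into sorted, non-overlapping half-open (start, end) coordinates on Genome A. Parsing Mauve output and the BLAST filter are not shown, and the function names and the `min_length` default are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on query genome A


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def regions_unique_to_query(unique_per_comparison: List[List[Interval]],
                            min_length: int = 8000) -> List[Interval]:
    """Query regions unique relative to every comparison genome,
    kept only if at least min_length bp long (hypothetical threshold)."""
    if not unique_per_comparison:
        return []
    shared = sorted(unique_per_comparison[0])
    for regions in unique_per_comparison[1:]:
        shared = intersect_two(shared, sorted(regions))
    return [(s, e) for s, e in shared if e - s >= min_length]
```

For example, with unique regions from three comparisons, only stretches absent from all three survive, and short leftovers are dropped by the length filter.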

Selecting Comparative Genomes

[Flowchart: Comparative Genome Selection (using CVTree distances) chooses Genome B, Genome C, and Genome D for Query Genome A -> run Mauve (A & B, A & C, A & D) -> extract unique regions -> identify overlapping unique regions -> BLAST -> putative genomic islands]

What genomes to use?

We want to compare the query genome with comparison genomes that lie within certain evolutionary distances

This requires a phylogenetic tree or a distance matrix covering all sequenced bacterial species

CVTree

Uses matching K-strings between the proteomes of two organisms

Constructs phylogenetic trees without alignment

Avoids choosing genes for phylogenetic reconstruction

Web server: http://cvtree.cbi.pku.edu.cn

Downloadable command-line executable

(Qi, et al., 2004)
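The K-string idea can be sketched as follows, assuming the composition-vector formulation in which each observed K-string frequency f is compared with a prediction f0 built from (K-1)- and (K-2)-string frequencies, each vector entry is (f - f0) / f0, and the distance between two organisms is D = (1 - C) / 2, where C is the cosine correlation of their vectors. This is an illustrative re-implementation, not the CVTree program; function names are hypothetical and `k=5` is only a default for the example.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Dict, List


def kmer_freqs(proteins: List[str], k: int) -> Dict[str, float]:
    """Relative frequencies of length-k strings, counted inside each protein only."""
    counts: Counter = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


def composition_vector(proteins: List[str], k: int) -> Dict[str, float]:
    """(f - f0) / f0 for every K-string whose Markov prediction f0 is non-zero.

    K-strings that are predicted but never observed get the value -1."""
    if k < 3:
        raise ValueError("k must be at least 3")
    fk, fk1, fk2 = (kmer_freqs(proteins, n) for n in (k, k - 1, k - 2))
    # Group (k-1)-strings by their first k-2 residues so that each observed
    # (k-1)-string p can be extended by one residue into a K-string.
    by_prefix: Dict[str, List[str]] = defaultdict(list)
    for q in fk1:
        by_prefix[q[:-1]].append(q)
    vec: Dict[str, float] = {}
    for p, f_p in fk1.items():
        middle = p[1:]
        for q in by_prefix.get(middle, []):
            s = p + q[-1]  # s[:-1] == p and s[1:] == q
            f0 = f_p * fk1[q] / fk2[middle]
            vec[s] = (fk.get(s, 0.0) - f0) / f0
    return vec


def cvtree_distance(proteome_a: List[str], proteome_b: List[str], k: int = 5) -> float:
    """D = (1 - C) / 2, with C the cosine correlation of the composition vectors."""
    a = composition_vector(proteome_a, k)
    b = composition_vector(proteome_b, k)
    dot = sum(v * b.get(s, 0.0) for s, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("composition vector is all zeros")
    return (1 - dot / (norm_a * norm_b)) / 2
```

Subtracting the Markov prediction removes the part of the K-string signal that is already explained by shorter strings, which is what lets whole proteomes be compared without choosing genes or computing alignments.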

Example: Pseudomonas Tree

[Figure: tree of P. fluorescens Pf-5, P. putida KT2440, P. fluorescens PfO-1, P. syringae tomato DC3000, P. syringae phaseolicola 1448A, P. syringae syringae B728a, P. aeruginosa PAO1, P. aeruginosa PA14, and Acinetobacter ADP1. CVTree distances from P. syringae B728a are shown: 0 for B728a itself and 0.227, 0.256, 0.393, 0.397, 0.411, 0.428, 0.430, and 0.481 for the other genomes.]

Tree built using conserved genes (Omp85 and CarB) and maximum parsimony

Determining Distance Cutoffs

Given the distances between any two species, how do we choose comparison genomes?

Maximum Distance Cutoff: eliminates the use of genomes that have diverged too much (noise)

Minimum Distance Cutoff: eliminates the use of genomes that have not diverged enough (very closely related strains)

Minimum Number of Genomes: prevents the use of too few comparison genomes

Example: Pseudomonas Tree

[Figure: the Pseudomonas tree (P. fluorescens Pf-5, P. putida KT2440, P. fluorescens PfO-1, P. syringae tomato DC3000, P. syringae phaseolicola 1448A, P. syringae syringae B728a, P. aeruginosa PAO1, P. aeruginosa PA14, Acinetobacter ADP1) annotated with CVTree distances from P. syringae B728a and with the cutoffs below]

Minimum Distance Cutoff = 0.10

Maximum Distance Cutoff = 0.42

Minimum Number of Genomes = 3
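Applying the three cutoffs is a simple filter over one row of the distance matrix. A minimal sketch, assuming the distances are CVTree distances from the query and that both cutoffs are inclusive. It returns every genome inside the window, nearest first, whereas IslandPick itself may choose only a subset. The function name and the genome labels are hypothetical: the transcript does not preserve which distance belongs to which strain.

```python
from typing import Dict, List, Optional


def select_comparison_genomes(distances: Dict[str, float],
                              min_dist: float = 0.10,
                              max_dist: float = 0.42,
                              min_genomes: int = 3) -> Optional[List[str]]:
    """Genomes whose distance from the query lies in [min_dist, max_dist],
    nearest first; None if fewer than min_genomes qualify."""
    in_range = sorted((d, name) for name, d in distances.items()
                      if min_dist <= d <= max_dist)
    if len(in_range) < min_genomes:
        return None  # too few comparison genomes: skip the comparative analysis
    return [name for _, name in in_range]


# Placeholder labels for the distances shown in the Pseudomonas example.
example = {"query": 0.0, "g1": 0.227, "g2": 0.256, "g3": 0.397, "g4": 0.393,
           "g5": 0.411, "g6": 0.428, "g7": 0.430, "g8": 0.481}
selected = select_comparison_genomes(example)
```

With the example distances, the query itself falls below the minimum cutoff, the genomes at 0.428, 0.430, and 0.481 exceed the maximum, and the five genomes between 0.227 and 0.411 remain, which satisfies the minimum of three.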

Predicting Similar Aged GIs

[Figure: GI insertion in the query genome; criterion shown: 1 genome < distance X]

Evaluating sequence composition based genomic island prediction methods

Accuracy of GI methods

Sequence-based GI prediction methods:

Only require a single genome

Can easily make false predictions (e.g., highly expressed genes)

May miss predictions (amelioration of transferred DNA to the host genome; source genome with the same composition as the host genome)

Accuracy is usually evaluated using simulated horizontal gene transfer events or small datasets of verified GIs

IslandPick is independent of sequence composition methods and was used to generate a “positive” dataset of islands


Developing a Negative Dataset

To identify false positives we need a “negative” dataset that does not contain GIs

Identify regions that are conserved across several genomes using Mauve whole genome alignment

Use the same genomes as selected by IslandPick with one additional cutoff


Negative Dataset

[Figure: GI insertion relative to the query genome; additional criterion shown: 1 genome > distance X]


IslandPick Cutoffs


[Figure: dataset sizes: 736 chromosomes; 173 chromosomes; 118 chromosomes with 771 GIs (~100 genes/strain)]

(Langille, et al., 2008)

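GI Prediction Accuracy

[Figure: Venn diagram of the Positive Dataset, Negative Dataset, and Predicted Dataset. Predictions that fall in the positive dataset are true positives (TP); predictions that fall in the negative dataset are false positives (FP); positive regions that are not predicted are false negatives (FN); negative regions that are not predicted are true negatives (TN).]

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)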
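The two formulas can be computed at the nucleotide level by treating each dataset as a set of positions. A minimal sketch, assuming half-open intervals on one chromosome and assuming overall accuracy means (TP + TN) / (TP + FP + FN + TN), since the slide does not define it. Predicted positions that fall in neither dataset are ignored, as in the Venn diagram. Function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def to_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand half-open intervals into a set of nucleotide positions."""
    return {pos for start, end in intervals for pos in range(start, end)}


def accuracy_metrics(predicted: Iterable[Interval],
                     positive: Iterable[Interval],
                     negative: Iterable[Interval]) -> Dict[str, float]:
    pred, pos, neg = to_positions(predicted), to_positions(positive), to_positions(negative)
    tp = len(pred & pos)  # predicted and in the positive dataset
    fp = len(pred & neg)  # predicted but in the negative dataset
    fn = len(pos - pred)  # in the positive dataset but not predicted
    tn = len(neg - pred)  # in the negative dataset and not predicted
    total = tp + fp + fn + tn
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

Expanding intervals into sets keeps the sketch short; for whole genomes an interval-arithmetic version would use far less memory.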

GI Prediction Accuracy

Tool               Average number of nucleotides   Precision   Recall   Overall
                   in GIs per genome (kb)                               Accuracy
SIGI-HMM           233                             92          33.0     86
IslandPath/Dimob   171                             86          36       86
PAI IDA            163                             68          32       84
Centroid           171                             61          28       82
IslandPath/Dinuc   444                             55          53       82
Alien Hunter       1265                            38          77       71
Literature*        639                             100         87       96

(Langille, et al., 2008)

IslandViewer: An integrated interface for computational identification and visualization of genomic islands

IslandViewer (Langille, et al., 2009)

Website that integrates the most accurate GI prediction programs: SIGI-HMM, IslandPath-DIMOB, and IslandPick

Genomic island predictions are pre-calculated for all genomes and automatically updated monthly

User genome submission available

IslandPick can be run using manually selected comparison genomes

Download data for a genomic island, a chromosome, or the entire dataset

http://www.pathogenomics.sfu.ca/islandviewer/
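When predictions from several programs are pooled (the CRISPR analysis in this talk uses the union of IslandPick, IslandPath-DIMOB, and SIGI-HMM predictions), overlapping intervals have to be merged. A minimal sketch, assuming each program's output has been reduced to half-open (start, end) coordinates on the same chromosome. This is not IslandViewer's code, and the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def merge_predictions(*tool_predictions: List[Interval]) -> List[Interval]:
    """Union of GI intervals from several tools; overlapping or touching
    intervals are merged into one."""
    intervals = sorted(iv for tool in tool_predictions for iv in tool)
    merged: List[Interval] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous island: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Sorting first means each interval only has to be compared with the last merged island, so the union takes O(n log n) time.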



IslandPick – Manual genome selection



User Genome Submission


The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain

Pseudomonas aeruginosa Liverpool Epidemic Strain (LES)

Highly successful at colonizing cystic fibrosis (CF) patients

Has replaced previously established strains

Has caused infections in non-CF patients

Can cause greater morbidity in CF patients than other strains of P. aeruginosa

(Salunkhe, et al., 2005)

LES Analysis

Genome sequenced by the Sanger Centre

I led annotation of the genome and analysis of GIs

6 prophages

5 genomic islands

(Winstanley, Langille, et al., 2008)

Signature-tagged mutagenesis (STM)

STM is a method to identify genes associated with pathogenesis

LES was used in a chronic rat lung infection model

47 genes were identified by STM

5 of these genes are within GIs and prophage regions

http://www.traill.uiuc.edu/uploads/porknet/papers/LitchtensteigerPaper.pdf

LES Prophage

[Figure: map of the six LES prophages (scale bar 5 kb) with labels for Pseudomonas Phage F10, Pseudomonas Phage D3112, Pyocin R2, Pseudomonas Phage D3, and Pseudomonas Phage Pf1, the duplicated regions Duplication 1 and Duplication 2, and the positions of STM mutations. Prophage boundaries: 1, PLES 6091 to PLES 6271; 2, PLES 7891 to PLES 8321; 3, PLES 13201 to PLES 13711; 4, PLES 15491 to PLES 15961; 5, PLES 25021 to PLES 25661; 6, PLES 41181 to PLES 41281.]

(Winstanley, Langille, et al., 2008)


LES Genomic Islands


(Winstanley, Langille, et al., 2008)

LES in vivo competitive index

Mutants were grown for 7 days in rat lung in competition with wild-type LES

A CI of less than 1 indicates attenuation of virulence

4 genes within prophages and GIs had a strong impact on competitiveness

(Winstanley, Langille, et al., 2008)
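A competitive index (CI) is commonly computed as the mutant-to-wild-type ratio recovered from the host divided by the same ratio in the inoculum, so a CI below 1 means the mutant was out-competed. The slide does not state the exact formula used for LES, so this sketch assumes that standard definition; the function name and the counts in the usage line are hypothetical.

```python
def competitive_index(mutant_in: float, wild_type_in: float,
                      mutant_out: float, wild_type_out: float) -> float:
    """CI = (mutant / wild type recovered) / (mutant / wild type inoculated)."""
    if mutant_in <= 0 or wild_type_in <= 0 or wild_type_out <= 0:
        raise ValueError("input counts and wild-type output must be positive")
    return (mutant_out / wild_type_out) / (mutant_in / wild_type_in)


# Hypothetical counts: equal inoculum, mutant recovered at a quarter of the wild-type level.
ci = competitive_index(mutant_in=1e6, wild_type_in=1e6, mutant_out=2e5, wild_type_out=8e5)
```

Normalizing by the input ratio corrects for an inoculum that was not exactly 1:1.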

CRISPRs and their association with genomic islands

Overview of CRISPRs

CRISPRs: Clustered regularly interspaced short palindromic repeats

Able to provide phage resistance and block conjugation

Thought to act similarly to RNAi, except that DNA (rather than RNA) is the presumed target

CRISPRs and HGT

Previous studies have shown some evidence of HGT of CRISPRs:

Phylogenetic profiles of Cas genes (Haft, et al., 2005)

CRISPRs within 10 megaplasmids (Godde, et al., 2006)

CRISPRs within two prophages in Clostridium difficile (Sebaihia, et al., 2006)

Analysis of CRISPRs and GIs had not been conducted previously

CRISPRs within GIs

Domain of Life       Number of  Number   Proportion of   Total Number  Expected CRISPRs  Observed CRISPRs  Significance
                     Genomes    of GIs   Genome in GIs   of CRISPRs    in GIs            in GIs            (Chi-square Test)*
Archaea              49         298      3.7%            206           7.7               14                0.020
Bacteria             306        4874     6.4%            837           53.3              114               8.1 x 10^-18
Archaea & Bacteria   355        5172     6.1%            1043          64.0              128               1.6 x 10^-16

CRISPR predictions were obtained from CRISPRdb, http://crispr.u-psud.fr/crispr/CRISPRHomePage.php

GI predictions were taken from the union of IslandPick, IslandPath-DIMOB, and SIGI-HMM

The numbers of CRISPRs inside and outside GIs were compared

CRISPRs are over-represented in GIs
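The expected values in this table follow from the total number of CRISPRs times the proportion of the genome in GIs (for example, 206 x 3.7% is about 7.6, close to the 7.7 shown). A minimal sketch of a two-cell (inside vs. outside GIs) chi-square goodness-of-fit test with 1 degree of freedom, assuming no continuity correction. For 1 degree of freedom the upper-tail p-value equals erfc(sqrt(chi2 / 2)), so only the standard library is needed. Given the table's expected counts it yields p-values close to those reported (about 0.02 for Archaea), but it is not necessarily the exact test the authors ran, and the function name is hypothetical.

```python
from math import erfc, sqrt
from typing import Tuple


def inside_outside_chi_square(observed_in: float, total: float,
                              expected_in: float) -> Tuple[float, float]:
    """Chi-square goodness-of-fit for two cells (inside vs. outside GIs), 1 d.f.

    Returns (chi2 statistic, p-value)."""
    observed_out = total - observed_in
    expected_out = total - expected_in
    chi2 = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Archaea row: 14 of 206 CRISPRs observed in GIs, 7.7 expected.
chi2_archaea, p_archaea = inside_outside_chi_square(14, 206, 7.7)
```

The same function applies to the phage-gene comparisons, e.g. 13 phage genes observed versus 4.5 expected in CRISPR-containing GIs out of 825 in total.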

Phage genes within GIs

Many GIs are known to contain phage genes. What proportion of GI genes have links to phage?

Genes with “phage” in their annotation were identified within GIs

Genomic Regions   ‘Phage genes’           Total number of    Chi-Square
                  Observed    Expected    genes in region    Test
Inside GIs        6990        1264.22     165784             ~0
Outside GIs       12868       18593.78    2438303

35% of all ‘phage genes’ are within GIs (6% expected)

Phage genes are over-represented in GIs
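The expected column can be reproduced by distributing the total number of phage genes across regions in proportion to each region's gene count, assuming that is how it was derived. A minimal sketch (the function name is hypothetical); with the table's numbers it gives the 1264.22 and 18593.78 shown.

```python
from typing import Dict


def expected_counts(total_features: float, genes_per_region: Dict[str, int]) -> Dict[str, float]:
    """Distribute total_features across regions in proportion to their gene counts."""
    all_genes = sum(genes_per_region.values())
    return {region: total_features * n / all_genes
            for region, n in genes_per_region.items()}


phage_total = 6990 + 12868  # phage genes observed inside + outside GIs
expected = expected_counts(phage_total, {"inside_GIs": 165784, "outside_GIs": 2438303})
```

The same proportions explain the summary line: 6990 / 19858 is about 35% of phage genes inside GIs, whereas GIs hold only about 6% of all genes.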

Archaea and CRISPRs

                                      Archaea   Bacteria
Genomes containing a CRISPR           90%       40%
Proportion of phage genes             0.10%     0.79%
Proportion of GIs with a phage gene   5.1%      17.6%

The prevalence of CRISPRs in archaeal genomes could result in fewer phage genes

GIs with CRISPRs and phage genes

Is there evidence that some CRISPRs are being transferred by phage?

Genomic Regions            ‘Phage genes’           Total number of    Chi-Square
                           Observed    Expected    genes in region    Test
GIs containing CRISPR(s)   13          4.5         1500               5.7 x 10^-5
Outside GIs                812         820.5       274073

GIs containing CRISPR(s) also show an over-representation of phage genes, suggesting that some CRISPRs are transferred by phage

CRISPR conclusions

CRISPR over-representation in GIs suggests that CRISPRs are being horizontally transferred

Some GIs that contain CRISPRs may have phage origins

CRISPRs in Archaea could be limiting HGT by increasing resistance to phage

Conclusions

Several advances in GI computational prediction:

IslandPick, a novel automated comparative genomics based GI prediction program

Analysis of the accuracy of several sequence-based GI prediction methods

IslandViewer: An integrated interface for computational identification and visualization of genomic islands

Insights into GI evolution and their role in pathogenicity:

P. aeruginosa LES: evidence that genomic islands and prophage regions contain genes that provide a competitive advantage for infection in a chronic rat infection model

CRISPRs and their association with genomic islands

Acknowledgements

Supervisor: Dr. Fiona Brinkman

Supervisory Committee: Dr. Baillie, Dr. Pio

P. aeruginosa LES: Craig Winstanley, Roger Levesque, Bob Hancock, Nick Thomson