

Page 1: Genomic island analysis: Improved web-based software and insights into an apparent gene pool associated with genomic islands

William Hsiao
Brinkman Laboratory
Simon Fraser University
Burnaby, BC, Canada

Page 2: Prokaryotic Genomic Islands (GIs)

Definition: Genomic DNA segments with particular characteristics that indicate horizontal origins

[Figure: schematic of a bacterium with a GI marked in its genome]

Page 3: Genomic Island Characteristics

Often contain genes encoding adaptive functions of medical and environmental importance:
Pathogenicity Islands: virulence factors (genes that contribute to disease)
Resistance Islands: antibiotic resistance
Metabolic Islands: secondary metabolism (e.g. sucrose)

Exhibit sequence and annotation features

[Figure: a genomic island (e.g. a PAI) in the chromosome, shown with a tRNA gene, flanking direct repeats, mobility genes (mob), and virulence factor genes (VF), and marked by atypical %G+C and sequence composition bias]
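One way to flag the atypical %G+C shown in the island schematic is to compare each gene against genome-wide cutoffs, as in the "above high cutoff / between cutoffs / below low cutoff" categories of the IslandPath legend. The slides do not say how IslandPath sets those cutoffs, so this minimal sketch assumes they are the mean per-gene %G+C plus or minus one population standard deviation; the function names `gc_percent` and `classify_gc` are hypothetical.

```python
from statistics import mean, pstdev


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def classify_gc(genes: dict[str, str], n_sd: float = 1.0) -> dict[str, str]:
    """Label each gene 'high', 'low' or 'typical' by its %G+C.

    ASSUMPTION: cutoffs are the mean per-gene %G+C plus/minus n_sd
    population standard deviations; this rule is illustrative only.
    """
    gc = {name: gc_percent(seq) for name, seq in genes.items()}
    mu, sd = mean(gc.values()), pstdev(gc.values())
    high_cut, low_cut = mu + n_sd * sd, mu - n_sd * sd
    labels = {}
    for name, value in gc.items():
        if value > high_cut:
            labels[name] = "high"      # yellow circle in the IslandPath legend
        elif value < low_cut:
            labels[name] = "low"       # pink circle
        else:
            labels[name] = "typical"   # green circle
    return labels
```

In a real genome the per-gene sequences would come from the annotated coding regions; genes labelled "high" or "low" are candidates for horizontal origin, although Page 6 notes that only a minority of known islands show atypical %G+C.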

Page 4: IslandPath: Aiding identification of GIs

[Figure: IslandPath map of Vibrio cholerae N16961 chromosome 1, with the TCP island marked (TCP = toxin co-regulated pili)]

Legend:
Yellow circle: %G+C above the high cutoff
Green circle: %G+C between cutoffs
Pink circle: %G+C below the low cutoff
Black bar: transfer RNA
Purple bar: ribosomal RNA
Deep blue bar: both tRNA and rRNA
Black square: transposase
Black triangle: integrase
Strike-line: regions with dinucleotide bias (Hsiao et al. 2003, Bioinformatics p. 418-20)
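The legend marks regions with dinucleotide bias but does not define the measure. A standard way to quantify it, assumed here rather than taken from the slide, is Karlin's symmetrized dinucleotide relative abundance, rho*_XY = f*_XY / (f*_X f*_Y), compared between a gene and its genome by the mean absolute difference over the 16 dinucleotides (delta*). The function names below are hypothetical; this is a sketch, not IslandPath's code.

```python
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rho_star(seq: str) -> dict[str, float]:
    """Symmetrized dinucleotide relative abundance rho*_XY = f*_XY / (f*_X f*_Y).

    Counts are pooled over the sequence and its reverse complement;
    characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCS, 0)
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("no A/C/G/T dinucleotides in sequence")
    rho = {}
    for pair in DINUCS:
        fx, fy = mono[pair[0]] / n_mono, mono[pair[1]] / n_mono
        rho[pair] = (di[pair] / n_di) / (fx * fy) if fx and fy else 0.0
    return rho


def delta_star(gene: str, genome: str) -> float:
    """Mean absolute rho* difference over the 16 dinucleotides (0 = identical)."""
    g, h = rho_star(gene), rho_star(genome)
    return sum(abs(g[p] - h[p]) for p in DINUCS) / len(DINUCS)
```

A gene would then be flagged as dinucleotide-biased when its delta* against the whole genome exceeds some genome-specific cutoff; the cutoff (for example, mean plus one standard deviation across genes) is likewise an assumption here.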

Page 5: IslandPath V.2

Page 6: Which Features Best Identify GIs?

Examined the prevalence of features in 95 published islands:

85% of islands have >25% dinucleotide bias coverage (62% have >50% dinucleotide bias coverage)

Mobility genes identified in >75% of the islands

tRNA genes observed in <50% of known islands

Only 20% of the islands show atypical %G+C
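The coverage figures above are simple proportions over the island set. A minimal sketch, assuming "dinucleotide bias coverage" means the fraction of an island's genes flagged as biased (it could instead be measured in base pairs); each island is represented as a hypothetical list of per-gene flags.

```python
def bias_coverage(gene_flags: list[bool]) -> float:
    """Fraction of an island's genes flagged as dinucleotide-biased."""
    if not gene_flags:
        raise ValueError("island has no genes")
    return sum(gene_flags) / len(gene_flags)


def percent_islands_above(islands: list[list[bool]], threshold: float) -> float:
    """Percentage of islands whose bias coverage is strictly above threshold."""
    if not islands:
        raise ValueError("no islands given")
    hits = sum(1 for flags in islands if bias_coverage(flags) > threshold)
    return 100.0 * hits / len(islands)
```

Applied to the 95 published islands with thresholds 0.25 and 0.50, this is the kind of calculation that yields the 85% and 62% figures; the same pattern works for the mobility-gene, tRNA, and %G+C features by swapping the per-island test.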

Page 7: Properties of genes in GIs?

Defined a “putative island” in two ways:
8 or more genes in a row with dinucleotide bias
8 or more genes in a row with dinucleotide bias + an associated mobility gene

Do genes in islands differ from genes outside islands in terms of their protein functional categories?

63 genomes (67 chromosomes) analyzed

COG: Clusters of Orthologous Groups of proteins
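The two putative-island definitions amount to scanning genes in chromosome order for runs of biased genes. A minimal sketch, assuming per-gene boolean flags for dinucleotide bias and for mobility genes, and reading "associated" as "inside the run" (it could also mean adjacent to it); `putative_islands` is a hypothetical name.

```python
from typing import Optional


def putative_islands(biased: list[bool],
                     mobility: Optional[list[bool]] = None,
                     min_genes: int = 8) -> list[tuple[int, int]]:
    """Return (start, end) gene-index ranges, end exclusive, covering runs of at
    least min_genes consecutive dinucleotide-biased genes in chromosome order.

    If per-gene mobility flags are given, keep only runs that contain at least
    one mobility gene (ASSUMPTION: "associated" is read as "inside the run").
    """
    runs = []
    start = None
    for i, flag in enumerate(biased + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_genes:
                runs.append((start, i))
            start = None
    if mobility is not None:
        runs = [(s, e) for s, e in runs if any(mobility[s:e])]
    return runs
```

Calling it without `mobility` gives the first dataset (DINUC); passing mobility flags gives the stricter second dataset (DINUC+MOB), which is always a subset of the first.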

Page 8: Proportions of Genes with no COG Assignment in Islands vs. Outside

[Figure: bar chart of the proportion of genes with no COG assignment (0-70%), inside (ISLAND) vs. outside (OUTSIDE) putative islands, for: Bacillus subtilis 168, Borrelia burgdorferi B31, Buchnera sp. APS, Chlamydia trachomatis D, Clostridium acetobutylicum ATCC824, Escherichia coli K12, Escherichia coli O157, Haemophilus influenzae Rd-KW20, Helicobacter pylori 26695, Listeria innocua Clip11262, Mycobacterium leprae, Mycobacterium tuberculosis CDC1551, Mycoplasma pneumoniae M129, Neisseria meningitidis MC58, Pseudomonas aeruginosa PAO1, Salmonella typhimurium LT2, Staphylococcus aureus N315, Streptococcus pneumoniae TIGR4, Sulfolobus solfataricus, Vibrio cholerae chromosome I, Vibrio cholerae chromosome II, Yersinia pestis CO92]

Paired t-test p-value: 1.27E-18

More novel genes inside islands

Hsiao et al. PLOS Genetics e62, Nov. 2005
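The comparison above is paired, presumably by genome: each chromosome contributes one proportion inside islands and one outside. A minimal standard-library sketch of the paired t statistic; the p-value would then come from the t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, which is not used here so the block runs without SciPy). The function name and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(island: list[float], outside: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-genome proportions.

    Differences are taken as island minus outside, so a positive t means
    islands have the higher proportion.
    """
    if len(island) != len(outside) or len(island) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(island, outside)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical proportions of unassigned genes for three genomes.
t, df = paired_t([0.5, 0.6, 0.7], [0.2, 0.2, 0.3])
print(f"t = {t:.2f}, df = {df}")
```

Pairing removes genome-to-genome variation in the overall fraction of unassigned genes, which differs widely across the organisms in the chart.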

Page 9: Control for Analysis Biases

Control for mis-prediction of genes in regions with biased sequence composition: excluded genes < 300 bp

Control for bias of the COG protein classification: used the SUPERFAMILY classification, which is better at detecting distant homologs

Control for compositional bias due to other factors: used the dinucleotide bias plus mobility gene dataset
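The first control is a length filter applied before the inside/outside proportions are computed. A minimal sketch, assuming a hypothetical per-gene record with a length, an island flag, and a flag for whether COG (or SUPERFAMILY) assigned the gene; the names `Gene` and `novel_proportions` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gene:
    """Hypothetical per-gene record; field names are illustrative."""
    length_bp: int
    in_island: bool
    classified: bool  # True if the gene has a COG (or SUPERFAMILY) assignment


def novel_proportions(genes: list[Gene],
                      min_length: Optional[int] = None) -> tuple[float, float]:
    """Return (inside, outside) proportions of genes with no classification.

    If min_length is set, genes shorter than it are dropped first, mirroring
    the "excluded genes < 300 bp" control.
    """
    kept = [g for g in genes if min_length is None or g.length_bp >= min_length]

    def proportion(group: list[Gene]) -> float:
        if not group:
            return float("nan")
        return sum(not g.classified for g in group) / len(group)

    inside = [g for g in kept if g.in_island]
    outside = [g for g in kept if not g.in_island]
    return proportion(inside), proportion(outside)
```

Running this per chromosome, with and without `min_length=300`, and with either classification scheme, gives the paired inputs for each row of the dataset-by-method comparison.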

Page 10:

Island Dataset | Classification Method | Paired t-test p-value
DINUC (all genes) | COG | 1.27E-18
DINUC+MOB (all genes) | COG | 1.20E-18
DINUC (all genes) | SUPERFAMILY | 1.13E-18
DINUC+MOB (all genes) | SUPERFAMILY | 4.43E-14
DINUC (>300 bp) | COG | 1.05E-17
DINUC+MOB (>300 bp) | COG | 7.65E-16
DINUC (>300 bp) | SUPERFAMILY | 3.01E-16
DINUC+MOB (>300 bp) | SUPERFAMILY | 2.04E-10

Hsiao et al. PLOS Genetics e62, Nov. 2005

More novel genes in islands in all experiments

Page 11: Phage may be the predominant donors of GIs

Some GIs are clearly of bacteriophage origin, and more may derive from phage as well

Predicted subcellular localizations of proteins encoded in our GIs are similar to those of phage genomes (lower proportion of cytoplasmic membrane proteins)
Hsiao et al. PLOS Genetics e62, Nov. 2005

Many GI-encoded genes have sequence characteristics similar to phage genes (A+T rich and short)
Daubin et al. Genome Biol. 4(9): R57

Page 12: Proportions of virulence factors in Islands vs. Outside of Islands in 26 pathogens

[Figure: bar chart of % of genes that are VFs (0-7%), island vs. outside island, for two island types: DINUC and DINUC + Mob Gene]

Higher proportions of genes in islands are VFs (p-value < 2.2E-16)

http://zdsys.chgb.org.cn/VFs/
Fedynak, Hsiao, and Brinkman (unpublished)

Page 13: Certain classes of VFs over-represented in GIs

Virulence Factor Database (VFDB) classification of VFs in GIs and non-GIs

VFDB Classification | GIs: VFs (#) | GIs: Proportion of genes (%) | non-GIs: VFs (#) | non-GIs: Proportion of genes (%) | p-value
Unclassified | 185 | 1.89 | 158 | 0.23 | < 2.20E-16
Secretion system | 95 | 0.97 | 138 | 0.20 | < 2.20E-16
Adherence | 59 | 0.60 | 138 | 0.20 | 5.69E-13
Iron uptake | 33 | 0.34 | 59 | 0.09 | 5.83E-11
Type III translocated protein | 6 | 0.06 | 1 | 0.00 | 1.54E-07
Antiphagocytosis | 23 | 0.23 | 66 | 0.10 | 3.34E-04
Protease | 5 | 0.05 | 5 | 0.01 | 2.08E-03
Toxin | 18 | 0.18 | 53 | 0.08 | 2.34E-03

Most of these are “offensive” virulence factors

Fedynak, Hsiao, and Brinkman (unpublished)
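The VFDB table gives per-class percentages, so the fold enrichment in GIs can be read off directly: 1.89 / 0.23 is about 8.2-fold for unclassified VFs. The slide does not name the significance test or give total gene counts. A Pearson chi-square test on a 2x2 table (GI vs. non-GI genes, VF vs. non-VF) is one common choice and is assumed here; the counts in the test call are hypothetical, and the function names are illustrative.

```python
import math


def fold_enrichment(pct_in_gis: float, pct_outside: float) -> float:
    """Ratio of the percentage of genes that are VFs inside vs. outside GIs."""
    return pct_in_gis / pct_outside


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Rows: GI genes, non-GI genes; columns: VFs, non-VFs.
    Returns the statistic and its upper-tail p-value with 1 degree of freedom,
    using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Fold enrichment from the table's percentages.
print(f"Unclassified: {fold_enrichment(1.89, 0.23):.1f}-fold")
print(f"Secretion system: {fold_enrichment(0.97, 0.20):.1f}-fold")
```

With counts as large as those behind this table, the p-value quickly underflows toward zero, which is consistent with (though does not prove) the "< 2.20E-16" floor reported by common statistics software.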

Page 14: Conclusions

Genomic islands contain a disproportionately high number of novel genes, suggesting a large and understudied gene pool contributing to horizontal gene transfer

These novel genes appear to be drawn from a large pool of phage genes; metagenomics studies would be useful here

These novel genes may contribute to microbial adaptation and may play a role in pathogenesis and in antibiotic resistance

Page 15: Acknowledgements

Fiona Brinkman
Amber Fedynak - VF studies
Brian Coombes, Michael Lowden, and Brett Finlay (UBC) - Microarray data
Jenny Bryan (UBC) - Stats analysis
Brinkman Laboratory

http://www.pathogenomics.sfu.ca/islandpath

Page 16: Other categories more common in islands

Category | In putative islands: paired t-test p-value | In putative islands + mobility genes: paired t-test p-value
Cell motility | 7.73E-5 | 0.002087 (may be a sample size issue)
Intracellular trafficking, secretion, and vesicular transport | 8.124E-3 | 0.406955 (may be a sample size issue)

* Novel genes not included in analysis due to potential skew of other category results

Several metabolism-associated categories are under-represented in islands

Page 17: Proportions of Genes with no SUPERFAMILY Assignment in Islands vs. Outside

[Figure: bar chart of the proportion of genes with no SUPERFAMILY assignment (0-80%), inside (ISLAND) vs. outside (OUTSIDE) putative islands, for the same genomes as the COG chart on Page 8]

P value: 3.0E-16

Page 18: IslandPath V.2

Page 19: Experiment: S. typhimurium LT2 ssrB gene knockout (KO)

Track 1: IslandPath
Track 2: Microarray expression (overexpressed and underexpressed genes)