Genomic island analysis: Improved web-based software and insights into an apparent
gene pool associated with genomic islands
William Hsiao
Brinkman Laboratory
Simon Fraser University, Burnaby, BC, Canada
Prokaryotic Genomic Islands (GIs)
Definition: Genomic DNA segments with particular characteristics that indicate horizontal origins
[Figure: schematic of a bacterium with a genomic island (GI) labeled]
Genomic Island Characteristics
Often contain genes encoding adaptive functions of medical and environmental importance:
Pathogenicity Islands: virulence factors (genes that contribute to disease)
Resistance Islands: antibiotic resistance
Metabolic Islands: secondary metabolism (e.g. sucrose)
Exhibit sequence and annotation features
[Figure: schematic of a genomic island (e.g. PAI) within the chromosome. Labeled features: tRNA gene, direct repeats, mob (mobility genes), VF (virulence factors), and %G+C / sequence composition bias]
IslandPath: Aiding identification of GIs
[Figure: IslandPath view of Vibrio cholerae N16961 chromosome 1, highlighting the TCP island (TCP = toxin co-regulated pili)]
Legend:
A yellow circle: %G+C above high cutoff
A green circle: %G+C between cutoffs
A pink circle: %G+C below low cutoff
A black bar: transfer RNA
A purple bar: ribosomal RNA
A deep blue bar: both tRNA and rRNA
A black square: transposase
A black triangle: integrase
A strike-line: regions with dinucleotide bias (Hsiao et al. 2003, Bioinformatics, pp. 418-20)
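To make the legend concrete, here is a minimal Python sketch of two per-gene features of the kind IslandPath displays: %G+C classified against a low and a high cutoff, and a dinucleotide bias score. It assumes the cutoffs are the mean gene %G+C plus or minus k standard deviations (k is a hypothetical parameter, not IslandPath's documented setting), and it uses a Karlin-style, single-strand dinucleotide relative abundance difference (delta*) as a stand-in for the bias measure. The gene sequences are toy examples.

```python
from statistics import mean, stdev

BASES = "ACGT"

def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def gc_class(gc: float, low: float, high: float) -> str:
    """Map a gene's %G+C onto the three circle colours in the legend."""
    if gc > high:
        return "yellow: above high cutoff"
    if gc < low:
        return "pink: below low cutoff"
    return "green: between cutoffs"

def rho(seq: str) -> dict:
    """Dinucleotide relative abundance f(XY) / (f(X) * f(Y)), single strand only."""
    seq = seq.upper()
    mono = {b: seq.count(b) / len(seq) for b in BASES}
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    out = {}
    for x in BASES:
        for y in BASES:
            denom = mono[x] * mono[y]
            # A base absent from the sequence makes the ratio undefined; record 0.0.
            out[x + y] = (pairs.count(x + y) / len(pairs)) / denom if denom else 0.0
    return out

def delta_star(gene: str, genome: str) -> float:
    """Mean absolute rho difference over the 16 dinucleotides (Karlin-style delta*)."""
    r_gene, r_genome = rho(gene), rho(genome)
    return sum(abs(r_gene[k] - r_genome[k]) for k in r_gene) / 16

# Toy genes (hypothetical); a real genome has thousands.
genes = {"g1": "ATGGCGGCCGCGTAA", "g2": "ATGAAATTTAAATAA", "g3": "ATGGCAACGTTGTAA"}
gc = {name: gc_percent(s) for name, s in genes.items()}
k = 1.0  # hypothetical cutoff width in standard deviations
low = mean(gc.values()) - k * stdev(gc.values())
high = mean(gc.values()) + k * stdev(gc.values())
genome = "".join(genes.values())
for name, s in genes.items():
    print(name, f"{gc[name]:.1f}%", gc_class(gc[name], low, high),
          f"delta*={delta_star(s, genome):.3f}")
```

In a real run the genome-wide dinucleotide profile would come from the whole chromosome rather than from a concatenation of a few genes.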
IslandPath V.2
Which Features Best Identify GIs?
Examined the prevalence of features in 95 published islands:
85% of islands have >25% dinucleotide bias coverage (62% have >50% dinucleotide bias coverage)
Mobility genes identified in >75% of the islands
tRNA genes observed in <50% of known islands
Only 20% of the islands show atypical %G+C
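The prevalence figures above are simple proportions over the set of published islands. A minimal sketch, assuming each island has already been summarized into a hypothetical `Island` record; the field names and the four example islands are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Island:
    """Hypothetical per-island summary; field names are illustrative only."""
    name: str
    n_genes: int
    n_dinuc_biased: int      # genes in the island flagged with dinucleotide bias
    has_mobility_gene: bool  # e.g. an integrase or transposase annotated in the island
    has_trna: bool
    atypical_gc: bool

    @property
    def dinuc_coverage(self) -> float:
        """Fraction of the island's genes that carry dinucleotide bias."""
        return self.n_dinuc_biased / self.n_genes

def prevalence(islands, predicate) -> float:
    """Percentage of islands for which predicate(island) is true."""
    return 100.0 * sum(1 for isl in islands if predicate(isl)) / len(islands)

islands = [
    Island("GI-A", 20, 12, True, True, False),
    Island("GI-B", 10, 2, True, False, False),
    Island("GI-C", 40, 30, False, False, True),
    Island("GI-D", 8, 5, True, True, False),
]
summary = {
    ">25% dinucleotide bias coverage": prevalence(islands, lambda i: i.dinuc_coverage > 0.25),
    ">50% dinucleotide bias coverage": prevalence(islands, lambda i: i.dinuc_coverage > 0.50),
    "mobility genes": prevalence(islands, lambda i: i.has_mobility_gene),
    "tRNA genes": prevalence(islands, lambda i: i.has_trna),
    "atypical %G+C": prevalence(islands, lambda i: i.atypical_gc),
}
for feature, pct in summary.items():
    print(f"{feature}: {pct:.0f}% of islands")
```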
Properties of genes in GIs?
Defined a “putative island” in two ways:
8 or more genes in a row with dinucleotide bias
8 or more genes in a row with dinucleotide bias + an associated mobility gene
Is there any difference between genes inside and outside of islands in terms of the functional categories of their proteins?
63 genomes (67 chromosomes) analyzed
COG: Clusters of Orthologous Groups of proteins
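The two putative-island definitions can be expressed as a scan for runs of consecutive flagged genes. This sketch assumes genes are listed in chromosomal order with one boolean dinucleotide-bias flag per gene. It reads "an associated mobility gene" as a mobility gene inside the run, but the actual analysis may have used a different notion of association, such as flanking genes. `putative_islands` and its parameters are hypothetical names.

```python
def putative_islands(biased, mobility=None, min_run=8):
    """Return (start, end) index pairs (end exclusive) for runs of at least
    min_run consecutive genes flagged with dinucleotide bias.

    If per-gene mobility flags are supplied, keep only runs that contain at
    least one mobility gene (an assumed reading of "associated")."""
    runs, start = [], None
    for i, flag in enumerate(list(biased) + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    if mobility is not None:
        runs = [(s, e) for s, e in runs if any(mobility[s:e])]
    return runs

# Hypothetical flags for 28 genes in chromosomal order.
biased = [False] * 2 + [True] * 9 + [False] + [True] * 8 + [False] * 3 + [True] * 5
mobility = [False] * len(biased)
mobility[15] = True  # e.g. an integrase inside the second run

print(putative_islands(biased))            # [(2, 11), (12, 20)]
print(putative_islands(biased, mobility))  # [(12, 20)]
```

The final run of 5 biased genes is dropped by both definitions because it is shorter than 8 genes.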
[Figure: Proportions of Genes with no COG Assignment in Islands vs. Outside. Bar chart (y axis 0-70%) comparing ISLAND and OUTSIDE genes per genome. Genomes labeled include Bacillus subtilis 168, Borrelia burgdorferi B31, Buchnera sp. APS, Chlamydia trachomatis D, Clostridium acetobutylicum ATCC824, Escherichia coli K12, Escherichia coli O157, Haemophilus influenzae Rd-KW20, Helicobacter pylori 26695, Listeria innocua Clip11262, Mycobacterium leprae, Mycobacterium tuberculosis CDC1551, Mycoplasma pneumoniae M129, Neisseria meningitidis MC58, Pseudomonas aeruginosa PAO1, Salmonella typhimurium LT2, Staphylococcus aureus N315, Streptococcus pneumoniae TIGR4, Sulfolobus solfataricus, Vibrio cholerae chromosome I, Vibrio cholerae chromosome II, Yersinia pestis CO92]
Paired t-test P value: 1.27E-18
More novel genes (genes with no COG assignment) inside islands
Hsiao et al. PLOS Genetics e62, Nov. 2005
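The comparison pairs each genome's island and outside-island proportions, which is why a paired t-test fits. A minimal standard-library sketch of the statistic follows. The proportions are hypothetical, and the p-value would come from the t distribution with df degrees of freedom (for example via `scipy.stats.ttest_rel`, which is left out here to keep the block dependency-free).

```python
import math
from statistics import mean, stdev

def paired_t(island, outside):
    """Paired t statistic and degrees of freedom.

    Position i of both lists must refer to the same genome (its island and
    outside-island proportion). Raises ZeroDivisionError if all differences
    are identical, because the standard error is then zero."""
    if len(island) != len(outside) or len(island) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(island, outside)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1

# Hypothetical proportions of genes with no COG assignment in three genomes.
island = [0.50, 0.60, 0.55]
outside = [0.30, 0.35, 0.30]
t, df = paired_t(island, outside)
print(f"t = {t:.2f}, df = {df}")
```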
Control for Analysis Biases
Control for mis-prediction of genes in regions with sequence composition bias: excluded genes < 300 bp
Control for bias of the COG protein classification: used the SUPERFAMILY classification, which is better at detecting distant homologs
Control for compositional bias due to other factors: used the dinucleotide bias plus mobility gene dataset
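The first control amounts to filtering genes by length before recomputing the per-genome proportions of unclassified genes. A sketch under stated assumptions: `Gene` is a hypothetical record, `classified` stands for whichever scheme is in use (COG or SUPERFAMILY), and the example genes are invented.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    """Hypothetical gene record; field names are illustrative only."""
    length_bp: int
    in_island: bool
    classified: bool  # True if the protein received a COG (or SUPERFAMILY) assignment

def unclassified_proportions(genes, min_length=0):
    """Fraction of genes with no classification inside vs. outside islands,
    ignoring genes shorter than min_length bp."""
    kept = [g for g in genes if g.length_bp >= min_length]
    result = {}
    for label, inside in (("island", True), ("outside", False)):
        group = [g for g in kept if g.in_island == inside]
        unclassified = sum(1 for g in group if not g.classified)
        result[label] = unclassified / len(group) if group else float("nan")
    return result

genes = [
    Gene(250, True, False), Gene(900, True, False), Gene(1200, True, True),
    Gene(150, False, False), Gene(600, False, True), Gene(1500, False, True),
    Gene(800, False, False),
]
print(unclassified_proportions(genes))                  # all genes
print(unclassified_proportions(genes, min_length=300))  # genes < 300 bp excluded
```

Running this once per genome yields the paired island/outside proportions that feed the t-tests in the table that follows.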
Island Dataset | Classification Method | Paired t-test p-value
DINUC (all genes) | COG | 1.27E-18
DINUC+MOB (all genes) | COG | 1.20E-18
DINUC (all genes) | SUPERFAMILY | 1.13E-18
DINUC+MOB (all genes) | SUPERFAMILY | 4.43E-14
DINUC (>300 bp) | COG | 1.05E-17
DINUC+MOB (>300 bp) | COG | 7.65E-16
DINUC (>300 bp) | SUPERFAMILY | 3.01E-16
DINUC+MOB (>300 bp) | SUPERFAMILY | 2.04E-10
Hsiao et al. PLOS Genetics e62, Nov. 2005
More novel genes in islands in all experiments
Phage may be the predominant donors of GIs
Some GIs are clearly of bacteriophage origin, but more may also come from phage
Predicted subcellular localizations of proteins encoded in our GIs are similar to those of phage genomes (lower proportion of cytoplasmic membrane proteins)
Hsiao et al. PLOS Genetics e62, Nov. 2005
Many GI-encoded genes have sequence characteristics similar to phage genes (A+T rich and short)
Daubin et al. Genome Biol. 4(9): R57
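The "A+T rich and short" comparison needs only base composition and length per gene. Here is a minimal sketch with invented sequences. A real comparison would use all annotated genes in each genome, together with the island boundaries from one of the datasets above.

```python
from statistics import mean

def at_percent(seq: str) -> float:
    """Percentage of A and T bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)

def summarize(seqs):
    """Mean %A+T and mean length (bp) for a group of gene sequences."""
    return mean(at_percent(s) for s in seqs), mean(len(s) for s in seqs)

# Hypothetical sequences standing in for island and non-island genes.
island_genes = ["ATGAAATTAAAATAA", "ATGTTATAA"]
other_genes = ["ATGGCGGCCGCGCTGGCGTAA", "ATGCCGGGCAGCGCCTAA"]
isl_at, isl_len = summarize(island_genes)
oth_at, oth_len = summarize(other_genes)
print(f"island:  {isl_at:.1f}% A+T, mean length {isl_len:.1f} bp")
print(f"outside: {oth_at:.1f}% A+T, mean length {oth_len:.1f} bp")
```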
[Figure: Proportions of virulence factors in Islands vs. Outside of Islands in 26 pathogens. Bar chart; x axis: island type (DINUC; DINUC + Mob Gene); y axis: % of VFs (0-7); bars: island genes vs. genes outside islands]
A higher proportion of genes in islands are VFs
P value: < 2.2E-16
http://zdsys.chgb.org.cn/VFs/
Fedynak, Hsiao, and Brinkman (unpublished)
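The slide does not say which test produced P < 2.2E-16. As one illustration, a pooled two-proportion z-test compares the VF fraction among island genes with the fraction outside islands. The counts below are hypothetical, and the test is a swapped-in technique rather than the authors' stated method.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    x1 of n1 island genes are VFs; x2 of n2 non-island genes are VFs.
    Returns (z, two-sided p-value from the standard normal)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical counts: 30 of 1000 island genes vs. 10 of 1000 other genes are VFs.
z, p = two_proportion_z(30, 1000, 10, 1000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

With many genomes, pooling counts this way ignores between-genome variation, so a per-genome paired comparison like the one used for novel genes may be preferable.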
Certain classes of VFs over-represented in GIs
Virulence Factor Database (VFDB) classification of VFs in GIs and non-GIs
VFDB Classification | GIs: VFs (#) | GIs: Proportion of genes (%) | non-GIs: VFs (#) | non-GIs: Proportion of genes (%) | p-value
Unclassified | 185 | 1.89 | 158 | 0.23 | < 2.20E-16
Secretion system | 95 | 0.97 | 138 | 0.20 | < 2.20E-16
Adherence | 59 | 0.60 | 138 | 0.20 | 5.69E-13
Iron uptake | 33 | 0.34 | 59 | 0.09 | 5.83E-11
Type III translocated protein | 6 | 0.06 | 1 | 0.00 | 1.54E-07
Antiphagocytosis | 23 | 0.23 | 66 | 0.10 | 3.34E-04
Protease | 5 | 0.05 | 5 | 0.01 | 2.08E-03
Toxin | 18 | 0.18 | 53 | 0.08 | 2.34E-03
Most of these are “offensive” virulence factors
Fedynak, Hsiao, and Brinkman (unpublished)
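Several categories in the table have small counts (for example 6 versus 1 Type III translocated proteins), and an exact test is a common choice in that situation. The slide does not name the test used. The following sketch of a one-sided Fisher exact test on a 2x2 table (category VFs vs. other genes, GIs vs. non-GIs) is offered only as an illustration, and the example counts are hypothetical because the slide does not give gene totals.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for over-representation in GIs.

    a: category VFs in GIs          b: other GI genes
    c: category VFs outside GIs     d: other non-GI genes"""
    n_gi, n_vf, n = a + b, a + c, a + b + c + d
    denom = log_comb(n, n_gi)
    p = 0.0
    for x in range(a, min(n_vf, n_gi) + 1):
        # Hypergeometric probability that exactly x category VFs fall in GIs.
        p += math.exp(log_comb(n_vf, x) + log_comb(n - n_vf, n_gi - x) - denom)
    return min(p, 1.0)

# Hypothetical counts for one VF category.
print(fisher_greater(12, 988, 20, 6980))
```

As a sanity check, Fisher's classic 2x2 table with a = 3, b = 1, c = 1, d = 3 gives 17/70 (about 0.243). Working in log space with `math.lgamma` keeps the binomial coefficients from overflowing at genome-scale totals.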
Conclusions
Genomic islands contain a disproportionately high number of novel genes, suggesting a large and understudied gene pool contributing to horizontal gene transfer
These novel genes appear to be drawn from a large pool of phage; metagenomics studies of this pool would be useful
These novel genes may contribute to microbial adaptation and may play a role in pathogenesis and in antibiotic resistance
Acknowledgements
Fiona Brinkman
Amber Fedynak - VF studies
Brian Coombes, Michael Lowden, and Brett Finlay (UBC) - Microarray data
Jenny Bryan (UBC) - Stats analysis
Brinkman Laboratory
http://www.pathogenomics.sfu.ca/islandpath
Other categories more common in islands
Category | In putative islands: paired t-test p-value | In putative islands + mobility genes: paired t-test p-value
Cell motility | 7.73E-5 | 0.002087 (may be a sample size issue)
Intracellular trafficking, secretion, and vesicular transport | 8.124E-3 | 0.406955 (may be a sample size issue)
* Novel genes were not included in this analysis because they could skew the results for the other categories
Several metabolism-associated categories are under-represented in islands
[Figure: Proportions of Genes with no SUPERFAMILY Assignment in Islands vs. Outside. Bar chart (y axis 0-80%) comparing ISLAND and OUTSIDE genes per genome, for the same genomes as the COG assignment chart]
P value: 3.0E-16
IslandPath V.2
Experiment: S. typhimurium LT2 ssrB gene knockout (KO)
Track 1: IslandPath
Track 2: Microarray expression (over-expression and under-expression)