A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert various host cell processes during infection.
Biological assay
The Bordetella pertussis CyaA toxin increases its adenylate cyclase activity inside eukaryotic cells by a factor of 1,000.
Genome-Scale Identification of Legionella pneumophila Effectors using a Machine Learning Approach
David Burstein, Tal Zusman, Elena Degtyar, Ram Viner, Gil Segal, and Tal Pupko
George S. Wise Faculty of Life Sciences, Tel-Aviv University, Israel
[Figure: schematic of the CyaA translocation assay. Labels: bacterial cell, eukaryotic cell, type IV secretion system, cyaA-effector fusion, calmodulin, active complex, ATP, cAMP.]
Translocation assay of a gene suspected of being an effector:
[Figure: bar chart of cAMP (fmol/well), axis 0 to 5000, for Ceg10, Ceg23, Ceg29 and vector; panel label αCyaA.]
Legionella pneumophila, the causative agent of Legionnaires' disease, a severe pneumonia-like disease, translocates effectors via the Icm/Dot secretion system.
Our goal is to identify novel effectors in L. pneumophila on a genomic scale.
Figure credit: Copyright © 2005 Nature Publishing Group. Created by Arkitek from Nature Reviews Microbiology.
Background
Experimental results
45 predictions tested
42 successfully expressed
40 validated as effectors, achieving a correct prediction rate > 90% for tested genes (!)
The novel effectors were termed lem, for Legionella effector identified by machine learning.
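The reported rate can be checked from the counts above. The short sketch below assumes the "> 90%" figure is computed over the 42 genes that were successfully expressed, since the poster does not state the denominator; the variable names are illustrative.

```python
# Counts taken from the experimental results above.
tested = 45      # predictions tested
expressed = 42   # successfully expressed
validated = 40   # validated as effectors

# Assumption: the rate is taken over expressed genes, because only those
# could be assayed for translocation.
rate_among_expressed = validated / expressed
# For comparison: the rate over all 45 tested predictions.
rate_among_tested = validated / tested

print(f"validated / expressed = {rate_among_expressed:.1%}")
print(f"validated / tested    = {rate_among_tested:.1%}")
```

Only the first ratio exceeds 90%, which suggests the poster's figure refers to the expressed genes.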
The machine learning approach
We formulate the task of effector identification as a classification problem: A set of features distinguishing effectors from non-effectors was established. These features were then used as input to a variety of machine learning algorithms, which classified each L. pneumophila gene as either a putative effector or not. High-ranking predictions were then experimentally validated.
[Figure: workflow of the machine learning approach. Labels: prior knowledge, validated effectors, non-effectors, unclassified genes, features, feature selection, classification (SVM, Naïve Bayes, NN, Bayesian Net, voting), trained model, predicted effectors, predicted non-effectors, experimental validation, newly validated effectors.]
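The voting step named in the workflow can be illustrated with a minimal sketch. The poster does not say how the classifier outputs were combined, so simple vote counting is an assumption here, and the gene IDs and calls are hypothetical.

```python
from collections import Counter

# Hypothetical calls (True = putative effector) for made-up gene IDs.
# Classifier names follow the workflow labels; the calls are illustrative only.
predictions = {
    "SVM":          {"geneA": True, "geneB": False, "geneC": True},
    "Naive Bayes":  {"geneA": True, "geneB": False, "geneC": False},
    "NN":           {"geneA": True, "geneB": True,  "geneC": False},
    "Bayesian Net": {"geneA": True, "geneB": False, "geneC": True},
}

def vote(predictions):
    """Count, for each gene, how many classifiers called it an effector."""
    votes = Counter()
    for calls in predictions.values():
        for gene, is_effector in calls.items():
            votes[gene] += int(is_effector)
    return votes

def rank_candidates(predictions, min_votes):
    """Return (gene, votes) pairs with at least min_votes, highest first."""
    votes = vote(predictions)
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, n) for gene, n in ranked if n >= min_votes]

# Genes with the most votes would be the first candidates for validation.
print(rank_candidates(predictions, min_votes=2))
```

Under this assumed scheme, the vote count doubles as a ranking, so the highest-ranking predictions can be sent to experimental validation first.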
[Figure: two bar charts of cAMP (pmol/well), axis 0 to 11000, for the tested genes.
One panel: lem13, lem14, lem15, lem16, lem17, lem18, lem19, lem20, lem21, lem22, lem23, lem24, lem25, ceg30, ceg32, lem26, lem27, lem28, lem29, ceg34.
Other panel: legA10, ceg3, lem1, ceg4, ceg5, ceg8, lem2, ceg14, ceg17, lem3, lem4, lem5, lem6, ceg19, lem7, lem8, vpdC, lem9, lem10, lem11, lem12.]
Support: David Burstein is a fellow of the Edmond J. Safra bioinformatics program at Tel-Aviv University.
Features
A wide range of features were measured for every L. pneumophila gene:
Similarity to effectors
Similarity to host proteins
G-C content
Abundance in Metazoa/Bacteria
Secretory signals
Genome arrangement
Regulatory elements
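As an illustration of one feature in the list above, G-C content can be computed directly from a gene's nucleotide sequence. This is a minimal sketch, not the authors' implementation; the function name and the example sequence are hypothetical.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of bases in the sequence that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    # Any other character (A, T, or an ambiguous base such as N)
    # counts toward the length but not toward G-C.
    return gc / len(seq)

# Hypothetical 12-base sequence, not a real L. pneumophila gene.
example = "ATGGCGCTTAAC"
print(gc_content(example))
```

A per-gene value like this could then be placed alongside the other features as one column of the classifier input.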