Genome-Scale Identification of Legionella pneumophila Effectors using a Machine Learning Approach

David Burstein, Tal Zusman, Elena Degtyar, Ram Viner, Gil Segal, and Tal Pupko
George S. Wise Faculty of Life Sciences, Tel-Aviv University, Israel

Background

A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert various host cell processes during infection.

Legionella pneumophila, the causative agent of Legionnaires' disease, a severe pneumonia-like disease, translocates effectors via the Icm/Dot secretion system.

[Figure credit: Copyright © 2005 Nature Publishing Group. Created by Arkitek from Nature Reviews Microbiology]

Our goal

Our goal is to identify novel effectors in L. pneumophila on a genomic scale.

Biological assay

The Bordetella pertussis CyaA toxin increases its adenylate cyclase activity inside eukaryotic cells by a factor of 1,000. Fusing a candidate gene to cyaA therefore lets translocation be read out as a rise in host-cell cAMP.

[Diagram labels: bacterial cell; eukaryotic cell; cyaA-effector fusion; type IV secretion system; calmodulin; active complex; ATP; cAMP]

[Figure: translocation assay of genes suspected of being effectors. Samples: Ceg10, Ceg23, Ceg29, vector. Axis: cAMP (fmol/well), 0 to 5000. Panel label: αCyaA]

Experimental results

45 predictions tested
42 successfully expressed
40 validated as effectors

This gives a correct prediction rate above 90% for the tested genes (40 of the 42 expressed genes, about 95%).

The novel effectors were termed lem, for Legionella effector identified by machine learning.

The machine learning approach

We formulate the task of effector identification as a classification problem. A set of features distinguishing effectors from non-effectors was established. These features were then used as input to a variety of machine learning algorithms, which classified each L. pneumophila gene as either a putative effector or not. High-ranking predictions were then experimentally validated.
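The classify-then-rank idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the three toy classifiers below (nearest centroid, Gaussian naive Bayes, 1-nearest neighbour) are stand-ins chosen so the example runs with the Python standard library only, and the two-column feature vectors, gene names, and training labels are hypothetical. Each classifier casts one vote, and unclassified genes are ranked by the fraction of "effector" votes.

```python
import math
from statistics import mean, pvariance

# Labels: 1 = effector, 0 = non-effector. All data below are hypothetical.

def train_centroid(X, y):
    """Nearest-centroid classifier: predict the class with the closer mean."""
    cents = {c: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == c])]
             for c in set(y)}
    return lambda x: min(cents, key=lambda c: math.dist(x, cents[c]))

def train_gaussian_nb(X, y):
    """Gaussian naive Bayes with a small variance floor for stability."""
    stats = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        params = [(mean(col), pvariance(col) + 1e-6) for col in zip(*rows)]
        stats[c] = (len(rows) / len(X), params)

    def log_posterior(c, x):
        prior, params = stats[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, (m, v) in zip(x, params))

    return lambda x: max(stats, key=lambda c: log_posterior(c, x))

def train_1nn(X, y):
    """1-nearest-neighbour classifier: copy the label of the closest example."""
    return lambda x: min(zip(X, y), key=lambda p: math.dist(x, p[0]))[1]

def vote_fraction(classifiers, x):
    """Fraction of classifiers that call x an effector."""
    votes = [clf(x) for clf in classifiers]
    return sum(votes) / len(votes)

# Hypothetical training set: (feature_1, feature_2) per gene.
X_train = [(0.90, 0.80), (0.80, 0.90), (0.85, 0.70),   # known effectors
           (0.10, 0.20), (0.20, 0.10), (0.15, 0.30)]   # known non-effectors
y_train = [1, 1, 1, 0, 0, 0]

classifiers = [train(X_train, y_train)
               for train in (train_centroid, train_gaussian_nb, train_1nn)]

# Hypothetical unclassified genes.
unclassified = {"geneA": (0.82, 0.75), "geneB": (0.12, 0.18)}
scores = {g: vote_fraction(classifiers, x) for g, x in unclassified.items()}

# Rank candidates so the highest-scoring ones are tested first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Ranking by vote fraction rather than by a hard yes/no label is one simple way to pick "high-ranking predictions" for validation; the poster does not state how its own ranking was computed.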
[Workflow diagram labels: prior knowledge; validated effectors; non-effectors; features; feature selection; classification (SVM, Naïve Bayes, NN, Bayesian Net, voting); trained model; unclassified genes; predicted effectors; predicted non-effectors; experimental validation; newly validated effectors]

[Figure: translocation assay results, cAMP (pmol/well), axis 0 to 11000.
Panel 1: legA10, ceg3, lem1, ceg4, ceg5, ceg8, lem2, ceg14, ceg17, lem3, lem4, lem5, lem6, ceg19, lem7, lem8, vpdC, lem9, lem10, lem11, lem12.
Panel 2: lem13, lem14, lem15, lem16, lem17, lem18, lem19, lem20, lem21, lem22, lem23, lem24, lem25, ceg30, ceg32, lem26, lem27, lem28, lem29, ceg34]

Support: David Burstein is a fellow of the Edmond J. Safra bioinformatics program at Tel-Aviv University.

Features

A wide range of features were measured for every L. pneumophila gene:

Similarity to effectors
Similarity to host proteins
G-C content
Abundance in Metazoa/Bacteria
Secretory signals
Genome arrangement
Regulatory elements
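As an illustration of how one of the listed features could be computed, the sketch below measures the G-C content of a gene sequence and its deviation from a genome-wide value. The poster does not say how the feature was encoded, so the function names, the choice to ignore ambiguous bases, and the `genome_gc` value in the usage example are all assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq.

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    Returns 0.0 when no unambiguous base is present.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)

def gc_deviation(seq: str, genome_gc: float) -> float:
    """Difference between a gene's G-C content and a genome-wide value."""
    return gc_content(seq) - genome_gc

# Hypothetical usage: a toy sequence and a placeholder genome-wide value.
toy_gene = "ATGCGCATTA"
print(gc_content(toy_gene))            # 4 of 10 bases are G or C
print(gc_deviation(toy_gene, 0.5))     # negative: the toy gene is AT-rich
```

A deviation from the genome-wide value, rather than the raw fraction, is shown here only because it makes genes comparable across genomes; whether the authors used the raw value or a deviation is not stated.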

