top-down characterization of proteins in bacteria with unsequenced genomes

Nathan Edwards, Georgetown University Medical Center

Upload: zola

Posted on 20-Jan-2016


TRANSCRIPT

Page 1: Top-down characterization of proteins in bacteria with unsequenced genomes

Top-down characterization of proteins in bacteria with unsequenced genomes

Nathan Edwards, Georgetown University Medical Center

Page 2

Microorganism Identification

Homeland-security/defense applications: long history of fingerprinting approaches.

Clinical applications in strain identification: selection of treatment and/or antibiotics.

New applications in microbiome analysis: bacterial colonies in the gut, ..., chronic wound infections.

Compete with genomic approaches (PCR, next-gen sequencing)? The primary sales pitch is speed.

Page 3

Microorganism Identification

Match spectra with proteome (or genome) sequence for (species) identity: the match is robust with respect to instrumentation and sample prep.

Many bacteria will never be sequenced or "finished" (pathogen simulants, for example)...

...but many have been: about 2500 to date.

Page 4

Microorganism Identification

Can we use the available sequence to identify proteins from unknown, unsequenced bacteria? Yes, for some proteins in some organisms!

Page 5

Intact protein LC-MS/MS

Crude cell lysate

Capillary HPLC, C8 column

LTQ-Orbitrap XL

Precursor scan: resolution 30,000 @ m/z 400

Data-dependent precursor selection: 5 most abundant ions, 10-second dynamic exclusion, charge state +3 or greater

CAD product ion scan: resolution 15,000 @ m/z 400
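The data-dependent selection settings above can be illustrated with a small simulation. This is a simplified sketch, not the LTQ-Orbitrap firmware logic: each survey scan is a list of hypothetical `Peak` records whose charge is already assigned, and the 10 ppm window used to decide whether a peak is still on the exclusion list is an assumption, not a setting from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float         # observed m/z
    charge: int       # assigned charge state
    intensity: float  # survey-scan intensity


def _is_excluded(peak, exclusion, tol_ppm):
    # True if the peak lies within tol_ppm of any m/z still on the exclusion list
    return any(abs(peak.mz - mz) / mz * 1e6 <= tol_ppm for mz, _ in exclusion)


def select_precursors(scans, top_n=5, min_charge=3, exclusion_s=10.0, tol_ppm=10.0):
    """Simulate top-N data-dependent precursor selection with dynamic exclusion.

    scans: list of (retention_time_s, [Peak, ...]) in acquisition order.
    Returns a list of (retention_time_s, Peak) chosen for MS/MS.
    """
    exclusion = []  # (excluded m/z, time at which the exclusion expires)
    chosen = []
    for rt, peaks in scans:
        # forget precursors whose exclusion window has elapsed
        exclusion = [(mz, end) for mz, end in exclusion if end > rt]
        eligible = [p for p in peaks
                    if p.charge >= min_charge and not _is_excluded(p, exclusion, tol_ppm)]
        # fragment the most abundant eligible ions first
        eligible.sort(key=lambda p: p.intensity, reverse=True)
        for p in eligible[:top_n]:
            chosen.append((rt, p))
            exclusion.append((p.mz, rt + exclusion_s))
    return chosen
```

Feeding the same survey scan at 0 s, 5 s and 12 s shows the effect: at 5 s the five ions picked at 0 s are still excluded, so only a less abundant ion qualifies, while at 12 s their 10-second exclusion has expired and they are selected again. The +3 charge filter never lets a z = 2 ion through, however intense.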

Page 6

[Figure: total ion chromatograms (TIC MS and TIC MS2; relative abundance vs. time, RT 19.04 to 25.39 min) for run yr_inclusion (Y. rohdei, 3/11/2009), and the FTMS CID product ion spectrum averaged over 21 scans (RT 19.45 to 24.36 min; relative abundance vs. m/z 200 to 2000) for the precursor at m/z 756.70, charge +8, MW 6044.11]

CID Protein Fragmentation Spectrum from Y. rohdei

Page 7

Enterobacteriaceae Protein Sequences

Exhaustive set of all Enterobacteriaceae family protein sequences from Swiss-Prot, TrEMBL, RefSeq, GenBank, and CMR

...plus Glimmer3 predictions on RefSeq Enterobacteriaceae genomes, including primary and alternative translation start sites

Filtered for intact mass in the range 1 kDa to 20 kDa: 253,626 distinct protein sequences from 256 species

Derived from the Rapid Microorganism Identification Database (RMIDb.org) infrastructure.
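The intact-mass filter and de-duplication step can be sketched as follows. This is a minimal illustration, not the RMIDb.org code: it uses monoisotopic residue masses as a stand-in for whatever intact-mass definition the database actually uses, skips sequences with ambiguous residue codes, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def intact_mass(seq):
    """Monoisotopic neutral mass of an unmodified protein sequence."""
    return sum(MONO[aa] for aa in seq) + WATER


def build_database(sequences, min_da=1000.0, max_da=20000.0):
    """Return the sorted distinct sequences whose intact mass lies in range."""
    distinct = set()
    for seq in sequences:
        seq = seq.strip().upper()
        if not seq or any(aa not in MONO for aa in seq):
            continue  # skip empty entries and ambiguous codes such as B, X, Z
        if min_da <= intact_mass(seq) <= max_da:
            distinct.add(seq)  # identical sequences from different sources collapse here
    return sorted(distinct)
```

Because the same protein is often present in several of the source databases, collapsing identical strings keeps each sequence once; that is presumably how "distinct protein sequences" is counted on the slide.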

Page 8

ProSightPC 2.0

Product ion scan decharging: enabled by high-resolution fragment ion measurements; THRASH algorithm implementation
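Decharging converts a multiply charged ion into a charge-independent mass. The sketch below shows only the final arithmetic and assumes the charge state and the peak to use have already been determined; THRASH does the harder part (fitting theoretical isotope distributions to assign them), which is not reproduced here.

```python
PROTON = 1.007276  # proton mass, Da


def neutral_mass(mz, charge):
    """Neutral mass M of an [M + zH]z+ ion observed at m/z with charge z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON)


def singly_charged_mz(mz, charge):
    """m/z the same molecule would show as an [M + H]+ ion."""
    return neutral_mass(mz, charge) + PROTON


# Precursor from the Y. rohdei spectrum: m/z 756.70 at charge +8
print(round(neutral_mass(756.70, 8), 2))  # 6045.54
```

The single-peak estimate is about 6045.54 Da. The MW 6044.11 annotated on the Y. rohdei spectrum was reported by the software from the full data and need not equal it; which isotope peak m/z 756.70 represents is not stated.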

Absolute mass search mode: 15 ppm fragment ion match tolerance, 250 Da precursor ion match tolerance
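In absolute mass mode, a candidate must first fall inside the wide precursor window and is then judged by how many decharged fragment masses it explains. A toy version of that two-stage filter follows; the simple match count stands in for ProSightPC's probability-based scoring, which is not reproduced, and the candidate dictionary layout is an assumption.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def absolute_mass_search(precursor_mass, fragment_masses, candidates,
                         precursor_tol_da=250.0, fragment_tol_ppm=15.0):
    """Rank candidate proteins against one decharged MS/MS spectrum.

    candidates: dict of name -> (intact_mass, [theoretical fragment masses]).
    Returns (name, n_matched_fragments) for candidates inside the precursor
    window, best first.
    """
    hits = []
    for name, (intact, theoretical) in candidates.items():
        if abs(precursor_mass - intact) > precursor_tol_da:
            continue  # outside the wide precursor window
        n_matched = sum(
            1 for obs in fragment_masses
            if any(abs(ppm_error(obs, t)) <= fragment_tol_ppm for t in theoretical)
        )
        hits.append((name, n_matched))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits
```

The asymmetric tolerances are the design point: a 250 Da precursor window admits sequences whose intact mass is off because of a modification or sequence variant, while the tight 15 ppm fragment tolerance, usable only because fragments are measured at high resolution, keeps the fragment evidence specific.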

"Single-click" analysis of entire LC-MS/MS datafile.

Page 9

Other tools

Explored using standard search engines: decharge and format as a charge +1 spectrum; X!Tandem scoring plugin (ProSight, delta M); OMSSA, Mascot, etc.
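To reuse a conventional search engine, the decharged masses can be written out as if every ion were singly protonated. A minimal sketch that emits one spectrum in Mascot Generic Format (MGF) follows; the input layout, the five-decimal precision, the function name and the fragment list in the example are assumptions, and engines differ in which optional MGF fields they honor.

```python
PROTON = 1.007276  # proton mass, Da


def to_mgf(title, precursor_mass, fragments):
    """Write one decharged spectrum as a charge +1 MGF entry.

    precursor_mass: neutral intact mass (Da).
    fragments: list of (neutral_mass, intensity) pairs after decharging.
    Every mass is converted to its [M + H]+ m/z, so CHARGE is always 1+.
    """
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mass + PROTON:.5f}",
        "CHARGE=1+",
    ]
    for mass, intensity in sorted(fragments):
        lines.append(f"{mass + PROTON:.5f} {intensity:.1f}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


# Hypothetical fragment list, for illustration only
print(to_mgf("yr_inclusion decharged", 6044.11, [(1000.0, 50.0), (500.0, 10.0)]))
```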

MS-Tools: MS-Deconv, MS-TopDown, MS-Align, MS-Align+, MS-Align-E!


Page 10

[Figure: CID protein fragmentation spectrum from Y. rohdei, as on Page 6 (precursor m/z 756.70, charge +8, MW 6044.11)]

Match to Y. pestis 50S ribosomal protein L32

Page 11

Exact match sequence…

Page 12

Phylogeny: Protein vs DNA

[Figure panels: Protein Sequence; 16S rRNA Sequence]

Page 13

What about mixtures?

Page 14

Shared Small Ribosomal Proteins

Page 15

Shared Small Ribosomal Proteins

Page 16

Identified E. herbicola proteins

30S ribosomal protein S19: m/z 686.39, charge 15+, E-value 1.96e-16, Δ 0.007

Six proteins identified with |Δ| < 0.02

Page 17

DNA-binding protein HU-alpha: m/z 732.71, charge 13+, E-value 7.5e-26, Δ -14.128

Eight proteins identified with "large" |Δ|

Identified E. herbicola proteins

Page 18

DNA-binding protein HU-alpha: m/z 732.71, charge 13+, E-value 1.91e-58

Use "Sequence Gazer" to find the mass shift: ΔM mode can "tolerate" one shift for free!

Identified E. herbicola proteins

Page 19

ProSightPC: ΔM mode

[Diagram: protein sequence aligned with the experimental precursor, which differs from it by ΔM; b- and y-ions]

Also: PIITA - Tsai et al. 2009
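The b- and y-ion ladders in the ΔM diagrams come from cleaving the backbone between consecutive residues. A short sketch computing singly protonated monoisotopic b- and y-ion masses for an unmodified sequence follows; it ignores neutral losses and internal fragments, and the function name is illustrative.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def b_y_ions(seq):
    """Singly protonated b- and y-ion masses for every backbone cleavage.

    Entry k of each list corresponds to cleavage after residue k + 1, so
    b[k] covers seq[:k + 1] and y[k] covers the complementary seq[k + 1:].
    """
    residues = [MONO[aa] for aa in seq]
    total = sum(residues)
    b, y = [], []
    prefix = 0.0
    for i in range(1, len(seq)):
        prefix += residues[i - 1]
        b.append(prefix + PROTON)                  # N-terminal fragment
        y.append(total - prefix + WATER + PROTON)  # C-terminal fragment keeps the water
    return b, y
```

Each complementary b/y pair sums to the intact neutral mass plus two protons, which is a convenient consistency check when assigning fragments.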

Page 20

ProSightPC: ΔM mode

[Diagram: protein sequence aligned with the experimental precursor, which differs from it by ΔM; b- and y-ions, plus b'- and y'-ions offset by ΔM]

Also: PIITA - Tsai et al. 2009

Match a single "blind" mass-shift for free!

Page 21

ProSightPC: ΔM mode

[Diagram: protein sequence aligned with the experimental precursor, which differs from it by ΔM; b-, b'-, y- and y'-ions]

Also: PIITA - Tsai et al. 2009

Match a single "blind" mass-shift for free!
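The single "blind" mass shift idea can be sketched directly: take ΔM as the experimental precursor mass minus the theoretical intact mass, then test each fragment both unshifted (b, y) and shifted by ΔM (b', y'). Which series match also brackets where the shift sits. This illustrates the principle on neutral fragment masses (b = prefix residue sum, y = suffix residue sum + water); it is not ProSightPC's or PIITA's implementation, and the ΔM sign convention, the 15 ppm tolerance and the function name are assumptions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def delta_m_match(seq, precursor_mass, observed, tol_ppm=15.0):
    """Match neutral fragment masses allowing one unlocalized mass shift.

    Returns (delta_m, counts per ion series, (first, last)), where first..last
    is the 0-based residue range that can carry the shift. Assumes |delta_m|
    is much larger than the fragment tolerance, so shifted and unshifted
    ions cannot be confused.
    """
    residues = [MONO[aa] for aa in seq]
    theoretical = sum(residues) + WATER
    delta_m = precursor_mass - theoretical

    def hit(mass):
        return any(abs(obs - mass) <= mass * tol_ppm * 1e-6 for obs in observed)

    counts = {"b": 0, "b'": 0, "y": 0, "y'": 0}
    first, last = 0, len(seq) - 1
    prefix = 0.0
    for i in range(1, len(seq)):  # cleavage between residue i - 1 and residue i
        prefix += residues[i - 1]
        b, y = prefix, theoretical - prefix
        if hit(b):            # unshifted prefix: shift lies at residue >= i
            counts["b"] += 1
            first = max(first, i)
        if hit(b + delta_m):  # shifted prefix: shift lies at residue <= i - 1
            counts["b'"] += 1
            last = min(last, i - 1)
        if hit(y):            # unshifted suffix: shift lies at residue <= i - 1
            counts["y"] += 1
            last = min(last, i - 1)
        if hit(y + delta_m):  # shifted suffix: shift lies at residue >= i
            counts["y'"] += 1
            first = max(first, i)
    return delta_m, counts, (first, last)
```

With a hypothetical +14.01565 Da shift placed on the serine of GASPV and a handful of matching b, b', y and y' masses, the bracket narrows to residue index 2, which is the kind of localization listed among the conclusions.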

Page 22

DNA-binding protein HU-alpha: m/z 732.71, charge 13+, E-value 7.5e-26, Δ -14.128

Extract the N- and C-terminal sequence supported by at least 3 b- or y-ions
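One way to read this rule: keep the N-terminal stretch covered by matched b-ions, and the C-terminal stretch covered by matched y-ions, only when each is backed by at least three ions. The sketch implements that interpretation only; the exact criterion behind the slide is not stated, and the input format (sets of matched ion numbers) and the example sequence are hypothetical.

```python
def supported_termini(seq, matched_b, matched_y, min_ions=3):
    """Return (n_terminal, c_terminal) sequence supported by fragment ions.

    matched_b: ion numbers i of matched b-ions (b_i covers seq[:i]).
    matched_y: ion numbers j of matched y-ions (y_j covers seq[-j:]).
    A terminus is returned as "" when fewer than min_ions ions support it.
    """
    n_term = seq[:max(matched_b)] if len(matched_b) >= min_ions else ""
    c_term = seq[-max(matched_y):] if len(matched_y) >= min_ions else ""
    return n_term, c_term


# Hypothetical sequence and matches: three b-ions, only two y-ions
print(supported_termini("MKVLAGRT", {1, 2, 4}, {1, 3}))  # ('MKVL', '')
```

When a ΔM shift is present, the unshifted b- and y-ions are presumably the ones to use here, since they cover exactly the stretches that agree with the database sequence.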

Identified E. herbicola proteins

Page 23

E. herbicola protein sequences

Page 24

E. herbicola sequences found in other species

Page 25

Phylogenetic placement of E. herbicola

Phylogram and cladogram (phylogeny.fr, "One-Click" mode)

Page 26

Genome annotation errors

UniProt: E. coli cell division protein ZapB

22 (371) E. coli strains

MQFRRGMTMSLEVFEKLEAKVQQAIDTITL…

3 (204), 17 (166)

0 (2)

Page 27

Genome annotation errors

Need ±1500 Da precursor tolerance…
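Why ±1500 Da? If the annotated start site is wrong, the database sequence carries extra (or missing) N-terminal residues, and the intact mass is off by their summed residue mass. The sketch computes that offset for starting translation at each internal Met of the visible ZapB N-terminal stretch; treating every internal Met as a possible start is an assumption for illustration, and residues hidden behind the "…" are not considered.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def start_site_offsets(prefix):
    """(1-based position, mass removed in Da) for starting at each internal Met."""
    offsets = []
    for k, aa in enumerate(prefix):
        if aa == "M" and k > 0:
            offsets.append((k + 1, sum(MONO[a] for a in prefix[:k])))
    return offsets


for pos, lost in start_site_offsets("MQFRRGMTMSLEVFEKLEAKVQQAIDTITL"):
    print(f"start at M{pos}: intact mass {lost:.2f} Da lower")
```

Both offsets (about 775.39 Da for M7 and 1007.48 Da for M9) fall far outside the 250 Da precursor window used earlier but inside ±1500 Da; if the new initiator Met were also removed, the offset would grow by one more Met residue (131.04 Da) and still stay within that range.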

Page 28

Conclusions

Protein identification for unsequenced organisms.

Identification and localization for sequence mutations and post-translational modifications.

Extraction of confidently established sequence suitable for phylogenetic analysis.

Genome annotation correction.

New paradigm for phylogenetic analysis?

Page 29

Acknowledgements

Dr. Catherine Fenselau, Avantika Dhabaria, Joe Cannon*, Colin Wynne*: University of Maryland Biochemistry

Dr. Yan Wang: University of Maryland Proteomics Core

Dr. Art Delcher: University of Maryland CBCB

Funding: NIH/NCI