SeqAn and OpenMS Integration Workshop · 2017-05-23 · Temesgen Dadi, Julianus Pfeuffer, ...


Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn

The Center for Integrative Bioinformatics (CIBI)

SeqAn and OpenMS Integration Workshop

Julianus Pfeuffer, Alexander Fillbrunn

Mass-spectrometry data analysis in KNIME

OpenMS

• OpenMS – an open-source C++ framework for computational mass spectrometry

• Jointly developed at ETH Zürich, FU Berlin, University of Tübingen

• Open source: BSD 3-clause license

• Portable: available on Windows, OSX, Linux

• Vendor-independent: supports all standard formats, and vendor formats through ProteoWizard

• OpenMS TOPP tools – The OpenMS Proteomics Pipeline tools

– Building blocks: One application for each analysis step

– All applications share identical user interfaces

– Uses PSI standard formats

• Can be integrated in various workflow systems

– Galaxy

– WS-PGRADE/gUSE

– KNIME

Kohlbacher et al., Bioinformatics (2007), 23:e191

OpenMS Tools in KNIME

• Wrapping of OpenMS tools in KNIME via GenericKNIMENodes (GKN)

• Every tool writes its Common Tool Description (CTD) via its command-line parser

• GKN generates Java source code for nodes to show up in KNIME

• Wraps C++ executables and provides file handling nodes
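The CTD is what makes this wrapping possible: it is a machine-readable list of a tool's parameters, from which node dialogs and file ports can be derived. The Python sketch below reads a description and separates file parameters (candidate input/output ports) from ordinary options. The XML is a hypothetical, heavily simplified CTD-like fragment for an imaginary `ExampleTool`; the element and attribute names (`PARAMETERS`, `NODE`, `ITEM`, `type="input-file"`) are assumed from the general shape of CTD files rather than taken from the schema, and `list_parameters`/`split_ports` are illustrative helpers, not part of GKN.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified CTD-like description of an imaginary tool.
# Real CTD files follow a published schema and carry much more metadata.
CTD_EXAMPLE = """<tool name="ExampleTool" version="1.0">
  <description>Hypothetical tool used only for illustration.</description>
  <PARAMETERS>
    <NODE name="ExampleTool">
      <ITEM name="in" value="" type="input-file" description="Input file"/>
      <ITEM name="out" value="" type="output-file" description="Output file"/>
      <ITEM name="threads" value="1" type="int" description="Number of threads"/>
    </NODE>
  </PARAMETERS>
</tool>"""


def list_parameters(ctd_xml):
    """Return (name, type, default value) for every ITEM element, in document order."""
    root = ET.fromstring(ctd_xml)
    return [(item.get("name"), item.get("type"), item.get("value"))
            for item in root.iter("ITEM")]


def split_ports(params):
    """Separate file parameters (candidate node ports) from ordinary options."""
    inputs = [name for name, ptype, _ in params if ptype == "input-file"]
    outputs = [name for name, ptype, _ in params if ptype == "output-file"]
    options = [name for name, ptype, _ in params
               if ptype not in ("input-file", "output-file")]
    return inputs, outputs, options


if __name__ == "__main__":
    params = list_parameters(CTD_EXAMPLE)
    print(split_ports(params))  # (['in'], ['out'], ['threads'])
```

Keeping the description declarative means the wrapper never has to inspect the C++ code: any executable that can emit such a file can, in principle, be turned into a node.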

Installation of the OpenMS plugin

• Community-contributions update site (stable & trunk) – Bioinformatics & NGS

• Provides > 180 OpenMS TOPP tools as community nodes

– SILAC, iTRAQ, TMT, label-free, SWATH, SIP, …

– Search engines: OMSSA, MASCOT, X!TANDEM, MSGFplus, …

– Protein inference: FIDO

Data Flow in Shotgun Proteomics

[Figure: Sample → HPLC/MS → raw data (100 GB) → signal processing → peak maps (1 GB) → data reduction (50 MB) → identification → annotated maps (50 MB) → differential quantification → differentially expressed proteins (50 kB)]

Quantification Strategies

Quantitative Proteomics

• Relative Quantification

  – Labeled

    – In vivo: ¹⁴N/¹⁵N, SILAC

    – In vitro: iTRAQ, TMT, ¹⁶O/¹⁸O

  – Label-Free: Spectral Counting, MRM, Feature-Based

• Absolute Quantification: AQUA, SISCAPA

After: Lau et al., Proteomics, 2007, 7, 2787

Quantitative Data – LC-MS Maps

• Spectra are acquired with rates up to dozens per second

• Stacking the spectra yields maps

• Resolution:

– Up to millions of points per spectrum

– Tens of thousands of spectra per LC run

• Huge 2D datasets of up to hundreds of GB per sample

• MS intensity follows the chromatographic concentration
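The "stacking" idea can be sketched in a few lines of Python: each spectrum becomes one row of a retention time × m/z grid, and summing the intensity around one m/z across rows gives an extracted ion chromatogram, which rises and falls with the analyte's chromatographic concentration. This is a toy sketch under stated assumptions: the `Spectrum` class, the fixed-width m/z binning and the data are all hypothetical, and a dense grid like this is only practical for tiny examples, not for maps with millions of points.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    rt: float                                   # retention time (s)
    peaks: list = field(default_factory=list)   # (m/z, intensity) pairs


def stack_to_map(spectra, mz_min, mz_max, bin_width):
    """Stack spectra by RT into a 2D grid: one row per spectrum, one column per m/z bin."""
    n_bins = math.ceil((mz_max - mz_min) / bin_width)
    ordered = sorted(spectra, key=lambda s: s.rt)
    grid = []
    for spectrum in ordered:
        row = [0.0] * n_bins
        for mz, intensity in spectrum.peaks:
            idx = int((mz - mz_min) // bin_width)
            if 0 <= idx < n_bins:           # ignore peaks outside the m/z range
                row[idx] += intensity
        grid.append(row)
    return [s.rt for s in ordered], grid


def extracted_ion_chromatogram(spectra, target_mz, tol):
    """Summed intensity within +/- tol of target_mz for each spectrum, ordered by RT."""
    return [(s.rt, sum(i for mz, i in s.peaks if abs(mz - target_mz) <= tol))
            for s in sorted(spectra, key=lambda s: s.rt)]


# Hypothetical toy data: one analyte at m/z ~500.25 eluting over three scans.
spectra = [
    Spectrum(11.0, [(500.26, 400.0), (620.30, 55.0)]),
    Spectrum(10.0, [(500.25, 100.0), (620.30, 50.0)]),
    Spectrum(12.0, [(500.25, 150.0)]),
]
rts, grid = stack_to_map(spectra, mz_min=400.0, mz_max=700.0, bin_width=1.0)
xic = extracted_ion_chromatogram(spectra, target_mz=500.25, tol=0.02)
print(rts)  # [10.0, 11.0, 12.0]
print(xic)  # [(10.0, 100.0), (11.0, 400.0), (12.0, 150.0)]
```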

LC-MS Data (Map)

[Figure: LC-MS map → quantification (15 nmol/µl, 3x over-expressed, …)]

Label-Free Quantification (LFQ)

• Label-free quantification is probably the most natural way of quantifying

– No labeling required, which removes further sources of error; no restriction on sample generation; cheap

– Data on different samples are acquired in different measurements, so higher reproducibility is needed

– Manual analysis is difficult

– Scales very well with the number of samples: basically no limit, and no difference in the analysis between 2 or 100 samples

LFQ – Analysis Strategy

1. Find features in all maps

2. Align maps

3. Link corresponding features

4. Identify features (example: peptide GDAFFGMSCK)

5. Quantify (example: relative abundances 1.0 : 1.2 : 0.5)
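The final quantification step reduces to simple arithmetic once a feature has been linked across maps: each map's intensity is divided by the intensity in a chosen reference map. The sketch below reproduces a ratio like 1.0 : 1.2 : 0.5 from hypothetical intensities; `relative_ratios` and the numbers are assumptions for illustration, and a real analysis would add normalization and statistical testing on top.

```python
def relative_ratios(intensities, reference_index=0):
    """Express each intensity relative to the one in the reference map."""
    reference = intensities[reference_index]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return [round(value / reference, 2) for value in intensities]


# Hypothetical intensities of one linked feature (identified as GDAFFGMSCK)
# in three LC-MS maps; real values would come from the feature finder.
intensities = [2.0e6, 2.4e6, 1.0e6]
print(" : ".join(f"{r:.1f}" for r in relative_ratios(intensities)))  # 1.0 : 1.2 : 0.5
```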

Feature-Based Alignment

• LC-MS maps can contain millions of peaks

• Retention times of peptides and metabolites can shift between experiments

• In label-free quantification, maps therefore need to be aligned in order to identify corresponding features

• Alignment can be done on the raw maps (where it is usually called ‘dewarping’) or on already detected features

• The latter is simpler, as it does not require aligning millions of peaks, only tens of thousands of features

• Disadvantage: it relies on accurate feature finding
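A minimal sketch of feature-based alignment, assuming the simplest possible scheme: pair each feature with the reference feature closest in m/z, then fit a linear retention time transformation `rt_ref = slope * rt + intercept` by least squares. This is a deliberately naive stand-in (nearest-m/z matching plus linear regression), not the algorithm OpenMS implements; the function names and feature lists are hypothetical. It also shows why the approach depends on good feature finding: wrong or missing features produce wrong pairs.

```python
def match_by_mz(features, reference, mz_tol):
    """Pair each (rt, mz) feature with the reference feature closest in m/z."""
    pairs = []
    for rt, mz in features:
        best = None  # (m/z distance, reference rt)
        for ref_rt, ref_mz in reference:
            dist = abs(mz - ref_mz)
            if dist <= mz_tol and (best is None or dist < best[0]):
                best = (dist, ref_rt)
        if best is not None:
            pairs.append((rt, best[1]))
    return pairs


def fit_linear_rt_transform(pairs):
    """Least-squares fit of rt_ref = slope * rt + intercept over matched pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched features")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        raise ValueError("retention times of matched features must differ")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical feature lists (rt in s, m/z); the second run elutes slightly earlier.
reference = [(100.0, 500.25), (200.0, 612.80), (300.0, 733.40)]
features = [(95.0, 500.251), (190.0, 612.799), (285.0, 733.402)]
slope, intercept = fit_linear_rt_transform(match_by_mz(features, reference, mz_tol=0.01))
aligned = [(slope * rt + intercept, mz) for rt, mz in features]
```

Only three matched features are needed here instead of every raw peak, which is the practical advantage named above.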

Feature-Based Alignment

[Figure: the same LC-MS map shown as ~350,000 peaks and as ~700 features]

Feature Finding

• Identify all peaks belonging to one peptide

• Key idea:

– Identify promising regions, i.e. seeds (e.g. the highest peaks)

– Fit a model to that region and identify peaks explained by it

Feature Finding

• Extension: collect all data points close to the seed

• Refinement: remove peaks that are not consistent with the model

• Fit an optimal model for the reduced set of peaks

• Iterate this until no further improvement can be achieved
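The seed–extension–refinement loop above can be sketched in Python under strong simplifying assumptions: a feature is a single m/z trace (no isotope pattern), the model is a Gaussian elution profile in retention time, and the "fit" uses intensity-weighted moments rather than an optimal least-squares fit. "No further improvement" is approximated as "the point set stops changing". All names, tolerances and data are hypothetical; this is not the OpenMS feature finder.

```python
import math

# A data point of an LC-MS map: (retention time [s], m/z, intensity).


def gaussian(rt, amplitude, mean, sigma):
    """Gaussian elution profile used as a (simplified) feature model."""
    return amplitude * math.exp(-0.5 * ((rt - mean) / sigma) ** 2)


def fit_model(points):
    """Moment-based estimate of (amplitude, mean RT, RT sigma) from the points."""
    total = sum(i for _, _, i in points)
    mean = sum(rt * i for rt, _, i in points) / total
    var = sum(i * (rt - mean) ** 2 for rt, _, i in points) / total
    sigma = math.sqrt(var) if var > 0 else 1e-9  # avoid division by zero
    amplitude = max(i for _, _, i in points)
    return amplitude, mean, sigma


def find_feature(points, rt_window, mz_tol, rel_tol=0.5, max_iter=10):
    # Seed: the most intense data point.
    seed = max(points, key=lambda p: p[2])
    # Extension: collect all points close to the seed in RT and m/z.
    region = [p for p in points
              if abs(p[0] - seed[0]) <= rt_window and abs(p[1] - seed[1]) <= mz_tol]
    model = fit_model(region)
    for _ in range(max_iter):
        # Refinement: drop points whose intensity the model does not explain.
        kept = [p for p in region
                if abs(p[2] - gaussian(p[0], *model)) <= rel_tol * gaussian(p[0], *model)]
        if not kept or kept == region:
            break  # no change (or everything would be removed): stop iterating
        # Refit the model on the reduced set and iterate.
        region = kept
        model = fit_model(region)
    return model, region


# Hypothetical toy map: a Gaussian elution profile at m/z 500.25 (sigma 3 s),
# one noise point on the same m/z trace and one peak of another analyte.
elution = [(rt, 500.25, 1000.0 * math.exp(-0.5 * ((rt - 60.0) / 3.0) ** 2))
           for rt in range(54, 67, 2)]
points = elution + [(70, 500.25, 50.0), (60, 700.40, 500.0)]
model, feature_points = find_feature(points, rt_window=12.0, mz_tol=0.02)
```

In this toy example the extension step picks up the noise point at 70 s, the first refinement removes it because the model predicts almost no intensity there, and the second pass changes nothing, so the loop stops with the seven points of the elution profile.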

Multiple Alignment

[Figure: maps 1, 2, …, k (axes: rt, m/z), each mapped by a transformation T1, T2, …, Tk onto a common consensus map]

• Dewarp k maps onto a comparable coordinate system

• Choose one map (usually the one with the largest number of features) as the reference map (here: map 2, so T2 = 1, i.e. the identity)
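The reference-map strategy can be sketched as a "star" alignment: pick the map with the most features as reference, give it the identity transformation, and estimate a transformation T_i for every other map against it. To keep the sketch short it assumes each T_i is a constant retention time shift, estimated as the median difference between features matched by nearest m/z; real transformations are usually non-linear, and `align_to_reference`, the tolerance and the data are hypothetical.

```python
from statistics import median


def nearest_rt_by_mz(mz, reference, mz_tol):
    """RT of the reference feature closest in m/z (within mz_tol), or None."""
    best = None  # (m/z distance, reference rt)
    for ref_rt, ref_mz in reference:
        dist = abs(mz - ref_mz)
        if dist <= mz_tol and (best is None or dist < best[0]):
            best = (dist, ref_rt)
    return None if best is None else best[1]


def align_to_reference(maps, mz_tol=0.01):
    """Star alignment: the map with most features is the reference; every map
    gets a constant RT shift, so that T_i(rt) = rt + shift_i."""
    ref_idx = max(range(len(maps)), key=lambda i: len(maps[i]))
    reference = maps[ref_idx]
    shifts = []
    for i, features in enumerate(maps):
        if i == ref_idx:
            shifts.append(0.0)  # reference map: identity transformation
            continue
        deltas = []
        for rt, mz in features:
            ref_rt = nearest_rt_by_mz(mz, reference, mz_tol)
            if ref_rt is not None:
                deltas.append(ref_rt - rt)
        # The median is less sensitive to occasional mismatched pairs than the mean.
        shifts.append(median(deltas) if deltas else 0.0)
    return ref_idx, shifts


# Hypothetical (rt, m/z) feature lists of three maps; the second has the most features.
maps = [
    [(105.0, 500.25), (207.0, 612.80)],
    [(100.0, 500.25), (200.0, 612.80), (300.0, 733.40)],
    [(92.0, 500.25), (193.0, 612.80)],
]
ref_idx, shifts = align_to_reference(maps)
print(ref_idx, shifts)  # 1 [-6.0, 0.0, 7.5]
```

As on the slide, the second map (index 1) becomes the reference and its transformation is the identity.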

LFQ with OpenMS in KNIME

• Identification

• Feature finding and mapping

• Map alignment

• Feature linking

• Statistical analysis with R Snippets

• Visualization with KNIME plotting nodes

[Figure: KNIME workflow in three parts: preprocessing of single maps; combining information of maps; statistical post-processing and visualization]
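Feature linking, the step that combines information across maps, groups aligned features that lie within a retention time and m/z tolerance into consensus features, with at most one feature per map. The Python sketch below uses a simple greedy strategy against running group centroids; it is a hypothetical illustration (function name, tolerances and data are assumptions), not the algorithm behind the OpenMS linking tools. Each resulting group yields one row of intensities that can be passed on to statistical analysis.

```python
def link_features(maps, rt_tol, mz_tol):
    """Greedily link aligned (rt, mz, intensity) features into consensus features.

    Returns a list of dicts {map_index: (rt, mz, intensity)}.
    """
    consensus = []
    for map_idx, features in enumerate(maps):
        for rt, mz, intensity in features:
            match = None
            for group in consensus:
                if map_idx in group:
                    continue  # at most one feature per map in a consensus feature
                c_rt = sum(f[0] for f in group.values()) / len(group)
                c_mz = sum(f[1] for f in group.values()) / len(group)
                if abs(rt - c_rt) <= rt_tol and abs(mz - c_mz) <= mz_tol:
                    match = group
                    break
            if match is None:  # no compatible group: start a new consensus feature
                match = {}
                consensus.append(match)
            match[map_idx] = (rt, mz, intensity)
    return consensus


# Hypothetical aligned feature maps: (rt [s], m/z, intensity).
maps = [
    [(100.0, 500.25, 2.0e6), (200.0, 612.80, 5.0e5)],
    [(101.0, 500.26, 2.4e6), (205.0, 612.80, 6.0e5)],
    [(99.5, 500.25, 1.0e6)],
]
groups = link_features(maps, rt_tol=3.0, mz_tol=0.02)
for group in groups:
    print([group[i][2] if i in group else None for i in range(len(maps))])
```

With these tolerances the first feature is found in all three maps, while the two features near m/z 612.80 stay separate because their retention times differ by more than 3 s.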
