Model management tools for improved reproducibility in systems biology
Dagmar Waltemath, on behalf of the SEMS team
University of Rostock, Germany
10th International CellML Workshop Auckland, June 2016
2
On models and simulations
Model Simulation
Figs: BioModels (top) and DOI: 10.1073/pnas.88.16.7328 (bottom)
3
Most scientific discoveries rely on previous findings.
Model
Fig.: Tyson 2001 (BIOM195)
Fig.: Tyson 1991 (BIOM005)
Successor
Fig.: History of Cell Cycle models in BioModels
4
Can we rely on findings that we ourselves cannot evaluate? (Probably not!)
“only in ~20–25% of the projects were the relevant published data completely in line with our in-house findings (Fig. 1c). In almost two-thirds of the projects, there were inconsistencies [..] that either considerably prolonged the duration of the target validation process or, in most cases, resulted in termination of the projects because the evidence [..] was insufficient to justify further investments into these projects.” Prinz et al (2011)
5
We identified key challenges of reproducibility in systems biology and systems medicine.
– Lack of data standards
– Lack of data quality and quantity
– Lack of data availability
– Lack of transparency
6
A lack of data availability makes it impossible for researchers to reproduce results.
● Model code in BioModels, including a supplement with a how-to for reproducing the figures given in the original paper
● Online tool makes data available and browsable
TriplexRNA
Recon 2
● Publication backed up with a website containing the supplemental material
● Model code in (non-curated) BioModels
● Visualisation of the model can easily be explored
● References to original works
How can we support scientists who wish to share model-based results?
Issues
– Simulation studies comprise several files
– Data is heterogeneous, distributed, complex
– Data changes over time
– Documentation of how the study was performed is often missing
7
A lack of data availability makes it impossible for researchers to reproduce results.
Our solutions
– Tool support for the COMBINE Archive: lowering the effort to share reproducible models
– Graph-based storage of model-related files: integrated & searchable virtual experiments
– Model version control: towards a provenance of models
8
The COMBINE archive bundles all files necessary to reproduce a simulation study.
COMBINE archive toolkit
● Manage COMBINE archives
– Explore
– Edit
– Share
– Publish
● Used in: PMR 2, JWS Online, SED-ML Web Tools, OpenCOR …
WebCAT, Scharm et al 2014
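To make the bundling idea concrete: a COMBINE archive is a ZIP file whose manifest.xml lists every contained file together with a format identifier. The following is a minimal sketch using only the Python standard library, not the CombineArchive Toolkit itself. The file names, the placeholder file contents, and the helper names build_manifest and write_archive are assumptions made for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"


def build_manifest(entries):
    """Return manifest.xml bytes for (location, format, is_master) entries."""
    ET.register_namespace("", OMEX_NS)
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    # The archive itself and the manifest are listed alongside the payload files.
    ET.SubElement(root, f"{{{OMEX_NS}}}content",
                  {"location": ".", "format": SPEC + "omex"})
    ET.SubElement(root, f"{{{OMEX_NS}}}content",
                  {"location": "./manifest.xml", "format": SPEC + "omex-manifest"})
    for location, fmt, is_master in entries:
        attrs = {"location": "./" + location, "format": SPEC + fmt}
        if is_master:
            attrs["master"] = "true"  # marks the file a tool should open first
        ET.SubElement(root, f"{{{OMEX_NS}}}content", attrs)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_archive(files, master):
    """Pack {name: (format, bytes)} into an in-memory COMBINE archive."""
    entries = [(name, fmt, name == master) for name, (fmt, _) in files.items()]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(entries))
        for name, (_, data) in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical study: one model file and one simulation description.
study = {
    "model.xml": ("sbml", b"<sbml/>"),            # placeholder, not a valid model
    "simulation.sedml": ("sed-ml", b"<sedML/>"),  # placeholder, not valid SED-ML
}
omex_bytes = write_archive(study, master="simulation.sedml")
```

Writing omex_bytes to a file with the .omex extension would give a single shareable file; real studies would replace the placeholders with actual model and simulation documents.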
9
STON, SED-ML DB & MASYMOS
Integrated storage & retrieval system (MASYMOS)
doi: 10.1093/database/bau130
doi: 10.1186/s13326-015-0014-4
Search across heterogeneous data, ontologies, and structures → poster
Tailor-made storage systems (STON, SED-ML DB)
Using graph databases to integrate standardised model-based data
https://dx.doi.org/10.6084/m9.figshare.3382993.v1
SED-ML DB in JWS Online
BioModels
Physiome Model repository
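The idea of linking models, simulation descriptions, and ontology terms in one graph can be sketched with a tiny in-memory structure. This is an illustration only and does not reflect the MASYMOS schema or its database backend; the class ModelGraph, the edge labels SIMULATES and ANNOTATED_WITH, and all node identifiers are hypothetical.

```python
from collections import defaultdict


class ModelGraph:
    """Tiny in-memory graph: typed nodes joined by labelled edges."""

    def __init__(self):
        self.nodes = {}                  # node id -> {"kind": ...}
        self.reverse = defaultdict(set)  # (target id, label) -> set of source ids

    def add_node(self, node_id, kind):
        self.nodes[node_id] = {"kind": kind}

    def add_edge(self, source, label, target):
        self.reverse[(target, label)].add(source)

    def simulations_for_term(self, term):
        """Simulation descriptions whose model is annotated with the term."""
        found = set()
        for model in self.reverse[(term, "ANNOTATED_WITH")]:
            found |= self.reverse[(model, "SIMULATES")]
        return sorted(found)


graph = ModelGraph()
# All identifiers below are made up for the example.
graph.add_node("model_a", "model")
graph.add_node("model_b", "model")
graph.add_node("sim_1", "simulation")
graph.add_node("sim_2", "simulation")
graph.add_node("term:cell_cycle", "ontology_term")

graph.add_edge("model_a", "ANNOTATED_WITH", "term:cell_cycle")
graph.add_edge("model_b", "ANNOTATED_WITH", "term:cell_cycle")
graph.add_edge("sim_1", "SIMULATES", "model_a")
graph.add_edge("sim_2", "SIMULATES", "model_b")

matches = graph.simulations_for_term("term:cell_cycle")
```

The point of the graph layout is that one query can cross file types: it starts at an ontology term, walks to annotated models, and ends at the simulation descriptions that use them.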
10
BiVeS & COMODI
Model version control (BiVeS, COMODI)
Provenance-to-be (COMODI)
Tracking the evolution of a CellML/SBML model over time
doi: 10.1093/bioinformatics/btv484
Tracking the evolution of simulation studies and biological systems.
https://dx.doi.org/10.6084/m9.figshare.2543059.v5
Physiome Model repository
doi: 10.1093/bioinformatics/btv484
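As a rough illustration of what tracking model evolution involves, the sketch below compares two versions of an XML-encoded model by element id and reports inserted, deleted, and updated elements. It is a deliberately naive stand-in, not the BiVeS algorithm, which is described in the paper cited above. The function names and the two toy model versions are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def elements_by_id(xml_text):
    """Map each element's id attribute to its (tag, attributes) pair."""
    root = ET.fromstring(xml_text)
    return {
        el.get("id"): (el.tag, dict(el.attrib))
        for el in root.iter()
        if el.get("id")
    }


def diff_versions(old_xml, new_xml):
    """Classify elements as inserted, deleted, or updated between two versions."""
    old, new = elements_by_id(old_xml), elements_by_id(new_xml)
    return {
        "inserted": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(i for i in old.keys() & new.keys() if old[i] != new[i]),
    }


# Two toy versions of a model (not valid SBML or CellML).
OLD = '<model id="m"><species id="A" initialAmount="1"/><species id="B" initialAmount="0"/></model>'
NEW = '<model id="m"><species id="A" initialAmount="2"/><species id="C" initialAmount="0"/></model>'

changes = diff_versions(OLD, NEW)
```

Matching on ids alone misses moved or renamed elements, which is one reason a dedicated tree-mapping approach is needed for real model repositories.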
11
What's next? Models for the clinic, or: Bridging the gap between standards for systems biology & systems medicine
Fig. courtesy Atalag et al (2015) http://hdl.handle.net/2292/27911
Thank you for your attention.
@SemsProject
Martin Scharm: BiVeS, COMODI, COMBINE Archive Video master
Fabienne Lambusch: Pattern & structure search in SBML models
Mariam Nassar: Rank aggregation
Tom Gebhardt: SBGN-compliant diffs
Martin Peters: M2CAT, COMBINE Archive, SED-ML database
Vasundra Toure: STON, SBGN-ED, SBGN symbol of the month
Ron Henkel: MASYMOS, MORRE
www.sems.uni-rostock.de
References
Atalag et al. (2015) http://hdl.handle.net/2292/27911
Bergmann et al. (2014) F.T. Bergmann, R. Adams, S. Moodie, J. Cooper, M. Glont et al.: COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics (2014)
Prinz et al. (2011) Prinz, Florian, Thomas Schlange, and Khusru Asadullah. "Believe it or not: how much can we rely on published data on potential drug targets?" Nature Reviews Drug Discovery 10.9 (2011): 712.
Schmitz et al. (2014) Schmitz, Ulf, et al. "Cooperative gene regulation by microRNA pairs and their identification using a computational workflow." Nucleic Acids Research (2014): gku465.
Thiele et al. (2013) Thiele, Ines, et al. "A community-driven global reconstruction of human metabolism." Nature Biotechnology 31.5 (2013): 419-425.
Waltemath & Scharm (2014) D. Waltemath and M. Scharm: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit. Workshop on Data Management for the Life Sciences (2014), Hamburg, BTW 2014.
Waltemath & Wolkenhauer (2016) D. Waltemath and O. Wolkenhauer: How modeling standards, software, and initiatives support reproducibility in systems biology and systems medicine. IEEE Transactions on Biomedical Engineering (2016), in press.