Open innovation in the agri-food sector
Mark Forster, Syngenta R&D Chemical Research, March 2015
Classification: Public
2
Outline of presentation.
Software
Free / Libre Open Source Software (FLOSS)
Scientific open source
Open source in Syngenta
Open access data
Chemical data resources
Data searching and comparison
Projects
OpenPHACTS
Elixir
Chemical screening
Plant science data sharing
3
Software.
Free / Libre Open Source Software (FLOSS)
Open source - code freely available
Free - No (license) cost, gratis, free as in beer.
Libre - Free to modify, distribute, free as in speech.
Copyright - Protect rights of copyright holder, re: distribution
Copyleft - Protect author rights for sharing, derivative works
Licenses
Permissive - e.g. BSD. Derivative code can be closed, sold, not shared.
Restrictive - e.g. GPL. Derivatives open, community orientation, binary distribution optional.
4
Examples of open source.
From the smallest device to the largest supercomputers
From the smallest to the largest companies
5
Scientific open source.
Long history – QCPE (1962)
Sourceforge, Google Code (2012)
Essential for reproducible science
Benefit of easier large scale deployment (cloud)
6
Open source in Syngenta.
Codes for docking or MD can be cumbersome
- Text file input / output, editing required!
Solution was to ‘automate’ editing using Perl
Can the automation script be released?
Issues: Warranty, Liability, Support, Host, IP ours/3rd party
Outcome: Code released 2007
First ‘official’ open source release by Syngenta
7
Open source in Syngenta.
Contribute to existing projects
MZmine – GC-MS metabolomics
Create new projects
LICSS - Chemical features in Excel
PDBclimb - 3D command line molecule builder
2011 - Simplified process
Open source checklist
8
Open source in Syngenta.
Sponsor / support industry and academic open source dialogue
Wellcome Trust, EBI Industry programme funded ‘retreat’ meetings:
2011 – Molecular Informatics Open Source Software (MIOSS)
2012 – Systems, pathways, interactions, networks (SPIN-OSS)
9
Open access data in Syngenta.
We use public life science data of many relevant types
Genome, Transcriptome, DNA/RNA sequence,
Protein sequence, Protein structure, Chemical bioactivity
10
Elixir – European Life Science Infrastructure.
The pan-European sustainable life science data infrastructure.
Supporting life science research, applications and industry.
11
Open access chemical structure / property data
Discuss three significant providers of open access chemistry resources:
ChemSpider (RSC), PubChem (NIH), ChEMBL and ChEBI (EBI).
12
Chemical data resources
Chemical data resources hold tens of millions of structures
Manage ‘large’ chemical structure data sets
eMolecules February 2015 catalogue - 7M
Syngenta corporate – several million.
Syngenta vendor cumulative – 25M+
ChEMBL (public bioactivity data) – 1.4M
SureChEMBL (patent data) – 15M
ZINC ‘All Purchasable’ – 19M
ZINC ‘Boutique’ – 12M
Enamine virtual catalogue - 26M
13
Open access chemical structure / property data
ChEMBL target based search.
14
Open access chemical structure / property data
ChEMBL – filter for most active compounds against given target
Individual compound record
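The target-based filtering described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the ChEMBL interface: the record layout, compound names, target names and IC50 values below are all hypothetical, and a real workflow would pull the records from ChEMBL itself.

```python
import math

def p_activity(value_nm: float) -> float:
    """Convert a potency in nM (e.g. an IC50) to its negative log10 molar value."""
    return 9.0 - math.log10(value_nm)

def most_active(records, target, top_n=3):
    """Return the top_n records for one target, most potent (lowest IC50) first."""
    on_target = [r for r in records if r["target"] == target]
    return sorted(on_target, key=lambda r: r["ic50_nm"])[:top_n]

# Hypothetical bioactivity records, invented for illustration only.
records = [
    {"compound": "CMPD-A", "target": "TARGET-1", "ic50_nm": 250.0},
    {"compound": "CMPD-B", "target": "TARGET-1", "ic50_nm": 10.0},
    {"compound": "CMPD-C", "target": "TARGET-2", "ic50_nm": 1.0},
    {"compound": "CMPD-D", "target": "TARGET-1", "ic50_nm": 1000.0},
]

best = most_active(records, "TARGET-1", top_n=2)
for r in best:
    print(r["compound"], round(p_activity(r["ic50_nm"]), 2))
```

Sorting on the raw IC50 and reporting the log-scaled value keeps the ranking exact while giving a number that is easier to compare across compounds.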
15
ChEMBL – Pesticide data integration.
ChEMBL public bioactivity data source.
Curated data from medicinal chemistry literature
Syngenta - ChEMBL collaboration:
Pesticide focused literature curation January 2013-June 2013
Data quality control and review – July 2013-June 2014.
28k compounds not in ChEMBL.
Data public availability in August 2014.
16
Chemical data set comparisons.
International Chemical Identifier (InChI)
Developed by IUPAC / NIST
Unique textual identifier for chemical structures
Generated by an algorithm not a committee or authority.
Ethanol CH3CH2OH
SMILES CCO
InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 (standard InChI)
InChIKey LFQSCWFLJHTTHZ-UHFFFAOYSA-N
Utility: Uniquely valuable for chemical identity checking, and for cross
linking the same structure in different data sets.
Sophisticated treatment of stereoisomerism, salts, tautomers,
isotopes etc. Now used in Syngenta registration, much faster
at pre-checking identity than a purely commercial solution.
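As a rough illustration of the cross-linking use case, the Python sketch below matches two data sets on their InChIKeys. It assumes the keys have already been generated by an InChI-capable toolkit; the registry names and local identifiers are hypothetical. The ethanol key is the one shown on this slide, and the other two are the standard keys for methanol and water.

```python
import re

# Standard InChIKey layout: 14 letters (connectivity hash), a hyphen,
# 8 letters (remaining layers) plus a flag letter ('S' = standard InChI,
# 'N' = non-standard) and a version letter, a hyphen, one protonation letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{8}[SN][A-Z]-[A-Z]$")

def is_inchikey(key: str) -> bool:
    """Check the layout of a key only; this does not validate the chemistry."""
    return bool(INCHIKEY_RE.match(key))

def cross_link(left: dict, right: dict) -> dict:
    """Map each InChIKey found in both data sets to (left_id, right_id)."""
    shared = left.keys() & right.keys()
    return {key: (left[key], right[key]) for key in sorted(shared)}

# Hypothetical data sets: InChIKey -> local identifier.
corporate = {
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N": "CORP-0001",  # ethanol
    "OKKJLVBELUTLKV-UHFFFAOYSA-N": "CORP-0002",  # methanol
}
vendor = {
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N": "VEND-9917",  # ethanol
    "XLYOFNOQVPJJNP-UHFFFAOYSA-N": "VEND-0042",  # water
}

links = cross_link(corporate, vendor)
print(links)
```

Because the key is a fixed-length string, the comparison reduces to a dictionary lookup, which is one plausible reason identity pre-checking on InChIKeys is fast.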
17
Chemical data set comparisons.
Fast similarity search: Open Babel search index (fs file) creation
Perl scripted search of data in SMILES format
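The index-then-search pattern can be sketched in plain Python. This is not Open Babel's fastsearch format or its fingerprints: the fingerprint here is a deliberately crude toy (single characters and adjacent character pairs of the SMILES string), used only so the example runs without a chemistry toolkit. The Tanimoto coefficient is the similarity measure commonly applied to fingerprints.

```python
def toy_fingerprint(smiles: str) -> frozenset:
    """Toy stand-in for a real fingerprint: characters plus adjacent pairs."""
    singles = set(smiles)
    pairs = {smiles[i:i + 2] for i in range(len(smiles) - 1)}
    return frozenset(singles | pairs)

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Shared features divided by total distinct features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def build_index(smiles_list):
    """Precompute fingerprints once so that each search is a cheap scan."""
    return [(s, toy_fingerprint(s)) for s in smiles_list]

def search(index, query: str, threshold: float = 0.5):
    """Return (score, smiles) pairs at or above threshold, best first."""
    qfp = toy_fingerprint(query)
    scored = [(tanimoto(qfp, fp), s) for s, fp in index]
    return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)

index = build_index(["CCO", "CCN", "c1ccccc1"])
hits = search(index, "CCO", threshold=0.3)
for score, smiles in hits:
    print(f"{score:.2f} {smiles}")
```

Precomputing the fingerprints is the point of an index file: the expensive step runs once per data set, and each query only compares bit sets.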
18
Open innovation in plant sciences
Donate data to open access public resources
Rice genome sequence (Goff et al. 2002)
Tomato sequencing (Sato et al. 2012)
19
Open innovation in plant sciences
1999 – A strain of wheat stem rust identified in Uganda (Ug99).
Identify disease resistance markers and related genes.
Germplasm freely available to wheat breeding organizations.
CIMMYT, Syngenta, Cornell University, Gates Foundation.
‘A breakthrough in breeding technology for wheat rust.’
20
Open innovation – chemical screening.
IVCC – Innovative Vector Control Consortium.
Screening of Syngenta compounds for anti-mosquito activity.
Public-private partnership funded by the Gates Foundation.
21
Open innovation – chemical screening.
Liverpool School of Tropical Medicine, neglected tropical diseases (NTD)
Screening of Syngenta compounds for Plasmodium / TB activity.
Agreed project (early stage).
Chemistry donated in a non-competing field.
22
Open innovation – chemical data integration.
• Open Pharmacological Concept Triple Store – http://www.openphacts.org
• EU (IMI) and pharma company funded (16M USD, 3.5 years)
• Use ‘semantic web’ methods to integrate data across R&D process.
• Commercially hosted data store (public data), agreed ontologies.
• Exemplar applications (commercial and/or open source)
• Now OpenPHACTS foundation.
23
Acknowledgements.
All colleagues engaged in the work presented here:
M.Robinson (Syngenta foundation)
P.Wege (IVCC, LSTM, ChEMBL)
M.Earll, C.Pudney, M.Seymour (MZmine)
K.Lawson (LICSS Spreadsheet)
S.Albinson (Open source checklist)
B.Dietrich, R.Cade (Tomato)
S.Goff & team (Rice)
OpenPHACTS team
J.Overington and staff of EBI / Elixir
And any others I have missed.
Questions?