Anshu Bhardwaj
Scientist & Community Builder
OSDD, CSIR
India
Open Source Drug Discovery (OSDD) Connecting Minds & Machines
A CSIR led team India consortium with global partnership for
affordable healthcare for all
National Knowledge Network “First Annual Workshop” “The e-Infrastructure of India” 31st Oct – 1st Nov 2012
First Disease Target: Tuberculosis; Now Extended to Malaria
Tuberculosis (TB) is one of the leading causes of fatality, ranking second only to HIV as the killer infectious disease of adults worldwide.
Source:
http://www.globalhealthfacts.org/data/topic/map.aspx?ind=12
OSDD Focus : Tropical Neglected Diseases
At least one person in
the world is newly
infected with TB bacilli
every second
Over 1000 deaths a day or
3 deaths every 2 mins
New TB cases 2010
No New TB Drugs in the Past 50 Years
Research Spending Per New Drug
Company | Number of drugs approved | R&D Spending Per Drug ($Mil) | Total R&D Spending 1997-2011 ($Mil)
AstraZeneca | 5 | 11,790.93 | 58,955
GlaxoSmithKline | 10 | 8,170.81 | 81,708
Sanofi | 8 | 7,909.26 | 63,274
Roche Holding AG | 11 | 7,803.77 | 85,841
Pfizer Inc. | 14 | 7,727.03 | 108,178
Johnson & Johnson | 15 | 5,885.65 | 88,285
Eli Lilly & Co. | 11 | 4,577.04 | 50,347
Abbott Laboratories | 8 | 4,496.21 | 35,970
Merck & Co Inc | 16 | 4,209.99 | 67,360
Bristol-Myers Squibb Co. | 11 | 4,152.26 | 45,675
Novartis AG | 21 | 3,983.13 | 83,646
Amgen Inc. | 9 | 3,692.14 | 33,229
Slate’s Bad Math: $55 million on each new drug
Source: http://www.forbes.com/sites/matthewherper/2012/02/10/the-truly-staggering-cost-of-inventing-new-drugs/
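The per-drug column in the table is simply total R&D spending divided by the number of approved drugs. A minimal sketch that recomputes it from the table values follows; the variable and function names are illustrative, and small differences from the printed figures are expected because the totals in the table are rounded.

```python
# (company, drugs approved, total R&D spending 1997-2011 in $ million),
# copied from the table above.
rows = [
    ("AstraZeneca", 5, 58955),
    ("GlaxoSmithKline", 10, 81708),
    ("Sanofi", 8, 63274),
    ("Roche Holding AG", 11, 85841),
    ("Pfizer Inc.", 14, 108178),
    ("Johnson & Johnson", 15, 88285),
    ("Eli Lilly & Co.", 11, 50347),
    ("Abbott Laboratories", 8, 35970),
    ("Merck & Co Inc", 16, 67360),
    ("Bristol-Myers Squibb Co.", 11, 45675),
    ("Novartis AG", 21, 83646),
    ("Amgen Inc.", 9, 33229),
]


def spend_per_drug(total_musd, n_drugs):
    """R&D spending per approved drug, in $ million."""
    return total_musd / n_drugs


per_drug = {name: spend_per_drug(total, n) for name, n, total in rows}

for name, value in per_drug.items():
    print(f"{name}: {value:,.2f} $Mil per drug")
```

Every company in the table comes out at several thousand million dollars per approved drug.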
Drug Discovery is a Long Risky process with Low Probability of Success
http://www.bayerpharma.com/en/research-and-development/processes/index.php
Prediction of non-toxic targets & inhibitors
Efficacy: Inhibitor should target the right protein in the pathogen (Mycobacterium tuberculosis)
Toxicity: Inhibitor should not target any crucial protein in the host (human)
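The two criteria above can be read as a simple filter over candidate targets. The sketch below is only an illustration of that idea, not the OSDD pipeline: the protein names, the essentiality flag, the identity scores, and the 0.3 cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                  # hypothetical protein label
    essential_in_mtb: bool     # efficacy proxy: the pathogen needs this protein
    max_human_identity: float  # toxicity proxy: best sequence identity to any human protein (0 to 1)


def prioritize(candidates, identity_cutoff=0.3):
    """Keep pathogen-essential proteins that have no close human counterpart."""
    return [
        c.name
        for c in candidates
        if c.essential_in_mtb and c.max_human_identity < identity_cutoff
    ]


# Made-up example data.
candidates = [
    Candidate("protein_A", True, 0.12),   # essential, no close human homologue
    Candidate("protein_B", True, 0.65),   # essential, but similar to a human protein
    Candidate("protein_C", False, 0.05),  # not essential in the pathogen
]

selected = prioritize(candidates)
print(selected)
```

In practice both proxies would come from experimental or predicted data, and a single similarity cutoff is a crude stand-in for real off-target prediction.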
From a mathematical point of view, creating an accurate model of a single mammalian cell may require generating and then solving somewhere between 100,000 and one million equations
Biology is complex !!
http://news.vanderbilt.edu/2011/10/robot-biologist/
The human brain can only process seven pieces of data at a time!!! Need automation & new
technology to address the complexity
Predictive Science in the Drug Discovery (DD) Process
Predicting toxicity and metabolism of drugs
Prediction tools and models to prioritize candidate molecules
HPC for OSDD Community by Garuda/CMMACS
Systems Level Models for DD: Target Identification, Pharmacomodeling, Off-target binding predictions
Virtual Screening for selected targets & Models for predicting anti-TB and mutagenic properties
Systems Biology for predicting drug targets and MOA
Why Open Source Drug discovery ?
Many eyeballs make the bug shallow!
Lack of market incentive for TB
Successful Open Source Models
Human Genome Sequencing Initiative
Open Source Software Initiative (eg: Linux OS)
Android
The WWW
Real Innovation lies in
“Innovating how we innovate”…
“We cannot solve our problems with the same
thinking we used when we created them.”
Albert Einstein
Open TB Drug Discovery Platform: Informatics to Experimental Validation to Clinical Trials
Target Validation of in silico targets
Systems Biology
Cheminformatics
Mtb Strain and Clone Repository
Screening Facility
Assay Development
OSDD Chem and Directed Synthesis
Lead Identification
Lead Optimization
Target Identification for Leads
DMPK
In vivo efficacy
Safety Pharmacology
Pre-Clinical Candidate
Phase I-III
Pharmacogenomics
OSDD portal Virtual Lab
Computer Scientists
Mathematical modeling
Data upload
Disease experts Gene/Protein Expression Analysis
Pharmacogenomics expert
Administrator Manages server
Virtual Screening
Unconventional Collaborative Network
Shaping Science 2.0 OSDD Semantic Web Architecture
OSDD Platform
System Architecture
“Collaborative tools to accelerate neglected diseases research” in the book “Collaborative Computational Technologies for Biomedical Research”. Wiley and Sons. May 2011
Released : April 2010
Scientific Workflow Management Systems
http://www.tavaxy.org/ http://www.taverna.org.uk/ https://kepler-project.org/ http://galaxyproject.org/
Experimental data from biology and chemistry need to be managed and analyzed systematically
Large datasets and compute-intensive analyses need compute infrastructure
Weka Workflow
a. Convert CSV to test and train files
b. Convert both CSVs to ARFF files: output_file1 is always the train file and output_file2 is the test file.
c. Select two input files for the Classifier. Change the parameters in the right-side panel for each tool.
d. Evaluate the model file: the Classifier will be Misc -> SerializedClassifier
http://sysborg2.osdd.net
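Steps a and b of the Weka workflow (splitting a CSV into train and test sets, then converting each to ARFF) can be sketched outside Galaxy as follows. This is a minimal illustration, not the OSDD tool itself: the column names, the example rows, the 80/20 split, and the assumption that every column except the last is numeric are all hypothetical.

```python
import csv
import io
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def to_arff(relation, header, rows, class_labels):
    """Render rows as ARFF text; the last column is treated as the nominal class."""
    lines = ["@relation " + relation, ""]
    for name in header[:-1]:
        lines.append("@attribute " + name + " numeric")
    lines.append("@attribute " + header[-1] + " {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Made-up descriptor table standing in for a bioassay CSV.
csv_text = """mol_weight,logp,activity
180.2,1.2,active
250.3,3.4,inactive
310.1,2.2,active
152.0,0.8,inactive
404.5,4.1,active
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
data = [row for row in reader if row]

train, test = split_rows(data)
labels = ["active", "inactive"]
train_arff = to_arff("train", header, train, labels)  # plays the role of output_file1
test_arff = to_arff("test", header, test, labels)     # plays the role of output_file2
print(train_arff)
```

Steps c and d (training the classifier and evaluating the saved model) are done inside Weka and are not reproduced here.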
Electronic lab note books
APIs to submit workflow method to lab note book
APIs to submit results to lab note book
APIs to extract files from lab note books
More than 250 applications integrated
Customized workflow with grid infrastructure & applications
Jobs are invoked from Customized Galaxy and submitted to Gridway
Input file + parameters
Gridway metascheduler
LRM Torque
Clusters
Programs
Gridway runner Job template PBS
Customized
Job Status may be checked using DRMAA API
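As an illustration of the "job template PBS" step, the sketch below fills in a batch script of the kind a local resource manager such as Torque accepts. It does not show Gridway's own job description or the customized Galaxy runner; the function name, queue name, resource values, and the command being run are all hypothetical.

```python
from string import Template

# "$$" is an escaped dollar sign, so the rendered script contains $PBS_O_WORKDIR,
# the directory the job was submitted from.
PBS_TEMPLATE = Template("""#!/bin/bash
#PBS -N $job_name
#PBS -q $queue
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -l walltime=$walltime
cd $$PBS_O_WORKDIR
$command
""")


def render_job(job_name, command, queue="batch", nodes=1, ppn=1, walltime="01:00:00"):
    """Return the text of a PBS job script for one tool invocation."""
    return PBS_TEMPLATE.substitute(
        job_name=job_name,
        queue=queue,
        nodes=nodes,
        ppn=ppn,
        walltime=walltime,
        command=command,
    )


# Hypothetical tool call: input file plus parameters, as in the diagram.
script = render_job("osdd_demo", "./run_analysis.sh input.arff --folds 10", ppn=4)
print(script)
```

A runner would write this text to a file and hand it to the scheduler; job status could then be polled, for example through a DRMAA binding as the slide mentions.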
Get data customized for extracting files from open lab note book
Custom APIs for importing input files from OSDD’s open lab note book into Galaxy
Workflows and the result of the workflows are stored as separate lab note books
Lab note book has details of the experiments performed
Results of one experiment may be invoked for analysis in another experiment
All versions of the workflow and the results are stored
Flexibility to execute nested workflows
Custom APIs for exporting results to OSDD’s Open lab note book
List of >250 modules integrated as web services by OSDD Community
S. No | Resources | Clients
1 | KEGG: Kyoto Encyclopedia of Genes and Genomes | 60
2 | GetEntry: DDBJ sequence search by accession ID | 43
3 | GPSR: tools | 33
4 | PDB: Protein Data Bank | 30
5 | BioModel: mathematical models of biological DB | 25
6 | Gtps: Gene Trek in Prokaryote Space | 8
7 | WSDbfetch: retrieve entries from biological dbs using entry identifiers or accession no. | 7
8 | Gibv: Genome Information Broker for Viruses | 7
9 | DDBJ: DNA Data Bank of Japan | 7
10 | Mafft: a multiple sequence alignment program | 4
11 | Fasta: DDBJ database | 4
12 | Ensembl: maintains automatic annotation | 4
13 | VecScreen: vector contamination | 4
14 | OMIM: Online Mendelian Inheritance in Man | 4
15 | Gtop: Gene-product Informatics | 3
16 | GO: Gene Ontology | 3
17 | SPS: Splicing Profile based Score | 2
18 | GIBIS: Genome Information Broker for Insertion Sequence | 1
19 | RefSeq: database of sequence | 1
20 | GIB: Genome Information Broker | 1
21 | GIBEnv: DDBJ database | 1
22 | TxSearch: Database indexing & searching | 1
Ongoing: Cheminformatics
Curated molecule datasets
Cheminformatics Models
Data Mining and Analysis
HT Virtual screening
PubChem
ChEMBL
DrugBank
Experimental Assays
Community of About 400
Other Active Communities:
• OSDD Women Scientists Forum
• OSDD Junior Scientists Forum
Background and Premise
Why are we doing this?
Crowd-Sourcing Large-Scale Data-Driven
Cheminformatics Analysis
Machine Learning
based
Computational
Models
Bioassay Datasets
Computational Tools
and Resources
People
Standard re-usable models/Publications
PubChem Bioassay data (approx. 1 lakh molecules/dataset, 6000 descriptors/molecule)
Successful Models
Screen PubChem
(30 million)
Data amplification in Cheminformatics
Potential Hits
o Downsizing and random validation require multiple calculations to validate results
o Cross-validation up to 50+ times for each experiment
The Problem
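A rough sense of the scale behind this data amplification can be had from the figures on the slides. The sketch below is back-of-the-envelope arithmetic only; it assumes a dense descriptor matrix stored at 8 bytes per value, which is an assumption and not something the slides state.

```python
MOLECULES_PER_DATASET = 100_000   # "approx. 1 lakh molecules/dataset"
DESCRIPTORS_PER_MOLECULE = 6_000  # "6000 descriptors/molecule"
PUBCHEM_MOLECULES = 30_000_000    # "Screen PubChem (30 million)"
CV_REPEATS = 50                   # "Cross validation up to 50+ times"
BYTES_PER_VALUE = 8               # assumption: one 64-bit float per descriptor value


def matrix_values(molecules, descriptors):
    """Number of entries in a dense molecules x descriptors matrix."""
    return molecules * descriptors


def gigabytes(values):
    """Storage for that many values, in GB (10**9 bytes)."""
    return values * BYTES_PER_VALUE / 1e9


training_values = matrix_values(MOLECULES_PER_DATASET, DESCRIPTORS_PER_MOLECULE)
screening_values = matrix_values(PUBCHEM_MOLECULES, DESCRIPTORS_PER_MOLECULE)
cv_values_touched = training_values * CV_REPEATS  # values processed across repeated validation runs

training_gb = gigabytes(training_values)
screening_gb = gigabytes(screening_values)

print(f"one bioassay dataset: {training_values:,} values, about {training_gb:.1f} GB")
print(f"{CV_REPEATS} validation runs: {cv_values_touched:,} values processed")
print(f"PubChem screen: {screening_values:,} values, about {screening_gb:,.0f} GB")
```

Under these assumptions a single dataset is a few gigabytes, but repeated validation and a full PubChem screen push the workload well beyond a single workstation, which is the case for grid resources.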
C-DAC’s Garuda Grid – Indian Grid Computing Initiative
• C-DAC is an R&D organization under the Ministry of Communication & Information Technology, India
• C-DAC’s Garuda Grid is targeted at providing a facility for the scientific community, which would enable them to seamlessly access the distributed resources
• Compute power of GARUDA: ~70 TFLOPS (6000 CPUs)
• Currently there are 55 Garuda Partners
• Has NKN (National Knowledge Network) connectivity at 10Gbps
Internet/NKN
Results
NKN
OSDD-Garuda Interface
Weka in Galaxy
OSDD – Garuda Activities
• Created OSDD Virtual Organization (VO); 70 users registered under this VO.
• Garuda Portal customized to support OSDD requirements
• Galaxy – a biology workbench has been customized as per OSDD requirements
• JNU Head node was set up for hosting Galaxy
• Common data has been uploaded to Data Location for accessibility through Galaxy and Portal by all OSDD users
• Three cluster resources have been provided for OSDD activities:
– Hyderabad Cluster with 320 CPUs
– Chennai Cluster with 304 CPUs
– Param Yuva at Pune with 4368 CPUs
• Hand-holding users from the community & resolving their queries
OSDD Cheminformatics Programme Present Status
Models for anti-tubercular activity
Periwal et al (2012) BMC Pharmacology
Periwal et al (2011) BMC Res Notes
Models for anti-malarial activity
Periwal et al (2012) under review
Models for drug toxicity
Seal et al (2012) Journal of Cheminformatics
Models for specific drug targets
(GlmU, Kinases, DAP)
Singla et al (2011) BMC Pharmacology
Garg et al (2010) BMC Bioinformatics
Garg et al (2010) BMC Bioinformatics
Models for drug metabolism
Mishra et al (2010) BMC Pharmacology
Databases and Datasets for Cheminformatics
Singh et al (2012) Nucleic Acids Research
Singla et al (2010) BMC Pharmacology
Collaboration on
cheminformatics training and
research
Trained ~ 50 students in
advanced cheminformatics data
analysis methods
Training for students on parallel
data analysis environments
TRAINING
OSDD Cheminformatics Programme Overview
Models for anti-tubercular activity
Models for anti-malarial activity
Models for drug toxicity
Models for drug metabolism
Computational Resources for Drug Discovery (CRDD)
Models for specific drug targets
NKN + C-DAC Garuda
Public repositories of Chemical Data (PubChem/ChEMBL/DrugBank)
OSDD Chemical Repository (OSDDChem)
OSDD Chemistry Outreach Programme
ANALYTICS DATA RESOURCES
Prioritization of biologically active molecules for assays
Predictive modeling of Drug Metabolism and toxicity
(predictive in silico pre-clinical trial)
OUTCOMES
Anshu Bhardwaj Council of Scientific & Industrial Research (CSIR),
India
Chintalapati Janaki, Center for Development of Advanced Computing (C-DAC),
India
www.osdd.net 25-26 May 2011
Customized Galaxy with applications as Web Services and on the Grid for Open Source Drug Discovery (OSDD)
A CSIR led team India consortium with global partnership for affordable healthcare
Literature
Annotation Tools
Genomic Databases
Curated Annotations
Raw Annotations
OSDD C2D Community
800+ Student Researchers
Collaborative Curation
Pathway/Interactome | Gene Ontology | Protein Structure/Fold | Glycomics| Immunome
The “Connect to Decode” Programme
Community Curation!!
Wrong (mark in red)
Right (mark in green)
Online discussion
Working on the cloud..
OSDD Community Effort to Understand Mtb Biology
The largest Mtb Interactome
54 Authors 29 Institutions
More than 2500 views and 350 downloads till date
Published: July 11, 2012
Knowledge Discovery Systems
S. no. | Resource | Description | URL
1 | SysBorg* | Community interaction portal | http://sysborg2.osdd.net
2 | OSDDChem* | Portal for submission/proposal of synthetic compounds for screening | http://crdd.osdd.net/osddchem
3 | OSDDChemDesign | Portal for submission/proposal for compounds for screening | http://180.149.49.37/servers/osddchemdesign
4 | Tbrowse* | Genome browser for Mtb | http://tbrowse.osdd.net
5 | IPW* | Interacting partners database | http://crdd.osdd.net/servers/ipw
6 | curateTB | Curated data on TB from literature | http://180.149.49.37/servers/ctb
7 | Structural Annotation* | Structural proteome of Mtb | http://proline.physics.iisc.ernet.in/Tbstructuralannotation
8 | ccPDB* | Compilation and creation of data sets from Protein Data Bank | http://crdd.osdd.net/raghava/ccpdb
9 | GDoQ* | Predicting novel/potent inhibitors against GlmU | http://crdd.osdd.net/raghava/gdoq
10 | KiDoQ* | Predicting novel/potent inhibitors against DHDPS | http://crdd.osdd.net/raghava/kidoq
11 | MbtA* | QSAR and combinatorial library for MbtA | -
12 | MetaPred* | Prediction of cytochrome P450 isoform responsible for metabolizing a drug molecule | http://crdd.osdd.net/raghava/metapred
13 | Anti-tubercular models* | Predictive models for anti-tubercular molecules using machine learning on high throughput biological screening data sets | -
14 | Mutagenicity models* | In silico predictive mutagenicity model generation using supervised learning approaches | -
15 | Natural product database - β version | Database of biologically active phytomolecules and plant extracts with anti-mycobacterial activity | http://crdd.osdd.net/osddchem/biophytmol
16 | Pharmacomodeling predictions* | Modeling metabolic adjustment in Mtb upon treatment with isoniazid | -
17 | Galaxy workflow engine | Workflow engine to plug in applications for generating computational pipelines | http://sysborg2.osdd.net
* Published Available
Within weeks, 830 volunteered to re-annotate the entire M. tuberculosis genome. The work started in December 2009 and was completed by April 2010, packing nearly 300 man-years into 4 months!
Source: Munos B. Can Open-Source Drug R&D Repower Pharmaceutical Innovation? Clin Pharmacol Ther 2010;87:534–536
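The "nearly 300 man-years" figure can be checked with simple arithmetic. The sketch below assumes every one of the 830 volunteers contributed for the full 4 months, which is an upper-bound assumption and not something the source states.

```python
volunteers = 830
months = 4  # December 2009 to April 2010, as quoted above

person_months = volunteers * months
person_years = person_months / 12

print(f"{person_months} person-months, about {person_years:.0f} person-years")
```

This gives roughly 277 person-years, consistent with the quoted "nearly 300".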
Source: Hiroaki Kitano. Social engineering for virtual 'big science' in systems biology. Nature Chemical Biology 7, 323–326 (2011)
OSDD : A Global Community - More than 6500 members from over 130 countries
Statistics as of October 2012
Together we can …
.. and we should !
http://www.osdd.net http://c2d.osdd.net
anshu.bhardwaj
Report of the CEWG of WHO recognised OSDD as an Open Innovation Model
5 April 2012 | Geneva
How Open Source Drug Discovery Is
Helping India Develop New Drugs
Apr 9, 2012
DNDi POLICY BRIEF recognised
OSDD as part of Global Landscape
for Neglected Diseases R&D
April 2012
Crowd Sourcing
Innovation:
CSIR portal for OSDD
2011
Crowd-Sourcing Drug Discovery
24 February 2012
Vol. 335 no. 6071 p. 909