indo us 2012
OSDD Presentation at Washington DC
Abhik Seal, PhD Student, Indiana University (Researcher, OSDD CSIR)
Anshu Bhardwaj Scientist, OSDD Unit
Council of Scientific & Industrial Research Delhi, India
23rd March 2012, Washington DC http://www.osdd.net
Open Source Drug Discovery CSIR-led Team India Consortium with Global Partnership
Affordable Healthcare for All
Cheminformatics and Open Source Drug Discovery: a case study in academic collaboration between the
U.S. and India
First Disease Target: Tuberculosis
Tuberculosis (TB) is one of the leading causes of death, ranking second only to HIV as the deadliest infectious disease of adults worldwide.
Source: http://www.globalhealthfacts.org/data/topic/map.aspx?ind=12
OSDD Focus : Tropical Neglected Diseases
At least one person in the world is newly infected with TB bacilli every second
Over 1000 deaths a day, or about 2 deaths every 3 minutes
New TB cases 2010
Countries that had reported at least one XDR-TB case by end March 2011
Argentina, Armenia, Australia, Austria, Azerbaijan, Bangladesh, Belgium, Bhutan, Botswana, Brazil, Burkina Faso, Cambodia, Canada, Chile, China, Colombia, Czech Republic, Ecuador, Egypt, Estonia, France, Georgia, Germany, Greece, India, Indonesia, Iran (Islamic Rep. of), Ireland, Israel, Italy, Japan, Kazakhstan, Kenya, Kyrgyzstan, Latvia, Lesotho, Lithuania, Mexico, Mozambique, Myanmar, Namibia, Nepal, Netherlands, Norway, Pakistan, Peru, Philippines, Poland, Portugal, Qatar, Republic of Korea, Republic of Moldova, Romania, Russian Federation, Slovenia, South Africa, Spain, Swaziland, Sweden, Tajikistan, Thailand, Togo, Tunisia, Ukraine, United Arab Emirates, United Kingdom, United States of America, Uzbekistan, Viet Nam
TB Drug Discovery
World TB Day is 24th March 2012
It commemorates the discovery of the TB bacillus (Mycobacterium tuberculosis) through sputum microscopy, which is still the diagnostic used to detect TB! No progress whatsoever, and we are discussing 'network communications'
Challenges with Drug Discovery of Neglected Diseases
• Lack of market incentives
• TB is a complex disease – latency, relapse, resistance
• Clinical trials take a long time & study of relapse needs long follow-up (up to 18 months)
• Patient access is not direct; it is through government agencies
Conventional vs Open Innovation Approach to Drug Discovery
Corporate HQ R&D
Cancer R&D
Neurological Disorder
Packaging
Sales
Clinical Trial
…
… R&D
Diabetics
…
Production
Pre-Clinical Trial Formulation
Conventional vs Open Innovation Approach to Drug Discovery
Research groups Industry collaboration Individual participation Open Data Sharing
OSDD Process Flow
Clinical trials
Public Funding of Clinical Trials
Government of India commitment - $46 million
Drug Target Identification
Virtual Screening
Chemical Synthesis/library
Screening/Hit identification
Hit to Lead
Clinical Trials
Candidate
45
19
9
6
2
Status: OSDD Projects
Other projects aim to develop tools, databases and repositories for the OSDD community
1
September 2008 to March 2012
OSDD Platform
System Architecture
“Collaborative tools to accelerate neglected diseases research” in the book “Collaborative Computational Technologies for Biomedical Research”, Wiley and Sons, 2011
Gene/operon predictions
Gene Expression
Regulatory Elements
Variation and repeats
Orthologs
Drug targets
Pathway/ Networks
More than a Million Data Points are now “Linked”
Deeksha Bhartiya Nitin Kumar
Mtb Data
* This is a representative set of post-genomics data available on TB
Collaborator: Dr. Vinod Scaria
Post-genomics data on Mtb is ‘Linked’ from disparate resources
S. No.  Source                                                                        Tracks
1       UCSC Genome Browser on Mycobacterium tuberculosis H37Rv 06/20/1998 Assembly   6
2       WebTb                                                                         Operon Map
3       Argo Genome Browser                                                           not web based
4       PGBrowser: Pathogen Genome Browser                                            3
5       BioHealthBase                                                                 16
6       Ensembl                                                                       ~15
7       Tbrowse                                                                       100
Comparison of Browsers
Deeksha Bhartiya
OpenLabNoteBook on SysBorgTB http://sysborgtb.osdd.net/bin/view/OpenLabNotebook/TBMapDataset
Deeksha Bhartiya Nitin Kumar
From a mathematical point of view, creating an accurate model of a single mammalian cell may require generating and then solving somewhere between 100,000 and one million equations
Biology is complex !!
http://news.vanderbilt.edu/2011/10/robot-biologist/
The human brain can only process seven pieces of data at a time!!! Need automation & new technology to address the complexity
Literature
Annotation Tools
Genomic Databases
Curated Annotations
Raw Annotations
OSDD C2D Community
800+ Student Researchers
Collaborative Curation
Pathway/Interactome | Gene Ontology | Protein Structure/Fold | Glycomics| Immunome
The “Connect to Decode” Programme
Community Curation!!
Wrong (mark in red)
Right (mark in green)
Online discussion
Working on the cloud..
Many eyeballs make the ‘bug’ shallow!!!
Mtb Metabolome Map on Payao
Sub-map of the metabolic network on Payao
SBI developed customized plug-ins for OSDD for generating the metabolic map
C2D April 2010 – Onsite Activity
iOSDD890 From Social Network to Biological Network
OSDD Community Effort to Understand Mtb Biology
Within weeks, 830 volunteered to re-annotate the entire M. tuberculosis genome. The work started in December 2009 and was completed by April 2010, packing nearly 300 man-years into 4 months!
Source: Munos B. Can Open-Source Drug R&D Repower Pharmaceutical Innovation? Clin Pharmacol Ther 2010;87:534–536
Source: Hiroaki Kitano Nature Chemical Biology 7, 323–326 (2011)
Social engineering for virtual 'big science' in systems biology
Connect to Decode Phase II - Themes
A large student community from colleges and universities is cloning, expressing and purifying selected Mtb genes
To clone and express select genes of Mycobacterium tuberculosis
Open Access Repository of Mtb clones
More than 120 sequence confirmed clones are ready for distribution
http://sysborg2.osdd.net/group/sysborgtb/project-details/-/projects/show/3212
OSDDChem: Open Chemistry Initiative
A large number of molecules are being submitted for screening
Bhardwaj et al. Tuberculosis (Edinb). 2009 Sep;89(5):386-7
http://tbrowse.osdd.net
Computational Resources developed with Community participation
Bhardwaj et al. 2011 John Wiley & Sons, Inc.
Mtb essential genes database
TrapTB Mtb drug targets database
Chembio Toolkit Workflow engine with federated resources
AmPhyDB Antimycobacterial Phytomolecule Database
http://sysborg2.osdd.net
A Comprehensive database of Mtb transporters
Mtb-Human Interaction Database
Q. Find novel genes and mutations & map known drug resistance mutations on genome of an MDR-TB strain
Enabling Complex Computational Analysis For Experimental Biologists/Chemists
Galaxy provides:
• Simplified GUI design
• Ease of integrating modules
• Fewer components for creating workflows
• Sharable workflows for better collaboration
Get data customized for extracting files from open lab note book
Custom APIs for importing input files from OSDD’s open lab note books
• Workflows and the results of the workflows are stored as separate lab notebooks
• Lab notebook has details of the experiments performed
• Results of one experiment may be invoked for analysis in another experiment
• All versions of the workflow and the results are stored
• Flexibility to execute nested workflows
Custom APIs for exporting results to OSDD’s Open lab note book
Our Approach : Data & Tool integration
In addition to accessing heterogeneous sources of data such as BioMart Central and the UCSC Table Browser (http://genome.ucsc.edu/), Galaxy is interfaced with the open lab notebook of http://sysborg2.osdd.net
Standalone databases and tools
Tools as web services:
• Web services can be added as tools in Galaxy
• Extends the potential of Galaxy workflows
The process
Identify the module
Search for the WSDL
Code for client
Write XML for Galaxy
Configure & integrate into Galaxy
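The "Write XML for Galaxy" step in the process above can be sketched as follows. This is a minimal illustration, not the OSDD community's actual wrapper code: the tool id, the client script name `kegg_client.py`, and its `--query`/`--out` flags are hypothetical, and only the general shape of a Galaxy tool definition (`tool`, `command`, `inputs`, `outputs`, `help`) is assumed.

```python
import xml.etree.ElementTree as ET


def build_tool_xml(tool_id, name, client_script, param_name, param_label):
    """Build a minimal Galaxy tool wrapper for a web-service client script.

    All names passed in are illustrative; the client script is assumed to
    accept --query and --out (a hypothetical command-line interface).
    """
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1")
    ET.SubElement(tool, "description").text = "web-service client (illustrative)"

    # Galaxy substitutes $<param_name> and $output when the job runs.
    command = ET.SubElement(tool, "command")
    command.text = f"python {client_script} --query '${param_name}' --out '$output'"

    # One text input that the user fills in through the Galaxy GUI.
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name=param_name, type="text", label=param_label)

    # One output dataset that the client script writes to.
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format="txt")

    ET.SubElement(tool, "help").text = f"Calls the service wrapped by {client_script}."
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_tool_xml("kegg_get_entry", "KEGG entry lookup",
                         "kegg_client.py", "entry_id", "KEGG entry identifier"))
```

Generating the wrapper from a small function like this keeps the several hundred service clients consistent with one another; the resulting file would still need to be registered in the Galaxy tool configuration, which is the "Configure & integrate" step.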
ChemBio toolkit : >300 Modules integrated by OSDD Community
S. No.  Resources                                                                                  Clients
1       KEGG: Kyoto Encyclopedia of Genes and Genomes                                              60
2       GetEntry: DDBJ sequence search by accession ID                                             43
3       GPSR: tools                                                                                33
4       PDB: Protein Data Bank                                                                     30
5       BioModel: mathematical models of biological DB                                             25
6       Gtps: Gene Trek in Prokaryote Space                                                        8
7       WSDbfetch: retrieve entries from biological dbs using entry identifiers or accession no.   7
8       Gibv: Genome Information Broker for Viruses                                                7
9       DDBJ: DNA Data Bank of Japan                                                               7
10      Mafft: a multiple sequence alignment program                                               4
11      Fasta: DDBJ database                                                                       4
12      Ensembl: maintains automatic annotation                                                    4
13      VecScreen: vector contamination                                                            4
14      OMIM: Online Mendelian Inheritance in Man                                                  4
15      Gtop: Gene-product Informatics                                                             3
16      GO: Gene Ontology                                                                          3
17      SPS: Splicing Profile based Score                                                          2
18      GIBIS: Genome Information Broker for Insertion Sequence                                    1
19      RefSeq: database of sequence                                                               1
20      GIB: Genome Information Broker                                                             1
21      GIBEnv: DDBJ database                                                                      1
22      TxSearch: Database indexing & searching                                                    1
OSDD Community suggests tools for integration in Galaxy
PubChem Bioassay data (approx. 100,000 molecules/dataset, 6000 descriptors/molecule)
Successful Models
Screen PubChem (30 million)
Data amplification: Cheminformatics
Potential Hits
• Downsizing and random validation require multiple calculations for validation of results
• Cross-validation up to 50+ times for each experiment
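The repeated downsizing and cross-validation described above might look like the sketch below. It assumes that "downsizing" means randomly undersampling the inactive molecules so that each run trains on a balanced set, which is a common choice for bioassay data but is not stated in the slides; the function names and the choice of 5 folds are illustrative.

```python
import random


def downsample(labels, seed):
    """Return indices of a balanced subset: all actives (label 1) plus an
    equally sized random sample of inactives (label 0)."""
    rng = random.Random(seed)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    chosen = rng.sample(inactives, min(len(actives), len(inactives)))
    return sorted(actives + chosen)


def k_fold(indices, k, seed):
    """Yield (train, test) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def repeated_splits(labels, repeats=50, k=5):
    """Draw a fresh balanced subset per repeat, then split it into k folds."""
    for r in range(repeats):
        subset = downsample(labels, seed=r)
        for train, test in k_fold(subset, k, seed=r):
            yield r, train, test


if __name__ == "__main__":
    toy_labels = [1] * 10 + [0] * 90  # toy data: 10 actives, 90 inactives
    n_models = sum(1 for _ in repeated_splits(toy_labels))
    print(n_models, "model fits needed for one experiment")
```

With 50 repeats and 5 folds, one experiment already requires 250 model fits, each over thousands of descriptors per molecule, which is the kind of workload that motivates running on a grid.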
C-DAC’s Garuda Grid – Indian Grid Computing Initiative
C-DAC is an R&D organization under the Ministry of Communications & Information Technology, India
C-DAC’s Garuda Grid is targeted at providing a facility for the scientific community, which would enable them to seamlessly access the distributed resources.
Compute Power of GARUDA: ~70 TFLOPS (6000 CPUs)
Currently there are 55 Garuda Partners
Has NKN (National Knowledge Network) connectivity at 10 Gbps
Features:
Customized Galaxy on GARUDA
• Integrated with Grid Authentication mechanism - Indian Grid Certificate Authority (IGCA)
• Integrated with Gridway Metascheduler - Job scheduling and management
• Integrated OSDD tools - Weka (for data mining) and Autodock (Virtual screening).
• Provided support to upload multiple input files as tar file
• Data libraries of OSDD community are uploaded and are shared by all users
• Integrated with PostgreSQL
Garuda- Galaxy Job Submission - Flow
Garuda-OSDD Server
Galaxy GUI
1. User selects tool and input parameters
Galaxy Job Manager
2. Based on the tool, it sends the job to the correct runner.
Gridway Job runner
3. Gridway job runner uses the user’s Garuda proxy file for job submission
Internet
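The three-step submission flow above can be sketched as a small dispatcher. This is an illustration under stated assumptions rather than the Garuda-Galaxy implementation: the tool ids, the runner names, and the job dictionary layout are hypothetical. The only external convention assumed is the `X509_USER_PROXY` environment variable, which grid middleware commonly uses to locate a user's proxy certificate.

```python
# Hypothetical tool-to-runner table; tool ids and runner names are illustrative.
TOOL_RUNNERS = {
    "weka_classifier": "gridway",
    "autodock_vs": "gridway",
}
DEFAULT_RUNNER = "local"


def select_runner(tool_id):
    """Step 2: pick the runner for a tool, falling back to local execution."""
    return TOOL_RUNNERS.get(tool_id, DEFAULT_RUNNER)


def build_job(tool_id, params, proxy_file=None):
    """Steps 1-3: package the user's tool choice and parameters, route the job,
    and attach the user's grid proxy when the job goes to the grid runner."""
    runner = select_runner(tool_id)
    job = {"tool": tool_id, "params": dict(params), "runner": runner, "env": {}}
    if runner == "gridway":
        if not proxy_file:
            raise ValueError("grid jobs need the user's proxy file")
        # Assumed convention: the proxy path is passed via X509_USER_PROXY.
        job["env"]["X509_USER_PROXY"] = proxy_file
    return job


if __name__ == "__main__":
    print(build_job("autodock_vs", {"ligands": "set1.tar"}, "/tmp/x509up_u1000"))
    print(build_job("text_filter", {"pattern": "Rv"}))
```

Keeping the routing table separate from the job builder means new grid-enabled tools can be added by editing one mapping, while tools without an entry keep running locally.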
Weka in Galaxy
Garuda Usage by OSDD: Job Accounting
High Performance Grid Computing for OSDD members
Anshu Bhardwaj Council of Scientific & Industrial Research (CSIR),
India
Chintalapati Janaki, Center for Development of Advanced Computing (C-DAC),
India
www.osdd.net 25-26 May 2011
Customized Galaxy with applications as Web Services and on the Grid for Open Source Drug Discovery (OSDD)
A CSIR led team India consortium with global partnership for affordable healthcare
“In the long history of humankind those who have learned to collaborate and improvise most effectively have prevailed.” -- Charles Darwin
Cheminformatics: a strong case for community collaborative science
There is now an incredibly rich resource of public information relating compounds, targets, genes, pathways, and diseases. Just for starters there is in the public domain information on:
• ~30 million compounds and ~500,000 bioassays (PubChem, ChemSpider)
• ~60 million compound bioactivities (PubChem Bioassay)
• ~5,000 drugs (DrugBank)
• ~9 million protein sequences (SwissProt) and ~60,000 3D structures (PDB)
• ~14 million human nucleotide sequences (EMBL)
• ~20 million life science publications (PubMed)
• Multitude of other sets (drugs, toxicogenomics, chemogenomics, metagenomics …)
I have thus chosen ‘Cheminformatics’ to study the vast pool of chemical compounds much more in details and analyze so as to narrow down to potential drug candidate. With the unique combination of IT and Chemistry, I am confident that one can actually derive much more meaningful information of a chemical entity on this earth. --- Rajdeep (BioIT)
I am organic chemist. I prepared several organic molecules. We go for biological activity, maximum times it gives negative result. But with help of informatics in chemistry we can predict molecular properties. We can replace many ligands or substituents or functional group easily. And we can design our desirable molecule. --- Chirupulo
I am doing my M.Pharm in pharmaceutical chemistry, and I like cheminformatics that I need accurate results but soon.... and I am really interested in molecular modelling... so I am here. --- Haffy manaf
Cheminformatics deals with information about chems. It combines tools and techniques of IT for information about chemical entities at the finger tip on click of a mouse. Databases are available for properties of descriptors. Softwares help to calculate molecular properties. Cheminformatics thus come handy tool for learning chemistry. --- Dr Keshav Mohan
Community Speaks: What excites them about Cheminformatics
• Access to Journals for Chemical Structures
• Lack of proper communication systems other than Skype
• Lack of software tools for accelerated drug discovery
• Need of high speed internet
• Need more experts to teach/train community members
• Proper time schedule of IU cheminformatics classes
Challenges in implementation of Cheminformatics projects
Indiana University Initiatives (Prof David J Wild)
Cheminformatics Awareness
http://icep.wikispaces.com
Association Search – visualize literature supported associations between any two entities (compound, drug, gene, pathway, disease, side effect). PLoS One, in press.
Semantic Link Association Prediction (SLAP) – find most highly associated entities (compound, drug, gene, pathway, disease, side effect) to any other entity, based on probabilistic weightings of graph edges based on public experimental datasets. Paper in preparation
BioLDA – find most highly associated entities to any other entity based on a complex topic model analysis of the literature (PubMed). PLoS One, 2011, 6 (3), e17243
See also: WENDI (J. Cheminf., 2010,2,6); Chemogenomic Explorer (BMC Bio. 2011,12,256), ChemLDA, ChemBioGrid (J. Chem. Inf. Model., 2007; 47(4) pp 1303-1307)
Tools Developed for Large Scale Bio-Chemical Data Mining
OSDD virtual resources
Cheminformatics
Curated molecule datasets
Cheminformatics Models
Data Mining and Analysis
HT Virtual screening
PubChem
ChEMBL
DrugBank
Experimental Assays
Community of About 400
Other Active Communities:
• OSDD Women Scientists Forum
• OSDD Junior Scientists Forum
Ideal Case US-India Cheminformatics Collaboration
IU CCRG
Research
Education Industry partnerships OSDD
Wet lab research
Open cheminfo. group
Many interested students
Funding for research in U.S.
$1.3m NIH
$360,000 Eli Lilly $120,000 Pfizer
Funding for research in OSDD
$46m Govt
$0
But in order to sustain…?
Most of the biologists and chemists do not use computational workflows for their analysis
Awareness about the advantages of using such workflow engines
The Community needs to be trained for using the workflows
The Community needs to be trained for integrating applications
Web services vs standalone applications – each has its own set of advantages and limitations
Developers of algorithms should be encouraged to report results in globally accepted standard formats with standard ontologies
What should be our approach to reach out and integrate?
Assembly line for drug discovery
I Biological Repository
i. Open access clinical strains repository
ii. Open access clone repository
iii. Open access protein repository
II Chemical Repository
i. Open access small molecule repository
III Open Screening Facility
i. Submit your compounds for anti-tuberculosis screening
OSDD Open Access Resources
Inhibition of FAAL and FACL enzymes by acyl-sulfamoyl analogues
[Figure: chemical structures; labels s12, s14, s15]
Preclinical development of thiophene-containing trisubstituted methanes
• Five synthetic ‘thiophene-containing trisubstituted methanes’, which showed an MIC of <1.56 µg/ml and no cytotoxicity in mammalian cells, are being synthesised in PPP (public-private partnership) mode
Public Private Partnerships as Open Collaborative Endeavors to solve Scientific Challenges
Collaboration with TB Alliance on Human Clinical Trials
PA-824 in combination with other drugs
Affordable Healthcare for All
Systems Biology
Target based approach
Ligand based approach
Hit to Lead
Human Clinical Trials
An Innovative Approach to Drug Discovery: A New Paradigm
Value
Biology/ Genomics
Target Identification
Target Validation
Hit(s)
Validated/ Quality Lead
Optimised Candidate Drug
Clinical Trials
Registered Drug
Risk
High Risk, Innovation Driven Sphere Strategy-> Open Innovation with best minds from academia/ industry
Process Oriented – Strategy-> Industry CROs’ Participation
Strategy-> OSDD to support clinical trials in collaboration with pharma
Innovation Funnel
Drugs to be available without IP encumbrances
Major International Collaborations
Cheminformatics and e-learning
Structural Interactome to predict Off-Site Interactions of Drug Candidates
Metabolic Map Network Generation
Author, Angela Saini
Geek Nation: How Indian Science Is Taking Over The World
http://www.sunday-guardian.com/bookbeat/tour-of-indian-science-that-fails-to-see-full-picture
Target Validation
PPI Validation
Cloning of potential drug targets
Galaxy Integration with Grid
Some of the OSDD PIs
Mtb Systems Biology
Mtb Genome Analysis
OSDDChem
Email: [email protected] Skype: anshu.bhardwaj
Cheminformatics Community + E-learning
OSDD : A Global Community - More than 5500 members from over 130 countries
Statistics as of March 2012
Open Source Drug Discovery (OSDD) Model “Team India Consortium with International Participation”
Council of Scientific and Industrial Research (CSIR), India
Current Partners
Mycobacterium tuberculosis
Wiki Portal
Exchange of Ideas/Results Community Participation
Lead Molecules Drug
Contract Research Organisations
Academia & Hospitals
Open Synthesis and Exchange
of Knowledge
PRECLINICAL & CLINICAL TRIAL
Candidate Targets
in silico SCREENING
in vivo VALIDATION
Lead Organization
Together we can … and we should!
Matt Smadley | Flickr.com
http://www.osdd.net http://c2d.osdd.net
http://sysborg2.osdd.net
Email: [email protected] [email protected] [email protected] Skype: anshu.bhardwaj
http://scienceopenscience.blogspot.com/2011/12/osdd-song.html