Combining Cheminformatics, Diverse Databases and Logic Based Pathway Analysis For TB

Sean Ekins

Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.

NIAID Systems Biology Approaches for Tuberculosis Workshop



Partial Map of North America, 1765, T. Kitchin

The cellular overview diagram for M. tuberculosis H37Rv, from the TBCyc database (http://tbcyc.tbdb.org/index.shtml)

Maps as a starting point for further exploration

Pathways tell us little about chemistry

Where are the best targets for drugs?

How can we overcome resistance?

What can we add to improve maps?

I think of systems biology as an integrated map of our knowledge on a topic

Limited by what we know

~25 public datasets for TB

Including GSK and Novartis data on TB hits

>300,000 compounds

Patents, papers

Annotated by CDD

Open to browse: http://www.collaborativedrug.com/register

Molecules with activity

against

Public molecule datasets tell you little about biology

CDD

Literature data on molecules and their targets

Similarity search with a mimic enables target fishing

SRI

Pathway data (targets)

Species differences in pathways

Where to intervene

Combine the knowledge

Select new targets

Take mimic strategy

What if you combine pathways and molecules - can this help in drug discovery?

Developed API

Enables connection to other tools, e.g. Pipeline Pilot, KNIME

CDD no longer an island

TB molecules and target information database connects molecule, gene, pathway and literature, then is used to identify targets for the mimic strategy

Take substrate or metabolite and generate 3D conformers and build a pharmacophore

Use the pharmacophore to search vendor libraries in 3D

Buy and test compounds
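
The 3D search step can be illustrated with a toy distance-based matcher. This is only a sketch under stated assumptions: the slides do not say which pharmacophore software was used, the feature names and coordinates below are invented, and the function names `matches` and `screen` are hypothetical. A molecule counts as a hit if any of its conformers contains features of the same types as the query with pairwise distances that agree within a tolerance.

```python
from itertools import permutations
from math import dist

# A pharmacophore or conformer is a list of (feature_type, (x, y, z)) points.
# All feature names and coordinates here are made up for illustration.

def matches(query, conformer, tol=1.0):
    """True if the conformer has same-typed features whose pairwise
    distances agree with the query's within tol (same length units)."""
    n = len(query)
    for candidate in permutations(conformer, n):
        if any(c[0] != q[0] for c, q in zip(candidate, query)):
            continue  # feature types do not line up for this assignment
        if all(
            abs(dist(query[i][1], query[j][1])
                - dist(candidate[i][1], candidate[j][1])) <= tol
            for i in range(n) for j in range(i + 1, n)
        ):
            return True
    return False

def screen(query, library, tol=1.0):
    """library maps a molecule name to a list of conformers.
    A molecule is a hit if any one conformer matches the query."""
    return [name for name, conformers in library.items()
            if any(matches(query, c, tol) for c in conformers)]

query = [("donor", (0.0, 0.0, 0.0)),
         ("acceptor", (3.0, 0.0, 0.0)),
         ("anion", (0.0, 4.0, 0.0))]

library = {
    # Same geometry as the query, translated, plus one extra feature.
    "mol_a": [[("acceptor", (13.0, 10.0, 10.0)),
               ("donor", (10.0, 10.0, 10.0)),
               ("anion", (10.0, 14.0, 10.0)),
               ("hydrophobe", (20.0, 20.0, 20.0))]],
    # Donor-acceptor distance is far too long to match.
    "mol_b": [[("donor", (0.0, 0.0, 0.0)),
               ("acceptor", (8.0, 0.0, 0.0)),
               ("anion", (0.0, 4.0, 0.0))]],
}

hits = screen(query, library)
print(hits)
```

Matching on distances alone cannot tell mirror-image arrangements apart and the permutation search only scales to a handful of features, so this is a teaching aid rather than a replacement for dedicated pharmacophore tools.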

dethiobiotin

Two Proposed Mimics of D-fructose 1,6-bisphosphate in less than 6 months

DFP000133SC MIC 40 μg/mL

DFP000134SC MIC 20 μg/mL

Computationally searched >80,000 molecules, narrowed to 842 hits, and tested 23 compounds in vitro (3 picked as inactives), leading to 2 proposed mimics

Sarker et al., Pharm Res, 29(8):2115-27, 2012


Continuation in phase II on a bigger scale, validation of hits ongoing…

Pathway analysis

Binding site similarity to Mtb proteins

Bayesian models: ligand similarity

Docking

Predicting the target/s for small molecules

Is there a simpler way? What about molecule similarity?


TB Mobile: A free app for iPhone and Android

Each molecule can be copied to the clipboard, then opened with other apps, exported via Twitter or email, or shared via Dropbox

The 1st app for TB.

- combines chemistry and bioinformatics, and can be used to predict potential targets for new compounds
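
The molecule-similarity idea behind target prediction can be sketched as a nearest-neighbour lookup. This assumes fingerprints stored as sets of integer feature ids and Tanimoto similarity as the measure. The slides do not state which fingerprints or similarity measure TB Mobile uses, so both are assumptions, and the reference entries and the name `predict_targets` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature-id sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict_targets(query_fp, reference, top_n=3):
    """Rank reference molecules with known targets by similarity to the query.

    reference: list of (molecule_name, target_name, fingerprint) tuples.
    Returns the top_n entries as (molecule_name, target_name, similarity).
    """
    scored = [(name, target, tanimoto(query_fp, fp))
              for name, target, fp in reference]
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:top_n]

# Hypothetical reference set: names, targets and fingerprints are invented.
reference = [
    ("ref1", "target_A", {1, 2, 3, 4}),
    ("ref2", "target_B", {7, 8, 9}),
    ("ref3", "target_A", {1, 2, 5}),
]

ranking = predict_targets({1, 2, 3}, reference, top_n=2)
for name, target, sim in ranking:
    print(f"{name}: {target} (similarity {sim:.2f})")
```

The most similar reference molecules suggest candidate targets for the query compound. They are hypotheses to test, not assignments.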

Ekins et al., J Cheminform 5:13, 2013

Molecules active against Mtb evaluated in TB Mobile app

Ekins et al., J Cheminform 5:13, 2013

Molecules active against Mtb are only sampling part of target space and metabolome, based on simple properties

Blue = 745 molecules with known targets in Mtb

Yellow = 177 GSK in vitro actives

Yellow = 1429 in vitro active and non-cytotoxic hits from SRI

Yellow = 338 molecules tested in mouse

3 PCs explain 83% of variance

3 PCs explain 88% of variance

3 PCs explain 86% of variance

In vitro data is more localized, with partial coverage of target space

Historic in vivo data in mouse covers target space

Yellow = Mtb metabolome from TBCyc

3 PCs explain 89% of variance

Phenotypic screening HTS Hit rates

SRI papers

Become more stringent in what we call an ACTIVE for models:

IC90 < 10 μg/mL (CB2) or < 10 μM (MLSMR) and a selectivity index (SI) greater than ten.

SI was calculated as SI = CC50/IC90, where CC50 is the concentration that resulted in 50% inhibition of Vero cells.
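
The activity definition above translates directly into a small check. A minimal sketch: the function names are hypothetical and the example values are invented. IC90, CC50 and the cutoff are assumed to be in the same concentration unit for a given dataset.

```python
def selectivity_index(cc50, ic90):
    """SI = CC50 / IC90, both in the same concentration unit."""
    if ic90 <= 0:
        raise ValueError("IC90 must be positive")
    return cc50 / ic90

def is_active(ic90, cc50, ic90_cutoff=10.0, si_cutoff=10.0):
    """Stringent 'active' label: potent (IC90 below the cutoff)
    and selective (SI above the cutoff)."""
    return ic90 < ic90_cutoff and selectivity_index(cc50, ic90) > si_cutoff

# Invented example values, in one shared concentration unit.
print(is_active(ic90=2.0, cc50=40.0))    # potent and selective (SI = 20)
print(is_active(ic90=2.0, cc50=10.0))    # potent but SI = 5
print(is_active(ic90=12.0, cc50=500.0))  # selective but not potent enough
```

Requiring both conditions is what makes the label "dual-event": a compound that kills Mtb but is equally toxic to Vero cells is treated as inactive for modelling purposes.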

Literature usually < 1%

Top scoring molecules assayed for Mtb growth inhibition

Mtb screening molecule database/s

High-throughput phenotypic Mtb screening

Descriptors + Bioactivity (+ Cytotoxicity)

Bayesian machine learning classification Mtb model

Molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models

New bioactivity data may enhance models

Identify in vitro hits and test models

Increased hit/lead discovery efficiency
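
The train-then-score loop in this workflow can be sketched with a Laplacian-corrected naive Bayes estimator over structural features. This is one commonly described formulation, and whether the models in this talk used exactly this form is an assumption. The feature names, training data and function names below are invented for illustration.

```python
from collections import Counter
from math import log

def train(samples):
    """samples: list of (feature_set, is_hit) pairs.
    Returns a dict mapping each feature to a log-scale weight.
    Weight = log((hits_with_feature + 1) / (n_with_feature * base_rate + 1))."""
    if not samples:
        raise ValueError("need at least one training sample")
    base_rate = sum(1 for _, hit in samples if hit) / len(samples)
    seen, seen_hit = Counter(), Counter()
    for features, hit in samples:
        for f in features:
            seen[f] += 1
            if hit:
                seen_hit[f] += 1
    return {f: log((seen_hit[f] + 1) / (seen[f] * base_rate + 1)) for f in seen}

def score(weights, features):
    """Sum the weights of the features present; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in features)

def dual_event_label(active, cytotoxic):
    """A training 'hit' must be active against Mtb and not cytotoxic."""
    return active and not cytotoxic

# Invented training data: (features, active vs Mtb, cytotoxic).
raw = [
    ({"a", "b"}, True, False),
    ({"a"}, True, False),
    ({"b", "c"}, True, True),   # active but cytotoxic, so not a hit
    ({"c"}, False, False),
]
samples = [(feats, dual_event_label(act, tox)) for feats, act, tox in raw]
weights = train(samples)

# Virtually score two hypothetical library molecules and rank them.
library = {"mol_x": {"a"}, "mol_y": {"c"}}
ranked = sorted(library, key=lambda m: score(weights, library[m]), reverse=True)
print(ranked)
```

Features enriched among hits get positive weights and features enriched among non-hits get negative ones, so the top of the ranked list is where compounds would be picked for testing. New assay results can be appended to the training set and the model retrained.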


Dual-Event Bayesian Models for whole cell Mtb activity

Could use any machine learning method

Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011. Ekins et al., Chem Biol 20, 370–378, 2013

5 active compounds vs Mtb in a few months

7 tested, 5 active (~70% hit rate)

Ekins et al., Chem Biol 20, 370–378, 2013

1. Virtually screen 13,533-member GSK antimalarial hit library

2. Model = SRI TAACF-CB dose response + cytotoxicity model

3. Top 46 commercially available compounds visually inspected

4. 7 compounds chosen for Mtb testing based on

- drug-likeness

- chemotype diversity

GSK #          Bayesian score   Mtb H37Rv MIC (μg/mL)   GSK reported % inhibition HepG2 @ 10 μM cmpd
TCMDC-123868   5.73             >32                     40
TCMDC-125802   5.63             0.0625                  5
TCMDC-124192   5.27             2.0                     4
TCMDC-124334   5.20             2.0                     4
TCMDC-123856   5.09             1.0                     83
TCMDC-123640   4.66             >32                     10
TCMDC-124922   4.55             1.0                     9

(Chemical structure images omitted.)

Bayesian Model Follow-up: do we have a lead?

• BAS00521003/TCMDC-125802 reported to be a P. falciparum lactate dehydrogenase inhibitor

• Only one report (that we were unaware of when picking the compound) of antitubercular activity, from 1969

- solid agar MIC = 1 μg/mL (“wild strain”)

- “no activity” in mouse model up to 400 mg/kg

- however, activity was judged solely by extension of survival!

Bruhin, H. et al., J. Pharm. Pharmac. 1969, 21, 423-433.

SRI MLSMR 220K library contains:

107 hits with this substructure

- 3 nitrofuryl hydrazones

- 10 furyl hydrazones

- 19 nitrophenyl hydrazones

32 inactives with this substructure

Maddry et al., Tuberculosis 2009, 89, 354.

MIC of 0.0625 μg/mL

• 64X MIC affords 6 logs of kill

• Resistance and/or drug instability beyond 14 d

Vero cells: CC50 = 4.0 μg/mL

Selectivity Index SI = CC50/MIC(Mtb) = 16–64
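
As a worked check of the upper end of that range, taking the CC50 of 4.0 and the MIC of 0.0625 reported on this slide in the same concentration unit:

```latex
\mathrm{SI} = \frac{\mathrm{CC}_{50}}{\mathrm{MIC}_{Mtb}} = \frac{4.0}{0.0625} = 64
```

The lower bound of 16 would correspond to an MIC of 0.25 in the same unit (4.0/0.25 = 16). The slide does not state that value, so it is an inference from the quoted range.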

In mouse: no toxicity, but also no efficacy in the GKO model; probably metabolized.

To be continued….

Ekins et al., Chem Biol 20, 370–378, 2013

Filling out the triazine matrix using SARtable: A new kind of map

Green = good activity, Red = bad; colored dots are predictions

A summary of some of the numbers involved in filtering for hits:

>100,000 molecules screened through Bayesian models

~700 molecules were tested in vitro

150 actives were identified

>20% hit rate

Identified several novel, potent hit series with low cytotoxicity and good selectivity

Identified known human kinase inhibitors and FDA-approved drugs as hits

We also took this approach with another institute’s data and obtained a 22.9% hit rate
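
The hit-rate arithmetic behind these numbers is simple enough to check directly. A minimal sketch: the helper name `hit_rate` is hypothetical, and "~700" is treated as exactly 700, so the result is approximate.

```python
def hit_rate(n_active, n_tested):
    """Percentage of tested compounds that were active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return 100.0 * n_active / n_tested

# 150 actives out of roughly 700 tested, as quoted in the summary.
overall = hit_rate(150, 700)
print(f"model-guided hit rate: {overall:.1f}%")

# Compared with the <1% usually reported for phenotypic HTS, this is
# more than a 20-fold enrichment (using 1% as the reference ceiling).
print(f"enrichment over a 1% baseline: {overall / 1.0:.0f}-fold or more")

# The small follow-up set: 5 active out of 7 tested.
print(f"7-compound set: {hit_rate(5, 7):.1f}%")
```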

Ekins et al., PLoS ONE 2013 May 7;8(5):e63240; Ekins et al., Chem Biol 20, 370–378, 2013

1924 new compounds used as external test set for all 3 dual-event models

Mtb model (training set N)                           ROC: ARRA dose response and cytotoxicity (1924)
MLSMR dose response and cytotoxicity (2273)          0.82
TAACF-CB2 dose response and cytotoxicity (1783)      0.54
TAACF Kinase dose response and cytotoxicity (1248)   0.74

Ideal ROC = 1

This suggests the value of using multiple models, multiple algorithms and validation

Continuous testing of in vitro models, fusing datasets and using different machine learning methods

Ekins et al., Pharm Res, In press 2013; Ekins et al., Submitted 2013

Mtb model (training set N)   ROC: ARRA dose response and cytotoxicity (1924)
SVM - Combined               0.72 (binary data)
Random Forest - Combined     0.83 (probability data)
Random Forest - Combined     0.75 (binary data)
Bayesian - Combined          0.83 (Bayesian score)
Bayesian - Combined          0.69 (binary data)
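
The ROC values in these tables are areas under the ROC curve for an external test set. A minimal sketch of how such a value can be computed from model scores and true labels is given below, using the pairwise-ranking definition: the probability that a randomly chosen active is scored above a randomly chosen inactive. The function name and the data are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.
    Ties between an active and an inactive count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented test set: 1 = active, 0 = inactive.
labels = [1, 1, 0, 0]
print(roc_auc([0.9, 0.8, 0.3, 0.1], labels))  # perfect ranking
print(roc_auc([0.9, 0.4, 0.6, 0.1], labels))  # one active ranked too low
print(roc_auc([1, 1, 1, 0], labels))          # binary calls create ties
```

A value of 1 is the ideal and 0.5 is what random ranking gives. Binary predictions put many compounds at the same score, which may be part of why the binary rows above score lower than the continuous-score rows for the same algorithm.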

Summary & Questions

Why can’t we predict compounds active in vivo?

Can we go straight to the mouse based on models, what is stopping us?

Can combining all the in vitro, in vivo, pathway, structural data etc. help us find better compounds, faster?

To date we have only sampled a part of the known target space, with in vitro data (?)

We have only sampled a fraction of the metabolome space (?)

We have only sampled a minuscule fraction of the chemistry space (?)

“We shall not cease from exploration and the end of all our exploring will be to arrive where we started... and know the place for the first time.”

T.S. Eliot

Acknowledgments

STTR: Malabika Sarker, Carolyn Talcott, Peter Madrid, Sidharth Chopra, Barry A. Bunin, Gyanu Lamichhane, Joel Freundlich, Alex Clark

SBIR: Joel Freundlich, Robert C. Reynolds, Hiyun Kim, Mi-Sun Koo, Marilyn Ekonomidis, Meliza Talaue, Steve D. Paget, Lisa K. Woolhiser, Anne J. Lenaerts, Barry A. Bunin, Nancy Connell, Baojie Wan, Scott G. Franzblau, Allan Casey (IDRI)

Accelrys

Funding

2R42AI088893-02 “Identification of novel therapeutics for tuberculosis combining cheminformatics, diverse databases and logic based pathway analysis” from the National Institute of Allergy and Infectious Diseases (PI: S. Ekins)

R43 LM011152-01 “Biocomputation across distributed private datasets to enhance drug discovery” from the National Library of Medicine (PI: S. Ekins)

The CDD TB has been developed thanks to funding from the Bill and Melinda Gates Foundation (Grant #49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”).

http://goo.gl/vPOKS http://goo.gl/iDJFR