vanderwall cheminformatics drexel part 1

Cheminformatics & The evolving relationship between data in the public domain & pharma Dana Vanderwall Cheminformatics, Bristol-Myers Squibb

Upload: jean-claude-bradley

Post on 22-Dec-2014


DESCRIPTION

Dana Vanderwall, Associate Director of Cheminformatics at Bristol-Myers Squibb, presented at Drexel University for Jean-Claude Bradley's Chemical Information Retrieval class on December 2, 2010. This first part covers "Cheminformatics & The evolving relationship between data in the public domain & pharma" and includes a general discussion of modern drug discovery and the details of a malaria dataset recently released from the pharmaceutical industry to the public.

TRANSCRIPT

Page 1: Vanderwall cheminformatics Drexel Part 1

Cheminformatics & The evolving relationship

between data in the public domain & pharma

Dana Vanderwall, Cheminformatics, Bristol-Myers Squibb

Page 2: Vanderwall cheminformatics Drexel Part 1

How do we start to find a new chemical that might be the next drug?

Typically: need a specific protein to target that we think we can use to fix the problem that causes the disease

Caveat: emerging trends (& what’s old is new again)

Need to design experiments that test for chemicals that can fit that protein (lock & key)

Thousands to >2 million chemicals are tested with that protein to look for a starting point

Page 3: Vanderwall cheminformatics Drexel Part 1

This is where drug discovery gets really modern

Highly automated robots and informatics can do in 1 week work that used to take years

Page 4: Vanderwall cheminformatics Drexel Part 1

Compound optimization

Compounds are optimized for many parameters, including potency, selectivity, oral bioavailability, safety

1-3 years

>2 million compounds tested in primary assay

Make another 2-10,000

Page 5: Vanderwall cheminformatics Drexel Part 1

Getting ready for the clinic

All compounds are tested for safety in animals

Need to prove we can give enough to get the positive benefit without side effects

We have to be able to make it on a scale and in a form suitable for dosing in the clinic

The early stages need milligrams or grams (tablespoons)

To start testing in humans requires many kilograms of very, very pure material

1-3 years

2-3 years

Page 6: Vanderwall cheminformatics Drexel Part 1

Profiling Assays and Lead Op Progression

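[Figure: screening funnel; the number of compounds falls as the number of assays per compound rises]

Stage            # Compounds    # Assays
HTS              2M             1
Hit Triage       2K             2-5
Early Lead Op    200            10-50
Late Lead Op     1-10           300-600

Phases: Target to Hit, Hit to Lead, Lead to Candidate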

Page 7: Vanderwall cheminformatics Drexel Part 1

Chemical Structures are the Intellectual Property

The targets exist in nature; chemical structures are the unique component that pharma & biotech can bring to the table (biologicals are increasing in importance)

As such, the structures, and their biological activity, are extremely sensitive

• Captured in the patents filed

• Never disclosed until protected

• Even similarity/sub-structure searches on public sites are treated cautiously

Page 8: Vanderwall cheminformatics Drexel Part 1

GlaxoSmithKline moves to stimulate public-private partnerships for R&D in neglected tropical diseases

http://www.gsk.com/responsibility/access/rnd-neglected-tropical-diseases.htm

GSK launched the open lab at Tres Cantos as one way in which to share our expertise and seek to stimulate open innovation in drug discovery into diseases of the developing world (DDW)

• 60 slots for scientists

• Access to screening facility & TC staff scientists to support collaborations

• 5M GBP facilities expansion

Committed to sharing data & IP on GSK research in DDW

Starting with recently generated novel anti-malarial hits

Page 9: Vanderwall cheminformatics Drexel Part 1

Malaria

Mosquito-borne infectious disease, caused by the Plasmodium parasite

250 M cases/annum, 1-3 M deaths

Variety of drugs available, but resistance is a constant problem

http://www.mcwhealthcare.com/malaria_drugs_medicines/life_cycle_of_plasmodium.htm

Page 10: Vanderwall cheminformatics Drexel Part 1

The Complexity of Cell Biology

The assumption is: one target, one consequence

In reality: this target is one component of a complicated biochemical network.

• A selective probe may influence many pathways.

• Probes can interact with multiple targets.

• Network interactions can be redundant.

• Biological effects are often a consequence of interaction with multiple targets.

[Figure: target diagrams; label: Target]

Page 11: Vanderwall cheminformatics Drexel Part 1

Emerging paradigm: look for the cellular activity first

Advances in cell biology & the HTS platforms are enabling screening for a cellular phenotype

Start with something that works in a cellular model for disease phenotype (a.k.a. black box), then figure out how it works

Target deconvolution

Page 12: Vanderwall cheminformatics Drexel Part 1

Supporting black-box HTS for anti-malarials

2M compound GSK HTS collection screened @ 2 µM vs. P. falciparum (3D7) infected human erythrocytes

• 12 mos. screening in biohazard lab

• Avg. Z’ = 0.7

19,451 primary hits (inh. parasite growth >80%); 13,533 confirmed via retests

• 1,982 showed cytotox in HepG2s @ 10 µM

• None active in cell background control

• 8,000 also active against DD2 (multi-drug resistant strain), >50%

F-J Gamo et al. Nature 465, 305-310 (2010) doi:10.1038/nature09107
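The slide quotes an average Z’ of 0.7 as the assay quality measure. As a minimal sketch, assuming the standard Z’-factor definition (1 minus three times the summed control standard deviations over the absolute difference of the control means), the calculation for one plate could look like this. The control values are made up for illustration and the function name is hypothetical; the slide does not say how the campaign's Z’ was actually computed.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive_controls) - mean(negative_controls))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    spread = 3 * (stdev(positive_controls) + stdev(negative_controls))
    return 1 - spread / separation


# Made-up control-well signals for a single plate (illustrative only).
pos = [98.0, 100.0, 102.0]
neg = [8.0, 10.0, 12.0]
plate_z = z_prime(pos, neg)
print(f"Z' = {plate_z:.2f}")
```

Values close to 1 indicate well-separated controls with little scatter; an average of 0.7 across a 12-month campaign would generally be read as a robust assay.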

Page 13: Vanderwall cheminformatics Drexel Part 1

Characterizing the hits

Clustering was used to help characterize chemical space

416 “molecular frameworks” (Bemis & Murcko, J. Med. Chem. 39, 2887 (1996))

857 clusters / 1978 singletons by Daylight FP/Tanimoto (.85)

[Figure: example chemical structures from the hit clusters]
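To make the "Tanimoto (.85)" clustering concrete, here is a small sketch of threshold-based grouping on fingerprints. It rests on several assumptions: fingerprints are represented as plain Python sets of "on" bit positions rather than Daylight fingerprints, and a simple leader-style pass is used because the slide does not name the clustering algorithm. The compound names and bit sets are invented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def leader_cluster(fingerprints, threshold=0.85):
    """Assign each compound to the first cluster whose leader is similar enough.

    Leader-style clustering is an assumption here; the source only gives
    the fingerprint type and the 0.85 Tanimoto cutoff.
    """
    clusters = []  # list of (leader_id, [member_ids])
    for cid, fp in fingerprints.items():
        for leader_id, members in clusters:
            if tanimoto(fp, fingerprints[leader_id]) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append((cid, [cid]))
    return clusters


# Invented fingerprints: A and B share 19 of 21 bits, C is unrelated.
fps = {
    "A": frozenset(range(20)),
    "B": frozenset(range(19)) | {40},
    "C": frozenset({100, 101, 102}),
}
clusters = leader_cluster(fps)
singletons = [leader for leader, members in clusters if len(members) == 1]
```

In this toy set, A and B fall into one cluster and C is a singleton, which mirrors the clusters/singletons split reported on the slide.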

Page 14: Vanderwall cheminformatics Drexel Part 1

F-J Gamo et al. Nature 465, 305-310 (2010) doi:10.1038/nature09107

Three-dimensional plot of some of the novel chemical diversity present in TCAMS

Page 15: Vanderwall cheminformatics Drexel Part 1

Characterizing the hits

Compounds with an abnormally high frequency of activity across HTS campaigns were filtered out

Exclusion threshold ranged from IFI = 5% (where tested in >100 HTS) to IFI = 20% (where tested in >25 HTS); ~1800 cmpds. excluded

~70 compounds clustered with known anti-malarials

How are the rest of these compounds working???

$$\text{Inhibition Frequency Index (IFI)} = \frac{\text{number of HTS screens where \% Inh.} > 50\%}{\text{total number of HTS screens}} \times 100$$
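The inhibition frequency index on this slide is the percentage of a compound's historical HTS screens in which it showed more than 50% inhibition. The sketch below computes it and applies the two exclusion cutoffs quoted on the slide. Treating those cutoffs as a two-step rule is an assumption: the slide describes a range "from 5% to 20%" and does not say how screen counts in between are handled. Function names and the example histories are hypothetical.

```python
def inhibition_frequency_index(inhibitions, hit_cutoff=50.0):
    """IFI (%) = 100 * screens with inhibition > cutoff / total screens."""
    if not inhibitions:
        raise ValueError("compound has no HTS history")
    hits = sum(1 for pct in inhibitions if pct > hit_cutoff)
    return 100.0 * hits / len(inhibitions)


def is_frequent_hitter(inhibitions):
    """Two-step exclusion rule (an assumed reading of the slide's range)."""
    n_screens = len(inhibitions)
    if n_screens <= 25:
        return False  # too little history to judge
    ifi = inhibition_frequency_index(inhibitions)
    if n_screens > 100:
        return ifi >= 5.0
    return ifi >= 20.0


# Invented histories: % inhibition per past HTS campaign.
promiscuous = [90.0] * 6 + [5.0] * 24   # 30 screens, 6 above 50%
selective = [90.0] * 3 + [5.0] * 117    # 120 screens, 3 above 50%

ifi_promiscuous = inhibition_frequency_index(promiscuous)
ifi_selective = inhibition_frequency_index(selective)
```

Here the first compound has an IFI of 20% over 30 screens and would be excluded, while the second has an IFI of 2.5% over 120 screens and would be kept.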

Page 16: Vanderwall cheminformatics Drexel Part 1

Can we leverage the historical target data on compounds?

Target assays: clear relationship between interactions and measurements, but what does it mean biologically?

Phenotypic assays: clear biological result associated with readout, but from which interaction(s)?

Can we use the data to figure out which targets lead to which biological response?

[Figure: an example compound structure shown against a panel of target assays (kinase_1 to kinase_4, 7TM_1, 7TM_2, NR_1 to NR_3) and against a phenotypic assay (stimulant, readout), linked by "??"]

Page 17: Vanderwall cheminformatics Drexel Part 1

Can we leverage the historical target data on compounds?

Find all target assay data for compounds tested in anti-malarial screen

• Aggregate at the target-result type level (max pIC50/pEC50)

Of the 2M tested, 130K had some associated target assay data

• Incl. 3,435 of the 13,500 ‘actives’

• “Hits”* at 413 targets

*pIC50 >7.0 for antag/inh/blocker

*pEC50 >6.5 for ag/activation/opener

Given that some targets are screened in 2-3 modes, >650 target-result type combinations

Surely not all 400 targets are significant

• Data very sparse, avg. ~2 pXC50s per compound that had data
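The aggregation step described on this slide (keep the maximum pIC50/pEC50 per compound at each target-result type, then apply the hit thresholds) can be sketched as follows. The record layout, compound names and values are invented for illustration; only the thresholds (pIC50 >7.0, pEC50 >6.5) come from the slide.

```python
from collections import defaultdict

# Hit thresholds quoted on the slide, keyed by result type.
THRESHOLDS = {"pIC50": 7.0, "pEC50": 6.5}


def max_pxc50(records):
    """Keep the highest value per (compound, target, result_type)."""
    best = {}
    for compound, target, result_type, value in records:
        key = (compound, target, result_type)
        if key not in best or value > best[key]:
            best[key] = value
    return best


def target_hits(best):
    """Group compounds exceeding the threshold by (target, result_type)."""
    hits = defaultdict(set)
    for (compound, target, result_type), value in best.items():
        if value > THRESHOLDS[result_type]:
            hits[(target, result_type)].add(compound)
    return dict(hits)


# Invented measurements: (compound, target, result type, pXC50).
records = [
    ("cmpd_1", "kinase_1", "pIC50", 6.2),
    ("cmpd_1", "kinase_1", "pIC50", 7.4),  # repeat measurement, higher
    ("cmpd_2", "kinase_1", "pIC50", 6.9),
    ("cmpd_2", "7TM_1", "pEC50", 6.8),
]
best = max_pxc50(records)
hits = target_hits(best)
```

Keying on the target-result type pair rather than the target alone reflects the slide's point that some targets are screened in 2-3 modes.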

Page 18: Vanderwall cheminformatics Drexel Part 1

Finding targets ‘enriched’ among the anti-malarials

An ‘enrichment’ was calculated for each possible target-result type combination

Are compounds active at target X more prevalent amongst the compounds that inhibited P. falciparum, or equally distributed across all screened compounds?

For each target-result type, calculate:

$$\text{Enrichment factor} = \frac{\text{target actives among hits}}{\text{all target actives in all screened compounds}} = \frac{x/n}{X/N}$$

where

x = the number of antimalarial hits with activity meeting the threshold @ target

n = the number of antimalarial hits with a measured pIC50/pEC50 @ target

X = the number of compounds from the entire screening set with activity meeting the threshold @ target

N = the number of compounds from the entire screening set with a measured pIC50/pEC50 @ target
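The enrichment factor on this slide compares the fraction of target actives among the antimalarial hits (x/n) with the fraction of target actives among all screened compounds that have data at that target (X/N). A minimal sketch of that ratio is below; the counts in the example are invented, and the function name and error handling are illustrative choices, not part of the original analysis.

```python
def enrichment_factor(x, n, X, N):
    """Enrichment = (x / n) / (X / N).

    x: antimalarial hits active at the target
    n: antimalarial hits with a measured pIC50/pEC50 at the target
    X: compounds in the whole screening set active at the target
    N: compounds in the whole screening set with a measurement at the target
    """
    if n <= 0 or N <= 0:
        raise ValueError("n and N must be positive")
    if X == 0:
        raise ValueError("no target actives in the screening set")
    return (x / n) / (X / N)


# Invented counts: 30 of 100 measured hits are target actives,
# versus 300 of 4000 measured compounds overall.
ef = enrichment_factor(x=30, n=100, X=300, N=4000)
print(f"enrichment factor: {ef:.1f}")
```

A value near 1 means target actives are no more common among the antimalarial hits than in the collection as a whole; the next slide uses a 2-fold cutoff to narrow the target list.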

Page 19: Vanderwall cheminformatics Drexel Part 1

Narrowing down the possible candidates

~140 targets @ ≥2 fold enrichment

~50 with homologues in P. falciparum

[Figure: 400 targets, highlighting those with >2 fold enrichment]

F-J Gamo et al. Nature 465, 305-310 (2010) doi:10.1038/nature09107

Page 20: Vanderwall cheminformatics Drexel Part 1

Targets with homologues in P. falciparum genome

• Aspartic protease

• b-Ketoacid reductase

• Calcium/calmodulin-dependent kinase

• Cysteine protease

• Dihydrofolate reductase

• Dihydroorotate dehydrogenase

• DNA gyrase

• Isoleucyl-tRNA synthetase

• Methionyl-tRNA synthetase

• Phenylalanyl-tRNA synthetase

• Phosphatidylinositol 3-kinase

• Plasmodium electron transport chain

• Ribosome

• Ser/Thr protein kinase

• Tyrosyl-tRNA synthetase

Page 21: Vanderwall cheminformatics Drexel Part 1

Targets with NO homologues in P. falciparum genome

• GPCR: Adrenergic antag

• GPCR: Cannabinoid antag

• GPCR: Chemokine antag

• GPCR: Cholinergic ag

• GPCR: Free Fatty Acid ag

• GPCR: Serotonin ag/antag

• GPCR: Opioid ag/antag

• GPCR: Peptide hormone receptor ag/antag

• Nuclear Receptor ag/antag

• Ion Channel inh

• Phospholipase inh

• Lipid amide hydrolase inh

• Serine protease inh

• Toll-like receptor ag

Page 22: Vanderwall cheminformatics Drexel Part 1

Data publicly available

All chemical structures and exp. data for compounds available @ http://www.ebi.ac.uk/chemblntd

EXT_CMPD_NUMBER

SMILES

Percentage_inhibition_3D7

Percentage_inhibition_DD2

Percentage_inhibition_3D7_PFLDH

XC50_MOD_3D7

XC50_3D7 (µM)

Percentage_inhibition_HEPG2

Chemical cluster Nr

IFI

Graph_Frame_Cluster

Target_Hypothesis

P. falciparum locus

Commercial Supplier_Reference

Additional information & interest in additional collaborations contact:

[email protected]
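As a rough illustration of how the released fields might be used, the sketch below filters compounds by the 3D7 and DD2 inhibition columns named on this slide, using the >80% and >50% cutoffs mentioned earlier in the talk. The tab-delimited layout, the sample rows and the compound identifiers are all placeholders invented for this example; the actual download format at the ChEMBL-NTD site is not described in the slides.

```python
import csv
import io

# Placeholder rows in an assumed tab-delimited layout (not real data).
SAMPLE = (
    "EXT_CMPD_NUMBER\tSMILES\tPercentage_inhibition_3D7\tPercentage_inhibition_DD2\n"
    "EXAMPLE_1\tCCO\t95\t72\n"
    "EXAMPLE_2\tCCN\t88\t31\n"
    "EXAMPLE_3\tCCC\t40\t10\n"
)


def active_on_both_strains(handle, min_3d7=80.0, min_dd2=50.0):
    """Return compound IDs exceeding both inhibition cutoffs."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["EXT_CMPD_NUMBER"]
        for row in reader
        if float(row["Percentage_inhibition_3D7"]) > min_3d7
        and float(row["Percentage_inhibition_DD2"]) > min_dd2
    ]


selected = active_on_both_strains(io.StringIO(SAMPLE))
print(selected)
```

The same function would accept an open file handle in place of the in-memory sample, provided the real file uses the column names listed on the slide.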

Page 23: Vanderwall cheminformatics Drexel Part 1

And the raw target data used to develop hypotheses?

That was trickier

Releasing the list of 400 targets & all the inactive compounds would reveal:

• Our whole compound collection

• All the targets in the current (and past) portfolio

Needed some level of validation for the analysis to publish

Page 24: Vanderwall cheminformatics Drexel Part 1

Surrogates for internal data

Chemical structures associated with a particular target hypothesis were used as ‘bait’ to find published structures & data that validate the proposed MOA for each chemotype

• Similarity & SSS in Aureus DBs & SciFinder

• Exemplars and their similarity to original hits published in Suppl. Material with reference

• We often found our own compounds and data in J Med Chem and patent literature.

Page 25: Vanderwall cheminformatics Drexel Part 1

Acknowledgements

Anti-malarial HTS

Tres Cantos Medicines Development Campus, Tres Cantos, Spain: Francisco-Javier Gamo, Laura Sanz, Jaume Vidal, Cristina de Cozar, Emilio Alvarez, Jose-Luis Lavandera, Jose Garcia-Bustos

Medicines Research Centre, Stevenage, UK: Darren VS Green

Collegeville & King of Prussia, PA, USA: Vinod Kumar, Samiul Hasan, James Brown, Catherine Peishoff, Lon Cardon