Drug Design / drug discovery
Jerome Baudry, Assistant Professor, BCMB
UT/ORNL Center for Molecular Biophysics
2 previous incarnations:
Research faculty at UIUC
Research scientist at Transtech Pharma, Inc.
Drug Design / drug discovery
What’s a drug? A substance that treats or cures a disease.
A small molecule that interacts with a target (often a protein involved in the disease process), acting as an activator or inhibitor.
Drug discovery: the process of finding such a small molecule, using a combination of approaches.
Drug discovery or drug design? In principle: “Design” is more rational and targeted, and “discovery” is more serendipitous. But design and discovery share a lot and are ~ synonymous in a pharmaceutical context.
Hopkins, Groom, Nat. Rev. Drug Discov. 2002, 1(9):727-30. 5% of the human genome is “druggable”
Gigantic economic importance: 10 years & $200 to $1,900 million to develop a drug
25 new molecules /year
Intense scientific activity: very interdisciplinary approach
Drug discovery market: > $340 billion
in millions US$         Revenue      R&D    Income
Johnson & Johnson        53,324    7,125    11,053
Pfizer                   48,371    7,599    19,337
GlaxoSmithKline          42,813    6,373    10,135
Novartis                 37,020    5,349     7,202
Sanofi-Aventis           35,645    5,565     5,033
Hoffmann–La Roche        33,547    5,258     7,318
AstraZeneca              26,475    3,902     6,063
Merck & Co.              22,636    4,783     4,434
Abbott Laboratories      22,476    2,255     1,717
Wyeth                    20,351    3,109     4,197
http://en.wikipedia.org/wiki/List_of_pharmaceutical_companies
Chemistry: synthesis
Discovery and design (hit/lead/optimisation)
Biology: assay (binding/activity; in vitro / in vivo)
Target identification
The drug discovery and design workflow:
Drug development: pharmacology / testing
The long and winding road to drug discovery
Computational chemistry / molecular modeling: useful across the pipeline, but with very different techniques
Aim for success, but if not: fail early, fail cheap
Structure-based: know receptor, don’t know ligands
Two pathways to drug discovery / drug design
What will be happy in there?
Ligand-based: don’t know receptor, know ligands
Protein/ligand interactions: structure / biophysics / docking
Statistical analysis of what group(s) are important for biological activity
Structure modeling (homology / experimental X-ray / NMR / neutron): get a structure
High-throughput docking/screening: get a “hit” (anything at all)
Structure-based approaches: use knowledge of structure to find something that 1) binds, and 2) has the desired biological activity
Focused library docking, fragment-based growth
Simulations of ‘individual’ molecules
Structure-based library screening
What do we need:
1) Compound libraries
2) Protein target
3) Binding site in the protein
4) Docking: generate many different possible conformations of the compounds in the binding site
5) Scoring: evaluate the strength of the protein/ligand interactions (score)
6) Select preferred ligands to propose a list of prioritized compounds for experimental screening
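The six items above fit together as a simple loop, sketched here in Python. This is only an illustration of the data flow: `screen_library`, `toy_dock` and `toy_score` are hypothetical names, the docking and scoring steps are placeholders supplied by the caller, and the convention that a lower score means stronger predicted binding is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

Pose = Tuple[float, ...]  # hypothetical stand-in for one ligand conformation


def screen_library(
    compounds: Iterable[str],
    dock: Callable[[str], List[Pose]],
    score: Callable[[str, Pose], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Dock every compound, keep its best pose score, return the top_n compounds.

    Assumed convention: lower score = stronger predicted binding.
    """
    best_per_compound = []
    for name in compounds:
        poses = dock(name)  # item 4: generate candidate poses in the site
        if not poses:
            continue  # nothing could be placed in the binding site
        best_score = min(score(name, pose) for pose in poses)  # item 5
        best_per_compound.append((name, best_score))
    best_per_compound.sort(key=lambda item: item[1])  # item 6: prioritize
    return best_per_compound[:top_n]


# Toy stand-ins so the sketch runs; real docking and scoring are far more involved.
def toy_dock(name: str) -> List[Pose]:
    return [(float(len(name)),), (float(len(name)) / 2,)]


def toy_score(name: str, pose: Pose) -> float:
    return -pose[0]


hits = screen_library(["aspirin", "ibuprofen", "x"], toy_dock, toy_score, top_n=2)
print(hits)
```

Passing the docking and scoring steps in as functions mirrors the point made later in the slides that the same workflow can be run with fast or accurate methods.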
Best case scenario, a high-quality experimental structure exists.
PDB: http://www.rcsb.org/pdb/
- experimental collection of (49 295) structures, ~18 000 non-redundant sequences
- X-Ray & NMR
- nucleic acids, proteins, carbohydrates
Structure-based approaches: structure modeling
~50,000 non-redundant protein structures in the PDB: is that a lot?
That’s ~1% of the 5.5 million protein sequences in Swiss-Prot (http://www.ebi.ac.uk/swissprot/sptr_stats/index.html)
and < ~0.00007% of earth’s proteins (5E6 organisms, 5K genes/genome, low-end estimate).
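The two percentages can be checked with a few lines of arithmetic. Which PDB count feeds each percentage is an assumption made here: the total of 49,295 structures is used for the sequence-database fraction, and the ~18,000 non-redundant sequences for the earth-wide fraction, because those choices reproduce the quoted figures.

```python
# Counts quoted on the slides
pdb_structures = 49_295        # total PDB entries
pdb_nonredundant = 18_000      # non-redundant sequences in the PDB
database_sequences = 5.5e6     # protein sequences in the sequence database

# Low-end estimate of the number of proteins on earth
organisms = 5e6
genes_per_genome = 5e3
earth_proteins = organisms * genes_per_genome  # 2.5e10

frac_of_sequences = pdb_structures / database_sequences
frac_of_earth = pdb_nonredundant / earth_proteins

print(f"Fraction of known sequences: {100 * frac_of_sequences:.1f}%")
print(f"Fraction of earth's proteins: {100 * frac_of_earth:.5f}%")
```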
Structure-based drug discovery = “Post genomics challenge”:structural biology, functional genomics, chemical biology…
Protein sequence (…AQRTEVYTYRRS…) to protein structure (homology, ab-initio folding…)
Must do for a new pharmaceutical target
Structure-based approaches: structure modeling
If no experimental structure is available: work on getting one, and in the meantime use homology modeling: use the structure of close (sequence-wise) proteins to build, by analogy, a new protein.
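How close two proteins are "sequence-wise" is commonly summarized as percent identity over a pairwise alignment. The sketch below assumes the two sequences are already aligned (same length, `-` for gaps) and counts only columns where both have a residue; other denominators are also in use, so the exact definition is an assumption, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Made-up toy alignment: 7 gap-free columns, 5 identical
identity = percent_identity("AQRT-EVY", "AQKTSEVF")
print(f"{identity:.1f}% identity")
```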
[Figure: library scaffold with variable groups R1 to R4]
http://blaster.docking.org/zinc/
Databases of compounds:
- vendors
- literature
- corporate/laboratory
- virtual compounds
A priori anything, but we can be smarter than that
http://nihroadmap.nih.gov/molecularlibraries/
Library designed against protein target, based on hits from previous database screening
Millions of compound structures are available from public databases.
Major NIH effort to fund & develop libraries.
more exploratory
more focused
Structure-based approaches: compound selection
[Figure labels: outside, inside, deleted]
When the site is not known: eraser/flooding techniques
binding site (3D)
Or… make your life easier and build the site around a co-crystallized ligand, if available…
Locate cavities in a protein
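A toy 2D version of the flooding idea can be written as a flood fill on a grid. Grid points overlapping the protein are marked "deleted", free points reachable from the grid edge are "outside", and free points that cannot be reached are "inside". This is a deliberately simplified sketch, not the algorithm of any particular docking package: it only finds fully enclosed cavities, whereas practical site finders typically use a probe of finite size so that open pockets are kept as well. `classify_grid` is a hypothetical name.

```python
from collections import deque
from typing import List


def classify_grid(occupied: List[List[bool]]) -> List[List[str]]:
    """Label each grid point 'deleted' (protein), 'outside' (solvent) or 'inside' (cavity)."""
    rows, cols = len(occupied), len(occupied[0])
    # Start by assuming every free point is inside a cavity
    labels = [
        ["deleted" if occupied[r][c] else "inside" for c in range(cols)]
        for r in range(rows)
    ]
    queue = deque()
    # Free points on the grid edge are solvent by definition
    for r in range(rows):
        for c in range(cols):
            on_edge = r in (0, rows - 1) or c in (0, cols - 1)
            if on_edge and not occupied[r][c]:
                labels[r][c] = "outside"
                queue.append((r, c))
    # Flood from the edge through connected free points
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == "inside":
                labels[nr][nc] = "outside"
                queue.append((nr, nc))
    return labels


# Made-up "protein": '#' marks grid points overlapping protein atoms
protein = [
    "......",
    ".####.",
    ".#..#.",
    ".####.",
    "......",
]
occupied = [[ch == "#" for ch in row] for row in protein]
labels = classify_grid(occupied)
n_inside = sum(row.count("inside") for row in labels)
print(n_inside, "cavity grid points")
```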
Structure-based approaches: binding site
HIGH-THROUGHPUT OR LOW-THROUGHPUT? Fast (initial) or accurate (on the best compounds from the initial pass)
Choices based on the desired throughput: from 10 seconds to 10 minutes / compound
650,000-compound library, on 10 processors: from 3 days to 6 months
Most time-consuming part (by far)
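A back-of-envelope estimate shows why throughput drives the choice of method. The sketch assumes perfect scaling across processors and a constant time per compound, neither of which holds exactly in practice; `screening_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def screening_days(n_compounds: int, seconds_per_compound: float, n_processors: int) -> float:
    """Wall-clock days to dock a library, assuming perfect parallel scaling."""
    total_seconds = n_compounds * seconds_per_compound / n_processors
    return total_seconds / SECONDS_PER_DAY


fast = screening_days(650_000, 10, 10)    # 10 s per compound
slow = screening_days(650_000, 600, 10)   # 10 min per compound
print(f"fast: {fast:.1f} days, slow: {slow:.0f} days")
```

Under these idealized assumptions the library takes from about a week to more than a year, the same order of magnitude as the range quoted above; the exact figures depend on the actual per-compound timings and hardware.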
[Figure labels: YES, NO, OK, BETTER]
Structure-based approaches: docking
Scoring functions: quantify the energy of protein/ligand interactions such as hydrogen bonds, electrostatics, van der Waals, hydrophobic, etc.
Several scoring functions exist: more or less specialized, more or less fast, etc.
PROTEIN
LIGAND
Structure-based approaches: scoring
scoring functions:
Force-field based (CHARMM, AMBER, etc.). MMFF is a very popular one because of its “modular parametrisation”: it is easy to derive parameters from functional groups, and it is well adapted to organic molecules.
Physically ‘accurate’ but slow, parametrisation issues.
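To make the force-field idea concrete, here is a minimal sketch of a non-bonded protein/ligand energy built from a Lennard-Jones term plus a Coulomb term. It is not CHARMM, AMBER or MMFF: the single `epsilon` and `sigma` used for every atom pair are placeholder values, whereas real force fields assign parameters per atom type, which is where the parametrisation issues come from. Units are assumed to be Å, elementary charges and kcal/mol.

```python
import math
from typing import Iterable, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z (Å) and partial charge (e)

COULOMB_K = 332.0637  # Coulomb constant in kcal·Å/(mol·e²)


def pair_energy(r: float, epsilon: float, sigma: float,
                q1: float, q2: float, dielectric: float = 1.0) -> float:
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair at distance r."""
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_K * q1 * q2 / (dielectric * r)
    return lj + coulomb


def interaction_energy(protein: Iterable[Atom], ligand: Iterable[Atom],
                       epsilon: float = 0.1, sigma: float = 3.4) -> float:
    """Sum pair energies over all protein/ligand atom pairs (placeholder parameters)."""
    ligand_atoms = list(ligand)
    total = 0.0
    for px, py, pz, pq in protein:
        for lx, ly, lz, lq in ligand_atoms:
            r = math.dist((px, py, pz), (lx, ly, lz))
            total += pair_energy(r, epsilon, sigma, pq, lq)
    return total


# Made-up two-atom example: opposite partial charges 3 Å apart
energy = interaction_energy([(0.0, 0.0, 0.0, 0.5)], [(3.0, 0.0, 0.0, -0.5)])
print(f"{energy:.2f} kcal/mol")
```

The double loop over all atom pairs also illustrates why this family of functions is slow compared with counting-based scores.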
Empirical: count the number of interactions and assign a score based on the number of occurrences. E.g.:
- H-bonds, ionic interactions (easy because very directional and well quantified)
- Hydrophobic interactions (more difficult to assess and quantify)
- Number of rotatable bonds frozen (linked to the entropic cost of binding, quite difficult to estimate)
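An empirical score of this kind is essentially a weighted sum of interaction counts. In the sketch below the weights are invented for illustration and do not come from any published scoring function; in practice they are fitted against measured binding data. Negative contributions are taken as favorable.

```python
from dataclasses import dataclass


@dataclass
class InteractionCounts:
    h_bonds: int
    ionic: int
    hydrophobic_contacts: int
    frozen_rotatable_bonds: int


# Illustrative weights only (arbitrary units); real functions fit these to data
WEIGHTS = {
    "h_bonds": -1.0,
    "ionic": -2.0,
    "hydrophobic_contacts": -0.2,
    "frozen_rotatable_bonds": 0.3,  # entropic penalty, hence positive
}


def empirical_score(counts: InteractionCounts) -> float:
    """Weighted sum of interaction counts; lower means stronger predicted binding."""
    return sum(weight * getattr(counts, name) for name, weight in WEIGHTS.items())


score = empirical_score(InteractionCounts(h_bonds=3, ionic=1,
                                          hydrophobic_contacts=10,
                                          frozen_rotatable_bonds=4))
print(f"{score:.1f}")
```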
Knowledge-based: observe known protein/ligand structures, and favor interactions and geometries that are seen often. Idea: link directly to the free energy, because the observed (“real life”) distribution reflects a potential of mean force.
But: based on small # of entries.
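The potential-of-mean-force idea is usually implemented as an inverse Boltzmann relation, w = -kT ln(p_observed / p_reference), applied per distance bin for each atom-pair type. The sketch below assumes the observed and reference counts have already been collected; the counts in the example are made up, and bins with no data are returned as `None`, which is where the small number of available structures hurts.

```python
import math
from typing import List, Optional, Sequence

KT = 0.592  # kcal/mol at roughly 298 K


def pmf_from_counts(observed: Sequence[int], reference: Sequence[int],
                    kT: float = KT) -> List[Optional[float]]:
    """Inverse Boltzmann: convert per-bin counts into a potential of mean force."""
    n_obs, n_ref = sum(observed), sum(reference)
    pmf: List[Optional[float]] = []
    for obs, ref in zip(observed, reference):
        if obs == 0 or ref == 0:
            pmf.append(None)  # not enough data to estimate this bin
        else:
            ratio = (obs / n_obs) / (ref / n_ref)
            pmf.append(-kT * math.log(ratio))
    return pmf


# Made-up counts for three distance bins of one atom-pair type
pmf = pmf_from_counts(observed=[10, 60, 30], reference=[30, 40, 30])
print(pmf)
```

Bins seen more often than in the reference get a negative (favorable) value, bins seen less often get a positive one.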
Intense competition “my scoring function is better than yours”
Future: force-field based / even QM-based. Different approaches depending on size.
Structure-based approaches: scoring
1,000,000 molecules, 30 actives. 1000 selected, 5 actives
Enrichment factor = (5/1000) / (30/1,000,000) ≈ 167: HUGE SUCCESS
1,000,000 molecules, 30 actives. 1000 selected, 3 actives
Enrichment factor = (3/1000) / (30/1,000,000) = 100: HUGE SUCCESS
Often: consensus scoring: choose the few molecules that are ranked consistently well by many scoring functions
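The enrichment factor compares the hit rate in the selected subset with the hit rate in the whole library. The sketch below computes it for the two scenarios above and adds one simple way to do consensus scoring: keep only compounds that appear in the top k of every ranking. Both function names are hypothetical, and the intersection rule is just one of several possible consensus schemes.

```python
from typing import List, Sequence


def enrichment_factor(actives_selected: int, n_selected: int,
                      actives_total: int, n_total: int) -> float:
    """Hit rate in the selected subset divided by hit rate in the full library."""
    return (actives_selected / n_selected) / (actives_total / n_total)


def consensus_hits(rankings: Sequence[Sequence[str]], top_k: int) -> List[str]:
    """Compounds in the top_k of every ranking (best first), ordered by summed rank."""
    top_sets = [set(ranking[:top_k]) for ranking in rankings]
    common = set.intersection(*top_sets)
    return sorted(common, key=lambda c: (sum(r.index(c) for r in rankings), c))


ef_five = enrichment_factor(5, 1000, 30, 1_000_000)
ef_three = enrichment_factor(3, 1000, 30, 1_000_000)
print(f"{ef_five:.0f}, {ef_three:.0f}")

# Made-up rankings from three scoring functions
rankings = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["b", "c", "a", "d"],
]
print(consensus_hits(rankings, top_k=3))
```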
Structure-based approaches: scoring
COMPUTATIONAL DOCKING: GENERATE TESTABLE IDEAS
Chemistry: synthesis
Discovery and design (hit/lead/optimisation)
Biology: assay (binding/activity; in vitro / in vivo)
Possible to start the next round of iteration (or do ‘traditional’ modeling). Redock with improved accuracy (e.g. QM/MM).
Reproduce a known crystal structure: HIV protease and inhibitor
Examples (low-throughput): works great … in most publications
Crystal structure; first round of docking (shape only); final result (after rigid-body minimizations: energetics taken into account)
Ligand-based site Flood-based site
Venkatachalam, et al.; J. Mol. Graph. Model. 2003, 289-307
But also… fails miserably (rarely in publications!)
Crystal structure; final results (rigid-body minimizations). Illustrates issues with the binding site’s shape (there are workarounds).
Examples (low-throughput)
Venkatachalam, et al.; J. Mol. Graph. Model. 2003, 289-307
Ke et al, Archives of Biochemistry and Biophysics 436 (2005) 110–120
Example II): discovery of ligand/function for a new P450
Development of a database of bio and agrochemical compounds of relevance for P450 (currently ~ 14,000 structures). In-house compounds, KEGG database: (http://www.genome.jp/kegg/ligand.html), Compendium of Pesticide Common Names: (http://www.alanwood.net/pesticides/index.html).
Development of CYP120A1 model from CYP107A template (23.6% identity)
HT-docking (LigandFit): identify 99 compounds consistently predicted to be good binders. Confirmed: retinoic acid
~14,000 structures
Ke et al. Arch. Biochem. Biophys. 2005
High-throughput docking: get a “hit” (anything at all)
CONCLUSIONS
In-silico combinatorial library design & structure-based screening: a fast, efficient and inexpensive tool to:
- discover new possible ligands against a macromolecular target
- test library design ideas
- identify the most promising scaffolds and R groups prior to synthesis
Baudry, J.; Hergenrother, P. J. "Structure-based Design and In-Silico Virtual Screening of Combinatorial Libraries. A Combined Chemical/Computational Laboratory Assignment" J. Chem. Ed. 2005, 82, 890-894. http://www.scs.uiuc.edu/~phgroup/pdfs/2005PJHchemed.pdf
HT-DOCKING SUCCESS IF:
i) FIND A FEW MOLECULES OF INTEREST
ii) MUCH QUICKER AND CHEAPER THAN “real” screening
Comparison model / crystal structure
residues within 4 Å of heme
Green/blue: model, red/orange: crystal
Residues around the ligand’s β-ionone ring are very close in both structures
(Phe182 & Trp76: same pharmacophore)
Green/blue: model, red/orange: crystal
Comparison model / crystal structure
De novo design: fragment-based “inside-out” approach
Put functional groups in binding site (docking or manually, or combination)
Link these groups (docking or manual, or combination): *must* be able to synthesize it – no molecular monsters
Caflisch, Miranker, Karplus, J. Med. Chem. 36, 2142-2167 (1993); Eisen, Wiley, Karplus, Hubbard, Proteins: Structure, Function, and Genetics 19, 199-221 (1994).
i) dock functional groups
ii) keep low energy groups, link with scaffolds
iii) correct binding site, but ≠ too; “lead hopping”