
Indian Journal of Chemistry Vol. 45A, January 2006, pp. 163-173

Target based high throughput screening and lead designing in pharmaceutical drug industry

Indira Ghosh Institute of Bioinformatics & Biotechnology, University of Pune, Ganeshkhind, Pune 411 007, India

Email: [email protected]

Received 24 February 2005; revised 29 November 2005

Recent advances in chemoinformatics and computer-aided drug design have helped the pharmaceutical industry to develop more compounds in less time in the pipeline of genome-based drug design for specific diseases. The unravelling of the genome sequences of several species and a better understanding of disease have given biologists several proteins and enzymes as drug targets, which in turn has challenged medicinal chemists to design several classes of compounds with drug-like properties. Some of the techniques which have made this possible are discussed here, with an emphasis on the industry's approach to drug design.

The current paradigm of drug discovery has become a game of spinning large numbers. High Throughput Screening (HTS), Combinatorial Chemistry, Compound Libraries and Database Management are the main building blocks of this paradigm. All of these require efficient handling of a very large number of data points, in which structure and activity are interrelated. The prediction of bioactive candidate molecules requires the investigation of a diverse set of molecules, their substituent groups and drug scaffolds, as well as critical data related to each entity, and the processing of all this information through sophisticated software. Based on optimization using biological activity and physico-chemical properties, new molecules are predicted which are likely to have better overall activity. Likely candidates are then tested in a variety of experimental settings to validate the structure-activity relationship.

Challenges in identifying the "appropriate" hit arise because the definition of the pharmacophore cannot be generic: it depends on the class of proteins. In addition, not all the interactions (favourable or not) may be determinants of the bioactivity of the macromolecule (i.e., false positives). Similarly, very specific and weak interactions might be missed (false negatives) due to the assay process.

The dominance of gene-based research in drug discovery means that companies are increasingly reliant on computers. The speed and the amount of sequencing are doubling each year. To handle the torrent of data emerging from the human genome project and other genome mapping projects, researchers need software tools that sequence genes and perform 3-D modeling of drugs and disease targets. It is estimated that, out of 30,000 human genes, about 3000 could be disease-modifying genes. If one takes the list of known druggable genes (about 3000) and maps it onto the available knowledge of targets, the intersection contains at least 600-1500 proteins which could be used for designing inhibitors. Almost the same number of target proteins will also be available from comparative genomics studies between host and pathogen sequences (in-house data). Studies to design a specific inhibitor or class of inhibitors, which will ultimately be converted into a drug for a specific disease2, are also reported.

Biostratum, a company located in North Carolina, uses a supercomputer to do virtual screening. In this protocol, the computer is programmed with the 3-D structural coordinates of the solved X-ray diffraction structure of MMP-2 (Ref. 3). The researchers then run programs that take the structural coordinates of potential inhibitor compounds and determine which of these compounds relate to a designated structural region of the enzyme. The computer repeats this for each of the compounds in the database. The supercomputer then ranks the compounds according to their calculated binding affinity. How accurate this virtual screening can be depends on the methods adopted for the screening and on the probability of occurrence of hits in the database screened, similar to the experimental HTS hit rate.
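The ranking step described above can be sketched in a few lines. This is only an illustration, not the protocol of the company named in the text: the compound names, scores and assay outcome are invented, and `rank_by_score` and `hit_rate` are hypothetical helper names. It assumes that a docking program has already produced one score per compound and that a more negative score means stronger predicted binding.

```python
# Illustrative sketch only: compound IDs, scores and actives are made up.
from typing import Dict, List, Set, Tuple


def rank_by_score(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort compounds so that the most negative (best) score comes first."""
    return sorted(scores.items(), key=lambda item: item[1])


def hit_rate(selected: List[str], actives: Set[str]) -> float:
    """Fraction of the selected compounds that are confirmed actives."""
    if not selected:
        return 0.0
    return sum(1 for name in selected if name in actives) / len(selected)


# Hypothetical docking scores (kcal/mol) for a six-compound database.
scores = {
    "cmpd_A": -9.1, "cmpd_B": -5.3, "cmpd_C": -7.8,
    "cmpd_D": -4.0, "cmpd_E": -6.2, "cmpd_F": -3.5,
}
ranked = rank_by_score(scores)
top3 = [name for name, _ in ranked[:3]]

# Assume an assay later confirms these two compounds as actives.
actives = {"cmpd_A", "cmpd_E"}

selected_rate = hit_rate(top3, actives)           # 2 of the 3 selected
baseline_rate = hit_rate(list(scores), actives)   # 2 of all 6 compounds
enrichment = selected_rate / baseline_rate
print(top3, selected_rate, baseline_rate, enrichment)
```

Comparing the hit rate of the selected subset with that of the whole database is one simple way to express how much a virtual screen improves on random selection.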


Cytoclonal Pharmaceutical, led by John Pople, uses a proprietary program called Quantum Core Technology (QCT). QCT bypasses the regular methods of rational drug design that rely on the activation of an enzyme after binding to the ligand. In this method very small molecules are created which react directly with the active center and do not require all the binding properties of a structure-based compound. The QCT technology can make use of a partial crystal structure by using homology models; the only requirement is to know the active center.

Methodology

In the quest for chemical entities with the required biological activity, two main types of approach have been developed. The most popular method, named 'rational designing' (not because the other methods lack reason, but because it follows directly and stepwise from the medicinal chemist's approach), exploits the lock-and-key model for ligand-receptor binding. Its limitations6, such as the conformational flexibility of both ligand and receptor, the binding of higher energy conformers7 and the influence of water and salt concentration on the active conformers8, have been addressed so that, as far as possible, an in vitro model can be mimicked while designing the drug. The other method, named "reverse designing", has evolved from the introduction of High Throughput Screening and Combinatorial Chemistry. In this method, structurally or (if possible) functionally similar compounds9 are grouped together and searching10 is done using the common biologically effective motif, defined as the pharmacophore11, in the available (commercial or corporate) database. In every step of the discovery process, computer-assisted drug design gives an extra advantage in processing the data into relevant information for future analysis.

The application of computational chemistry and mathematics in drug discovery cannot be summarized in one review; rather, the reader is referred to the 100 or so books and 5000 articles published during the last 50 years.

The flow chart of both these processes, along with the in silico methods (in boxes) commonly used in the drug design process, is shown in Scheme 1.

Results and Discussion

A limited effort has been made here to cover a few methods, which have been adopted by different pharmaceutical companies to achieve the expected milestones in the drug discovery phase.

The methods that will be discussed here are: (i) Structure based design and generating leads; (ii) Small molecules designing & QSAR; (iii) Novel lead generation using Pharmacophore; and, (iv) Optimizing Leads using Combinatorial Chemistry and Virtual screening. The last two methods have been developed to assist the Combinatorial Chemistry and High Throughput Screening technology.

Structure based design and generating leads

In pharmaceutical research, the most widely used method for designing compounds (in projects such as lead identification or lead optimization) is to dock the compound into the active site of the target and then use techniques like LUDI, DOCK, CAVITY, GROWMOL, GRID, LigFit, etc. to optimize the favorable interactions with the available interaction sites, i.e. the functional groups offered by the target macromolecule12. The final prediction is made using rigorous methods such as relative free energy calculation by molecular dynamics or Monte Carlo simulation13. At this stage, the macromolecule, generally a protein structure solvated in water molecules, is used to estimate the relative free energy of binding of different ligands and modified analogs or hits from HTS. The suggested modifications of the lead molecule are then made by the chemists and their biological activities, such as IC50, Ki and MIC, are measured. These experimental data are used to modify the compounds generated from the lead, and several cycles of structure-based design are performed. The advantage of this method is that experiment and design go hand in hand and the success rate is high. One of the best examples is the success story of the Merck Research Laboratory in designing an HIV-protease inhibitor14. The first compound modeled, L-685434 (IC50 = 0.25 nM, cell potency CIC95 = 400 nM), an early lead15, was modified to have better pharmacokinetic properties to yield the compound L-689502 (IC50 = 0.45 nM; cell potency CIC95 = 12 nM)16. But, due to the hepatotoxicity of this compound, a new series of compounds had to be developed. Using the existing knowledge of this series, the Merck group developed a semi-quantitative inhibitor model which could predict the IC50 prior to synthesis17 and led to the generation of a new series using the knowledge of Roche compounds18. This resulted in the design of the parent compound for the phase II clinical candidate, L-735524 (Ref. 19). In another case, the Dupont Merck group translated a peptidomimetic compound into a bioavailable cyclic compound20 by using the C2-symmetrical nature of the enzyme21 and the presence of a structural water molecule that accepts H-bonds from the main chain amides of Ile50 and Ile50' and donates H-bonds to the carbonyl oxygens flanking the transition state pseudopeptide molecule22. This example illustrates the power of combining tools like X-ray crystallography22, molecular modeling23, QSAR and 3D searching of databases using a pharmacophore model24 in solving problems in drug development24. A pharmacophore is the geometrical arrangement of features in a molecule that is necessary for activity11. The searching is performed using sub-structural patterns based on graph theory25, distance or 3D geometry based criteria26, and flexible queries using multiple conformers27. This method has not only provided medicinal chemists with a range of improved tools, it has also opened up a new concept in drug design, using previously unknown motifs derived from database searching.

Scheme 1: Flow chart of the drug design process, from target identification through HTS assay and screening, lead identification and lead optimization to combinatorial synthesis, with the in silico methods shown in boxes (structure based designing, pharmacophore/scaffold based searching, clustering, database design, combinatorial library generation and virtual screening, QSAR/QSPR).

Medicinal chemists working on HIV-protease inhibitors generated a pharmacophore and identified a different set of motifs, pseudo-symmetric hydroxy-2-pyrones, which led to several series of more potent inhibitors, including cyclooctylpyranone sulfonamides28, and reached phase I clinical trials.
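A distance-based pharmacophore query of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of any program named in the text: the feature types, distance ranges and coordinates are invented, and `matches` and `molecule_matches` are hypothetical helper names. A molecule is treated as a hit if any of its conformers places one feature of each required type at pairwise distances inside the allowed ranges, a simple stand-in for flexible searching with multiple conformers.

```python
# Illustrative three-point pharmacophore search; all numbers are made up.
from itertools import permutations
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Feature = Tuple[str, Point]              # (feature type, 3D coordinates)

# Hypothetical query: required feature types and the allowed distance
# range (in angstroms) between each pair of query points.
QUERY_TYPES = ["donor", "acceptor", "hydrophobe"]
QUERY_RANGES: Dict[Tuple[int, int], Tuple[float, float]] = {
    (0, 1): (2.5, 3.5),
    (0, 2): (4.5, 6.0),
    (1, 2): (5.0, 7.0),
}


def matches(features: List[Feature]) -> bool:
    """True if some choice of features satisfies every type and distance."""
    for combo in permutations(features, len(QUERY_TYPES)):
        if [ftype for ftype, _ in combo] != QUERY_TYPES:
            continue
        if all(low <= dist(combo[i][1], combo[j][1]) <= high
               for (i, j), (low, high) in QUERY_RANGES.items()):
            return True
    return False


def molecule_matches(conformers: List[List[Feature]]) -> bool:
    """A flexible molecule is a hit if any one conformer matches."""
    return any(matches(conformer) for conformer in conformers)


# Two hypothetical conformers of the same molecule.
extended = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
            ("hydrophobe", (0.0, 9.0, 0.0))]
folded = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
          ("hydrophobe", (0.0, 5.0, 0.0))]

print(matches(extended), matches(folded), molecule_matches([extended, folded]))
```

Here the extended conformer fails because its donor-hydrophobe distance falls outside the allowed range, while the folded conformer satisfies all three distance constraints.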

Small molecules and QSAR

In pharmaceutical research and drug development, the Quantitative Structure Activity Relationship (QSAR) has become the most efficient theoretical tool for medicinal chemists. The initial concept, that a standard biological response produced by any set of congeneric compounds (belonging to the same chemical scaffold) can be explained using three sets of descriptors derived from their electronic, hydrophobic and steric characters, has remained valid for the last 50 years. An H-bond parameter29 has been added to improve the relationship with biological activity in some cases. Before the concept of 3D QSAR was introduced by Cramer et al.30, almost all papers in the field of QSAR, up to 1988, used these parameters, calculated as accurately as possible by methods such as simple Taft's Es, STERIMOL, log P, HOMO and LUMO (quantum chemical calculations), etc. It has been widely demonstrated that a very small set of compounds may not provide information about all the biological space that one would like to cover, but it may give very important clues for generating new compounds with bioactivity.

It is important to note that, in the lead optimization phase, QSAR becomes the only interactive tool for medicinal chemists, because of its simplicity and quick calculation time. Most of the methodologies are described in seminal books written by Hansch31 and Kubinyi32 and in a series of reviews33. The basic theory behind QSAR is that biological activity arises from the linear additive effect of different physico-chemical structural parameters of the chemical compounds, as given by the following relations:

... (1)


where i = compound number (maximum = M) and j = descriptors (total N), Xij is the jth descriptor for the ith compound and Yi is the biological activity of the ith compound. Solving these equations, the values of the Aij are found and used to predict the biological activity of an unknown compound.
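As a worked illustration of solving such linear relations, the sketch below fits coefficients by ordinary least squares. It assumes the common form Yi = A0 + sum over j of Aj Xij, with an intercept A0 and one coefficient per descriptor; this form is an assumption (the text writes the coefficients as Aij), the descriptor values are synthetic, and `solve`, `fit_linear_qsar` and `predict` are hypothetical helper names. The activities were generated from Y = 1.0 + 0.5 X1 - 2.0 X2 so that the fitted coefficients can be checked.

```python
# Least-squares fit of a linear QSAR model on synthetic data (stdlib only).
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_qsar(X: List[List[float]], y: List[float]) -> List[float]:
    """Return [A0, A1, ..., AN] from the normal equations (X^T X) A = X^T y."""
    rows = [[1.0] + list(row) for row in X]          # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs: List[float], descriptors: List[float]) -> float:
    """Predicted activity of a compound from its descriptor values."""
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], descriptors))


# Synthetic training set: M = 5 compounds, N = 2 descriptors.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [0.0, 2.0], [4.0, 1.0]]
y = [1.5, 0.0, 2.5, -3.0, 1.0]

coeffs = fit_linear_qsar(X, y)
print(coeffs)
print(predict(coeffs, [2.0, 2.0]))   # activity of an unseen compound
```

With real data the number of compounds M should comfortably exceed the number of descriptors N, otherwise the normal equations become singular or the fit becomes unreliable.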

Non-linear properties like transport into the cell have not been included in this concept but could be predicted by using different sets of non-linear methods33. The simplest of all the methods is the Free-Wilson analysis, suggested in 1964, which provides a one-to-one correspondence between biological activity and chemical substituents. It requires substitutions at two or more different positions, and prediction can be done only if the substituent is already included in the training set. Nevertheless, it gives medicinal chemists a direction for new synthesis, guided by the physico-chemical parameters influencing the biological activities. The linear free-energy-related Hansch model gives a more quantitative approach to QSAR and includes descriptors related to molecular properties which can be measured experimentally, such as lipophilicity, log P (P: n-octanol/water partition coefficient) and polarizability. However, dependence on a single regression equation, even one that met acceptable statistical criteria, was not sufficient. This prompted Hansch to develop an approach that uses 'Lateral Validation' of QSAR equations, implemented in the C-QSAR program34, which contains a database of over 14,000 equations and allows a quick comparison with any other software developed for QSAR35.
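The one-to-one correspondence between substituents and activity contributions in a Free-Wilson type analysis can be illustrated by the way compounds are encoded. The sketch below is a minimal example with invented substituents; hydrogen is taken as the reference substituent at each position (an assumption made here so that the indicator columns are not redundant), and `build_columns` and `encode` are hypothetical helper names. The resulting 0/1 matrix would then be passed to any least-squares routine.

```python
# Indicator-variable encoding for a Free-Wilson type analysis (illustrative).
from typing import List, Tuple

# Hypothetical training set: (substituent at R1, substituent at R2).
training = [("H", "H"), ("Cl", "H"), ("H", "OMe"), ("Cl", "OMe"), ("Me", "H")]


def build_columns(compounds: List[Tuple[str, ...]]) -> List[Tuple[int, str]]:
    """One indicator column per (position, substituent); 'H' is the reference."""
    seen = {(pos, group) for cmpd in compounds
            for pos, group in enumerate(cmpd) if group != "H"}
    return sorted(seen)


def encode(cmpd: Tuple[str, ...], columns: List[Tuple[int, str]]) -> List[int]:
    """Indicator row for one compound; fails for untrained substituents."""
    for pos, group in enumerate(cmpd):
        if group != "H" and (pos, group) not in columns:
            raise ValueError(f"substituent {group!r} at position {pos} "
                             "is not in the training set")
    return [1 if cmpd[pos] == group else 0 for pos, group in columns]


columns = build_columns(training)
matrix = [encode(cmpd, columns) for cmpd in training]
print(columns)
print(matrix)

# A substituent absent from the training set cannot be encoded, which is
# why the method cannot predict activities for it.
try:
    encode(("F", "H"), columns)
except ValueError as err:
    print(err)
```

The failure on the fluoro compound mirrors the limitation stated above: a contribution can only be estimated for substituents that already occur in the training set.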

The success of a QSAR analysis depends on three factors: the choice of the set of chemical compounds, the choice of descriptors and the choice of the mathematical procedure used to build the model. The choice of compounds is decided by the complexity of synthesis; generally congeneric compounds, i.e. derivatives of the same motif, are used. QSAR of non-congeners, i.e. compounds derived from different motifs, is generally done using 3D descriptors as described below. Recently, a set of topostructural and topochemical descriptors36 has been used to build effective QSAR models of non-congeneric compounds.

The mathematical toolbox depends on the ratio of the number of descriptors to the number of compounds. Starting from simple regression, numerous mathematical methods like MLR, KNN, GFA, ANN and PCA/PLS have been used extensively in QSAR37.

The most decisive factor in the success of QSAR is the choice of descriptors, which should be appropriate enough to represent the biological space (activity) one is trying to encompass. Sets of descriptors are readily available through programs like CERIUS II, SYBYL, MOLCONNZ38, etc. The descriptors available in most of these software packages can be classified as shown in Table 1.

These descriptors can be calculated using semi-empirical, quantum chemical or graph topological methods. A proper choice and combination of these has to be made so that all the properties necessary for the compounds to have biological activity are included. In some cases, one includes as many descriptors as possible, some of them interrelated, and uses a method like principal component analysis to choose the appropriate components so that most of the factors leading to the biological activities are represented. For compounds from non-congeneric sets, a method like CoMFA gives a better prediction. It deals with the problem of missing physicochemical parameters, includes the 3D characteristics of the compounds, both steric and electronic, and takes care of diverse compounds. The most critical step in this method is the proper alignment of the compounds so that the binding to the receptor molecule is correctly represented. In the absence of an active site structure, a pharmacophore hypothesis approach can help to orient all the compounds. Kubinyi et al.39, Martin et al.40 and Kim41 have comprehensively reviewed CoMFA and related methods. A similar approach has been used here to include the electrostatic potential in a 2D and 3D QSAR to correlate the bioactivity of a set of non-congeneric compounds inhibiting gastric acid secretion. The analysis was done on two series of compounds (imidazopyridines and 7-azaindoles) with selective substituents, using their enzymatic and biological activity on H+/K+ ATPase42. Initially, we used seven structural parameters as descriptors: the molecular volume, calculated using an algorithm of Bodor et al.43, and six conformational variables defining the orientation of the substituents with respect to the imidazopyridine ring (Fig. 1). These are responsible for the hydrophobic and hydrogen bond interactions of the molecules at the active site as well as for the conformational restrictions. Three torsion angles, kai1, kai2 and kai3, take care of the flexibility of the molecules. The angle (theta) between the imidazopyridine ring plane and the phenyl ring plane in the R8 substitution, along with the descriptors Dist1 and Dist2, represents the relation between the probable hydrogen bonds and the distance between the hydrophilic centers and the hydrophobic region of the molecule.

Table 1: Classification of a set of descriptors commonly used in QSAR studies in biology

Class of descriptors | Properties represented
Electronic | Atomic polarizabilities, Dipole, Dipole moment, Highest Occupied Molecular Orbital Energy (HOMO), Lowest Unoccupied Molecular Orbital Energy (LUMO), Delocalizability.
Spatial | Molecular surface area, Density, Principal moment of inertia and its components, Number of rotatable bonds, Molecular volume, Valence corrected molecular connectivity indices, Molecular weight.
Shape | Common and non-common overlap steric volume, Difference volume, Volume occupied by hydrophobic probes, RMS to shape reference.
Conformational | Lowest energy conformer, Radius of gyration, Conformational energy penalty.
Receptor | Molecule strain energy inside receptor, Nonbonded and electrostatic energy between molecule and receptor.
Thermodynamic | Desolvation free energy for water, Conformational entropy, Log (partition coefficient), Heat of formation, Molecular refractivity.

Fig. 1: The schematic diagram of H+/K+ ATPase inhibitors and the descriptors used in the study.

This study has shown that, using 2D electrostatic force fields (Fig. 2) developed by Pepe et al.44 in addition to the L-J descriptors, we could develop a QSAR model that correlated the structure and activity of the short-acting inhibitors of H+/K+ ATPase. The limitation observed in this case is the variation in the maximum number of atoms contributing to the slices of fields considered as descriptors45. 3D potentials surrounding the exposed surfaces of the inhibitors were calculated by considering the interaction between the ligand molecules and a receptor binding site atom (assumed to have a van der Waals radius of 1.4 Å and a charge of +1.0 esu) using Connolly's accessible surface points46 (Fig. 3). Inclusion of the 3D potentials of the molecules improved the correlation considerably, and the activity of two different (non-congeneric) sets of chemicals could be predicted. The alignment was done using a QCPE program called SEA47, based on mutual similarity indices calculated pairwise between all atoms of the molecules being compared with respect to their electrostatic properties (Fig. 4). The approximately 500 accessible points for each compound were reduced to a reasonable number by a "bucketing" method followed by PCA. The regression was done using PLS47. Though the prediction quality of the model was moderate (QPRESS < 0.5), it could include the protonation state of the compounds in building the QSAR model.

Fig. 2: 2D electrostatic potential calculation using MOLPOT. [(a) Potential versus angle plot of inhibitor, and (b) different slices of the inhibitor with van der Waals and accessible surface].

Fig. 3: Molecular electrostatic potential surface of (a) active, and (b) inactive molecule. [Different colors on the accessible points indicate different potential ranges].

Novel lead generation using pharmacophore

Fig. 4: Alignment of (a) active, (b) inactive compounds using SEA.

Measurement of molecular similarity and complementarity are both important factors in drug design. In receptor-ligand interactions, it is complementarity that one looks for to start de novo design. In database searching for the "starting motif" in the lead identification phase, it is appropriate to use a molecular similarity approach. In either case, one should be able to define similarity or complementarity. Most often, similarity is defined as the perception of interesting patterns and their frequency of occurrence, which makes it dependent on the classification of patterns. More often, patterns are searched using a similarity index, based on the concept that similar compounds will have similar biological effects. Using a minimum common scaffold or substructure (MCS) search, one can identify many active compounds against a disease. Fig. 5 depicts48 how a search using an MCS derived from anti-tuberculosis compounds gives a high hit rate on a data set containing good anti-mycobacterials. The challenge of substructure searching is that there can be a set of common scaffolds (more prevalent in drugs49) present in drugs which inhibit different sets of target enzymes in different diseases1, as shown in Fig. 6.
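A similarity-index search of the kind described above can be sketched as follows. The text does not say which index is used; the Tanimoto coefficient on binary fingerprints is chosen here only because it is a widely used option. The fingerprints are invented sets of "on" bit positions, and `tanimoto` and `similarity_search` are hypothetical helper names.

```python
# Tanimoto similarity search over toy fingerprints (illustrative only).
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Shared 'on' bits divided by the bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_search(query: FrozenSet[int],
                      database: Dict[str, FrozenSet[int]],
                      threshold: float) -> List[Tuple[str, float]]:
    """Database compounds at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical query and database; each set lists the bits that are on.
query = frozenset({1, 2, 3, 4})
database = {
    "mol_1": frozenset({1, 2, 3, 4, 5}),   # shares 4 of 5 bits with the query
    "mol_2": frozenset({1, 2, 9}),         # shares 2 of 5 bits
    "mol_3": frozenset({7, 8}),            # shares nothing
}

results = similarity_search(query, database, threshold=0.5)
print(results)
```

The threshold is a design choice: a lower value retrieves more diverse compounds at the cost of more false positives.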

Patterns can also be searched as discrete entities, as in pharmacophore searching, using a small number of donor, acceptor and functional groups and their distances from each other, as done by the program Catalyst50. It can also be done using topological descriptions of molecular shapes. In shape matching or field matching, patterns are searched as a continuous function of a descriptor like the electrostatic potential and are overlapped on a semi-continuous basis to get a measure of similarity, as CoMFA52 does in TRIPOS. In complementarity searching, a key-lock model is studied and the correlation coefficients are expressed in the range 0 to -1 instead of 0 to 1, as in the case of similarity. The limitation is that some flexible compounds fit the active site in a manner described as "induced fit". In spite of this, the similarity index derived from different descriptors for a set of molecules helps in de novo design. The main application of these methods is to design effective compounds starting from a set of primary hits collected from a High Throughput Screening campaign, even though the active site structure of the target enzyme may not be known. Proper attention is to be given to the choice of the starting set of compounds with respect to which the similarity index is to be generated. The set of descriptors required to appropriately measure the similarity index, the flexibility of the compounds51 and the criteria for superposition of the chosen compounds also play a pivotal role in successfully deriving the active chemical contour or shape of the pharmacophore. A recent literature survey reveals that the activities of 297 compounds in 25 datasets have been forecast with an average RMS error of 0.70 logarithm units40. CoMFA has been successfully applied to predict biological potencies, and comprehensive comparisons have been written by Martin and Kubinyi39,40. Using a CoMFA model to design an existing compound with enhanced potency has also been suggested by Cramer52. Several recent publications53 have cited different types of application of CoMFA in the design of drugs by a ligand-based approach.

Fig. 5: The active drugs identified from an antituberculosis database using substructure searching. [Some of the known drugs retrieved: ciprofloxacin, ofloxacin, rifabutin, rifampin, clofazimine and BM 212].

Fig. 6: Compounds containing similar scaffolds which are drugs for different targets. [Uptake blockers (desipramine, imipramine), dopamine antagonist (chlorpromazine) and H1 antagonist (promethazine)].

Optimization of lead using combinatorial chemistry and virtual screening

Progress in Combinatorial Chemistry has provided a novel method for shortening the lead optimization phase for medicinal chemists. The main contribution of combinatorial chemistry is the production of diverse, yet representative, subsets of compounds in a very short time, as mixtures or as individual compounds, for HTS54. But the similarity-based selection of test compounds from an existing collection increases the effectiveness of lead finding and optimization55,56. If the X-ray structure of the receptor is known, as demonstrated in the case of the aspartyl proteases cathepsin D and plasmepsin II57, the coupling of structure-based design and combinatorial chemistry leads to a targeted library for a quick lead optimization process.

When designing a targeted library, the selection of descriptors is crucial58 because the amount of calculation is enormous: library sizes range from 10³ to 10⁴ compounds for lead optimization up to 10⁶ to 10⁷ compounds for an HTS collection. Atom-pair fingerprints, topological indices59, 2D and 3D autocorrelation vectors60, flexible 3D fingerprints, weighted holistic invariant molecular (WHIM) indices61 and physico-chemical properties62 are frequently used to evaluate libraries. In one analysis63, all of these descriptors were applied to a part of the IC93 database64, comprising 1283 active compounds in 55 biological classes derived from the bioactivity strings, to assess whether selected subsets represent all biological properties. 2D fingerprints performed best, representing 78.2% of the classes with hierarchical cluster analysis and 83.2% with the maximum dissimilarity method; 3D descriptors performed worst. A clustering method65 was also used to assess how well the descriptors separate active from inactive compounds. The results show that 2D fingerprints, alone or in combination with secondary descriptors, perform better than the others. Similar results were obtained when different fingerprints were compared using retrieval plots from similarity searches for three different enzymes66.
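The subset-selection experiment described above can be illustrated with a minimal sketch. It assumes that each compound is represented by a 2D fingerprint stored as a set of on-bit indices, that dissimilarity is one minus the Tanimoto coefficient, and that the maximum dissimilarity method is implemented as a greedy MaxMin pick; the cited study may have used a different variant. The function names, the toy fingerprints and the class labels are hypothetical and serve only to show how class coverage of a selected subset can be measured.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_dissimilarity_select(fingerprints, k, seed=0):
    """Greedy MaxMin selection: add the compound farthest from those already picked."""
    selected = [seed]
    # Distance from every compound to its nearest already-selected compound.
    nearest = [1.0 - tanimoto(fp, fingerprints[seed]) for fp in fingerprints]
    while len(selected) < min(k, len(fingerprints)):
        candidate = max(
            (i for i in range(len(fingerprints)) if i not in selected),
            key=lambda i: nearest[i],
        )
        selected.append(candidate)
        # Update nearest-pick distances now that a new compound is in the subset.
        for i, fp in enumerate(fingerprints):
            d = 1.0 - tanimoto(fp, fingerprints[candidate])
            if d < nearest[i]:
                nearest[i] = d
    return selected


def class_coverage(selected, labels):
    """Fraction of all biological classes represented in the selected subset."""
    return len({labels[i] for i in selected}) / len(set(labels))


# Hypothetical toy data: six compounds in three biological classes.
toy_fps = [{1, 2, 3}, {1, 2, 4}, {10, 11, 12}, {10, 11, 13}, {20, 21}, {20, 22}]
toy_labels = ["A", "A", "B", "B", "C", "C"]
subset = max_dissimilarity_select(toy_fps, k=3)
print(subset, class_coverage(subset, toy_labels))
```

In this toy case a subset of three compounds already represents every class; on a real collection the same coverage figure would be compared across descriptor types and selection methods.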

Designing a combinatorial library that balances diversity of substituents against similarity of scaffold is a difficult task. Building such a library also requires available chemical technology, a proper choice of reactants and conditions suitable for achieving the expected yield, and even then the targeted diversity may not be reached. It has been noted that a product-based design of true combinatorial libraries using a genetic algorithm provides more diverse libraries than a simple reactant-based design67. Nonetheless, the Pharmacopeia group68 has reported designing a combinatorial library using cell-based methods, screening it against three different enzymes and obtaining successful and selective initial hits (IC50 < 100 μM).
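The idea of product-based design can be sketched for a two-component library as follows. This is a simplified illustration and not the algorithm of the cited work: it assumes that a product fingerprint is simply the union of its two reactant fingerprints, it scores a library by the mean pairwise Tanimoto distance of the fully enumerated products, and it uses a mutation-only evolutionary search with elitism in place of a full genetic algorithm with crossover. All names, pool sizes and parameters are hypothetical.

```python
import itertools
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def product_diversity(reactants_a, reactants_b):
    """Mean pairwise Tanimoto distance over the fully enumerated products."""
    # Toy assumption: a product fingerprint is the union of its reactants' bits.
    products = [x | y for x in reactants_a for y in reactants_b]
    pairs = list(itertools.combinations(products, 2))
    return sum(1.0 - tanimoto(p, q) for p, q in pairs) / len(pairs)


def ga_product_based(pool_a, pool_b, n_a, n_b, generations=50, pop_size=20, seed=0):
    """Search for n_a x n_b reactant subsets whose products are most diverse."""
    rng = random.Random(seed)

    def random_individual():
        return (tuple(rng.sample(range(len(pool_a)), n_a)),
                tuple(rng.sample(range(len(pool_b)), n_b)))

    def fitness(ind):
        return product_diversity([pool_a[i] for i in ind[0]],
                                 [pool_b[j] for j in ind[1]])

    def mutate(ind):
        # Swap one chosen reactant for an unused one from the same pool.
        side = rng.randrange(2)
        pool_size = len(pool_a) if side == 0 else len(pool_b)
        chosen = list(ind[side])
        unused = [i for i in range(pool_size) if i not in chosen]
        if unused:
            chosen[rng.randrange(len(chosen))] = rng.choice(unused)
        return (tuple(chosen), ind[1]) if side == 0 else (ind[0], tuple(chosen))

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # elitism: the best half survives
        population = survivors + [mutate(rng.choice(survivors)) for _ in survivors]
    best = max(population, key=fitness)
    return best, fitness(best)
```

The point of the sketch is the fitness function: the score is computed on the products rather than on the reactants, so a reactant is retained only if it improves the diversity of the enumerated library. A reactant-based design would instead pick diverse reactants from each pool independently and never examine the products.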

Presently, several methods69 have been developed to virtually screen a collection of compounds before actual HTS is performed. Most of these methods are based on different kinds of scoring functions69. 2D fingerprints and topological descriptors are used for the virtual screening of large chemical databases because the time required is very small compared with actual screening. Several review articles published in recent years discuss the scope and limitations of virtual screening and combinatorial library design in drug design70.
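A fingerprint-based virtual screen of the kind mentioned above can be sketched in a few lines. The example assumes that the scoring function is the Tanimoto similarity of each database compound to a single known active, and it evaluates the ranking with an enrichment factor, a commonly used retrieval measure that is not taken from the text. The compound names, fingerprints and activity labels are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def rank_by_similarity(query_fp, database):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    # Sort by descending score; break ties by name so the order is reproducible.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def enrichment_factor(ranked, actives, fraction):
    """Hit rate in the top fraction of the ranking divided by the overall hit rate."""
    n_top = max(1, int(len(ranked) * fraction))
    hits_top = sum(1 for name, _ in ranked[:n_top] if name in actives)
    hits_all = sum(1 for name, _ in ranked if name in actives)
    return (hits_top / n_top) / (hits_all / len(ranked))


# Hypothetical query (a known active) and a five-compound toy database.
query = {1, 2, 3, 4}
toy_db = {
    "cpd1": {1, 2, 3, 5},
    "cpd2": {1, 2, 3, 4, 6},
    "cpd3": {7, 8, 9},
    "cpd4": {1, 7, 8},
    "cpd5": {2, 3, 4},
}
ranking = rank_by_similarity(query, toy_db)
print([name for name, _ in ranking])
print(enrichment_factor(ranking, actives={"cpd2", "cpd5"}, fraction=0.4))
```

Because each comparison is a simple set operation, millions of compounds can be ranked in this way at a small fraction of the time and cost of an experimental screen, which is the advantage the text refers to.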

Challenges

Drugs usually interact with macromolecules in body fluids and tissue. The portion of a drug bound to plasma protein is unable to interact with the receptor or enzyme, so the prediction of protein binding using descriptors is a field of research in itself71.

Similarly, predicting toxicity72 in the early phase of the discovery process helps medicinal chemists to choose scaffolds for lead optimization. Another challenge in the discovery process is predicting the solubility and ionization state of a drug, which influence its transport and absorption. Kubinyi has reviewed the seminal work done by Hansch et al. in this field32,73. Recently, density functional theory and continuum solvation methods74 have been used to build models that correlate with experimental results. However, because these methods are highly compute-intensive, it is not feasible to apply them to the large numbers of compounds that must be assessed at the lead optimization phase of the drug discovery process. Although this review does not discuss the challenge of drug efficacy, it must be stated that lack of efficacy is the major cause of failure at the later stages of drug discovery75.
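The dependence of the ionization state on pH can be illustrated with the Henderson-Hasselbalch relation. This is a standard textbook approximation for a single ionizable group in dilute solution and is not one of the quantum-chemical models cited above; the pKa and pH values used below are hypothetical.

```python
def fraction_ionized(pka, ph, acid=True):
    """Ionized fraction of a monoprotic acid or base from Henderson-Hasselbalch."""
    # Acid:  HA <-> A- + H+   gives ionized fraction 1 / (1 + 10**(pKa - pH)).
    # Base:  BH+ <-> B + H+   gives ionized fraction 1 / (1 + 10**(pH - pKa)).
    exponent = (pka - ph) if acid else (ph - pka)
    return 1.0 / (1.0 + 10.0 ** exponent)


# Hypothetical weak acid (pKa 4.5) and weak base (pKa 9.0) at two pH values.
for ph in (2.0, 7.4):
    print(ph,
          round(fraction_ionized(4.5, ph, acid=True), 4),
          round(fraction_ionized(9.0, ph, acid=False), 4))
```

A calculation of this kind is cheap enough for very large libraries, but it presupposes a known pKa; estimating that value for a novel compound is where the more expensive methods come in.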

For the last twenty years, chemistry and computational methods have been growing at an exponential rate. Developments in the biological sciences have set computational chemists the never-ending challenge of predicting biological properties as accurately as possible. In terms of cost-effectiveness, however, computational chemistry and molecular modeling have significantly influenced the discovery process in the pharmaceutical industry. In future76, substantial enhancements are expected in the efficient representation of chemical structures, in algorithms for searching and using the three-dimensional structures of receptors, in appropriate selection criteria for the identification of hits, and in the design of biologically annotated, diverse compound libraries for screening.

References

1 Greene J, Kahn S, Savoj H, Sprague P & Teig S, J Chem Inf Comput Sci, 34 (1994) 1297-1308; Kubinyi H, Nature Rev Drug Discov, 2 (2003) 151.

2 Hopkins A L & Groom C R, Nature Rev Drug Discov, 1 (2002) 727; Lahana R, Drug Discov Today, 4 (1999) 447.

3 Morgunova E, Tuuttila A, Bergmann U, Isupov M, Lindqvist Y, Schneider G & Tryggvason K, Science, 284 (1999) 1667.

4 http://www.cytoclonal.com; Erdmann J, Genetic Eng News, 19 (1999) 47.

5 Ehrlich P, Chem Ber, 42 (1909) 17; Fischer E, Chem Ber, 27 (1894) 2985.

6 Jorgensen W L, Science, 254 (1991) 954.

7 Desjarlais R L, Sheridan R P, Dixon J S, Kuntz I D & Venkataraghavan R, J Med Chem, 29 (1986) 2149; Murrall N W & Davies E K, J Chem Inf Comput Sci, 30 (1990) 312; Lewis R A, Roe D C, Huang C, Ferrin T E, Langridge R & Kuntz I D, J Mol Graphics, 10 (1992) 66; Leach A R & Kuntz I D, J Comput Chem, 13 (1992) 730; Moock T E, Henry D R, Ozkabak A G & Alamgir M, J Chem Inf Comput Sci, 34 (1994) 184; Hurst T J, J Chem Inf Comput Sci, 34 (1994) 190; Oshiro C M, Kuntz I D & Dixon J S, J Comput Aided Mol Des, 9 (1995) 113.

8 Alberty R A & Cornish-Bowden A, Trends Biochem Sci, 8 (1993) 288; Karplus P A & Faerman C, Curr Opin Struct Biol, 4 (1994) 770; Wlodek S T, Antosiewicz J & McCammon J A, Protein Sci, 6 (1997) 373, Protein Sci, 7 (1998) 573.

9 Molecular Similarity in Drug Design, edited by P M Dean (Blackie, Glasgow, UK), 1995; Good A C & Mason J S, Rev Comput Chem Lipkowitz, 7 (1996) 1; Brown R D & Martin Y C, J Chem Inf Comput Sci, 37 (1997) 1; Gillet V J & Johnson A P, in Designing Bioactive Molecules: Three Dimensional Techniques and Applications, edited by Y C Martin & P Willett (American Chemical Society, Washington, DC) 1997.

10 Humblet C & Marshall G R, Drug Dev Res, 1 (1981) 409; Humblet C, Jakes S E, Willett P, J Mol Graphics, 4 (1986) 12; Martin Y C, J Med Chem, 35 (1992) 2145; Martin Y C, Chap 6 & Dean P & Perkins T D J, in Designing Bioactive Molecules: Three Dimensional Techniques and Applications, Chap 9, edited by Y C Martin & P Willett (American Chemical Society, Washington, DC) 1997.

11 Kier L B, Pure Appl Chem, 135 (1973) 509; Mason J S, Morize I, Menard P R, Cheney D L, Hulme C & Labaudiniere R F, J Med Chem, 42 (1999) 3251; Schneider G, Neidhart W, Giller T & Schmid G, Angew Chem Int Ed Engl, 38 (1999) 2894.

12 Oprea T I, Ho C M W & Marshall G R, in Computer-Aided Molecular Design: Applications in Agrochemicals, Materials & Pharmaceuticals, Chap 5, edited by C H Reynolds, M K Holloway & H K Cox (American Chemical Society, Washington, DC), 1995.

13 Kollman P A, Chem Rev, 93 (1993) 2395; Aqvist J, Medina C & Samuelsson J E, Protein Eng, 7 (1994) 385; Carlson H A & Jorgensen W L, J Phys Chem, 99 (1995) 10667; McCammon J A, Gilson K, Given J A & Bush B L, Biophys J, 72 (1997) 1047.


14 Holloway M K & Wai J M, in Computer-Aided Molecular Design: Applications in Agrochemicals, Materials & Pharmaceuticals, Chap 3, edited by C H Reynolds, M K Holloway & H K Cox (American Chemical Society, Washington, DC) 1995.

15 Lyle T A, Wiscount C M, Guare J P, Thompson W J, Anderson P S, Darke P L, Zugay J A, Emini E A & Schleif W A, J Med Chem, 34 (1991) 1228.

16 Thompson W J, Fitzgerald P M D, Holloway M K, Emini E A, Darke P L, McKeever B M, Schleif W A, Quintero J C, Zugay J A, J Med Chem, 35 (1992) 1685.

17 Vacca J P, Fitzgerald P M D, Holloway M K, Hungate R W, Starbuck K E, Chen L J, Darke P L, Anderson P S & Huff J R, Bioorg Med Chem Lett, 4 (1994) 499.

18 Thompson W J, Ghosh A K, Holloway M K, Lee H Y, Munson P M, Schwering J E, Wai J, Darke P L & Zugay J, J Am Chem Soc, 115 (1993) 801.

19 Vacca J P, Dorsey B D, Schleif W A, Levin R B, McDaniel S L, Darke P L, Zugay J, Quintero J C, Blahy O M, Roth E, Sardana V V, Schlabach A J, Graham P I, Condra J H, Gotlib J H, Holloway M K, Lin J, Chen I, Vastag K, Ostovic D, Anderson P S, Emini E A & Huff J R, Proc Natl Acad Sci USA, 91 (1994) 4096.

20 De Lucca G V, Erickson-Viitanen S & Lam P Y S, Drug Disc Today, 2 (1997) 6.

21 Kempf D, Methods Enzymol, 241 (1994) 334.

22 Erickson J, Neidhart D J, VanDrie J, Kempf D J, Wang X C, Norbeck D W, Plattner J J, Rittenhouse J W, Turon M, Wideburg N, Science, 249 (1990) 527; Swain A L, Miller M M, Green J, Rich D H, Schneider J, Kent S B H & Wlodawer A, Proc Natl Acad Sci USA, 87 (1990) 8805.

23 Lam P Y S, Science, 263 (1994) 380; Lam P Y S, Ru Y, Jadhav P K, Aldrich P E, DeLucca G V, Eyermann C J, Chang C H, Emmett G, Holler E R, Daneker W F, Li L, Confalone P N, McHugh R J, Han Q, Li R, Markwalder J A, Seitz S P, Sharpe T R, Bacheler L T, Rayner M M, Klabe R M, Shum L, Winslow D L, Kornhauser D M, Jackson D A, Erickson-Viitanen S, Hodge C N, J Med Chem, 39 (1996) 3514.

24 Jadhav P K & Woerner F J, Bioorg Med Chem Lett, 2 (1992) 353; Kempf D J, Codacovi L, Wang X C, Kohlbrenner W E, Wideburg N E, Saldivar A, Vasavanonda S, Marsh K C, Bryant P, J Med Chem, 36 (1993) 320; Allen F H, Davies J E, Galloy J J, Johnson O, Kennard O, Macrae C F, Mitchell E M, Mitchell G F, Smith J M, Watson D G, J Chem Inf Comput Sci, 31 (1991) 187; Hodge C N, Aldrich P E, Bacheler L T, Chang C-H, Eyermann C J, Garber S, Grubb M, Jackson D A, Jadhav P K, Korant B, Chem Biol, 3 (1996) 301.

25 Jakes S E & Willett P, J Mol Graphics, 4 (1986) 12; Jakes S E, Watts N, Willett P, Bawden D & Fisher J D, J Mol Graphics, 5 (1987) 41; Van Drie J H, Weininger D & Martin Y C, J Comput Aided Mol Des, 3 (1989) 225; Sheridan R P, Nilakantan R, Rusinko A, J Chem Inf Comput Sci, 29 (1989) 255.

26 Murrall N W & Davies E K, J Chem Inf Comput Sci, 30 (1990) 312; Hurst T J, J Chem Inf Comput Sci, 34 (1994) 190.

27 Lewis R A, Roe D C, Huang C, Ferrin T E, Langridge R & Kuntz I D, J Mol Graphics, 10 (1992) 66; Oshiro C M, Kuntz I D & Dixon J S, J Comput Aided Mol Des, 9 (1995) 113.

28 Skulnick H I, Johnson P D, Howe W J, Tomich P K, Chong K-T, Watenpaugh K D, Janakiraman M N, Dolak L A & McGrath J P, J Med Chem, 38 (1995) 4968.

29 Abraham M H, Whiting G S & Alarie Y, Quant Struct-Act Relat, 9 (1990) 6; Abraham M H, Chem Soc Rev, 22 (1993) 73.

30 Cramer R D III, Patterson D E & Bunce J D, J Am Chem Soc, 110 (1988) 5959; Cramer R D III & Wold S B, US Patent 5,025,388 (1991); Cramer R D III, DePriest S A, Patterson D E & Hecht P, The Developing Practice of Comparative Molecular Field Analysis, in 3D QSAR in Drug Design, edited by H Kubinyi (ESCOM, Leiden) 1993, pp 443-485.

31 Hansch C & Leo A, Exploring QSAR Vol. 1: Fundamentals and Applications in Chemistry and Biology (Oxford University Press, USA & UK) 1995.

32 Kubinyi H, QSAR: Hansch Analysis and Related Approaches (VCH, Weinheim) 1993.

33 Kubinyi H, Drug Disc Today, 2 (1997) 457 & 538.

34 BioByte Corp (201 W Fourth St, Suite #204, CA 91711-4707); email: clogp@biobyte.com, URL: http://www.biobyte.com.

35 Hansch C, Acc Chem Res, 26 (1993) 147; Garg R, Gupta S P, Gao H, Babu M S, Debnath A K & Hansch C, Chem Rev, 99 (1999) 3525.

36 Gute B D, Grunwald G D & Basak S C, SAR QSAR Environ Res, 10 (1999) 1.

37 Martin Y C, Quantitative Drug Design (Dekker, New York), 1978; Glen W G, Dunn III W J & Scott D R, Tetrahed Comput Meth, 2 (1989) 349.

38 CERIUSII/MSI Inc (San Diego, CA 92121-3752); SYBYL 6.6/TRIPOS Inc (St Louis, MO 63144); Molconn-Z, Hall Associates Consulting (Quincy, MA 02170).

39 Pearlman R S, 3D Molecular Structures: Generation and Use in 3D Searching, in 3D QSAR in Drug Design: Theory, Methods and Applications, edited by H Kubinyi (ESCOM Science Publishers, Leiden) 1993, pp. 41-79.

40 Martin Y C, Kim K H & Lin C T, Comparative Molecular Field Analysis: CoMFA, in Advances in Quantitative Structure-Property Relationships, edited by M Charton (JAI Press Inc., Greenwich, Connecticut) 1996, pp. 1-52.

41 Kim K H, Chap 12, in Molecular Similarity in Drug Design, edited by P M Dean (BA&P, UK) 1995.

42 Kaminski J J, Wallmark B, Briving C & Andersson B, J Med Chem, 34 (1991) 533.

43 Bodor N, Gabany Z & Wong C, J Am Chem Soc, 111 (1989) 3783.

44 Pepe G, J Mol Graphics, 7 (1989) 233.

45 Banerjee T & Ghosh I, Molecular Design Down Under, 1995, PosB-1; Ghosh I, First Indo-US Workshop on Mathematical Chemistry, 1998.

46 Connolly M L, J Appl Cryst, 16 (1983) 548; Connolly M L, J Mol Graphics, 4 (1986) 3.

47 Smith G M, QCPE, 567; SEA Wold S, Ruhe A, Wold H & Dunn W J III, SIAM J Sci Stat Computing, 5 (1984) 735.

48 Om Prakash & Ghosh Indira, The Euro-QSAR, 2004, Istanbul, Turkey (http://www.euro-qsar2004.org).

49 Willett P, J Chemometrics, 6 (1992) 289; Dean P M, Chap 1, Leach A R, Chap 3 & Willett P, Chap 5, in Molecular Similarity in Drug Design, edited by P M Dean (BA&P, UK) 1995; Bemis G W & Murcko M A, J Med Chem, 39 (1996) 2887.


50 Sprague P W, in Perspectives in Drug Discovery and Design: De novo Design, 3 (1995) 1, edited by K Muller, ESCOM, Leiden (Catalyst/MSI Inc, California).

51 Arteca G A & Mezey P G, J Comput Chem, 9 (1988) 554; Mezey P G, Chap 10, and Whitley D & Ford M, Chap 11, in Molecular Similarity in Drug Design, edited by P M Dean (BA&P, UK) 1995.

52 Cramer R D, Clark R D, Patterson D E & Ferguson A M, J Med Chem, 39 (1996) 3060.

53 Talele T T, Kulkarni S S & Kulkarni V M, J Chem Inf Comput Sci, 39 (1999) 958; Matter H, Schwab W, Barbier D, Billen G, Haase B, Neises B, Schudok M, Thorwart W, Schreuder H, Brachvogel V, Lönze P & Weithmann K U, J Med Chem, 42 (1999) 1908; Debnath A K, J Med Chem, 42 (1999) 249; Bohm M, Stürzebecher J & Klebe G, J Med Chem, 42 (1999) 458.

54 Spencer R W, Biotech Bioeng (Comb Chem), 61 (1998) 61.

55 Martin E J, Blaney J M, Siani M A, Spellmeyer D C, Wong W H & Moos W H, J Med Chem, 38 (1995) 1431; Madden D, Krchnak V & Lebl M, Persp Drug Discov Des, 2 (1995) 269.

56 Cummins D J, Andrews C W, Bentley J A, Cory M, J Chem Inf Comput Sci, 36 (1996) 750; Potter T & Matter H, J Med Chem, 41 (1998) 478.

57 Li J, Murray C W, Waszkowycz B & Young S C, Drug Disc Today, 3 (1998) 105; Kick E K, Roe D C, Skillman A G, Liu G, Ewing T J A, Sun Y, Kuntz I D & Ellman J A, Chem Biol, 4 (1997) 297; Salemme F R, Spurlino J & Bone R, Structure, 5 (1997) 319.

58 Patterson D E, Cramer R D, Ferguson A M, Clark R D & Weinberger L E, J Med Chem, 39 (1996) 3049.

59 Sheridan R P, Nachbar R B & Bush B L, J Comput Aided Mol Des, 8 (1994) 323; Sheridan R P, Miller M D, Underwood D J & Kearsley S K, J Chem Inf Comput Sci, 36 (1996) 128; Gombar V K & Enslein K, Quant Struct Act Relat, 9 (1990) 321; Hall L H, Mohney B & Kier L B, J Chem Inf Comput Sci, 31 (1991) 76.

60 Viswanadhan V N, Ghose A K, Revankar G R & Robins R K, J Chem Inf Comput Sci, 29 (1989) 163; Moreau G & Broto P, Analusis, 24 (1996) M17-M22.

61 Todeschini R, Bettiol C, Giurin G, Gramatica P, Miana P & Argese E, Chemosphere, 33 (1996) 71.

62 Lewis R A, McLay I M & Mason J S, Chem Des Autom News, 10 (1995) 37; Shemetulskis N E, Weininger D, Blankley C J, Yang J J & Humblet C, Stigmata, J Chem Inf Comput Sci, 36 (1996) 862; Nilakantan R, Bauman N & Haraki K S, J Comput Aided Mol Des, 11 (1997) 447.

63 Matter H, J Med Chem, 40 (1997) 1219; Potter T & Matter H, J Med Chem, 41 (1998) 478.

64 Index Chemical Database-Subset from 1993 (ISI, 3501 Market Street, Philadelphia, PA, USA).

65 Brown R D & Martin Y C, J Chem Inf Comput Sci, 36 (1996) 572.

66 Schuffenhauer A, Floersheim P, Acklin P & Jacoby E, J Chem Inf Comput Sci, 43 (2003) 391.

67 Gillet V J, Willett P & Bradshaw J, J Chem Inf Comput Sci, 37 (1997) 731.

68 Schnur D, J Chem Inf Comput Sci, 39 (1999) 36.

69 Muegge I & Martin Y C, J Med Chem, 42 (1999) 791; Hirst J D, Curr Opin Drug Disc Dev, 1 (1998) 28; Oprea T I & Marshall G R, in 3D-QSAR in Drug Design, Vol. 2, edited by H Kubinyi, G Folkers & Y C Martin (Kluwer, Dordrecht) 1998, p 35; Sadowski J & Kubinyi H, J Med Chem, 41 (1998) 3325; Ajay, Walters W P & Murcko M A, J Med Chem, 41 (1998) 3314.

70 Klebe G, Persp Drug Discov Design, 20 (2000) 1; Böhm H J & Schneider G, Methods and Principles in Medicinal Chemistry, Vol 19, edited by R Mannhold, H Kubinyi & G Folkers (Wiley-VCH, Weinheim), 2003; Latha N & Jayaram B, Drug Design Reviews - Online, 2 (2005) 145; Lyne P D, Drug Discov Today, 7 (2002) 1047.

71 Fichtl B, Nieciecki A & Walter K, Adv Drug Res, 20 (1991) 118; Proost J H, Wierda J M K H & Meijer D K F, J Pharmacokin Biopharm, 24 (1996) 45; Zlotos G, Bucker A, Jurgens J & Holzgrabe U, Int J Pharmaceutics, 169 (1998) 229 and ref 32, cited above.

72 Enslein K, In Vitro Toxicology, 6 (1993) 163; Gombar V K, SAR QSAR Environ Res, 10 (1999) 371; Enslein K, Gombar V K & Blake B W, Mutat Res, 305 (1994) 47.

73 Bodor N & Buchwald P, Pharmacol Ther, 76 (1997) 1; Winiwarter S, Bonham N M, Ax F, Hallberg A, Lennernas H & Karlen A, J Med Chem, 41 (1998) 4939; Buchwald P & Bodor N, Curr Med Chem, 5 (1998) 353; Abraham M H & Le J, J Pharmacol Sci, 88 (1999) 868.

74 Topol I A, Tawa G J, Burt S K & Rashin A A, J Phys Chem A, 101 (1997) 10075; Topol I A, Burt S K, Rashin A A & Erickson J W, J Phys Chem A, 104 (2000) 866.

75 Prentis R A, Lis Y & Walker S R, Br J Clin Pharmacol, 25 (1988) 387; Kennedy T, Drug Discov Today, 2 (1997) 436.

76 Horrobin D F, Nature Rev Drug Discov, 2 (2003) 151.
