Chemometric methods in molecular design. Han van de Waterbeemd (eds.), VCH Publishers, New York, ISBN 3-527-30044-9, 359 pp., $110.00

JOURNAL OF CHEMOMETRICS, VOL. 10, 269-270 (1996)

BOOK REVIEW

CHEMOMETRIC METHODS IN MOLECULAR DESIGN, Han van de Waterbeemd (eds.), VCH Publishers, New York, ISBN 3-527-30044-9, 359 pp., $110.00.

Structure-activity relationship (SAR) research has tended to concentrate on two complementary but quite different approaches. The so-called intensive approach makes use of sophisticated molecular graphics, combined with force field mechanics and conformational analysis calculations, which are performed on one or a few carefully chosen molecules. However, a fairly detailed knowledge of the mechanism of action is required. When this knowledge is lacking, the information content per molecule is lower, and a more extensive approach, the so-called QSAR approach, is usually called for. Here a large number of compounds are usually studied to uncover more or less empirical relationships between chemical structure and biological activity.

The focus of van de Waterbeemd’s text is the use of chemometrics in QSAR modelling, which is a logical application of this subfield of analytical chemistry since it is assumed in QSAR that within a series of compounds a small change in chemical structure will be accompanied by an analogous small change in biological activity, and that a multivariate physicochemical description of the system will reveal this relationship. Although a large number of texts have been published on QSAR, very few have focused on the role of chemometrics in molecular design. In that sense, van de Waterbeemd’s text fills a vital niche, especially in view of growing interest in the field of combinatorial chemistry. However, there are other reasons why an interested reader should purchase this text, which become apparent when considering the breadth of coverage provided by the authors on this subject.

This text is divided into four sections. The first section covers molecular descriptors, including the classical physicochemical parameters used in Hansch analysis and descriptors derived from topological indices and solvent-accessible surface areas. Unfortunately, this section of the text is weak. The two chapters on computer-generated descriptors are a rehash of previously published work and do not provide the reader with the necessary understanding for proper application of these tools. On the other hand, the second section of the text on experimental design does provide the reader with the necessary understanding to implement these techniques in the development of a QSAR, in part owing to the large number of worked examples supplied by the different authors. The importance of properly selecting training set compounds is explained and techniques for accomplishing this task are discussed at great length, including methods for the direct optimization of lead compounds as well as approaches for systematic investigations of parameter space, e.g. Craig plots, parameter focusing, etc. Overall, the quality and clarity of the discussions in this section of the text are high.

The third section, multivariate data analysis, is truly the ‘heart and soul’ of this text. It provides an update of current and newly emerging data analysis techniques in QSARs. There are several chapters describing the development of QSAR models using discriminant analysis, SIMCA, principal component analysis, factor analysis, cluster analysis and canonical correlation analysis. In the studies referenced in these chapters, biological activity is a categorical variable (e.g. active/inactive or weak/moderate/strong). However, the biological response variable can be quantitative in nature (e.g. log(1/C), where C is the molar concentration of a substance required to elicit a fixed response), which was evidently the motivation for the chapter on PLS. What I find most impressive about this section of the text is the large number of worked examples and the availability of real data sets for testing the various multivariate analysis methods. (The data sets are either listed directly in the text or supplied by an appropriate literature reference.) In addition, the emphasis on using eigen-based methods to uncover structure in data offers the reader a rudimentary strategy for QSAR based on statistical experimental design and multivariate data analysis.
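To make the quantitative response mentioned above concrete, the following is a minimal sketch, not taken from the book, of how log(1/C) might be computed and related to a single descriptor. The descriptor values and concentrations are hypothetical, the function names are illustrative, and ordinary least squares on one descriptor is used here only as a simple stand-in for the multivariate methods (such as PLS) that the text covers.

```python
import math


def log_inverse_conc(c_molar):
    """Return log10(1/C) for a molar concentration C (must be positive)."""
    if c_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(c_molar)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical series of five compounds: one descriptor value each
# (for example a lipophilicity parameter) and the molar concentration C
# required to elicit a fixed response. These numbers are invented.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5]
conc = [1e-3, 3e-4, 1e-4, 3e-5, 1e-5]

# A more potent compound needs a lower C, so log(1/C) is larger.
activity = [log_inverse_conc(c) for c in conc]
intercept, slope = fit_line(descriptor, activity)
print(f"log(1/C) = {intercept:.3f} + {slope:.3f} * descriptor")
```

With several correlated descriptors per compound, a single-variable fit of this kind no longer suffices, which is where the multivariate techniques discussed in this section come in.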

The fourth and final section of the text treats the topic of statistical validation of QSAR results, which has not been covered by previous texts on QSAR, an omission I have always found surprising in view of the obvious importance of this topic. Overall, the chapters in this volume provide the reader with a practical introduction to the developing field of chemometrics in QSARs, which is quite an accomplishment, since each chapter has been written by a different author. Nevertheless, this team approach to writing has its drawbacks, owing to differences in nomenclature among the various authors. In other words, there are mistakes in the text which probably would not have occurred had the volume not been authored by a team. On page 5 canonical correlates is listed as a regression method, which it is not; equation 10 on page 118 is incorrect; and the linear learning machine and other simple neural network methods should not be called methods of discriminant analysis.

The wealth of the material covered in this text will make it suitable for medicinal chemists interested in using these techniques in their work, chemometricians who are interested in understanding other applications of chemometrics, and teachers who are interested in finding a textbook suitable for courses on either QSAR or chemometrics. In conclusion, I recommend the purchase of this text.

BARRY K. LAVINE
Department of Chemistry
Clarkson University
Potsdam, NY 13699, U.S.A.

© 1996 by John Wiley & Sons, Ltd