structure−activity relationship studies of chemical mutagens and carcinogens

34
Structure-Activity Relationship Studies of Chemical Mutagens and Carcinogens: Mechanistic Investigations and Prediction Approaches Romualdo Benigni* Istituto Superiore di Sanita’, Experimental and Computational Carcinogenesis, Department of Environment and Primary Prevention, Viale Regina Elena 299-00161 Rome, Italy Received June 9, 2004 Contents 1. Introduction 1767 2. QSARs for Congeneric Series of Chemical Mutagens and Carcinogens 1769 2.1. Aromatic Amines 1770 2.1.1. QSARs for the Mutagenic Activity of the Aromatic Amines 1770 2.1.2. QSARs for the Carcinogenic Activity of the Aromatic Amines 1773 2.1.3. Conclusions on the Aromatic Amines 1775 2.2. Nitroaromatic Compounds 1776 2.2.1. QSARs for the Nitroarenes 1776 2.2.2. Conclusions on the QSARs for the Nitroarenes 1779 2.3. N-Nitroso Compounds 1780 2.3.1. QSARs for the Mutagenicity of N-Nitrosamines 1780 2.3.2. QSARs for the Carcinogenicity of N-Nitroso Compounds 1781 2.3.3. Carcinogenicity of N-Nitrosamines 1781 2.4. Quinolines 1781 2.5. Triazenes 1782 2.6. Polycyclic Aromatic Hydrocarbons 1783 2.6.1. QSARs for the Carcinogenicity of PAHs 1783 2.6.2. QSARs for the Genotoxicity of PAHs 1784 2.7. Halogenated Aliphatics 1784 2.8. Direct-Acting Compounds 1785 2.8.1. Mutagenicity of Platinum Amines 1785 2.8.2. Mutagenicity of Lactones 1785 2.8.3. Mutagenicity of Epoxides 1786 3. Nongenotoxic or Epigenetic Carcinogens 1786 4. QSAR Models For Noncongeneric Chemicals 1788 4.1. Statistically Based Approaches for Noncongeneric Datasets 1789 4.2. Knowledge-Based Approaches for Noncongeneric Datasets 1790 4.3. The Predictive Ability of the QSARs for Noncongeneric Chemicals 1791 4.3.1. Mutagenicity Prediction Models: External Validation Studies 1792 4.3.2. Carcinogenicity Prediction Models: External Validation Studies 1792 4.3.3. Considerations on the External Validation Exercises 1793 4.3.4. NTP Prospective Prediction Exercises 1793 4.3.5. The Predictive Toxicology Challenge 2000-2001 1795 5. Conclusions 1796 6. References 1798 1. Introduction Carcinogenicity and mutagenicity are among the toxicological endpoints that pose the highest concern. The environmentsin its broadest sense, as opposed to geneticssis the major determinant of cancer, and it is responsible for up to 80% of tumor cases. 1,2 Environment includes lifestyle, diet, smoking, pol- lution, workplace exposures, etc.; an essential com- ponent of all this is the chemicals. Society in general, and the discipline of toxicology in particular, face the parallel tasks of performing safety evaluations for the uses of new chemicals before human exposure is permitted and assessing the potential hazards posed by exposure to chemicals that lack safety evaluations. In addition, the accelerated pace of chemical discov- ery and synthesis has heightened the need for ef- ficient prioritization and toxicity screening methods. Whereas the mutagenic potential of chemicals can be assessed with relatively simple test methods, the standard bioassay in rodents used to assess the carcinogenic potential of chemicals is extremely long and costly and requires the sacrifice of large numbers of animals. For these reasons, chemical carcinogenic- ity has been the target of numerous attempts to create alternative predictive models, ranging from short-term biological assays (e.g., the mutagenicity tests themselves) to theoretical models. Among the theoretical models, the application of the science of structure-activity relationships (SARs) has earned special prominence since it permits the exploitation of the huge wealth of existing chemical knowledge in understanding the interactions between chemicals and living organisms. The use of SAR concepts obviously applies also to the rationalization and prediction of the mutagenic properties of chemicals. Principles and concepts of chemical induction of mutations and cancer are covered by a large body of literature; a brilliant example is represented by the series on Chemical Induction of Cancer (see, for example, ref 3). A short and clear introduction to the subject is discussed earlier in ref 4. From the point of view of the mechanism of action, the carcinogens can be classified into (a) genotoxic carcinogens, which cause damage directly to DNA. * To whom correspondence should be addressed. E-mail: [email protected]. 1767 Chem. Rev. 2005, 105, 1767-1800 10.1021/cr030049y CCC: $53.50 © 2005 American Chemical Society Published on Web 03/15/2005

Upload: others

Post on 03-Feb-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

Structure −Activity Relationship Studies of Chemical Mutagens andCarcinogens: Mechanistic Investigations and Prediction Approaches

Romualdo Benigni*Istituto Superiore di Sanita’, Experimental and Computational Carcinogenesis, Department of Environment and Primary Prevention,

Viale Regina Elena 299-00161 Rome, Italy

Received June 9, 2004

Contents1. Introduction 17672. QSARs for Congeneric Series of Chemical

Mutagens and Carcinogens1769

2.1. Aromatic Amines 17702.1.1. QSARs for the Mutagenic Activity of the

Aromatic Amines1770

2.1.2. QSARs for the Carcinogenic Activity ofthe Aromatic Amines

1773

2.1.3. Conclusions on the Aromatic Amines 17752.2. Nitroaromatic Compounds 1776

2.2.1. QSARs for the Nitroarenes 17762.2.2. Conclusions on the QSARs for the

Nitroarenes1779

2.3. N-Nitroso Compounds 17802.3.1. QSARs for the Mutagenicity of

N-Nitrosamines1780

2.3.2. QSARs for the Carcinogenicity ofN-Nitroso Compounds

1781

2.3.3. Carcinogenicity of N-Nitrosamines 17812.4. Quinolines 17812.5. Triazenes 17822.6. Polycyclic Aromatic Hydrocarbons 1783

2.6.1. QSARs for the Carcinogenicity of PAHs 17832.6.2. QSARs for the Genotoxicity of PAHs 1784

2.7. Halogenated Aliphatics 17842.8. Direct-Acting Compounds 1785

2.8.1. Mutagenicity of Platinum Amines 17852.8.2. Mutagenicity of Lactones 17852.8.3. Mutagenicity of Epoxides 1786

3. Nongenotoxic or Epigenetic Carcinogens 17864. QSAR Models For Noncongeneric Chemicals 1788

4.1. Statistically Based Approaches forNoncongeneric Datasets

1789

4.2. Knowledge-Based Approaches forNoncongeneric Datasets

1790

4.3. The Predictive Ability of the QSARs forNoncongeneric Chemicals

1791

4.3.1. Mutagenicity Prediction Models: ExternalValidation Studies

1792

4.3.2. Carcinogenicity Prediction Models:External Validation Studies

1792

4.3.3. Considerations on the External ValidationExercises

1793

4.3.4. NTP Prospective Prediction Exercises 1793

4.3.5. The Predictive Toxicology Challenge2000−2001

1795

5. Conclusions 17966. References 1798

1. IntroductionCarcinogenicity and mutagenicity are among the

toxicological endpoints that pose the highest concern.The environmentsin its broadest sense, as opposedto geneticssis the major determinant of cancer, andit is responsible for up to 80% of tumor cases.1,2

Environment includes lifestyle, diet, smoking, pol-lution, workplace exposures, etc.; an essential com-ponent of all this is the chemicals. Society in general,and the discipline of toxicology in particular, face theparallel tasks of performing safety evaluations for theuses of new chemicals before human exposure ispermitted and assessing the potential hazards posedby exposure to chemicals that lack safety evaluations.In addition, the accelerated pace of chemical discov-ery and synthesis has heightened the need for ef-ficient prioritization and toxicity screening methods.Whereas the mutagenic potential of chemicals canbe assessed with relatively simple test methods, thestandard bioassay in rodents used to assess thecarcinogenic potential of chemicals is extremely longand costly and requires the sacrifice of large numbersof animals. For these reasons, chemical carcinogenic-ity has been the target of numerous attempts tocreate alternative predictive models, ranging fromshort-term biological assays (e.g., the mutagenicitytests themselves) to theoretical models. Among thetheoretical models, the application of the science ofstructure-activity relationships (SARs) has earnedspecial prominence since it permits the exploitationof the huge wealth of existing chemical knowledgein understanding the interactions between chemicalsand living organisms. The use of SAR conceptsobviously applies also to the rationalization andprediction of the mutagenic properties of chemicals.

Principles and concepts of chemical induction ofmutations and cancer are covered by a large body ofliterature; a brilliant example is represented by theseries on Chemical Induction of Cancer (see, forexample, ref 3). A short and clear introduction to thesubject is discussed earlier in ref 4.

From the point of view of the mechanism of action,the carcinogens can be classified into (a) genotoxiccarcinogens, which cause damage directly to DNA.

* To whom correspondence should be addressed. E-mail:[email protected].

1767Chem. Rev. 2005, 105, 1767−1800

10.1021/cr030049y CCC: $53.50 © 2005 American Chemical SocietyPublished on Web 03/15/2005

Page 2: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

Many known mutagens are in this category, andoften mutation is one of the first steps in thedevelopment of cancer;5 and (b) epigenetic carcino-gens, which do not bind covalently to DNA, do notdirectly cause DNA damage, and are usually negativein the standard mutagenicity assays.6 Whereas theepigenetic carcinogens act through a large variety ofdifferent and specific mechanisms, the genotoxiccarcinogens have the unifying feature that they areeither electrophiles per se or can be activated toelectrophilic reactive intermediates. The activationcan be chemically, photochemically, or metabolicallymediated.7 This idea, as well as the observation ofmany different cases, has led to the identification ofseveral chemical functional groups and substructures(structural alerts, SAs) for genotoxic carcinogens.Well-known SAs are the carbonium ions (alkyl-, aryl-,benzylic-), nitrenium ions, epoxides and oxoniumions, aldehydes, polarized double bonds (R,â-unsatur-ated carbonyls or carboxylates), peroxides, free radi-cals, and acylating intermediates.8,9 On the contrary,the recognition of SAs for the nongenotoxic carcino-gens is far behind, basically because no unifyingtheory provides scientific support.

Together with the recognition of the SAs, muchevidence has pointed to a number of factors that arecritical for the structure-activity analysis of chemicalcarcinogens.4 Among the physicochemical factors,there are (1) molecular weight (MW), chemicals withvery high MW and size have little chance of beingabsorbed in significant amounts; (2) physical state,which influences the capability of the compounds toreach critical targets; (3) solubility: in general highlyhydrophilic compounds are poorly absorbed and, ifabsorbed, are readily excreted; (4) chemical reactiv-ity: compounds that are “too reactive” may not becarcinogenic because they hydrolyze or polymerizespontaneously or react with noncritical cellular con-

stituents before they can reach critical targets incells. Another critical factor is the geometry of thechemical compounds: many potent carcinogens andmutagens [e.g., polycyclic aromatic hydrocarbons(PAHs), aflatoxin B1, etc.] are planar molecules, withan electrophilic functional group and favorable size,so that they can intercalate properly into DNA. Oneother critical factor is metabolism. Metabolism canboth activate and detoxify chemical carcinogens:substitutions at critical molecular regions can influ-ence one metabolic pathway at the expense of others.Thus, knowledge of the metabolic pathway of achemical can substantially enhance the accuracy ofstructure-activity analyses.

The recognition of SAs and the critical structuralfactors has been a very important scientific advance-ment since it has contributed to the design of saferchemicals10 and to the assessment of the toxic po-tential of chemicals devoid of appropriate toxicologi-cal data.4,9

Obviously, the final toxicological effect of a certainSA in a molecule depends heavily on the generalstructure/properties of the molecule, both in termsof potency and of yes/no activity. The recognition ofSAs has a strong statistical value as an indicator ofthe propensity for a given chemical to be a mutagenand/or a carcinogen; however, a more efficient as-sessment of the toxicological properties of the chemi-cals requires the more powerful tools provided by thequantitative structure-activity relationships (QSARs)methods. Thus, a very specialized section of thestructure-activity analyses on chemical mutagensand carcinogens has been developed through the useof concepts from the QSAR science and technology.

QSAR is a widely developed branch of chemicalresearch. Its foundation came about 40 years agowhen Hansch found the way to bring together twoareas of science that had seemed far apart for manyyears: physical organic chemistry and the study ofchemical T life interaction. A cornerstone of physicalorganic chemistry is the Hammett equation:

The Hammett equation models the reaction mech-anisms of organic chemicals: k is a rate or equilib-rium constant; F is a measure of the sensitivity ofthe reaction to substituents changes; σ is a parametercharacteristic of each chemical. The Hammett equa-tion was subsequently extended by Taft who alsoconsidered steric factors. Next, Hansch showed thatthis type of model can also be used for chemicobio-logical reactions by introducing a hydrophobic pa-rameter:

This approach was developed from, and meant tobe applied to, sets of congeneric chemicals, i.e.,chemicals structurally similar, and acting throughthe same mechanism of action (better, the same rate-limiting step): for optimal applications, QSAR analy-sis requires a very clear definition of the applicabilitydomain of the model (i.e., the type of chemicals towhich it applies). Finally, the model is derived from

Romualdo Benigni received his education in chemistry at the Universityof Rome “La Sapienza”. He then joined the Istituto Superiore di Sanita’(Italian National Institute of Health) where he got a permanent position in1977 and where he remained except for two sabbaticals, at the NewYork University in 1988 and at the Jawaharlal Nehru University in NewDelhi in 2000. He worked experimentally in the field of molecular biologyand environmental chemical mutagenesis. In the 1980s, he turned hisattention to the statistical analysis and modeling of toxicological data andto the study of the relationships between the structure of organiccompounds and their toxicological properties (mainly mutagenesis andcarcinogenesis). Dr. Benigni has published over 100 journal articles andbook chapters, applying a wide variety of quantitative analysis techniques,including QSAR, to the examination of chemical toxicity information.

log k ) Fσ + constant

log k ) f (electronic, steric, hydrophobic)

1768 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 3: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

the statistical analysis of a (training) set of chemicals;thus its character is fundamentally empirical. Thismodel worked for an enormous number of biologicalproblems, and its success is demonstrated clearly byits widespread diffusion. In the years after the 1960s,the need to solve new problems, together with thecontributions of many other investigators, generatedthousands of variations of the Hansch approach, aswell as approaches that are formally completely new.However, the QSAR science still maintains a funda-mental unity, founded on the systematic use ofmathematical models and on the multivariate pointof view. At present, the QSAR science is one of thebasic tools of modern drug and pesticide design andhas an increasing role in environmental sciences.11-15

The strength of QSAR applications to toxicologyderives from the fact that it contributes to theidentification of the molecular determinants of thebiological action of the chemicals and provides math-ematical models to predict the activity of chemicalsnot tested experimentally, provided that informationon similar chemicals, acting with the same mecha-nisms, is available.

This review paper focuses first on QSAR applica-tions; however, because of the contribution that morequalitative approaches have given, especially to thefield of risk assessment, this area of research alsowill be considered. Thus, the next section (Section 2)presents the rigorous applications of QSAR conceptsto individual chemical series of mutagens and car-cinogens. All these are characterized by the genotoxicmechanism of action. Section 3 reports on the lesspopulated area of nongenotoxic carcinogens. Forthese chemicals, the QSAR applications are quitescarce. Section 4 focuses on a wide series of applica-tions of a very different nature. Some derive fromQSAR itself, other are very qualitative in nature;their practical implementations are extremely dif-ferent as well. What they have in common is thatthey are all aimed at dealing with large samples ofchemicals with different structures and action mech-anisms (noncongeneric chemicals). These are themethods most commonly used for risk assessment.

2. QSARs for Congeneric Series of ChemicalMutagens and Carcinogens

The QSAR approach, mainly in its original Hanschformulation, has been applied to numerous conge-neric classes of mutagens and carcinogens. A numberof reviews have already appeared on this subject.11,16-20

One factor that has sometimes discouraged inves-tigators from applying QSAR analysis to chemicalcarcinogenesis has been the idea that this biologicalprocess is too complicated and involves too manysteps to be successfully modeled. The complexity ofthe situation has been thoroughly discussed in thecase of skin carcinogenicity induced by PAHs.21 Inthe first step, the carcinogen must move through theskin (either through or around the cells) and into thecell where it is to be activated by the P-450 metabolicenzymes. This step is likely to be highly dependenton hydrophobicity, with little dependence on elec-tronic or steric properties. In the following step,binding to P-450 is again dependent on hydrophobic,

and possibly steric factors. The next steps, involvingoxidation of the P-450-bound carcinogen, are expectedto be sensitive to the carcinogen’s electronic and stericfactors. After activation, the product must movewithin the cell, or into a nearby cell, to the DNA: thisstep will depend on hydrophobic and steric factorssince the electronic characteristics of the diol epoxidewill determine the lifetime of this intermediate andits tendency to undergo deactivating side reactionsbefore reaching the DNA. In the generation of reac-tion products with DNA, all three physicochemicalfactors are likely to be important. Of course, reactionof the carcinogen with DNA is far from being the laststep in the carcinogenesis problem; there is thepotential for the organism to repair the DNA damageto be taken into account. Moreover, the “promotion”by other chemicals (in which pre-neoplastic cells areexpressed into individual tumor cells) and “progres-sion” (involving progress to malignancy by hysto-pathologic criteria) have to be considered.

Thus, many steps are likely to occur, with differentweights for the hydrophobic, electronic, and stericfactors. However, as it will be clear from the followingsections, a large number of successful QSAR analysesof chemical mutagens and carcinogens have beengenerated during the past 40 years of QSAR research.For a final QSAR of some value to emerge, one mustassume that only one step is rate limiting for all thecongeners under consideration, or that the differentweights of the physicochemical factors are linearlycombined. It must be assumed also that all congenersare acting via the same mechanism. At present, thisis largely unknowable and is usually dealt with bydiscarding very poorly fit congeners.21 Fortunately,it appears that the above assumptions are oftenrealistic and that QSAR analysis is able to model therate limiting step(s) of mutations and cancer inducedby various chemical classes, thus providing modelsthat are both mechanistically valid and useful formaking predictions.

The following sections will summarize QSAR analy-ses on individual classes of chemical mutagens andcarcinogens. A crucial issue that affects the qualityof the results is the quality of the experimental data.This also includes the nature of the biological end-point measured. For example, the number of existingmutagenicity tests is quite wide, but only a restrictednumber of them have been recognized as beingvalidated enough. Moreover, the measures obtainedfrom the mutagenicity tests can be both quantitative(potency of the mutagenic chemicals) and qualitative(mutagenic versus nonmutagenic chemicals). Evenmore complicated is the situation with carcinogenic-ity, in which different endpoints can be measured:overall yes/no response, response of the individualanimal experimental groups, carcinogenic potency(e.g., TD50), profile of tumors induced, etc. For adiscussion of this point, see ref 22. A related issue isthat of the availability of data from public sources(including existing websites) and how this influencesthe practical feasibility of QSAR modeling. This isdiscussed in detail in refs 23 and 24.

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1769

Page 4: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

2.1. Aromatic Amines

Aromatic amines represent one of the most impor-tant classes of industrial and environmental chemi-cals. They have a wide variety of uses in manyindustries, i.e., the manufacture of polymers, rubber,agricultural chemicals, dyes and pigments, pharma-ceuticals, and photographic chemicals. Many aro-matic amines have been reported to be powerfulcarcinogens and mutagens and/or hemotoxicants.Exposure to aromatic amines occurs in differentindustrial and agricultural activities as well as intobacco smoking. Moreover, several types of aromaticamines are generated during cooking.25-29

The aromatic amines have to be metabolized toreactive electrophiles to exert their carcinogenicpotential. For aromatic amines and amides, thistypically involves an initial N-oxidation to N-hydroxyarylamines and N-hydroxyarylamides, which in ratliver is mediated primarily by cytochrome P-450isozyme c (BNF-B) and d (ISF-G).30,31 Moreover,hydroxylamino, nitro, and nitroso groups are able togenerate amine groups (due to metabolic intercon-version).

The initial activation of nitroaromatic hydrocar-bons is likewise through the formation of an N-hydroxyarylamine, a reduction catalyzed by bothmicrosomal and cytosolic enzymes.25,31 Microsomalnitroreduction too appears to depend on cytochromeP-450 complex, in particular rat liver isozymes c, d,b (PB-B) and e (PB-D). Cytosolic nitroreductaseactivity is associated with a number of enzymes,including DT-diaphorase, xanthine oxidase, aldehydeoxidase, and alcohol dehydrogenase.31 In addition tothe reactions of nitrogen oxidation and reduction(main activation pathways), certain aromatic aminesand nitroaromatic hydrocarbons are converted intoelectrophilic derivatives through ring-oxidation path-ways. N-Hydroxyarylamines, iminoquinones, and ep-oxide derivatives are directly electrophilic metabo-lites, while N-hydroxy arylamides require esterificationbefore becoming capable of reacting with DNA.32

2.1.1. QSARs for the Mutagenic Activity of the AromaticAmines

Because of their environmental and industrialimportance, the aromatic amines are the singlechemical class most studied for their ability to inducemutations and cancer. For example, in a database ofexperimental carcinogenicity results collected in ourlaboratory, about one-fourth (200 out of 800) of thechemicals were aromatic amines. Because of theshortcomings of the rodent carcinogenicity bioassay(long times, high price, sacrifice of large numbers ofanimals), the number of studies in short-term mu-tagenicity assay, notably, the Salmonella typhimu-rium (Ames test) bacterial assay, is far higher thanthat in the rodent carcinogenicity bioassay. Thisassay is also used as prescreen for rodent carcino-genicity.33-35

The large database of mutagenicity results for thearomatic amines has been studied with QSAR ap-proaches by several authors. Major reviews haveappeared on this topic.36-38

Trieff et al.39 studied the S. typhimurium mutage-nicity of 19 aromatic amines tested in the strainsTA98 (frame shift mutations) and TA100 (base pairmutations), with the addition of S9 metabolizingfraction from Aroclor 1254-induced rat liver. TheQSAR models for the two strains were

The bacterial mutagenic potency was defined as BR) 1 + NR/nmol, where NR is the net revertantnumber. The revertants are the cells that underwentmutation. The indicator variable I1 was 1.0 if theamine or acetamido group was proximal (adjacent)to the juncture (i.e., the carbon atom connecting thesubstituted ring with the rest of the molecule). I2related to whether the amine group was free (I2 ) 1)or acetylated (I2 ) 0). Equations 1 and 2 are quitesimilar and show that mutagenicity increased withlipophilicity. On the other hand, mutagenicity wasreduced when the amine or acetamido group wasortho to the juncture because of steric hindrance inits biotransformation. Mutagenic potency was alsodecreased by the acetylation of the amino group,probably because, according to the authors, the acetylgroup should be first split off prior to oxidation of theamine group. However, this interpretation may becontroversial since, at least in mammalian systems,N-acetylation, followed by N-hydroxylation, N,O-transacetylation, and departure of acetoxy group, isan established metabolic activation pathway.

Ford and Griffin40 related the mutagenicity of avariety of heteroaromatic amines present in cookedfoods with the stabilities of the corresponding nitre-nium ions. The stability of the nitrenium ions wasmeasured by the calculated energy (∆∆H) of theprocess

∆∆H was calculated using the semiempirical AM1molecular orbital procedure. The mutagenic potencies(m) in three S. typhimurium strains (TA98, TA100,and TA1538) correlated with the ∆∆H values:

log BR-TA98 ) -1.639((0.399) +0.816((0.127) log P - 0.752((0.174)I1 +

0.377((0.174)I2

s ) 0.78, n ) 19, r2 ) 0.78(1)

log BR-TA100 ) -1.559((0.282) +0.784((0.090) log P - 0.735((0.123)I1 +

0.496((0.123)I2

s ) 0.80, n ) 19, r2 ) 0.88(2)

ArNH2 + PhN+H f ArN+H + PhNH2

log(m) TA98 ) -0.181((0.043)∆∆H +0.227((0.2792)

s ) 0.966, r2 ) 0.593, n ) 14(3)

log(m) TA100 ) -0.147((0.024)∆∆H -0.1619((0.450)

s ) 0.540, r2 ) 0.770, n ) 13(4)

1770 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 5: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

The above equations show that in the three Sal-monella strains, the mutagenic potency was highlycorrelated with the stability of the hypothesizedintermediate metabolite nitrenium ion.

Ford and Herman41 studied the relative energetics(∆∆H) of arylamine N-hydroxylation and N-O het-erolysis (ArNH2 f ArNHOH f ArNH+) for con-densed systems of two, three, and four rings usingthe semiempirical AM1 molecular orbital theory.Limited correlations between the energetics of nitre-nium ion formation and experimental TA98 andTA100 mutagenicities were found.

Debnath et al.42 studied a large database of chemi-cals with different basic structures (e.g., aniline,biphenyl, anthracene, pyrene, quinoline, carbazole,etc..) tested in S. typhimurium TA98 and TA100strains, with S9 metabolic activation. The mutagenicpotency is expressed as log(revertants/nmol). TheAM1 molecular orbital energies are given in eV. Themutagenic potency in TA98 + S9 was modeled by

where HOMO is the energy of the highest occupiedmolecular orbital, LUMO is the energy of the lowestunoccupied molecular orbital, IL is an indicatorvariable that assumes the value of 1 for compoundswith three or more fused rings. The electronic termsHOMO and LUMO, though statistically significant,accounted for only 4% of variance, whereas log Palone accounted for almost 50%. The most hydrophilicamines (n ) 11) could not be treated by eq 6 and weremodeled by a separate equation containing only logP (with negative sign), thus suggesting that theseamines may act by a different mechanism.

The mutagenic potency in the S. typhimuriumstrain TA100 + S9 was modeled by

Also in this case, a different equation (with nega-tive log P) was necessary for the most hydrophilicamines (n ) 6). Overall, the principal factor affectingthe relative mutagenicity of the aminoarenes wastheir hydrophobicity. Mutagenicity increased withincreasing HOMO values: this positive correlationseems reasonable since compounds with higherHOMO values are easier to oxidize and should bereadily bioactivated. For the negative correlationwith LUMO, on the other hand, no simple explana-tion could be offered by the authors. A remarkabledifference between the models for the two S. ty-phimurium strains was that the TA100 QSAR lacked

the IL term present in the TA98 model. It washypothesized that larger amines are more capable ofinducing frame shift mutations (TA98 is specific forframe shift mutations, whereas TA100 is specific forbase pair substitution mutations) and that this effectis not accounted for by the increase of log P atincreasing size of the molecules.42

The above QSARs are quite good in modelingmutagenic potency, whereas they are less satisfactorywhen one wants to predict the activity of the non-mutagenic amines. Using eqs 6 and 7, several in-active compounds are incorrectly predicted to behighly mutagenic.42 Benigni et al.43 studied thediscrimination between mutagenic and nonmutagenicamines for the same set of compounds considered byDebnath et al.42 Lipophilicity alone had no discrimi-nating power in TA98 and TA100, which is at oddswith the major role played in the modulation ofpotency within the group of active compounds. Dis-criminant functions separating mutagenic from non-mutagenic amines were based mainly on electronicand steric hindrance factors. The overall reclassifica-tion rate was around 70% accurate. In a followingpaper, Benigni et al.44 split the amines into structuralsubclasses and then applied discriminant analysisseparately to each subclass. The single ring amineswere best separated by electronic factors (first HOMOand second LUMO, in decreasing order of impor-tance) (correct reclassification rate around 70%). Thisresult confirmed the central role of metabolic trans-formation in the mutagenic activity of these chemi-cals. The diphenyl methanes were modeled by thecontribution to molar refractivity (MR) of the sub-stituents in the ortho position to the functional group,thus indicating the negative effect of steric hindranceon the accessibility of the metabolizing system (cor-rect reclassification rate: 87% for TA98; 93-100%for TA100). Steric factors, as measured by a similar-ity index, were also a key factor in the discriminationof biphenyls. The fused-rings amines were all mu-tagenic, so no discriminant model was necessary.

A mechanistically based QSAR experiment wasperformed by Glende et al.45 They synthesized anumber of alkyl-substituted (ortho to the aminofunction) derivatives of 2-aminonaphthalene, 2-ami-nofluorene, and 4-aminobiphenyl not included in theDebnath et al.42 compilation. The measured mu-tagenic activity was compared with that predictedthrough the QSARs (eqs 6 and 7) for TA98 andTA100. The mutagenicity of the ethyl-substitutedcompounds was decently predicted, whereas withgrowing steric demand of the alkyl groups (n-butyl,isopropyl), the predicted and experimental valuesdiffered considerably. The bulky alkyl substituentsdecreased the mutagenicity of the arylamines. Theauthors argued that this was due to the sterichindrance of the metabolic oxidation of the aminogroup by the enzymes. This effect was not modeledby eqs 6 and 7 since the databases used to build theequations did not include a representative sample ofthe ortho-substituted compounds.

Klopman et al.46 analyzed a set of approximately100 aromatic amines using the computer automatedstructure evaluation (CASE) software. The CASE

log(m) TA1538 ) -0.2417((0.0353)∆∆H -0.801((0.765)

s ) 0.245, r2 ) 0.922, n ) 6(5)

log TA98 ) 1.08(( 0.26) log P +1.28((0.64)HOMO - 0.73((0.41)LUMO +

1.46((0.56)IL + 7.20((5.4)

n ) 88, r ) 0.898, s ) 0.860(6)

log TA100 ) 0.92((0.23) log P +1.17((0.83)HOMO - 1.18((0.44)LUMO +

7.35((6.9)

n ) 67, r ) 0.877, s ) 0.708 (7)

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1771

Page 6: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

methodology is a software that selects its descriptors(molecular substructures) automatically from a learn-ing set of molecules. It identifies single, continuousstructural fragments that are embedded in thecomplete molecule and selects those that are statisti-cally associated with activity or nonactivity or withincreasing potency. Normally, the program screensthe molecules for all the possible fragments rangingfrom 2 to 10 heavy (nonhydrogen) atoms. The pro-gram was used to examine mutagenicity in S. ty-phimurium strains TA98 and TA100 (with S9 activa-tion) and yielded a number of structural featuresassociated with mutagenicity and nonmutagenicity.This work was extended by Zhang et al.47 who studied61 heterocyclic amines formed during food prepara-tion. In both studies, the major feature leading tomutagenic activity was the aromatic amino group.Electronic parameters were also calculated, and theLUMO energy was found to correlate negatively withthe mutagenic potency of the molecules. A modelbased on a number of fragments (the amino groupin different combinations of atoms), together withLUMO attained r2 ) 0.857.

Within a research program aimed at highlightingthe structural determinants that make the chemicalsgood substrates for cytochrome P-4501 (CYP1) me-tabolism, Lewis et al.48 studied a noncongeneric setof food mutagens, the majority being heterocyclicamines (n ) 17). For the TA98 strain (frame shiftmutations) of S. typhimurium, the best correlationof mutagenicity was with molecular diameter (r )0.91), hence with planarity. For the TA100 strain(base pair mutations), the best correlation was withthe difference between the LUMO and HOMO ener-gies: high mutagenicity was related to low values ofthe difference, hence to high chemical reactivity.

Basak and Grunwald49 considered a set of 73aromatic and heteroaromatic aminesspreviously stud-ied by Debnath et al.42sand calculated a wide range(n ) 90) of topological indices. They constructed fivesimilarity spaces, based on (a) counts of atom pairs;(b) principal components (PC) from the topologicalindices; (c) PCs from topological indices plus physi-cochemical parameters; (d) PCs from physicochemicalparameters; (e) physicochemical parameters. In eachof the five similarity spaces, the mutagenic potencyof every chemical was estimated by averaging thepotency of its K-nearest neighbors (k ) 1-5). Theeasily computable method based on atom pairs wasalmost as reliable (r ) 0.77) as the similarity methodbased on physicochemical properties (r ) 0.83).

In another paper, Basak et al.50 considered a setof 127 aromatic and heteroaromatic amines (from theDebnath et al. compilation42) to discriminate betweenmutagens and nonmutagens. They computed a largerange of topological indices: (a) topostructural indices(which encode information about the adjacency anddistances of atoms irrespective of the chemical prop-erties of the atoms involved); and (b) topochemicalindices (which quantify information regarding thetopology as well as specific chemical properties of theatoms). The combinations of topochemical and topo-structural parameters had an unbalanced perfor-mance, missing many of the nonmutagens (accu-

racy: 42.9% nonmutagens; 93.4% mutagens). Log Pand the quantum chemical parameters did not con-tribute to improve the discrimination.

Hatch et al.51 studied the mutagenic potency (frame-shift mutations in TA98 or TA1538 S. typhimuriumstrains) of a series of heteroaromatic amines formedduring the cooking of the food, from two classes:aminoimidazo-azaarene (AIA) (n ) 38) and ami-nocarboline (AC) (n ) 23). For the AIA compounds,the features relevant for the mutagenic activity werenumber of fused rings, number of heteroatoms inrings 2 and 3, methyl substitution on imidazo ringnitrogen atoms, and methyl substitution on ringcarbon atoms (r2 ) 0.78). The relevant features forthe AC compounds were position of the pyridine-typenitrogen atom in ring 1, position of the exocyclicamino group in ring 1, and methyl substitution atring carbon atoms (r2 ) 0.80).

In a further analysis, Hatch et al.52 consideredseveral molecular orbital properties calculated atdifferent approximations, together with structuralfactors, for 16 AIA mutagens and their nitrenium ionmetabolites. The major findings were (1) the potencyincreased with the size of the aromatic ring system;(2) potency was enhanced by the presence of anN-methyl group; (3) introduction of additional nitro-gen atoms in pyridine, quinoline, and quinoxalinerings supported potency; (4) potency was inverselyrelated to the LUMO energy; (5) potency was directly(although weakly) related to the LUMO energy of thederived nitrenium ions; (6) the calculated thermody-namic stability of the nitrenium ions was directlycorrelated with nitrenium LUMO energy and withthe negative charge on the exocyclic nitrogen atom.The authors did not find a clear explanation for therole of LUMO energy. Hatch and Colvin53 recon-firmed the above results in a wider set of 95 aromaticand heteroaromatic amines, together with the puz-zling role of LUMO energy.

Further studies on the mutagenicity of the hetero-cyclic amines formed during meat cooking wereperformed by Felton et al.54 They studied a set of 10isomeric imidazopyridines showing a wide range ofmutagenic potency in S. typhimurium strains TA98and YG1024. Ab initio and semiempirical computa-tional quantum chemical methods were used topredict the structures, energies, and electronic prop-erties of the parent amines of these mutagens, as wellas their nitrenium, imine tautomer, and imidazolering protonated form. No explicit QSAR equation wasbuilt. Correlations were found between the dipolemoments predicted for the parent amines and thelogarithm of the potencies (r ) 0.86-0.91), whereasthere was only a borderline correlation betweenmutagenic potency and the LUMO energy of theparent amine (at odds with the strong correlationseen in other QSAR studies of heterocyclic aminemutagens). No attempt was made to measure thecorrelation of the mutagenic potency with hydropho-bicity.54

Hatch et al.55 extended their previous studies54 byconsidering the mutagenic potency (TA98 or TA1538S. typhimurium strains) of a set of 80 aromatic andheterocyclic amines, for which a wider range of

1772 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 7: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

descriptors was calculated. The most significantstructural variable was the similarity to 3,4-meth-ylimidazoquinoline, with an adjusted r2 ) 0.64 nearlyequal to the adjusted r2 achievable by any multivari-ate model studied. The mutagenic potency was alsohighly correlated with the ring number, molecularweight, and volume. The calculated π-electron energyfrom Huckel theory and the ab initio LUMO energyyielded the highest correlations for single quantumchemical variables (r ) 0.757 and 0.755, respectively).It should be noticed that the π-electron energy washighly collinear with the ring number and molecularweight. Log P did not show significant correlationwith mutagenic potency.

Another reevaluation of the Debnath et al.42 datasetwas performed by Maran et al.,56 which calculated avery large set of descriptors (n ) 619) (includingvarious constitutional, geometrical, topological, elec-trostatic, and quantum chemical descriptors). A finalmodel with six descriptors was established (r2 )0.8344). The most important descriptor was thenumber of aromatic rings, followed by (in decreasingorder of importance): polarizability (second-orderhyperpolarizability), hydrogen acceptor surface area,hydrogen donors surface area, maximum total inter-action energy for the C-C bond, and maximum totalinteraction energy for a C-N bond. The authorsobserved that the leading descriptor in their model(number of rings) was approximately proportional tothe area of the hydrophobic aromatic hydrocarbonpart of these molecules and was thus directly relatedto the hydrophobicity of polycyclic and condensedaromatic compounds (correlation coefficient betweennumber of rings and log P, r2 ) 0.3715). Thiscorrelation is only weak, and the authors stressedthat they could not add log P to their model. Thereis probably a high multiple correlation between theentirety of their variables and log P, which was notinvestigated. However, the number of rings waspreferred by the authors to log P based on theargument that it is not an empirical parameter. TheHOMO ad LUMO energies did not appear in themodel.

2.1.2. QSARs for the Carcinogenic Activity of theAromatic Amines

Yuta and Jurs57 applied the automatic data analy-sis using pattern-recognition techniques (ADAPT)software to a set of 157 aromatic amines, divided intocarcinogens and noncarcinogens (including five struc-tural classes: biphenol, stilbene, azo-compounds,fluorene, methylene). Topological and geometricaldescriptors were used, and the analyses repeatedwith several pattern-recognition methods. The chemi-cals were divided into 11 possible subsets, accordingto organs and route of administration. The iterativeleast-squares program enjoyed the most success(classification rates around 90%). Overall, the num-ber of rings (related to molecular volume or bulk) wasthe descriptor most suitable to discriminate betweencarcinogens and noncarcinogens. Other importantdescriptors were related to size and shape (e.g.,smallest principal moment). Several subsets of de-scriptors supported linear discriminant functions thatcould separate carcinogens from noncarcinogens.

Loew et al.58 studied four pairs of isomeric amines.One of each pair was an active carcinogen, while theother was inactive or of doubtful activity. Mutagenicpotency data paralleled the carcinogenic activity; theweak mutagens were the inactive or more marginallyactive carcinogens. These pairs of isomers wereselected as ideal tests of the ability of calculatedelectronic parameters alone to predict relative bio-logical activity since effects such as transport andelimination should be more nearly the same for bothisomers of a given pair than for the group as a whole.Electronic reactivity parameters relevant to the rela-tive ease of metabolic transformation of each parentcompound to hydroxylamine by cytochrome P-450, aswell as to other competing metabolic products involv-ing ring epoxidation and hydroxylation, were calcu-lated. In each pair, the N atom superdelocalizabilityschosen as an indicator of the extent of formation ofhydroxylamine from parent compoundsswas largerfor the more potent mutagen/carcinogen. Moreover,the less potent isomer in each pair had the ringcarbon that was most reactive (i.e., larger values ofring carbon superdelocalizability) to direct phenolformation, which appeared to be an effective detoxi-fication pathway. Ring epoxidation (as measured byπ-bond reactivity) appeared to be more activatingthan detoxifying. In addition, two measures of cova-lent adduct formation ability of the hypothesizedintermediate reactive species (arylnitrenium ion)paralleled the biological activity within each pair(electron density on N and C6 atoms in lowest energyempty molecular orbital of the arylnitrenium ion).

The carcinogenicity of nonheterocyclic aromaticamines was been studied in two related papers. Inthe first analysis, only the carcinogenic chemicalswere considered, and the structural factors thatinfluence the gradation of carcinogenic potency inrodents were investigated.59 The second analysisdiscriminated between carcinogenic and noncarcino-genic amines.60

In59 the QSAR models for the carcinogenic potencyin rodents (BRM ) carcinogenic potency in mice; BRR) carcinogenic potency in rats) were

where BRM ) log(MW/TD50)mouse and BRR ) log(MW/TD50)rat.

BRM ) 0.88((0.27) log P*I(monoNH2) +0.29((0.20) log P*I(diNH2) +

1.38((0.76)HOMO - 1.28((0.54)LUMO -1.06((0.34)EMR2,6 - 1.10((0.80)MR3 -

0.20((0.16)ES(R) + 0.75((0.75)I(diNH2) +11.16((6.68)

n ) 37, r ) 0.907, r2 ) 0.823, s ) 0.381,F ) 16.3, P < 0.001

(8)

BRR ) 0.35((∀0.18) log P + 1.93((∀0.48)I(Bi) +1.15((0.60)I(F) -1.06((0.53)I(BiBr) +

2.75((0.64)I(RNNO) - 0.48((0.30)

n ) 41 r ) 0.933 r2 ) 0.871 s ) 0.398F ) 47.4 P < 0.001

(9)

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1773

Page 8: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

TD50 is the daily dose required to halve theprobability for an experimental animal of remainingtumorless to the end of its standard life span.61

Besides log P, HOMO, and LUMO, the chemicalparameters in the equations are EMR2,6, sum of MRof substituents in the ortho positions of the anilinering; MR3, MR of substituents in the meta positionof the aniline ring; ES(R), Charton’s substituentconstant for substituents at the functional aminogroup; I(monoNH2) ) 1 for compounds with only oneamino group; I(diNH2) ) 1 for compounds with morethan one amino group; I(Bi) ) 1 for biphenyls;I(I(BiBr) ) 1 for biphenyls with a bridge between thephenyl rings; I(RNNO) ) 1 for compounds with thegroup N(Me)NO; I(F) ) 1 for aminofluorenes. Themultiplicative terms “log P*I(monoNH2)” and “logP*I(diNH2)” are aimed at giving different weights(equation coefficients) to the log P’s of amines withone or two NH2 groups, respectively.

The key factor for carcinogenic potency is hydro-phobicity (log P). Both BRM and BRR increase withincreasing hydrophobicity. In the case of BRM(mouse), the influence of hydrophobicity is strongerfor compounds with one amino group (characterizedby the indicator variable I(mono NH2)) in comparisonwith compounds with more than one amino group(characterized by the indicator variable I(di NH2))(see the different coefficients 0.88 and 0.29). ForBRM, electronic factors also play a role: potencyincreases with increasing energy of HOMO and withdecreasing energy of LUMO. Such effects seem to beless important for BRR (rat): no electronic termsoccur in eq 9. Carcinogenic potency also depends onthe type of the ring system: aminobiphenyls (indica-tor variable I(Bi)) and, in the case of BRR, alsofluorenamines (indicator variable I(F)) are intrinsi-cally more active than anilines or naphthylamines.A bridge between the rings of the biphenyls decreasespotency (I(BiBr)). Steric factors are involved in thecase of BRM but cannot be detected in the case ofBRR. BRM strongly decreases with bulk in thepositions adjacent to the functional amino group, andbulky substituents at the nitrogen and in position 3also decrease potency. The latter effects are, however,not so important. In the case of BRR, R ) (Me)NOstrongly enhances potency (compounds with thissubstituent have no measured value for BRM).

Equations 8 and 9 were derived from the analysisof the carcinogenic aromatic amines only. Whenapplied to the noncarcinogenic amines, the equationsdid not predict well their lack of carcinogenic effects(the noncarcinogens were predicted as having acertain, even though low, degree of activity). Thismeans that the molecular determinants that rule thegradation of carcinogenic potency are not the samedeterminants that make the difference betweencarcinogens and noncarcinogens. In a subsequentwork, the differences in molecular properties betweenthe two classes of carcinogenic and noncarcinogenicaromatic amines were specifically studied.60 Fourequations were derived, one for each of the experi-mental groups (rat and mouse, male and female). Thetwo classes of chemicals were coded as 1 ) inactive;2 ) active compounds.

The following discriminant equation achieves ahighly significant separation of classes for female ratcarcinogenicity:

where L(R) is the length of the substituent at theamino group; I(An) ) 1 for anilines; I(o-NH2) ) 1 ifnonsubstituted amino group occurs in the orthoposition to the functional amino group. The correctreclassification rate of discriminant function (10)amounts to 91.1% (Class 1: 93.3%; Class 2 88.5%)with a fairly stable cross-validation (all compounds:80.4%; Class 1: 93.3; Class 2: 84.6%).

For male rat carcinogenicity a good separation ofclasses is achieved by the discriminant function:

The correct reclassification rate amounts to 91.7%(Class 1: 92.9; Class 2: 90.6%) with a good resultfor cross-validation (all compounds: 83.3%; Class 1:82.1; Class 2: 84.4%).

The results for male and female resemble eachother. Of key importance for class separation areelectronic properties as expressed by HOMO andLUMO, the type of ring system, and substitution inthe ortho position as well as at the amino nitrogen.The probability of a compound to be assigned to theactive class increases with increasing values ofLUMO, decreasing values of HOMO, decreasing bulkof substituents in position 2 (ortho), decreasing length(or bulk) of substituents at the amino nitrogen, andincreasing number of aromatic rings (anilines havea distinctly lower probability of being active thanbiphenyls, fluorenes, or naphthalenes). An importantfeature promoting carcinogenic potency also is theoccurrence of an amino group in the ortho positionto the functional amino group. Of lesser importanceare the other variables.

For female mouse carcinogenicity, the followingdiscriminant function reclassifies 85.7% of the com-pounds correctly (Class 1: 87.9%; Class 2: 83.3%)and is of acceptable stability in cross-validation (allcompounds: 81.0%; Class 1: 84.8%; Class 2: 76.7%):

w ) 0.65 L(R) + 0.79 HOMO - 1.54 LUMO +0.76 MR2 - 0.50 MR5 + 1.32 I(An) - 0.53 I(o-NH2) + 0.99 I(BiBr) + 0.99 I(diNH2) -

1.08 log P*I(diNH2) (10)

w(mean,class1) ) 1.05, N1 ) 30

w(mean,class2) ) -1.21, N2 ) 26

w ) 0.48L(R) + 0.90HOMO - 1.43LUMO +0.72MR2 + 1.13I(An) - 0.54I(o-NH2) -0.45MR5 + 0.70I(diNH2) - 0.80 log P*I

(diNH2) + 0.65I(BiBr) (11)

w(mean,class1) ) 1.15, N1 ) 28

w(mean,class2) ) -1.01, N2 ) 32

1774 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 9: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

where I(NR) ) 1 if the amino nitrogen is substituted.For male mouse, the discriminant function ob-

tained is

where the Sterimol parameter B514 is the maximal

width of the substituent at the amino group.Discriminant function (13) shows a good reclas-

sification rate (all compounds: 89.8%; Class 1: 96.0%;Class 2: 83.3%) and stability in cross-validation (allcompounds: 83.7%; Class 1: 96.0%; Class 2: 70.8%).

The results for the mouse resemble that for the ratfor the key importance of the substitution at theamino nitrogen and at the ortho position, as well asthe type of ring system. The male mouse model (eq13) also contains HOMO and LUMO, like the twomodels for the rat (eqs 10 and 11). Only eq 12 isdifferent for the absence of electronic properties.

The comparison of the models for the carcinogenicpotency (eqs 8 and 9) with the models for thediscrimination between carcinogens and noncarcino-gens (eqs 10-13) shows that the key factors dif-ferentiating between active and inactive compoundson one hand and governing potency within the groupof active compounds are different. The most pro-nounced differences are with respect to the impor-tance of hydrophobicity (crucial for potency andminor for yes/no) and the directionality of electroniceffects (while the degree of carcinogenic potency inmice increases with increasing values of HOMO anddecreasing values of LUMO, the reverse is true forthe probability of a compound of belonging to theclass of carcinogens).60 Interestingly, the mutagenicproperties of the aromatic amines also pointed to asimilar picture: the patterns of molecular determi-nants for the potency and the yes/no activity weredifferent and were analogous to those found for therodent carcinogenicity.44

Aromatic amines were included in two datasets ofnoncongeneric chemicals, which were studied withtheoretical descriptors using artificial neural net-works. The models devised by Vracko62 were able todescribe the training set, but their prediction abilityof carcinogenic potency (TD50) was limited. Gini etal.63 performed a retrospective study on 104 N-containing benzene derivatives that resulted in quitea good correlation after removal of several outliers.

2.1.3. Conclusions on the Aromatic Amines

In the most mechanistically oriented QSAR analy-ses, the toxic activity of the amines was demonstratedto correlate with the ease of formation of the N-hydroxylamine,58 with the stability of the nitreniumion,41,52 and with the ease of formation of epoxideson the aromatic ring.58 Loew et al. also found thatthe ease of formation of phenols (a detoxifyingpathway) is actually negatively correlated with thecarcinogenic activity.58

Various studies39,42,59 pointed to the central role ofhydrophobicity in the modulation of the mutagenicand carcinogenic potency of the aromatic amines. Anexception is represented by the results of Hatch etal.55 on a specific set of aromatic and heterocyclicamines.

Regarding the electronic descriptors, HOMO andLUMO energies were found to have a role both forthe mutagenicity in S. typhimurium42,44,47,48,52 and thecarcinogenicity in rodents.59;60 The role of HOMOenergy can be rationalized in terms of propensity ofthe toxic amines to form the intermediate metabolitehydroxylamine. The role of the LUMO energy is quitepuzzling. The two terms LUMO and HOMO may belinked together through the concept of “hardness” [η) (LUMO-HOMO)/2] as a measure of chemicalreactivity; or LUMO energy may account for thereduction of the nitro group present, together withthe amino group, in a number of amines.42 However,a LUMO term appears also in datasets withoutnitroarenes.47,52 King at al.64 found a new enzymaticmechanism of carcinogen detoxification: a microso-mal NADH-dependent reductase that rapidly con-verts the N-hydroxy arylamine back to the parentcompound. Within this perspective, a low LUMOenergy could favor the detoxification. However, theLUMO energy of the metabolite is not necessarilycoincident with that of the parent amine; thus theentire matter needs further clarification.

Regarding steric effects, bulky substituents at thenitrogen of the amino group generally inhibit bio-activation (see the Es(R) contribution in eq 859 andthe inhibiting effect of the acetylation of the aminogroup found by Trieff et al.39). Moreover, in all fourequations (eqs 10-13) that model the separationbetween carcinogenic and noncarcinogenic aminesthe first term indicates that the probability of theamines of being noncarcinogenic increases with in-creasing length of the substituent (L(R)) or simplywith the presence of a substituent (I(NR)) at theamino nitrogen.60 A general finding is that theactivity decreases with steric bulk in ortho to theamino function. This is consistent with the decreasein mutagenic potency found by Trieff et al.,39 thedecrease in carcinogenic potency in mouse (eq 859),the decreased probability of the amines of beingcarcinogenic in rat (eqs 10 and 1160), and the de-creased probability of the subclass of diphenyl meth-anes of being mutagenic in S. typhimurium.44 Thesefindings are in line with the observation of Glendeet al.45 that bulky alkyl substituents ortho to theamino group decreased the mutagenicity of thearylamines. The mechanistic rationale for these

w ) -0.47I(NR) + 1.38 log P*I(monoNH2) +1.68 log P*I(diNH2) - 0.37I(An) + 0.33I

(o-NH2) - 0.55MR5 - 0.45I(BiBr) (12)

w(mean,class1) ) -0.92, N1 ) 33

w(mean,class2) ) 1.01, N2 ) 30

w ) -1.96L(R) + 1.69B5(R) - 0.83HOMO +0.97LUMO -1.22I(An) + 0.73I(o-NH2) +0.59MR3 + 0.69MR5 + 0.77MR6 - 0.76I

(diNH2) + 1.09 log P*I(diNH2) - 0.79I(BiBr) (13)

w(mean,class1) ) -1.11, N1 ) 25

w(mean,class2) ) 1.16, N2 ) 24

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1775

Page 10: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

observations is that steric bulk prevents enzymaticaccess to the nitrogen and formation of the reactiveintermediate.

Several authors used topological or substructuralparameters, as well as indicator variables. A findingcommon to many of them is the correlation betweenactivity and/or potency, and number of (fused) aro-matic rings.42,48,51,52,55-57,60 This can be interpreted indifferent ways: (a) indicator for the planar systemsapt to induce frameshift mutations in TA98 S.typhimurium strain; (b) indicator for the hydropho-bicity of polycyclic and condensed aromatic rings; (c)indicator for the presence of extended conjugatedsystems that favor the formation of reactive inter-mediates. Debnath et al.42 showed that, beside logP, an additional contribution to the mutagenic po-tency in TA98 was given by the presence of three ormore fused rings. This effect was absent in TA100strain and was related to the specificity of TA98 forframeshift mutations.

Carcinogenic potency also depends on the type ofthe ring system: aminobiphenyls (and, in the caseof the rat, also fluorenamines) are intrinsically moreactive than anilines or naphthylamines. A bridgebetween the rings of the biphenyls decreases carci-nogenic potency,59 as well as the probability of beingcarcinogenic.60

An important finding is that the models andmolecular determinants for the mutagenic and car-cinogenic potency of the aromatic amines are differ-ent from those relevant to the separation betweenactive and inactive compounds.43,60 In other words,the Hansch equations permit the recognition ofstrong carcinogens and the estimation of the grada-tion of potency within active compounds but cannotseparate weak carcinogens from inactive compounds.A similar concept applies to the mutagenic activity.

2.2. Nitroaromatic CompoundsThe concern for the mutagenicity and carcinogenic-

ity shown by many nitroaromatic compounds derivesfrom the fact that they have long been of importanceas intermediates in the synthesis of several kinds ofindustrial chemicals, as well as present in matricesof environmental importance (e.g., automobile ex-haust fumes).65 Nitroaromatics are also used for theirantibacterial activity and in chemotherapy.66 Thenitroarenes appear to induce mutations and cancerthrough a mechanism similar to that of the aromaticamines. Both compounds are believed to be biochemi-cally transformed to a common hydroxylamine in-termediate, which is then activated to give an elec-trophilic nitrogen species. The difference is that theamines are oxidized by P-450 enzymes, whereas thenitroarenes are reduced to the critical hydroxylamineby cytosolic reductases. The reductases are presentalso in bacteria, whereas, in the case of bacterial testsof aromatic amines, the P-450 enzymes have to beadded to the experimental mixture (S9 fraction).67

2.2.1. QSARs for the NitroarenesBecause of the role of nitroheterocyclic compounds

in chemotherapy, Biagi et al.66 studied the relation-ship between the mutagenic activity of 17 nitroimi-

dazo(2,1-b) thiazoles and their lipophilic character.The mutagenic activity was measured by means ofthe Ames test with S. typhimurium TA100 strain.

Lipophilicity was expressed in two ways: by chro-matographic retention time (Rm) (eq 14) and sum oflogP contributions (∑π) (eq 15). The alternativeequations obtained were

The mutagenic potency was expressed as micro-molar concentration increasing revertants by fivetimes (C).

Equation 14 had a much better fit than eq 15. Theauthors argued that the experimental Rm contributedwith both lipophilic (cell permeation) and polar (drugreceptor binding) information, and both were neces-sary for describing the action mechanism.

Walsh and Claxton68 studied a large set of ni-troarenes (114 nitrogenous cyclic compounds withdifferent basic structures) tested for their mutagenicactivity in S. typhimurium (strains TA98, TA100,TA1535, TA1537, TA1538) with the ADAPT software(using pattern-recognition techniques).69

The analysis was aimed at modeling the yes/noresponse (instead of the mutagenic potency of theactive chemicals): out of the 114 substances, 64 weremutagenic and 50 were nonmutagenic.

Out of more than 100 molecular descriptors (topo-logical, electronic, geometrical, and physicochemical),19 were left after statistical feature selection.

The obtained QSAR classified correctly 96% ofcompounds. In a subsequent analysis, 109 compoundswere utilized for the learning set, and 5 for the testset, obtaining 89% of correct classification. As afurther validation test, 10 external similar com-pounds were selected from the literature: the QSARmodel obtained 100% correct prediction.

The parameters included in the QSAR model wererelated to size and branching of the molecules.

Maynard et al.70 studied the relationship of themutagenicity of 20 nitrated PAHs (in S. typhimuriumTA98, TA100, TA1537, TA1538 strains) with theirelectron affinities (LUMO energies calculated by theSTO-3G method).

The equations relative to the strains were

log 1/C ) +10.955((1395) Rm -

3.721((0.481) Rm2 - 8.140((0.889)

n ) 17, r ) 0.903, s ) 0.464, F ) 30.85(14)

log 1/C ) +3.507((1.278)∑π -1.236((0.479)(∑π)2 - 3.012((0.656)

n ) 17, r ) 0.598, s ) 864, F ) 3.90(15)

TA98 ) -64.892LUMO + 11.559

n ) 20, r ) -0.82, F ) 35.9(16)

TA100 ) -46.342LUMO + 8.583

n ) 20, r ) -0.75, F ) 20.5(17)

TA1537 ) -73.702LUMO + 11.689

n ) 20, r ) -0.88, F ) 38.1(18)

1776 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 11: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

The mutagenic potency was expressed as number ofrevertants/nmol of mutagen.

The authors remarked that the studied nitroarenesexhibit high assay variability under identical experi-mental conditions and significant interlaboratory andintralaboratory mutagenic variability, thus explain-ing the limited r values obtained (especially forTA100). This may be related to the concomitantantibacterial activity of nitroarenes.

The authors concluded that their results indicatedthat the dominant metabolic pathway of the mutage-nicity involves a reduction of the nitro groups to givehydroxylamines and an ultimate conversion to elec-trophilic intermediates (arylnitrenium ions) whichinteract with the key tissue macromolecules. More-over, the existence of a positive electrostatic regionabove and below the C-NO2 bond is confirmed bythe increasing LUMO energies with nitro rotationrelative to the aromatic ring system. This situationfacilitates the reduction process, i.e., it is a site fornucleophilic reception.

The data considered by Maynard et al.70 werereexamined by Compadre et al.,71 which included alsolipophilicity (logP) in the analysis. This was the firstof a series of analyses performed in the laboratory ofHansch on the nitroarenes.

Adding the hydrophobic parameter, they obtainedthe equations:

TA100 and TA98 are revertants/nmol of mutagen.LUMO, expressed in electronvolts, is calculated usingthe MNDO method.

It appears that a lipophilicity parameter improvedremarkably the quality of the QSARs.

The demonstration, provided with the above work,that lipophilicity was essential to model the mutage-nicity of nitroarenes stimulated Compadre et al.65 tostudy the mutagenicity in TA100 and TA98 S. typh-imurium strains for a wider set of fluorenes deriva-tives and other nitroarenes.

The models for the two strains were

TA100 and TA98 are revertants/nmol of mutagen.

Equations 22 and 23 confirmed that the mutage-nicity of the nitro aromatic compounds is significantlyinfluenced by their hydrophobic character, in additionto their electronic properties. Higher mutagenicactivity is associated with lower values of LUMO, i.e.,better electron acceptors, which is consistent with anincreasing ease of nitroreduction.

Interestingly, the authors compared the abovemodels with the QSARs for the reduction of nitroben-zenes to hydroxylamines by xanthine oxidase. Thehydrophobic and steric effects of substituents wereabsent in the reduction of the substances. Thus, itcould be that the logP term mainly reflects theinteraction of the nitroarenes and its metaboliteswith lipophilic components of the cell and/or interac-tion with other enzymes.

After the above paper,65 in the Hansch laboratorythe databases for TA98 and TA100 strains of S.typhimurium were expanded, and new QSAR analy-ses were developed on larger numbers of chemicals,belonging to wider ranges of basic structures. Theresults are in the two following papers.

Debnath et al.72 derived a QSAR for 188 aromaticand heteroaromatic nitro compounds tested in S.typhimurium strain TA98.

The model developed was

TA98 is the mutagenic activity in revertants/nmol.I1 is an indicator variable, set equal to 1 for com-pounds with three or more fused rings and to 0 whentwo or fewer rings are present. Ia is set equal to 1 forfive substances of the set that are much less activethan expected. LUMO was calculated with the AM1method.

Hydrophobicity is a major factor in mutagenicactivity of aromatic nitro compounds: in this dataset,logP follows a bilinear relationship. The negativecoefficient of LUMO is explained by the initialreduction of the nitro group as a rate-limiting stepin nitroarene activaction: substances with lowerLUMO energies are supposed to be reduced moreeasily by cytosolic nitroreductases. Electron-attract-ing elements conjugated with the nitro group en-hance mutagenicity. The positive coefficient of I1

means that large-ring compounds are more activethan expected from logP and LUMO alone.

A new QSAR for the mutagenicity of 117 nitroare-nes in S. typhimurium TA100 strain, without meta-bolic activation, was developed as well.67

TA1538 ) -62.662LUMO + 11.008

n ) 20, r ) -0.86, F ) 39.6(19)

log TA100 ) -1.91LUMO + 1.30 log P - 6.59

n ) 18, r ) 0.874, s ) 0.783(20)

log TA98 ) -2.82LUMO + 1.48 log P - 8.59

n ) 20, r ) 0.926, s ) 0.737(21)

log TA100 ) 1.36((0.20) log P -1.98((0.39)LUMO - 7.01((1.2)

n ) 47, r ) 0.911, s ) 0.737, F1,44 ) 99.9(22)

log TA98 ) -2.29((0.41)LUMO +1.62((0.28) log P -

4.21 ((0.80) log(â 10logP + 1) - 7.74((1.4)

n ) 66, r ) 0.886, s ) 0.750,F2,61 ) 54.3, optimum log P ) 4.86,

log â ) -5.06 (4.60-5.06)

(23)

log TA98 ) 0.65((0.16) log P -2.90((0.59)log(â 10logP + 1) -

1.38((0.25)LUMO + 1.88((0.39)I1 -2.89((0.81)Ia - 4.15((0.58)

n ) 188, r ) 0.900, s ) 0.886,log P0 ) 4.93, log â ) 5.48, F1,181 ) 48.6

(24)

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1777

Page 12: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

The following relationship was obtained:

Ia is an indicator variable, set 1 for compounds wherethe acenthrylene ring is present; Iind is anotherindicator variable, set 1 for the 1- and 2-methylin-dazole derivatives (six compounds).

Hydrophobicity plays a crucial role in determiningthe relative mutagenicity in most systems; hence,hydrophobicity alone can make the difference be-tween an inactive and a highly mutagenic compound.As shown by the bilinear dependence of the activityon log P, the nitroarene mutagenicity increasesslowly at low log P and then decreases more rapidlyat high log P, probably because of a combination ofadverse hydrophobic and steric effects. In addition,the activity increases as -LUMO (for more easilyreduced compounds). The mutagenic activity in TA100does not depend on the size of the aromatic ringsystems, whereas it does in TA98 (see eq 24).

Benigni et al.43 considered the ability of eqs 24 and25 to model the lack of activity of the nonmutagenicnitroarenes. The above equations, in fact, werederived from the mutagenic potency values of onlyactive nitroarenes. It appeared that only a fewnonmutagenic nitroarenes were predicted to havevery low potency, whereas the predicted potency formost of them spanned a large range of values, up tovery high potency values. The subsequent analysisshowed that this depended on the fact that log P,which was the major determinant of the mutagenicpotency, had no discriminant power for separatingthe active from the inactive nitroarenes. On thecontrary, LUMO and MR had discriminant power,even though the models obtained were far from beingsatisfactory. The general lesson was that the rate-limiting steps for determining yes/no activity andpotency may be different and require different de-scriptors to be modeled.

Debnath and Hansch73 expanded the modeling ofthe mutagenicity of the nitroarenes by consideringdata obtained with the SOS chromotest in E. coliPQ37, relative to 23 polycyclic aromatic nitro com-pounds.

The following relationships were found:

Genotoxicity was expressed as SOS induction poten-tial values (SOSIP), obtained without activation byS9, from a plot of the induction factor (IF) vs nmol ofcompound.

Equation 26 indicates that the mutagenicity ofthese chemicals is related to the hydrophobicity

(main determinant) and to LUMO, like in S. typh-imurium. Since there is no indicator variable forcongeners with three fused rings and more, the SOSsystem does resemble the TA100 S. typhimuriumstrain and not the TA98 strain.

A special class of nitroaromatic compounds are thenitrofurans. These are used as antimicrobial agentsin human and veterinary medicine. Although eq 24(QSAR for the mutagenicity in TA98) applies to alarge array of heterocycles, it did not fit the 2-ni-troarenes.74 Debnath et al.74 further investigated theapplicability of traditional QSAR and comparativemolecular field analysis (CoMFA) to predict thegenotoxicity of nitrofuran derivatives and to proposea possible mechanism for their unusual genotoxicbehavior.

The best model for the genotoxic activity in theSOS chromotest was

SOSIP is the SOS induction potential in E. coliPQ37. The electronic descriptor qc2 is the partialatomic charge on the carbon attached to the nitrogroup (calculated by AM1 method). MR is molarrefractivity. Isat is an indicator variable equal to 1 forsaturated ring compounds. I5,6 is also an indicatorvariable equal to 1 for compounds with substituentsat the 5- or 6- position of 2-nitronaphthofurans andpyrenofurans. The two latter variables account forsteric effects.

The partial atomic charge on the carbon attachedto the nitro group is the single most importantvariable; the second one is log P whose coefficientnear 1 is in line with the results obtained for a varietyof compounds in the mutagenicity assays. MR, ameasure of substituent bulk, applies only to substit-uents adjacent to the nitro group: its negativecoefficient points to the detrimental effect of suchsubstituents.

Subsets of chemicals were also tested in S. ty-phimurium TA100 and TA98 strains, for which thefollowing equations were found:

Equation 29 is different from eq 28, and it seemsthat log P does not affect the mutagenicity of chemi-cals on the TA98 strain, in contrast to previousresults. However, there is a strong collinearity be-tween several variables in the TA98 dataset, so that

log TA100 ) 1.20((0.15) log P -3.40((0.74) log(â 10logP + 1) -

2.05((0.32)LUMO - 3.50((0.82)Ia +1.86((0.74)Iind - 6.39((0.73)

(25)n ) 117, r ) 0.886, s ) 0.835,

log P0 ) 5.44((0.24), log â ) -5.7, F1,110 ) 24.7

log SOSIP ) 1.07((0.36) log P -1.57((0.57)LUMO - 6.41((1.8)

n ) 15, r ) 0.922, s ) 0.534, F1,12 ) 36.21 (26)

log SOSIP) -33.1((11.9)qc2 +1.00((0.26) log P - 1.50((0.49)Isat -

1.19((0.49)MR - 0.76((0.49)I5,6 - 3.76((1.56)

n ) 40, r ) 0.900, s ) 0.475, F1,34 ) 9.76(27)

log TA100 ) 1.15((0.65) log P - 57.6((22)qc2 -1.46((0.86)Isat - 1.57((0.98)I5,6 - 6.32((3.1)

n ) 20, r ) 0.881, s ) 0.626, F4,15 ) 51.81(28)

log TA98 ) -18.1((15.6)qc2 - 1.20((0.52)Isat +2.15((0.84)IL + 0.29((1.7)

n ) 22, r ) 0.894, s ) 0.490, F3,22 ) 71.74(29)

1778 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 13: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

it is impossible to obtain a clear understanding of theQSAR for TA98.

In the same paper,74 the SOSIP data were furtheranalyzed with CoMFA, a procedure that providesmeans to assess specifically steric factors.75 For theCoMFA analysis, the CH3 probe in combination withH+ probe was chosen, and the genotoxicity waspredicted for a final set of 44 compounds (including6 compounds initially omitted) by means of thefollowing relationship:

Press is the standard deviation from the leave-one-out jacknife cross-validation. The parameters in eq30 are latent variables obtained from the mathemati-cal reduction of the interaction energies between theprobes and the molecule, at the different grid points(for more technical details, see ref 75).

The CoMFA analysis reveals that combination ofsteric and electrostatic probes explains a majority ofthe variance in the data.

Overall, (a) the SOS E. coli system resembles theTA100 and not the TA98 system; (b) a high electrondensity on the carbon in position 2 promotes mu-tagenicity, which seems counterintuitive. This “aber-rant” behavior of the nitrofurans may be related tothe unusual ring opening which these compounds aresusceptible to upon reduction; (c) a coefficient of about1 for log P is common to a variety of classes regardingmutagenicity; (d) three types of steric effects wereuncovered. The negative Isat for ring saturationsuggests that planar rings are important. The MRterm shows that a bulky group adjacent to the nitrogroup has a deleterious effect. The negative I5,6 showsthat bulk in this region decreases potency; (e) CoMFAconfirms the importance of an electronic factor,although not in a way that enables mechanisticdiscussion.

The CoMFA model was used by other investigatorsto further elucidate the steric factors in the mutage-nicity of nitroaromatic compounds in S. typhimurium.Caliendo et al.76 provided details on the steric probe-ligand ineraction energies and found evidence of theimportance of the overall lipophilicity and of LUMOenergy in the mutagenic acitvity in S. typhimuriumTA98 strain. Fan et al.77 compared the CoMFAmodels for the S. typhimurium strains TA98 andTA100 (see also a review in ref 78). The molecularareas of nitroaromatic chemicals found to be associ-ated with mutagenicity were the high electronicdensity regions equivalent to C4-C5 in the pyrenering and an electron-deficient site equivalent to C6.The authors argued that the electron-deficient areasmay be associated with the nitroreductive bioactiva-tion of nitroaromatics, whereas the electron-rich sites

may be involved with oxidative mechanisms, alsopresent in the bioactivation pathway of PAHs. Stericfactors were more important for TA98 than for TA100strain: increasing bulk perpendicular to the aromaticplane would decrease mutagenicity, and increasingthe aromatic ring system along the C6-C7 region inI-nitropyrene would increase mutagenicity.

The structural basis of the genotoxicity of nitro-furans was also investigated with the CASE approachby Mersch-Sundermann et al.79 Genotoxicity wasexamined with the SOS chromotest that measuresthe potency of a compound to induce the expressionof the sfiA gene in E. coli PQ37 (DNA damage).

The model found was

where n1B1 is the QSAR activity value of eachbiophore B1 of the molecule multiplied by the numberof occurrences; and n2B2 is the QSAR activity valueof each biophobe B2 (inactivating fragment) of themolecule multiplied by the number of occurrences.

Out of nine major activating structural fragments(biophores), the most important was the nitro groupat position 2 of the furan ring. In addition, anincrease in genotoxicity can be expected as a resultof the addition of one, two, or more aromatic rings tothe 2-nitrofuran structure, and log P is an importantdescriptor for genotoxic potency in E. coli PQ37.

CASE correctly predicted the probability of geno-toxicity of all active and inactive compounds (94% ofall predicted results were within ( 1 order of mag-nitude of the experimental value) and correctlypredicted the probability of sfiA induction of 95.8%of 24 unknown nitroarenofurans (r ) 0.88-0.97) usedas validation set.

2.2.2. Conclusions on the QSARs for the NitroarenesOverall, the QSAR studies on the mutagenicity of

the nitroaromatics provide a coherent picture. Acommon pattern is the presence of hydrophobicityand LUMO energy in most equations. The LUMOenergy term indicates that the lower the energy ofthe lowest unoccupied molecular orbital (i.e., themore readily it can accept electrons), the more potentthe mutagen. This suggests that the electronic effectis associated with the reduction step commonlyaccepted as crucial in the biotransformation of thesecompounds by the nitroreductases. Interestingly, theQSARs for the reduction of nitrobenzenes to hy-droxylamines by xanthine oxidase65 did not includeterms for the hydrophobic (and steric) effects. Thus,it could be that the log P term mainly reflects theinteraction of the nitroarenes and its metaboliteswith lipophilic components of the cell and/or interac-tion with other enzymes.

The steric effects were also subjected to extendedinvestigations. Generally speaking, a major differencebetween the TA98 and TA100 strains of S. typhimu-rium was that steric effects were more effective forTA98. It also appeared that the SOS E. coli systemresembles the TA100 and not the TA98 system.74 Asdiscussed in ref 43, this difference can be related to

log SOSIP ) 0.080((0.003)Z1CH3,H+ +0.084((0.005)Z2CH3,H+ +0.056((0.006)Z3CH3,H+ +0.045((0.007)Z4CH3,H+ +

0.020((0.006)Z5CH3,H+ + 3.572((0.030)

(30)n ) 44, r ) 0.981, s ) 0.202, F ) 195,

P ) 0.0001, press s ) 0.413

activity (logarithmic CASE units, LCU) )-12.8 + n1B1 - n2B2 + 4.539 log P (31)

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1779

Page 14: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

the influence that the size of mutagens have in theinduction of frame shift mutations (specific of TA98,and not of TA100).

Regarding the steric effects, interesting resultswere found in the context of investigations aimed atreducing the mutagenicity of nitroaromatics.80,81 De-spite the large range of structures used to developeqs 24 and 25, essentially variations of the aromaticring system were covered. Substituent effects werepoorly represented in the dataset, and the authorsstated that “it is important to investigate the hydro-phobic effect of aliphatic side chains on the mutage-nicity to see if it parallels that of the flat aromaticsystems.”72 In a first paper, Klein et al.80 studied theeffect of bulky alkyl substitutents ortho to the nitrogroup and in the 2′-position on the mutagenic activityof 4-nitrobiphenyl in TA98 and TA100 strains. Theexperimental mutagenicities were compared withthose predicted by eqs 24 and 25 and results showedthat the nitrocompounds most sterically hinderedwere remarkably less mutagenic than predicted. Thisreduction in mutagenicity was attributed to purelysteric effects. In the case of substituents ortho to thenitro group, the reduction was correlated with devia-tions from the coplanar orientation of the nitro grouprelative to the aromatic plane. In the case of alkylgroups in 2′-position, the coplanarity of the nitrogroup was not affected, but the twisting of the twoaromatic rings was associated with a less effectivecharge delocalization of the nitrenium ion. In asecond paper,81 the authors studied the influence ofbulky alkyl substituents far from the nitro group (4′-position in nitrobiphenyls, 7-position in 2-nitrofluo-renes). The mutagenicity of all compounds decreasedwith increasing steric demand of the attached alkylgroups, to an extent greater than predicted based ontheir contributions to log P or LUMO (see eqs 24 and15). The authors hypothesized that changes of themolecular shape from planar to nonplanar may beresponsible for this effect by interfering with anefficient intercalation into DNA.

The evidence from the two latter papers haspractical importance since it provides a recipe forsynthesizing nontoxic nitro compounds. In addition,its theoretical importance should be remarked aswell. The discrepancy between actual mutagenicitypotency and that predicted by the QSAR modelspoints to the intrinsic difficulty of providing of anexhaustive definition of a chemical series (congeners).Even in cases in which the database consists of largenumbers of well-selected compounds (between 100and 200 chemicals for eqs 24 and 25), specificstructural changes (e.g., the bulky alkyl substituentsconsidered in refs 80 and 81) may radically changethe underlying chemistry and require substantialmodifications of the QSAR models. Whereas severalpapers point to statistics as the most important checkof the validity of a QSAR models, the above evidencestrongly reaffirms the predominance of the chemicalknowledge in the process of generating valid models.

2.3. N-Nitroso CompoundsA very large number of N-nitrosamines are known

to be carcinogenic in various animal species. Human

exposure to preformed N-nitrosamines occurs throughthe diet, in certain occupational settings (e.g., in therubber industry), and through the use of tobaccoproducts, cosmetics, pharmaceuticals, and agricul-tural chemicals. In addition, N-nitrosamines aregenerated in the body by nitrosation of amines (viaacid- or bacterial-catalyzed reaction with nitrite) orby reaction with products of nitric oxide generatedduring inflammation or infection. N-Nitrosamines areactivated by cytochrome P-450 to the ultimate mu-tagens and carcinogens.82

2.3.1. QSARs for the Mutagenicity of N-Nitrosamines

Singer et al.83 studied the mutagenicity of a set ofN-nitroso-N-benzylmethylamines in S. typhimuriumTA1535. The chemicals were metabolically activatedwith the S9 fraction. The physicochemical parametersscreened were the Hammett σ constant, Hansch’s π,MR, and two molecular connectivity indices. Theresulting QSAR model was

where C is the molar concentration of nitrosamineinducing 50 revertants/plate.

The authors did not find any involvement ofhydrophobicity (π) in the mutagenicity. The quadraticterm of σ takes into account the fact that mutage-nicity in this dataset first increases with increasingvalues of σ and then decreases. According to theauthors, eq 32 indicates that substituents that areelectron withdrawing and therefore tend to polarizeand weaken the methylene C-H bond result ingreater mutagenicity. The rupture of the C-H bondis considered to be a rate-limiting step in the meta-bolic process, which initiates with the hydroxylationat the R-carbon and subsequent conversion to morereactive species. The presence of the connectivityterm 3øPv in eq 32 indicates that activity is dependenton overall shape and volume.

Hansch and Leo11 reconsidered the dataset studiedby Singer et al.83 and replaced the experimental σvalues with ordinary σ values from the benzenesystem. The new equation was

According to the authors, eq 33 is better than eq32. One reason is that eq 33 has a slightly better fitthan eq 32 and contains only two terms. Moreimportantly, one normally finds hydrophobicity to bea parameter of importance in mutagenicity, whendirect alkylation of DNA is not involved. Moreover,evidence is accumulating that the coefficient ofhydrophobicity is near 1.0. All this information pointsto eq 33 as the best rationalization of the data.Overall, eq 33 says that hydrophobic, electron-releas-

log(1/C) ) 3.55((1.40)σ - 3.88((1.85)σ2 +1.62((0.71)3øPv - 5.11

n ) 13, r2 ) 0.761, p ) 0.004(32)

log(1/C) ) 0.92((0.43)π + 2.08((0.88)σ -3.26((0.44)

n ) 12, r2 ) 0.794. s ) 0.314(33)

1780 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 15: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

ing substituents on the ring of N-nitroso-N-benzyl-methylamines increase mutagenicity.

2.3.2. QSARs for the Carcinogenicity of N-NitrosoCompounds

The SIMCA pattern-recognition method was usedby Dunn III and Wold84 to classify 45 N-nitrosocompounds as active or nonactive, with respect to ratcarcinogenicity. The set included nitrosoamines, N-nitrosoureas, and N-nitrosourethans. The structuraldescription of the substituents consisted of six pa-rameters: the Rekker’s lipohilicity constant, σ* andEs Taft’s parameters, MR, L and B4 Verloop’s stericconstants.

Initially, all the active compounds were examinedtogether as the training set and the inactive anduntested compounds as the test set. No model wasachieved. Then, the active chemicals were dividedinto subclasses and used as the training set.

The compounds requiring metabolic activationwere split in two classes:

Class 1: dialkylnitrosoamines with electronicallyneutral and/or electron-donating substituents on theamine nitrogen (27 compounds);

Class 2: compounds that putatively do not requiremetabolic activation (9 compounds);

Class 3: dialkylnitrosoamines which have at leastone rather strongly electron-withdrawing substituenton the amine nitrogen (14 compounds).

The SIMCA analysis procedure was applied sepa-rately to the three classes of chemicals, giving thefollowing results:

Class 1: 23 out 27 chemicals were correctly classi-fied with a two-components model;

Class 2: 7 out 9 with a one-component model;Class 3: all 14 substances were correctly classified

with a three-component model.A total of 44 out 50 (88%) of the active compounds

were correctly classified, with all 12 variables rel-evant for the description of the classes.

The test set consisted of three compounds (twononactive, one active); two out three were correctlypredicted with one false positive.

2.3.3. Carcinogenicity of N-Nitrosamines

Frecer and Miertus85 performed a theoretical in-vestigation of the putatively crucial steps in theactivation process, namely, the initial enzymatic CR-oxidation (CR is the carbon atom adjacent to theaminic nitrogen, Na), the amine nitrogen hydroxyla-tion or the N-dealkylation. Also the transport proper-ties of the studied compounds and their metaboliteswere considered.

The study of the molecular reactivity and transportproperties of the parent compounds was modeled by

where D50 is the mean lethal dose (mol/Kg), used asmeasure of the experimental carcinogenic potency in

rat; ∆Gcoulw,o is the difference in Coulombic contribu-

tions to the Gibbs solvation energies in water andoctanol (kcal/mol); ∆Gd,r

w,o is the difference in disper-sion-repulsion contributions; and ∆Gcav

w,o is the dif-ference in cavitation contributions.

From eq 34, it appears that the initial transport ofthe parent molecules to the site of biodegradationinfluences significantly the carcinogenic activity. Thenegative value of the term ∆Gcoul

w,o indicates that themetabolic activation could take place preferentiallyin a nonpolar lipophilic phase, i.e., in the liver.Therefore, higher metabolic activation, and then astronger carcinogenicity, is expected for more lipo-philic molecules.

Regarding the metabolic activation process, theauthors tried to describe the probable initial steps:the radical CR-hydroxylation and the amine Na-hydroxylation. They calculated the theoretical reac-tivity indices of atoms and chemical bonds involvedin these processes (net atomic charges, orbital ener-gies, orbital electron densities, etc.). The best modelfor this step was

where QCR is the net charge on the CR atom of parentmolecule.

Equation 35 supports the hypothesis that theinitial biotransformation reactions occur at the CRatom and not at the aminic nitrogen: the mostprobable metabolic activation mechanism starts atthe CR atom or the CR-HR bond.

A model including both reactivity and transportcharacteristics was also formulated:

In a next step, the authors found that the charge onthe CR-radicals was more correlated with the carci-nogenic potency than that of the parent molecules;thus a final equation was formulated:

where QCR refers to the first metabolite and not tothe parent compound as in eq 35. Overall, this studyindicates: (a) the role of the transport of the parentcompound is quite dominant; (b) the combination oftransport and reactivity indices does not improve thecorrelation; (c) for the most probable first metabolites(CR-radicals of NA) the correlation of the biologicalactivity with QCR is the most significant; (d) thecombination of the transport properties of the parentmolecule with the QCR of the first metabolite gives ameaningful QSAR equation for the carcinogenicproperties of the nitrosamines.

2.4. QuinolinesQuinoline and its derivatives are environmental

pollutants found in automobile exhausts, urban air

log 1/D50 ) 0.476∆Gcoulw,o - 0.228∆Gd,r

w,o +

0.155∆Gcavw,o + 3.979

n ) 12, r ) 0.868, F ) 6.140(34)

log 1/D50 ) 21.568QCR - 1.825

n ) 12, r ) 0.669, F ) 8.122(35)

log 1/D50 ) 0.456∆Gcoulw,o + 12.576QCR + 1.001

n ) 12, r ) 0.813, F ) 6.825(36)

log 1/D50 ) 0.188∆Gcoulw,o + 16.285QCR + 1.162

r ) 0.903, F ) 12.618(37)

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1781

Page 16: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

particulate, and tobacco smoke, and are used for thepreparation of industrial chemicals and pharmaceu-tical drugs. Some are also food mutagens. A numberof quinolines are recognized mutagens and carcino-gens. In this paper Debnath and co-workers86 studiedthe mutagenesity of 33 quinoline derivatives in S.typhimurium TA100 strain, with metabolic activation(S9 fraction).

A considerable amount of study was devoted to theconsideration of electronic descriptors, includingLUMO and HOMO energies, as well as electroincdensity on various ring atoms. During this prelimi-nary work, it was discovered that quinolines substi-tuted in 2-, 3- and 4-positions did not fit the corre-lation equations and hence were omitted from thefinal formulation.

The resulting model was

where q2 is the net charge on carbon atom 2, i.e.,adjacent to nitrogen atom. The mutagenic activitywas experimentally measured and expressed as rateof mutation in revertants/nmol; log P values wereexperimentally determined.

As in other studies relative to the mutagenicity ofcompounds requiring S9 metabolic activation, log Pis of overwhelming importance and has a coefficientclose to 1.0.18

Using q5 or q7 in eq 38 results in much poorercorrelation, but using q4 or qN yields almost as goodresults as q2. The authors elected position 2 as beingthe most likely site of activation to form the ultimatemutagen and bind with DNA not just because of eq38 but also because a similar suggestion derived fromprevious mechanistic studies.

Another approach, based on parameters of thephysical organic chemistry, gave rise to the followingequation:

where R is the resonance parameter in differentpositions, whose negative coefficient means that anelectron release to the ring increases the activity (thiseffect is more important for 6-substituents than for8-substituents). The addition of a term in F (fieldinductive parameter) did not improve the equation.

The QSARs found provide a considerable insightto this structure-activity problem, even though theyare not able to rationalize the substitutions in allpositions. This difficulty is likely to derive from theexistence of several different patterns of metabolicactivation.

In a subsequent study, two different types ofactivity for the same set of quinolines were simulta-neously considered. Fifteen 8-substituted quinolineswere tested for both mutagenic and cytotoxic activityin Ames test, TA100 strain (with S9 activation).87

The mutagenicity of the chemicals was modeled by

where: B5-8 is the intermolecular steric Verloopparameter (STERIMOL). Log P was experimentallymeasured by the shake-flask method; two substanceswere omitted.

Similarly to eq 38, log P is present with a coefficientclose to 1.0. The negative coefficient of the Sterimolparameter denotes that bulky substituents inhibitthe mutagenic process either in the metabolic activa-tion step (i.e., interaction with P-450) or in thereaction of the final metabolite with DNA.

The toxicity against the bacterial cells was de-scribed by:

where Tox is the toxic activity to the TA100 bacterialcells, i.e., the inhibition of revertant growth observedas a decrease from the initial slope value; σ is theHammett parameter.

Thus, it appears that it is possible to develop QSARof two distinctly different biological activities for thesame set of congeners. For the mutagenicity of thesequinolines, both hydrophobicity and steric interac-tions appear to be important. In contrast, the cyto-toxicity is mainly affected by increasing hydropho-bicity and by addition of withdrawing substitutentsto the quinoline ring. The latter effect may be relatedto the protonation of the basic nitrogen in thequinoline ring.

2.5. TriazenesTriazenes are chemotherapeutical drugs that, as

many other compounds of the same category, alsoinduce DNA damage and mutations themselves.Shusterman and co-workers88 examined a set con-stituted of 21 phenyl- and heteroaromatic triazenes,active for mutagenicity in S. typhimurium strainTA92 containing the S9 fraction.

Two QSAR models were developed:

where C stands for the molar concentration of chemi-cal that causes 30 mutations above background/108

TA92 bacteria, and q HOMO is the HOMO electrondensity on the alkylated N1.

The two equations suggest that the mutagenicityof these compounds is ruled by the hydrophobicity

log TA100 ) 1.14((0.40) log P -45.76((27.83)q2 - 5.39((1.70)

n ) 21, r ) 0.852, s ) 0.565, F1,18 ) 11.9(38)

log TA100 ) 0.99((0.44) log P -1.48((1.19)R8 - 2.68((2.32)R6 - 3.19((0.98)

n ) 21, r ) 0.842, s ) 0.599, F2,17 ) 20.8(39)

log TA100 ) 1.16((0.35) log P -0.51((0.26)B5-8 - 1.56((0.98)

n ) 13, r2 ) 0.870, q2 ) 0.784, s ) 0.402(40)

log Tox ) 1.16((0.42) log P + 1.02((0.94)σ -5.71((0.94)

n ) 13, r2 ) 0.801, q2 ) 0.685, s ) 0.371(41)

log 1/C ) 0.95((0.25) log P +2.22((0.88)HOMO + 22.69

n ) 21, r ) 0.919, s ) 0.631(42)

log 1/C ) 0.97((0.24) log P -7.76((2.73)qHOMO + 5.96

n ) 21, r ) 0.931, s ) 0.585(43)

1782 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 17: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

and electronic properties of the molecules. The posi-tive coefficient of log P shows that the mutagenicactivity increases with increasing lipophilicity. Thepositive coefficient of HOMO and the negative coef-ficient of qHOMO indicate that an increase of mutage-nicity is correlated with increased electron donationfrom the ring to the triazene moiety and, thereby,with an increased ease of triazene oxidation.

The generally accepted scheme for triazene activa-tion involves an initial hydroxylation of the N-methylgroup by cytochrome P-450, followed by spontaneoustautomerization and hydrolysis reactions, leading tothe formation of a reactive carbocation. Even thoughthe reactive carbocation actually reacts with DNA,eqs 42 and 43 do not reveal any correlation betweenthe structure/stability of the carbocation and thetriazene mutagenicity. Instead, hydrophobicity andelectron-donating substituents (coded for by theequations) are the same factors involved in thecytochrome P-450 catalyzed hydroxylation of thetriazenes. Thus the QSAR models suggest that therate-limiting step in the mutagenic activity is deter-mined by the rate of initial triazene activation andnot by the rate of DNA alkylation by the reactivecarbocation.

2.6. Polycyclic Aromatic Hydrocarbons

PAHs constitute a large class of ubiquitous envi-ronmental pollutants formed from the incompletecombustion of fossil fuels, tobacco products, food, andvirtually any organic matter. PAHs are of highpriority concern for human health because of signifi-cant human exposure and because both individualsPAHs and mixtures are associated with human andexperimental carcinogenicity.89 The mechanisms ofaction of the PAHs have been studied extensively.The most consistent and salient feature of thecarcinogenic PAHs is the presence of a bay or fjordregion in the molecule. The bay or fjord region diolepoxides are the ultimate carcinogens of these PAHsand are able to form stable DNA adducts. In additiondepending on the type of PAH, the target organ, andthe level of expression of the activation enzymes, atleast two other metabolic pathways are believed toplay important roles: (1) one-electron oxidation atthe most electrophilic carbon to form radical cation;and (2) conversion of dihydrodiols of PAH to reactiveand redox active o-quinones. PAHs are also knownto have a variety of nongenotoxic activities that canplay important contributory roles in the overallcarcinogenic process.6

The PAHs have been the subject of several QSARanalyses; whereas a number were clearly mechanis-tically based (notably, ref 21), most of the papers wereaimed at providing classification tools, with littlemechanistic insight.

2.6.1. QSARs for the Carcinogenicity of PAHs

The theories of K region (e.g., 9,10-bond of phenan-threne) activation and bay (or L-) region (e.g., regionbetween the 4- and 5-positions of phenanthrene)activation as responsible for carcinogenicity are atthe basis of the work by Zhang and co-workers.21

The QSAR regarded the very specific endpoint ofskin carcinogenicity in mice for a set of 161 com-pounds. The criteria for the selection of the chemicalswas that compounds containing an unencumberedbay region as well as carcinogenic activity whichcould be expressed as Iball index (see below) shouldbe considered as a class of congeners (and thenincluded in this analysis). As a matter of fact, theauthors list also a set of 30 compounds that have beentested for skin carcinogenicity but that do not containa bay region or have an encumbered bay region (i.e.,with one or more substituents placed in the positionspresumed to be oxidized to the diol epoxide). As asupport to the bay region theory of PAH carcinoge-nicity and to criteria used for the selection of chemi-cals the authors note that (a) none of the 14 com-pounds with encumbered bay region show activity;and (b) of the 16 compounds without bay region, only5 have been found to be carcinogenic.

Regarding the selection of chemicals for the analy-sis, the authors also remark that a large range ofbasic structures (including chemicals with hetero-atoms in the ring or in the substituents) was neces-sary to break the collinearity among descriptors(notably, log P and HOMO), which is a commonplague for QSARs of PAHs.

The QSAR equation obtained was

The Iball index of carcinogenicity is defined asfollows:

where tumor incidence ) number of animal withtumors/number of animals alive when the first tumorappears. LK is an indicator variable set 1 for com-pounds with a substituent attached to a L or Kregion. A positive coefficient of LK means that asubstitution in a L or K region leads to increasepotency of these congeners, other factors being equal.The activity largely depends on hydrophobicity, withLK and HOMO being of relatively minor importance.

In eq 44, it is seen that carcinogenicity initiallyincreases with the increase of log P, and then afterreaching an optimum value of 6.8, declines linearly.Carcinogenicity begins to become apparent at log Pof 4, peaks at 6.8, and disappears at about 9. Sincemany simple organic compounds with log P valueslower than 2 penetrate skin readily, the high log Prequired for skin carcinogenicity cannot be chargedto penetration. The authors argue that high log P isnecessary to induce enough P-450 in the skin cells.In addition, tumor promoting activity is known todepend on hydrophobicity (in phorbol esters). Theseresults support the conclusion that the threshold

log Iball ) 0.55((0.09) log P -1.17((0.14) log(â 10logP + 1) + 0.39((0.11)LK +

0.47((0.26)HOMO + 1.93((2.4)(44)

n ) 161, r ) 0.845, s ) 0.350,log P0 ) 6.67((0.217), log â ) -6.81,

F1,155 ) 12.8

Iball index ) (tumor incidence)(100%)/mean latent period in days

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1783

Page 18: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

value for log P is to be associated more with theactivation and promotion process than with skinpenetration. As further support, the authors alsoshow that the hydrophobicity distribution of chemi-cals causing cancer via diet, water, gauge inhalation,or injection is totally different and peaks at muchlower log P values.

Regarding the other parameters in eq 44, unpro-tected (without substituents) K- and L regions resultin lower than expected potency. The authors arguethat substituents at these points act by inhibitingdeactivating metabolism. It has been assumed formany years that L regions, because of their well-known high chemical reactivity, are readily metabo-lized by P-450 to hydrophilic compounds resulting incarcinogenic activity reduction or loss. Equation 44indicates that K regions behave so much like Lregions in terms of overall activity.

The presence of HOMO in eq 44 is rationalized interms of the first oxidation step in P-450 activationyielding an epoxide.

Other authors reported on QSARs for the carcino-genic activity of PAHs, characterized by minor em-phasis on mechanistic considerations and major focuson the derivation of classification rules. A selectionwill be summarized below; if interested, the readeralso can see refs 90-92.

Richard and Woo89 used the CASE model to de-velop QSAR for the carcinogenicity of a large groupof PAHs. The aim was to (a) develop a classificationrule; (b) for a set of untested PAHs, compare theCASE predictions with those generated by human“expert judgment” based on mechanistic consider-ations; (c) evaluate the CASE methodology.

The CASE program is presented in more detail inSection 4.1. It is an automated, statistically basedcorrelative program that starts from chemical struc-tures and explores all possible chemical substruc-tures to find those correlated with the activity understudy.93,94 In this case, activity/nonactivity carcino-genicity data were used and thus the qualitativeCASE mode was employed.

The chemicals were divided into the followinggroups:

LEARN set: 78 compounds (31 active, 25 inactive,22 marginal). These PAHs had known carcinogenicactivity and were used to train the CASE program;

TEST set: 106 compounds with unknown activity,previously considered by the human experts;

VALIDATE set: 24 compounds with known activ-ity, used to check the prediction ability of the CASEmodel developed on the LEARN set.

In the LEARN set, the procedure identified eightactivating fragments and four inactivating ones. Themost significant fragment identified by CASE rep-resents an unsubstituted bay region benzo ring, withunoccupied peri position at C5. This is consistent withthe current bay region theory of PAH carcinogenicity.The remaining activating fragments are basicallymodifications or extensions of the bay region oraccount for chemicals with no apparent bay region.These fragments classified correctly the activities of94% of PAHs in the LEARN set.

The above CASE model was applied to the predic-tion of the chemicals in the TEST set. The agreementbetween CASE predictions and the predictions of thehuman experts was only 64%. This was attributedto limitations of the LEARN database in terms ofinadequate representation of 2- and 3-ring PAHs.When these subclasses were not considered, theconcordance improved to 90%.

The VALIDATE set contained 24 compounds withmore than three rings, and their activity was pre-dicted with a total prediction accuracy of 75%. Itshould be noted that only 3 out 24 of the VALIDATEstructures were represented within the LEARN set;therefore, almost total extrapolation was necessary.

Yuan and Jurs95 modeled the carcinogenicity of aset of 191 PAHs with the computer program ADAPT.The range of screened molecular descriptors was verylarge (n ) 56) and included topological descriptors(e.g., fragments, sigma electronic distribution ofsubstructures, connectivity), geometric descriptors,and whole molecule descriptors, like log P. Pattern-recognition methods were used to model the data.

A set of 28 descriptors was found to be able toseparate the 191 PAHs into carcinogens and noncar-cinogens. In a cross-validation test (leaving out 10chemicals for 10 times), the predictive ability was90.5%.

The results of the study neither confirmed nordisproved any particular theory of PAH carcinogenic-ity. Among the final 28 descriptors were substructuredescriptors containing K regions, as well as bayregions. While several geometric descriptors wereamong the final set, the dataset could not be sepa-rated based on this information alone. Finally, whilelog P alone was capable of separating a comparativelylarge amount of the dataset, there were severaldescriptors that were more powerful.

2.6.2. QSARs for the Genotoxicity of PAHsRecently, the above laboratory generated a QSAR

model for the genotoxicity of 277 PAHs in the SOSChromotest.96

On the basis of the extent of genotoxic response,the PAHs were divided into the classes of genotoxic(n ) 80) and nongenotoxic (n ) 197). Approximately240 descriptors were generated, using the ADAPTroutine: topological, geometric, electronic, and hybriddescriptors. Three mathematical classification meth-ods were used, together with cross-validation proce-dures.

The k-nearest neighbor method, based on eightdescriptors, provided the best separation, with acorrect training set classification of 93-5%. A consen-sus model that incorporated the three mathematicalclassifiers correctly classified 81.2% of the compoundsand provided a higher prediction rate on the geno-toxic class than any other single model.

2.7. Halogenated AliphaticsThe toxic and aneuploidizing activities in Aspergil-

lus nidulans of 55 halogenated aliphatic hydrocar-bons were studied by Crebelli et al.97 Aneuploidy isa kind of genetic mutation that involves changes inthe correct number of chromosomes.

1784 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 19: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

Halogenated aliphatic hydrocarbons are widelyused as industrial and household solvents, as inter-mediates in chemical synthesis, and for a variety ofother uses. Since several halogenated aliphatics arerecognized to be genotoxic and/or carcinogenic, hu-man exposure, which occurs both at workplaces andin outdoor and indoor environments, may pose asignificant health hazard. They may act by severalmechanisms of action. Short-chain monohalogenated(excluding fluorine) alkanes and alkenes are potentialdirect-acting alkylating agents. Dihalogenated al-kanes are also potential alkylating or cross-linkingagents (either directly or after GFH conjugation).Fully halogenated haloalkanes tend to act by freeradical or nongenotoxic mechanisms or undergoreductive dehalogenation to yield haloalkenes thatin turn could be activated to epoxides.9

A first analysis of 35 chlorinated C1-C8 alkanesand alkenes led to the development of QSARs for theinduction of aneuploidy and toxicity in Aspergillusnidulans; the QSARs correctly predicted the activityof further six halogenated methanes.98 More chemi-cals were experimentally tested, and new QSARmodels were developed.97 These were very similar tothe previous ones. For each compound, the followingbiological measures were performed:

(1) the dose able to block mitotic growth (ARR);(2) the dose with 37% of survival (D37);(3) the lowest efficient concentration in aneuploidy

induction (LEC).The chemical descriptors considered were log P,

MR, HOMO, LUMO, their difference HOMO -LUMO (DIF).

The QSAR models for the extended set of chemicalswere

In all the above equations, MR was far more impor-tant than the electronic parameters.

Equation 48 regards only 24 chemicals, whichresulted to be positive for induction of aneuploidy. AQSAR model for discriminating between aneuploidiz-ing (mutagenic) and inactive chemicals was formu-lated. The model was based on LUMO, MR, and Iz(inertia momentum along the z-axis) and had 86%accuracy.

The lack of log P in the model for aneuploidy seemsto exclude that this mutational event is driven byaspecific disturbance at the cell membranes. At thesame time, the available experimental evidence al-lows ruling out the induction of concurrent DNAdamage, as expected for strong electrophilic com-pounds capable of a direct interaction with nucleo-philic centers on DNA. The authors suggest theinvolvement of reductive metabolism and the genera-

tion of free radicals that would damage criticaltargets of the mitotic apparatus.

2.8. Direct-Acting Compounds

2.8.1. Mutagenicity of Platinum AminesA set of 13 substituted (o-phenylenediamine)-

platinum dichlorides, tested for their mutagenicityin S. typhimurium TA92 strain, was studied byHansch et al.99 These are chemotherapic agents, andare direct-acting mutagens, since they are active inS. typhimurium without activation by microsomes.

Mutagenicity with TA92 strain was modeled by

where C is the molar concentration of mutagenproducing 30 mutations above background.

The term σ- electronic term indicates that electronwithdrawal through resonance is an important de-terminant of the mutagenic activity, probably be-cause the electron-withdrawal weakens the nitrogen-platinum bond. The author remarked the absence ofa hydrophobic term in eq 49 and related this to thefact that these compounds do not need metabolicactivation to attack the DNA.

2.8.2. Mutagenicity of Lactones

Lactones are reactive chemicals widely used asintermediates in chemical synthesis, solvents, con-stituents of paint removers, and antibacterial agents.They are mutagenic in test systems without meta-bolic activation.

The mutagenicity of lactones on S. typhimuriumTA100 strain was modeled by LaLonde et al.:100

Mn is the mutagenic response of each compoundin TA100; n does not represent 58 different com-pounds but only 10 compounds, each tested a numberof times. LUMO was calculated with the MNDO-PM3program. Electrophilicity (LUMO) was found to becorrelated with the mutagenic activity; no hydropho-bic term appears in the model.

The mutagenicity data for another set of lactones(24 halogenated furanones) was studied by Tup-purainen.101 3-Chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone (MX, Mutagen-X) and its analoguesare strong direct-acting bacterial mutagens. They canbe formed from the chlorine bleaching of wood pulpand the chlorination of humic waters. Whereasunsubstituted lactones are generally weakly mu-tagenic, MX and its analogues are special cases withhalogens and double bonds imparting electrophilicity.MX is one of the most potent mutagens ever testedin S. typhimurium TA100 strain. Specific features ofthese mutagens are that they exhibit strong mu-tagenic activity in only one tester strain (TA100), andthere is no evidence of formation of stable adducts

log(1/D37) ) 0.24 + 0.08MR - 5.13 DIF

n ) 55, r ) 0.80, F ratio ) 45.45(45)

log(1/ARR) ) 0.16 + 0.09MR - 5.28DIF

n ) 55, r ) 0.86, F ratio ) 75.68(46)

log(1/LEC) ) 0.83 + 0.07MR - 4.91LUMO -3.41DIF

n ) 24, r ) 0.94, F ratio ) 69.07(47)

log(1/C) ) 2.23((0.32) σ- + 5.78((0.18)

n ) 13, r2 ) 0.956, s ) 0.260(48)

log Mn ) -6.50LUMO - 6.24

n ) 58, r2 ) 0.925(49)

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1785

Page 20: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

with DNA. The QSAR model obtained was:

It appears that an electrophilic π electron system(low LUMO energy values) promotes mutagenicpotency, as in eq 49. An hydrophobic term did notimprove the QSAR model. The lack of the hydropho-bic term is rather characteristic of direct-actingmutagens (that do not require metabolic activation).11

On the basis of the QSAR found and on a range ofevidence, the author conjectures that the halogenatedfuranones can undergo one-electron transfer duringnonbonding interaction with DNA. As a matter offact, the GC sites (predominant target for TA100mutagens) on DNA provide the most stable centersfor a positive hole in DNA. This hypothesis alsoagrees with the lack of evidence for significant DNAadducts.

2.8.3. Mutagenicity of Epoxides

Aliphatic epoxides have found widespread use asvaluable industrial and laboratory alkylating agents,which has led to much interest in their potentialgenotoxicity.

The mutagenicity of propylene oxides with S.typhimurium has been studied by Hooberman atal.102 They found the following QSAR:

TA100-LSA is a measure of mutagenicity in TA100strain using the liquid suspension assay (withoutmetabolic activation), expressed as revertants/nmol.NICOT is the rate constant for the reaction withnicotinamide, which is used as a model for thestereoelectronic effect of the chemical in the DNAreaction (alkylation of the model nucleophile nicoti-namide). Experimentally determined hydrophobicvalues did not improve the correlation, neither didother calculated parameters [Taft σ* values, MR,STERIMOL parameters (L,B1-B4), molecular vol-umes (υW)].

Another study on a very small dataset (n ) 6) of3- and 4-substituted styrene oxides tested on E. coligave a similar QSAR (based on the electronic reactiv-ity parameter σ, and with no hydrophobic term):103

QSAR models of mutagenicity of 7 p-substitutedstyrene oxides were developed by Tamura et al.104

The biological activity was measured by C, molarconcentration required to induce a mutation fre-quency of 10-6 on S. typhimurium TA100 strain. Inaddition, both the mutation frequency at the dose of1 mM [MF(mM)] and the mutation frequency at the

lethal dose LD50 [MF(LD50)] were considered. The toxicactivity was considered as well, as measured byLD50.

The QSAR for toxic activity was

For mutagenicity (excluding p-nitrostyrene oxidebecause of its particular metabolic pathway):

where VW(X) and VW(H) are the van der Waals volumesof X-subsituted and unsubstituted styrene oxides,respectively.

It appears that the lethal effects (LD50) dependonly on hydrophobicity and not on chemical reactiv-ity. This is a general effect for aspecific toxicity.11

Mutagenicity (eqs 54-56) is correlated with epoxidereactivity (σ) and steric effects. In eqs 54 and 55 alsoa hydrophobic term appears.

3. Nongenotoxic or Epigenetic CarcinogensEpigenetic carcinogens are agents that act through

a secondary mechanism that does not involve directDNA damage. They are inactive in the classicalgenotoxicity tests. In reality, the demarcation isseldom absolute: it would probably be more accurateto define carcinogens as predominantly genotoxic orepigenetic. In general, genotoxic carcinogens tend toinduce tumors in more than one species (multispe-cies) or more than one target organ (multitarget),whereas epigenetic carcinogens tend to induce tu-mors in a single target organ/tissue/cell or a narrowlyrelated set of target organs/tissue/cells. Whereas theaction of genotoxic carcinogens has an underlyingunity in the fact that they attack directly, or aftermetabolic activation, the genetic macromolecules, theepigenetic carcinogens act through a myriad of mech-anisms; thus each class has to be described individu-ally.

In the past several years, there has been anexplosive growth of the scientific literature on epi-genetic/nongenotoxic mechanisms of chemical car-cinogenesis. A very detailed recent review on theargument is ref 6. Here I will summarize briefly themain points of this presentation, mainly emphasizing

ln TA100 ) -12.7((1.1)LUMO -12.0((1.3)

n ) 24, r ) 0.930, s ) 1.33, F ) 141.0(50)

log TA100-LSA ) 1.28((0.37)NICOT -0.56((0.21)

n ) 14, r2 ) 0.823, s ) 0.302(51)

log rev/nmol ) -1.93((0.57)σ+ + 1.40((0.12)

n ) 6, r2 ) 0.956, s ) 0.101(52)

log(1/LD50) ) 1.863((0.390)π + 2.421

n ) 8, r ) 0.979, s ) 0.164, r2 ) 0.958(53)

log 1/C) -1.010((1.441)σ +8.763((9.034) log[Vw(X)/Vw(H)] -

0.505((2.102)π + 3.077

n ) 7, r ) 0.970, s ) 0.238, r2 ) 0.940(54)

log MF(mM) ) -1.201((1.555)σ +10.432((9.747) log[Vw(X)/Vw(H)] -

0.704((2.267)π - 5.875

n ) 7, r ) 0.973, s ) 0.256, r2 ) 0.946(55)

log MF(LD50) ) -1.718((0.840)σ +12.943((5.252) log[Vw(X)/Vw(H)] - 3.711((1.227) -

5.012

n ) 7, r ) 0.985, s ) 0.139, r2 ) 0.970(56)

1786 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 21: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

the few examples in which quantitative modeling ofSAR has been performed.

One important class of epigenetic carcinogens isperoxisome proliferators. Peroxisomes are single-membrane organelles found in mammalian cells andcells of some other organisms. A major role ofperoxisomes in the liver is modulation of lipid ho-meostasis, including the metabolism of long-chainfatty acids and conversion of cholesterol to bile salts.Out of over 100 peroxisome proliferators identifiedto date, about 30 chemicals have been adequatelytested and shown to be carcinogenic (primarilyinducing tumors in the liver). They belong to differentchemical classes, including: (a) phenoxy acid deriva-tives; (b) alkylcarboxylic acids and precursors; and(c) phthalate esters. One of the characteristic featuresof many, but not all, of these peroxisome proliferatorsis the presence of an acidic function. This acidicfunction is usually a carboxyl group either presentin the parent structure or formed after metabolism.QSAR studies have pointed out to similarities in thethree-dimensional structures for a number of peroxi-some proliferators and (for a series of phthalateesters or clofibrate analogues) to a correlation be-tween induction of peroxisomal enzyme activities andelectronic structural parameters.105,106

Another nongenotoxic mechanism of carcinogenesisregards Ah receptor-mediated and other enzymeinducers. The aryl hydrocarbon receptor (AhR) medi-ates the broad spectrum of toxicity of the extremelypotent environmental toxin, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). Activation of AhR leads to inductionof Phase 1 (e.g., CYP1A1, CYP1B2, CYP1B1) andPhase 2 (e.g., glutathione S-transferase) drug me-tabolizing enzymes, modulates the expression ofmany genes, and causes hepatotoxic response, im-munotoxicity, developmental and reproductive toxic-ity, disruption of endocrine pathways, chloracne,tumor promotion, and carcinogenesis. From thechemical structural point of view, there are threemajor categories of ligands: (a) polyhalogenatedhydrocarbons, (b) PAHs, and (c) indolecarbazole typephytochemicals. With some possible mechanisticoverlap with AhR, the induction of certain cyto-chrome P-450 isozyme has long been regarded as apredictor of potential rodent hepatocarcinogenicitybecause of its cell proliferative activity and its abilityto activate some classes of chemicals. The abovenotion underlies the development of a computerizedsystem (computer optimized molecular parametricanalysis for chemical toxicity or COMPACT) thatmodels the ability of chemicals to become substratesfor the P-450 enzymes family or to interact with theAh receptor; the model is based on steric (planarity)and electronic parameters. The COMPACT assess-ment has been proposed as predictor of carcinogenic-ity and used in the framework of comparative exer-cises on the prediction of rodent carcinogenicity (seefollowing sections).107,108

The above mechanistic classes of nongenotoxiccarcinogens have been the subject of a limitednumber of QSAR analyses. It should be remarkedthat the literature contains more experimental datapotentially suitable for further modeling. The same

applies to the following classes, for which onlyqualitative structure-activity considerations havebeen presented.

One class of nongenotoxic carcinogens acts viainhibition of Gap junctional intercellular communica-tion (GJIC). GJIC is a form of “local communication”between adjacent cells via a membrane-bound proteinstructure (the gap junction) which facilitates directcytoplasm-to-cytoplasm exchange of small molecules(<900 Da) such as ions, growth factors, and hormonalsecond messenger. It is an important means tocontrol cell proliferation and differentiation to ensurehomeostasis. A wide spectrum of tumor promotersand carcinogens have been shown to inhibit GJIC.The inhibition appears to be reversible. Amongothers, this class includes perfluorinated fatty acids,dialkyl phthalates, and PAHs.

Another class is composed of agents that causeoxidative stress. Oxidative stress arises when theproduction of reactive oxygen species (ROS) or intra-cellular free radicals overrides the antioxidant capac-ity of the target cells. Free radicals and ROS canreadily react with most biomolecules to initiate achain reaction of free radical formation which can beterminated only after reaction with another freeradical or free radical scavengers such as antioxi-dants. ROS can react with all the macromolecularmachinery of a cell, particularly lipids, protein, andDNA. The major structural classes of chemical car-cinogens that can induce oxidative stress are (a)peroxides; (b) polyhaloalkanes such as carbon tetra-chloride; (c) quinones; (d) PAHs; (e) phenolics; (f)homocyclic and heterocyclic aromatic amines; (h)nitroaromatics; (i) hydrazines; (j) metals such as As,Cr, Cu, Fe; (k) mineral fibers such as asbestos; and(l) miscellaneous agents such as potassium bromate,nitroalkanes.

Alteration of the extent of methylation at the5-position of cytosine in DNA is an important epige-netic mechanism that can regulate gene activity,which in turn can contribute to carcinogenesis. Dif-ferential methylation of DNA is a determinant ofhigher order chromatin structure, and the methylgroup provides a chemical signal recognized bytransacting factors that regulate transcription andexpression of genes. Under normal conditions, theextent of DNA methylation in each target/organ/cellis maintained at a level consistent with its status ofcell replication or its cell differentiation-programmedneed for gene expression.

Some nongenotoxic carcinogens cause hormonalimbalance. The endocrine system is under delicatehomeostatic control involving complex feedback regu-latory networks: the central controlling unit is thehypothalamus-pituitary axis. Because of the nega-tive feedback relationships, stoppage of secretion bya target gland or elimination of the secreted hormoneleads to overproduction of the respective pituitarytrophic hormone. If this is maintained for a long time,tumor will develop in either the overactive pituitarygland or the overstimulated target gland. The specificmechanisms for this are various and involve a largenumber of chemical classes (including many pesti-cides).

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1787

Page 22: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

Compensatory regenerative cell proliferation, inresponse to cell necrosis caused by cytotoxic agents,is a common nongenotoxic mechanism associatedwith many carcinogens. Unlike receptor-mediatedmechanisms, cytotoxicity-induced cell proliferationonly occurs at high doses that overwhelm the cellulardefense mechanisms. A particular mechanistic classprovokes R2µ-globulin nephropathy, through an ex-cessive accumulation of R2µ-globulin-containing Hya-line droplets in renal proximal tubules ensues. De-spite the chemical structural diversity of the renalcarcinogens which act via the R2µ-nephropathy mech-anism, molecular modeling showed that all activechemicals fit deeply into a hydrophobic pocket of theprotein. Beyond the dimensional requirement, thereis also a requirement for a degree of lipophilicity anda need for an electronegative atom in the moleculeor its active metabolite.109

As remarked above, the nongenotoxic carcinogensact through a myriad of different mechanisms, andthe rationalization of this area (in terms of SARs) isstill poor. However, a number of very general conclu-sions on our present ability to assess their risk withQSAR approaches can be attempted.

In a study on the prediction of the carcinogenicityof pharmaceutical drugs via human expert consid-erations (i.e., largely based on the knowledge of theSAs),110 up to 84% of the chemicals that werecarcinogenic in the four rodent groups were correctlypredicted, against 30% correct prediction for thechemicals carcinogenic only in one rodent group.Since the knowledge of the SAs of genotoxicity ismore advanced than that for nongenotoxic carcino-gens, this may indicate that the chemicals with awider carcinogenic activity spectrum (hence probablymore harmful) are more likely to act via genotoxicmechanisms than those acting through epigeneticmechanisms. At the same time, we are better equippedto recognize the carcinogens with a wider spectrumof potential targets.

The above optimistic evidence goes hand in handwith the fact that the epigenetic carcinogens areprobably a minority in the general database of usedchemicals111 (except perhaps among the pharmaceu-tical drugs) and that some studies seem to indicatethat the carcinogenic potency of epigenetic carcino-gens is remarkably lower than that of genotoxiccarcinogens.112 If this is correct, the risk for thehuman health deriving from the difficulties in iden-tifying the epigenetic carcinogens is less dramaticthan that posed by lack of recognition of genotoxiccarcinogens.

However, a word of caution should be spent hereto counterbalance the above considerations. It shouldbe remembered that the hazard posed by a carcino-gen depends both on its intrinsic carcinogenic potencyand on the extent of human exposure. Even veryweak carcinogens, like benzene, can have disastrouseffects if the population exposed is very large.113

Thus, even if the nongenotoxic carcinogens pose alower intrinsic threat to human health than thegenotoxic carcinogens, they are still to be consideredas a primary area of health concern.

4. QSAR Models For Noncongeneric ChemicalsThe application of QSAR modeling to individual

classes of chemicals, acting through distinct mecha-nisms of action, constitutes the area where theclassical QSAR approaches give their best contribu-tion, both in terms of understanding the biologicalactivity (here, mutagenicity and carcinogenicity) andof predictive ability. The restriction of traditionalQSAR approaches to congeneric chemicals representsboth the key to its success and an inherent limitationto more general applications.11 This is particularlyevident in toxicology, where diverse chemical struc-tures are studied for reasons that rarely take intoconsideration QSAR modeling concerns. In addition,the database of experimental results is not populatedof enough representative chemicals to provide asufficiently representative basis for modeling thecarcinogenicity or mutagenicity of each chemical classof interest; this is also because the chemicals ofpractical interest change with the time since newcompounds (pharmaceuticals, dyes, etc.) are producedand marketed every year. The main consequence isof practical order and regards the use of QSAR forrisk assessment of chemicals devoid of experimentaldata: in the historical carcinogenicity and mutage-nicity database, it is often impossible to retrievechemicals congeneric to a query one, to derive QSARmodels for the predictions. Despite the challenges,there is strong motivation to develop QSAR modelsfor toxicity predictions for use in screening, for settingtesting priorities, and for reducing reliance on animaltesting. This need has motivated attempts to con-struct “general” QSAR models, not tailored to con-generic classes of chemicals. A scientific aspect of thisis the interest in exploring nonapparent associationscrossing traditional chemical class boundaries. Mostof the general QSAR modeling efforts, to date, havefocused on the challenge of predicting rodent carci-nogenicity, with efforts to predict mutagenicity viewedas a component of this challenge.

Issues and difficulties related to the concept itselfof “general” QSAR models have been addressed in anumber of publications.114-117 Since, typically, mul-tiple mechanisms of action can lead to the sametoxicity endpoint, this entails distinguishing islandsof activity from within a sea of inactive chemicals,with each island representing a local chemical bio-logical mechanism domain. In practice, a “general”QSAR model has to model, simultaneously, thevarious different mechanisms of action of the varioustypes of chemicals present in the studied set; thistask is obviously much more demanding than de-scribing the variation of potency or the differencebetween active and inactive chemicals relative to ahomogeneous class of chemicals, based on the varia-tion of a limited number of chemical features. Thus,the concept of a general model is seemingly at oddswith the inherently local nature of chemical-classbased QSAR approaches; as a matter of fact, all“general” models consist of multiple local models,whose definition may or may not be readily appar-ent.117

General models for the prediction of mutagenicityand carcinogenicity have ranged from purely heuris-

1788 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 23: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

tic human expert judgment, to those that requireprior hypothesis and human intervention in choosingrelevant parameters, to automated rule-based ap-proaches derived from human expert knowledge, toautomated statistical, machine learning and pattern-recognition approaches that independently derivealgorithms for prediction from existing data. Manyof these approaches are driven by the recognition ofSAs in the molecules. SAs are chemical substructuresor functional groups that have been mechanisticallyand/or statistically associated with the induction ofmutations or cancer. At a more sophisticated level,the prediction methods contain information on bothknown SAs and modulating factors. Some approachesfor the prediction of the long-term carcinogenicitybioassay also include the knowledge on short-termtest results, like the S. typhimurium, as basis for theprediction. A generalized useful distinction is be-tween (a) statistically based approaches, which at-tempt to analyze objectively noncongeneric datasetsto extract empirical models (in analogy with theclassical QSAR approach); and (b) knowledge-basedapproaches, which attempt to bring diverse types ofa priori information, in the form of generalized rules.Some of these approaches have been implementedinto commercial software programs. Detailed reviewsare refs 118-121.

Here examples of the various categories of systemswill be briefly presented, whereas applications andresults will be given more in detail.

4.1. Statistically Based Approaches forNoncongeneric Datasets

Statistically based methods attempt to uncovergeneral associations useful for hypothesis generationand prediction. These methods use, for example,statistical, multivariate, rule-induction, artificial in-telligence, cluster analysis, or pattern-recognitiontechniques to analyze diverse, noncongeneric toxicitydatabases in unbiased ways, i.e., with limited or noprior chemical or biological classification accordingto mechanism. Examples include the commercialprograms CASE, MULTICASE, TOPKAT, andADAPT, as well as other methods that have not beenimplemented into commercial softwares.

CASE and MULTICASE are well-known represen-tatives of a very characterized QSAR approach, whichis distinguished from other approaches by its centralreliance on computer-generated substructural frag-ments, which are its major type of molecular descrip-tors, and the completely automated and unbiasedmanner in which these descriptors are generated andchosen for inclusion in the QSAR model. In this theCASE technology relies on previous research, whosefirst examples can be found in refs 122 and 123.Along the same line of thought is the more recentPROGOL program.

In CASE, each of the molecular structures belong-ing to the training database is decomposed by theprogram into all possible constituent fragments oflength 2-10 contiguous heavy atoms, with attachedhydrogens and one possible side chain. The statisticalanalysis of the set of fragments generated by thedecomposition of all molecules in the training set

involves examination of the distribution of eachunique fragment among active and inactive moleculesand identification of fragments whose distributiondeviates from an ideal symmetrical binomial distri-bution: each of the fragments significantly deviatingfrom the reference distribution is labeled either abiophore (activating fragment) or a biophobe (inac-tivating fragment). Biophores and biophobes are theprimary molecular descriptors of the CASE QSARmodel: this may be expressed either as a (Bayesian)activity/inactivity probability or as a linear regressionrelating a potency to the substructural descriptors.

PROGOL, as CASE, is open-ended: it examines thechemical structures on its own and does not dependon predefined lists of chemical substructures. How-ever, the implementations are different. PROGOLlooks at atoms, their bond connectivities, and higherorder molecular structures and forms fully relationaldescriptions of chemical structures: these descrip-tions are manipulated through the inductive logicprogramming (ILP) algorithm. PROGOL finds rulesin form of relational patterns.

MULTICASE (MULTIple Computer AutomatedStructure Evaluation) is a development of the CASEprogram, and it is interesting since it evolved fromthe recognition of problems found in the course ofCASE analyses. In particular, MULTICASE respondsto the problem of distinguishing between fragmentsthat provoke the activity and fragments that modu-late the activity. In more general terms, it attemptsto face the presence of hierarchy and nonlinearitywithin SAR models as applied to noncongeneric setsof chemicals. As CASE, MULTICASE starts by creat-ing its own dictionary of descriptors directly from thedatabase. At this point, and in contrast to CASE,MULTICASE selects the statistically most importantof these fragments as a biophore, believed to beresponsible for the observed activity of those mol-ecules that contain it, and separates out all themolecules containing this biophore from the remain-ing database. This process is repeated on the remain-ing database with the next most significant biophore,and so on, until the database is segmented into majorbiophore-containing chemical classes. CASE analysisis then applied to each biophore class separately todetermine substructural modifiers to the biophoreactivity.

References to CASE and MULTICASE are 94 and124-131. An implementation of MULTICASE, tai-lored specifically on the issue of predicting thecarcinogenicity of pharmaceutical drugs, is in ref 132.Independent software implementations of the CASEtechnology were developed by Parodi and colla-borators.133-135 References to PROGOL are 136-139.

Another type of approach are methods that lookfor statistical associations starting from predefinedlists of molecular descriptors. A commercial imple-mentation is exemplified by Toxicity Prediction byKomputer Assisted Technology (TOPKAT). It is anautomated computer system consisting of differentmodules for the prediction of a variety of acute andchronic toxic endpoints (rodent carcinogenicity andS. typhimurium mutagenicity, but also developmen-tal toxicity, skin and eye irritation, acute oral toxicity

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1789

Page 24: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

LD50, etc.). Each model has been derived from aspecific database.

Whereas earlier versions of the TOPKAT equationsdepended, to a substantial extent, on the presenceor absence of substructural fragments, the recentversions use only continuous-valued descriptors.These include topological shape and symmetry indi-ces, and electrotopological state (E-state) on one- andtwo-atom fragments. The latter indices permit thetransformation of information relative to the presenceor absence of fragments (selected from a predefinedlist of about 3000 substructures) into quasi-continu-ous-valued structural descriptors. Unlike fragmentsubstructures, the latter take into approximate ac-count the steric/electronic environment of the frag-ment and conform more easily to the various statis-tical analyses. TOPKAT performs the assessmentrelative to the query chemical in four steps. In thefirst step, TOPKAT determines if the chemical is“covered”, on a univariate basis, by the structuralfragments in the training set. This step identifiesfragments in the query chemical not contained in thetraining set compounds. In most cases, the presenceof such a fragment is sufficient to cause the toxicityassessment to be rejected as unreliable. The secondstep checks whether the chemical fits within theoptimum prediction space (OPS) of the estimatingequation. This enables the user to determine whetherthe test structure is contained in the descriptor spaceof the model: a chemical that is out of the OPS willhave little confidence associated with its toxicityprediction. The third step assesses the toxicity of thechemical. In the fourth step, TOPKAT allows the userto perform a further independent test on the reli-ability of the assessment through a similarity searchin the database. The training database is scannedfor molecules “similar” to the query molecule, toindependently assess the possible chemical or bio-logical significance of model associations. In practice,the user can check both the actual toxicity of “similar”chemicals and the accuracy of the TOPKAT predic-tion generated for the similar chemicals to assessmodel reliability in that portion of the chemicalbiological activity space.

Selected references for the TOPKAT methodologyand its applications are refs 140-143.

Another example of statistically based approachesfor noncongeneric datasets is provided by AutomatedData Analysis and Pattern Recognition Toolkit(ADAPT). Each compound is represented by calcu-lated molecular structure descriptors encoding thetopological, electronic, geometrical, or polar surfacearea aspects of the structure. Topological descriptorsuse only the connection table of a molecule andtherefore do not require accurate three-dimensional(3-D) optimized structures. These descriptors encodesimple counts of atom types, bond types, connectivityindices, and interatomic distances: they correlatewith molecular size, shape, and degree of branching.Geometric descriptors, which encode information onthe overall size and shape of a molecule and require3-D geometries, including length-to-breadth ratios,two-dimensional (2-D) shadow projection areas, andsolvent accessible surface areas. Examples of elec-

tronic descriptors are atomic partial charges, dipolemoment, electron-core repulsion energies, andcharged partial surface areas. Hundreds of descrip-tors can be calculated for each compound. To controlredundancy, very small subsets of descriptors, usu-ally 3-10 descriptors per model, are identified in astepwise manner. Finally, the QSAR model for thebiological activity of interest is found by applyingseparately different pattern-recognition methods (inrecent papers, they include k-nearest neighbor, lineardiscriminant analysis, probabilistic neural networks,and support vector machines) and selecting thepreferred model(s). Selected references for ADAPTare refs 114, 144, and 145.

Within the framework of an analysis of bothcongeneric and noncongeneric sets of mutagens,extensive use of topological (topochemical and topo-structural), geometrical, and quantum chemical de-scriptors was made in Basak’s laboratory.146 Theabove paper also contains an interesting theoreticaldiscussion on chemical parameters.

As a last example of a statistically based approachfor noncongeneric chemicals, I will quote an applica-tion in which substructural fragments were analyzedwith back-propagation neural networks. The initialset of data comprised 1280 fragments, relative to adatabase of 607 chemicals tested for S. typhimuriummutagenicity. The paper also shows how the analysisof the hidden layer permitted understanding of howthe neural network performed its classification.147

4.2. Knowledge-Based Approaches forNoncongeneric Datasets

The other large area of systems used for modelingnoncongeneric datasets includes expert system ap-proaches that attempt to codify existing knowledge,derived from human expert judgment, bioassay data,or any of the previous modeling approaches, intogeneralized rules for use in prediction. Examplesinclude the DEREK system, applicable to predictionof multiple toxicity endpoints, and the OncoLogicsystem, restricted to chemical carcinogenicity. Eachof these expert systems serves as a repository forexisting knowledge, and each rule conveys an explicithypothesis at the local SAR level that can be refinedand modified as further information becomes avail-able. Hence, expert systems do not discover newassociations but rather can be considered the end-stage of the model development process.

OncoLogic was developed for the express purposeof capturing, and making available for outside use,expertise in the field of structure-based mechanismsof chemical carcinogenesis routinely being applied inregulatory setting by the U.S. Environmental Protec-tion Agency. Mechanism-based SAR analysis hasbeen effectively used by the EPA for many years toassess the potential carcinogenic hazard of newchemicals, for which there are no or scanty data,under the Premarketing/Premanufacturing Notifica-tion program of the Toxic Substance Control Act.Essentially, mechanism-based SAR analysis involvescomparison of an untested chemical with structurallyrelated compounds for which carcinogenic activity isknown. The entire process is described with great

1790 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 25: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

clarity in ref 9: “All available knowledge and datarelevant to evaluation of carcinogenic potential of theuntested chemical are considered. These include a)SAR knowledge base of the related chemicals; b)toxicokinetics and toxicodynamics parameters (in-cluding physicochemical properties, route of potentialexposure, and mode of activation or detoxification)that affect the delivery of biologically active inter-mediates to target tissue(s) for interaction withcellular macromolecules or receptors; c) supportivenoncancer screening or predictive data known tocorrelate to carcinogenic activity.”

OncoLogic is a computer program consisting of fourindependent subsystems for estimating the carcino-genicity of fibers, metals or metal-containing com-pounds, polymers, and organics. Each subsystem hasa hierarchical, decision-tree construction consistingof “IF-THEN-ELSE” rules that attempt to mimic thereasoning of the human experts. This reasoning goesbeyond the recognition of specific SAs to considergeneral reactivity properties of the chemical class,structural modulators to activity, metabolic activa-tion, and mechanisms of chemical carcinogenesis. Anenhancement allows for consideration of functional,noncancer toxicity data (e.g., genotoxicity, oncogeneactivation, P-450 induction, etc.) in the overall deci-sion tree to improve the chemical carcinogenicityevaluation capabilities. The organics subsystem inOncoLogic is by far the largest and most welldeveloped of the four subsystems, with separate anddistinct modules for nearly 50 chemical classes(examples include acrylates, aldehydes, and aromaticamines), although these modules vary considerablyin coverage and information content.

OncoLogic differs from other prediction systems inthat a query molecule is not entered at the start ofthe analysis. Rather, a carcinogenicity evaluation ofan organic chemical begins with user assignment ofthe chemical to one of the predefined chemical classesand proceeds through selection of structural tem-plates or user-drawn entry of structures within theconstraints of the chosen class. Finally, the programproduces, as its primary output, a detailed justifica-tion report in which the discreet program rules areconverted into a dialogue that intelligibly conveys themechanism-based expert reasoning underlying thesemiquantitative evaluation. OncoLogic rules are allbased on qualitative associations with chemical struc-ture, i.e., the program has no capabilities for comput-ing physical chemical properties. References to On-coLogic are 148 and 149.

Another rule-based expert system is DeductiveEstimation of Risk from Existing Knowledge(DEREK), which is the result of a nonprofit col-laboration among the University of Leeds, and vari-ous other educational and commercial institutions,who contribute to the review and evolution of thetoxicity rule bases. Also confidential in-house infor-mation from industries is used. In DEREK, rules (ofthe type “IF-THEN-ELSE”) associate particular chemi-cal functional groups, or SAs, with various forms oftoxicity. The rules are not chemical-specific; ratherthey are generalizations with respect to chemicalstructure (e.g., halogen-containing, acid, or alkylating

agent). The resulting generalized structural featuresused in prediction are termed toxophores.

The toxicological end points currently covered bythe DEREK system include carcinogenicity, mutage-nicity, skin sensitization, irritancy, teratogenicity,and neurotoxicity. Each toxicity endpoint consults adifferent rule base and a set of toxophores. Tointerrogate the system, the query structure is en-tered, and the rulebase is searched by comparingstructural features in the target compound with thetoxophores described in its knowledge base. Any SAlocated within the query structure is highlighted, anda message indicating the nature of the toxicologicalhazard is provided. References to DEREK are 150and 151.

4.3. The Predictive Ability of the QSARs forNoncongeneric Chemicals

The classical QSARs for congeneric classes ofchemicals have a wide scope. In fact, they provideinformation on the action mechanisms, by pointingto the crucial properties/features of the chemicals.They point to outliers, thus helping to identifypossible subsets of chemicals that either have differ-ent mechanisms of action or need more carefulexperimental testing. Finally, they permit predictionof the activity of chemicals, of the same class, whichare devoid of experimental data. On the other hand,the “general” approaches for noncongeneric chemicalsare able to provide little or no mechanistic informa-tion, and their value mainly resides in their abilityto generate correct predictions of activity for untestedchemicals. Thus, the assessment of the predictiveability is crucial to their use for risk assessmentpurposes. Usually, the authors report some accuracymeasure of the system performance, which variesaccording to the type of method. A popular statisticalapproach is to split the available database into atraining and a test set of chemicals: the model isbuilt using only the training set, and then thepredictive ability is checked by applying the modelto the test set. The procedure is applied several times(internal validation, cross-validation).15,152 A morestringent criterion is external validation: the predic-tions are performed by applying the model to com-pounds whose experimental results did not exist orwere unknown to the investigators when the modelwas generated.

Several external validation exercises relative toQSARs for noncongeneric chemicals are available inthe literature, and all together they contribute toassess the real value of the different approaches, and,most importantly, to point to the crucial issues in thisfield. Some of the external validation studies consid-ered one (or a few) system(s). A number of studieswere prediction exercises, in which a range of systemswere challenged to predict the toxicity of a commonset of chemicals, thus permitting one to compare thesystems directly to each other. Particularly importantwere three comparative exercises held under theaegis of the U.S. National Toxicology Program (NTP).These were prospective prediction exercises: theexercises invited the modeling community to submitpredictions on the rodent carcinogenicity (two stud-

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1791

Page 26: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

ies) or S. typhimurium mutagenicity (one study) ofchemicals that were in the process of being assayedby the NTP. It should be remarked that the experi-mental results were not known at the time of theprediction, thus contributing to the unbiased char-acter of the assessment. I will present a summary ofthe various validation studies, and, because of theirimportance, I will expand more on the NTP studies.

In the prediction exercises that will be presentedin this paper, the systems generated dichotomouspredictions of the toxicity of the chemicals in the formyes/no, positive/negative. A comprehensive descrip-tion of the predictive ability of a model that producessuch dichotomous results is obtained through thecalculation of several performance indices, namely:(a) accuracy, or concordance (percentage of chemicalscorrectly predicted); (b) sensitivity (percentage ofpositives correctly predicted); (c) specificity (percent-age of negatives correctly predicted); (d) positivepredictivity (percentage of chemicals correctly pre-dicted as positive, out of all the chemicals predictedas positive); (e) negative predictivity (percentage ofchemicals correctly predicted as negative, out of allthe chemicals predicted as negative). Here, positiveand negative indicate the chemicals that resulted as,respectively, toxic or nontoxic in the appropriateexperiments. The above indices are very useful for adetailed analysis of a system; however, they do notpermit an easy and direct comparison of a range ofsystems. For this reason, I will use mainly thereceiver operating characteristic (ROC) space graphs,which have the advantage of comparing simulta-neously the different aspects of the performance ofseveral systems. For example, the accuracy indexalone does not distinguish between positive andnegative predictions and is influenced by the perfor-mance on the most numerous class, whereas the axesof the ROC graphs display independently the infor-mation relative to the prediction of positive andnegative chemicals. In a ROC plot, true positive rate(or sensitivity) is plotted against the false positiverate (1 - specificity). According to the ROC curvetheory, the diagonal line represents random re-sponses, whereas the top left corner is obviously theideal performance. Thus, the most finely tunedsystems are those in the left upper triangle, as closeas possible to the corner.153

4.3.1. Mutagenicity Prediction Models: External ValidationStudies

The results of several external validation studiesare summarized in the ROC graph of Figure 1. Allthe studies refer to applications to bacterial mutage-nicity in the S. typhimurium assay.

The predictive ability of the expert system DEREKand of the statistically based MULTICASE andTOPKAT systems were compared in ref 154. Theyselected two databases of chemicals tested for S.typhimurium mutagenicity; the data were retrievedfrom open literature and U.S. government toxicitydatabases. A first database included 123 pharma-ceutical drugs (codes DEREK1, MCASE1 and TOP-KAT1 in Figure 1); the second one consisted of 516nondrug chemicals (codes DEREK2, MCASE2, andTOPKAT2).

In another study, TOPKAT and DEREK werestudied in ref 155. The results of over 400 mutage-nicity tests conducted at Glaxo Wellcome during aperiod of 15 years were compared with the mutage-nicity predictions of both computer programs. DEREKpredicted the mutagenicity of 409 compounds (codeDEREKc); TOPKAT predicted the mutagenicty of 303compounds (code TOPKATc), since many of the GlaxoWellcome chemicals were either out of the optimumprediction space of TOPKAT, or the prediction wasin the indeterminate region.

A number of other validation studies were reportedin ref 156. In a study, 169 new proprietary Novartispharmaceutical candidate compounds were predictedwith DEREK (code DEREKn) and MULTICASE(code MCASEn). A second study regarded 44 simplearomatic amines, whose bacterial mutagenicity waspredicted with DEREK (code DEREKa) and MUL-TICASE (code MCASEa). A third study regarded theprediction of 27 mutagenicity tests, performed inBayer Toxicology, with DEREK (code DEREKb).Another study regarded the application of DEREK(code DEREKi) to the prediction of 54 mutagenicitystudies retrieved from the IUCLID database.

The evaluation of 83 chemicals from the openliterature with TOPKAT (code TOPKATm) was per-formed in ref 157.

DEREK (code DEREKh) was evaluated for theprediction of the S. typhimurium mutagenicity of 44chemicals in ref 158.

Finally, the ability to predict the Salmonella mu-tagenicity of 394 marketed pharmaceuticals throughDEREK, TOPKAT and MCASE (codes: DEREKs,TOPKATs, MCASEs) was assessed in ref 159.

4.3.2. Carcinogenicity Prediction Models: ExternalValidation Studies

The results of this series of external validationstudies are reported in Figure 2.

The rodent carcinogenicity of a set of 142 pharma-ceutical drugs was predicted with different systemsin ref 154. The systems were (a) DEREK (code

Figure 1. The ROC graph displays the performance of anumber of prediction approaches for S. typhimuriummutagenicity, as resulted from different validation exer-cises on external sets of chemicals. The details of thedifferent studies and the codes of the systems are in thetext.

1792 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 27: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

DEREKp); (b) MULTICASE in the version specifi-cally adapted by FDA to the prediction of drugcarcinogenicity (codes: MCASE1, if a drug wasconsidered carcinogenic when one rodent experimen-tal group was positive; MCASE2, if a drug wasconsidered carcinogenic when at least two rodentexperimental groups were positive); (c) TOPKAT(codes: TOPKAT1, if a drug was considered carci-nogenic when one rodent experimental group waspositive; TOPKAT2, if a drug was considered carci-nogenic when at least two rodent experimentalgroups were positive).

TOPKAT was tested also in ref 160 on a set of 117chemicals bioassayed by NTP (code: TOPKATp).

Three external evaluation studies regarded thepredictivity of DEREK. In the first study (code:DEREKb), 30 chronic rat studies performed by BayerToxicology were evaluated.156 In another study,DEREK (code: DEREKi) predicted the rodent carci-nogenicity of 61 compounds whose data were ex-tracted from the IUCLID database.156 A third studyregarded 29 compounds from the IPCS database(code: DEREKh).158

Finally, a set of 273 pharmaceutical drugs wasexamined with the human expert approach in ref110: they inspected the chemical formulas andreasoned in terms of chemical analogy with knowncarcinogens and noncarcinogens (code: human ex-pert).

4.3.3. Considerations on the External Validation ExercisesThe inspection of Figures 1 and 2 provides useful

insights into the issue of how much the variousproposed “general” prediction systems can predict themutagenicity and carcinogenicity of chemicals. Itappears that the number of studies available are insufficient number to draw some conclusions. More-over, the studies were essentially concentrated on theuse of the three most popular commercial systems,namely, MULTICASE, DEREK, and TOPKAT.

Recalling that in a ROC graph the diagonal linerepresents random results and that the ideal perfor-

mance is the top left corner, it also appears that, asa general trend, almost all results are in the “good”side of the diagonal line. Moreover, the cloud ofmutagenicity validation studies (Figure 1) is distrib-uted more closely to the ideal performance than isthe cloud of carcinogenicity validation studies (Figure2); this indicates that, as a trend, the prediction ofmutagenic activity was more successful than was theprediction of carcinogenic activity.

The deceiving evidence is that the predictionperformance of the systems is highly dependent onthe set of chemicals used for the exercise. This canbe easily recognized by following, in each figure, theperformance of the individual systems. For example,in Figure 1 each system spans virtually the entirespace in the left corner of the ROC graph, rangingfrom quite limited to quite good performance. Thishigh variability is a seriously negative result sinceit indicates that the user can have very little confi-dence in the degree of reliability of a prediction.

Some of the prediction exercises in Figures 1 and2 compared directly two or three systems, using thesame set of chemicals to be predicted. For example,in ref 154, TOPKAT, MULTICASE and DEREK werechallenged to predict the mutagenicity of two sepa-rate sets of chemicals (Figure 1: TOPKAT1, MCASE1,and DEREK1; TOPKAT2, MCASE2, and DEREK2)and the carcinogenicity of one set of chemicals (Figure2: DEREKp, MCASE1, TOPKAT1, MCASE2, TOP-KAT2). The other example is ref 155, where TOPKATand DEREK were compared (Figure 1: TOPKATc,DEREKc). In the study in ref 154, it appears thatMULTICASE performed constantly better than theother systems; however, the interest of this evidenceis weakened by the general result of the highlyvariable character of the performance of all systems.

4.3.4. NTP Prospective Prediction Exercises

The US NTP sponsored comparative exercises onthe prediction of (a) S. typhimurium (Ames test)mutagenicity (one exercise); and (b) rodent carcino-genicity (two exercises). The chemicals selected forthese exercises were those assayed by NTP. The NTPexercises were characterized by the fact that (a) allthe chemicals were tested with the same standard-ized protocol, thus contributing to the quality of thebiological data; (b) the experimental results wereunknown when the predictions were formulated bythe various investigators. For example, several ro-dent bioassay results were obtained several yearsafter the publication of the predictions. This aspect(prospective prediction) contributed strongly to theunbiased character and to the special interest of theevidence produced by the NTP exercises.

4.3.4.1. NTP Prediction Exercise on S. ty-phimurium Mutagenicity. The predictivity for S.typhimurium mutagenesis of two computer-basedsystems (TOPKAT and CASE), one physicochemicalscreening test (Ke), and one human expert (noncom-puter-based) system were compared using 100 chemi-cals.161 The chemicals were tested by the NTP, andtheir results had not yet been published at the timeof the predictions. The results of the exercise aresummarized in Table 1 and Figure 3.

Figure 2. The ROC graph displays the performance of anumber of prediction approaches for the rodent bioassay(carcinogenesis), as resulted from different validationexercises on external sets of chemicals. The details of thedifferent studies and the codes of the systems are in thetext.

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1793

Page 28: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

TOPKAT and CASE were described above in thispaper. In this exercise, CASE was applied in twoversions: (1) CASE was trained on a compilation of820 chemicals tested for mutagenicity by the NTP(code: CASE/n); (2) CASE was trained on 808 mu-tagenicity results evaluated by the US-EPA Gen-ToxProgram (code: CASE/e). There was partial overlap-ping between the two databases. The physiochemicalmeasurement of Ke (the electron rate attachmentconstant) describes the potential electrophilicity of achemical and was performed by Bakale and Mc-Creary as described in ref 162 (code: Ke). The codeKeb indicates a set of predictions in which borderlineor indeterminate results were considered as negative.The human expert assessment was performed byAshby, by identifying the chemical SAs as describedin refs 8 and 163. Ashby produced two sets ofpredictions: (1) chemicals were checked against thelist of SAs (code: SA); (2) in a secondary expertjudgment, the identification of SAs was correctedwith mechanistic inferences about the probability thepotential electrophilicity of the SAs was actuallyexpressed in S. typhimurium (code: SA+).

The inspection of Table 1 and Figure 3 indicatesthat the human expert approach and the two com-puter-based systems produced equivalent results(71-76% concordance with S. typhimurium results),whereas the physicochemical system (Ke) produceda lower concordance (60-61%).

4.3.4.2. The First NTP Prediction Exercise onRodent Carcinogenicity. For this exercise, 44compounds of different chemical classes were se-lected.164 The predictions presented were based ondifferent approaches: QSAR, human experts, andexperimental (see Table 2 and Figure 4).

The computerized approaches TOPKAT,141 MUL-TICASE,165 and DEREK150 have been presentedabove. Other QSAR approaches were COMPACT andBenigni’s approach. The computer-optimized molec-ular parametric analysis of chemical toxicity (COM-PACT) system tried to evaluate the ability of chemi-cals either to be a substrate for the cytochrome P-450I family metabolic enzymes or to be able to interactwith the Ah receptor, based on the analysis of spatialand electronic properties.107 Benigni presented pre-dictions based on the combination of two types ofinformation: (a) the Bakale’s electrophilicity index(Ke) of the chemicals, theoretically estimated; (b) thepresence of SAs in the molecules.166 The experimentalsystems were Ke167 and S. typhimurium mutagenic-ity.168 Sets of predictions were generated by humanexperts as well. The human experts Tennant andAshby AAR combined the information on SAs withother types of information: general and target organtoxicity in rodents, Salmonella mutagenicity, and

Table 1. Concordance (Accuracy) of the QSARPredictions and S. typhimurium MutagenicityResultsa

system concordance

SA 0.72SA+ 0.76Ke 0.60Keb 0.61TOPKAT 0.74CASE/n 0.76CASE/e 0.71

a In a prospective prediction exercise held under the aegisof the NTP (see details in the text).

Figure 3. The ROC graph shows the performance ofdifferent prediction approaches for S. typhimurium mu-tagenicity, based on a prospective comparative exerciseheld under the aegis of NTP. The details of the study andthe codes of the systems are in the text.

Table 2. Concordance (Accuracy) of the QSARPredictions and Rodent Bioassay Resultsa

system concordance

Tennant/Ashby 0.75RASH 0.68Weisburger 0.65Ke 0.65DEREK 0.59TOPKAT 0.57Benigni 0.57S. typhimurium 0.57DEREKh 0.56Lijinsky 0.55COMPACT 0.53CASE/MCASE 0.49

a In the first prospective prediction exercise held under theaegis of the NTP (see details in the text).

Figure 4. The ROC graph shows the performance ofdifferent prediction approaches for the rodent bioassay(carcinogenesis), based on the first prospective comparativeexercise held under the aegis of NTP. The details of thestudy and the codes of the systems are in the text.

1794 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 29: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

previous carcinogenicity data, when available in theliterature.164 The RASH (RApid Screening of Hazard)approach consisted of a modification of the Tennantapproach through alternative means for ranking toxicpotencies.169 Weisburger168 and Lijinsky168 providedtwo sets of predictions based on expert judgment,using as their initial information the chemical struc-ture of the NTP chemicals.

Results and analyses are in refs 22 and 170-172.It appeared that most of the prediction systems wereconcordant in the identification of the powerfulcarcinogens, whereas several noncarcinogens werepredicted to be positive by different systems. Thisexcessive sensitivity was attributed to the presenceof alerting substructures in the noncarcinogens whichwere erroneously predicted. The various approachesessentially acted as gross “class-identifiers”: theypointed to the presence or absence of alerting chemi-cal functionalities but were not able to make grada-tions within each potentially harmful class.22

The overall accuracy of the predictions is reportedin Table 2. For approaches that relied solely oninformation derived from the chemical structure, theoverall accuracy in terms of positive or negativepredictions was in the range 50-65%, whereas theTennant and Ashby approach attained 75% accuracy.

4.3.4.3. The Second NTP Prediction Exerciseon Rodent Carcinogenicity. A second comparativeexercise on the prediction of rodent carcinogenicitywas devised by the NTP, regarding 30 chemicals inthe progress of being bioassayed.173 The participatingsystems were very different. Several were humanexperts. The OncoLogic team at the U.S. EPA usedthe computerized expert system OncoLogic, in con-junction with human expertise; in fact OncoLogic (seedescription above) does not have rules for all chemicalclasses.174 Huff et al. considered the presence of SAsand analogy with known carcinogens and noncar-cinogens, previous bioassays when available, sub-chronic toxicity and relative toxicity, and reported orpredicted metabolites.175 Benigni et al. relied mostlyon chemical analogy reasoning.176 The expert judg-ment of Tennant and Spalding177 and of Ashby178

used largely biological evidence relative to the chemi-cals, together with SAs. Purdy179 employed a combi-nation of human expert rules and structure-activityinformation, as did the COMPACT/HAZARDEX-PERT approach.108 Human experts predictions wereformulated also by Bootman180 and the RASH ap-proach.181 A QSAR approach was FALS, involving thegeneration of eight submodels method, the universeof chemicals (from the point of view of carcinogenicitymodeling) being divided into eight chemical classes.182

Two computerized models based their predictions onorgan-specific toxicity of the chemicals (R1, R2).183

Predictions were also elaborated with the artificialintelligence systems MULTICASE184 and PROGOL136

and with the rule-based expert system DEREK.185

Finally, experimental data were produced with twoshort-term screening tests: (1) transformation assayin Syrian hamster embryo [SHE] cells;186 and (2) S.typhimurium mutagenicity assay.173

Results and analyses are in ref 187; they aresummarized in Table 3 and Figure 5. As in the first

comparative exercise, several noncarcinogens werepredicted as carcinogens: these chemicals either hadSAs or a clear analogy with known carcinogens. Thus,it appears that the prediction approaches were oftenunable to make gradations between potential andactual carcinogenicity. Table 3 and Figure 5 showthat the human expert-based predictions performedbest overall, especially those methods that incorpo-rated the most information. In addition, the SHEassay, an experimental system specifically designedto incorporate key elements of the transformationprocess that a cell can undergo in becoming malig-nant, was among the best performing methods. Thehighest overall accuracy in this second exercise wasin the range 65-70%.

4.3.5. The Predictive Toxicology Challenge 2000−2001This exercise on the prediction of rodent carcino-

genicity was characterized by the fact that all meth-ods applied machine learning (or artificial intelli-gence) methods. In addition, a same training set ofmolecules was provided to the participants: thesewere about 500 compounds from the NTP bioassay

Table 3. Concordance (Accuracy) of the QSARPredictions and Rodent Bioassay Resultsa

system concordance

OncoLogic 0.65SHE 0.65R1 0.64Huff et al. 0.62R2 0.61Benigni et al. 0.61Tennant et al. 0.60Ashby 0.57Bootman 0.53FALS 0.50RASH 0.45COMPACT 0.43DEREK 0.43S. typhimurium 0.33Purdy 0.32Progol 0.29MCASE 0.25

a In the second prospective prediction exercise held underthe aegis of the NTP (see details in the text).

Figure 5. The ROC graph shows the performance ofdifferent prediction approaches for the rodent bioassay(carcinogenesis), based on the second prospective compara-tive exercise held under the aegis of NTP. The details ofthe study and the codes of the systems are in the text.

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1795

Page 30: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

experimentation. Carcinogenicity classifications formale and female rats and mice were provided sepa-rately to account for sex and species-specific differ-ences. Chemical structures were available as SMILESstrings and MDL Molfiles. The test set consisted of185 chemicals from the U.S. Food and Drug Admin-istration (FDA). The participating groups were 14,which provided in a first phase sets of chemicaldescriptors and models for the training set, and, ina second phase, the sets of predictions. The descrip-tors included molecular properties, information aboutthe presence/absence of substructures and functionalgroups (predefined and automatically generated fromthe data), graph theoretical indices, and 3-D param-eters. Summaries and data can be found at http://www.predictive-toxicology.org/ptc/.

The 14 groups generated 111 models for the 4rodent experimental groups. Five models performedbetter than random guessing (at a significance levelp of 0.05), and they are shown in Figure 6. Statisticalevaluations are cited in ref 188 according to whichthe three best predictive models included a small butsignificant amount of empirically learned toxicologi-cal knowledge. For further comments, see also ref189.

5. Conclusions

It appears that the evidence on SARs relative tothe mutagenic and carcinogenic properties of thechemicals is extremely rich. A large number offunctional groups and/or chemical substructures ableto elicit toxic effects have been recognized. Whereasthese SAs define only the potential for the chemicalsto be carcinogenic or mutagenic, the actual modula-tion of this potential needs to be modeled within theindividual chemical classes, for each experimental

system. As a matter of fact, a considerable numberof QSARs for individual chemical classes are nowavailable. Many focus on in vitro mutagenicity;however, a number of QSAR models for the animalcarcinogenicity exist as well. Overall, they provide aconsistent picture of the genotoxic mechanisms oftoxicity of the chemical mutagens and carcinogens.On the contrary, QSAR research on nongenotoxiccarcinogens is still in its infancy. The most importantconclusion to be made from the studies on genotoxiccarcinogens is the great importance of hydrophobicityin the modulation of the potential for mutagenicityand carcinogenicity. Hansch and colleagues haveshown that for the compounds that require S9activation to become mutagenic in bacteria, all havelog P terms with coefficients near 1.0.18 Other QSARsshow that where a direct chemical reaction with DNAappears to be occurring, without metabolic activation,there is not a positive hydrophobic term in theequation (see, for example, the QSARs relative to themutagenicity of aniloacridines, cis-platinum ana-logues, lactones, and epoxides).190

It should be remarked that QSAR results do notrepresent causal relationships, so that very carefulevaluation and interpretation are absolutely neces-sary. However, they are very powerful tools to extractand systematize information from data to obtainhypotheses that can be put to experimental test. Atthe same time, the QSAR results should be viewedas a precious complement to the mechanistic infor-mation already existing. In this respect, inter-pretability and the systematic comparison ofQSARs (lateral validation) are of the utmost impor-tance.15,190,191

An exercise of mechanistic comparative QSAR hasbeen performed relative to the aromatic amines.These chemicals have a great environmental andindustrial importance, so a large database of experi-mental results and different QSARs are available.The comparison of QSARs showed that (1) for bothbacterial mutagenicity and rodent carcinogenicity,the potency gradation of the active aromatic aminesdepends first on hydrophobicity and second on elec-tronic and steric properties. This confirms the exist-ence of a common first step in the mutagenicity andcarcinogenicity of these compounds; (2) the QSARsfor the potency of the active aromatic amines werenot suitable for differentiating the inactives from theactives; (3) the QSARs for the aspecific toxicity of thearomatic amines in a variety of experimental systemswere much simpler than those for bacterial mutage-nicity and rodent carcinogenicity and usually reliedonly on hydrophobicity.192

The QSAR approaches for the evaluation of thetoxic potential of noncongeneric sets of chemicalsdeserve a discussion on their own. The amount ofresults from exercises on the prediction of externaldatasets is now large enough to permit a number ofconclusions.

A deceiving aspect of it is the extremely largevariability of the accuracy of the predictions: Figures1 and 2 show that one system can produce very goodor very bad predictions, depending on the specific setof query chemicals to which it is applied. This is a

Figure 6. The ROC graph shows the performance of anumber of machine learning-based systems in the Predic-tive Toxicology Challenge 2000-2001 (rodent carcinogen-esis). The details of the study are in the text. The namesin the codes (Leuven, Kwansei, Baurin, Viniti) refer to theindividual scientists that submitted the predictions; theirsystems are fully explained in the site: http://www.pre-dictive-toxicology.org/ptc/. The second part of the codesindicate mm ) mouse, male, fr ) rat, female, fm ) mouse,female.

1796 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 31: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

serious drawback, especially if one thinks that sev-eral of the prediction systems have built-in alerts,aimed at informing the user to what extent a querycompound “resembles” the chemicals used for train-ing the system itself. The correct definition of thesystem applicability domain is crucial in this respect.However, this issue has no easy solution. It shouldbe emphasized that a QSAR model has a purelyempirical character: it is relative to the chemicalspace from which it has been derived and is not theexpression of “universal” scientific laws (such as, forexample, the gravitational or electromagnetic inter-actions). There is no possibility, a priori, of definingin a exact way a chemical series; only the combina-tion of chemical information and of biological datacan lead to its definition.193 On the contrary, thepossibility of establishing a QSAR for a group ofmolecules is a further a posteriori support to the ideathat they belong to the same chemical series. Sincethe foundation of the QSAR science, it has beenclearly recognized that a QSAR, if carefully con-structed, can be used to make predictions within therange of parameters values spanned by the trainingset, whereas nothing certain can be said about howfar outside that space it will continue to be valid.194

In addition, experience shows that sometimes it ispossible to devise new chemicals, which, while stillbeing within the parameters space of the training set,have substituents that change radically the actionmechanisms, so the new chemicals are not predictedcorrectly by the QSAR model. Let us recall again anexample already reported in this paper. Glende etal.45 synthesized alkyl-substituted (ortho to the aminofunction) aromatic amines not included in originaldatabase used by Hansch and colleagues42 to developa QSAR for the bacterial mutagenicity. Most of thenew chemicals had descriptors values in the rangeof original chemicals. The bulky alkyl substituentsdecreased the mutagenicity of the arylamines, prob-ably because of the steric hindrance of the metabolicoxidation of the amino group by the enzymes; thusthe predicted and experimental values differed con-siderably. The quoted example shows that the defini-tion of the applicability domain may involve not onlythe range of parameters values but also extra infor-mation of mechanistic/structural nature.

In general, the models for noncongeneric chemicalsare quite inferior, as quality, to the classical QSARsfor congeneric series. This is not surprising since aQSAR for congeners is aimed at modeling only onephenomenon, whereas the general models for thenoncongeneric sets try to model at the same timeseveral mechanisms of action, each relative to a classof carcinogens. This is a much more difficult task.117

In addition, successful modeling of noncongenersrequires the availability of an adequate number ofrepresentatives for each chemical class. Whereas forsome classes (e.g., aromatic amines) enough chemi-cals have been bioassayed, for most of the chemicalclasses only sparse data exist. A few years ago, wecollected a database of about 800 chemicals testedfor rodent carcinogenicity: the relative numerosityof the most represented classes was (1) aromaticamines n ) 197; (2) nitroaromatics n ) 32; (3)

halogenated alkanes n ) 27; (4) halogenated alkenesn ) 9; halogenated alcohols n ) 8, thus pointing to aremarkable under-representation of many importantclasses (our unpublished observation). No matter howsophisticated are the modeling approaches, theycannot be expected to overcome inadequate learningsets.

It should be remarked that an additional source ofdifficulty for the prediction systems is the evolvingpattern of chemicals in use. Chemicals that play apractical role in our life (e.g., pharmaceutical drugs,dyes) change over the years to follow the evolutionof research or fashion. In addition, there is a deliber-ate effort to design “safer” chemicals: the newknowledge of toxicity mechanisms is applied early onin chemical development, and its effect is to filterknown and predictable toxicity mechanisms out ofthe new chemical pool and shift the composition ofnew chemicals to structures with unanticipated orless obvious modes of toxicity. For example, in thefirst NTP exercise on the prediction of carcinogens,many chemicals were mutagenic in S. typhimurium,whereas only a minority of the second NTP exercisechemicals were mutagenic. Thus, the mechanisticand chemical characteristics of the historical carci-nogenicity database used as learning set by thevarious prediction approaches is somehow “back-ward” in respect to the need of predicting new typesof chemicals.

Regarding the accuracy of the predictions obtainedfor noncongeneric sets of chemicals, the results of theNTP comparative exercises showed that, in all cases,the best performance was attained by approachesthat relied largely on the human expert judgment.In the case of rodent carcinogenicity, a reasonableupper limit for accuracy of the available technologiesis around 65%.187 A further confirmation of thedifficulty of crystallizing the structure-activity no-tions into automatic prediction tools comes from theexperience of the Predictive Toxicology Challenge2000-2001 (see above): the application of sophisti-cated artificial intelligence methods to grasp theinformation contained implicitly in the carcinogenic-ity databases generated only 5 nonrandom sets ofpredictions out of the 111 originally elaborated.188

The human experts are likely to be more free andflexible in the use of different sources of information,including the general principles of chemistry andbiochemistry, and adapt better to issues that requirethe simultaneous use of knowledge at differenthierarchical levels. Obviously, the opposite is true inthe case of the individual chemical classes, where thepotency of the QSAR methods can model subtlegradations in a much more efficient way than the eyeof the human expert.

Both in the first and second NTP carcinogenicityexercises, several noncarcinogens were predicted ascarcinogens. Most often, these false-positive chemi-cals either had known SAs or showed a clear analogywith known carcinogens.187 Thus, it appears that theweak side of the prediction approaches was theirtendency to be unable to make gradations betweenpotential and actual carcinogenicity. This evidencegoes hand in hand with and finds a partial explana-

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1797

Page 32: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

tion in the selection strategy for bioassays by NTP.Because of obvious practical limitations, the selectionprocess for chemicals tested in the rodent bioassayhas always been biased toward chemicals suspectedof potential carcinogenicity, with the consequencethat also the chemicals considered in the comparativeexercises were not random samples from the universeof chemicals but were particularly “difficult” test sets,including several suspect structures (e.g., noncar-cinogens with SAs). The recognition of this fact maymitigate the limited performance of the predictionsand may suggest that the general level of ourknowledge is more satisfactory than that expressedby the accuracy figures.

What is the role today of the QSAR research in thefield of mutagens and carcinogens? The answercannot be as simple as yes or no. The pessimistic sideof it is that the predictions for the individual chemi-cals cannot be taken at face value and cannot replacethe experiments, when necessary. However, a sim-plistic view of QSAR should be rejected. QSAR is notan automatic device for generating predictions. Hereapplies the same reflection made by Franke regard-ing the search for new drugs: “As the drug discoveryprocess is of a very complex nature, effective drugdesign requires an entire spectrum of techniques inwhich QSAR methods still play an important role....The real power of drug design methods is to extractand synthesize information from data to obtainhypotheses that can be put to experimental test. Nodramatic overnight discoveries of wonder drug willresult, but an increase in the chance of success dueto indications of promising directions is a realisticexpectation....”15 Thus, the qualitative knowledge ofSARs, together with the generation of QSARs (whenapplicable), constitutes a large body of evidence thatcontributes to both the scientific research on mech-anisms and the practice of risk assessment. In recentyears, this has been proven brilliantly by the experi-ence of the NTP in their process of setting prioritiesfor the rodent experimentation. Two-thirds of thechemicals were selected based on “suspicions” abouttheir toxicity (mainly mutagenicity evidence andstructure-activity reasoning) and one-third werebased on production/exposure considerations. It hasbeen found that the proportion of carcinogens amongthe “suspected” chemicals was almost 10 times higherthan that relative to the chemicals selected only onproduction/exposure considerations.195 This is surelya great success obtained largely with the use ofstructure-activity concepts: thus, whereas assess-ments relative to individual chemicals should not betaken at face value, at the level of large numbers ofchemicals the careful use of QSAR can providesubstantial support. Within this context, the com-mercially available “all purpose” prediction software,together with “intelligent” databases, can be a usefulsupport for the expert judgment as well, providedthat they offer “transparent” predictions and not onlyblack box responses. Transparency includes declaringthe SAs and rules used for formulating the predic-tions, as well as the list of chemicals, with knownactivity, similar to the query chemical (possiblytogether with a similarity measure).

6. References(1) Tomatis, L.; Huff, J.; Hertz-Picciotto, I.; Sandler, D. P.; Bucher,

J.; Boffetta, P.; Axelson, O.; Blair, A.; Taylor, J.; Stayner, L.;Barrett, J. C. Carcinogenesis 1997, 18, 97.

(2) Tomatis, L.; Huff, J. Environ. Health Perspect. 2001, 109, 5.(3) Arcos, J. C.; Argus, M. F.; Woo, Y.-T. Chemical Induction of

Cancer, Modulation and Combination Effects: An Inventory ofthe Many Factors which Influence Carcinogenesis; Birkhauser:Boston, 1995.

(4) Woo, Y. T.; Arcos, J. C. Carcinogenicity and Pesticides: Prin-ciples, Issues, and Relationships; American Chemical Society:Washington, DC, 1989; Chapter 11, p 175.

(5) Arcos, J. C.; Argus, M. F. Chemical Induction of Cancer.Modulation and Combination Effects; Birkhauser: Boston, 1995;p 1.

(6) Woo, Y. T. Quantitative Structure-Activity Relationship (QSAR)Models of Mutagens and Carcinogens; CRC Press: Boca Raton,2003; Chapter 2, p 41.

(7) Miller, J. A.; Miller, E. C. Origins of Human Cancer; Cold SpringHarbor Laboratory: Cold Spring Harbor, 1977; p 605.

(8) Ashby, J. Environ. Mutagen. 1995, 7, 919.(9) Woo, Y. T.; Lai, D. Y.; McLain, J. L.; Ko Manibusan, M.; Dellarco,

V. Environ. Health Perspect. 2002, 110, 75.(10) Ariens, E. J. Drug Metab. Rev. 1984, 15, 425.(11) Hansch, C.; Leo, A. Exploring QSAR. 1. Fundamentals and

Applications in Chemistry and Biology; American ChemicalSociety: Washington, DC, 1995.

(12) Kubinyi, H. 3D QSAR in Drug Design: Theory, Methods andApplications; ESCOM: Leiden, 1993.

(13) Tute, M. S. Comprehensive Medicinal Chemistry; Pergamon:Oxford, 1990; Chapter 17.1, p 1.

(14) Franke, R. Theoretical Drug Design Methods; Elsevier: Amster-dam, 1984.

(15) Franke, R.; Gruska, A. Quantitative Structure-Activity Rela-tionhsip (QSAR) Models of Mutagens and Carcinogens; CRCPress: Boca Raton, 2003; Chapter 1, p 1.

(16) Benigni, R.; Giuliani, A. Med. Res. Rev. 1996, 16, 267.(17) Cronin, M. T. D.; Dearden, J. C. Quant. Struct.-Act. Relat. 1995,

14, 329.(18) Debnath, A. K.; Shusterman, A.; Lopez de Compadre, R. L.;

Hansch, C. Mutat. Res. 1994, 305, 63.(19) Hansch, C. Sci. Total Environ. 1991, 109/110, 17.(20) Passerini, L. Quantitative Structure-Activity Relationship (QSAR)

Models of Chemical Mutagens and Carcinogens; CRC Press:Boca Raton, 2003; Chapter 3, p 81.

(21) Zhang, L.; Sannes, K.; Shusterman, A. J.; Hansch, C. Chem. Biol.Interact. 1992, 81, 149.

(22) Benigni, R. Quantitative Structure-Activity Relationship (QSAR)Models of Mutagens and Carcinogens; CRC Press: Boca Raton,2003; Chapter 9, p 259.

(23) Richard, A. M.; Williams, C. R. Mutat. Res. 2002, 499, 27.(24) Richard, A. M.; Williams, C. R. Quantitative Structure-Activity

Relationship (QSAR) Models of Mutagens and Carcinogens; CRCPress: Boca Raton, 2003; Chapter 5, p 145.

(25) Lai, D. Y.; Woo, Y. T.; Argus, M. F.; Arcos, J. C. Designing SaferChemicals. Green Chemistry for Pollution Prevention; AmericanChemical Society: Washington, DC, 1996; Chapter 3, p 1.

(26) Sugimura, T. Mutat. Res. 1997, 376, 211.(27) Vineis, P.; Pirastu, R. Cancer Causes Control 1997, 8, 346.(28) Skog, K. I.; Johansson, M. A.; Jagerstad, M. I. Food Chem.

Toxicol. 1998, 36, 879.(29) Woo, Y. T.; Lai, D. Y. Patty’s Toxicology; John Wiley and Sons:

New York, 2001; Chapter 58, p 969.(30) Kadlubar, F. F.; Hammons, G. J. Mammalian Cytochromes

P-450; CRC Press: Boca Raton, 1987; Vol. II, p 81.(31) Beland, F. A.; Kadlubar, F. F. Chemical Carcinogenesis and

Mutagenesis I; Springer-Verlag: Berlin, 1990; Chapter 8, p 265.(32) Kadlubar, F. F.; Beland, F. A. Polycyclic Hydrocarbons and

Carcinogenesis; American Chemical Society: Washington, DC,1985; p 341.

(33) Tennant, R. W.; Margolin, B. H.; Shelby, M. D.; Zeiger, E.;Spalding, J. W.; Caspary, W.; Resnick, M. A.; Anderson, B.;Minor, R.; Haseman, J. K.; Stasiewicz, S. Science 1987, 236, 933.

(34) Zeiger, E. Cancer Res. 1987, 47, 1287.(35) Fetterman, B. A.; Soo Kim, B.; Margolin, B. H.; Schidcrout, J.

S.; Smith, M. G.; Wagner, S. M.; Zeiger, E. Environ. Mol.Mutagen. 1997, 29, 312.

(36) Chung, K.-T.; Kirkovsky, L.; Kirkovsky, A.; Purcell, W. P. Mutat.Res. 1997, 387, 1.

(37) Colvin, M. E.; Hatch, F. T.; Felton, J. S. Mutat. Res. 1998, 400,479.

(38) Benigni, R.; Giuliani, A.; Gruska, A.; Franke, R. QuantitativeStructure-Activity Relationship (QSAR) Models of Mutagensand Carcinogens; CRC Press: Boca Raton, 2003; Chapter 4, p125.

1798 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni

Page 33: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

(39) Trieff, N. M.; Biagi, G. L.; Sadagopa Ramanujam, V. M.; Connor,T. H.; Cantelli-Forti, G.; Guerra, M. C.; Bunce, H., III; Legator,M. S. Mol. Toxicol. 1989, 2, 53.

(40) Ford, G. P.; Griffin, G. R. Chem. Biol. Interact. 1992, 81, 19.(41) Ford, G. P.; Herman, P. S. Chem. Biol. Interact. 1992, 81, 1.(42) Debnath, A. K.; Debnath, G.; Shusterman, A. J.; Hansch, C.

Environ. Mol. Mutagen. 1992, 19, 37.(43) Benigni, R.; Andreoli, C.; Giuliani, A. Environ. Mol. Mutagen.

1994, 24, 208.(44) Benigni, R.; Passerini, L.; Gallo, G.; Giorgi, F.; Cotta-Ramusino,

M. Environ. Mol. Mutagen. 1998, 32, 75.(45) Glende, C.; Schmitt, H.; Erdinger, L.; Engelhardt, G.; Boche, G.

Mutat. Res. 2001, 498, 19.(46) Klopman, G.; Frierson, M. R.; Rosenkranz, H. S. Environ.

Mutagen. 1985, 7, 625.(47) Zhang, Y. P.; Klopman, G.; Rosenkranz, H. S. Environ. Mol.

Mutagen. 1993, 21, 100.(48) Lewis, D. F. V.; Ioannides, C.; Walker, R.; Parke, D. V. Food

Addit. Contam. 1995, 12, 715.(49) Basak, S. C.; Grunwald, G. D. Chemosphere 1995, 31, 2529.(50) Basak, S. C.; Gute, B. D.; Grunwald, G. D. SAR QSAR Environ.

Res. 1999, 10, 117.(51) Hatch, F. T.; Knize, M. G.; Felton, J. S. Environ. Mol. Mutagen.

1991, 17, 4.(52) Hatch, F. T.; Colvin, M. E.; Seidl, E. T. Environ. Mol. Mutagen.

1996, 27, 314.(53) Hatch, F. T.; Colvin, M. E. Mutat. Res. 1997, 376, 87.(54) Felton, J. S.; Knize, M. G.; Hatch, F. T.; Tanga, M. J.; Colvin,

M. E. Cancer Lett. 1999, 143, 127.(55) Hatch, F. T.; Knize, M. G.; Colvin, M. E. Environ. Mol. Mutagen.

2001, 38, 268.(56) Maran, U.; Karelson, M.; Katritzky, A. R. Quant. Struct.-Act.

Relat. 1999, 18, 3.(57) Yuta, K.; Jurs, P. C. J. Med. Chem. 1981, 24, 241.(58) Loew, G. H.; Poulsen, M.; Kirkjian, E.; Ferrel, J.; Sudhindra, B.

S.; Rebagliati, M. Environ. Health Perspect. 1985, 61, 69.(59) Benigni, R.; Giuliani, A.; Franke, R.; Gruska, A. Chem. Rev.

2000, 100, 3697.(60) Franke, R.; Gruska, A.; Giuliani, A.; Benigni, R. Carcinogenesis

2001, 22, 1561.(61) Gold, L. S.; Manley, N. B.; Slone, T. H.; Rohrbach, L. Environ.

Health Perspect. Suppl. 1999, 107 (Suppl. 4), 527.(62) Vracko, M. J. Chem. Inf. Comput. Sci. 1997, 37, 1037.(63) Gini, G.; Lorenzini, M.; Benfenati, E.; Grasso, P.; Bruschi, M.

J. Chem. Inf. Comput. Sci. 1999, 39, 1076.(64) King, R. S.; Teitel, C. H.; Shaddock, J. G.; Casciano, D. A.;

Kadlubar, F. F. Cancer Lett. 1999, 143, 167.(65) Lopez de Compadre, R. L.; Debnath, A. K.; Shusterman, A. J.;

Hansch, C. Environ. Mol. Mutagen. 1990, 15, 44.(66) Biagi, L. G.; Hrelia, P.; Gerra, M. G.; Paolini, M.; Barbaro, A.

M.; Cantelli-Forti, G. Arch. Toxicol. 1986, Suppl. 9, 425.(67) Debnath, A. K.; Lopez de Compadre, R. L.; Shusterman, A. J.;

Hansch, C. Environ. Mol. Mutagen. 1992, 19, 53.(68) Walsh, D. B.; Claxton, L. D. Mutat. Res. 1987, 182, 55.(69) Stuper, A. J.; Bruegger, W. E.; Jurs, P. C. Computer Assisted

Studies of Chemical Structure and Biological Function; Wiley:New York, 1979.

(70) Maynard, A. T.; Pedersen, L. G.; Posner, H. S.; Mckinney, J. D.Mol. Pharmacol. 1986, 29, 629.

(71) Compadre, R. L.; Shusterman, A. J.; Hansch, C. Int. J. QuantumChem. 1988, 34, 91.

(72) Debnath, A. K.; de Compadre, R. L. L.; Debnath, G.; Shusterman,A.; Hansch, C. J. Med. Chem. 1991, 34, 786.

(73) Debnath, A. K.; Hansch, C. Environ. Mol. Mutagen. 1992, 20,140.

(74) Debnath, A. K.; Hansch, C.; Kim, K. H.; Martin, Y. C. J. Med.Chem. 1993, 36, 1009.

(75) Cramer, R. D., III; Patterson, D. E.; Bunce, J. D. J. Am. Chem.Soc. 1988, 110, 5959.

(76) Caliendo, G.; Fattorusso, C.; Greco, G.; Novellino, E.; Perissutti,E.; Santagada, V. SAR QSAR Environ. Res. 1995, 4, 21.

(77) Fan, M.; Byrd, C.; Compadre, C. M.; Compadre, R. L. SAR QSAREnviron. Res. 1998, 9, 187.

(78) Compadre, R. L.; Byrd, C.; Compadre, C. M. Comparative QSAR;Taylor and Francis, Ltd: London, 1998; p 111.

(79) Mersch-Sundermann, V.; Rosenkranz, H. S.; Klopman, G. Mutat.Res. 1994, 304, 271.

(80) Klein, M.; Voigtmann, U.; Haack, T.; Erdinger, L.; Boche, G.Mutat. Res. 2000, 467, 55.

(81) Klein, M.; Erdinger, L.; Boche, G. Mutat. Res. 2000, 467, 69.(82) Hecht, S. S. Proc. Soc. Exp. Biol. Med. 1997, 216, 181.(83) Singer, G. M.; Andrews, A. W.; Guo, S. J. Med. Chem. 1986, 29,

40.(84) Dunn, W. J., III; Wold, S. J. Chem. Inf. Comput. Sci. 1981, 21,

8.(85) Frecer, V.; Miertus, S. Neoplasma 1988, 35, 525.(86) Debnath, A. K.; de Compadre, R. L. L.; Hansch, C. Mutat. Res.

1992, 280, 55.

(87) Smith, C. J.; Hansch, C.; Morton, M. J. Mutat. Res. 1997, 379,167.

(88) Shusterman, A. J.; Debnath, A. K.; Hansch, C.; Horn, G. W.;Fronczek, F. R.; Greene, A. C.; Watkins, S. F. Mol. Pharmacol.1989, 12, 939.

(89) Richard, A. M.; Woo, Y. T. Mutat. Res. 1990, 242, 285.(90) Norden, B.; Edlund, U.; Wold, S. Acta Chem. Scand. Ser. B 1978,

32, 602.(91) Gallegos, A.; Robert, D.; Girones, X.; Carbo-Dorca, R. J. Comput.-

Aided Mol. Des. 2001, 15, 67.(92) Villemin, D.; Cherqaoui, D.; Mesbah, A. J. Chem. Inf. Comput.

Sci. 1994, 34, 1288.(93) Rosenkranz, H. S.; Klopman, G. Toxicol. Ind. Health 1988, 4,

533.(94) Rosenkranz, H. S.; Klopman, G. Mutagenesis 1990, 5, 333.(95) Yuan, M.; Jurs, P. C. Toxicol. Appl. Pharmacol. 1980, 52, 294.(96) He, L.; Jurs, P. C.; Custer, L. L.; Durham, S. K.; Pearl, G. M.

Chem. Res. Toxicol. 2003, 16, 1567.(97) Crebelli, R.; Andreoli, C.; Carere, A.; Conti, L.; Crochi, B.; Cotta-

Ramusino, M.; Benigni, R. Chem. Biol. Interact. 1995, 98, 113.(98) Benigni, R.; Andreoli, C.; Conti, L.; Tafani, P.; Cotta-Ramusino,

M.; Carere, A.; Crebelli, R. Mutagenesis 1993, 8, 301.(99) Hansch, C.; Venger, B. H.; Panthananickal, A. J. Med. Chem.

1980, 23, 459.(100) LaLonde, R. T.; Leo, H.; Perakyla, H.; Dence, C. W.; Farrell, R.

P. Chem. Res. Toxicol. 1992, 5, 392.(101) Tuppurainen, K. Chemosphere 1999, 38, 3015.(102) Hooberman, B. H.; Chakraborty, P. K.; Sinsheimer, J. E. Mutat.

Res. 1993, 299, 85.(103) Sugiura, K.; Goto, M. Chem. Biol. Interact. 1981, 35, 71.(104) Tamura, N.; Takahashi, K.; Shirai, N.; Kawazoe, Y. Chem.

Pharm. Bull. 1982, 30, 1393.(105) Lewis, D. F. V.; Gray, T. J. B.; Beamand, J. A. Toxicol. in Vitro

1993, 7, 605.(106) Lake, B. G.; Lewis, D. F. V. Peroxisomes: Biology and Importance

in Toxicology and Medicine; Taylor and Francis: London, 1993;Chapter 14, p 312.

(107) Lewis, D. F. V.; Ionnides, C.; Parke, D. V. Mutagenesis 1990, 5,433.

(108) Lewis, D. F. V.; Ioannides, C.; Parke, D. V. Environ. HealthPerspect. 1996, 104, 1011.

(109) Borghoff, S. J.; Miller, A. B.; Bowen, J. P.; Swenberg, J. A.Toxicol. Appl. Pharmacol. 2004, 107, 228.

(110) Benigni, R.; Zito, R. Curr. Top. Med. Chem. 2003, 3, 1289.(111) Ashby, J.; Paton, D. Mutat. Res. 1993, 286, 3.(112) Parodi, S.; Malacarne, D.; Romano, P.; Taningher, M. Environ.

Health Perspect. 1991, 95, 199-.(113) Soffritti, M.; Belpoggi, F.; Minardi, F.; Bua, L.; Maltoni, C. Ann.

N. Y. Acad. Sci. 1999, 895, 34.(114) Stouch, T. R.; Jurs, P. C. Environ. Health Perspect. 1985, 61,

329.(115) Benigni, R.; Giuliani, A. Arch. Toxicol. 1992, 228.(116) Richard, A. M. Mutat. Res. 1994, 305, 73.(117) Benigni, R.; Richard, A. M. Mutat. Res. 1996, 371, 29.(118) Benigni, R.; Richard, A. M. Methods 1998, 14, 264.(119) Cronin, M. T. D.; Dearden, J. C. Quant. Struct.-Act. Relat. 1995,

14, 518.(120) Dearden, J. C.; Barratt, M. D.; Benigni, R.; Bristol, D. W.;

Combes, R. D.; Cronin, M. T. D.; Judson, P. N.; Payne, M. P.;Richard, A. M.; Tichy, M.; Worth, A. P.; Yourick, J. J. Altern.Lab. Anim. (ATLA) 1997, 25, 223.

(121) Richard, A. M.; Benigni, R. SAR QSAR Environ. Res. 2002, 13,1.

(122) Hodes, L.; Hazard, G. F.; Geran, R. I.; Richman, S. J. Med. Chem.1977, 20, 469.

(123) Tinker, J. J. Chem. Inf. Comput. Sci. 1981, 21, 3.(124) Klopman, G. J. Am. Chem. Soc. 1984, 106, 7315.(125) Klopman, G.; Rosenkranz, H. S. Mutat. Res. 1992, 272, 59.(126) Klopman, G. Quant. Struct.-Act. Relat. 1992, 11, 176.(127) Klopman, G.; Rosenkranz, H. S. Mutat. Res. 1994, 305, 33.(128) Cunnigham, A. R.; Rosenkranz, H. S.; Zhang, Y. P.; Klopman,

G. Mutat. Res. 1998, 398, 1.(129) Cunnigham, A. R.; Klopman, G.; Rosenkranz, H. S. Mutat. Res.

1998, 405, 9.(130) Cunningham, A. R.; Rosenkranz, H. S. Environ. Health Perspect.

2001, 109, 953.(131) Rosenkranz, H. S. Quantitative Structure-Activity Relationship

(QSAR) Models of Chemical Mutagens and Carcinogens; CRCPress: Boca Raton, 2003; Chapter 6, p 175.

(132) Matthews, E. J.; Contrera, J. F. Regulat. Pharmacol. Toxicol.1998, 28, 242.

(133) Malacarne, D.; Pesenti, R.; Paolucci, M.; Parodi, S. Environ.Health Perspect. 1993, 101, 332.

(134) Malacarne, D.; Taningher, M.; Pesenti, R.; Paolucci, M.; Perrotta,A.; Parodi, S. Chem. Biol. Interact. 1995, 97, 75.

(135) Perrotta, A.; Malacarne, D.; Taningher, M.; Pesenti, R.; Paolucci,M.; Parodi, S. Environ. Mol. Mutagen. 1996, 28, 31.

(136) King, R. D.; Srinivasan, A. Environ. Health Perspect. 1996, 104,1031.

SAR Studies of Chemical Mutagens and Carcinogens Chemical Reviews, 2005, Vol. 105, No. 5 1799

Page 34: Structure−Activity Relationship Studies of Chemical Mutagens and Carcinogens

(137) King, R. D.; Muggleton, S. H.; Srinivasan, A.; Sternberg, M. J.E. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 438.

(138) King, R. D.; Srinivasan, A. J. Comput.-Aided Mol. Des. 1997,11, 571.

(139) King, R. D.; Srinivasan, A.; Dehaspe, L. J. Comput.-Aided Mol.Des. 2001, 15, 173.

(140) Blake, B. W.; Enslein, K.; Gombar, V. K.; Borgstedt, H. H. Mutat.Res. 1990, 241, 261.

(141) Enslein, K.; Blake, B. W.; Borgstedt, H. H. Mutag. 1990, 5, 305.(142) Enslein, K. Toxicol. In Vitro 1993, 6, 163.(143) Enslein, K.; Gombar, V. K.; Blake, B. W. Mutat. Res. 1994, 305,

47.(144) Jurs, P. C.; Chou, J. T.; Yuan, M. J. Med. Chem. 1979, 22, 476.(145) Serra, J. R.; Thompson, E. D.; Jurs, P. C. Chem. Res. Toxicol.

2003, 16, 153.(146) Basak, S. C.; Mills, D.; Gute, B. D.; Hawkins, D. M. Quantitative

Structure-Activity Relationship (QSAR) Models of ChemicalMutagens and Carcinogens; CRC Press: Boca Raton, 2003;Chapter 7, p 207.

(147) Brinn, M.; Walsh, P.; Payne, M.; Bott, B. SAR QSAR Environ.Res. 1992, 1, 169.

(148) Woo, Y. T.; Lai, D. Y.; Argus, M. F.; Arcos, J. C. Toxicol. Lett.1995, 79, 219.

(149) Woo, Y. T.; Lai, D. Y.; Argus, M. F.; Arcos, J. C. J. Environ. Sci.Health 1998, C16, 101.

(150) Sanderson, D. M.; Earnshaw, C. G. Hum. Exp. Toxicol. 1991,10, 261.

(151) Ridings, J. E.; Barratt, M. D.; Cary, R.; Earnshaw, G. G.;Eggington, C. E.; Ellis, M. K.; Judson, P. N.; Langowski, J. J.;Marchant, C. A.; Payne, M. P.; Watson, W. P.; Yih, T. D.Toxicology 1996, 106, 267.

(152) Eriksson, L.; Jaworska, J. S.; Worth, A. P.; Cronin, M. T. D.;McDowell, R. M.; Gramatica, P. Environ. Health Perspect. 2003,111, 1361.

(153) Provost, F.; Fawcett, T. Machine Learn. J. 2001, 42, 5.(154) Pearl, G. M.; Livingstone-Carr, S.; Durham, S. K. Curr. Top.

Med. Chem. 2001, 1, 247.(155) Cariello, N. F.; Wilson, J. D.; Britt, B. H.; Wedd, D. J.; Burlinson,

B.; Gombar, V. K. Mutagenesis 2002, 17, 321.(156) European Centre for Ecotoxicology and Toxicology of Chemicals

(ECETOC). (Q)SARs: Evaluation of the Commercially AvailableSoftware for Human Health and Environmental Endpoints withRespect to Chemical Management Applications; Technical ReportNo. 89; ECETOC: Brussels, 2003.

(157) Mueller, F.; Simon-Hettich, B.; Kramer, P. J. Naunyn-Schmiede-berg’s Arch. Pharmacol. 2000, 361 (Suppl.), R173.

(158) Hulzebos, E. M.; Posthumus, R. SAR QSAR Environ. Res. 2003,14, 285.

(159) Snyder, R. D.; Pearl, G. M.; Mandakas, G.; Choy, W. N.;Goodsaid, F.; Rosenblum, I. Y. Environ. Mol. Mutagen. 2004,43, 143.

(160) Prival, M. J. Environ. Mol. Mutagen. 2001, 37, 55.(161) Zeiger, E.; Ashby, J.; Bakale, G.; Enslein, K.; Klopman, G.;

Rosenkranz, H. S. Mutagenesis 1996, 11, 474.(162) Bakale, G.; McCreary, R. D. Carcinogenesis 1987, 8, 253.

(163) Ashby, J.; Tennant, R. W. Mutat. Res. 1988, 204, 17.(164) Tennant, R. W.; Spalding, J.; Stasiewicz, S.; Ashby, J. Mutagen-

esis 1990, 5, 3.(165) Rosenkranz, H. S.; Klopman, G. Mutagenesis 1990, 5, 425.(166) Benigni, R. Mutagenesis 1991, 19, 83.(167) Bakale, G.; McCreary, R. D. Mutagenesis 1992, 7, 91.(168) Predicting Chemical Carcinogenesis in Rodents. An International

Workshop; National Institute of Environmental Health Sci-ences: Research Triangle Park, North Carolina, 1993.

(169) Jones, T. D.; Easterly, C. E. Mutagenesis 1991, 6, 507.(170) Hileman, B. Chem. Eng. News, June 21, 1993, p 35.(171) Ashby, J.; Tennant, R. W. Mutagenesis 1994, 9, 7.(172) Benigni, R. Mutat. Res. 1997, 387, 35.(173) Bristol, D. W.; Wachsman, J. T.; Greenwell, A. Environ. Health

Perspect. 1996, 104, 1001.(174) Woo, Y. T.; Lai, D. Y.; Arcos, J. C.; Argus, M. F.; Cimino, M. C.;

DeVito, S.; Keifer, L. J. Environ. Sci. Health 1997, C15, 139.(175) Huff, J.; Weisburger, E.; Fung, V. A. Environ. Health Perspect.

1996, 104, 1105.(176) Benigni, R.; Andreoli, C.; Zito, R. Environ. Health Perspect. 1996,

104, 1041.(177) Tennant, R. W.; Spalding, J. Environ. Health Perspect. 1996,

104, 1095.(178) Ashby, J. Environ. Health Perspect. 1996, 104, 1101.(179) Purdy, R. Environ. Health Perspect. 1996, 104, 1085.(180) Bootman, J. Environ. Mol. Mutagen. 1996, 27, 237.(181) Jones, T. D.; Easterly, C. E. Environ. Health Perspect. 1996, 104,

1017.(182) Moriguchi, I.; Hirano, H.; Hirono, S. Environ. Health Perspect.

1996, 104, 1051.(183) Lee, Y.; Buchanan, B. G.; Rosenkranz, H. S. Environ. Health

Perspect. 1996, 104, 1059.(184) Zhang, Y. P.; Sussman, N.; Macina, O. T.; Rosenkranz, H. S.;

Klopman, G. Environ. Health Perspect. 1996, 104, 1045.(185) Marchant, C. A. Environ. Health Perspect. 1996, 104, 1065.(186) Kerckaert, G. A.; Brauninger, R.; LeBoeuf, R. A.; Isfort, R. J.

Environ. Health Perspect. 1996, 104, 1075.(187) Benigni, R.; Zito, R. Mutat. Res. Rev. 2004, 566, 49.(188) Toivonen, H.; Srinivasan, A.; King, R. D.; Kramer, S.; Helma,

C. Bioinformatics 2002, 19, 1183.(189) Benigni, R.; Giuliani, A. Bioinformatics 2003, 19, 1194.(190) Hansch, C.; Kurup, A.; Garg, R.; Gao, H. Chem. Rev. 2001, 101,

619.(191) Hansch, C.; Hoekman, D.; Leo, A.; Weininger, D.; Selassie, C.

D. Chem. Rev. 2002, 102, 783.(192) Benigni, R.; Passerini, L. Mutat. Res. Rev. 2002, 511, 191.(193) Cotta-Ramusino, M.; Benigni, R.; Passerini, L.; Giuliani, A. J.

Chem. Inf. Comput. Sci. 2003, 43, 248.(194) Hansch, C. Biological Activity and Chemical Structure; Elsevi-

er: Amsterdam, 1977.(195) Fung, V. A.; Barrett, J. C.; Huff, J. Environ. Health Perspect.

1995, 103, 680.

CR030049Y

1800 Chemical Reviews, 2005, Vol. 105, No. 5 Benigni