QSAR prediction of physico-chemical properties and biological activities of emerging pollutants: brominated flame retardants and perfluorinated chemicals
Sixth Indo-US Workshop on Mathematical Chemistry, Kolkata, 8-10 January 2010
Paola Gramatica
Barun Bhhatarai, Simona Kovarich and Ester Papa
QSAR Research Unit in Environmental Chemistry and Ecotoxicology
DBSF - University of Insubria, Varese - Italy
E-mail: [email protected]
http://www.qsar.it
THE CHEMICAL UNIVERSE
More than 50,000,000 (Sept. 2009)
34,849,353 on the market
Regulated: 247,952
EINECS: 100,204
TSCA
5% known data
Environmental fate?
Human effects?
NEW: 11,000,000 / year
Experiments
EU-REACH
QSAR
Predictive methods
Prof. Paola Gramatica - QSAR Research Unit - DBSF - University of Insubria - Varese (Italy)
INTRODUCTION – REACH and QSAR
New EU regulation: Registration, Evaluation, Authorisation of Chemicals (REACH)
• Limited availability of experimental data
• Lack of knowledge of the properties and activities of existing substances
• Complexity of “old” regulations
Interest in the development and validation of alternative methods, such as QSARs.
The use of predictive QSAR models is suggested:
• To highlight dangerous chemicals
• To prioritize chemicals and to focus the experimental tests
• To fill the data gaps
QSAR Research Unit in Environmental Chemistry and Ecotoxicology
DBSF - University of Insubria, Varese - Italy
Staff
Prof. Paola Gramatica
Dr. Ester Papa, Ph.D
Dr. Simona Kovarich
Dr. Jr. Mara Luini
Dr. Barun Bhhatarai, Ph.D
(Dr. Jiazhong Li, Ph.D)
http://www.qsar.it
INTRODUCTION – Brominated Flame Retardants
• Class of emerging pollutants used in a variety of consumer products (plastics, polyurethane foams, textiles, electronic equipment, ...) to increase fire resistance
• Three most marketed HPV products:
PBDE: Polybrominated Diphenyl Ethers (209 possible congeners)
TBBPA: Tetrabromobisphenol A
HBCD: Hexabromocyclododecane
[Chemical structures of PBDE, TBBPA and HBCD]
• Levels in the environment and in humans have increased since they came into use
• Ban of penta- and octa-BDE formulations (DecaBDE under evaluation); HBCD in candidate list?
INTRODUCTION – Brominated Flame Retardants
Background knowledge about BFRs:
• Low water solubility
• High LogKow > 5
• Persistence in the environment
• Liver toxicity, thyroid toxicity, developmental toxicity
• Endocrine disruptors
The available amount of experimental data is very small and mainly related to already banned BFRs.
There is a need to extend knowledge about properties and ecotoxicological data for a better understanding of the behaviour of BFRs and the related risks.
INTRODUCTION – Perfluorinated Compounds
• Perfluorinated compounds (PFCs) are chemicals containing a long fluorinated carbon tail attached to different functional groups
• PFCs such as perfluoro-octanesulfonate (PFOS), perfluoro-octanoate (PFOA) and perfluoro-octane sulfonylamide (PFOSA) are stable chemicals with a wide range of industrial and consumer applications
• Degradation products of commercial PFCs are found in the environment and in biota, and diPAPs (a group of PFCs used on food wrappers) were recently reported in human blood
• PFCs are considered emerging pollutants and are believed to have potential toxic effects in humans and wildlife
• PFCs, along with polyfluoro compounds, are studied for LC50 inhalation toxicity in mouse and rat
Predictive QSAR approaches are used to fill the data gap and to predict the toxicity of 250 PFCs on two different species, viz. mouse and rat.
[Generic structures: A: Perfluoro compounds, R = {H, F}, X = {-H, -OH, -SO2, -COOH, ...}; B: Multifluoro compounds, X = {-F, -H, -OH, -alkyl, -aryl, -halo, -nitro, ...}; n = 1, 2, 3, ...]
Aims of the Modelling Studies
EU-FP7 Project - CADASTER
• Development of QSAR models for available end-points, paying attention to external validation and applicability domain analysis.
• Evaluation of environmental behaviour and physico-chemical properties of emerging pollutants: BFRs and PFCs.
• Identification of more toxic and dangerous chemicals based on the studied end-points.
• Prioritization of chemicals for experimental tests under the CADASTER project.
• Mechanistic interpretation of selected descriptors, highlighting the fate, distribution and properties of chemicals.
OECD Principles for QSAR models in REACH
To facilitate the consideration of a QSAR model for regulatory purposes, it should be associated with the following information:
• a defined endpoint
• an unambiguous algorithm
• a defined domain of applicability
• appropriate measures of goodness of fit, robustness and predictivity
• a mechanistic interpretation, if possible
METHODS
Application of the OECD principles for QSAR models
1. Defined end-points of Phys-chem and Toxicity
2. Unambiguous algorithm:
• Chemical representation by theoretical molecular descriptors (DRAGON) selected by Genetic Algorithms
• Statistical method: multiple linear regression (MLR) by ordinary least squares (OLS)
3. Validation for model stability and predictivity (internal and external validation)
4. Applicability Domain Analysis: leverage approach by Hat matrix (MLR)
5. Interpretation of the selected molecular descriptors, if possible.
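The leverage approach named in step 4 can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function names are hypothetical, the descriptor values are random placeholders, and the warning threshold h* = 3(p+1)/n is a commonly used convention that the slides do not state explicitly.

```python
import numpy as np

def leverages(X_train, X_query):
    """Hat values h_i = x_i^T (X^T X)^-1 x_i, with an intercept column added."""
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    Xq = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    # Row-wise quadratic form, one hat value per query compound.
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)

def warning_leverage(n_train, n_descriptors):
    """Assumed convention: h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train

# Hypothetical data: 30 training compounds, one descriptor.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 1))
X_new = np.array([[0.1], [8.0]])  # one typical and one extreme compound

h_train = leverages(X_train, X_train)
h_new = leverages(X_train, X_new)
h_star = warning_leverage(n_train=30, n_descriptors=1)
inside_domain = h_new <= h_star  # True where the prediction is an interpolation
```

Compounds with a hat value above h* are structurally far from the training set, so their predictions are extrapolations and should be treated with caution.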
RESULTS
QSAR/QSPR models developed for Brominated Flame Retardants
Simona Kovarich
RESULTS – QSPR models
Physico-chemical and degradation properties
Endpoint | Model | Train obj. | Test obj. | Desc. | R2% | Q2LOO% | Q2EXT% | AD% on 243 BFR
LogKOA | Full | 30 | - | T(O..Br) | 97.4 | 96.8 | - | 81.9
LogKOA | k-ANN Split | 24 | 6 | T(O..Br) | 96.1 | 95.0 | 95.2 | -
LogKOW | Full | 20 | - | T(O..Br) | 96.4 | 95.6 | - | 86.0
LogKOW | k-ANN Split | 14 | 6 | T(O..Br) | 97.1 | 95.9 | 94.7 | -
MP | Full | 25 | - | X2A | 84.4 | 81.9 | - | 95.9
MP | k-ANN Split | 20 | 5 | X2A | 82.2 | 78.5 | 93.7 | -
LogPL | Full | 34 | - | T(O..Br) | 98.7 | 98.5 | - | 83.1
LogPL | k-ANN Split | 28 | 6 | T(O..Br) | 98.8 | 98.5 | 98.6 | -
LogS | Full | 12 | - | Mor23m | 91.8 | 88.5 | - | 95.1
LogH | Full | 7 | - | BEHe7 | 96.9 | 93.3 | - | 55.6
LogKp* | Full | 15 | - | MW | 94.9 | 93.8 | - | 91.4
LogHLp* | Full | 15 | - | T(O..Br) | 94.3 | 92.6 | - | 81.9
E. Papa, S. Kovarich, P. Gramatica, 2009. Development, validation and inspection of the applicability domain of QSPR models for physico-chemical properties of polybrominated diphenyl ethers. QSAR & Comb. Sci., 28, 790-796.
* Photodegradation
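The R2 and Q2LOO columns can be illustrated with a short sketch for an OLS model. It assumes one common definition, Q2LOO = 1 - PRESS/TSS with TSS taken around the overall mean, and uses the hat-matrix shortcut for leave-one-out residuals. The data are synthetic placeholders and the function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns design matrix and coefficients."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A, coef

def r2_and_q2_loo(X, y):
    """R2 of the fit and leave-one-out Q2 computed without refitting n times."""
    A, coef = fit_ols(X, y)
    resid = y - A @ coef
    hat = np.einsum("ij,jk,ik->i", A, np.linalg.pinv(A.T @ A), A)
    loo_resid = resid / (1.0 - hat)  # exact LOO residuals for OLS
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / tss
    q2 = 1.0 - np.sum(loo_resid ** 2) / tss
    return r2, q2

# Synthetic single-descriptor data set (placeholder values, not the PBDE data).
rng = np.random.default_rng(1)
X = np.arange(1.0, 21.0).reshape(-1, 1)
y = 6.654 + 0.222 * X[:, 0] + rng.normal(scale=0.2, size=20)
r2, q2 = r2_and_q2_loo(X, y)
```

Because each leave-one-out residual is at least as large in magnitude as the fitted residual, Q2LOO can never exceed R2; a small gap between the two, as in the table, points to a stable model.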
RESULTS - Model for Log Koa
LogKoa = 6.654 + 0.222 T(O..Br)
Experimental range of LogKoa: 7.34 (mono-BDE) – 11.96 (hepta-BDE)
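As a worked illustration of the one-descriptor equation, the sketch below evaluates the model and inverts it. The function names are hypothetical; only the two coefficients come from the slide, and the inverse is plain algebra on the fitted line rather than a calculation of the DRAGON descriptor itself.

```python
INTERCEPT = 6.654  # from the slide
SLOPE = 0.222      # coefficient of T(O..Br), from the slide

def predict_log_koa(t_o_br):
    """Predicted LogKoa for a given T(O..Br) descriptor value."""
    return INTERCEPT + SLOPE * t_o_br

def t_o_br_for(log_koa):
    """Descriptor value at which the fitted line returns the given LogKoa."""
    return (log_koa - INTERCEPT) / SLOPE

# Descriptor values at which the line passes through the ends of the experimental range.
low = t_o_br_for(7.34)
high = t_o_br_for(11.96)
```

Under this equation the experimental LogKoa range corresponds to T(O..Br) values of roughly 3.1 to 23.9, and each additional unit of T(O..Br) raises the predicted LogKoa by 0.222 log units.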
[Plot: predicted vs. experimental LogKoa (both axes 7 to 13) for the training set and the prediction set; points labelled by BDE congener number]
[Plot: distance from the structural domain (hat) for BDE congeners 1 to 209; nona- and deca-BDEs marked]
Are the predictions in the structural domain? 90.4% into AD
n° Obj | Descriptor | R2% | Q2boot% | Q2EXT(rand20%)%
30 | T(O..Br) | 97.36 | 96.77 | 99.56
RESULTS – Interpretation of descriptors
The same descriptor, i.e. T(O...Br), was selected as the best modeling variable for four different properties which are related to each other (LogPL, LogKoa, LogKow, LogHLp).
This descriptor gives double structural information: its value increases according to both the number of bromine substituents and their distance from the ether oxygen, on each phenyl ring.
Thus, T(O...Br) also takes into account the information related to the position of the bromine atoms on the phenyl rings.
Comparison with some existing models
Predicted and Experimental data for 30 PBDEs
[Plot: experimental vs. predicted LogKoa for 30 PBDEs (congeners 1 to 183, grouped as mono-tri and tetra-hepta); series: Exp. LogKoa, Chen (2003), Papa (2008), Xu (2007), KoaWIN]
Author | Method | N° obj. | N° vars | R2% | Q2LOO% | Q2EXT% | RMSE (30 obj)
Papa et al. (2009) | MLR | 30 | 1 | 97.4 | 96.8 | 99.6 | 0.23
Xu et al. (2007) | MLR | 22 | 2 | 97.6 | 97.2 | - | 0.31
Chen et al. (2003) | PLS | 13 | 10 | 97.9 | 97.5 | - | -
KoaWIN (Episuite) | KOW/KAW | - | - | - | - | - | 0.81
Comparison with some existing models
Predictions for 209 PBDEs
[Bar plot: average Δ (log units, 0 to 3.5) per homologue group from monoBDE to decaBDE; series: average Δ (|YPapa - YKoaWIN|) and average Δ (|YPapa - YXu|)]
YPapa = Predictions by our model (range Log Koa: 7.32 – 15.09)
YEpisuite = Predictions by KoaWIN (Δmax = 3.33 log units; range Log Koa: 6.81 – 18.23)
YXu = Predictions by Xu et al. (2007) (Δmax = 1.06 log units; range Log Koa: 7.4 – 15.73)
Increase in the number of bromines = increase in Δ
High difference with EPISUITE for highly brominated PBDEs
RESULTS – Environmental fate of BFRs
5 < LogKow < 7
Resistance to Photodegradation / Mobility
Risk for tri-penta BDE!!
RESULTS – QSAR models
Endocrine Disrupting Activity
Endpoint | Model | Train obj. | Test obj. | Desc. | R2% | Q2LOO% | Q2EXT% | AD% on 243 BFR
Log1/RBA | Full | 18 | - | RDF080v, RDF035v | 86.1 | 79.3 | - | 88.5
Log1/RBA | Random Split | 10 | 8 | RDF080v, RDF035v | 87.2 | 74.0 | 76.8 | -
Log1/IC50 PRANT | Full | 19 | - | R7e+, GATS8e | 85.9 | 81.7 | - | 94.2
Log1/IC50 PRANT | Random Split | 10 | 9 | R7e+, GATS8e | 91.3 | 85.9 | 71.2 | -
Log T4-REP | Full | 17 | - | qpmax, MATS6v | 95.2 | 92.9 | - | 97.9
Log T4-REP | Random Split | 9 | 8 | qpmax, MATS6v | 96.7 | 91.9 | 90.5 | -
LogE2SULT-REP | Full | 21 | - | B08[C-O], GGI7 | 87.6 | 83.6 | - | 100
LogE2SULT-REP | Random Split | 11 | 10 | B08[C-O], GGI7 | 87.2 | 73.2 | 87.6 | -
RBA = AhR Relative Binding Affinity = EC50(TCDD) / EC50(BFR)
PRANT = Progesterone Receptor Antagonism
T4-REP = T4-TTR Relative Competition = IC50(T4) / IC50(BFR)
E2SULT-REP = E2SULT Relative Inhibition = IC50(E2) / IC50(BFR)
E. Papa, S. Kovarich, P. Gramatica, QSAR modeling and prediction of the endocrine disrupting potencies of brominated flame retardants, submitted to J. Chem. Inf. Mod., 2010.
RESULTS - Model for LogE2SULT-REP
Equation of the “Split Model” (Random 50%):
LogE2SULT-REP = -0.56 + 2.10 B08[C-O] – 2.77 GGI7
R2 = 0.87; Q2LOO = 0.73; Q2EXT = 0.88
[Plot: predicted vs. experimental Log E2SULT-REP (both axes -2.5 to 1.5), training set and prediction set; labelled compounds: 6-OH-BDE-47, BDE-209, BDE-190, BDE-47, BDE-19, BDE-49, BDE-28, BDE-127, BDE-100, BDE-169, BDE-183, BDE-155, BDE-206, 4'-OH-BDE-49, 3-OH-BDE-47, 5-OH-BDE-47, 4-OH-BDE-42, 2,4,6-TBP, TBBPA, TBBPA-DBPE, 2'-OH-BDE-66, PCP]
[Plot: residuals (-3 to 3) vs. HAT (0.0 to 0.9), training set and prediction set; TBBPA-DBPE labelled]
MORE ACTIVE THAN PCP!
RESULTS
QSAR/QSPR models developed for Per-fluorinated Chemicals
Barun Bhhatarai
Results: QSAR models for LC50 inhalation
Mouse Inhalation, 56 compounds; variables selected: X3v; H-048; MLOGP; F01[C-C]
Splitting | Compounds | R2 (%) | Q2 LOO | Q2 BOOT | Q2 ext | R2-YScrm
SOM 28.5% | Train: 40, Test: 16 | 82.99 | 78.09 | 75.46 | 71.62 | 10.32
Random by Activity 20% | Train: 44, Test: 12 | 77.07 | 71.73 | 69.89 | 85.11 | 8.99
Full model | 56 | 79.83 | 76.31 | 75.38 | - | 7.05
Rat Inhalation, 52 compounds; variables selected: Jhetv; PCR; MLOGP; B02[Cl-Cl]
Splitting | Compounds | R2 (%) | Q2 LOO | Q2 BOOT | Q2 ext | R2-YScrm
SOM 18.9% | Train: 42, Test: 10 | 78.36 | 72.99 | 71.95 | 75.47 | 8.75
Random by Activity 20% | Train: 42, Test: 10 | 80.01 | 75.21 | 74.12 | 66.70 | 9.91
Full model | 52 | 78.14 | 73.85 | 73.26 | - | 7.64
Barun Bhhatarai and Paola Gramatica, Per- and Poly-fluoro Toxicity (LC50 inhalation) Study in Rat and Mouse using QSAR Modeling, Chem. Res. Toxicol., 2010, in press.
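The R2-YScrm column reports a Y-scrambling check: the responses are randomly permuted, the model is refitted, and the resulting R2 should collapse if the original fit is not due to chance. The following is a minimal sketch of that idea with synthetic data; the function names, the number of permutations and the data are assumptions, not the authors' procedure.

```python
import numpy as np

def r2_ols(X, y):
    """R2 of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def y_scrambling_r2(X, y, n_iter=200, seed=0):
    """Average R2 obtained after randomly permuting the response vector."""
    rng = np.random.default_rng(seed)
    return float(np.mean([r2_ols(X, rng.permutation(y)) for _ in range(n_iter)]))

# Synthetic stand-in: 56 compounds, 4 descriptors (placeholder values).
rng = np.random.default_rng(42)
X = rng.normal(size=(56, 4))
y = X @ np.array([1.0, -1.0, 0.5, 2.0]) + rng.normal(scale=0.3, size=56)

r2_real = r2_ols(X, y)
r2_scrambled = y_scrambling_r2(X, y)
```

For a pure chance correlation the expected R2 is about p/(n-1), i.e. roughly 7% for 4 descriptors and 56 compounds, which is the order of magnitude shown in the R2-YScrm column.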
Regression plots for the models on datasets split by SOM
Mouse:
log 1/LC50 = 4.21 – 1.27 (±0.31) MlogP + 1.43 (±0.46) X3v + 0.38 (±0.13) F01[C-C] – 1.14 (±0.37) H-048
n=56, s=0.72, r2=79.83, F=50.5, Kx=42.34, Kxy=50.40
[Plot: Mouse Inhalation, predicted (SOM) vs. experimental; Training and Prediction sets]
Rat:
log 1/LC50 = –12.76 + 1.87 (±0.20) Jhetv + 11.43 (±1.27) PCR – 0.60 (±0.12) MlogP – 1.41 (±0.40) B02[Cl-Cl]
n=52, s=0.82, r2=78.14, F=41.99, Kx=23.55, Kxy=30.86
[Plot: Rat Inhalation, predicted (SOM) vs. experimental; Training and Prediction sets]
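The two equations above can be applied as plain linear combinations of descriptor values. The sketch below stores the coefficients from the slide in dictionaries; the helper name and the example descriptor values are hypothetical and do not describe any real compound.

```python
# Coefficients copied from the slide equations (log 1/LC50 models).
MOUSE = {"intercept": 4.21, "MlogP": -1.27, "X3v": 1.43, "F01[C-C]": 0.38, "H-048": -1.14}
RAT = {"intercept": -12.76, "Jhetv": 1.87, "PCR": 11.43, "MlogP": -0.60, "B02[Cl-Cl]": -1.41}

def log_inv_lc50(model, descriptors):
    """Evaluate log 1/LC50 for one compound from its descriptor values."""
    names = [k for k in model if k != "intercept"]
    missing = [k for k in names if k not in descriptors]
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    return model["intercept"] + sum(model[k] * descriptors[k] for k in names)

# Hypothetical descriptor values, for illustration only.
example = {"MlogP": 2.0, "X3v": 1.0, "F01[C-C]": 3.0, "H-048": 0.0}
mouse_prediction = log_inv_lc50(MOUSE, example)
```

A prediction obtained this way is only meaningful for compounds inside the applicability domain of the corresponding model.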
Descriptor analysis
• The common descriptor characterizing hydrophobicity (MlogP) has a negative coefficient for both species
• Jhetv and X3v have similar chemical meanings and have positive coefficients for both species
• B02[Cl-Cl] is present for 5 of 52 compounds – fitting (?) descriptor to include all Freons
RAT: Jhetv, PCR, MlogP, B02[Cl-Cl]
MOUSE: MlogP, X3v, F01[C-C], H-048
Descriptor annotations:
• conventional bond-order ID number (piID) divided by the total path count
• presence of heteroatoms and double and triple bonds
• hydrophobicity
• bond multiplicity, the heteroatoms and the number of atoms
• total number of C-C bonds
• presence/absence of Cl-Cl at topological distance 02
• formal oxidation number of the C atom, which is the sum of the formal bond orders with electronegative atoms
Applicability Domain (AD) study on 250 PFCs
• 75.6% coverage of PFCs in Mouse model (61 compounds are out of structural domain) and 76.8% coverage in Rat model (53 out).
• Arbitrary cutoff 0.5 (dotted lines): 11 common compounds are out of domain
[Mouse AD plot: Y pred. vs. hat values (0.0 to 3.0); Compounds Studied and Compounds Predicted; threshold marked at 0.267; PFOSA labelled]
[Rat AD plot: Y pred. vs. hat values (0.0 to 1.6); Compounds Studied and Compounds Predicted; thresholds marked at 0.273 and 0.5; PFOSA labelled]
Focus on AD: Common Out-of-domain compounds
• Predicted compounds out of the applicability domain of both the Mouse and Rat models are long-chain PFCs (>15 carbons)
• They are probably extrapolated, as the longest compounds in the training sets have 7 carbons
[Structures of the 11 common out-of-domain compounds, CAS numbers: 4151-50-2, 307-07-3, 306-91-2, 307-62-0, 51294-16-7, 56523-43-4, 60433-12-7, 59778-97-1, 16517-11-6, 65150-94-9, 29809-35-6]
More Toxic Chemicals Predicted: by PCA analysis
[PCA score plot, PC1 vs. PC2: Exp. + Pred. = 180 compounds, common compounds = 28; groups: Exp Rat, Exp Mus; labelled points: #1, #121, #165, #176, PFOA, PFOSA; arrow: toxicity trend (increasing toxicity)]
PFOS is under investigation as toxic
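A PCA-based toxicity ranking of this kind can be sketched as follows. The assumption here, not confirmed by the slides, is that the PCA is run on the autoscaled toxicity values for the two species and that the first principal component is read as the overall toxicity trend. The data are placeholders and the function name is hypothetical.

```python
import numpy as np

def pca_scores(Y):
    """PCA of autoscaled columns via SVD; PC1 is oriented to grow with the inputs."""
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    if Vt[0].sum() < 0:  # resolve the arbitrary sign of the first component
        Vt[0] *= -1
    scores = Z @ Vt.T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, Vt, explained

# Placeholder log 1/LC50 values (columns: mouse, rat) for five hypothetical compounds.
Y = np.array([[1.0, 1.2], [2.0, 1.8], [3.0, 3.1], [4.0, 4.2], [5.5, 5.0]])
scores, loadings, explained = pca_scores(Y)
ranking = np.argsort(-scores[:, 0])  # most toxic first along PC1
```

With both responses loading positively on PC1, compounds at the high end of PC1 are the ones predicted to be more toxic for both species, which is the kind of shortlist that would be proposed for experimental testing.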
These chemicals have been suggested to the CADASTER Partners for experimental tests
[Structures of 28 compounds, PFOA labelled; CAS numbers: 376-89-6, 376-53-4, 98-16-8, 98-46-4, 375-81-5, 376-27-2, 376-50-1, 376-73-8, 377-38-8, 559-11-5, 647-12-1, 678-77-3, 918-21-8, 424-40-8, 335-67-1, 1763-23-1, 375-85-9, 423-50-7, 423-54-1, 17527-29-6, 336-08-3, 307-24-4, 647-42-7, 773-82-0, 813-44-5, 1187-93-5, 7783-66-6, 41430-70-0]
QSPR of Melting point: Data splitting
Perfluorinated chemicals, Melting Point: 94 compounds
• SOM split (descriptor): 53 Training, 41 Prediction I
• Random split (response): 48 Training, 46 Prediction I
• (PERFORCE): 17 compounds, Prediction II
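One common way to build a random split that is balanced on the response is to rank the compounds by the measured value and draw one test compound at random from each consecutive block. The sketch below illustrates that idea only; the slides do not describe the exact procedure used, and the function name, the block size and the data are assumptions.

```python
import numpy as np

def split_by_response(y, test_fraction=0.2, seed=0):
    """Rank compounds by response and pick one test compound per full block."""
    rng = np.random.default_rng(seed)
    order = np.argsort(y)
    block_size = int(round(1.0 / test_fraction))
    test = []
    for start in range(0, len(order), block_size):
        block = order[start:start + block_size]
        if len(block) == block_size:  # leftover compounds stay in training
            test.append(int(rng.choice(block)))
    test = np.array(sorted(test))
    train = np.setdiff1d(np.arange(len(y)), test)
    return train, test

# Placeholder responses for 56 hypothetical compounds.
y = np.random.default_rng(3).normal(size=56)
train_idx, test_idx = split_by_response(y)
```

A split of this kind keeps the training and prediction sets spread over the same response range, whereas a SOM split aims at a similar coverage of the descriptor space.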
Results: Melting point (94+17)
Variables: AAC, F02[C-F], C-013
Columns: R2, Q2loo, Q2boot, RMSE train, RMSE ext, Q2ext*, R2Yscr
Train set 53
Prediction I SOM, 41 test: 77.11 73.35 71.89 40.86 46.65 70.16 5.18
Prediction II, 17 test: 71.90 25.04 91.40 5.16
Train set 48
Prediction I Response, 46 test: 82.85 79.30 77.48 38.07 48.52 72.16 5.84
Prediction II, 17 test: 77.36 24.60 92.84 6.59
Total 111: 78.45 76.82 76.60 40.36 41.86 (cv) - 2.82
AAC = mean information index on atomic composition, information indices
F02[C-F] = frequency of C-F at topological distance 02, 2D frequency fingerprint
C-013 = corresponds to CRX3 (X = electronegative atom), atom-centered fragments
*Consonni, V., et al. J. Chem. Inf. Model., 49, 1669-1678.
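The external-validation columns can be illustrated with a short sketch. It assumes that the asterisk on Q2ext refers to the function proposed in the cited Consonni et al. paper, which normalises the external prediction error and the training-set variance by their own sample sizes; the function names are hypothetical.

```python
import numpy as np

def rmse(y, y_pred):
    """Root mean squared error."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

def q2_ext(y_train, y_ext, y_ext_pred):
    """1 - (mean squared external error) / (mean squared deviation of the training set)."""
    y_train = np.asarray(y_train, float)
    mean_press = rmse(y_ext, y_ext_pred) ** 2
    mean_tss = np.mean((y_train - y_train.mean()) ** 2)
    return float(1.0 - mean_press / mean_tss)

# Tiny placeholder example: predicting the training mean gives Q2ext = 0.
y_train = [1.0, 2.0, 3.0]
q2_mean_model = q2_ext(y_train, [1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
q2_perfect = q2_ext(y_train, [1.5, 2.5], [1.5, 2.5])
```

Written this way, Q2ext depends only on the external RMSE and on the spread of the training responses, so it can be compared across prediction sets of different size, such as Prediction I and Prediction II.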
Analysis of Melting Point Model
MP = 148.81 (±18.43) AAC + 4.03 (±0.66) F02[C-F] – 14.47 (±6.88) C-013 – 269.25
n=111
[Plot: Y-pred vs. Y-exp (both axes -200 to 200); Training, Prediction I (SOM), Prediction II]
[AD plot: Y-pred vs. Hat (0.00 to 0.18); Available data and Compounds Predicted; threshold marked at 0.109; PFOSA and PFOA labelled]
QSPR of Boiling point: Data splitting
Perfluorinated chemicals, Boiling Point: 105 compounds
• SOM split (descriptor): 55 Training, 50 Prediction I
• Random split (response): 53 Training, 52 Prediction I
• (PERFORCE): 25 compounds, Prediction II
Results: Boiling point (105+25)
Variables: Ms, ATS1m, nROH
Columns: R2, Q2loo, Q2boot, RMSE train, RMSE ext, Q2ext*, R2Yscr
Train set 55
Prediction I SOM, 50 test: 87.50 85.25 83.16 24.78 34.54 75.71 5.73
Prediction II, 25 test: 86.26 29.14 85.17 5.55
Train set 53
Prediction I Response, 52 test: 86.40 83.55 81.38 30.23 28.98 87.50 6.12
Prediction II, 25 test: 80.78 26.20 89.53 5.35
Total 130: 88.54 87.54 87.37 28.21 29.42 (cv) - 2.41
Ms = mean electro-topological state, constitutional descriptor
ATS1m = Autocorrelation of a topological structure, 2D autocorrelations
nROH = number of OH groups, functional group counts
*Consonni, V., et al. J. Chem. Inf. Model., 49, 1669-1678.
Analysis of Boiling Point Model
BP = 128.43 (±5.295) ATS1m + 93.833 (±5.85) nROH – 54.23 (±4.25) Ms – 43.098
n=130
[Plot: Y-pred vs. Y-exp (both axes -150 to 300); Training, Prediction I (SOM), Prediction II]
[AD plot: Y-pred vs. Hat (0.00 to 0.16); Available data and Compounds Predicted; threshold marked at 0.09; PFOA and PFOSA labelled]
QSPR of Vapor Pressure: Data splitting
Vapor Pressure: 35 compounds (+ PERFORCE data)
• SOM split: 24 Training, 11 Prediction I
• Random split: 22 Training, 13 Prediction I
Results: Vapor Pressure (35)
Variables: nDB; AAC; F03[C-F]
Columns: R2, Q2loo, Q2boot, RMSE train, RMSE ext, Q2ext*, R2Yscr
Prediction I SOM, 11 test: 91.07 84.33 81.63 0.83 0.97 87.78 12.69
Prediction I Response, 13 test: 93.75 91.23 82.13 0.64 1.14 80.36 14.08
Total 35: 90.93 88.21 86.06 0.83 0.95 (cv) - 8.95
nDB = number of double bonds, constitutional descriptor
AAC = mean information index on atomic composition, information indices
F03[C-F] = frequency of C-F at topological distance 03, 2D frequency fingerprints
*Consonni, V., et al. J. Chem. Inf. Model., 49, 1669-1678.
Analysis of Vapour Pressure Model
log VP = –0.642 (±0.405) nDB – 3.164 (±0.924) AAC – 0.165 (±0.025) F03[C-F] + 7.97
n=35
[Plot: Y-pred vs. Y-exp (both axes -6 to 6); Training, Prediction (SOM)]
[AD plot: Y-pred vs. Hat (0.0 to 1.0); Available data and Compounds Predicted]
Summary of QSPR models on PFCs:
End point | Descriptors | n | R2 | Q2loo | Q2boot | RMSE train | RMSE cv | RMSE EPI* (n) | AD%
Melting Point | AAC, F02[C-F], C-013 | 111 | 78.5 | 76.8 | 76.1 | 40.36 | 41.86 | 46.678 (248) | 94.7
Boiling Point | Ms, ATS1m, nROH | 130 | 88.5 | 87.5 | 87.3 | 27.57 | 29.12 | 43.046 (290) | 97.9
Vapor Pressure | CIC0, MATS1v, TPSA(Tot) | 35 | 90.9 | 88.2 | 87.1 | 0.83 | 0.95 | 1.12 (243) | 94.2
All our models have smaller RMSE in comparison to EPISUITE models
* http://www.epa.gov/oppt/exposure/pubs/episuite.htm
Conclusions
• Data were predicted for ~250 compounds in each set of chemicals: BFRs and PFCs
• Applicability domain analysis was also carried out for the new compounds
• The QSA(P)Rs developed could be used to fill data gaps according to the new REACH regulation, facilitating the screening and prioritization of chemicals and reducing animal testing, as well as for the design of alternative and safer chemicals
• Predictive models were developed ad hoc for several toxicity end-points and physico-chemical properties
• The ‘OECD principles for the validation of QSAR models, for regulatory applicability’ were strictly followed
• Simple models (linear analysis, few descriptors, robust models) with external validation were used
Thanks for your attention!!
Acknowledgements
Financial support by the EU FP7 Project CADASTER
http://www.qsar.it