Modern Machine Learning Regression Methods



DESCRIPTION

AACIMP 2009 Summer School lecture by Igor Tetko, "Environmental Chemoinformatics" course.

TRANSCRIPT

Page 1: Modern Machine Learning Regression Methods

Modern Machine Learning Regression Methods*

Igor & Igor**

*Based on our lecture at the Strasbourg Chemoinformatics School
**Igor Baskin, MSU


Page 2: Modern Machine Learning Regression Methods

The Goal of Learning

The goal of statistical learning (in chemoinformatics) consists in finding a function f relating the value of some property y (which can be a physicochemical property, biological activity, etc.) to the values of descriptors x_1, ..., x_M (which can describe chemical compounds, reactions, etc.):

y = f(x_1, ..., x_M)

Continuous y are handled by regression analysis, function approximation, etc.

Discrete y are handled by discriminant analysis, classification, pattern recognition, etc.


Page 3: Modern Machine Learning Regression Methods

The Goal of Learning

So, the goal of statistical learning is to find such a function class F and such parameters c_1, ..., c_P that would minimize the risk function and therefore provide a model with the highest predictive ability:

R(c_1, ..., c_P) → min

In classical statistics it is assumed that minimizing the empirical risk also minimizes the expected risk:

R_emp(c_1, ..., c_P) → min  ⇒  R(c_1, ..., c_P) → min

Is this correct? It is almost correct for big data sets and may not be correct for small data sets.


Page 4: Modern Machine Learning Regression Methods

Incorrectness of Empirical Risk Minimization

Data approximation with polynomials of different order:

Pol(x, p) = c_0 + Σ_{i=1..p} c_i x^i

Minimization of the empirical risk function does not guarantee the best predictive performance of the model.
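As an illustration, here is a minimal NumPy sketch (the toy data set and polynomial orders are illustrative assumptions, not taken from the slide): the training error keeps falling as the order grows, while the test error eventually rises.

```python
import numpy as np

rng = np.random.default_rng(0)                        # hypothetical toy data set
x_train = np.linspace(-1, 1, 10)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.normal(size=x_train.size)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(np.pi * x_test)

for p in (1, 3, 9):                                   # polynomial order p
    c = np.polyfit(x_train, y_train, deg=p)           # least squares = empirical risk minimization
    rmse_train = np.sqrt(np.mean((np.polyval(c, x_train) - y_train) ** 2))
    rmse_test = np.sqrt(np.mean((np.polyval(c, x_test) - y_test) ** 2))
    print(f"order {p}: train RMSE {rmse_train:.3f}, test RMSE {rmse_test:.3f}")
```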


Page 5: Modern Machine Learning Regression Methods

Occam’s Razor

Entia non sunt multiplicanda praeter necessitatem

Entities should not be multiplied beyond necessity.

All things being equal, the simplest solution tends to be the best one.

William of Ockham (c. 1285–1349)


Page 6: Modern Machine Learning Regression Methods

Machine Learning Regression Methods

• Multiple Linear Regression (MLR)
• Kernel version of this method
• K Nearest Neighbours (kNN)
• Back-Propagation Neural Network (BPNN)
• Associative Neural Network (ASNN)


Page 7: Modern Machine Learning Regression Methods

Multiple Linear Regression

y = c_0 + c_1 x_1 + ... + c_M x_M

In matrix form Y = XC, with the least-squares solution

C = (X^T X)^(-1) X^T Y

where C = (c_0, c_1, ..., c_M)^T are the regression coefficients, Y = (y_1, ..., y_N)^T are the experimental property values, and X is the N × (M+1) matrix of descriptor values (with a column of ones for the intercept).

M < N !!!

Topliss: M < N/5 for good models
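A minimal NumPy sketch of the normal-equations solution above (the helper name mlr_fit and the data shapes are illustrative assumptions, not part of the slide):

```python
import numpy as np

def mlr_fit(X, y):
    """Multiple linear regression: C = (X^T X)^-1 X^T Y.
    X is the N x (M+1) descriptor matrix with a leading column of ones."""
    return np.linalg.solve(X.T @ X, X.T @ y)   # solving the system avoids an explicit inverse

# hypothetical example: N = 20 compounds, M = 3 descriptors
rng = np.random.default_rng(1)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
y = X @ np.array([0.5, 1.0, -2.0, 0.3]) + 0.05 * rng.normal(size=20)
C = mlr_fit(X, y)                              # regression coefficients c0..c3
y_pred = X @ C                                 # fitted property values
```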


Page 8: Modern Machine Learning Regression Methods

Mystery of the “Rule of 5”

John G. Topliss

J.G. Topliss & R.J. Costello, J. Med. Chem., 1972, Vol. 15, No. 10, pp. 1066-1068.


Page 9: Modern Machine Learning Regression Methods

Mystery of the “Rule of 5”

John G. Topliss

J.G. Topliss & R.J. Costello, J. Med. Chem., 1972, Vol. 15, No. 10, pp. 1066-1068.

For example, if R² = 0.40 is regarded as the maximum acceptable level of chance correlation, then the minimum number of observations required to test five variables is about 30, for 10 variables 50 observations, for 20 variables 65 observations, and for 30 variables 85 observations.


Page 10: Modern Machine Learning Regression Methods

Mystery of the “Rule of 5”

C. Hansch, K.W. Kim, R.H. Sarma, J. Am. Chem. Soc., 1973, Vol. 95, No. 19, pp. 6447-6449.

Topliss and Costello have pointed out the danger of finding meaningless chance correlations with three or four data points per variable.

The correlation coefficient is good and there are almost five data points per variable.

Topliss & Hansch: M < N/5 for good models
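A trivial helper encoding this rule of thumb (the function name and the fixed ratio of 5 simply restate the slide's heuristic; it is not a statistical guarantee):

```python
def max_descriptors(n_observations, ratio=5):
    """Topliss/Hansch rule of thumb from the slide: keep M < N/5."""
    return n_observations // ratio

print(max_descriptors(100))   # at most 20 descriptors for 100 compounds
```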


Page 11: Modern Machine Learning Regression Methods

Kernel ridge regression

• Regression + regularization, in feature space:

  min_w Σ_i (y_i − ⟨w, x_i⟩)² + λ‖w‖²,  and with a feature map φ:  min_w Σ_i (y_i − ⟨w, φ(x_i)⟩)² + λ‖w‖²

• Avoids large, canceling coefficients
• Bias unnecessary if samples and labels are centered (in feature space)
• Solution in the convex hull of samples: α = (K + λI_n)^(-1) y, where K is the kernel matrix and I_n the identity matrix
• Predict new samples x as f(x) = Σ_i α_i k(x_i, x)
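A compact NumPy sketch of kernel ridge regression along these lines (the Gaussian kernel and the parameter values are illustrative choices; the slide does not fix a kernel):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Dual coefficients: alpha = (K + lam * I_n)^-1 y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_new, X_train, alpha, gamma=1.0):
    """Prediction f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```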


Page 12: Modern Machine Learning Regression Methods

K Nearest Neighbors

Distances:

  Euclidean:  D_ij = sqrt( Σ_{k=1..M} (x_ik − x_jk)² )
  Manhattan:  D_ij = Σ_{k=1..M} |x_ik − x_jk|

Prediction from the k nearest neighbours:

  Non-weighted:  y_i^pred = (1/k) Σ_{j ∈ neighbours} y_j
  Weighted:      y_i^pred = Σ_{j ∈ neighbours} (1/D_ij) y_j / Σ_{j ∈ neighbours} (1/D_ij)
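Both variants fit in a few lines of NumPy (a sketch; the small constant guarding against zero distances is an assumption):

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=3, weighted=False):
    """kNN regression with Euclidean distances, as defined above."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # D_ij for all training cases
    idx = np.argsort(d)[:k]                         # indices of the k nearest neighbours
    if not weighted:
        return y_train[idx].mean()                  # non-weighted average
    w = 1.0 / (d[idx] + 1e-12)                      # inverse-distance weights
    return (w * y_train[idx]).sum() / w.sum()       # weighted average
```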


Page 13: Modern Machine Learning Regression Methods

Overfitting by Variable Selection in kNN

Golbraikh A., Tropsha A. Beware of q²! J. Mol. Graph. Model., 2002, 20, 269-276.


Page 14: Modern Machine Learning Regression Methods

Artificial Neuron

Inputs o_1, ..., o_5 arrive through weights w_i1, ..., w_i5.

  net_i = Σ_j w_ji o_j − t_i
  o_i = f(net_i)

Transfer function f, e.g. the sigmoid

  f(x) = 1 / (1 + e^(−x))

or the threshold (step) function

  f(x) = 1 if x ≥ 0,  f(x) = 0 if x < 0
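A one-neuron sketch of these formulas (NumPy; the example weights and inputs are arbitrary):

```python
import numpy as np

def sigmoid(x):
    """Logistic transfer function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(o, w, t):
    """o: outputs of the preceding neurons, w: weights w_ji, t: threshold t_i."""
    net = np.dot(w, o) - t        # net_i = sum_j w_ji * o_j - t_i
    return sigmoid(net)           # o_i = f(net_i)

print(neuron_output(np.array([0.2, 0.7, 1.0]), np.array([0.5, -0.3, 0.8]), 0.1))
```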


Page 15: Modern Machine Learning Regression Methods

Multilayer Neural Network

Input Layer → Hidden Layer → Output Layer

Neurons in the input layer correspond to descriptors, neurons in the output layer to the properties being predicted, and neurons in the hidden layer to nonlinear latent variables.


Page 16: Modern Machine Learning Regression Methods

Generalized Delta Rule

This is the application of the steepest-descent method to training backpropagation neural networks:

  Δw_ij = −η ∂R_emp/∂w_ij = η δ_j y_i,  with δ_j = f'(net_j) e_j

η – learning rate constant

1974: Paul Werbos
1986: David Rumelhart, James McClelland, Geoffrey Hinton
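A minimal sketch of one steepest-descent update for a single linear output neuron (a simplification; the full backpropagation of δ through hidden layers is omitted here):

```python
import numpy as np

def delta_rule_step(w, o, y_true, eta=0.1):
    """One gradient step w_ij <- w_ij - eta * dR_emp/dw_ij
    for a linear output neuron and squared error."""
    y_pred = np.dot(w, o)          # neuron output for this case
    e = y_pred - y_true            # error signal
    grad = e * o                   # dR_emp/dw_ij = e * o_i (up to a constant factor)
    return w - eta * grad
```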


Page 17: Modern Machine Learning Regression Methods

Multilayer Neural Network

Input Layer → Hidden Layer → Output Layer

The number of weights corresponds to the number of adjustable parameters of the method.


Page 18: Modern Machine Learning Regression Methods

Origin of the “Rule of 2”

The number of weights (adjustable parameters) for the case of one hidden layer:

  W = (I+1)H + (H+1)O

Parameter ρ = N/W; the recommended range was 1.8 ≤ ρ ≤ 2.2.
T.A. Andrea and H. Kalayeh, J. Med. Chem., 1991, 34, 2824-2836.

End of the “Rule of 2”

I.V. Tetko, D.J. Livingstone, A.I. Luik, J. Chem. Inf. Comput. Sci., 1995, 35, 826-833.
Baskin I.I. et al., Foundations Comput. Decision Sci., 1997, Vol. 22, No. 2, pp. 107-116.
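The weight count is easy to verify in code (the 10-5-1 network and the training-set size of 120 are hypothetical examples):

```python
def n_weights(n_inputs, n_hidden, n_outputs):
    """W = (I+1)*H + (H+1)*O for one hidden layer (bias terms counted as weights)."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

W = n_weights(10, 5, 1)   # 61 adjustable parameters
rho = 120 / W             # rho = N/W for a hypothetical training set of 120 compounds
print(W, round(rho, 2))
```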


Page 19: Modern Machine Learning Regression Methods

Overtraining and Early Stopping

[Plot: error curves for the training set, test set 1, and test set 2 during training; the minimum of the test-set error marks the point for early stopping.]
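In code, early stopping is just a loop that monitors the test-set error (a generic sketch; train_step and validate are assumed user-supplied callables, not a fixed API):

```python
import copy

def train_with_early_stopping(model, train_step, validate, max_epochs=1000, patience=20):
    """Stop when the monitored error has not improved for `patience` epochs
    and return the best model seen so far (the point for early stopping)."""
    best_err, best_epoch, best_model = float("inf"), 0, copy.deepcopy(model)
    for epoch in range(max_epochs):
        train_step(model)               # one pass over the training set
        err = validate(model)           # error on test set 1
        if err < best_err:
            best_err, best_epoch, best_model = err, epoch, copy.deepcopy(model)
        elif epoch - best_epoch >= patience:
            break
    return best_model, best_err
```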


Page 20: Modern Machine Learning Regression Methods

Associative Neural Network (ASNN)

A prediction for case i: x_i → ANN ensemble of M networks → z_i = (z_1i, ..., z_ki, ..., z_Mi)

Ensemble approach (simple average):

  z̄_i = (1/M) Σ_{k=1..M} z_ki

ASNN bias correction:

  z'_i = z̄_i + (1/k) Σ_{j ∈ N_k(x_i)} (y_j − z̄_j)

Pearson's (or Spearman's) correlation coefficient r_ij = R(z_i, z_j) > 0 in the space of residuals is used to detect the neighbours.

The correction of the neural network ensemble value is performed using the errors (biases) calculated for the neighbour cases of the analyzed case x_i detected in the space of neural network models.
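A rough sketch of this correction step (assumptions: Z holds the M ensemble predictions for each training case, z_query the M predictions for the new case, and neighbours are picked by Spearman correlation in model space, as the slide describes; the helper name asnn_correct is hypothetical):

```python
import numpy as np
from scipy.stats import spearmanr

def asnn_correct(Z, y_train, z_query, k=5):
    """Correct the ensemble average for a query case by the mean residual
    of its k nearest neighbours in the space of ensemble models."""
    z_bar = Z.mean(axis=1)                                   # simple ensemble average, training cases
    z_q = z_query.mean()                                     # simple ensemble average, query case
    corr = np.array([spearmanr(z_query, Z[i])[0] for i in range(len(Z))])
    nn = np.argsort(-corr)[:k]                               # k most correlated training cases
    bias = np.mean(y_train[nn] - z_bar[nn])                  # their average error (bias)
    return z_q + bias                                        # bias-corrected prediction
```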


Page 21: Modern Machine Learning Regression Methods

Correction of a model by the nearest neighbors


Page 22: Modern Machine Learning Regression Methods

[Figure: panels A, B (y vs. x, x ∈ [−3, 3], y ∈ [0, 1]) and C (x1 vs. x2) illustrating nearest neighbours for the Gauss function.]

Detection of nearest neighbors in the space of models uses invariants in "structure–property" space.


Page 23: Modern Machine Learning Regression Methods

ASNN local correction strongly improves prediction of extreme data

Tetko, Methods Mol. Biol., 2008, 458, 826-833.


Page 24: Modern Machine Learning Regression Methods

Prediction of similar properties

[Plot: training "LogP" data and the "LogD" curve, x ∈ [0, 3], y ∈ [0, 1].]


Page 25: Modern Machine Learning Regression Methods

10 new points are measured

[Plot: training "LogP" data and the "LogD" curve with the newly measured points, x ∈ [0, 3], y ∈ [0, 1].]


Page 26: Modern Machine Learning Regression Methods

Library mode (no retraining)

[Plot: training "LogP" data, the "LogD" target, and the library-mode "LogD" prediction, x ∈ [0, 3], y ∈ [0, 1].]


Page 27: Modern Machine Learning Regression Methods

Library mode vs. development of a new model using 10 cases

[Plot: the target function "LogD", the library-mode "LogD" prediction, and retraining from scratch, x ∈ [0, 3], y ∈ [0, 1].]


Page 28: Modern Machine Learning Regression Methods

How to validate a model?

• Training/validation set partition
  – How to select the validation set?
  – How many molecules should be in it?
• N-fold cross-validation


Page 29: Modern Machine Learning Regression Methods

What is n-fold cross-validation?

White & blue rectangles are training & validation sets, respectively.

n=5
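A minimal NumPy generator for such a split (the shuffling seed is an assumption; n = 5 as in the figure):

```python
import numpy as np

def n_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (training, validation) index arrays; each fold serves as the
    validation set ('blue') exactly once, the rest is the training set ('white')."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, n_folds)
    for i in range(n_folds):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, valid

for train_idx, valid_idx in n_fold_splits(20, n_folds=5):
    pass  # fit the model on train_idx, evaluate it on valid_idx
```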


Page 30: Modern Machine Learning Regression Methods


Page 31: Modern Machine Learning Regression Methods

Conclusions

• There are many machine learning methods
• Different problems may require different methods
• All methods can be prone to overfitting
• But all of them have facilities to tackle this problem
• To correctly evaluate the performance of your model, use a validation set or, better, n-fold cross-validation (a built-in option)


Page 32: Modern Machine Learning Regression Methods

Exam. Question 1

What is it?

1. Support Vector Regression
2. Backpropagation Neural Network
3. Partial Least Squares Regression


Page 33: Modern Machine Learning Regression Methods

Exam. Question 2

Which method is not prone to overfitting?

1. Multiple Linear Regression
2. Partial Least Squares
3. Support Vector Regression
4. Backpropagation Neural Networks
5. K Nearest Neighbours

None of them!


Page 34: Modern Machine Learning Regression Methods

Thank you for your attention!
