2nd session machine learning: feed-forward neural networks...
TRANSCRIPT
1
2nd Session
Machine learning: feed-forward
neural networks and
self-organizing maps
2
Recommended reading
• J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design: An Introduction, Wiley-VCH, Weinheim, 1999.
• Chemoinformatics - A Textbook, eds. Johann Gasteiger and Thomas Engel, Wiley-VCH, 2003.
• Handbook of Chemoinformatics, ed. Johann Gasteiger, Wiley-VCH, 2003.
3
Neural networks
Information processing systems
inspired by biological nervous systems.
Ability to learn from observations:
Extract knowledge
Identify relationships
Identify structures
Generalize
4
Statistical methods process information and ‘learn’.
The brain learns without statistical methods!
Neural networks simulate nervous systems using algorithms and mathematical models.
NNs are interesting from a neuroscience point of view, as models of the brain.
NNs are interesting to computer science as computational tools.
Neural networks
5
input
output
A black box?
Neural networks
6
input
output
Connected
functional units
NEURONS
Neural networks
7
The biological neuron
[Figure: cell body, dendrites, axon, axon terminal.]
The human nervous system has ca. 10^11 neurons (and on the order of 10^15 synapses).
Transmission of an electric signal between dendrites and axons occurs
through the transport of ions.
8
Neurons in the superficial layers of the visual cortex in the brain of a mouse.
PLoS Biology Vol. 4, No. 2, e29 DOI: 10.1371/journal.pbio.0040029
The biological neuron
9
Synapses – neuron junctions
Axon – Dendrite : chemical signal (neurotransmitter).
Signal is transmitted in only one direction.
Some neurons are able to modify the signal transmission at the synapses.
10
Loss of connections between neurons in Alzheimer's disease
Synapses – neuron junctions
11
Neural networks
Similar neurons in different species.
The same type of signal.
What is essential is the whole set of neurons, and the connections.
THE NETWORK
12
Signal transmission at the synapse
The transmitted signal depends on the received signal and
the synaptic strength.
In artificial neurons, the synaptic strength is called weight.
p = w × s
Signal s is sent from the previous neuron; the synapse has weight w; signal p arrives at the neuron after crossing the synapse.
13
Synapses and learning
• Learning and memory are believed to result from long-term changes in synaptic strength.
• In artificial neural networks, learning occurs by correcting the weights.
14
Weights and net input
Each neuron receives signals (s_i) from many neurons, each through a synapse with its own weight (w_i). The net input is the weighted sum of the incoming signals:
Net = Σ w_i × s_i   (sum over the n inputs)
Example (signals 0.4, −0.1, −0.3, 0.2; weights 0.2, 0.1, 0.5, 0.2):
Net input = 0.4×0.2 + (−0.1)×0.1 + (−0.3)×0.5 + 0.2×0.2 = 0.08 − 0.01 − 0.15 + 0.04 = −0.04
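For illustration, a minimal Python sketch of this computation (the variable names are ours, not from the slides):

# Net input of an artificial neuron: weighted sum of the incoming signals.
signals = [0.4, -0.1, -0.3, 0.2]   # signals s_i from previous neurons
weights = [0.2, 0.1, 0.5, 0.2]     # synaptic weights w_i
net = sum(w * s for w, s in zip(weights, signals))
print(net)  # approx. -0.04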
15
Transfer functions
The net input is modified by a transfer function into an output: Out = f(Net)
16
Sigmoid transfer function
Out = 1 / (1 + e^(−Net))
Important: it is non-linear!
Derivative is easy to calculate:
d(Out) / d(Net) = Out × (1 − Out)
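A minimal Python sketch of the sigmoid and its derivative (the function names are ours):

import math

def sigmoid(net):
    # Out = 1 / (1 + e^(-Net))
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(out):
    # d(Out)/d(Net) expressed in terms of the output: Out * (1 - Out)
    return out * (1.0 - out)

out = sigmoid(-0.04)             # approx. 0.49
slope = sigmoid_derivative(out)  # approx. 0.25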
17
Simulation of an artificial neuron
http://lcn.epfl.ch/tutorial/english/aneuron/html/index.html
18
The ‘100 steps paradox’
• A neuron needs approximately one millisecond (10^−3 s) to recover after firing.
• The human brain is able to perform intelligent processes, such as recognizing a friend's face or reacting to some danger, in approximately one tenth of a second.
• Highly complex tasks therefore have to be performed in fewer than 100 steps?!
• Conclusion: many tasks must be performed simultaneously, in parallel.
19
Neural network
Input layer
Hidden layer
Output layer
Input data
Output values
...
20
Architecture of a neural network
...
• Number of inputs and outputs
• Number of layers
• Number of neurons in each layer
• Number of weights in each neuron
• How neurons are connected
• Which neurons receive corrections
21
The ‘feed-forward’ or ‘backpropagation’ NN
Input data
22
The ‘backpropagation’ learning algorithm
1. Assignment of random values to the weights.
2. Input of an object X.
3. Computation of output values from all neurons in all layers.
4. Comparison of final output values with target values and
computation of an error.
5. Computation of corrections to be applied to the weights of
the last layer.
6. Computation of corrections to be applied to the weights of
the penultimate layer.
7. Application of corrections.
8. Return to step 2.
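A minimal sketch of these steps in Python, for a network with one hidden layer (the layer sizes, learning rate, and the omission of bias terms are our simplifying assumptions, not part of the slides):

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 1
W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))   # step 1: random weights
W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden))
rate = 0.5                                      # learning rate

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_step(x, target):
    global W1, W2
    hidden = sigmoid(W1 @ x)                    # step 3: outputs of all layers
    out = sigmoid(W2 @ hidden)
    error = target - out                        # step 4: error vs. target values
    delta_out = error * out * (1 - out)         # step 5: last-layer corrections
    delta_hid = (W2.T @ delta_out) * hidden * (1 - hidden)  # step 6: penultimate layer
    W2 += rate * np.outer(delta_out, hidden)    # step 7: apply corrections
    W1 += rate * np.outer(delta_hid, x)
    return error

Steps 2 and 8 correspond to calling train_step repeatedly, once per object of the training set.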
23
Introduction of a momentum parameter µ.
Correction = computed correction + µ × previous correction
The ‘backpropagation’ learning algorithm
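A small sketch of this correction rule (the value of µ is an illustrative choice):

mu = 0.9          # momentum parameter (illustrative value)
previous = 0.0    # correction applied in the previous cycle

def momentum_correction(computed):
    # Correction = computed correction + mu * previous correction
    global previous
    correction = computed + mu * previous
    previous = correction
    return correction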
24
Steps in the training of a BPG NN
• Analysis of the problem: Which inputs? How many? Which output(s)? How many?
• Data pre-processing: normalization (the output varies within ]0,1[!); splitting into training, test, and prediction sets (a minimal splitting sketch follows below).
• Training with the training set and monitoring with the test set (to decide when training shall be stopped).
• Repetition of the training with different parameters (number of hidden neurons, learning rate, and momentum) until the best network for the test set is found.
• Application of the best network found to the prediction set.
• Evaluation
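A minimal sketch of the splitting step mentioned above (the 60/20/20 fractions and the function name are illustrative assumptions):

import numpy as np

def split_data(X, y, f_train=0.6, f_test=0.2, seed=0):
    # Shuffle the objects, then split into training, test, and prediction sets.
    idx = np.random.default_rng(seed).permutation(len(X))
    n_tr = int(f_train * len(X))
    n_te = int(f_test * len(X))
    tr, te, pr = idx[:n_tr], idx[n_tr:n_tr + n_te], idx[n_tr + n_te:]
    return (X[tr], y[tr]), (X[te], y[te]), (X[pr], y[pr])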
25
Monitoring the training of a BPG NN
[Plot: error along the training, with the point marked where to stop training.]
26
BPG NNs using the JATOON software: http://www.dq.fct.unl.pt/staff/jas/jatoon
[Plot: error curves for the training set and the test set, indicating the optimum number of epochs.]
27
BPG NNs in QSPR
Example: prediction of ¹H NMR chemical shifts
[Figure: a molecule with its H atoms labeled A–G and the corresponding peaks in the ¹H NMR spectrum; x-axis: chemical shift (ppm).]
BPG NNs
Training set with exp. values
Input: descriptors of H-atoms
Output: chemical shift
Y. Binev, J. Aires-de-Sousa; J. Chem. Inf. Comput. Sci. 2004, 44(3), 940-945.
28
Predictions with ASNN
Test with 952 + 259 protons
[Scatter plot: predicted vs. experimental chemical shift, 1–9 ppm on both axes, for the aromatic, π, aliphatic, and rigid protons of sets A and B.]
R² = 0.9830
29
Prediction of ¹H NMR spectra using BPG NNs
The SPINUS program: www.dq.fct.unl.pt/spinus
30
Self-organizing maps
31
Kohonen neural networks: “self-organizing maps (SOMs)”
Algebraic view of a data set (values, signals, magnitudes, ...)
vs.
Topological view of a data set (relationships between information)
32
Kohonen neural networks: “self-organizing maps (SOMs)”
These are two-dimensional arrays of neurons that reflect as well as possible the topology of the information, that is, the relationships between individual pieces of data, and not their magnitude.
Compression of information: mapping onto a 2D surface.
“Self-Organizing Topological Feature Maps” preserve topology.
33
Kohonen neural networks
Goal
Mapping similar signals
onto neighboring neurons
34
Kohonen neural networks
Similar signals end up in neighboring neurons.
Do similar signals correspond to the same class? YES / NO
35
Kohonen neural networks: Architecture
One layer of neurons.
36
Kohonen neural networks: Architecture
One layer of neurons.
n weights for each neuron (n = number of inputs)
37
Kohonen neural networks: Topology
Definition of distance between neurons
Neuron
1st neighborhood
2nd neighborhood
The output of a neuron only affects neighboring neurons.
38
Kohonen neural networks: Toroidal surface
Neighborhood
Neuron
1st neighborhood
2nd neighborhood
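A sketch of a neuron-to-neuron distance on such a toroidal map, assuming square neighborhoods (so the 1st neighborhood is the ring of 8 surrounding neurons; the function name is ours):

def toroidal_distance(a, b, n_rows, n_cols):
    # 0 = same neuron, 1 = 1st neighborhood, 2 = 2nd neighborhood, ...
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, n_rows - dr)   # wrap around vertically (torus)
    dc = min(dc, n_cols - dc)   # wrap around horizontally (torus)
    return max(dr, dc)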
39
Kohonen neural networks: Competitive learning
After the input, only one neuron is activated (the central neuron, or winning neuron).
The central neuron is the one with the weights most similar to the input.
Traditionally, similarity = Euclidean distance:
Σᵢ (wᵢ − xᵢ)²   (sum over i = 1, …, n)
n – number of inputs
w – value of the weight
x – value of the input
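A minimal sketch of the selection of the winning neuron (the array shapes and function name are our assumptions):

import numpy as np

def winning_neuron(weights, x):
    # weights: (n_rows, n_cols, n) array; x: input vector of length n.
    # Returns the position of the neuron whose weights have the smallest
    # Euclidean distance to the input.
    d2 = ((weights - x) ** 2).sum(axis=-1)   # squared Euclidean distances
    return np.unravel_index(np.argmin(d2), d2.shape)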
40
Kohonen neural networks: Competitive learning
[Figure: the input vector is compared with the weight vectors of all neurons; the winning neuron is highlighted.]
41
Kohonen neural networks: Competitive learning
The weights of the winning neuron are corrected to make them even more similar to the input. The weights of neighboring neurons are also adapted, with the same goal but to a lesser extent.
Neuron
1st neighborhood
2nd neighborhood
42
Kohonen neural networks: Competitive learning
The correction of the neighboring neurons after the activation of a neuron depends on (see the sketch below):
1. The distance to the winning neuron (the farther, the smaller the correction).
2. The stage of the training (at the beginning, corrections are more drastic).
3. The difference between the weight and the input (the larger the difference, the stronger the correction).
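A sketch of a weight update obeying these three rules, reusing the toroidal_distance helper sketched earlier (the linear decay schedules and default parameters are illustrative assumptions):

import numpy as np

def som_update(weights, x, winner, t, t_max, rate0=0.5, radius0=3):
    n_rows, n_cols, _ = weights.shape
    rate = rate0 * (1 - t / t_max)                  # rule 2: more drastic early on
    radius = max(1, int(radius0 * (1 - t / t_max)))
    for r in range(n_rows):
        for c in range(n_cols):
            d = toroidal_distance((r, c), winner, n_rows, n_cols)
            if d <= radius:
                scale = rate * (1 - d / (radius + 1))         # rule 1: farther, smaller
                weights[r, c] += scale * (x - weights[r, c])  # rule 3: proportional to the difference
    return weights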
43
Kohonen neural networks: Normalization of data
The activation of neurons, and the corrections, depend on the Euclidean distance.
If the values of one descriptor span a wider range than those of another, that descriptor will have a larger impact on the result.
Therefore, for all descriptors to have a similar impact, NORMALIZATION of the data is required.
44
Kohonen neural networks: Normalization of data
Example of normalization:
1. Find the maximum (MAX) and the minimum (MIN) value for a
descriptor.
2. Replace each value x by (x-MIN)/(MAX-MIN)
(now the descriptor varies between 0 and 1)
or by 0.1 + 0.8×(x-MIN)/(MAX-MIN)
(the descriptor will vary between 0.1 and 0.9, useful for BPG
networks)
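The same recipe as a small Python sketch (the function name is ours):

def minmax_normalize(values, low=0.0, high=1.0):
    # x -> low + (high - low) * (x - MIN) / (MAX - MIN)
    # Use low=0.1, high=0.9 for BPG networks.
    lo, hi = min(values), max(values)
    return [low + (high - low) * (x - lo) / (hi - lo) for x in values]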
45
Kohonen neural networks: Normalization of data
Another example of normalization (z-normalization):
1. Calculate the average (aver) and the standard deviation (sd) for a
descriptor.
2. Replace each value x by (x-aver)/sd
(the normalized descriptor will have average = 0 and standard
deviation = 1)
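And the z-normalization as a sketch (using the sample standard deviation; the function name is ours):

import statistics

def z_normalize(values):
    # Replace each x by (x - aver) / sd; the result has average 0 and sd 1.
    aver = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - aver) / sd for x in values]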
46
Kohonen neural networks: Application
Geographical classification of crude oil samples for the identification of spill sources, from chemical features of the oils.
Database of chemical features of oils from different geographical origins.
Sample (chemical features) → NEURAL NETS → Geographical class
A. M. Fonseca, J. L. Biscaya, J. Aires-de-Sousa, A. M. Lobo, "Geographical classification of crude oils by Kohonen self-organizing maps", Anal. Chim. Acta 2006, 556 (2), 374-382.
47
Chemical features of oils
Content in several compounds, determined by GC/MS
Examples
• (22R)17α(H),21β(H)-30,31-Bishomohopane / 17α(H),21β(H)-Hopane
• 18α(H)-Oleanane / 17α(H),21β(H)-Hopane
• 1-Isopropyl-2-methylnaphthalene
• 3-Methylphenanthrene
• 1-Methyldibenzothiophene
[Structures: 3-methylphenanthrene and 18α(H)-oleanane.]
48
Vector input: GC/MS descriptors for a sample of oil.
[Figure: Kohonen network with its weights; the winning neuron is highlighted.]
49
Results
Training set:
• 133 samples
• 20 different geographical origins
• 21 descriptors
• Good clustering
• 97% correct predictions
Test set:
• 55 samples
• 70% correct predictions
50
Input
layer
Output
layer
Counterpropagation (CPG) neural network
SOM with an output layer
51
Training of a CPG neural network
[Figure: an input is submitted; the weights of the winning neuron are corrected at the input layer, and the corresponding weights are corrected at the output layer.]
52
Prediction by a CPG neural network
[Figure: an input is submitted at the input layer; the prediction is read at the output layer.]
53
A CPG neural network with several outputs
[Figure: training and prediction; input layer, output layers, winning neuron.]
54
CPGNN: application
Ability of a compound to bind GPCRs (G-protein-coupled receptors).
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
55
CPGNN: application
Prediction of the ability to bind GPCRs (G-protein-coupled receptors).
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
CPG network of size 250×250
Training set: 24870 molecules randomly taken from catalogs (“drug-like”) plus 1709 known GPCR ligands
Input: 225 descriptors (RDF descriptors)
Output: 9 levels (GPCR, and the sub-families “adrenalin, bradykinin, dopamine, endothelin, histamine, opioid, serotonin, vasopressin”). Binary values (0/1) according to ‘YES’ or ‘NO’.
56
CPGNN: application to predict GPCR binding
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
Results: 1st output level (GPCR ligand)
Weight values are translated into colors.
Regions activated by ligands
57
CPGNN: application to predict GPCR binding
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
Results: output levels nr 4 (‘dopamine’) and nr 7 (‘opioid’)
58
CPGNN: application to predict GPCR binding
P. Selzer, P. Ertl, QSAR Comb. Sci. 2005, 24, 270-276; J. Chem. Inf. Model. 2006, 46 (6), 2319-2323.
Results:
Test set
(25096 non-GPCR and 1490 GPCR)
71% of ligands correctly predicted
18% false positives
59
SOMs in the JATOON program: http://www.dq.fct.unl.pt/staff/jas/jatoon
‘Paste’ data
60
SOMs in the JATOON program: http://www.dq.fct.unl.pt/staff/jas/jatoon
Visualization of the
distribution of the objects.
Neurons colored
according to the classes
of the objects activating
them.
61
SOMs in the JATOON program: http://www.dq.fct.unl.pt/staff/jas/jatoon
Distribution of the
objects.
62
SOMs in the JATOON program: http://www.dq.fct.unl.pt/staff/jas/jatoon
Inspection of the weights
at level 2 of the input
layer.