lecture 2008 1
Bioinspired Computing: Lecture 3
Biological Neural Networks and Artificial Neural Networks
Based on slides from Netta Cohen
lecture 2008 2
Last week:
We introduced swarm intelligence. We saw how many simple agents can follow simple rules that allow them to collectively perform more complex tasks.
Today:
Biological systems whose manifest function is information
processing: computation, thought, memory, communication
and control. We begin a dissection of a brain:
How different is a brain from an artificial computer?
How can we build and use artificial neural networks?
lecture 2008 3
Investigating the brain
Imagine landing on an abandoned alien planet and finding thousands of alien computers. You and your crew’s mission is to find out how they work. What do you do?
Summon Scottie, your engineer, to disassemble the machines into component parts, test each part (electronically, optically, chemically…), decode the machine language, and study how components are connected.
Summon Data, your software wiz, to connect to the input & output ports of a machine, find a language to communicate with it & write computer programs to test the system’s response by measuring its speed, efficiency & performance at different tasks.
[Figure: the computer as a program with input and output; a component (part #373a) with inputs and outputs]
lecture 2008 4
The brain as a computer
Higher level functions in animal behaviour
• Gathering data (sensation)
• Inferring useful structures in data (perception)
• Storing and recalling information (memory)
• Planning and guiding future actions (decision)
• Carrying out the decisions (behaviour)
• Learning consequences of these actions
Hardware functions and architectures
• 10 billion neurons in human cortex
• 10,000 synapses (connections) per neuron
• Machine language: 100mV, 1-2msec spikes (action potential)
• Specialised regions & pathways (visual, auditory, language…)
lecture 2008 5
The brain as a computer
The brain:
• Special task: program often hard-coded into system.
• Hardware not hard: plastic, rewiring.
• No clear hierarchy. Bi-directional feedback up & down the system.
• Unreliable components. Parallelism, redundancy appear to compensate.
• Output doesn’t always match input: internal state is important.
• Development & evolutionary constraints are crucial.
versus an artificial computer:
• Universal, general-purpose. Software: general, user-supplied.
• Hardware is hard: only upgraded in discrete units.
• Obvious hierarchy: each component has a specific function.
• Once burned in, circuits run without failure for extended lifetimes.
• Input-output relations are well-defined.
• Engineering design depends on engineer. Function is not an issue.
lecture 2008 6
Neuroscience pre-history
• 200 AD: Greek physician Galen hypothesises that nerves carry signals back & forth between sensory organs & the brain.
• 17th century: Descartes suggests that nerve signals account for reflex movements.
• 19th century: Helmholtz discovers the electrical nature of these signals, as they travel down a nerve.
• 1838-9: Schleiden & Schwann systematically study plant & animal tissue. Schwann proposes the theory of the cell (the basic unit of life in all living things).
• Mid-1800s: anatomists map the structure of the brain.
but…
The microscopic composition of the brain remains elusive. A raging debate surrounds early neuroscience research, until...
lecture 2008 7
The neuron doctrine: Ramon y Cajal (1899)
1) Neurons are cells: distinct entities (or agents).
2) Inputs & outputs are received at junctions called synapses.
3) Input & output ports are distinct. Signals are uni-directional from input to output.
Today, neurons (or nerve cells) are regarded as the basic information processing unit of the nervous system.
[Figure: inputs → neuron → outputs]
lecture 2008 8
lecture 2008 9
Neuron details
lecture 2008 10
Organisation of neurons
lecture 2008 11
Ion channels and spiking
• Membrane potential negative (inside relative to outside)
• Na+ would like to rush in but can’t
• Depolarisation opens Na+ channels, Na+ flows in
• Chain reaction! More Na+ flows in!
• This opens K+ channels, K+ flows out: hyperpolarisation
lecture 2008 12
Macaque brain (Felleman & van Essen 1991)
lecture 2008 13
lecture 2008 14
The neuron as a transistor
• Both have well-defined inputs and outputs.
• Both are basic information processing units that comprise computational networks.
If transistors can perform logical operations, maybe neurons can too?
Neuronal function is typically modelled by a combination of
• a linear operation (sum over inputs) and
• a nonlinear one (thresholding).
[Figure: input → neuron → output]
This simple representation relies on Cajal’s concept of
lecture 2008 15
Machine language
The basic “bit” of information is represented by neurons in spikes. The cell is said to be either at rest or active. A spike (action potential) is a strong, brief electrical pulse. Since these action potentials are mostly identical, we can safely refer to them as all-or-none signals.
Why Spikes?
Why don’t neurons use analog signals? One answer lies in the network architecture: signals cover long distances (both within the brain and throughout the body). Reliable transmission requires strong pulses.
lecture 2008 16
Computation of a pyramidal neuron
[Figure: many inputs (dendrites) → soma → axon → single all-or-none output]
lecture 2008 17
From transistors to networks
We can now summarise our working principles:
• The basic computational unit of the brain is the neuron.
• The machine language is binary: spikes.
• Communication between neurons is via synapses.
However, we have not yet asked how information is encoded in the brain, how it is processed in the brain, and whether what goes on in the brain is really ‘computation’.
lecture 2008 18
Information codes
[Figure: neural codes (temporal code, rate code) and population / distributed codes, in the presence of noise]
Examples of both neural codes and distributed representations have been found in the brain. Examples in the visual system: colour representation, face recognition, orientation, motion detection, & more…
http://www.cs.stir.ac.uk/courses/31YF/Notes/Notes_NC.html
lecture 2008 19
Information content
Example. A spike train produced by a neuron over an interval of 100ms is recorded. Neurons can produce a spike every 2ms, so the interval holds at most 50 spikes.
Therefore, 51 different rates (individual code words) can be produced by this neuron.
In contrast, if the neuron were using temporal coding, up to 2^50 different words could be represented.
In this sense, temporal coding is much more powerful.
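The counting argument on this slide can be checked directly. This is a minimal sketch, assuming the 100 ms window is divided into 2 ms bins that each hold either one spike or none; the bin model and the function name `count_code_words` are assumptions made for illustration.

```python
# Counting code words for a neuron observed over a fixed window.
# Assumption: time is divided into bins of one minimum inter-spike
# interval, and each bin holds either a spike (1) or no spike (0).

def count_code_words(window_ms: int, min_interval_ms: int) -> tuple[int, int]:
    """Return (rate_words, temporal_words) for the given window."""
    n_bins = window_ms // min_interval_ms
    # Rate code: only the spike count matters, so 0..n_bins spikes.
    rate_words = n_bins + 1
    # Temporal code: every distinct binary spike pattern is its own word.
    temporal_words = 2 ** n_bins
    return rate_words, temporal_words


rate_words, temporal_words = count_code_words(window_ms=100, min_interval_ms=2)
print(rate_words)      # 51
print(temporal_words)  # 1125899906842624, i.e. 2**50
```

The rate count grows linearly with the number of bins while the temporal count grows exponentially, which is the sense in which temporal coding is more powerful here.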
lecture 2008 20
Circuitry depends on neural code
Temporal codes rely on noise-free signal transmission. Thus, we would expect to find very few ‘redundant’ neurons with co-varying outputs in that network. Accordingly, an optimal temporal coding circuit might tend to eliminate redundancy in the pattern of inputs to different neurons.
On the other hand, if neural information is carried by a noisy rate-based code, then noise can be averaged out over a population of neurons. Population coding schemes, in which many neurons represent the same information, would therefore be the norm in those networks.
Experiments on various brain systems find either coding system, and in some cases combinations of temporal and rate coding.
lecture 2008 21
Neuronal computation
Having introduced neurons, neuronal circuits and even information codes with well defined inputs and outputs, we still have not mentioned the term computation. Is neuronal computation anything like computer computation?
[Figure: a tape reading 101111, processed by the rules below]
If read 1, write 0, go right, repeat.
If read 0, write 1, HALT!
If read blank, write 1, HALT!
In a computer program, variables have initial states, there are possible transitions, and a program specifies the rules. The same is true for machine language. To obtain an answer at the end of a computation, the program must HALT.
Does the brain initialise variables? Does the brain ever halt?
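The three rules on this slide can be run as a tiny program. The sketch below is an illustration under two assumptions that the slide does not state: the head starts at the leftmost cell, and the third rule applies to a blank cell past the end of the tape. The function name `run` is made up for this example.

```python
# Sketch of the three-rule program from the slide, run on a tape.
# Assumptions: the head starts at the leftmost cell, and the third
# rule fires on a blank cell just past the end of the tape.

BLANK = " "


def run(tape: str) -> str:
    cells = list(tape)
    head = 0
    while True:
        symbol = cells[head] if head < len(cells) else BLANK
        if symbol == "1":
            cells[head] = "0"   # read 1: write 0, go right, repeat
            head += 1
        elif symbol == "0":
            cells[head] = "1"   # read 0: write 1, HALT
            break
        else:
            cells.append("1")   # read blank: write 1, HALT
            break
    return "".join(cells)


print(run("101111"))  # 011111
print(run("111"))     # 0001
```

Read with the least significant bit on the left, the program adds one to a binary number, and it always halts because the head only ever moves right along a finite tape.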
lecture 2008 22
Association: an example of bio-computation
Question: How does the brain associate some memory with a given input?
Answer: The input causes the network to enter an initial state. The state of the neural network then evolves until it reaches some new stable state.
The new state is associated with the input state.
One recasting of biological brain function in these computational terms was proposed by John Hopfield in the 1980s as a model for associative memory.
lecture 2008 23
Association (cont.)
Whatever initial condition is chosen, the system will follow a well-defined route through state-space that is guaranteed to always reach some stable point (i.e., pattern of activity).
Hopfield’s ideas were strongly motivated by existing theories of self-organisation in neural networks. Today, Hopfield nets are a successful example of bio-inspired computing (but no longer believed to model computation in the brain).
[Figure: trajectories in a schematic state space]
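The idea of a state that evolves until it settles on a stored pattern can be sketched in a few lines. This is a minimal illustration, not Hopfield's original formulation: it assumes neuron states of +1 and -1, a Hebbian outer-product rule for storing a pattern, and updates applied one neuron at a time. The names `store` and `recall` are made up for this example.

```python
# Minimal associative-memory sketch in the spirit of a Hopfield net.
# Assumptions: states are +1/-1, weights come from a Hebbian
# outer-product rule, and neurons are updated one at a time.

def store(patterns):
    """Build a symmetric weight matrix with zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, max_sweeps=10):
    """Update neurons in turn until a full sweep changes nothing."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(weights[i][j] * s[j] for j in range(len(s)))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:      # stable state reached
            break
    return s


pattern = [1, -1, 1, -1, 1, -1]
W = store([pattern])
cue = [1, 1, 1, -1, 1, -1]   # the stored pattern with one element flipped
print(recall(W, cue))        # [1, -1, 1, -1, 1, -1]
```

The corrupted cue plays the role of the input, and the stable state it falls into is the memory associated with it.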
lecture 2008 24
Learning
No discussion of the brain, or nervous systems more generally, is complete without mention of learning.
• What is learning?
• How does a neural network ‘know’ what computation to perform?
• How does it know when it gets an ‘answer’ right (or wrong)?
• What actually changes as a neural network undergoes ‘learning’?
[Figure: brain, body and environment; sensory inputs to the brain, motor outputs from the brain]
lecture 2008 25
Learning (cont.)
Learning can take many forms:
• Supervised learning
• Reinforcement learning
• Association
• Conditioning
• Evolution
At the level of neural networks, the best understood forms of learning occur in the synapses, i.e., the strengthening and weakening of connections between neurons. The brain uses its own learning algorithms to define how connections should change in a network.
lecture 2008 26
Learning from experience
How do the neural networks form in the brain? Once formed, what determines how the circuit might change?
In 1949, Donald Hebb, in his book "The Organization of Behavior", showed how basic psychological phenomena of attention, perception & memory might emerge in the brain.
Hebb regarded neural networks as a collection of cells that can collectively store memories. Our memories reflect our experience.
How does experience affect neurons and neural networks? How do neural networks learn?
lecture 2008 27
Synaptic Plasticity
Definition of Learning: experience alters behaviour
The basic experience in neurons is spikes. Spikes are transmitted between neurons through synapses.
Hebb suggested that connections in the brain change in response to experience.
[Figure: spikes of a pre-synaptic cell and a post-synaptic cell against time; the post-synaptic spike follows after a delay]
Hebbian learning: If the pre-synaptic cell causes the post-synaptic cell to fire a spike, then the connection between them will be enhanced. Eventually, this will lead to a path of ‘least resistance’ in the network.
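A common textbook simplification of Hebb's idea strengthens a connection whenever the pre- and post-synaptic cells are active together. The sketch below uses that simplification, which ignores the delay between the two spikes. It assumes binary activity (1 = spike, 0 = silent) and a made-up learning rate `eta`; neither comes from the slides.

```python
# Simplified Hebbian update: strengthen the synapse when both cells fire.
# Assumptions: binary activity per time step and a learning rate eta
# chosen only for illustration.

def hebbian_update(weight: float, pre: int, post: int, eta: float = 0.1) -> float:
    """Return the new weight after one time step."""
    return weight + eta * pre * post


# Illustrative activity of the two cells over five time steps.
pre_spikes = [1, 0, 1, 1, 0]
post_spikes = [1, 0, 0, 1, 1]

w = 0.0
for pre, post in zip(pre_spikes, post_spikes):
    w = hebbian_update(w, pre, post)

print(round(w, 2))  # 0.2: the cells fired together on two of the five steps
```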
lecture 2008 28
Today... From biology to information processing
Next time... Artificial neural networks (part 1)
Focus on the simplest cartoon models of biological neural nets. We will build on lessons from today to design simple artificial neurons and networks that perform useful computational tasks.
At the turn of the 21st century, “how does it work” remains an open question. But even the kernel of understanding and the simplified models we already have for various brain functions are priceless in providing useful intuition and powerful tools for bioinspired computation.
lecture 2008 29
The Appeal of Neural Computing
The only intelligent systems that we know of are biological. In particular most brains share the following feature in their neural architecture: they are massively parallel networks organised into interconnected hierarchies of complex structures.
For computer scientists, many natural systems appear to share many attractive properties:
• speed, tolerance, robustness, flexibility, self-driven dynamic activity
In addition, they are very good at some tasks that computers are typically poor at:
• recognising patterns, balancing conflicts, sensory-motor coordination, interaction with the environment, anticipation, learning… even curiosity, creativity & consciousness.
lecture 2008 30
The first artificial neuron model
In analogy to a biological neuron, we can think of a virtual neuron that crudely mimics the biological neuron and performs analogous computation.
Just like biological neurons, this artificial neuron will have:
• Inputs (like biological dendrites), which carry signals to the cell body.
• A body (like the soma), which sums over inputs to compute the output, and
• Outputs (like synapses on the axon), which transmit the output downstream.
[Figure: inputs → Σ (cell body) → output]
The artificial neuron is a cartoon model that will not have all the biological complexity of real neurons. How powerful is it?
lecture 2008 31
Early history (1943)
In this seminal paper, Warren McCulloch and Walter Pitts invented the first artificial (MP) neuron, based on the insight that a nerve cell will fire an impulse only if its threshold value is exceeded. MP neurons are hard-wired devices, reading pre-defined input-output associations to determine their final output. Despite their simplicity, M&P proved that a single MP neuron can perform universal logic operations.
A network of such neurons can therefore do anything a Turing machine can do, but with a much more flexible (and potentially very parallel) architecture.
McCulloch & Pitts (1943). “A logical calculus of the ideas immanent in nervous activity”, Bulletin of Mathematical Biophysics, 5, 115-133.
lecture 2008 32
The McCulloch-Pitts (MP) neuron
The “computation” consists of "adders" and a threshold.
• Inputs x are binary: 0, 1.
• Each input has an assigned weight w.
• Weighted inputs are summed in the cell body.
• Neuron fires if sum exceeds (or equals) activation threshold θ.
• If the neuron fires, the output = 1.
• Otherwise, the output = 0.
$$\text{output} = \begin{cases} 1 & \text{if } \sum_i w_i x_i \ge \theta \\ 0 & \text{if } \sum_i w_i x_i < \theta \end{cases} \qquad (\text{sum over all } i)$$
Note: an equivalent formalism assigns θ = 0 and, instead of a threshold, introduces an extra bias input, such that bias * wbias = -θ.
[Figure: inputs x1, x2, x3, …, xn, multiplied by weights w1, w2, w3, …, wn, together with bias * wb, feed the summing cell body, which produces the output]
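The firing rule of the MP neuron translates almost directly into code. The sketch below is illustrative only: the function names and the example weights are assumptions. It checks on a small example that the threshold form and the bias form (threshold 0, extra input with bias * wbias equal to minus the threshold) give the same output.

```python
from itertools import product


def mp_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum reaches the threshold, else output 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def mp_neuron_with_bias(inputs, weights, bias, bias_weight):
    """Equivalent form: threshold fixed at 0, plus one extra bias input."""
    return mp_neuron(list(inputs) + [bias], list(weights) + [bias_weight], 0)


# Example values chosen for illustration only.
weights, threshold = [1, 1, 1], 1.5

for x in product([0, 1], repeat=3):
    a = mp_neuron(x, weights, threshold)
    b = mp_neuron_with_bias(x, weights, bias=1, bias_weight=-threshold)
    print(x, a, b)  # the last two columns agree on every row
```

With these example values the neuron fires whenever at least two of its three inputs are active.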
lecture 2008 33
Logic gates with MP neurons
For binary logic gates, with only one input, possible outputs are described by the following truth tables:
IN 1   OUT 1 (Always 0)   OUT 2 (IDENTITY)   OUT 3 (NOT)   OUT 4 (Always 1)
0      0                  0                  1             1
1      0                  1                  0             1
For example, NOT x is implemented by an MP neuron with input x, weight w = -1 and threshold θ = -0.5.
Exercise: Find w and θ for the 3 remaining gates.
lecture 2008 34
Logic gates with MP neurons (cont.)
With two binary inputs, there are 4 possible inputs and 2^4 = 16 corresponding truth tables (outputs)!
For example, the AND gate:
IN 1   IN 2   OUT
0      0      0
0      1      0
1      0      0
1      1      1
Here is a compact, graphical representation of the same truth table:
          IN 1
          0   1
IN 2  0   0   0
      1   0   1
The AND gate implemented in the MP neuron: inputs x1 and x2, weights w1 = 1 and w2 = 1, threshold θ = +1.5, output x1 AND x2.
Exercise: Find w and θ for OR & NAND.
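The two gates given on these slides, NOT (w = -1, threshold -0.5) and AND (weights 1 and 1, threshold +1.5), can be checked by enumerating their truth tables. A small sketch, with a helper `mp_neuron` defined here for illustration (the name is not from the lecture):

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum reaches the threshold, else output 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def NOT(x):
    return mp_neuron([x], [-1], -0.5)


def AND(x1, x2):
    return mp_neuron([x1, x2], [1, 1], 1.5)


print([NOT(x) for x in (0, 1)])                       # [1, 0]
print([AND(a, b) for a in (0, 1) for b in (0, 1)])    # [0, 0, 0, 1]
```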
lecture 2008 35
Computational power of MP neurons
Universality: NOT & AND can be combined to perform any logical function; MP neurons, circuited together in a network, can solve any problem that a conventional computer could.
But let’s examine the single neuron a little longer.
Q: Just how powerful is a single MP neuron?
A: It can solve any problem that can be expressed as a classification of points on a plane by a single straight line.
[Figure: the AND truth table plotted on the IN 1 / IN 2 plane]
Even one neuron can successfully handle simple classification problems.
Generalisation to many inputs: points in many dimensions are now classified, not by a line, but by a flat surface.
lecture 2008 36
Classification in Action
A set of patients may have a medical problem. Blood samples are analysed for the quantities of two trace elements.
The neuron has inputs x1 (trace 1), x2 (trace 2) and a bias input, with weights w1=-1, w2=-1, w3=+10 & bias=+1. A +ive output sum = problem.
trace 1   trace 2   problem?   sum ∑ xi wi   output
2.4       1.0       yes        +6.6          Yes
9.8       8.3       no         -8.1          No
1.2       0.2       yes        +8.6          Yes
0.4       2.1       yes        +7.5          Yes
7.9       8.8       no         -6.7          No
6.7       7.2       no         -3.9          No
etc.
With correct weights, this MP neuron consistently classifies patients.
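The sums on this slide can be reproduced from the six patients shown and the stated weights (w1 = -1, w2 = -1, w3 = +10, bias = +1). This is a sketch for checking the arithmetic; the variable and function names are assumptions.

```python
# Patient data transcribed from the slide: (trace 1, trace 2, has problem?)
patients = [
    (2.4, 1.0, True), (9.8, 8.3, False), (1.2, 0.2, True),
    (0.4, 2.1, True), (7.9, 8.8, False), (6.7, 7.2, False),
]
w1, w2, w3, bias = -1.0, -1.0, 10.0, 1.0


def weighted_sum(trace1, trace2):
    """Sum of inputs times weights, with the bias as a third input."""
    return w1 * trace1 + w2 * trace2 + w3 * bias


for t1, t2, problem in patients:
    s = weighted_sum(t1, t2)
    predicted = "yes" if s > 0 else "no"   # positive sum = problem
    actual = "yes" if problem else "no"
    print(f"{t1:4.1f} {t2:4.1f}  sum={s:+.1f}  predicted={predicted}  actual={actual}")
```

Geometrically, the neuron draws the line trace 1 + trace 2 = 10 and flags every patient below it.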
lecture 2008 37
The missing step
The ability of the neuron to classify inputs correctly hinges on the appropriate assignment of the weights and threshold.
So far, we have done this by hand.
Imagine we had an automatic algorithm for the neuron to learn the right weights and threshold on its own.
In 1962, Rosenblatt, inspired by biological learning rules, did just that.
Frank Rosenblatt (1962). Principles of Neurodynamics, Spartan, New York
lecture 2008 38
The perceptron algorithm
Take wj random
START: Take x ∈ F+ ∪ F-
CHECK: if x ∈ F+ and Σ wjxj > 0 goto START
       if x ∈ F+ and Σ wjxj ≤ 0 goto ADD
       if x ∈ F- and Σ wjxj ≤ 0 goto START
       if x ∈ F- and Σ wjxj > 0 goto SUB
ADD:   wj → wj + xj
       goto START
SUB:   wj → wj - xj
       goto START
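The goto-style steps above can be sketched as a loop. The version below is one possible reading, not the lecture's own code: it assumes that each input vector carries a constant last component of 1 acting as the bias, that training stops after a full pass with no ADD or SUB step, and that the two small example sets are linearly separable. The function name, the seed and the data are all made up for illustration.

```python
import random


def train_perceptron(f_plus, f_minus, max_passes=1000, seed=0):
    """Return weights with sum > 0 on F+ and sum <= 0 on F-."""
    rng = random.Random(seed)
    n = len(f_plus[0])
    w = [rng.uniform(-1, 1) for _ in range(n)]           # take wj random
    samples = [(x, +1) for x in f_plus] + [(x, -1) for x in f_minus]
    for _ in range(max_passes):
        mistakes = 0
        rng.shuffle(samples)                             # take x from F+ or F-
        for x, label in samples:
            s = sum(wj * xj for wj, xj in zip(w, x))
            if label == +1 and s <= 0:                   # ADD
                w = [wj + xj for wj, xj in zip(w, x)]
                mistakes += 1
            elif label == -1 and s > 0:                  # SUB
                w = [wj - xj for wj, xj in zip(w, x)]
                mistakes += 1
        if mistakes == 0:                                # every check passed
            return w
    raise RuntimeError("no separating weights found within max_passes")


# Illustrative data; the last component of each vector is the bias input.
f_plus = [(2.0, 2.0, 1.0), (3.0, 1.0, 1.0)]
f_minus = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
w = train_perceptron(f_plus, f_minus)
print(w)
```

The original steps loop forever; the stopping test after a clean pass is an addition so that the sketch terminates.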
lecture 2008 39
The Perceptron Theorem
• Says that the previous algorithm will converge on a set of weights in a finite number of steps, provided a solution w* exists.
lecture 2008 40
Imagine a naive, randomly weighted neuron. One way to train a neuron to discriminate the sick from the healthy is by reinforcing good behaviour and penalising bad. This carrot & stick model is the basis for the learning rule:
• Compile a training set of N (say 100) sick and healthy patients.
• Initialise the neuronal weights (random initialisation is the standard).
• Run each input set in turn through the neuron & note its output.
• Whenever a wrong output is encountered, alter responsible weights.
Learning Rule:
wi → wi + xi if output too low
wi → wi − xi if output too high
• Repeatedly run through training set until all outputs agree with targets.
• When training is complete, test the neuron on a new testing set of patients.
• If the neuron succeeds, it can be used to determine the health of patients whose status is unknown.
lecture 2008 41
Related idea
• Minimize E = Σi (ti – oi)^2
• Here:
  – t is the desired output
  – o is the observed output
• Find weights that minimize E
• Steepest gradient descent will also yield
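As a worked step for this idea, assume (this is not stated on the slide) that the output for training example i is the plain weighted sum of its inputs, with no threshold, and introduce a learning rate r as a further assumption. The gradient of E and the resulting steepest-descent update are then:

```latex
\begin{aligned}
E &= \sum_i (t_i - o_i)^2, \qquad o_i = \sum_j w_j x_{ij} \\
\frac{\partial E}{\partial w_j} &= -2 \sum_i (t_i - o_i)\, x_{ij} \\
w_j &\rightarrow w_j - r\,\frac{\partial E}{\partial w_j}
     = w_j + 2r \sum_i (t_i - o_i)\, x_{ij}
\end{aligned}
```

Under these assumptions each weight moves in proportion to the error (t_i - o_i) multiplied by the input that fed it.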
lecture 2008 42
Supervised learning
The learning rule is an example of supervised learning.
Training MP neurons requires a training set, for which the ‘correct’ output is known.
These ‘correct’ or ‘desired’ outputs are used to calculate the error, which in turn is used to adjust the input-output relation of the neuron.
Without knowledge of the desired output, the neuron cannot be trained. Therefore, supervised learning is a powerful tool when training sets with desired outputs are available.
When can’t supervised learning be used?
Are biological neurons supervised?
lecture 2008 43
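A simple example
Let’s try to train a neuron to learn the logical OR operation:
x1   x2   x3 (bias)   desired output (x1 OR x2)
0    0    1           0
0    1    1           1
1    0    1           1
1    1    1           1
[Figure: inputs x1, x2 and bias x3, with weights w1, w2, w3, feed the sum ∑ xi wi, followed by a decision that gives the output]
Decision: output = 0 if ∑ xi wi ≤ 0, output = 1 if ∑ xi wi > 0.
Learning rule:
wi → wi + xi if output low
wi → wi − xi if output high
Example on white board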
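In place of the whiteboard, here is one possible trace of the OR example. It is a sketch under assumptions: the weights start at zero rather than at random values, the four training rows are presented in a fixed order, and training stops after an error-free pass. The function names are made up for this example.

```python
# Training set for OR; the third input is the constant bias input x3 = 1.
training_set = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]


def output(w, x):
    """Decision: 1 if the weighted sum is greater than 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def train(w, max_epochs=100):
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in training_set:
            out = output(w, x)
            if out < target:        # output low: add the input to the weights
                w = [wi + xi for wi, xi in zip(w, x)]
                errors += 1
            elif out > target:      # output high: subtract the input
                w = [wi - xi for wi, xi in zip(w, x)]
                errors += 1
        print(f"epoch {epoch}: weights {w}, errors {errors}")
        if errors == 0:
            return w
    raise RuntimeError("did not converge")


weights = train([0, 0, 0])  # the all-zero start is an assumption
```

With this starting point and ordering the weights settle at w1 = 1, w2 = 1, w3 = 0 after four passes; other starting weights reach other, equally valid solutions.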
lecture 2008 44
The power of learning rules
The rule is guaranteed to converge on a set of appropriate weights, if a solution exists. While it might not be the most efficient of algorithms, this proven convergence is crucial.
What can be done to improve the convergence rate?
Some common variations on this learning rule:
• Adding a learning rate 0<r<1 which “damps” weight changes (the change in wi is r·xi or −r·xi).
• Widrow & Hoff recognised that weight changes should be large when actual output a and target output t were very different, but smaller otherwise. They introduced an error term, ∆ = t − a, such that the change in wi is r·∆·xi.
lecture 2008 45
δ Learning Rule
• Called the δ rule because weight updates have the following form:
• w → w + δx, where δ is a measure for the error: if δ = 0, there is no weight change.
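The error-driven update can be sketched as follows. The example is an illustration under assumptions not in the slides: a step output that fires when the sum is positive, all-zero starting weights, a learning rate r = 0.5, and the AND truth table (with a constant third input acting as the bias) as training data.

```python
def step(total):
    """Step output: 1 if the weighted sum is positive, else 0."""
    return 1 if total > 0 else 0


def train_delta(samples, r=0.5, max_epochs=100):
    """Apply w -> w + r * (t - a) * x until a full pass makes no change."""
    w = [0.0] * len(samples[0][0])          # zero start is an assumption
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            a = step(sum(wi * xi for wi, xi in zip(w, x)))
            error = t - a                   # zero error means no weight change
            if error != 0:
                w = [wi + r * error * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:
            return w
    raise RuntimeError("did not converge")


# AND truth table; the third input is the constant bias input.
and_samples = [((0, 0, 1), 0), ((0, 1, 1), 0), ((1, 0, 1), 0), ((1, 1, 1), 1)]
w = train_delta(and_samples)
print(w)
print([step(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in and_samples])
```

Because the step output is 0 or 1, the error here is only ever -1, 0 or +1, so the learning rate simply rescales each correction.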
lecture 2008 46
The Perceptron convergence Theorem
• Suppose there are two sets, F+ and F-, with F+ ∩ F- empty.
Goal:
• x ∈ F+ → Σ wjxj > 0
• x ∈ F- → Σ wjxj < 0
• If there are wj* for which this is true, then the following algorithm finds wj (possibly different ones) which also do the trick.
lecture 2008 47
The Fall of the Artificial Neuron
• Before long researchers had begun to discover the neuron’s limitations.
• Unless input categories were “linearly separable”, a perceptron could not learn to discriminate between them.
• Unfortunately, it appeared that many important categories were not linearly separable. This proved a fatal blow to the artificial neural networks community.
Marvin Minsky & Seymour Papert (1969). Perceptrons, MIT Press, Cambridge.
[Figure: footballers and academics, each marked successful or unsuccessful, plotted by few or many hours in the gym per week]
In this example, an MP neuron would not be able to discriminate between the footballers and the academics…
This failure caused the majority of researchers to walk away.
Exercise: Which logic operation is described in this example?
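The limitation can be explored by brute force: try a grid of weights and thresholds for a two-input MP neuron and record which of the 16 two-input truth tables ever appear. The grid below is an arbitrary choice for this sketch, and failing to find a table on a grid is not a proof that it is impossible; the proof is the linear separability argument.

```python
from itertools import product

# The four input pairs, in the order (0,0), (0,1), (1,0), (1,1).
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def mp_neuron(x, w, theta):
    """Fire (1) if the weighted sum reaches the threshold theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


# Small grid of candidate parameters (an arbitrary choice for this sketch).
WEIGHTS = [-1, 0, 1]
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]

realisable = set()
for w1, w2, theta in product(WEIGHTS, WEIGHTS, THRESHOLDS):
    table = tuple(mp_neuron(x, (w1, w2), theta) for x in INPUTS)
    realisable.add(table)

all_tables = set(product([0, 1], repeat=4))
missing = sorted(all_tables - realisable)
print(len(realisable), "of", len(all_tables), "truth tables found")  # 14 of 16
print("not found:", missing)  # [(0, 1, 1, 0), (1, 0, 0, 1)]
```

Each tuple lists the outputs for the four input pairs in the order given by `INPUTS`. The two tables that never appear are the ones whose 1-outputs sit on opposite corners of the input square, so no single straight line can separate them from the 0-outputs.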
lecture 2008 48
Connectionism Reborn
The crisis in artificial neural networks can be understood, not as an inability to connect many neurons in a network, but an inability to generalise the training algorithms to arbitrary architectures. By arranging the neurons in an ‘appropriate’ architecture, a suitable training algorithm could be invented. The solution, once found, quickly emerged as the most popular learning algorithm for nnets.
Back-propagation was first discovered in 1974 (Werbos, PhD thesis, Harvard) but the discovery went unnoticed. In the mid-80s, it was rediscovered independently by three groups within about one year.
Most influential of these was a two-volume book by Rumelhart & McClelland, who suggested a feed-forward architecture of neurons: layers of neurons, with each layer feeding its calculations on to the next.
David E. Rumelhart & James L. McClelland (1986). Parallel Distributed Processing, Vols. 1 & 2, MIT Press, Cambridge, MA.
lecture 2008 49
This time…
• The appeal of neural computing
• From biological to artificial neurons
• Nervous systems as logic circuits
• Classification with the McCulloch & Pitts neuron
• Developments in the 60s:
– The Delta learning rule & variations
– Simple applications
– The fatal flaw of linearity
Next time…
The disappointment with the single neuron dissipated as promptly as it dawned upon the AI community. Next time, we will see why the single neuron’s simplicity does not rule out immense richness at the network level. We will examine the simplest architecture of feed-forward neural networks and generalise the delta-learning rule to these multi-layer networks. We will also re-discover some impressive applications.
lecture 2008 50
Optional reading
Excellent treatments of the perceptron, the delta rule & Hebbian learning, the multi-layer perceptron and the back-propagation learning algorithm can be found in:
Beale & Jackson (1990). Neural Computing, chaps. 3 & 4.
Hinton (1992). How neural networks learn from experience, Scientific American, 267 (Sep):104-109.