Post on 01-May-2020
Introduction to Artificial Intelligence
COMP307 Neural Network 1: perceptron
Yi Mei (yi.mei@ecs.vuw.ac.nz)
Outline
• Why ANN?
• Origin
• Perceptron
• Perceptron learning
• What the perceptron can (and cannot) learn
• Extending the perceptron
Why ANN?
• Killer applications in many areas
  – Computer vision / image processing
  – Game playing (AlphaGo, Watson, …)
  – Big data
Origin
• The human brain shows amazing capability in
  – Learning
  – Perception
  – Adaptability
  – …
• Simulate the human brain to achieve the above functionalities
• ANN models are built for this purpose
Artificial Neuron
z_j = Σ_{i=1}^{m} w_{ji} x_i + b_j

y_j = φ(z_j)
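The weighted-sum-then-activation computation above can be sketched in plain Python. The function names and example numbers here are illustrative, not from the slides:

```python
import math

def neuron_output(weights, bias, inputs, activation):
    # z_j = sum over i of w_ji * x_i, plus the bias b_j
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # y_j = phi(z_j): pass the weighted sum through the activation function
    return activation(z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Example: 2 inputs, so 2 weights and 1 bias
y = neuron_output([0.5, -0.3], 0.1, [1.0, 2.0], sigmoid)
```

Here z = 0.5·1 − 0.3·2 + 0.1 = 0, so the sigmoid output is exactly 0.5.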
Activation Functions
Threshold:  y_j = 1 if z_j > 0, else 0

Sigmoid:    y_j = 1 / (1 + e^{−z_j})
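The two activation functions above are easy to write out directly (a minimal sketch; the names are illustrative):

```python
import math

def threshold(z):
    # Step function: outputs 1 when z_j > 0, otherwise 0
    return 1 if z > 0 else 0

def sigmoid(z):
    # Squashes z_j into the open interval (0, 1): 1 / (1 + e^(-z_j))
    return 1.0 / (1.0 + math.exp(-z))
```

The threshold gives a hard binary decision; the sigmoid is a smooth, differentiable approximation of it, which matters later for training multi-layer networks.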
Perceptron
• A special type of artificial neuron
  – Real-valued inputs
  – Binary output
  – Threshold activation function
y_j = 1 if Σ_{i=1}^{m} w_{ji} x_i + b_j > 0, else 0

Example: y = 1 if x_1 + x_2 > 0, else 0
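A perceptron's prediction rule can be sketched in a few lines of Python (function name illustrative):

```python
def perceptron_predict(weights, bias, inputs):
    # Fires (outputs 1) iff the weighted sum plus bias is strictly positive
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# The example y = 1 iff x1 + x2 > 0 corresponds to weights (1, 1) and bias 0
```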
Perceptron
• Performs linear classification
  – 2 inputs: a line
  – 3 inputs: a plane
• Can do online learning
  – Update w_{ji} and b_j along with new examples
y_j = 1 if Σ_{i=0}^{m} w_{ji} x_i > 0, else 0
Learning Perceptron
• How do we get the optimal weights and bias?
• Only consider accuracy
  – Optimal if 100% accuracy on the training set
  – There can be many optimal solutions
• To simplify notation, transform the bias into a weight: w_{j0} = b_j, with x_0 = 1 always
y_j = 1 if Σ_{i=1}^{m} w_{ji} x_i + b_j > 0, else 0

b_j = w_{j0} · 1 = w_{j0} x_0
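The bias-as-weight trick above amounts to prepending a constant 1 to every input vector. A small sketch (helper names are illustrative):

```python
def augment(inputs):
    # Prepend the constant x_0 = 1 so the bias b_j becomes weight w_j0
    return [1.0] + list(inputs)

def predict(weights, inputs):
    # weights[0] plays the role of the bias; no separate bias term needed
    z = sum(w * x for w, x in zip(weights, augment(inputs)))
    return 1 if z > 0 else 0
```

With this transformation, the learning rule only has to update one weight vector, with no special case for the bias.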
Learning Perceptron
• Idea
  – Initialise weights and threshold randomly (or all zeros)
  – Given a new example (x_1, x_2, …, x_m, d):
    • Input feature vector: x_1, x_2, …, x_m
    • Output (class label): d
    • Predicted output: y
  – If y = 0 and d = 1:
    • increase b = w_0, increase w_i for each positive x_i, decrease w_i for each negative x_i
  – If y = 1 and d = 0:
    • decrease b = w_0, decrease w_i for each positive x_i, increase w_i for each negative x_i
  – Repeat the process for each new example until the desired behaviour is achieved
y = 1 if Σ_{i=0}^{m} w_i x_i > 0, else 0
Learning Perceptron
• Implementation
  – Initialise weights and threshold randomly (or all zeros)
  – Given a new example (x_1, x_2, …, x_m, d):
    • Input feature vector: x_1, x_2, …, x_m
    • Output (class label): d
    • Predicted output: y
  – Update: w_i ← w_i + η(d − y)x_i,  i = 0, 1, 2, …, m
  – where η ∈ [0, 1] is called the learning rate
  – Repeat the process for each new example until the desired behaviour is achieved

y = 1 if Σ_{i=0}^{m} w_i x_i > 0, else 0
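The full learning loop above can be sketched as follows. This is a minimal illustration, not course-provided code; the learning rate and epoch cap are arbitrary choices:

```python
def train_perceptron(examples, eta=0.25, epochs=100):
    # examples: list of (feature list, desired label d) pairs.
    # Returns the augmented weight vector; w[0] is the bias (x_0 = 1).
    m = len(examples[0][0])
    w = [0.0] * (m + 1)                  # initialise to all zeros
    for _ in range(epochs):
        converged = True
        for x, d in examples:
            xa = [1.0] + list(x)         # augmented input
            y = 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else 0
            if y != d:
                converged = False
                # Update rule: w_i <- w_i + eta * (d - y) * x_i
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, xa)]
        if converged:                    # 100% training accuracy reached
            break
    return w

# AND is linearly separable, so the algorithm converges on it
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(and_data)
```

Note that (d − y) is +1 on a false negative and −1 on a false positive, which reproduces exactly the increase/decrease cases from the previous slide.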
Problem with Perceptron
• What can the perceptron learn?
• Perceptron convergence theorem: the perceptron learning algorithm will converge if and only if the training set is linearly separable.
• It cannot learn XOR (Minsky and Papert, 1969)
Multi-Layer Perceptron
• Add one hidden node between the inputs and the output

XOR truth table:
x1 | x2 | y (class)
 0 |  0 | 0
 1 |  0 | 1
 0 |  1 | 1
 1 |  1 | 0

Threshold unit: y = 1 if Σ_i w_i x_i − T > 0, else 0
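One way to realise this with a single hidden threshold node is to let the hidden node compute AND(x1, x2) and subtract it from an OR-like sum. The specific weights and thresholds below are one hand-picked solution, not taken from the slides:

```python
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden node: computes AND(x1, x2) - fires only when both inputs are 1
    h = step(x1 + x2 - 1.5)
    # Output node: OR-like sum x1 + x2, suppressed by the AND hidden node
    return step(x1 + x2 - 2 * h - 0.5)
```

The hidden node carves out the (1, 1) corner that makes XOR non-linearly-separable, which is exactly what a single perceptron cannot do.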
Neural Network
• Add more hidden layers
• Add more nodes to the hidden layers
COMP307 ML5 (Neural Networks):
Multilayer Perceptron (Neural Networks)
• Change one or two layers of nodes to three or more layers
– Multilayer perceptron (MLP)
– Feed forward neural networks
– Standard feed forward networks: nodes in neighbouring layers are fully connected

[Figure: Input layer → Hidden layer → Hidden layer → Output layer]
– Input layer: input patterns/features
– Output layer: output patterns/class labels
– Hidden layer(s): high level features
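A forward pass through such a fully connected feed-forward network can be sketched as below. The layer sizes and weight values are arbitrary placeholders for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, biases, inputs):
    # Fully connected: each node combines every output of the previous layer
    return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

def mlp_forward(layers, inputs):
    # layers: one (weights, biases) pair per layer, applied input -> output
    a = inputs
    for weights, biases in layers:
        a = layer_forward(weights, biases, a)
    return a

# 2 inputs -> hidden layer of 2 nodes -> 1 output node
net = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),
    ([[1.0, -1.0]], [0.0]),
]
out = mlp_forward(net, [1.0, 0.5])
```

The input layer holds the raw features, each hidden layer computes higher-level features from the layer before it, and the output layer produces the class label(s).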
Summary
• ANN has many killer applications
  – Image analysis, game playing, …
• Perceptron – the simplest neural network
• How to learn a perceptron
• Limitations of the perceptron, and general neural networks
• Reading: textbook Section 20.5 (2nd edition) or Section 18.7 (3rd edition), or Web materials