Perceptron
Ruy Luiz Milidiú
Abstract
Objective: examine the perceptron model, its learning algorithm, and its limitations.

Outline: The Perceptron; Learning; Limitations
The Perceptron (Rosenblatt, 1950s)
The Perceptron
[Figure: a handwritten digit "eight" presented as input]

The Perceptron
[Figure: a handwritten digit "two" presented as input]
Rosenblatt's Perceptron
Training examples: (x1, y1), (x2, y2), ..., (xn, yn), with xi ∈ {0,1}^n and yi ∈ {0,1}
Non-linear Neuron
Linear activation: a = w0·1 + w1·x
Non-linear firing: ŷ = f(a)
How do we choose w0, w1?
Performance
Error E = (ŷ1 − y1)² + ... + (ŷn − yn)²
Threshold Firing
If a < 0 (below the threshold w0): f(a) = 0, the unit does not fire
If a ≥ 0: f(a) = 1, the unit fires
f(a) = [sign(a) + 1] / 2
Threshold Firing
[Figure: step function jumping from 0 to 1 at a = 0, plotted over a ∈ [−7, 7]]
Sigmoid Firing
[Figure: sigmoid curve rising smoothly from 0 to 1, plotted over a ∈ [−7, 7]]
Sigmoid Firing
f(a) = 1 / (1 + exp(−k·a))
f(a) ≥ 0, f(−∞) = 0, f(0) = 1/2, f(+∞) = 1
f′(a) = k·exp(−k·a) / (1 + exp(−k·a))² = k·f(a)·[1 − f(a)]
f′(a) ≥ 0, f′(0) = k/4
f″(a) = k²·f(a)·[1 − f(a)]·[1 − 2·f(a)]
f″(0) = k²·f(0)·[1 − f(0)]·[1 − 2·(1/2)] = 0, so a = 0 is an inflection point
Sigmoid Firing (summary)
f(a) = 1 / (1 + exp(−k·a))
f′(a) = k·f(a)·[1 − f(a)]
f″(a) = k²·f(a)·[1 − f(a)]·[1 − 2·f(a)]
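The derivative identity above is easy to check numerically; the sketch below (plain Python, with k = 1 assumed for illustration) compares the closed form f′ = k·f·(1 − f) against a central finite difference:

```python
import math

def f(a, k=1.0):
    # Sigmoid firing function: f(a) = 1 / (1 + exp(-k*a))
    return 1.0 / (1.0 + math.exp(-k * a))

def fprime(a, k=1.0):
    # Closed form from the slide: f'(a) = k * f(a) * (1 - f(a))
    return k * f(a, k) * (1.0 - f(a, k))

# Compare against a central finite difference at several points
h = 1e-6
for a in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    numeric = (f(a + h) - f(a - h)) / (2 * h)
    assert abs(numeric - fprime(a)) < 1e-6

# Special values from the slide (k = 1)
assert f(0.0) == 0.5        # f(0) = 1/2
assert fprime(0.0) == 0.25  # f'(0) = k/4
```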
Learning with the Sigmoid
Minimize the error: E(w) = (ŷ1 − y1)² + ... + (ŷn − yn)²
E(w) = Σi [f(w·xi) − yi]²
Differentiating:
∂E/∂w = 2·Σi [f(w·xi) − yi]·f′(w·xi)·xi
∂E/∂w = 2·k·Σi [f(w·xi) − yi]·f(w·xi)·[1 − f(w·xi)]·xi
Stochastic update rule:
wn = wn−1 + η·[yn − f(wn−1·xn)]·f(wn−1·xn)·[1 − f(wn−1·xn)]·xnᵀ
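As a sketch of this update rule in use (Python; the AND data, the learning rate η = 0.5, and the epoch count are illustrative assumptions, not from the slides), a single sigmoid unit can learn the AND function:

```python
import math

def f(a, k=1.0):
    # Sigmoid firing function
    return 1.0 / (1.0 + math.exp(-k * a))

# AND examples with a bias input x0 = -1, as in the later slides
X = [(-1, 0, 0), (-1, 0, 1), (-1, 1, 0), (-1, 1, 1)]
Y = [0, 0, 0, 1]

w = [0.0, 0.0, 0.0]
eta = 0.5  # learning rate (assumed; the slide leaves eta unspecified)
for _ in range(5000):
    for x, y in zip(X, Y):
        yhat = f(sum(wi * xi for wi, xi in zip(w, x)))
        # Stochastic update: w <- w + eta*(y - yhat)*yhat*(1 - yhat)*x
        g = eta * (y - yhat) * yhat * (1.0 - yhat)
        w = [wi + g * xi for wi, xi in zip(w, x)]

# Thresholding the trained sigmoid output at 0.5 recovers AND
preds = [1 if f(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0 for x in X]
assert preds == Y
```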
From the Sigmoid to the Threshold
Minimize the error: E(w) = (ŷ1 − y1)² + ... + (ŷn − yn)²
E(w) = Σi [f(w·xi) − yi]²
Differentiating:
∂E/∂w = 2·Σi [f(w·xi) − yi]·f′(w·xi)·xi
Approximating f′(w·xi) ≈ 1, the factor f(wn−1·xn)·[1 − f(wn−1·xn)] drops from the update:
wn = wn−1 + η·[yn − f(wn−1·xn)]·xnᵀ
Learning with the Threshold
Minimize the error: E(w) = (ŷ1 − y1)² + ... + (ŷn − yn)²
E(w) = Σi [f(w·xi) − yi]²
The Perceptron rule:
wn = wn−1 + [yn − f(wn−1·xn)]·xnᵀ
The Perceptron Rule
wn = wn−1 + [yn − f(wn−1·xn)]·xnᵀ
If yn = f(wn−1·xn): wn = wn−1 + 0·xnᵀ = wn−1 (no change)
If yn = 0 and f(wn−1·xn) = 1: wn = wn−1 − xnᵀ
If yn = 1 and f(wn−1·xn) = 0: wn = wn−1 + xnᵀ
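The three cases can be exercised directly; a minimal Python sketch of the rule (the example weights and inputs are made up for illustration):

```python
def step(a):
    # Threshold firing: fires (1) iff a >= 0
    return 1 if a >= 0 else 0

def update(w, x, y):
    # Perceptron rule: w <- w + (y - f(w.x)) * x
    yhat = step(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + (y - yhat) * xi for wi, xi in zip(w, x)]

w = [1.0, -1.0]
# Case 1: correct prediction, weights unchanged
assert update(w, [1, 0], 1) == w
# Case 2: y = 0 but the unit fires, so x is subtracted
assert update(w, [1, 0], 0) == [0.0, -1.0]
# Case 3: y = 1 but the unit is silent, so x is added
assert update(w, [0, 1], 1) == [1.0, 0.0]
```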
Binary Classification
Two classes: Y or N; 0 or 1; −1 or +1
Linear Classifier
Decision boundary: w·x + b = 0
One side: w·x + b < 0; the other: w·x + b > 0
y(x) = sign(wᵀ·x + b)
Linear Separators
Training instances: x ∈ ℝᵐ, y ∈ {−1, 1}
Hyperplane ⟨w, x⟩ = 0: w1·x1 + w2·x2 + ... + wm·xm = 0, with w ∈ ℝᵐ
Decision function: f(x) = sign(⟨w, x⟩)
Perceptron Algorithm
w ← 0
error ← TRUE
while error
    error ← FALSE
    for i := 1 to n
        if yi·⟨w, xi⟩ ≤ 0   // mistake
            w ← w + yi·xi
            error ← TRUE
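A runnable version of the pseudocode above (Python; the OR data and the max_epochs safeguard are illustrative additions, not part of the slide):

```python
def perceptron(X, Y, max_epochs=100):
    """Perceptron algorithm from the slide, for labels y in {-1, +1}.

    Cycles through the data; whenever y_i * <w, x_i> <= 0 (a mistake),
    applies w <- w + y_i * x_i. Stops after an error-free pass.
    """
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        error = False
        for x, y in zip(X, Y):
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                error = True
        if not error:
            break
    return w

# OR function, with a constant bias feature x0 = 1 appended
X = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
Y = [-1, 1, 1, 1]
w = perceptron(X, Y)
# Every example ends up strictly on the correct side
assert all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in zip(X, Y))
```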
Perceptron Convergence
Threshold firing; learning by the Perceptron rule
The Perceptron Theorem: convergence under linear separability

Linear Separability
[Figure: points labeled 1 and points labeled 0 on either side of a separating line]
OR and AND
[Figures: the four Boolean corners; OR labels (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→1 and AND labels (0,0)→0, (0,1)→0, (1,0)→0, (1,1)→1; both are linearly separable]
Learning example (the AND function, with a bias input x0 = −1)
      x0  x1  x2   y
 I    -1   0   0   0
 II   -1   0   1   0
 III  -1   1   0   0
 IV   -1   1   1   1
Round 1
 ŷ | w0 w1 w2 | x0 x1 x2 | y | Δw0 Δw1 Δw2
 1 |  0  0  0 | -1  0  0 | 0 |  1   0   0
 0 |  1  0  0 | -1  0  1 | 0 |
 0 |  1  0  0 | -1  1  0 | 0 |
 0 |  1  0  0 | -1  1  1 | 1 | -1   1   1
End of round: w = (0, 1, 1)

Round 2
 ŷ | w0 w1 w2 | x0 x1 x2 | y | Δw0 Δw1 Δw2
 0 |  0  1  1 | -1  0  0 | 0 |
 1 |  0  1  1 | -1  0  1 | 0 |  1   0  -1
 1 |  1  1  0 | -1  1  0 | 0 |  1  -1   0
 0 |  2  0  0 | -1  1  1 | 1 | -1   1   1
End of round: w = (1, 1, 1)

Round 3
 ŷ | w0 w1 w2 | x0 x1 x2 | y | Δw0 Δw1 Δw2
 0 |  1  1  1 | -1  0  0 | 0 |
 1 |  1  1  1 | -1  0  1 | 0 |  1   0  -1
 0 |  2  1  0 | -1  1  0 | 0 |
 0 |  2  1  0 | -1  1  1 | 1 | -1   1   1
End of round: w = (1, 2, 1)

Round 4
 ŷ | w0 w1 w2 | x0 x1 x2 | y | Δw0 Δw1 Δw2
 0 |  1  2  1 | -1  0  0 | 0 |
 1 |  1  2  1 | -1  0  1 | 0 |  1   0  -1
 1 |  2  2  0 | -1  1  0 | 0 |  1  -1   0
 0 |  3  1  0 | -1  1  1 | 1 | -1   1   1
End of round: w = (2, 2, 1)

Round 5
 ŷ | w0 w1 w2 | x0 x1 x2 | y | Δw0 Δw1 Δw2
 0 |  2  2  1 | -1  0  0 | 0 |
 0 |  2  2  1 | -1  0  1 | 0 |
 1 |  2  2  1 | -1  1  0 | 0 |  1  -1   0
 0 |  3  1  1 | -1  1  1 | 1 | -1   1   1
End of round: w = (2, 2, 2)

Rounds 6 and 6.5
 ŷ | w0 w1 w2 | x0 x1 x2 | y | Δw0 Δw1 Δw2
 0 |  2  2  2 | -1  0  0 | 0 |
 1 |  2  2  2 | -1  0  1 | 0 |  1   0  -1
 0 |  3  2  1 | -1  1  0 | 0 |
 1 |  3  2  1 | -1  1  1 | 1 |
 0 |  3  2  1 | -1  0  0 | 0 |
 0 |  3  2  1 | -1  0  1 | 0 |
No more mistakes: converged with w = (3, 2, 1)
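The whole trace can be replayed in code. The sketch below uses the convention f(0) = 1 throughout; under that convention one intermediate row differs from the printed trace (Round 2, example I), but the run converges to the same separator w = (3, 2, 1):

```python
def step(a):
    # Firing convention: f(a) = 1 iff a >= 0
    return 1 if a >= 0 else 0

# AND examples I-IV with a bias input x0 = -1
X = [(-1, 0, 0), (-1, 0, 1), (-1, 1, 0), (-1, 1, 1)]
Y = [0, 0, 0, 1]

w = [0, 0, 0]
for _ in range(10):  # a handful of rounds over the four examples
    changed = False
    for x, y in zip(X, Y):
        yhat = step(sum(wi * xi for wi, xi in zip(w, x)))
        if yhat != y:
            # Perceptron rule: w <- w + (y - yhat) * x
            w = [wi + (y - yhat) * xi for wi, xi in zip(w, x)]
            changed = True
    if not changed:
        break

assert w == [3, 2, 1]  # the separator the trace converges to
assert [step(sum(wi * xi for wi, xi in zip(w, x))) for x in X] == Y
```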
Perceptron Theorem
Suppose w* defines a linear separator with |w*| = 1, there is a margin γ > 0 with
y·(w*·x) ≥ γ for every example (so y = sign(w*·x)), and |x| ≤ R. Then
cos(θn) = w*·wn / |wn| ≥ √n·γ / R
and the number of updates satisfies n ≤ R² / γ²
Perceptron Theorem
[Figure: separable point sets labeled 1 and 0; R bounds the norm of the data, γ is the margin to the separator defined by w*]
Proof, part 1: growth of w*·wn
w*·wn = w*·(wn−1 + yn·xn) = w*·wn−1 + yn·(w*·xn)
      = w*·wn−1 + sign(w*·xn)·(w*·xn)
      = w*·wn−1 + |w*·xn| ≥ w*·wn−1 + γ ≥ ... ≥ w*·w0 + n·γ
With w0 = 0: w*·wn ≥ n·γ
Proof, part 2: growth of |wn|²
wn·wn = (wn−1 + yn·xn)ᵀ·(wn−1 + yn·xn)
      = wn−1ᵀ·wn−1 + 2·yn·(wn−1ᵀ·xn) + yn²·xnᵀ·xn
      = wn−1ᵀ·wn−1 + 2·yn·(wn−1ᵀ·xn) + xnᵀ·xn
      ≤ wn−1ᵀ·wn−1 + xnᵀ·xn      (updates occur only on mistakes, where yn·(wn−1ᵀ·xn) ≤ 0)
      ≤ wn−1ᵀ·wn−1 + R²
      ≤ ... ≤ 0·0 + n·R² = n·R²
Hence |wn| ≤ R·√n
Combining the two bounds:
cos(θ) = w*·wn / |wn| ≥ n·γ / (R·√n) = √n·γ / R
By Cauchy-Schwarz, w*·wn ≤ |w*|·|wn| = 1·|wn|, so cos(θ) ≤ 1
Therefore √n·γ / R ≤ 1, giving n ≤ R² / γ²
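The mistake bound n ≤ R²/γ² can be checked empirically. In the sketch below (Python; the separator w*, the random data, and the margin filter 0.1 are illustrative assumptions), R and γ are measured from the sample and the number of updates is compared with the bound:

```python
import random

random.seed(0)

# A unit-norm separator and random 2-D points, labeled by the separator;
# points too close to the boundary are dropped to ensure a positive margin
w_star = [0.6, 0.8]  # |w*| = 1
X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
X = [x for x in X if abs(x[0] * w_star[0] + x[1] * w_star[1]) > 0.1]
Y = [1 if x[0] * w_star[0] + x[1] * w_star[1] > 0 else -1 for x in X]

# Measure R (max norm) and gamma (min margin) from the sample
R = max((x[0] ** 2 + x[1] ** 2) ** 0.5 for x in X)
gamma = min(y * (x[0] * w_star[0] + x[1] * w_star[1]) for x, y in zip(X, Y))

# Run the perceptron, counting updates (mistakes)
w = [0.0, 0.0]
mistakes = 0
done = False
while not done:
    done = True
    for x, y in zip(X, Y):
        if y * (w[0] * x[0] + w[1] * x[1]) <= 0:
            w = [w[0] + y * x[0], w[1] + y * x[1]]
            mistakes += 1
            done = False

assert mistakes <= (R / gamma) ** 2  # the theorem's guarantee
```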
Limitations (Minsky & Papert, 1969)
XOR; connectivity patterns
No learning algorithm available for multilayer networks
A 20-year gap (in research and funding) ...
XOR
[Figure: the four Boolean corners labeled (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0; no line separates the 1s from the 0s]
Connectivity Patterns
A single threshold unit over features L, C, R would need:
L + C + R < θ      (output 0)
L′ + C + R > θ     (output 1)
L + C + R′ > θ     (output 1)
Adding the last two inequalities and subtracting the first forces
L′ + C + R′ > θ    (output 1)
even when the desired output is 0, so no single threshold unit can realize such patterns.
XOR
y = XOR(x1, x2)
y = x1 + x2 − 2·x1·x2, a quadratic function, so no single linear threshold unit computes it
Two layers suffice, but how to learn them?
[Figure: a two-layer threshold network computing XOR; inputs X1, X2 feed two hidden units with weights 1, 1 and biases −0.5 and −1.5 (OR and AND), whose outputs feed Y with weights 1 and −1 and bias −0.5]
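A quick check that a two-layer threshold network of this shape computes XOR (Python; the wiring below follows the reconstruction of the diagram as OR, AND, and "OR and not AND"):

```python
def step(a):
    # Threshold firing: 1 iff a >= 0
    return 1 if a >= 0 else 0

def xor_net(x1, x2):
    # Two-layer threshold network with the diagram's weights:
    # h1 = OR(x1, x2), h2 = AND(x1, x2), y = h1 AND (NOT h2)
    h1 = step(x1 + x2 - 0.5)    # fires if x1 OR x2
    h2 = step(x1 + x2 - 1.5)    # fires if x1 AND x2
    return step(h1 - h2 - 0.5)  # fires if h1 and not h2

# The network matches XOR on all four inputs,
# as does the quadratic identity x1 + x2 - 2*x1*x2
for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_net(x1, x2) == x1 ^ x2
        assert x1 + x2 - 2 * x1 * x2 == x1 ^ x2
```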