Greg Grudic Intro AI 1
Introduction to Artificial Intelligence CSCI 3202:
The Perceptron Algorithm
Greg Grudic
Questions?
Binary Classification
• A binary classifier is a mapping from a set of d inputs to a single output that can take on one of TWO values
• In the most general setting:

  inputs: $\mathbf{x} \in \mathbb{R}^d$
  output: $y \in \{-1, +1\}$

• Specifying the output classes as −1 and +1 is arbitrary!
  – Often done as a mathematical convenience
A Binary Classifier
• Classification model: $\mathbf{x} \rightarrow \hat{y} \in \{-1, +1\}$
• Given learning data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
• A model is constructed: $M(\mathbf{x})$
Linear Separating Hyper-Planes
(Figure: a separating hyperplane in the $(x_1, x_2)$ plane, dividing the input space into three regions:)

$\beta_0 + \sum_{i=1}^{d} \beta_i x_i \le 0 \quad (y = -1)$

$\beta_0 + \sum_{i=1}^{d} \beta_i x_i > 0 \quad (y = +1)$

$\beta_0 + \sum_{i=1}^{d} \beta_i x_i = 0 \quad \text{(the boundary itself)}$
Linear Separating Hyper-Planes
• The Model: $\hat{y} = M(\mathbf{x}) = \mathrm{sgn}\left[\hat{\beta}_0 + \mathbf{x}^T \left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T\right]$
• Where: $\mathrm{sgn}[A] = \begin{cases} 1 & \text{if } A > 0 \\ -1 & \text{otherwise} \end{cases}$
• The decision boundary: $\hat{\beta}_0 + \mathbf{x}^T \left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T = 0$
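The model can be written directly in code (a minimal sketch; the names `sgn` and `predict` are mine, not from the slides):

```python
def sgn(a):
    # sgn[A] = 1 if A > 0, -1 otherwise (the definition on this slide)
    return 1 if a > 0 else -1

def predict(beta0, beta, x):
    # y_hat = sgn(beta0 + x^T (beta1, ..., betad))
    return sgn(beta0 + sum(b * xi for b, xi in zip(beta, x)))
```

For example, with $\hat{\beta}_0 = -1$ and $\hat{\beta} = (1, 1)$, the point $(2, 0.5)$ lands on the $+1$ side and $(0.2, 0.3)$ on the $-1$ side.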
Linear Separating Hyper-Planes
• The model parameters are: $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$
• The hat on the betas means that they are estimated from the data
• Many different learning algorithms have been proposed for determining $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$
Rosenblatt’s Perceptron Learning Algorithm

• Dates back to the 1950s and is the motivation behind Neural Networks
• The algorithm:
  – Start with a random hyperplane $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$
  – Incrementally modify the hyperplane so that misclassified points move closer to the correct side of the boundary
  – Stop when all learning examples are correctly classified
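The three steps above can be sketched as a minimal online perceptron in pure Python (the toy dataset, the zero initialization, and the learning rate $\rho = 1$ are illustrative assumptions, not from the slides):

```python
def train_perceptron(X, Y, rho=1.0, max_epochs=100):
    # Start with a hyperplane (beta0, beta1, ..., betad); zeros here for reproducibility
    d = len(X[0])
    beta0, beta = 0.0, [0.0] * d
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(X, Y):
            # A point is misclassified when y * (beta0 + x . beta) <= 0
            if y * (beta0 + sum(b * xi for b, xi in zip(beta, x))) <= 0:
                # Move the hyperplane so this point gets closer to the correct side
                beta0 += rho * y
                beta = [b + rho * y * xi for b, xi in zip(beta, x)]
                mistakes += 1
        if mistakes == 0:  # Stop when all examples are correctly classified
            break
    return beta0, beta

# Tiny linearly separable example (illustrative): +1 iff x1 + x2 is large enough
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
Y = [-1, 1, 1, 1]
beta0, beta = train_perceptron(X, Y)
```

On this data the loop stops after a handful of epochs with every example on the correct side of the learned boundary.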
Rosenblatt’s Perceptron Learning Algorithm

• The algorithm is based on the following property:
  – The signed distance of any point $\mathbf{x}$ to the boundary is:

$$d(\mathbf{x}) = \frac{\hat{\beta}_0 + \mathbf{x}^T\left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T}{\sqrt{\sum_{i=1}^{d} \hat{\beta}_i^2}} \propto \hat{\beta}_0 + \mathbf{x}^T\left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T$$

• Therefore, if $M$ is the set of misclassified learning examples, we can push them closer to the boundary by minimizing the following:

$$D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right) = -\sum_{i \in M} y_i \left(\hat{\beta}_0 + \mathbf{x}_i^T\left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T\right)$$
Rosenblatt’s Minimization Function
• This is classic Machine Learning!
• First define a cost function in model parameter space:

$$D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right) = -\sum_{i \in M} y_i \left(\hat{\beta}_0 + \sum_{k=1}^{d} \hat{\beta}_k x_{ik}\right)$$

• Then find an algorithm that modifies $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$ such that this cost function is minimized
• One such algorithm is Gradient Descent
Gradient Descent
(Figure: surface plot of the cost $D(\hat{\beta}_0, \hat{\beta}_1)$ over the $(\hat{\beta}_0, \hat{\beta}_1)$ parameter plane.)
The Gradient Descent Algorithm
$$\hat{\beta}_i \leftarrow \hat{\beta}_i - \rho \frac{\partial D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)}{\partial \hat{\beta}_i}$$

Where the learning rate is defined by: $\rho > 0$
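As a minimal illustration of the update rule (my own toy example, not from the slides: the cost $D(\beta) = (\beta - 3)^2$ has its known minimum at $\beta = 3$, so convergence is easy to check):

```python
def gradient_descent(dD, beta, rho=0.1, steps=200):
    # Repeatedly apply: beta <- beta - rho * dD/dbeta
    for _ in range(steps):
        beta = beta - rho * dD(beta)
    return beta

# Toy cost D(beta) = (beta - 3)^2 with derivative 2 * (beta - 3); minimum at beta = 3
beta_star = gradient_descent(lambda b: 2.0 * (b - 3.0), beta=0.0)
```

Each step shrinks the error by a constant factor here, so 200 iterations land essentially on the minimizer.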
The Gradient Descent Algorithm for the Perceptron
• The partial derivatives of the cost function:

$$\frac{\partial D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)}{\partial \hat{\beta}_0} = -\sum_{i \in M} y_i$$

$$\frac{\partial D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)}{\partial \hat{\beta}_j} = -\sum_{i \in M} y_i x_{ij}, \quad j = 1, \ldots, d$$

Two Versions of the Perceptron Algorithm:

• Update one misclassified point at a time (online):

$$\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} \leftarrow \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} + \rho \begin{pmatrix} y_i \\ y_i x_{i1} \\ \vdots \\ y_i x_{id} \end{pmatrix}$$

• Update all misclassified points at once (batch):

$$\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} \leftarrow \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} + \rho \begin{pmatrix} \sum_{i \in M} y_i \\ \sum_{i \in M} y_i x_{i1} \\ \vdots \\ \sum_{i \in M} y_i x_{id} \end{pmatrix}$$
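The batch version can be sketched as one function that computes the summed update over all currently misclassified points (the names and the learning rate are mine, for illustration):

```python
def batch_update(beta0, beta, X, Y, rho=1.0):
    # Collect the misclassified set M: points with y * (beta0 + x . beta) <= 0
    M = [(x, y) for x, y in zip(X, Y)
         if y * (beta0 + sum(b * xi for b, xi in zip(beta, x))) <= 0]
    # beta0 <- beta0 + rho * sum_{i in M} y_i
    beta0 += rho * sum(y for _, y in M)
    # beta_j <- beta_j + rho * sum_{i in M} y_i * x_ij
    beta = [b + rho * sum(y * x[j] for x, y in M) for j, b in enumerate(beta)]
    return beta0, beta
```

Starting from the zero hyperplane, one batch step already pushes each misclassified point toward its correct side; the step is repeated until $M$ is empty.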
The Learning Data
• Matrix representation of $N$ learning examples of $d$-dimensional inputs

Training data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nd} \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$$
The Good Theoretical Properties of the Perceptron Algorithm
• If a solution exists, the algorithm will always converge in a finite number of steps!
• Question: Does a solution always exist?
Linearly Separable Data
• Which of these datasets are separable by a linear boundary?
(Figure: two 2-D scatter plots of + and − points, labeled a) and b).)
Linearly Separable Data
• Which of these datasets are separable by a linear boundary?
(Figure: the same two scatter plots of + and − points, one of which is now marked “Not Linearly Separable!”)
Bad Theoretical Properties of the Perceptron Algorithm
• If the data is not linearly separable, the algorithm cycles forever!
  – It cannot converge!
  – This property “stopped” active research in this area between 1968 and 1984…
    • Perceptrons, Minsky and Papert, 1969
• Even when the data is separable, there are infinitely many solutions
  – Which solution is best?
• When the data is linearly separable, the number of steps to converge can be very large (it depends on the size of the gap between the classes)
What about Nonlinear Data?
• Data that is not linearly separable is called nonlinear data
• Nonlinear data can often be mapped, via nonlinear basis functions, into a space where it is linearly separable
Nonlinear Models
• The Linear Model:

$$\hat{y} = M(\mathbf{x}) = \mathrm{sgn}\left[\hat{\beta}_0 + \sum_{i=1}^{d} \hat{\beta}_i x_i\right]$$

• The Nonlinear (basis function) Model:

$$\hat{y} = M(\mathbf{x}) = \mathrm{sgn}\left[\hat{\beta}_0 + \sum_{i=1}^{k} \hat{\beta}_i \phi_i(\mathbf{x})\right]$$

• Examples of Nonlinear Basis Functions:

$$\phi_1(\mathbf{x}) = x_1^2, \quad \phi_2(\mathbf{x}) = x_2^2, \quad \phi_3(\mathbf{x}) = x_1 x_2, \quad \phi_4(\mathbf{x}) = \sin\left(5 x_5\right)$$
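A sketch of the basis-function model in code (using the first three example φ’s above; the coefficients are illustrative assumptions):

```python
# Basis functions from the slide's examples (for 2-D inputs)
phis = [
    lambda x: x[0] ** 2,    # phi_1(x) = x1^2
    lambda x: x[1] ** 2,    # phi_2(x) = x2^2
    lambda x: x[0] * x[1],  # phi_3(x) = x1 * x2
]

def predict_basis(beta0, betas, x, phis=phis):
    # y_hat = sgn(beta0 + sum_i beta_i * phi_i(x))
    s = beta0 + sum(b * phi(x) for b, phi in zip(betas, phis))
    return 1 if s > 0 else -1
```

With $\hat{\beta}_0 = -1$ and $\hat{\beta} = (1, 1, 0)$ this implements a circular decision boundary $x_1^2 + x_2^2 = 1$ using only a linear rule in φ-space.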
Linear Separating Hyper-Planes In Nonlinear Basis Function Space
(Figure: a separating hyperplane in the $(\phi_1, \phi_2)$ basis-function plane, dividing the space into three regions:)

$\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i \le 0 \quad (y = -1)$

$\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i > 0 \quad (y = +1)$

$\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i = 0 \quad \text{(the boundary itself)}$
An Example
(Figure: left, the data in the original $(x_1, x_2)$ space with classes $y = +1$ and $y = -1$; right, the same data after the mapping $\Phi$ with $\phi_1 = x_1^2$, $\phi_2 = x_2^2$, where a linear boundary separates the classes.)
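The effect of this mapping can be checked numerically. The data below is my own toy set in the spirit of the figure (class −1 near the origin, +1 farther out); after $\phi_1 = x_1^2$, $\phi_2 = x_2^2$ a single linear rule in φ-space separates the classes:

```python
# Toy data in the spirit of the figure: class -1 near the origin, +1 farther out
inner = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2)]   # y = -1
outer = [(0.9, 0.8), (-0.8, 0.9), (0.7, -0.9)]   # y = +1
data = [(x, -1) for x in inner] + [(x, +1) for x in outer]

def to_phi(x):
    # Phi: (x1, x2) -> (x1^2, x2^2), as in the example
    return (x[0] ** 2, x[1] ** 2)

# In phi-space the line phi1 + phi2 = 0.5 separates the two classes
preds = [1 if sum(to_phi(x)) > 0.5 else -1 for x, _ in data]
```

Note that in the original $(x_1, x_2)$ space no single line separates points near the origin from points surrounding them, yet the rule above is linear in $(\phi_1, \phi_2)$.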
Kernels as Nonlinear Transformations
• Polynomial: $K\left(\mathbf{x}_i, \mathbf{x}_j\right) = \left(\left\langle \mathbf{x}_i, \mathbf{x}_j \right\rangle + q\right)^k$

• Sigmoid: $K\left(\mathbf{x}_i, \mathbf{x}_j\right) = \tanh\left(\kappa \left\langle \mathbf{x}_i, \mathbf{x}_j \right\rangle + \theta\right)$

• Gaussian or Radial Basis Function (RBF): $K\left(\mathbf{x}_i, \mathbf{x}_j\right) = \exp\left(-\frac{1}{2\sigma^2}\left\|\mathbf{x}_i - \mathbf{x}_j\right\|^2\right)$
The Kernel Model
$$\hat{y} = M(\mathbf{x}) = \mathrm{sgn}\left[\hat{\beta}_0 + \sum_{i=1}^{N} \hat{\beta}_i K\left(\mathbf{x}_i, \mathbf{x}\right)\right]$$

Training data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$

The number of basis functions equals the number of training examples!
– Unless some of the betas get set to zero…
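A sketch of the kernel model’s prediction (the RBF kernel is chosen arbitrarily here, and the betas would normally come from training, so the ones below are illustrative):

```python
import math

def rbf(xi, xj, sigma=1.0):
    # K(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_predict(beta0, betas, train_X, x, K=rbf):
    # y_hat = sgn(beta0 + sum_{i=1}^{N} beta_i * K(x_i, x))
    s = beta0 + sum(b * K(xi, x) for b, xi in zip(betas, train_X))
    return 1 if s > 0 else -1

# Illustrative: one basis function per training point, N = 2
train_X = [[0.0], [2.0]]
betas = [1.0, -1.0]
```

Points near the first training example get class $+1$ and points near the second get $-1$: the model is a weighted vote of kernel similarities to the training points.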
Gram (Kernel) Matrix
Training data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$

$$K = \begin{pmatrix} K\left(\mathbf{x}_1, \mathbf{x}_1\right) & \cdots & K\left(\mathbf{x}_1, \mathbf{x}_N\right) \\ \vdots & \ddots & \vdots \\ K\left(\mathbf{x}_N, \mathbf{x}_1\right) & \cdots & K\left(\mathbf{x}_N, \mathbf{x}_N\right) \end{pmatrix}$$

Properties:
• Positive definite matrix
• Symmetric
• Positive on the diagonal
• N by N
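The Gram matrix can be built and its listed properties checked numerically (an illustrative sketch using the RBF kernel from the earlier slide and a made-up three-point training set):

```python
import math

def rbf(xi, xj, sigma=1.0):
    # K(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / (2.0 * sigma ** 2))

# Illustrative training inputs, N = 3
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
N = len(train_X)

# Gram matrix: K[i][j] = K(x_i, x_j), an N-by-N matrix
K = [[rbf(train_X[i], train_X[j]) for j in range(N)] for i in range(N)]
```

For the RBF kernel the diagonal entries are exactly 1 (each point has zero distance to itself), and symmetry follows from the kernel being symmetric in its arguments.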
Picking a Model Structure?
• How do you pick the Kernels?
  – Kernel parameters
    • These are called learning parameters or hyperparameters
  – Two approaches to choosing learning parameters:
    • Bayesian
      – Learning parameters must maximize the probability of correct classification on future data, based on prior biases
    • Frequentist
      – Use the training data to learn the model parameters $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$
      – Use validation data to pick the best hyperparameters
• More on learning parameter selection later
Perceptron Algorithm Convergence
• Two problems:
  – No convergence when the data is not separable in basis function space
  – Infinitely many solutions when the data is separable
• Can we modify the algorithm to fix these problems?