Duality
Duality is a powerful algorithm design tool that allows us to explore different algorithmic alternatives based on the inherent constraints of the underlying problem.
In our case the constraints on the solution (the position of the decision surface) are the observations close to the class boundaries where the classes meet.
We find that the dual solution is dominated by the constraints represented by the points close to the class boundaries.
Perceptron Learning
...
repeat
    for i = 1 to l
        if sign(w • xi − b) ≠ yi then
            w ← w + η yi xi
            b ← b − η yi r²
        end if
    end for
until sign(w • xj − b) = yj for all j = 1, . . . , l
...
Observation: w will contain many copies of the scaled position vector xi for “difficult” points, that is, points that are difficult to classify.
Idea: Instead of updating the normal vector of the decision surface, count the number of times a particular point is misclassified.
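To make the primal loop concrete, here is a minimal NumPy sketch of the algorithm above. The function name primal_perceptron and the max_epochs safeguard are illustrative additions, since the loop as stated only terminates on separable data.

import numpy as np

def primal_perceptron(X, y, eta=0.1, max_epochs=1000):
    """Primal perceptron: learn the normal vector w and offset b directly.

    X: (l, n) array of points, y: (l,) array of labels in {+1, -1}.
    """
    l, n = X.shape
    w, b = np.zeros(n), 0.0
    r = np.max(np.linalg.norm(X, axis=1))   # r = max |x| over the data
    for _ in range(max_epochs):             # safeguard, not in the original
        mistakes = 0
        for i in range(l):
            if np.sign(w @ X[i] - b) != y[i]:
                w = w + eta * y[i] * X[i]   # w <- w + eta yi xi
                b = b - eta * y[i] * r**2   # b <- b - eta yi r^2
                mistakes += 1
        if mistakes == 0:                   # D perfectly classified
            return w, b
    raise RuntimeError("data may not be linearly separable")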
Duality
Introduce a new variable that keeps track of the misclassification counts:
α = (α1, . . . , αl)
Then we can write our learning algorithm as,
Initialize α and b to 0.
repeat
    for each (xi, yi) ∈ D do
        if f̂(xi) ≠ yi then
            increment αi by 1
            update b
        end if
    end for
until D is perfectly classified
return α and b
Problem: How do we represent our free parameter w using α? Or f̂ for that matter?
Duality
Each time the primal algorithm misclassifies point xi it adds η yi xi to w. If αi counts these updates, then after training

w = ∑_{i=1}^{l} η αi yi xi = η ∑_{i=1}^{l} αi yi xi.
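A quick way to see this identity in code: run the primal update loop while counting how often each point triggers an update, then compare the accumulated w against the sum. The toy data here is hypothetical.

import numpy as np

# Hypothetical toy data: two linearly separable classes.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
eta = 0.1
r2 = np.max(np.sum(X**2, axis=1))     # r^2 = max |x|^2 over the data

w, b = np.zeros(2), 0.0
alpha = np.zeros(len(X))              # misclassification counts
converged = False
while not converged:                  # assumes separable data
    converged = True
    for i in range(len(X)):
        if np.sign(w @ X[i] - b) != y[i]:
            w = w + eta * y[i] * X[i]
            b = b - eta * y[i] * r2
            alpha[i] += 1             # count this update
            converged = False

# w accumulated update by update equals eta * sum_i alpha_i yi xi
assert np.allclose(w, eta * (alpha * y) @ X)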
Duality
This means that we can rewrite the normal vector w as the following sum,
w = ∑_{i=1}^{l} αi yi xi

where

αi ≈ 0 for “easy” points,
αi ≫ 1 for “difficult” points.
Note: The learning rate η is simply a positive scaling constant of the resulting normal vector w,
and since we are primarily interested in the orientation of the decision surface it is customary to drop this constant.
Duality
Now our perceptron decision function looks like

f̂(x) = sign(w • x − b) = sign(∑_{i=1}^{l} αi yi (xi • x) − b)

where l is the number of instances in D.
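As a sketch, this dual decision function translates directly into NumPy; the name f_hat and its argument names are illustrative.

import numpy as np

def f_hat(x, alpha, b, X, y):
    """Dual decision function:
    f^(x) = sign(sum_j alpha_j yj (xj . x) - b),
    where (X, y) holds the training set D."""
    return np.sign((alpha * y) @ (X @ x) - b)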
Duality
let D = {(x1, y1), (x2, y2), . . . , (xl, yl)} ⊂ Rn × {+1, −1}
let 0 < η < 1
α ← 0
b ← 0
r ← max{|x| : (x, y) ∈ D}
repeat
    for i = 1 to l
        if sign(∑_{j=1}^{l} αj yj (xj • xi) − b) ≠ yi then
            αi ← αi + 1
            b ← b − η yi r²
        end if
    end for
until sign(∑_{j=1}^{l} αj yj (xj • xk) − b) = yk for all k = 1, . . . , l
return (α, b)
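A minimal NumPy sketch of the dual algorithm above, assuming separable data; dual_perceptron and max_epochs are illustrative additions. Note that the data enters only through the dot products xj • xi, which can be precomputed once.

import numpy as np

def dual_perceptron(X, y, eta=0.1, max_epochs=1000):
    """Dual perceptron: learn the misclassification counts alpha and b.

    X: (l, n) array of points, y: (l,) array of labels in {+1, -1}.
    """
    l = X.shape[0]
    alpha, b = np.zeros(l), 0.0
    r = np.max(np.linalg.norm(X, axis=1))    # r = max |x| over the data
    G = X @ X.T                              # G[j, i] = xj . xi, computed once
    for _ in range(max_epochs):              # safeguard, not in the original
        mistakes = 0
        for i in range(l):
            if np.sign((alpha * y) @ G[:, i] - b) != y[i]:
                alpha[i] += 1                # count the misclassification
                b = b - eta * y[i] * r**2
                mistakes += 1
        if mistakes == 0:                    # D perfectly classified
            return alpha, b
    raise RuntimeError("data may not be linearly separable")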
Duality
The two perceptron learning algorithms give rise to different representations of the induced decision surface.
[Figure: the same set of + and − training points shown twice, illustrating the primal (w, b) and dual (α, b) representations of the induced decision surface.]
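A small check of this equivalence, using dual_perceptron from the sketch above on hypothetical toy data: expanding α back into a normal vector labels the training set exactly as the dual form of the decision function does.

import numpy as np

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

alpha, b = dual_perceptron(X, y)         # dual representation (alpha, b)
w = (alpha * y) @ X                      # expand into the primal normal vector

primal_labels = np.sign(X @ w - b)                 # sign(w . xi - b)
dual_labels = np.sign((alpha * y) @ (X @ X.T) - b) # dual form, same surface
assert np.all(primal_labels == dual_labels)
assert np.all(primal_labels == y)                  # D perfectly classified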
Duality
The dual algorithm can still give rise to degenerate decision surfaces.
[Figure: a set of + and − training points separated by a degenerate decision surface that lies very close to some of the points.]