
Page 1:

Applied Machine Learning

Lecture 4-1: Linear models

Selpi (selpi@chalmers.se)

These slides are a further development of Richard Johansson's slides

January 31, 2020

Page 2:

Overview

Linear classifiers and the perceptron

Linear regressors

Page 3:

Linear classifier

▶ has a scoring function that looks like:

score = w · x

▶ where
  ▶ x is a vector with features of what we want to classify (e.g. made with a DictVectorizer)
  ▶ w is a vector representing which features the classifier thinks are important
  ▶ · is the dot product between the two vectors
▶ there are two classes: binary classification
  ▶ return the first class if the score > 0
  ▶ . . . otherwise the second class
▶ the essential idea: features are scored independently (a small scoring sketch follows below)
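A minimal sketch (not part of the original slides) of what this scoring looks like in code; the documents, the DictVectorizer encoding and the weight values are all made up for illustration.

import numpy as np
from sklearn.feature_extraction import DictVectorizer

# two documents represented as feature dictionaries
docs = [{"fine": 1, "and": 1}, {"boring": 1, "and": 1}]

vec = DictVectorizer()
X = vec.fit_transform(docs).toarray()
# feature order is alphabetical: 'and', 'boring', 'fine'
# (vec.get_feature_names_out() in recent scikit-learn versions)

# hypothetical weights, one per feature, in the same order as above
w = np.array([0.0, -2.0, 1.5])

scores = X.dot(w)                                # score = w · x for each document
print(scores)                                    # [ 1.5 -2. ]
print(np.where(scores > 0, "first", "second"))   # class decisions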

Page 4:

quick note

▶ sometimes, linear classifiers are expressed as

score = w · x + b

where b is an "offset" or intercept

Page 5:

interpreting linear classifiers

▶ each weight in w corresponds to a feature
  ▶ e.g. "fine" probably has a high positive weight in sentiment analysis
  ▶ "boring" a negative weight
  ▶ "and" near zero

Page 6:

Geometric view

▶ Linear classifiers are a class of geometric models.
▶ Geometrically, a linear classifier is a classifier that uses hyperplanes to separate/classify the examples/vector space into two regions
  ▶ using lines to separate 2-dimensional data
  ▶ using planes to separate 3-dimensional data

Page 7:

example: plotting the decision boundary in scikit-learn

▶ See notebook
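The notebook itself is not reproduced in this transcript. Below is a rough, self-contained sketch of the same idea, on made-up 2-dimensional data (the dataset and all settings are invented for illustration): train a linear classifier and draw the zero level of its score function.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron

# made-up 2-dimensional data with two classes
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)

clf = Perceptron()
clf.fit(X, y)

# evaluate the score function on a grid covering the data
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
zz = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contour(xx, yy, zz, levels=[0.0])   # the decision boundary: score = 0
plt.scatter(X[:, 0], X[:, 1], c=y)      # the training examples
plt.show()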

Page 8:

training linear classifiers

▶ the family of learning algorithms that create linear classifiers is quite large
  ▶ perceptron, Naive Bayes, support vector machine, logistic regression/MaxEnt, . . .
▶ their underlying theoretical motivations can differ a lot but in the end they all return a weight vector w

Page 9:

the perceptron learning algorithm

▶ start with an empty weight vector (all zeros)
▶ repeat: classify according to the current weight vector
▶ each time we misclassify, change the weights a bit
  ▶ if a positive instance was misclassified, add its features to the weight vector
  ▶ if a negative instance was misclassified, subtract . . .

Page 10:

the perceptron learning algorithm, formally

w = (0, . . . , 0)   (← a weight vector of all zeros)
repeat N times
    for (xᵢ, yᵢ) in the training set
        score = w · xᵢ
        if a positive instance is misclassified
            w = w + xᵢ
        else if a negative instance is misclassified
            w = w − xᵢ
return w

Page 11:

a historical note

▶ the perceptron was invented in 1957 by Frank Rosenblatt
▶ initially, a lot of hype!
▶ the realization of its limitations led to a backlash against machine learning in general
  ▶ the nail in the coffin was the publication in 1969 of the book Perceptrons by Minsky and Papert
▶ new hype in the 1980s, and now. . .


Page 12:

linear separability

▶ a dataset is linearly separable if there exists a w that gives us perfect classification
▶ theorem: if the dataset is linearly separable, then the perceptron learning algorithm will find a separating w in a finite number of steps
▶ what if the dataset is not linearly separable?

Page 13:

a simple example of linear inseparability

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

very good    Positive
very bad     Negative
not good     Negative
not bad      Positive

(the first table is the XOR function; the sentiment example below it has the same structure: no single feature determines the class on its own)

Page 14:

coding a linear classifier using NumPy

class LinearClassifier(object):

    def predict(self, x):
        # self.w, self.positive_class and self.negative_class are assumed
        # to have been set during training
        score = x.dot(self.w)          # the score w · x
        if score >= 0.0:
            return self.positive_class
        else:
            return self.negative_class

Page 15:

better: handle all instances at the same time

import numpy

class LinearClassifier(object):

    def predict(self, X):
        # score all instances at once: one score per row of X
        scores = X.dot(self.w)
        # pick the output class depending on the sign of each score
        out = numpy.select([scores >= 0.0, scores < 0.0],
                           [self.positive_class,
                            self.negative_class])
        return out

Page 16:

side note: what happened in that code?

>>> import numpy
>>> scores = numpy.array([-1, 2, 3, -4, 5])
>>> scores >= 0
array([False, True, True, False, True], dtype=bool)
>>> scores < 0
array([ True, False, False, True, False], dtype=bool)
>>> numpy.select([scores >= 0, scores < 0], ["positive", "negative"])
array(['negative', 'positive', 'positive', 'negative', 'positive'],
      dtype='|S8')

https://docs.scipy.org/doc/numpy-1.15.4/reference/generated/numpy.select.html

Page 17:

perceptron reimplementation in NumPy

import numpy

class Perceptron(LinearClassifier):

    def __init__(self, n_iter=10):
        self.n_iter = n_iter          # number of passes over the training set

    def fit(self, X, Y):
        n_features = X.shape[1]
        self.w = numpy.zeros(n_features)
        for i in range(self.n_iter):
            for x, y in zip(X, Y):
                score = self.w.dot(x)
                # misclassified positive instance: add its features
                if score <= 0 and y == self.positive_class:
                    self.w += x
                # misclassified negative instance: subtract its features
                elif score >= 0 and y == self.negative_class:
                    self.w -= x

Page 18:

still too slow. . .

▶ this implementation uses NumPy's dense vectors
▶ with a large training set with lots of features, it may be better to use SciPy's sparse vectors
▶ however, w is a dense vector and I found it a bit tricky to mix sparse and dense vectors
▶ this is the best solution I've been able to come up with for the two operations w · x and w += x

def sparse_dense_dot(x, w):
    # dot product between a sparse vector x (a SciPy CSR row) and a dense vector w
    return numpy.dot(w[x.indices], x.data)

def add_sparse_to_dense(x, w, xw):
    # add xw · x to the dense vector w, touching only the nonzero positions of x
    w[x.indices] += xw*x.data

Page 19:

reimplementation with sparse vectors

class SparsePerceptron(LinearClassifier):

    # ...

    def fit(self, X, Y):
        # ... some initialization
        for i in range(self.n_iter):
            for x, y in zip(X, Y):
                score = sparse_dense_dot(x, self.w)
                if score <= 0 and y == self.positive_class:
                    add_sparse_to_dense(x, self.w, 1)
                elif score >= 0 and y == self.negative_class:
                    add_sparse_to_dense(x, self.w, -1)

Page 20:

Linear classifiers in scikit-learn

▶ small selection of learning algorithms:
  ▶ sklearn.linear_model.Perceptron
  ▶ sklearn.linear_model.LogisticRegression (lecture 5)
  ▶ sklearn.svm.LinearSVC (lecture 5)
▶ to compute the score function: model.decision_function(x)
▶ after training, weights are stored in the attribute model.coef_ (see the sketch below)
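A short usage sketch (not from the slides; the dataset is made up) showing how these pieces fit together:

from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = Perceptron()
model.fit(X, y)

scores = model.decision_function(X)    # the score w · x + b for every row of X
print(model.coef_)                     # the learned weight vector w
print(model.intercept_)                # the offset b
print((scores > 0).astype(int)[:10])   # thresholding the scores...
print(model.predict(X)[:10])           # ...gives the same labels as predict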

Page 21:

Overview

Linear classifiers and the perceptron

Linear regressors

Page 22:

linear regression models

▶ a linear regression model predicts numerical output values like this:

y = w · x

▶ explanation of the parts:
  ▶ again, x is an encoded feature vector
  ▶ w is a vector representing relationships between features and the output y (a small sketch follows below)
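A minimal sketch (not from the slides) of this in scikit-learn, on made-up data: the model's prediction is exactly the linear score w · x plus an intercept.

import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic data: targets follow a known linear rule plus a little noise
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X.dot([2.0, -1.0, 0.5]) + 0.1 * rng.randn(100)

model = LinearRegression()
model.fit(X, y)

manual = X.dot(model.coef_) + model.intercept_   # w · x + b, computed by hand
print(np.allclose(manual, model.predict(X)))     # True
print(model.coef_)                               # roughly [2, -1, 0.5]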

Page 23:

recall: analytic solution to least squares regression

w = (XᵀX)⁻¹ XᵀY

▶ not the way it's implemented in scikit-learn: any idea why? (one possible answer follows below)
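One common reason: explicitly forming and inverting XᵀX can be numerically unstable (and wasteful) when there are many or nearly collinear features, so libraries typically rely on a least-squares solver instead. A small sketch on made-up data (the data and names are my own):

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X.dot([2.0, -1.0, 0.5]) + 0.1 * rng.randn(100)

w_normal_eq = np.linalg.inv(X.T @ X) @ X.T @ y    # the formula above
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # a numerically safer solver

print(np.allclose(w_normal_eq, w_lstsq))          # True on this well-behaved data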


Page 25:

the Widrow-Hoff algorithm

▶ almost like the perceptron!

▶ NB the "learning rate" η

w = (0, . . . , 0)
repeat N times
    for (xᵢ, yᵢ) in the training set
        g = w · xᵢ
        error = g − yᵢ
        w = w − η · error · xᵢ
return w
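A minimal NumPy sketch of these updates, written in the same style as the Perceptron class earlier in the deck (the class name and default parameter values are my own):

import numpy

class WidrowHoffRegressor(object):

    def __init__(self, n_iter=10, eta=0.01):
        self.n_iter = n_iter   # number of passes over the training set (N)
        self.eta = eta         # the learning rate η

    def fit(self, X, Y):
        n_features = X.shape[1]
        self.w = numpy.zeros(n_features)
        for i in range(self.n_iter):
            for x, y in zip(X, Y):
                g = self.w.dot(x)                  # current prediction w · x
                error = g - y                      # prediction error
                self.w -= self.eta * error * x     # the Widrow-Hoff update
        return self

    def predict(self, X):
        return X.dot(self.w)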