session 06 machine learning.pptx
![Page 1: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/1.jpg)
Machine Learning
Data science for beginners, session 6
![Page 2: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/2.jpg)
Machine Learning: your 5-7 things
● Defining machine learning
● The Scikit-Learn library
● Machine learning algorithms
● Choosing an algorithm
● Measuring algorithm performance
![Page 3: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/3.jpg)
Defining Machine Learning
![Page 4: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/4.jpg)
Machine Learning = learning models from data
● Which advert is the user most likely to click on?
● Who’s most likely to win this election?
● Which wells are most likely to fail in the next 6 months?
![Page 5: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/5.jpg)
Machine Learning as Predictive Analytics...
![Page 6: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/6.jpg)
Machine Learning Process
● Get data
● Select a model
● Select hyperparameters for that model
● Fit model to data
● Validate model (and change model, if necessary)
● Use the model to predict values for new data
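The steps above can be sketched end to end in a few lines. This is only an illustration, not the session's own notebook: the choice of the Iris dataset, the k-nearest-neighbours model, the 80/20 split and the `new_flower` measurement are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Get data
X, y = load_iris(return_X_y=True)

# Hold back 20% of the rows so the model can be validated on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 2-3. Select a model and its hyperparameters (n_neighbors is a hyperparameter)
model = KNeighborsClassifier(n_neighbors=5)

# 4. Fit model to data
model.fit(X_train, y_train)

# 5. Validate: mean accuracy on the held-back rows
score = model.score(X_test, y_test)
print('Validation accuracy: {:.2f}'.format(score))

# 6. Predict for new data (a hypothetical flower measured in cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(new_flower)
print('Predicted class index: {}'.format(prediction[0]))
```

If the validation score is poor, the usual next move is to go back to steps 2 and 3 and try a different model or different hyperparameters.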
![Page 7: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/7.jpg)
Today’s library: Scikit-Learn (sklearn)
![Page 8: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/8.jpg)
Scikit-Learn’s example datasets
● Iris
● Digits
● Diabetes
● Boston (house prices; removed from scikit-learn in version 1.2)
![Page 9: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/9.jpg)
Select a Model
![Page 10: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/10.jpg)
Algorithm Types
Supervised learning
● Regression: learning numbers
● Classification: learning classes
Unsupervised learning
● Clustering: finding groups
● Dimensionality reduction: finding efficient representations
![Page 11: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/11.jpg)
Linear Regression: fit a line to (numerical) data
![Page 12: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/12.jpg)
Linear Regression: First, get your data
import numpy as np
import pandas as pd
gen = np.random.RandomState(42)
num_samples = 40
x = 10 * gen.rand(num_samples)
y = 3 * x + 7 + gen.randn(num_samples)
X = pd.DataFrame(x)
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(x, y)
![Page 13: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/13.jpg)
Linear Regression: Fit model to data
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)
model.fit(X, y)
print('Slope: {}, Intercept: {}'.format(model.coef_, model.intercept_))
![Page 14: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/14.jpg)
Linear Regression: Check your model
Xtest = pd.DataFrame(np.linspace(-1, 11))
predicted = model.predict(Xtest)
plt.scatter(x, y)
plt.plot(Xtest, predicted)
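Besides plotting, the fit can be checked numerically. The sketch below regenerates the same synthetic data (true slope 3, true intercept 7) and compares the learned parameters against those values. Using `r2_score` as the summary number, and a NumPy `reshape` in place of the slides' `pd.DataFrame`, are choices made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same synthetic data as the slides: y = 3x + 7 plus unit-variance noise
gen = np.random.RandomState(42)
num_samples = 40
x = 10 * gen.rand(num_samples)
y = 3 * x + 7 + gen.randn(num_samples)
X = x.reshape(-1, 1)  # sklearn expects a 2-D feature array

model = LinearRegression(fit_intercept=True).fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
r2 = r2_score(y, model.predict(X))  # 1.0 would be a perfect fit

print('Slope: {:.2f} (true value 3)'.format(slope))
print('Intercept: {:.2f} (true value 7)'.format(intercept))
print('R^2 on the training data: {:.3f}'.format(r2))
```

The learned values should land close to, but not exactly on, 3 and 7, because the noise term shifts each point slightly.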
![Page 15: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/15.jpg)
Reality can be a little more like this…
![Page 16: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/16.jpg)
Classification: Predict classes
● Well pump: [working, broken]
● CV: [accept, reject]
● Gender: [male, female, others]
● Iris variety: [iris setosa, iris virginica, iris versicolor]
![Page 17: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/17.jpg)
Classification: The Iris Dataset
[Figure labels: Petal, Sepal]
![Page 18: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/18.jpg)
Classification: first get your data
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
Y = iris.target
![Page 19: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/19.jpg)
Classification: Split your data
ntest = 10
np.random.seed(0)
indices = np.random.permutation(len(X))
iris_X_train = X[indices[:-ntest]]
iris_Y_train = Y[indices[:-ntest]]
iris_X_test = X[indices[-ntest:]]
iris_Y_test = Y[indices[-ntest:]]
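scikit-learn also ships a helper, `train_test_split`, that does the shuffle-and-slice in one call. The sketch below is an alternative to the manual permutation, not a reproduction of it: it will pick a different set of 10 test rows, and the `random_state=0` value is just an assumption to make the example repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)

# An integer test_size means "this many rows", not a fraction
iris_X_train, iris_X_test, iris_Y_train, iris_Y_test = train_test_split(
    X, Y, test_size=10, random_state=0)

print('Training rows: {}'.format(len(iris_X_train)))
print('Test rows: {}'.format(len(iris_X_test)))
```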
![Page 20: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/20.jpg)
Classifier: Fit Model to Data
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski')
knn.fit(iris_X_train, iris_Y_train)
![Page 21: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/21.jpg)
Classifier: Check your model
predicted_classes = knn.predict(iris_X_test)
print('kNN predicted classes: {}'.format(predicted_classes))
print('Real classes: {}'.format(iris_Y_test))
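Comparing the two printed arrays by eye works for 10 rows but not for thousands. A minimal sketch of the same check as a single number follows, repeating the split and fit from the earlier slides so that it runs on its own. The use of `accuracy_score` here is an addition for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)

# Same manual split as the earlier slide
ntest = 10
np.random.seed(0)
indices = np.random.permutation(len(X))
iris_X_train, iris_Y_train = X[indices[:-ntest]], Y[indices[:-ntest]]
iris_X_test, iris_Y_test = X[indices[-ntest:]], Y[indices[-ntest:]]

knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski')
knn.fit(iris_X_train, iris_Y_train)
predicted_classes = knn.predict(iris_X_test)

# Fraction of test rows where the prediction matches the real class
accuracy = accuracy_score(iris_Y_test, predicted_classes)
n_wrong = int((predicted_classes != iris_Y_test).sum())

print('Accuracy: {:.2f} ({} of {} wrong)'.format(accuracy, n_wrong, ntest))
```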
![Page 22: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/22.jpg)
Clustering: Find groups in your data
![Page 23: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/23.jpg)
Clustering: get your data
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
Y = iris.target
print("Xs: {}".format(X))
![Page 24: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/24.jpg)
Clustering: Fit model to data
from sklearn import cluster
k_means = cluster.KMeans(n_clusters=3)
k_means.fit(iris.data)
![Page 25: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/25.jpg)
Clustering: Check your model
print("Generated labels: \n{}".format(k_means.labels_))
print("Real labels: \n{}".format(Y))
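One caution when comparing the two printouts: k-means numbers its clusters arbitrarily, so cluster 0 need not correspond to class 0 even when the grouping itself is good. A score that ignores the numbering is a fairer comparison. The sketch below uses `adjusted_rand_score` for that. Both the metric and the `n_init` and `random_state` settings are assumptions added for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, Y = load_iris(return_X_y=True)

# n_init and random_state are fixed only to make the run repeatable
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(X)

# 1.0 means the clusters match the real classes exactly (up to renaming);
# values near 0.0 mean the match is no better than chance
ari = adjusted_rand_score(Y, k_means.labels_)
print('Adjusted Rand index: {:.2f}'.format(ari))
```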
![Page 26: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/26.jpg)
Dimensionality Reduction
![Page 27: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/27.jpg)
Dimensionality reduction: Get your data
![Page 28: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/28.jpg)
Dimensionality reduction: Fit model to data
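The code for these slides did not survive in the transcript, so the following is only a guess at the kind of example intended. It uses principal component analysis (PCA), one common dimensionality reduction method, to squeeze the four Iris measurements down to two. The choice of PCA and of two components is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Get your data: 150 flowers, 4 measurements each
X, Y = load_iris(return_X_y=True)

# Fit model to data and project onto the 2 directions of greatest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Share of the original variance kept by the 2 new dimensions
explained = pca.explained_variance_ratio_.sum()

print('Shape before: {}, after: {}'.format(X.shape, X_2d.shape))
print('Variance retained: {:.1%}'.format(explained))
```

With two columns left, the data can be drawn as an ordinary scatter plot, for example `plt.scatter(X_2d[:, 0], X_2d[:, 1], c=Y)`.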
![Page 29: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/29.jpg)
Recap: Choosing an Algorithm
Have: data and expected outputs
● Want numbers? Try regression algorithms
● Want classes? Try classification algorithms
Have: just data
● Want to find structure? Try clustering algorithms
● Want to look at it? Try dimensionality reduction
![Page 30: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/30.jpg)
Model Validation
![Page 31: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/31.jpg)
How well does the model fit new data?
“Holdout sets”:
● split your data into training and test sets
● learn your model with the training set
● get a validation score for your test set
Models are rarely perfect… you might have to change parameters or model
● underfitting: model not complex enough to fit the training data
● overfitting: model too complex; it fits the training data well but does badly on test data
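Both failure modes show up when training and test scores are compared side by side. The sketch below is an illustration under assumptions, not part of the session's notebooks: it invents a noisy sine-wave dataset and uses the depth of a decision tree as the "complexity" dial.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: a sine wave with added noise
gen = np.random.RandomState(0)
x = 10 * gen.rand(200)
y = np.sin(x) + 0.3 * gen.randn(200)
X = x.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# max_depth=1 is a very simple model; None lets the tree grow until it
# has memorised every training point
scores = {}
for depth in (1, 4, None):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    scores[depth] = (model.score(X_train, y_train),
                     model.score(X_test, y_test))
    print('max_depth={}: train R^2 = {:.2f}, test R^2 = {:.2f}'.format(
        depth, *scores[depth]))
```

A low score on both sets points to underfitting. A near-perfect training score paired with a clearly lower test score points to overfitting.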
![Page 32: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/32.jpg)
Overfitting and underfitting
![Page 33: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/33.jpg)
The Confusion Matrix
True positive | False positive
False negative | True negative
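The four cells can be counted with `sklearn.metrics.confusion_matrix`. The labels below are made up for the example (1 = positive, 0 = negative). Note that scikit-learn orders the matrix with real classes as rows and predicted classes as columns, lowest label first, so the flattened order is tn, fp, fn, tp rather than starting with true positives.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical results for 10 items: 1 = positive, 0 = negative
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

matrix = confusion_matrix(actual, predicted)

# Rows are real classes, columns are predicted classes: [[tn, fp], [fn, tp]]
tn, fp, fn, tp = matrix.ravel()

print(matrix)
print('tp={}, fp={}, fn={}, tn={}'.format(tp, fp, fn, tn))
```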
![Page 34: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/34.jpg)
Test Metrics
Precision: of all the results the classifier marked as “true”, how many were actually “true”?
Precision = tp / (tp + fp)
Recall: how many of the things that were really “true” were marked as “true” by the classifier?
Recall = tp / (tp + fn)
F1 score: harmonic mean of precision and recall
F1_score = 2 * precision * recall / (precision + recall)
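A small worked example may help make the three formulas concrete. The labels are invented for illustration. The sketch computes each metric by hand from the counts and then checks the result against scikit-learn's own functions.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical results for 10 items: 1 = "true", 0 = "false"
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly marked "true"
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # wrongly marked "true"
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed "true" items

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print('By hand: precision={:.2f}, recall={:.2f}, F1={:.2f}'.format(
    precision, recall, f1))
print('sklearn: precision={:.2f}, recall={:.2f}, F1={:.2f}'.format(
    precision_score(actual, predicted),
    recall_score(actual, predicted),
    f1_score(actual, predicted)))
```

Here the classifier marked 5 items as “true” and 3 of them really were, giving a precision of 3/5; it found 3 of the 4 real “true” items, giving a recall of 3/4.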
![Page 35: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/35.jpg)
Iris classification: metrics
from sklearn import metrics
print(metrics.classification_report(iris_Y_test, predicted_classes))
![Page 36: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/36.jpg)
Exercises
![Page 37: Session 06 machine learning.pptx](https://reader036.vdocuments.net/reader036/viewer/2022062503/58ecd8d01a28ab0e278b469b/html5/thumbnails/37.jpg)
Explore some algorithms
Notebooks 6.x contain examples of machine learning algorithms. Run them, play with the numbers in them, break them, think about why they might have broken.