machine learning 101 sit hvr

Machine Learning 101 Fred Verheul

Upload: fredverheul

Post on 19-Mar-2017

Category: Software


TRANSCRIPT

Page 1: Machine learning 101 sit hvr

Machine Learning 101

Fred Verheul

Page 2: Machine learning 101 sit hvr

What we won’t cover…

• Deep learning / Neural Networks

• Specifics of ML-algorithms

• Tools / Libraries / Code

• SAP Products, like HANA / Predictive Analytics / Vora / …

• Ethics, algorithmic transparency & fairness

• Hardware

Page 3: Machine learning 101 sit hvr

Examples: Recommender systems

Page 4: Machine learning 101 sit hvr

Examples, continued…

SPAM-filtering

Handwriting recognition

Page 5: Machine learning 101 sit hvr

ML in the news: Deepmind’s AlphaGo

Page 6: Machine learning 101 sit hvr

Page 7: Machine learning 101 sit hvr

Machine Learning

“Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)

Page 8: Machine learning 101 sit hvr

What is Machine Learning?

Traditional Programming: Data + Program -> Computer -> Output

Machine Learning: Data + Output -> Computer -> Program
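The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spam rule, the `num_links` feature, and the example data are hypothetical and do not come from the talk. In the first function a human writes the rule; in the second, the rule (a threshold) is derived from data plus the desired outputs.

```python
# Traditional programming: a human writes the program (the rule).
def is_spam_handwritten(num_links):
    return num_links > 3  # threshold picked by the programmer (hypothetical)


# Machine learning: the program is derived from data plus desired outputs.
def learn_threshold(examples):
    """Return the threshold t for which the rule 'num_links > t'
    makes the fewest mistakes on the labeled examples."""
    values = sorted({x for x, _ in examples})
    candidates = [values[0] - 1] + values

    def mistakes(t):
        return sum((x > t) != label for x, label in examples)

    return min(candidates, key=mistakes)


# Hypothetical labeled data: (number of links in a mail, is it spam?)
examples = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
threshold = learn_threshold(examples)


def is_spam_learned(num_links):
    return num_links > threshold
```

Both functions have the same shape; the difference is where the threshold comes from.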

Page 9: Machine learning 101 sit hvr

Sweet spot for Machine Learning

• It’s impossible to write down the rules in code:
  • Too many rules
  • Too many factors influencing the rules
  • Too finely tuned
  • We just don’t know the rules (image recognition)

• Lots of labeled data (examples) available (e.g. historical data)

Page 10: Machine learning 101 sit hvr

Basic Machine Learning ‘workflow’

Training Phase: Training data -> Feature Vectors + Labels -> Machine Learning Algorithm -> Predictive Model

Operational Phase: New data -> Feature Vectors -> Predictive Model -> Prediction
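A minimal sketch of the two phases, assuming a nearest-centroid classifier as the learning algorithm. That choice, the function names `train` and `predict`, and the data are illustrative assumptions; the slide does not name a specific algorithm.

```python
def train(feature_vectors, labels):
    """Training phase: build a model (one mean vector per label)."""
    sums, counts = {}, {}
    for vec, label in zip(feature_vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, vec):
    """Operational phase: return the label of the nearest centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data: feature vectors with their labels.
training_vectors = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]]
training_labels = ["small", "small", "large", "large"]

model = train(training_vectors, training_labels)   # training phase
prediction = predict(model, [2.0, 1.0])            # operational phase
```

The predictive model here is just a dictionary of centroids; the operational phase never sees the training labels again.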

Page 11: Machine learning 101 sit hvr

Training Phase in more detail

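Raw data -> Data preparation -> Feature Vectors -> Training Data + Test data

Data preparation: data cleansing, data transformation, normalization, feature extraction

Training Data -> Model Building (by ML algorithm), aka ‘learning’ -> Predictive Model

Test data -> Model Evaluation

Feedback loop: from model evaluation back to the earlier steps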
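The steps of the training phase can be sketched end to end as follows. Everything specific here is an assumption made for illustration: the records, the fixed every-third-record split (a real project would usually shuffle first), min-max scaling as the normalization, and a nearest-centroid model as the learner.

```python
# Hypothetical raw data: (feature_1, feature_2, label); None marks a missing value.
raw_data = [
    (150, 50, "A"), (180, 90, "B"), (155, 55, "A"), (185, 95, "B"),
    (None, 60, "A"), (160, 52, "A"), (175, 88, "B"), (158, 58, "A"),
    (182, 92, "B"),
]

# Data cleansing: drop records with missing values.
clean = [row for row in raw_data if None not in row]

# Split into training data and test data (every third record is held out).
test_rows = [row for i, row in enumerate(clean) if i % 3 == 2]
train_rows = [row for i, row in enumerate(clean) if i % 3 != 2]

# Normalization: min-max scaling, fitted on the training data only.
N_FEATURES = 2
lows = [min(row[j] for row in train_rows) for j in range(N_FEATURES)]
highs = [max(row[j] for row in train_rows) for j in range(N_FEATURES)]


def to_feature_vector(row):
    return [(row[j] - lows[j]) / (highs[j] - lows[j]) for j in range(N_FEATURES)]


# Model building ('learning'): one centroid per label.
def build_model(rows):
    grouped = {}
    for row in rows:
        grouped.setdefault(row[-1], []).append(to_feature_vector(row))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict(model, row):
    vec = to_feature_vector(row)
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(model[label], vec)))


# Model evaluation: accuracy on the held-out test data.
def accuracy(model, rows):
    return sum(predict(model, row) == row[-1] for row in rows) / len(rows)


model = build_model(train_rows)
test_accuracy = accuracy(model, test_rows)
```

If `test_accuracy` were disappointing, the feedback loop would lead back to data preparation or to a different model.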

Page 12: Machine learning 101 sit hvr

CRISP-DM: data mining process

[Figure: CRISP-DM process diagram, with two phases marked ‘ML important’]

Page 13: Machine learning 101 sit hvr

Examples of ML tasks

Supervised learning

• Regression: target is numeric

• Classification: target is categorical

Unsupervised learning

• Clustering

• Dimensionality reduction
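To make the unsupervised case concrete, here is a rough sketch of clustering with no labels at all. It assumes one-dimensional k-means with hand-picked starting centers; the algorithm choice, the data, and the function name `k_means_1d` are illustrative and not taken from the talk.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers (1-D k-means)."""
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled data; starting centers are an assumption.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(values, centers=[1.0, 12.0])
```

In the supervised tasks above, each example would also carry a target: a number for regression, a category for classification.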

Page 14: Machine learning 101 sit hvr

Modeling: so many algorithms…

Page 15: Machine learning 101 sit hvr

ML Algorithms: by Representation

Representation: collection of candidate models/programs, aka hypothesis space

Decision trees

Instance-based

Neural networks

Model ensembles

Page 16: Machine learning 101 sit hvr

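ML Algorithms: by Evaluation

Evaluation: Quality measure for a model

Regression

Example metric: Root Mean Squared Error

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of examples.

Binary classification: confusion matrix

Example: medical test for a disease

                   Actual: disease   Actual: no disease
Test positive             8                  19
Test negative             2                 971

Accuracy: (8 + 971) / 1000 -> 97.9%

Better evaluation metrics:
• Precision: 8 / (8 + 19)
• Recall: 8 / (8 + 2)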
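The arithmetic on this slide can be checked with a short sketch. The four confusion-matrix counts are the ones implied by the slide's formulas (8 true positives, 19 false positives, 2 false negatives, 971 true negatives); the variable names and the small RMSE example are assumptions added for illustration.

```python
import math

# Confusion-matrix counts implied by the slide's formulas.
tp, fp, fn, tn = 8, 19, 2, 971

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all cases classified correctly
precision = tp / (tp + fp)                   # share of positive tests that are real cases
recall = tp / (tp + fn)                      # share of real cases the test finds


def rmse(actual, predicted):
    """Root Mean Squared Error for a regression model."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical regression example: every prediction is off by 2.
example_rmse = rmse([1, 2, 3, 4], [3, 4, 1, 2])
```

Accuracy comes out at 97.9% while precision is below 30%, which is why accuracy alone is a poor metric when one class is rare.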

Page 17: Machine learning 101 sit hvr

ML Algorithms: by Optimization

Optimization: how the algorithm ‘learns’; depends on representation and evaluation

Greedy Search (an example of combinatorial optimization)

Gradient Descent (or in general: Convex Optimization)

Linear Programming (or in general: Constrained/Nonlinear Optimization)
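As one example of optimization, gradient descent can be sketched for the simplest possible case: fitting a single weight `w` in the model `y = w * x` by minimizing the mean squared error. The model, learning rate, step count, and data are assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.01, steps=1000):
    """Fit w in y = w * x by repeatedly stepping against the gradient
    of the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * gradient
    return w


# Hypothetical data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = gradient_descent(xs, ys)
```

Here the representation is "all lines through the origin", the evaluation is the mean squared error, and gradient descent is the optimization that searches that space.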

Page 18: Machine learning 101 sit hvr

Training error vs test error
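A small sketch of why the two errors differ, assuming a k-nearest-neighbour classifier on one-dimensional data with two deliberately mislabeled training points. The classifier, the data, and the names below are illustrative assumptions, not the example shown on the slide.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def error_rate(train, data, k):
    wrong = sum(knn_predict(train, x, k) != label for x, label in data)
    return wrong / len(data)


# Underlying rule: label 0 below 5, label 1 from 5 upwards.
train_data = [(float(x), 0 if x < 5 else 1) for x in range(10)]
train_data[2] = (2.0, 1)   # noisy (mislabeled) training example
train_data[7] = (7.0, 0)   # noisy (mislabeled) training example
test_data = [(x + 0.4, 0 if x < 5 else 1) for x in range(10)]

for k in (1, 3):
    print(k, error_rate(train_data, train_data, k), error_rate(train_data, test_data, k))
```

With k = 1 the model memorizes the training data, noise included, so its training error is zero while it still makes mistakes on the test data. With k = 3 it makes mistakes on the noisy training points but does better on the test data.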

Page 19: Machine learning 101 sit hvr

Data Science for Business

• Focuses more on general principles than specific algorithms

• Not math-heavy, but does contain some math

• O’Reilly link: http://shop.oreilly.com/product/0636920028918.do

• Book website: http://data-science-for-biz.com/DSB/Home.html

Page 20: Machine learning 101 sit hvr

Take-aways

• Goal of ML: generalize from training data (not optimization!!)

• Part of ‘Data Mining Process’, not a goal in and of itself

• No magic! Just some clever algorithms…

• Increasingly important non-technical aspects:
  • Ethics
  • Algorithmic transparency

Page 21: Machine learning 101 sit hvr

Thank you

[email protected]
@SOAPEOPLE

Fred Verheul
Big Data Consultant
+31 6 3919 [email protected]
@fredverheul