practical approach to machine learning techniques for classification and anomaly detection


Page 1: Practical approach to machine learning techniques for classification and anomaly detection


BDIGITAL: After Work Knowledge Program

Practical approach to machine learning techniques for

classification and anomaly detection

Xavier Rafael-Palou

[email protected]

(12/12/2014)

Page 2

New Hype surrounding AI

Page 3

Even… Turing test!!

Page 4

(Classic test)

Natural Language Processing - communication

Knowledge representation - knowledge storage (KS)

Automated reasoning - use KS to answer questions

Machine Learning - detect patterns, adapt to new circumstances

(Advanced Turing Test)

Computer vision - perceive objects

Robotics - manipulate objects + move around

Blade Runner (Ridley Scott, 1982): Deckard and the Voight-Kampff machine in 2019.

Inspired by Philip K. Dick's novel "Do Androids Dream of Electric Sheep?" (1968)

(*) Source: "Artificial Intelligence: A Modern Approach" by Stuart Russell & Peter Norvig.

Page 5

Agenda

1. Introduction (15 min)

2. Basic Techniques (45 min)

3. Guides & Tips Building a Classifier (15 min)

4. Practice:

- Environment(15 min)

- Examples & exercises (60 min)

5. References

Page 6

Introduction

Classification and anomaly detection, but also clustering and regression, are examples of Machine Learning (ML) tasks.

ML is a subfield of Artificial Intelligence that aims to:

- Give computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)

- Give a computer program the ability to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1998)

Data mining (DM) overlaps in many ways with Machine Learning:

- DM uses many ML methods, but often with a slightly different goal: discovering previously unknown knowledge.

- ML, in contrast, aims to perform accurately on new, unseen examples/tasks after having experienced a learning data set.

Page 7

Main ML tasks:

Supervised learning. The goal is to learn a general rule that maps inputs to outputs, given a set of labelled examples.

Others:

Unsupervised learning: no labels are given; the goal is to discover patterns in the data.

Reinforcement learning: the program interacts with a dynamic environment in which it must achieve a certain goal, without a teacher.

Semi-supervised learning: the teacher gives an incomplete training set, with some of the target outputs missing.

Page 8

Classification

Examples:

Email: Spam / Not Spam?

Online Transactions: Fraudulent (Yes / No)?

Tumor: Malignant / Benign?

Variable to predict:

0: "Negative Class" (e.g., benign tumor)

1: "Positive Class" (e.g., malignant tumor)

Page 9

Binary classification (y = 0 or 1)

[Figure: "Malignant?" (1 = Yes, 0 = No) plotted against tumor size, with the decision boundary and an anomalous example marked]

Classification

Page 10

Classification Complexities

Page 11

[Figure: binary classification vs. multi-class classification in the (x1, x2) plane]

Multiclass classification

Email foldering/tagging: Work, Friends, Family, Hobby

Medical diagnosis: Not ill, Cold, Flu

Weather: Sunny, Cloudy, Rain, Snow

Page 12

One-vs-all (one-vs-rest)

Principle: Divide & conquer

Train a separate binary classifier h_i(x) for each class i (Class 1, Class 2, Class 3), treating that class as positive and all other classes as negative.

[Figure: the three resulting binary decision boundaries in the (x1, x2) plane]

On a new input x, output the class i that maximizes h_i(x).
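The one-vs-all scheme above can be sketched with scikit-learn (the library used later in the practice). The iris data set is only an illustration; `OneVsRestClassifier` trains one `LogisticRegression` per class and predicts the class whose classifier is most confident:

```python
# One-vs-all sketch: one binary classifier per class, predict the
# class whose classifier scores highest on the new input.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

preds = clf.predict(X)        # class with the maximal h_i(x)
acc = clf.score(X, y)         # training accuracy, for illustration
```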

Page 13

Agenda

1. Introduction

2. Basic Techniques

3. Guides & Tips Building a Classifier

4. Practice:

- Environment

- Examples & exercises

5. References

Page 14

There are multiple classification techniques:

- Probabilistic

- Decision Tree

- Linear

- Instance-based

- Genetic algorithms

- Fuzzy logic

- …

Each of them learns a decision function in a different way.

Basic Classification Methods

Page 15

Probabilistic classifiers

Example: "Automatic fruit classification"

- A random variable (y) says whether the fruit is M or A

- Observing the conveyor belt for some time, we get the probabilities of M and A ("a priori" knowledge of the harvest): P(y=M) and P(y=A), which sum to 1

- A first classifier: predict M if P(y=M) >= P(y=A), else A. Is this enough?

[Image: CompacInVision 9000 fruit-sorting machine]

Page 16

- We add a new random variable x to the system for better performance:

x = size degree of the fruit [1, 2, 3, …]

- So we get the probabilities p(x) too

- Since x depends on the type of fruit, we get densities of x conditioned on the type of fruit:

p(x | y=A), p(x | y=M): "conditional probability densities"

How does size affect our belief about the type of fruit in question? By Bayes' rule:

- p(y=A | x) = p(x | y=A) P(y=A) / p(x)

- p(y=M | x) = p(x | y=M) P(y=M) / p(x)

Naive Bayes: predict A if p(y=A | x) >= p(y=M | x), else M ("a posteriori" probabilities)

Probabilistic classifiers
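A minimal sketch of the fruit classifier above. The priors and the size likelihoods are made-up numbers (assumptions, not from the slides); the decision rule is exactly the Bayes posterior comparison:

```python
# Hypothetical numbers: priors from the harvest, class-conditional
# size distributions p(x | y) for sizes x in {1, 2, 3}.
priors = {"M": 0.3, "A": 0.7}                 # P(y=M), P(y=A), sum to 1
likelihood = {
    "M": {1: 0.1, 2: 0.3, 3: 0.6},
    "A": {1: 0.5, 2: 0.4, 3: 0.1},
}

def posterior(x):
    """Bayes' rule: P(y | x) = p(x | y) P(y) / p(x)."""
    evidence = sum(likelihood[y][x] * priors[y] for y in priors)
    return {y: likelihood[y][x] * priors[y] / evidence for y in priors}

post = posterior(3)                # a large fruit
label = max(post, key=post.get)    # naive Bayes decision
```

With these numbers, a large fruit (x = 3) yields P(y=M | x) = 0.18 / 0.25 = 0.72, so the classifier outputs M despite the prior favouring A.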

Page 17

Pros:

- Simple to implement

- Fast to compute (e.g. fits the map & reduce paradigm)

- Works surprisingly well

- Compatible with missing data

- Used in text mining → Multinomial Naive Bayes

Cons:

- Unrealistic hypothesis: all features are equally important and independent of one another given the class

- Dependencies among features are ignored (recall: all features carry the same weight)

- A zero probability holds a veto over all the others

- Requires processing all the data

Probabilistic classifiers

Page 18

- Widely used because the knowledge it produces is easy to understand

- A set of conditions (nodes) organized hierarchically

- Prediction: pass a new, unseen instance from the root to a leaf of the tree

Decision Tree Learning

Page 19

Tid | Refund | Marital Status | Taxable Income | Cheat
 1  | Yes    | Single         | 125K           | No
 2  | No     | Married        | 100K           | No
 3  | No     | Single         | 70K            | No
 4  | Yes    | Married        | 120K           | No
 5  | No     | Divorced       | 95K            | Yes
 6  | No     | Married        | 60K            | No
 7  | Yes    | Divorced       | 220K           | No
 8  | No     | Single         | 85K            | Yes
 9  | No     | Married        | 75K            | No
10  | No     | Single         | 90K            | Yes

Training

Greedy strategy: split records based on an attribute test that optimizes a certain criterion. The tree is built recursively, adding conditions until each leaf contains elements of a single class.

- Partitioning strategy: choosing the best attribute and the best condition (finding the optimal tree is an NP-hard problem)

- Determine when to stop

[Figure: the tree grown step by step by recursive splitting; final tree:]

Refund?
  Yes -> Don't Cheat
  No  -> Marital Status?
           Married -> Don't Cheat
           Single, Divorced -> Taxable Income?
                                 < 80K  -> Don't Cheat
                                 >= 80K -> Cheat

Example:

Decision Tree Learning

Page 20

Partitioning strategy: prefer attributes that generate homogeneous subsets

Strategy examples:

GINI index: measures the impurity (homogeneity) of a node. Used in CART, SLIQ, SPRINT:

GINI(t) = 1 - Σ_j [p(j|t)]²

where p(j|t) is the relative frequency of class j at node t. A non-homogeneous node has a high degree of impurity; a homogeneous node has a low degree of impurity.

Classification error: measures the misclassification error made by a node:

Error(t) = 1 - max_j P(j|t)

Information gain: choose the split that achieves the largest homogeneity (entropy) reduction (e.g. ID3, C4.5):

GAIN_split = Entropy(p) - Σ_{i=1}^{k} (n_i / n) Entropy(i)

Decision Tree Learning
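The three splitting criteria above can be sketched directly from their formulas; `counts` holds the class counts at a node:

```python
import math

def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2, with p(j|t) the class frequency at node t."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy(t) = -sum_j p(j|t) log2 p(j|t); used by the GAIN criterion."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def error(counts):
    """Error(t) = 1 - max_j p(j|t)."""
    n = sum(counts)
    return 1.0 - max(counts) / n
```

A pure node ([10, 0]) scores 0 on all three; a 50/50 node scores the maximum (Gini 0.5, entropy 1.0).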

Page 21

Based on the principle that instances within a dataset generally lie in close proximity to other instances with similar properties.

kNN (Cover and Hart, 1967) locates the k nearest instances to the query instance and determines its class as the single most frequent class label among them.

Instances can be considered points in an n-dimensional instance space, where each of the n dimensions corresponds to one of the n features.

A distance metric should minimize the distance between two similarly classified instances, while maximizing the distance between instances of different classes.

Instance-Based Learning

Page 22

Distance metrics: Euclidean distance (*), Mahalanobis, Manhattan, …

To determine the class given the neighbour list, we can use e.g. majority voting, or weights according to distance (1/d²).

Instance-Based Learning
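A minimal numpy sketch of kNN with Euclidean distance and majority voting; the toy clusters are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest (Euclidean) neighbours."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every instance
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters (illustrative only).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

pred = knn_predict(X, y, np.array([0.95, 1.0]))
```

A distance-weighted variant would replace the vote with weights 1/d² per neighbour.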

Page 23

Pros:

- Low computational cost during training (lazy learning)

Cons:

- Slow classification

- Requires storing large amounts of information

- Sensitive to the choice of the similarity measure

- No clear criterion for selecting k

Instance-Based Learning

Page 24

Decision Boundary

[Figure: decision boundary separating the two classes in the (x1, x2) plane; predict one class on each side of the boundary]

The idea is to get a function h(x) (of parameters and attributes) that partitions the data into the desired output classes.

Probabilistic Statistical Classification

Principal objective is to find h(x):

Page 25

Expected values for h(x) are 0 ≤ h(x) ≤ 1.

Then we predict "1" if h(x) ≥ 0.5, and predict "0" if h(x) < 0.5.

We need to transform h(x) to accommodate this behaviour (the sigmoid function):

g(z) = 1 / (1 + e^(-z))

Logistic Regression: replace z with θᵀx, so that h(x) = g(θᵀx).

Probabilistic Statistical Classification

Page 26

How to choose the parameters? Those that minimize the error (cost).

Cost function:

Cost(h(x), y) = -log(h(x))      if y = 1
Cost(h(x), y) = -log(1 - h(x))  if y = 0

The more our hypothesis is off from y, the larger the cost function output. If our hypothesis is equal to y, then our cost is 0.

Logistic Regression

Gradient descent → method to find a local minimum of the cost.
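The pieces above (sigmoid hypothesis, log-loss, gradient descent) fit together in a few lines of numpy. The toy data, learning rate and iteration count are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1D problem: label is 1 when the feature is large.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])  # 1s column = intercept
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(2)
alpha = 0.5                                  # learning rate (assumed)

for _ in range(5000):                        # batch gradient descent
    h = sigmoid(X @ theta)
    grad = X.T @ (h - y) / len(y)            # gradient of the average log-loss
    theta -= alpha * grad

h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
```

After training, the cost is close to 0 and the training points are classified correctly, matching the "cost is 0 when the hypothesis equals y" intuition.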

Page 27

Logistic vs SVM vs Neural Networks

N (features) is large → prefer logistic regression, or an SVM without a kernel (the "linear kernel")

N is small and M (instances) is intermediate → prefer an SVM with a Gaussian kernel

N is small and M is large → manually create/add more features, then use logistic regression or an SVM without a kernel

A neural network is likely to work well in any of these situations, but may be slower to train.

Comparative Classification Methods

Page 28

Comparative Classification Methods

Page 29

Supervised Machine Learning: A Review of Classification Techniques. S. B. Kotsiantis. Informatica 31 (2007), 249–268.

Comparative Classification Methods

Page 30

Anomaly Detection

Anomalous behaviour (anomaly detection):

• Fraud detection

• Manufacturing (e.g. aircraft engines)

• Monitoring machines in a data center

Classification:

• Email spam classification

• Weather prediction (sunny/rainy/etc.)

• Cancer classification

Page 31

Anomaly detection vs Classification

Anomaly detection:

- Very small number of positive examples (y=1). (0–20 is common.)

- Large number of negative (y=0) examples.

- Many different "types" of anomalies. It is hard for any algorithm to learn from the positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples we have seen so far.

Classification:

- Large number of positive and negative examples.

- Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to the ones in the training set.

Page 32

Given a new example, we want to know whether it is abnormal/anomalous.

We define a "model" p(x) that gives the probability that the example is not anomalous.

We use a threshold ϵ (epsilon) as a dividing line, so we can say which examples are anomalous and which are not.

If our anomaly detector is flagging too many anomalous examples, we need to decrease our threshold ϵ.

Anomaly Detection Methods

Page 33

The Gaussian distribution is the familiar bell-shaped curve, described by a function N(μ, σ²).

Mu, or μ, describes the centre of the curve, called the mean. The width of the curve is described by sigma, or σ, called the standard deviation.

The parameter μ is the average of all the examples:

μ = (1/m) Σ_{i=1}^{m} x(i)

We can estimate σ² with our familiar squared-error formula:

σ² = (1/m) Σ_{i=1}^{m} (x(i) - μ)²

Gaussian Distribution Method

Page 34

Given a training set of examples {x(1), …, x(m)}, where each example is a vector x ∈ Rⁿ, and an "independence assumption" on the values of the features inside a training example x:

p(x) = p(x1; μ1, σ1²) · p(x2; μ2, σ2²) · … · p(xn; μn, σn²)

More compactly, the above expression can be written as follows:

p(x) = Π_{j=1}^{n} p(xj; μj, σj²)

Anomaly if p(x) < ϵ

Gaussian Distribution Method

Page 35

Fit the model on the training set.

On a cross-validation/test example x, predict whether it is an anomaly (p(x) < ϵ) or not.

Possible evaluation metrics:

- True positives, false positives, false negatives, true negatives

- Precision/Recall

- F1-score

Tricks:

- Choose features that might take on unusually large or small values in the event of an anomaly

- Use the cross-validation set to choose the threshold parameter ϵ

- Train only on normal data

- Test and validation sets: add the anomalies (50% to each)

Gaussian Distribution Method
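The whole method fits in a few lines: estimate μ and σ² per feature on normal data, multiply the per-feature Gaussian densities, and flag p(x) < ϵ. The synthetic data and the hand-picked ϵ are assumptions (in practice ϵ is chosen on a labelled cross-validation set, as above):

```python
import numpy as np

def fit_gaussian(X):
    """Estimate per-feature mean and variance on normal-only training data."""
    return X.mean(axis=0), X.var(axis=0)

def p(x, mu, sigma2):
    """Independence assumption: p(x) = prod_j N(x_j; mu_j, sigma2_j)."""
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2)) /
                   np.sqrt(2 * np.pi * sigma2))

# Normal examples clustered around (1, 1); epsilon chosen by hand here.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=1.0, scale=0.1, size=(200, 2))
mu, sigma2 = fit_gaussian(X_train)
eps = 0.05

def is_anomaly(x):
    return p(x, mu, sigma2) < eps
```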

Page 36

An extension of anomaly detection that may (or may not) catch more anomalies.

Instead of modelling p(x1), p(x2), … separately, we model p(x) all in one go.

The parameters are μ ∈ Rⁿ and Σ ∈ Rⁿˣⁿ (the covariance matrix).

We can vary Σ to change the shape, width, and orientation of the contours; changing μ moves the centre of the distribution.

Anomaly if p(x) < ϵ

Multivariate Gaussian Distribution
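A sketch using scipy's `multivariate_normal`: fit μ and Σ on correlated data, then threshold the density. The data, correlation and ϵ are illustrative assumptions; the test point (3, -3) violates the positive correlation and so gets a very low density even though each coordinate alone is only moderately extreme:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Correlated normal data; the full covariance matrix Sigma
# captures the correlation between the two features.
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)

mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)          # estimated covariance matrix

model = multivariate_normal(mean=mu, cov=Sigma)
eps = 1e-3
anomaly = model.pdf([3.0, -3.0]) < eps   # point violating the correlation
```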

Page 37

One-class SVM

The multivariate Gaussian model can automatically capture correlations between different features of x.

However, the original (univariate) model is computationally cheaper (no matrix to invert) and performs well even with a small training set.

A one-class SVM can also be used for anomaly detection. It can work better than the multivariate Gaussian when the data does not follow a Gaussian distribution.
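A minimal one-class SVM sketch with scikit-learn; the training data and the `nu` value are assumptions (`nu` roughly bounds the fraction of training points treated as outliers):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Train on normal data only, as with the Gaussian methods.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=0.5, size=(300, 2))

oc_svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# predict returns +1 for normal points, -1 for anomalies.
labels = oc_svm.predict([[0.0, 0.0], [4.0, 4.0]])
```

Because the RBF kernel makes no Gaussian assumption about the data, this model can fit non-elliptical "normal" regions.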

Page 38

Agenda

1. Introduction

2. Basic Techniques

3. Guides & Tips Building a Classifier

4. Practice:

- Environment

- Examples & exercises

5. References

Page 39

If classification performance is not what we expected, what should we work on?

- Get more training examples?

- Try smaller sets of features?

- Try getting additional features?

- Try changing model?

- Try decreasing regularization?

- Try increasing regularization?

Guides & Tips Building Classifiers

Page 40

Data exploration

The attributes petal width and petal length provide a moderate separation of the Iris species.

Manually examine the examples (in the cross-validation set) that your algorithm made errors on. See if you spot any systematic trend in the type of examples it is making errors on.

Arrange good features for your classifier:

- Discrimination ability: values significantly different for objects of different classes

- Reliability: similar values for objects of the same class

- Independence: attributes should be uncorrelated. If they are not, combine them:

E.g. diameter and weight: diameter³ / weight (scale invariant)

Page 41

Bias-Variance Trade-Off

- Balance the classifier's capacity to fit the training data against its ability to generalize

- Plot learning curves to decide whether more data, more features, etc. are likely to help

Page 42

Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.

Split the data into 3 different sets: Training + Validation + Test

Accuracy: percentage of correct predictions (SPAM or not) over all predictions

Precision: percentage of e-mails classified as SPAM which truly are SPAM

Recall: percentage of e-mails classified as SPAM over the total number of examples that are SPAM

How to compare precision/recall numbers?

precision = TP / (TP + FP)

recall = TPR = TP / (TP + FN)

F1 Score: F1 = 2 · (precision · recall) / (precision + recall)

Model Evaluation
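The three formulas above in code, with hypothetical spam-filter counts for illustration (80 spams caught, 20 hams wrongly flagged, 20 spams missed):

```python
def precision_recall_f1(tp, fp, fn):
    """precision = TP/(TP+FP); recall = TP/(TP+FN); F1 = their harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```

With these counts all three metrics come out to 0.8; the F1 score compares precision/recall pairs with a single number, penalizing a large imbalance between the two.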

Page 43

Agenda

1. Introduction

2. Basic Techniques

3. Guides & Tips Building a Classifier

4. Practice:

- Environment

- Examples & exercises

5. References

Page 44

Practice: Environment

0) Python:

An interpreted, dynamically-typed language.

Download:

- Python already installed:

pip install ipython, or with the notebook dependencies: pip install "ipython[notebook]"

- Otherwise:

Anaconda (http://continuum.io/downloads) is a completely free Python distribution (including for commercial use and redistribution). It includes over 195 of the most popular Python packages for science, math, engineering, and data analysis.

$ conda info

$ conda install <packageName>

$ conda update <packageName>

Page 45

Practice: Environment

1) IPython:

IPython provides a rich architecture for interactive computing, with:

- Powerful interactive shells (terminal and Qt-based).

- A browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media.

- Support for interactive data visualization and use of GUI toolkits.

- Flexible, embeddable interpreters to load into your own projects.

- Easy-to-use, high-performance tools for parallel computing.

Start the console → ipython --pylab

Page 46

Practice: Environment

2) Notebook

A web-based interactive computational environment that combines code execution, text, mathematics, plots and rich media in a single document.

Start the notebook server → ipython notebook (http://127.0.0.1:8888)

Open an existing notebook → ipython notebook <name.ipynb>

The notebook consists of a sequence of cells. A cell is a multi-line text input field, and its contents can be executed with keyboard commands or by clicking either the "Play" button or Cell | Run in the menu bar.

Commands:

Shift-Enter → runs the cell and goes to the next one

Ctrl-Enter → runs the cell and stays in the same cell

Esc / Enter → switch between command mode and edit mode

Tab → auto-complete

Page 47

Practice: Environment

3) Numpy + scipy

Numpy offers a specific data structure for high-performance numerical computing: the multidimensional array.

- Data is stored in a contiguous block of memory in RAM. This makes more efficient use of CPU cycles and cache.

- Array operations are implemented internally with C loops rather than Python loops.

Numpy has all the standard array functions, linear algebra, and fancy indexing.

Numpy+scipy docs: http://docs.scipy.org
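The efficiency claim can be seen in a tiny vectorized example: one expression performs a C-level pass over a million elements instead of a Python loop:

```python
import numpy as np

# One contiguous float64 array; arithmetic on it runs in C loops.
x = np.arange(1_000_000, dtype=np.float64)
y = x * 2.0 + 1.0        # vectorized: no Python-level loop over elements
total = y.sum()          # sum of the first million odd numbers = 10^12
```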

4) Matplotlib

Graphical library to plot and visualize your data.

5) Scikit-Learn

Machine learning library.

Page 48

Agenda

1. Introduction

2. Basic Techniques

3. Guides & Tips Building a Classifier

4. Practice:

- Environment

- Examples & exercises

5. References

Page 49

Practice: Exercises

- An introduction to machine learning with Python and scikit-learn (repo and overview)

by Hannes Schulz and Andreas Mueller.

- PyCon 2014 Scikit-learn Tutorial (Ipython and machine learning) by Jake VanderPlas

Page 50

Agenda

1. Introduction

2. Basic Techniques

3. Guides & Tips Building a Classifier

4. Practice:

- Environment

- Examples & exercises

5. References

Page 51

References

- Data Mining: Practical Machine Learning Tools and Techniques. I. H. Witten, E. Frank, et al.

- Introduction to Machine learning with Ipython. LxMLS 2014. A. Mueller

- Ipython and machine learning. PyCon ’14

- Introduction to Machine learning. Coursera 2014. A. Ng

- scikit-learn. http://scikit-learn.org (see especially the narrative documentation)

- Matplotlib. http://matplotlib.org (see especially the gallery section)

- Ipython. http://ipython.org (also check out http://nbviewer.ipython.org)

- Anaconda. https://store.continuum.io/cshop/anaconda/

- Notebook. http://ipython.org/ipython-doc/stable/notebook/index.html

Page 52

Thanks a lot!!

[email protected]