Anomaly Detection with BigML



Post on 14-Aug-2015


TRANSCRIPT

Page 1: Anomaly Detection with BigML

David Gerster, VP Data Science

[email protected]

Page 2: Anomaly Detection with BigML


The Easy Part: Predictive Modeling

• Train a predictive model using 699 biopsies
• The “label” of benign or malignant is known for each one
• Since we have labels, this is supervised learning
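To make "supervised" concrete, here is a minimal sketch of training on labeled rows and predicting a label for a new row. It is not BigML's model: it uses a nearest-centroid classifier as a simple stand-in, and the two features and every value in `training` are invented for illustration.

```python
from statistics import mean

# Toy labeled rows: ((feature_1, feature_2), label). All values are invented.
training = [
    ((1.0, 1.0), "benign"),
    ((2.0, 1.0), "benign"),
    ((1.0, 2.0), "benign"),
    ((8.0, 7.0), "malignant"),
    ((9.0, 8.0), "malignant"),
    ((7.0, 9.0), "malignant"),
]


def fit_centroids(rows):
    """Average the feature vectors of each label (the 'training' step)."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*points))
        for label, points in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = fit_centroids(training)
prediction_low = predict(centroids, (2.0, 2.0))
prediction_high = predict(centroids, (8.0, 8.0))
print(prediction_low, prediction_high)
```

The labels do all the work here: without the "benign" and "malignant" column there would be nothing to average per class.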

Page 3: Anomaly Detection with BigML


What if we don’t have labels?

• Can we get insight into our data if we don’t know the labels?
• Enter anomaly detection
• Since we don’t have labels, this is unsupervised learning

Page 4: Anomaly Detection with BigML

10 lines are needed to isolate this data point (not anomalous)

Page 5: Anomaly Detection with BigML

Only 4 lines are needed to isolate this data point (highly anomalous)
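The line-counting idea on these two slides can be sketched in a few lines of Python: keep drawing random axis-aligned cuts and count how many it takes before a point sits alone. This is an illustration under stated assumptions, not BigML's implementation. The data is synthetic (a Gaussian cluster plus one far-away point), the function names are hypothetical, and production isolation-based detectors typically also subsample the data and build many trees.

```python
import random


def isolation_depth(point, data, rng):
    """Count random axis-aligned cuts until `point` sits alone in its region."""
    region = list(data)
    depth = 0
    while len(region) > 1:
        # Only cut along features where the remaining rows still differ.
        dims = [d for d in range(len(point))
                if min(r[d] for r in region) < max(r[d] for r in region)]
        if not dims:
            break  # remaining rows are identical; nothing left to separate
        dim = rng.choice(dims)
        low = min(r[dim] for r in region)
        high = max(r[dim] for r in region)
        cut = rng.uniform(low, high)
        # Keep only the side of the cut that contains `point`.
        keep_upper = point[dim] >= cut
        region = [r for r in region if (r[dim] >= cut) == keep_upper]
        depth += 1
    return depth


def average_depth(point, data, trials=200, seed=0):
    """Average the cut count over many random trials to smooth out luck."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(trials)) / trials


# Synthetic data: 100 points near the origin plus one point far away.
data_rng = random.Random(42)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
outlier = (8.0, 8.0)
data = cluster + [outlier]

# Compare the far-away point with the cluster point closest to the origin.
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)
outlier_depth = average_depth(outlier, data)
inlier_depth = average_depth(inlier, data)
print("outlier:", outlier_depth, "inlier:", inlier_depth)
```

Fewer cuts on average means the point is easier to isolate, which is what the slides call more anomalous.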

Page 6: Anomaly Detection with BigML


The Other Easy Part: Anomaly Detection

• Remove the labels of benign or malignant
• Train an anomaly detector on this unlabeled data
• Create a new dataset with the anomaly scores as “labels”
• Use these “labels” to train a predictive model!
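The four steps above can be sketched end to end on toy data. Everything here is an assumption made for illustration: the rows are invented, the anomaly score is a simple stand-in (mean distance to the two nearest neighbours) rather than BigML's detector, the scores are turned into yes/no "labels" by flagging the top two rows, and the predictive model is a one-rule decision stump that only tries rules of the form "feature >= threshold means anomalous".

```python
import math

# Toy labeled rows; all values are invented for illustration.
labeled = [
    ((1.0, 1.0), "benign"), ((1.5, 2.0), "benign"), ((2.0, 1.5), "benign"),
    ((2.5, 2.5), "benign"), ((1.0, 2.5), "benign"), ((2.0, 1.0), "benign"),
    ((1.5, 1.2), "benign"), ((2.2, 2.0), "benign"),
    ((9.0, 2.0), "malignant"), ((10.0, 6.0), "malignant"),
]

# Step 1: remove the labels.
rows = [features for features, _ in labeled]


# Step 2: score every row without looking at any label.
def knn_score(index, data, k=2):
    """Stand-in anomaly score: mean distance to the k nearest other rows."""
    point = data[index]
    distances = sorted(
        math.dist(point, other) for j, other in enumerate(data) if j != index
    )
    return sum(distances[:k]) / k


scores = [knn_score(i, rows) for i in range(len(rows))]

# Step 3: build a new dataset whose "label" comes from the anomaly score.
n_flag = 2  # assumption: treat the two highest-scoring rows as anomalous
cutoff = sorted(scores, reverse=True)[n_flag - 1]
score_labels = [score >= cutoff for score in scores]
flagged = [i for i, is_anomalous in enumerate(score_labels) if is_anomalous]


# Step 4: train a predictive model on those score-derived labels.
def fit_stump(data, labels):
    """Find the rule 'feature >= threshold' that matches the most labels."""
    best = None
    for dim in range(len(data[0])):
        for threshold in sorted({row[dim] for row in data}):
            predictions = [row[dim] >= threshold for row in data]
            correct = sum(p == y for p, y in zip(predictions, labels))
            if best is None or correct > best[0]:
                best = (correct, dim, threshold)
    return best  # (rows matched, feature index, threshold)


stump = fit_stump(rows, score_labels)
print("flagged rows:", flagged)
print("stump (correct, feature, threshold):", stump)
```

The value of the last step is interpretability: the fitted rule names the feature and threshold that best explain why rows were scored as anomalous.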

Page 7: Anomaly Detection with BigML

Who Needs Labels?

Page 8: Anomaly Detection with BigML


Minority Report

• Anomaly detection also works great on large unlabeled datasets, especially if you expect to find an (adversarial) minority class
• Millions of credit card transactions, billions of network events …

• Doesn’t require you to know what you’re looking for!

Page 9: Anomaly Detection with BigML


Thanks!

David Gerster, VP Data Science, BigML

[email protected]