Anomaly Detection with BigML
David Gerster, VP Data Science
The Easy Part: Predictive Modeling
• Train a predictive model using 699 biopsies
• The “label” of benign or malignant is known for each one
• Since we have labels, this is supervised learning
What if we don’t have labels?
• Can we get insight into our data if we don’t know the labels?
• Enter anomaly detection
• Since we don’t have labels, this is unsupervised learning
10 lines are needed to isolate this data point (not anomalous)
Only 4 lines are needed to isolate this data point (highly anomalous)
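The idea behind the two captions above can be sketched in a few lines of Python. This is a minimal illustration of isolation by random splits, not BigML's implementation: the function names, the synthetic data, and the `max_depth` cap are all assumptions made for the example. Each "line" is a random axis-aligned split, and the code counts how many are needed before a point is alone on its side.

```python
import random

def isolation_depth(point, rows, rng, max_depth=50):
    """Count random axis-aligned splits ("lines") needed to isolate `point`."""
    depth = 0
    while len(rows) > 1 and depth < max_depth:
        # Draw a line at a random position along a randomly chosen axis.
        dim = rng.randrange(len(point))
        values = [r[dim] for r in rows]
        split = rng.uniform(min(values), max(values))
        # Keep only the rows on the same side of the line as `point`.
        rows = [r for r in rows if (r[dim] < split) == (point[dim] < split)]
        depth += 1
    return depth

def average_depth(point, rows, trials=200, seed=0):
    """Average the split count over many random trials to smooth out luck."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, rows, rng) for _ in range(trials)) / trials

# Hypothetical data: a dense 2-D cluster plus one far-away point.
data_rng = random.Random(42)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
rows = cluster + [outlier]
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)  # nearest the centre

outlier_depth = average_depth(outlier, rows)
inlier_depth = average_depth(inlier, rows)
print(f"outlier: {outlier_depth:.1f} splits on average")
print(f"inlier:  {inlier_depth:.1f} splits on average")
```

A point that sits far from the rest tends to be cut off after only a few splits, while a point in the middle of the cluster needs many more. The average split count therefore works as an inverse anomaly measure.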
The Other Easy Part: Anomaly Detection
• Remove the labels of benign or malignant
• Train an anomaly detector on this unlabeled data
• Create a new dataset with the anomaly scores as “labels”
• Use these “labels” to train a predictive model!
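The four steps on this slide might look roughly like the following in plain Python. This is a sketch under stated assumptions, not the BigML workflow itself: the detector is a small stand-in that scores rows by how few random splits isolate them, the data is synthetic rather than the biopsy dataset, the "flag the top 3 scores" cut-off is arbitrary, and the 1-nearest-neighbour `predict` function is just one simple choice of predictive model.

```python
import random

def isolation_depth(point, rows, rng, max_depth=30):
    """Count random axis-aligned splits needed to isolate `point` from `rows`."""
    depth = 0
    while len(rows) > 1 and depth < max_depth:
        dim = rng.randrange(len(point))
        values = [r[dim] for r in rows]
        split = rng.uniform(min(values), max(values))
        # Keep only the rows on the same side of the split as `point`.
        rows = [r for r in rows if (r[dim] < split) == (point[dim] < split)]
        depth += 1
    return depth

def anomaly_score(point, rows, rng, trials=50):
    """Fewer splits on average means more anomalous, so negate the mean depth."""
    return -sum(isolation_depth(point, rows, rng) for _ in range(trials)) / trials

# Step 1: unlabeled data (hypothetical: a dense cluster plus three far-away rows).
rng = random.Random(0)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
rows += [(50.0, 0.0), (-50.0, 0.0), (0.0, 50.0)]

# Step 2: "train" the stand-in detector by scoring every row.
scores = [anomaly_score(r, rows, rng) for r in rows]

# Step 3: build a new dataset whose labels come from the scores.
# Flagging the 3 highest-scoring rows is an arbitrary cut-off for this sketch.
ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
flagged = set(ranked[:3])
labelled = [(r, "anomalous" if i in flagged else "normal")
            for i, r in enumerate(rows)]

# Step 4: train a predictive model on the derived labels (here, 1-nearest-neighbour).
def predict(query):
    """Return the derived label of the training row closest to `query`."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))
    return min(labelled, key=squared_distance)[1]

print(predict((0.0, 0.0)))   # a query in the middle of the cluster
print(predict((0.0, 49.0)))  # a query next to one of the far-away rows
```

The derived labels are only as good as the detector's scores and the chosen cut-off, so in practice both would need checking against whatever ground truth is available.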
Who Needs Labels?
Minority Report
• Anomaly detection also works great on large unlabeled datasets, especially if you expect to find an (adversarial) minority class
• Millions of credit card transactions, billions of network events …
• Doesn’t require you to know what you’re looking for!