Web UI, Algorithms, and Feature Engineering
TRANSCRIPT
![Page 1: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/1.jpg)
Algorithms
Poul Petersen @pejpgrep CIO, BigML, Inc @bigmlcom
UI Algorithms & Feature Engineering with Flatline
![Page 2: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/2.jpg)
BigML, Inc: ML Crash Course - UI/Algorithms/Feature Engineering
BigML Algorithm History
2011: Prototyping and Beta; API-first Approach
2012: Core ML workflow: source, dataset, model, prediction
2013: Evaluations, Batch Predictions, Ensembles, Sunburst
2014: Anomaly Detection, Clusters, Flatline
2015: Association Discovery, Correlations, Samples, Statistical Tests
2016: Scripts, Libraries, Executions, WhizzML, Logistic Regression
![Page 3: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/3.jpg)
The need for Machine Learning
• Can you find any pattern in this tiny data set?
Talk Text Purchases Data Age Churn?
148 72 0 33.6 50 TRUE
85 66 0 26.6 31 FALSE
183 64 0 23.3 32 TRUE
89 66 94 28.1 21 FALSE
115 0 0 35.3 29 FALSE
166 72 175 25.8 51 TRUE
100 0 0 30 32 TRUE
118 84 230 45.8 31 TRUE
171 110 240 45.4 54 TRUE
159 64 0 27.4 40 FALSE
… but this is a simple example
![Page 4: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/4.jpg)
Data Types
numeric: 1, 2.0, 3, -5.4
categorical: true, yes, red, mammal
date-time: 2013-09-25 10:02
  YEAR (YYYY-MM-DD): 2013
  MONTH (YYYY-MM-DD): September
  DAY-OF-MONTH (YYYY-MM-DD): 25
  DAY-OF-WEEK (M-T-W-T-F-S-D): Wednesday
  HOUR (HH:MM:SS): 10
  MINUTE (HH:MM:SS): 02
text / items: Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em.
  “great” appears 2 times
  “afraid” appears 1 time
  “born” appears 1 time
  “some” appears 2 times
![Page 5: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/5.jpg)
Text Analysis
Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em.
great: appears 4 times (counting “great” once and “greatness” three times)
Bag of Words
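The count of 4 for “great” only works if “greatness” is reduced to “great” before counting. A minimal sketch of that idea follows; the `naive_stem` helper is a hypothetical toy stemmer written for this quote, not the tokenizer or stemmer BigML actually uses.

```python
import re
from collections import Counter


def naive_stem(token: str) -> str:
    """Toy stemmer (assumption): strip a trailing 'ness' so 'greatness' -> 'great'."""
    if token.endswith("ness") and len(token) > 4:
        return token[:-4]
    return token


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, stem them, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(naive_stem(token) for token in tokens)


quote = ("Be not afraid of greatness: some are born great, some achieve "
         "greatness, and some have greatness thrust upon 'em.")
counts = bag_of_words(quote)
print(counts["great"], counts["afraid"], counts["born"], counts["achieve"])  # 4 1 1 1
```

Each distinct token then becomes one numeric column, which is what the table on the next slide shows.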
![Page 6: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/6.jpg)
Text Analysis
… great afraid born achieve … …
… 4 1 1 1 … …
… … … … … … …
Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em.
Model
The token “great” occurs more than 3 times
The token “afraid” occurs no more than once
![Page 7: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/7.jpg)
Evaluation
DATASET
TRAIN SET
TEST SET
PREDICTIONS
METRICS
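The flow on this slide (split the dataset, train on one part, predict on the other, compute metrics) can be sketched in a few lines. This is only an illustration under stated assumptions: the 80/20 split, the fixed seed, and the majority-class "model" are placeholders chosen here, and the rows are the churn table from the earlier slide.

```python
import random
from collections import Counter

# (talk, text, purchases, data, age, churn) rows from the churn slide.
ROWS = [
    (148, 72, 0, 33.6, 50, True),
    (85, 66, 0, 26.6, 31, False),
    (183, 64, 0, 23.3, 32, True),
    (89, 66, 94, 28.1, 21, False),
    (115, 0, 0, 35.3, 29, False),
    (166, 72, 175, 25.8, 51, True),
    (100, 0, 0, 30, 32, True),
    (118, 84, 230, 45.8, 31, True),
    (171, 110, 240, 45.4, 54, True),
    (159, 64, 0, 27.4, 40, False),
]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_majority_model(train_rows):
    """Placeholder model (assumption): always predict the most common label."""
    majority = Counter(row[-1] for row in train_rows).most_common(1)[0][0]
    return lambda row: majority


def accuracy(model, test_rows):
    """Fraction of held-out rows whose label the model predicts correctly."""
    hits = sum(1 for row in test_rows if model(row) == row[-1])
    return hits / len(test_rows)


train, test = train_test_split(ROWS)
model = train_majority_model(train)
print(len(train), len(test), accuracy(model, test))
```

The point of the split is that the metric is computed only on rows the model never saw during training.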
![Page 8: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/8.jpg)
Ensembles
What is a round, red 6cm fruit?
Diameter  Color  Shape  Fruit
4         red    round  plum
5         red    round  apple
5         red    round  apple
6         red    round  plum
7         red    round  apple
All Data: “plum”
Bagging!
Sample 1: “plum”
Sample 2: “apple”
Sample 3: “apple”
Samples 1-3 combined: “apple”
Random Decision Forest!
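The bagging idea on this slide has two parts: draw samples with replacement, then let the models trained on those samples vote. A minimal sketch follows; the per-sample predictions are the ones quoted on the slide, while `bootstrap_sample` and `majority_vote` are hypothetical helper names and no actual tree is trained here.

```python
import random
from collections import Counter

FRUIT_ROWS = [
    (4, "red", "round", "plum"),
    (5, "red", "round", "apple"),
    (5, "red", "round", "apple"),
    (6, "red", "round", "plum"),
    (7, "red", "round", "apple"),
]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as bagging does for each model."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Combine the individual model predictions by taking the most common one."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
sample = bootstrap_sample(FRUIT_ROWS, rng)

# Predictions of the three sample models for the 6cm fruit, as given on the slide.
sample_votes = ["plum", "apple", "apple"]
print(majority_vote(sample_votes))  # apple
```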
![Page 9: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/9.jpg)
Logistic Regression
![Page 10: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/10.jpg)
Logistic Regression
????
![Page 11: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/11.jpg)
Logistic Regression
P≈0    0<P<1    P≈1
• x→-∞ : P(x)→0
• x→∞ : P(x)→1
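The two limits above are the defining behavior of the standard logistic function P(x) = 1 / (1 + e^(-x)). The slide does not spell out the formula, so treat the sketch below as the textbook form rather than a transcription; it is written to avoid overflow for large negative inputs.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function, computed in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) cannot overflow
    return z / (1.0 + z)


for x in (-50, -2, 0, 2, 50):
    print(x, logistic(x))
```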
![Page 12: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/12.jpg)
Supervised Learning
(the last column or columns of each table are the label)
Classification
animal    state   …  proximity  action
tiger     hungry  …  close      run
elephant  happy   …  far        take picture
Regression
animal    state   …  proximity  min_kmh
tiger     hungry  …  close      70
hippo     angry   …  far        10
Multi-Label Classification
animal    state   …  proximity  action1       action2
tiger     hungry  …  close      run           look untasty
elephant  happy   …  far        take picture  call friends
![Page 13: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/13.jpg)
Unsupervised Learning
date  customer  account  auth  class    zip    amount
Mon   Bob       3421     pin   clothes  46140  135
Tue   Bob       3421     sign  food     46140  401
Tue   Alice     2456     pin   food     12222  234
Wed   Sally     6788     pin   gas      26339  94
Wed   Bob       3421     pin   tech     21350  2459
Wed   Bob       3421     pin   gas      46140  83
Thu   Sally     6788     sign  food     26339  51
Clustering: similar
Anomaly Detection: unusual
![Page 14: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/14.jpg)
K-Means
K=3
![Page 15: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/15.jpg)
K-Means
K=3
![Page 16: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/16.jpg)
G-Means
![Page 17: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/17.jpg)
G-Means
![Page 18: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/18.jpg)
G-Means
Let K=2
Keep 1, Split 1
New K=3
![Page 19: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/19.jpg)
G-Means
Let K=3
Keep 1, Split 2
New K=5
![Page 20: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/20.jpg)
G-Means
Let K=5
K=5
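The K values on these slides follow simple bookkeeping: every cluster that is kept counts once and every cluster that is split counts twice. The sketch below shows only that arithmetic; the statistical test G-means uses to decide whether a cluster looks Gaussian enough to keep is not implemented, and `next_k` is a hypothetical helper name.

```python
def next_k(kept: int, split: int) -> int:
    """Number of clusters after one round: kept clusters plus two per split cluster."""
    return kept + 2 * split


print(next_k(kept=1, split=1))  # 3
print(next_k(kept=1, split=2))  # 5
print(next_k(kept=5, split=0))  # 5, nothing left to split so K stays at 5
```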
![Page 21: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/21.jpg)
Isolation Forest
Grow a random decision tree until each instance is in its own leaf
“easy” to isolate
“hard” to isolate
Depth
Now repeat the process several times and use average Depth to compute anomaly score: 0 (similar) -> 1 (dissimilar)
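The depth idea can be sketched without building whole trees: follow one instance down a sequence of random splits and count how many it takes until the instance is alone. This is a simplified illustration, not BigML's implementation. The toy data (a 5x5 grid plus one far-away point), the number of repetitions, and the normalizing constant (taken from the original Isolation Forest formulation, score = 2^(-average depth / c(n))) are all assumptions made here.

```python
import math
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed until `point` is alone in its partition."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary inside this partition can be split on.
    dims = [d for d in range(len(point))
            if min(row[d] for row in data) < max(row[d] for row in data)]
    if not dims:
        return depth
    d = rng.choice(dims)
    low = min(row[d] for row in data)
    high = max(row[d] for row in data)
    cut = rng.uniform(low, high)
    same_side = [row for row in data if (row[d] < cut) == (point[d] < cut)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def anomaly_score(point, data, repeats=200, seed=0):
    """Average the depth over many random runs and map it to (0, 1]; needs len(data) > 2."""
    rng = random.Random(seed)
    avg_depth = sum(isolation_depth(point, data, rng) for _ in range(repeats)) / repeats
    n = len(data)
    c = 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    return 2 ** (-avg_depth / c)


# Toy data (assumption): a dense grid of "similar" points plus one outlier.
grid = [(x, y) for x in range(5) for y in range(5)]
data = grid + [(20, 20)]
print(anomaly_score((20, 20), data), anomaly_score((2, 2), data))
```

The outlier is usually cut off after one or two splits, so its average depth is small and its score is closer to 1.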
![Page 22: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/22.jpg)
Model Competence
At Training Time: DATASET, MODEL, ANOMALY DETECTOR
At Prediction Time:
Prediction     T       T
Confidence     86%     84%
Anomaly Score  0.5367  0.7124
Competent?     Y       N
![Page 23: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/23.jpg)
Association Rules
date  customer  account  auth  class    zip    amount
Mon   Bob       3421     pin   clothes  46140  135
Tue   Bob       3421     sign  food     46140  401
Tue   Alice     2456     pin   food     12222  234
Wed   Sally     6788     pin   gas      26339  94
Wed   Bob       3421     pin   tech     21350  2459
Wed   Bob       3421     pin   gas      46140  83
Thu   Sally     6788     sign  food     26339  51
Rules:
Antecedent → Consequent
{class = gas} → amount < 100
{customer = Bob, account = 3421} → zip = 46140
![Page 24: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/24.jpg)
Association Metrics
[Diagram: A and C within Instances]
Coverage
Percentage of instances which match antecedent “A”
![Page 25: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/25.jpg)
Association Metrics
[Diagram: A and C within Instances]
Support
Percentage of instances which match antecedent “A” and consequent “C”
![Page 26: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/26.jpg)
Association Metrics
[Diagram: A and C within Instances]
Confidence
Percentage of instances in the antecedent which also contain the consequent.
Confidence = Support / Coverage
![Page 27: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/27.jpg)
Association Metrics
Confidence
0%: A never implies C
Between 0% and 100%: A sometimes implies C
100%: A always implies C
![Page 28: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/28.jpg)
Association Metrics
[Diagrams: Observed vs. Independent overlap of A and C]
Lift
Ratio of observed support to support if A and C were statistically independent.
Lift = Support / (p(A) * p(C)) = Confidence / p(C)
![Page 29: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/29.jpg)
Association Metrics
Lift
[Diagrams: Observed vs. Independent overlap of A and C]
Lift < 1: Negative Correlation
Lift = 1: No Association
Lift > 1: Positive Correlation
![Page 30: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/30.jpg)
Association Metrics
[Diagrams: Observed vs. Independent overlap of A and C]
Leverage
Difference of observed support and support if A and C were statistically independent.
Leverage = Support - [ p(A) * p(C) ]
![Page 31: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/31.jpg)
Association Metrics
Leverage
[Diagrams: Observed vs. Independent overlap of A and C]
Leverage < 0: Negative Correlation
Leverage = 0: No Association
Leverage > 0: Positive Correlation
Scale: -1…
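All five metrics can be computed by counting rows. The sketch below applies the definitions from these slides to the seven transactions and the two example rules from the Association Rules slide; `rule_metrics` is a hypothetical helper, and it assumes the antecedent and the consequent each match at least one row so that no division by zero occurs.

```python
# (customer, account, class, zip, amount) for the seven transactions on the slide.
TRANSACTIONS = [
    ("Bob", 3421, "clothes", 46140, 135),
    ("Bob", 3421, "food", 46140, 401),
    ("Alice", 2456, "food", 12222, 234),
    ("Sally", 6788, "gas", 26339, 94),
    ("Bob", 3421, "tech", 21350, 2459),
    ("Bob", 3421, "gas", 46140, 83),
    ("Sally", 6788, "food", 26339, 51),
]


def rule_metrics(rows, antecedent, consequent):
    """Coverage, support, confidence, lift and leverage for the rule A -> C."""
    n = len(rows)
    p_a = sum(1 for r in rows if antecedent(r)) / n      # coverage
    p_c = sum(1 for r in rows if consequent(r)) / n
    p_ac = sum(1 for r in rows if antecedent(r) and consequent(r)) / n  # support
    return {
        "coverage": p_a,
        "support": p_ac,
        "confidence": p_ac / p_a,
        "lift": p_ac / (p_a * p_c),
        "leverage": p_ac - p_a * p_c,
    }


# {class = gas} -> amount < 100
gas_rule = rule_metrics(TRANSACTIONS, lambda r: r[2] == "gas", lambda r: r[4] < 100)
# {customer = Bob, account = 3421} -> zip = 46140
bob_rule = rule_metrics(TRANSACTIONS,
                        lambda r: r[0] == "Bob" and r[1] == 3421,
                        lambda r: r[3] == 46140)
print(gas_rule)
print(bob_rule)
```

For the gas rule both gas purchases are under 100, so confidence is 100%, lift is above 1, and leverage is above 0.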
![Page 32: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/32.jpg)
Machine Learning Secret
“…the largest improvements in accuracy often came from quick experiments, feature engineering, and model tuning rather than applying fundamentally different algorithms.”
Facebook FBLearner 2016
Feature Engineering: applying domain knowledge of the data to create features that make machine learning algorithms work better or at all.
![Page 33: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/33.jpg)
Feature Engineering
Automatic Date Transformation
2013-09-25 10:02 (DATE-TIME)
…  year  month  day  hour  minute  …
…  2013  Sep    25   10    2       …
…  …     …      …    …     …       …
   NUM   CAT    NUM  NUM   NUM
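The same expansion can be sketched with Python's standard library. The field names and the input format string are assumptions chosen to mirror the slide; month and weekday names are looked up in explicit tuples so the result does not depend on the system locale.

```python
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def expand_datetime(text: str, fmt: str = "%Y-%m-%d %H:%M") -> dict:
    """Split one date-time string into separate numeric and categorical fields."""
    dt = datetime.strptime(text, fmt)
    return {
        "year": dt.year,                         # numeric
        "month": MONTHS[dt.month - 1],           # categorical
        "day": dt.day,                           # numeric
        "day_of_week": WEEKDAYS[dt.weekday()],   # categorical
        "hour": dt.hour,                         # numeric
        "minute": dt.minute,                     # numeric
    }


fields = expand_datetime("2013-09-25 10:02")
print(fields)
```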
![Page 34: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/34.jpg)
Feature Engineering
Automatic Categorical Transformation
…  alchemy_category  …
…  business          …
…  recreation        …
…  health            …
…  …                 …
   CAT
business  health  recreation  …
1         0       0           …
0         0       1           …
0         1       0           …
…         …       …           …
NUM       NUM     NUM
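This transformation is commonly called one-hot encoding: each category becomes its own 0/1 column. A minimal sketch, using the three category values from the slide and a hypothetical `one_hot` helper that orders the columns alphabetically:

```python
def one_hot(values):
    """Return the sorted category names and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, rows = one_hot(["business", "recreation", "health"])
print(categories)  # ['business', 'health', 'recreation']
for row in rows:
    print(row)
```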
![Page 35: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/35.jpg)
Feature Engineering
Automatic Text Transformation
Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em.
TEXT
…  great  afraid  born  achieve  …
…  4      1       1     1        …
…  …      …       …     …        …
   NUM    NUM     NUM   NUM
![Page 36: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/36.jpg)
Feature Engineering
Better representation
{"url":"cbsnews", "title":"Breaking News Headlines Business Entertainment World News ", "body":" news covering all the latest breaking national and world news headlines, including politics, sports, entertainment, business and more."}
TEXT
title           body
Breaking News…  news covering…
…               …
TEXT            TEXT
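Splitting the single JSON text field into separate `title` and `body` text fields takes one parsing step. A minimal sketch with Python's `json` module, assuming the record is valid JSON with plain double quotes:

```python
import json

raw = ('{"url":"cbsnews", '
       '"title":"Breaking News Headlines Business Entertainment World News ", '
       '"body":" news covering all the latest breaking national and world news '
       'headlines, including politics, sports, entertainment, business and more."}')

record = json.loads(raw)
# Keep the two text fields as separate columns and trim stray whitespace.
text_fields = {name: record[name].strip() for name in ("title", "body")}
print(text_fields["title"])
print(text_fields["body"])
```

Each field can then get its own bag-of-words columns, so a word in the title is not confused with the same word in the body.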
![Page 37: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/37.jpg)
Feature Engineering
Discretization
Total Spend (NUM)  Spend Category (CAT)
7,342.99           Top 33%
304.12             Middle 33%
4.56               Bottom 33%
345.87             Middle 33%
8,546.32           Top 33%
NUM: “Predict will spend $3,521 with error $1,232”
CAT: “Predict customer will be Top 33% in spending”
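Discretization maps each number to the bin it falls into. The slide does not give the cut points between the three bins, so the values 100 and 1000 below are purely hypothetical; in practice they would come from the 33rd and 67th percentiles of the full customer population.

```python
from bisect import bisect_right

# Hypothetical cut points (assumption): the slide does not state the real ones.
CUT_POINTS = [100.0, 1000.0]
LABELS = ["Bottom 33%", "Middle 33%", "Top 33%"]


def discretize(value, cut_points=CUT_POINTS, labels=LABELS):
    """Return the label of the bin that `value` falls into."""
    return labels[bisect_right(cut_points, value)]


total_spend = [7342.99, 304.12, 4.56, 345.87, 8546.32]
spend_category = [discretize(value) for value in total_spend]
print(spend_category)
```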
![Page 38: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/38.jpg)
Feature Engineering
Combinations of Multiple Features
Kg (NUM)  M2 (NUM)  BMI (NUM)
101.4     3.24      31.17
85.2      2.8       30.4
56.2      2.9       19.38
136.1     3.6       37.8
95.9      4.1       23.39
BMI = Kg / M2
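The combined feature is a plain ratio of the two columns. The sketch below recomputes it for the five rows on the slide; note that the first row evaluates to about 31.30 from the printed inputs, whereas the slide lists 31.17, so one of the printed values in that row appears to be off.

```python
rows = [
    (101.4, 3.24),
    (85.2, 2.8),
    (56.2, 2.9),
    (136.1, 3.6),
    (95.9, 4.1),
]


def bmi(kg: float, m2: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return kg / m2


bmi_values = [round(bmi(kg, m2), 2) for kg, m2 in rows]
print(bmi_values)
```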
![Page 39: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/39.jpg)
Feature Engineering
Flatline
• BigML’s Domain-Specific Language (DSL) for Transforming Datasets
• Limited programming language structures
  • let, cond, if, maps, list operators, */+-
• Dataset Fields are first-class citizens
  • (field "diabetes pedigree")
• Built-in transformations
  • statistics, strings, timestamps, windows
![Page 40: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/40.jpg)
Feature Engineering
Shock: Deviations from a Trend
(Current - (4-day avg)) / std dev
(/ (- (f "price") (avg-window "price" -4 -1)) (standard-deviation "price"))
date  volume  price
1     34353   314
2     44455   315
3     22333   315
4     52322   321
5     28000   320
6     31254   319
7     56544   323
8     44331   324
9     81111   287
10    65422   294
11    59999   300
12    45556   302
13    19899   301
14    21453   302
day-4  day-3  day-2  day-1  4davg
-      -      -      -      0
-      -      -      314    314
-      -      314    315    314.5
-      314    315    315    314.6
314    315    315    321    316.25
315    315    321    320    317.75
315    321    320    319    318.75
![Page 41: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/41.jpg)
Feature Engineering
Shock: Deviations from a Trend
(Current - (4-day avg)) / std dev
(/ (- (f "price") (avg-window "price" -4 -1)) (standard-deviation "price"))
Current: (field "price")
4-day avg: (avg-window "price" -4 -1)
std dev: (standard-deviation "price")
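For readers who want to check the numbers outside Flatline, here is a rough Python equivalent of the shock expression using the price column from the previous slide. Two details are assumptions: the standard deviation is taken over the whole column as a population standard deviation, and rows with fewer than four previous days average over whatever is available (the first row, which has none, yields `None` here). Flatline's own handling of these edge cases may differ.

```python
from statistics import pstdev

prices = [314, 315, 315, 321, 320, 319, 323, 324, 287, 294, 300, 302, 301, 302]


def window_avg(values, i, start=-4, end=-1):
    """Average of values[i+start .. i+end], clipped at the start of the series."""
    window = values[max(0, i + start): max(0, i + end + 1)]
    return sum(window) / len(window) if window else None


def shock(values):
    """(current - average of the previous 4 rows) / standard deviation of the column."""
    sd = pstdev(values)
    result = []
    for i, current in enumerate(values):
        avg = window_avg(values, i)
        result.append(None if avg is None else (current - avg) / sd)
    return result


shocks = shock(prices)
for day, value in enumerate(shocks, start=1):
    print(day, value)
```

The largest negative shock falls on day 9, where the price drops from 324 to 287.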
![Page 42: Web UI, Algorithms, and Feature Engineering](https://reader033.vdocuments.net/reader033/viewer/2022042907/5899eb3f1a28ab96418b6523/html5/thumbnails/42.jpg)
Feature Engineering
Fix Missing Values in a “Meaningful” Way
Steps: Filter Zeros, Model insulin, Predict insulin, Select insulin
Datasets: Original Dataset, Clean Dataset, Amended Dataset, Fixed Dataset
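One plausible reading of these four steps is: drop the rows where insulin is zero, fit a model for insulin on the remaining rows, predict insulin for every row, then keep the original value where it exists and the prediction where it was zero. The sketch below follows that reading with made-up data and a one-feature least-squares line; the `glucose` field, the numbers, and the choice of model are all hypothetical and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up rows (assumption); insulin == 0 stands for a missing measurement.
original = [
    {"glucose": 90, "insulin": 80},
    {"glucose": 120, "insulin": 0},
    {"glucose": 150, "insulin": 200},
    {"glucose": 110, "insulin": 120},
    {"glucose": 140, "insulin": 0},
]

# Filter Zeros: keep only rows with a real insulin value.
clean = [row for row in original if row["insulin"] != 0]

# Model insulin: fit on the clean rows.
slope, intercept = fit_line([r["glucose"] for r in clean],
                            [r["insulin"] for r in clean])

# Predict insulin: add a prediction to every original row.
amended = [dict(row, predicted_insulin=slope * row["glucose"] + intercept)
           for row in original]

# Select insulin: original value if present, otherwise the prediction.
fixed = [{"glucose": row["glucose"],
          "insulin": row["insulin"] if row["insulin"] != 0 else row["predicted_insulin"]}
         for row in amended]
print(fixed)
```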