
Post on 26-Sep-2020


Page 1: Automatic Feature Engineering The manual approach Amsterdam Du... · Automatic Feature Engineering The manual approach Pierre Gutierrez Leo Dreyfus-Schmidt Du Phan. CONTENTS A general-purpose

Automatic Feature Engineering: The manual approach

Pierre Gutierrez, Leo Dreyfus-Schmidt, Du Phan


CONTENTS

MOTIVATION
Dealing with vertical business problems.
Accelerating the feature engineering process.

INITIAL SOLUTION
A general-purpose human-powered feature generation pipeline.


DEEP FEATURE SYNTHESIS
Leveraging the relational nature of tables to aggregate features.

CONCLUSION
What about Deep Learning?
Where does the solution fit in a general data science workflow?


CONTENTS

MOTIVATION
Dealing with vertical business problems.
Accelerating the feature engineering process.

INITIAL SOLUTION
A general-purpose human-powered feature generation pipeline.


How can we provide a general solution to these problems?


There is always a structure in our data waiting to be exploited


Feature Engineering process (sample size: 1)

[Pie chart: "Meh." vs. "Fun!", one slice labeled 37 %]


How can we accelerate the boring parts?


CONTENTS

MOTIVATION
Dealing with vertical business problems.
Accelerating the feature engineering process.

INITIAL SOLUTION
A general-purpose human-powered feature generation pipeline.



OBJECTIVE
Leveraging human knowledge for automatic feature engineering

Build a general-purpose feature generation pipeline

Create expressive features based on the user's data model

Versatility, modularity and interpretability


Most problems can be expressed as aggregations over a few primary keys


user_id

user_id + event_timestamp

user_id + product_id
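Each of these keys leads to the same group-by operation, only over a different tuple of columns. A minimal sketch, assuming a toy in-memory event log (the data and the `count_by` helper are hypothetical):

```python
from collections import Counter

# Hypothetical event log: one dict per event.
events = [
    {"user_id": 1, "event_timestamp": "2020-01-05", "product_id": "A", "event_type": "view"},
    {"user_id": 1, "event_timestamp": "2020-01-05", "product_id": "B", "event_type": "buy_order"},
    {"user_id": 1, "event_timestamp": "2020-02-10", "product_id": "A", "event_type": "buy_order"},
    {"user_id": 2, "event_timestamp": "2020-01-07", "product_id": "A", "event_type": "view"},
]

def count_by(rows, keys):
    """Count rows per combination of the given primary-key columns."""
    return Counter(tuple(row[k] for k in keys) for row in rows)

# Same operation, three aggregation levels.
per_user = count_by(events, ["user_id"])
per_user_day = count_by(events, ["user_id", "event_timestamp"])
per_user_product = count_by(events, ["user_id", "product_id"])
```

Counting is only the simplest aggregate; any feature family can be plugged in at each key level.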


Most features belong to a “general” feature family


Frequency: how often does the client perform a specific action?


Recency: when did the client last perform this action?


Monetary: what are the client's spending habits?


Distribution: what type of client is this?
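All four families can be computed from a single event log. A minimal sketch, assuming a hypothetical in-memory log of `(user_id, event_date, event_type, amount, category)` tuples, a fixed reference date, and `buy_order` as the action of interest; the function name `family_features` and the use of the mean amount for "monetary" are illustrative choices, not the authors' definitions:

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical event log: (user_id, event_date, event_type, amount, category)
events = [
    (1, date(2020, 1, 5), "buy_order", 30.0, "books"),
    (1, date(2020, 3, 1), "buy_order", 10.0, "music"),
    (1, date(2020, 3, 2), "view", 0.0, "books"),
    (2, date(2020, 2, 20), "buy_order", 50.0, "books"),
]

def family_features(events, ref_date, action="buy_order"):
    """Frequency, recency, monetary and distribution features per user.
    Users who never performed `action` get no entry."""
    actions = defaultdict(list)
    for user_id, day, event_type, amount, category in events:
        if event_type == action:
            actions[user_id].append((day, amount, category))
    features = {}
    for user_id, rows in actions.items():
        amounts = [amount for _, amount, _ in rows]
        categories = Counter(category for _, _, category in rows)
        features[user_id] = {
            # Frequency: how often the user performed the action
            "frequency": len(rows),
            # Recency: days between the last action and the reference date
            "recency_days": (ref_date - max(day for day, _, _ in rows)).days,
            # Monetary: average amount per action
            "monetary_mean": sum(amounts) / len(amounts),
            # Distribution: share of actions in each category
            "distribution": {c: n / len(rows) for c, n in categories.items()},
        }
    return features

features = family_features(events, ref_date=date(2020, 3, 31))
```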


fit / transform
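Separating `fit` from `transform` matters for feature families whose output columns depend on the data. A minimal sketch in the scikit-learn style, assuming a hypothetical `DistributionFeatures` generator that learns the category vocabulary in `fit`, so that `transform` returns the same columns on training and scoring data:

```python
from collections import Counter, defaultdict

class DistributionFeatures:
    """Hypothetical feature generator with a scikit-learn-style fit/transform API."""

    def __init__(self, key="user_id", column="category"):
        self.key = key
        self.column = column
        self.categories_ = None

    def fit(self, rows):
        # Learn the category vocabulary once, on training data only.
        self.categories_ = sorted({row[self.column] for row in rows})
        return self

    def transform(self, rows):
        if self.categories_ is None:
            raise RuntimeError("call fit() before transform()")
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[self.key]][row[self.column]] += 1
        out = {}
        for key, counter in counts.items():
            total = sum(counter.values())
            # One column per fitted category; categories unseen during fit get no column.
            out[key] = [counter[c] / total for c in self.categories_]
        return out

train = [
    {"user_id": 1, "category": "books"},
    {"user_id": 1, "category": "music"},
    {"user_id": 2, "category": "books"},
]
test = [
    {"user_id": 3, "category": "music"},
    {"user_id": 3, "category": "games"},
]
gen = DistributionFeatures().fit(train)
train_features = gen.transform(train)
test_features = gen.transform(test)
```

Fitting on training data only keeps the scoring-time feature layout stable, at the cost of ignoring categories that appear later.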


Feature: frequency of the buying event in the last 6 months

Time window: last 6 months

Primary key: user_id

Filter: event_type is buy_order
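These four fields are enough to describe the feature declaratively. A minimal sketch, assuming a hypothetical `FeatureSpec` record and an in-memory event list holding `date` values; `months_before` shifts a date back by whole months, clamping the day to the last valid day of shorter months:

```python
import calendar
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical declaration of one frequency feature."""
    key: str               # primary key, e.g. "user_id"
    window_months: int     # time window, e.g. 6
    filter_column: str     # column the filter applies to
    filter_values: tuple   # accepted values, e.g. ("buy_order",)

def months_before(d, months):
    """Shift a date back by whole months, clamping to the end of shorter months."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) - months, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)

def compute_frequency(spec, rows, ref_date):
    """Count filtered events per key inside [ref_date - window, ref_date]."""
    start = months_before(ref_date, spec.window_months)
    counts = Counter(
        row[spec.key]
        for row in rows
        if start <= row["event_date"] <= ref_date
        and row[spec.filter_column] in spec.filter_values
    )
    return dict(counts)

spec = FeatureSpec(key="user_id", window_months=6,
                   filter_column="event_type", filter_values=("buy_order",))
rows = [
    {"user_id": 1, "event_date": date(2020, 3, 1), "event_type": "buy_order"},
    {"user_id": 1, "event_date": date(2020, 6, 15), "event_type": "buy_order"},
    {"user_id": 1, "event_date": date(2020, 6, 16), "event_type": "view"},
    {"user_id": 2, "event_date": date(2019, 12, 1), "event_type": "buy_order"},  # outside window
]
frequency = compute_frequency(spec, rows, ref_date=date(2020, 8, 31))
mean_per_month = {k: v / spec.window_months for k, v in frequency.items()}
```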


DROP TABLE IF EXISTS frequency_feature_last_6_month_group_1;
-- ref_date: reference date of the feature computation (placeholder)
CREATE TABLE frequency_feature_last_6_month_group_1 AS (
  SELECT
    *,
    -- divide by 6.0, not 6, to avoid integer division of the count
    group_1_frequency_last_6_month / 6.0 AS mean_frequency_group_1_per_month_last_6_month
  FROM (
    -- number of buy events per user in the window
    SELECT
      user_id,
      COUNT(event_timestamp) AS group_1_frequency_last_6_month
    FROM (
      -- buy events in the 6 months up to ref_date
      SELECT *
      FROM "events_complete"
      WHERE event_timestamp::timestamp >= (ref_date - INTERVAL '6 month')
        AND event_timestamp::timestamp <= ref_date
        AND event_type IN ('buy_order')
    ) AS table_layer_2
    GROUP BY user_id
  ) AS table_layer_3
);
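In a generation pipeline, a query like this one can be rendered from the feature declaration rather than written by hand. A minimal sketch, assuming a hypothetical `render_frequency_sql` template that folds the innermost filtering sub-query into a single `WHERE` clause (same result) and leaves `ref_date` as an unresolved placeholder; it does no quoting or escaping, so it assumes the declaration comes from a trusted user:

```python
def render_frequency_sql(source_table, key, months, event_types, suffix):
    """Render a frequency-feature query (hypothetical template).
    Identifiers and values are inserted verbatim: no escaping is done."""
    feature = f"{suffix}_frequency_last_{months}_month"
    table = f"frequency_feature_last_{months}_month_{suffix}"
    mean_name = f"mean_frequency_{suffix}_per_month_last_{months}_month"
    values = ", ".join(f"'{v}'" for v in event_types)
    return (
        f"DROP TABLE IF EXISTS {table};\n"
        f"CREATE TABLE {table} AS (\n"
        # .0 forces a non-integer division of the count
        f"  SELECT *, {feature} / {months}.0 AS {mean_name}\n"
        f"  FROM (\n"
        f"    SELECT {key}, COUNT(event_timestamp) AS {feature}\n"
        f'    FROM "{source_table}"\n'
        f"    WHERE event_timestamp::timestamp >= (ref_date - INTERVAL '{months} month')\n"
        f"      AND event_timestamp::timestamp <= ref_date\n"
        f"      AND event_type IN ({values})\n"
        f"    GROUP BY {key}\n"
        f"  ) AS counts\n"
        f");"
    )

sql = render_frequency_sql("events_complete", "user_id", 6, ["buy_order"], "group_1")
```

Changing the window, key or filter then produces a new feature table without touching any SQL by hand.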


DEEP FEATURE SYNTHESIS
Leveraging the relational nature of tables to aggregate features.

CONCLUSION
What about Deep Learning?
Where does the solution fit in a general data science workflow?


Max Kanter, Kalyan Veeramachaneni


Features are often derived using relationships in the dataset
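The simplest relationship-based feature follows a foreign key from a child row to its parent and copies a parent attribute; Kanter and Veeramachaneni call this a direct feature. A minimal sketch, assuming hypothetical `customers` and `orders` tables held as Python dicts and lists; `direct_feature` is an illustrative helper, and the `Customers.Age` name imitates the `Product.Price` notation used in these slides:

```python
# Hypothetical parent table (keyed by CustomerID) and child table.
customers = {1: {"Age": 34, "Churned": False}, 2: {"Age": 51, "Churned": True}}
orders = [
    {"OrderID": 10, "CustomerID": 1},
    {"OrderID": 11, "CustomerID": 2},
    {"OrderID": 12, "CustomerID": 1},
]

def direct_feature(children, parents, fk, column, parent_name):
    """Return copies of the child rows with one parent column attached via the foreign key."""
    return [
        {**child, f"{parent_name}.{column}": parents[child[fk]][column]}
        for child in children
    ]

orders_with_age = direct_feature(orders, customers, "CustomerID", "Age", "Customers")
```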


Across datasets, many features are derived using similar mathematical operations
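The reuse comes from keeping the aggregation operations independent of any particular table. A minimal sketch, assuming a hypothetical `PRIMITIVES` registry and `aggregate` helper; the same `SUM`, `MEAN` and `COUNT` primitives apply unchanged to an order-lines table and to an unrelated sessions table (both toy data):

```python
import statistics
from collections import defaultdict

# Generic aggregation primitives: each maps a list of values to one number.
PRIMITIVES = {
    "SUM": sum,
    "MEAN": statistics.mean,
    "MAX": max,
    "COUNT": len,
}

def aggregate(children, fk, column, primitive):
    """Apply one primitive to `column` of the child rows grouped by foreign key `fk`."""
    groups = defaultdict(list)
    for row in children:
        groups[row[fk]].append(row[column])
    func = PRIMITIVES[primitive]
    return {key: func(values) for key, values in groups.items()}

# Two unrelated datasets, same primitives.
order_lines = [
    {"OrderID": 10, "Price": 5.0},
    {"OrderID": 10, "Price": 7.5},
    {"OrderID": 11, "Price": 3.0},
]
sessions = [
    {"UserID": "u1", "Duration": 30},
    {"UserID": "u1", "Duration": 90},
    {"UserID": "u2", "Duration": 10},
]
order_sum = aggregate(order_lines, "OrderID", "Price", "SUM")
session_mean = aggregate(sessions, "UserID", "Duration", "MEAN")
session_count = aggregate(sessions, "UserID", "Duration", "COUNT")
```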


New features are composed using previously derived features


Customers: CustomerID, Age, Churned

Orders: CustomerID, OrderID, Date

OrderProduct: OrderID, ProductID, Product.Price


Step 1: SUM(Product.Price) GROUP BY OrderID


Step 2: AVG(Orders.SUM(Product.Price)) GROUP BY CustomerID

-> average expense per order per customer
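Both steps can be replayed on toy rows that follow the Customers / Orders / OrderProduct schema. A minimal sketch in plain Python; the data is invented, and the resulting column is named after the slide's `AVG(Orders.SUM(Product.Price))` expression:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for the three tables.
customers = [
    {"CustomerID": 1, "Age": 34, "Churned": False},
    {"CustomerID": 2, "Age": 51, "Churned": True},
]
orders = [
    {"OrderID": 10, "CustomerID": 1, "Date": "2020-01-03"},
    {"OrderID": 11, "CustomerID": 1, "Date": "2020-02-14"},
    {"OrderID": 12, "CustomerID": 2, "Date": "2020-02-20"},
]
order_products = [
    {"OrderID": 10, "ProductID": "A", "Product.Price": 10.0},
    {"OrderID": 10, "ProductID": "B", "Product.Price": 5.0},
    {"OrderID": 11, "ProductID": "A", "Product.Price": 10.0},
    {"OrderID": 12, "ProductID": "C", "Product.Price": 40.0},
]

# Step 1: SUM(Product.Price) GROUP BY OrderID -> total amount of each order.
order_total = defaultdict(float)
for row in order_products:
    order_total[row["OrderID"]] += row["Product.Price"]

# Step 2: AVG(Orders.SUM(Product.Price)) GROUP BY CustomerID.
# The step-1 feature is attached to each order, then averaged per customer.
totals_per_customer = defaultdict(list)
for order in orders:
    totals_per_customer[order["CustomerID"]].append(order_total[order["OrderID"]])
avg_order_value = {cid: mean(totals) for cid, totals in totals_per_customer.items()}

# Attach the stacked feature to the target table.
feature_table = [
    {**c, "AVG(Orders.SUM(Product.Price))": avg_order_value.get(c["CustomerID"])}
    for c in customers
]
```

Each step only needs the previous step's output and one relationship, which is what lets the depth be increased mechanically.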


Limits?


Brute-force nature -> Feature selection needs to be considered
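One cheap way to prune a brute-force feature set is an unsupervised redundancy filter. The sketch below is a swapped-in illustration, not the authors' selection method: it assumes features are held as equal-length numeric columns, drops constant columns, and greedily drops any column whose absolute Pearson correlation with an already kept column exceeds a hypothetical `max_corr` threshold. Target-aware selection may still be needed afterwards.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(columns, max_corr=0.95):
    """Greedy filter over {feature name: values}; returns the kept names in input order."""
    kept = {}
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant column: carries no information
        if any(abs(pearson(values, other)) > max_corr for other in kept.values()):
            continue  # redundant with a feature already kept
        kept[name] = values
    return list(kept)

# Hypothetical generated features for four customers.
columns = {
    "COUNT(Orders)": [1, 2, 3, 4],
    "SUM(Orders.items)": [2, 4, 6, 8],        # perfectly correlated with the count
    "MAX(Orders.is_online)": [1, 1, 1, 1],    # constant
    "MEAN(Orders.total)": [10.0, 5.0, 8.0, 1.0],
}
selected = select_features(columns)
```

The greedy order means the first of two redundant features wins, so generation order acts as an implicit preference.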


DEEP FEATURE SYNTHESIS
Leveraging the relational nature of tables to aggregate features.

CONCLUSION
What about Deep Learning?
Where does the solution fit in a general data science workflow?


AlexNet (2012)


Feature Engineering vs Representation Learning: the chess game metaphor


Feature Engineering: same game, different forms


Representation Learning: different game


Interpretability: do you need it?


Where does this method fit in the data science workflow?


Thank you for your attention! Question time