Demystifying Machine Learning

Post on 18-Sep-2020


us.sogeti.com

Topics

What is Machine Learning?
• Definition
• A Couple of Motivating Examples
• Why the Hype?

Types of Machine Learning
• Techniques and Use-Cases
• Common ML and AI Algorithms
• AI vs. ML vs. Deep Learning

Machine Learning Process
• Lifecycle
• Analysis and Model Building

Common Challenges in Machine Learning
• Challenges with Data
• Poor Model Performance
• Production Deployment and Maintenance

Machine Learning in the Public Sector
• A Few Applications


What is Machine Learning?

“Science of getting computers to act without explicit programming”


Motivating Example – Predicting Home Prices

Living Area (sq. ft.)  # of Bedrooms  Parking Spaces  Finished Basement?  Other Parameters  House Price
2400                   4              3               1                   …                 $350,000
1400                   2              1               0                   …                 $190,000
1900                   3              2               0                   …                 $250,000

Living Area (sq. ft.)  # of Bedrooms  Parking Spaces  Finished Basement?  Other Parameters  House Price
2000                   2              2               1                   …                 ?

(Other parameters: zip code, school district, tax rate, ...)
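
As a rough illustration of how the missing price could be estimated, the sketch below fits a straight line to living area alone using ordinary least squares. Using a single feature and a linear model is an assumption made for brevity; the slide does not say which algorithm or which features would be used.

```python
# Minimal sketch: one-feature least-squares fit on the three example homes.
# Assumption: price depends linearly on living area only; other columns are ignored.

areas = [2400.0, 1400.0, 1900.0]          # living area in sq. ft.
prices = [350000.0, 190000.0, 250000.0]   # known house prices in dollars


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(areas, prices)
predicted_price = slope * 2000.0 + intercept  # the 2000 sq. ft. home with unknown price
print(f"Estimated price: ${predicted_price:,.0f}")
```

With only three examples the estimate is illustrative at best; a realistic model would train on many more homes and on the remaining columns as well.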


Another Motivating Example – Self Driving Cars

Basic High Level Technique Behind Self Driving Cars

1) A human drives a car in varying traffic conditions
2) While the car is being driven, a set of cameras records:
   a. The traffic conditions
   b. The corresponding action(s) taken by the driver
3) The data points collected (billions/trillions of telemetry readings, videos, and images!) are fed to computers
4) Machine Learning algorithms train on the data, and the computers learn what action(s) to take under different traffic conditions
5) The computers are now given charge of driving the car!

Note that no explicit rules about what to do in each situation are fed to the computers
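
The idea of learning actions from recorded examples, rather than from hand-written rules, can be sketched with a tiny nearest-neighbour lookup. The two features, the recorded values, and the action names below are hypothetical; real self-driving systems learn from far richer sensor data with far larger models.

```python
# Minimal sketch of learning driving actions from recorded examples (1-nearest neighbour).
# Assumption: each observation is (distance to car ahead in metres, own speed in km/h);
# these features, values, and action names are hypothetical, not from the slide.

recorded_drives = [
    ((5.0, 60.0), "brake"),
    ((10.0, 80.0), "brake"),
    ((50.0, 40.0), "accelerate"),
    ((80.0, 60.0), "accelerate"),
    ((30.0, 50.0), "hold_speed"),
]


def choose_action(observation, examples):
    """Return the action the human took in the most similar recorded situation."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    nearest = min(examples, key=lambda ex: squared_distance(ex[0], observation))
    return nearest[1]


print(choose_action((8.0, 70.0), recorded_drives))    # close behind another car
print(choose_action((70.0, 55.0), recorded_drives))   # open road ahead
```

Nothing in `choose_action` encodes a driving rule; its behaviour comes entirely from the recorded examples, which is the point the slide makes.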


Machine Learning - Why the Hype Now?!

[Figure: Performance (vertical axis) vs. Amount of Data (horizontal axis), with one curve for Traditional Machine Learning Algorithms and one for Modern Machine Learning Algorithms]

1. The term ‘Machine Learning’ was coined in 1959
2. However, it is only in the last 7 to 8 years that it has caught on and been adopted widely in business
3. This is for two reasons:
   a. Explosion of data generation all around us
   b. Availability of compute power in the form of GPUs and horizontally scalable platforms like Hadoop

Courtesy - Andrew Ng: Artificial Intelligence is the New Electricity


Machine Learning – Types

Machine Learning
• Predictive
   • Supervised
      • Regression
        Examples:
        1. House price prediction
        2. Stock price prediction
        3. Demand prediction
      • Classification
        Examples:
        1. Image classification
        2. Email spam detection
        3. Tumor classification
        4. Fraud detection
   • Un-Supervised
     Examples:
     1. Customer segmentation
     2. Document classification
     3. Fraud detection
• Prescriptive (Optimizations)
  Examples:
  1. Inventory optimization
  2. Truck route optimization
  3. Retail store assortment optimization


Machine Learning – Common Algorithms

Machine Learning
• Predictive
   • Supervised
      • Regression
        1. Linear Regression
        2. SVM
        3. K-Nearest Neighbors
        4. Decision Trees
      • Classification
        1. Logistic Regression
        2. Neural Networks
        3. K-Nearest Neighbors
        4. Decision Trees
   • Un-Supervised
     1. K-means Clustering
     2. Principal Component Analysis (PCA)
• Prescriptive (Optimizations)
  1. Linear Programming
  2. Non-Linear Programming
  3. Metaheuristics:
     a. Genetic Algorithms
     b. Simulated Annealing
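
To make one of the unsupervised algorithms listed above concrete, here is a minimal sketch of K-means clustering in one dimension. The customer-spend values, the choice of two clusters, and the fixed starting centroids are all assumptions chosen so the run is small and repeatable.

```python
# Minimal sketch of K-means clustering in one dimension (pure Python).
# Assumption: the data are hypothetical annual customer spend values, k = 2,
# and the starting centroids are fixed so the run is repeatable.

def k_means_1d(values, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


spend = [100.0, 120.0, 90.0, 900.0, 1000.0, 950.0]
centroids, clusters = k_means_1d(spend, centroids=[0.0, 500.0])
print(centroids)
print(clusters)
```

No labels are supplied; the low-spend and high-spend groups emerge from the data alone, which is what makes this a customer-segmentation style use of unsupervised learning.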


The Jargon: Artificial Intelligence, Machine Learning, Deep Learning??!!


The Jargon: Venn Diagram Representation

[Figure: nested sets, with AI as the outer set, Machine Learning inside AI, and Deep Learning inside Machine Learning]


Definitions: Artificial Intelligence vs. Machine Learning vs. Deep Learning

Artificial Intelligence: Techniques that enable computers to mimic human intelligence.

This can include things like making predictions, planning, understanding language, recognizing objects etc.

AI can be achieved using a variety of techniques such as if-then-rules, decision trees, Robotic Process Automation (RPA), machine learning etc.

Machine Learning: A subset of AI techniques built around the idea that we can give machines access to data and have them mimic human intelligence by learning for themselves.

Some techniques include Linear/Logistic Regression, SVM, Random Forests, Neural Networks etc.


Artificial Intelligence vs. Machine Learning vs. Deep Learning

Deep Learning: A subset of machine learning algorithms that can perform higher-level human tasks such as image recognition, speech-to-text conversion, sentiment classification of text, language translation, etc.

These algorithms are inspired by the neural networks in the human brain


Machine Learning Lifecycle

[Figure: lifecycle diagram; the only surviving stage label is Deployment]


Machine Learning – Analysis and Model Building

1. Explore Data
2. Prepare Data
3. Perform Feature Engineering
4. Divide Data into Train/Validation/Test Datasets
5. Train Different Models on Training Dataset
6. Evaluate Trained Models on ‘Validation’ Dataset
7. Pick the best performing model
8. Test the chosen model on ‘Test’ Dataset

70–80% of time is spent on the earlier steps; only 20–30% of time is spent on the later steps
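
The split, train, validate, and test steps above can be sketched as follows. The synthetic data, the 60/20/20 split, and the two toy candidate models are assumptions for illustration only; the slide does not prescribe any of them.

```python
# Minimal sketch of the split / train / validate / test steps.
# Assumptions: synthetic data (y is roughly 3x), a 60/20/20 split, and two toy
# candidate models; none of these come from the slide.
import random

random.seed(0)
data = [(x, 3.0 * x + random.uniform(-1.0, 1.0)) for x in range(100)]
random.shuffle(data)

train, validation, test = data[:60], data[60:80], data[80:]


def fit_mean_model(rows):
    """Candidate 1: always predict the average training target."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_slope_model(rows):
    """Candidate 2: least-squares line through the origin, y = w * x."""
    w = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    return lambda x: w * x


def mean_squared_error(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Train every candidate on the training set only.
candidates = {"mean": fit_mean_model(train), "slope": fit_slope_model(train)}
# Compare candidates on the validation set and keep the best one.
validation_errors = {name: mean_squared_error(m, validation)
                     for name, m in candidates.items()}
best_name = min(validation_errors, key=validation_errors.get)
# Report the chosen model's error once on the untouched test set.
test_error = mean_squared_error(candidates[best_name], test)
print(best_name, round(test_error, 3))
```

Keeping the test set out of both training and model selection is what lets the final error serve as an honest estimate of performance on new data.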


Machine Learning – Challenges

1. Lack of data or lack of labeled/tagged data for supervised learning
   a. Many modern Machine Learning algorithms are very data hungry
   b. Some organizations do have the required data but it is not labeled
   c. Ways to address the challenge:
      • Gather more data!
      • Data synthesis
2. Poor Data Quality:
   a. A Machine Learning model is only as good as its data!
   b. Typical reasons for an organization to have poor data quality are:
      • Manual data entry
      • Lack of consistent data dictionaries
      • Inconsistent entries by different users
      • System integration issues
   c. Ways to improve data quality:
      • Automation
      • Good data architecture
      • Implementation of good data governance principles


Machine Learning – Challenges (Contd.)

3. Poor Performance of Machine Learning Algorithms:
   a. Bias vs. Variance tradeoff, a.k.a. Underfitting vs. Overfitting tradeoff
   b. Stale Models
      i. Models left alone get stale very quickly
      ii. They need to be recalibrated and retrained on a regular basis
      iii. Having a feedback loop from production data is a best practice
   c. Unbalanced datasets
      i. For example, in Fraud Detection
      ii. A useless model can still show very high accuracy!
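
The unbalanced-dataset point can be checked with a few lines of arithmetic. The sketch below assumes a hypothetical set of 1,000 transactions of which 10 are fraudulent, and a "model" that never flags anything.

```python
# Minimal sketch: why accuracy misleads on an unbalanced dataset.
# Assumption: a hypothetical set of 1,000 transactions, 10 of them fraudulent.

labels = [1] * 10 + [0] * 990          # 1 = fraud, 0 = legitimate
predictions = [0] * len(labels)        # a "model" that never flags fraud

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
recall = true_positives / sum(labels)  # share of actual fraud cases that were caught

print(f"accuracy = {accuracy:.1%}, recall = {recall:.1%}")
```

The model scores 99% accuracy while catching none of the fraud, which is why a metric such as recall is worth reporting alongside accuracy when one class is rare.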


Challenge: Overfitting and Underfitting

Underfitting – Model does not capture the structure of data. It is too simple.

Overfitting – Model tries too hard to fit all the outliers and errors in the data and does not do well with new data. It is typically an overly complex model, such as a high-order polynomial.


Challenge: How to Overcome Underfitting and Overfitting

Underfitting
- Increase model complexity by adding more features, higher-order terms, or interaction features
- Try different Machine Learning algorithms

Overfitting
- Make the model simpler by:
   - Reducing the number of features
   - Penalizing a complex model by using a mathematical technique called Regularization
- Train on a larger dataset
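
As a small illustration of Regularization, the sketch below applies an L2 (ridge) penalty to a one-weight model y = w * x. The data points and penalty strengths are hypothetical. Minimizing sum((y - w*x)^2) + lam * w^2 over w gives the closed form used in the code.

```python
# Minimal sketch of L2 regularization (ridge) for a one-weight model y = w * x.
# Assumptions: hypothetical data and penalty strengths; minimizing
# sum((y - w*x)^2) + lam * w^2 gives the closed form w = sum(x*y) / (sum(x*x) + lam).

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of the penalized squared error for a single weight."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


for lam in (0.0, 1.0, 10.0, 100.0):
    print(f"lambda = {lam:>5}: w = {ridge_weight(xs, ys, lam):.3f}")
```

A larger penalty pulls the weight toward zero, trading a slightly worse fit on the training data for a simpler model that is less likely to chase noise.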


Machine Learning – Challenges (Contd.)

4. Production Deployment and Maintenance:
   a. The industry is still immature in this area. Many organizations have ML models running on individual laptops!
   b. Without an effective deployment solution, it is hard to:
      i. Make the model available to the larger organization and embed it in business applications
      ii. Determine why a model works well on training data but not on production data
      iii. Maintain different versions of the model and do A/B testing
      iv. Gain the confidence of the business when it cannot interpret the results


Machine Learning Applications in the Public Sector

1. Fraud Detection:
   • Payments
   • Insider threat detection
   • DMV applications
2. Deployment of Resources:
   • Optimal deployment of police and traffic cops
   • Readiness for adverse events like earthquakes and forest fires
3. Education:
   • Automatic computer grading of student papers and answer sheets
   • Detect cheating in tests and plagiarism


Machine Learning Applications in the Public Sector (Contd.)

4. Process Automation:
   • Chat bots to answer citizen queries
   • Deciphering the sentiment and mood of citizens calling in to government agency call centers
   • Matching job descriptions with resumes
   • Intelligent automation of tasks like event registration, sending communication emails, etc.
5. Predictive Analytics:
   • Predict and reduce reincarceration rates
   • Predict and reduce hospital readmission rates
   • Detect system intrusion into IT systems and hacking activity
6. Predictive Maintenance:
   • Predictive maintenance of heavy equipment


Questions?


Thank You!


References

i. https://medium.com/iotforall/the-difference-between-artificial-intelligence-machine-learning-and-deep-learning-3aa67bff5991

ii. https://medium.com/@diamond_io/artificial-intelligence-101-everything-you-need-to-know-to-understand-ai-8e20fe4bd750

iii. https://www.forbes.com/sites/bernardmarr/2016/12/06/what-is-the-difference-between-artificial-intelligence-and-machine-learning/#25895eea2742

iv. https://dzone.com/articles/10-interesting-use-cases-for-the-k-means-algorithm

v. https://www.youtube.com/watch?v=21EiKfQYZXc&list=PL4KifhYqFlly5ynC4WqwNSwzGcPmT_t6g&index=2&t=0s