Solving the Skills Gap: A Quick Overview of AutoML
Mark Woolnough, Power and AI Architect, IBM Systems
6th March 2019
The Challenges of Enterprise AI Adoption
• Time to Insights Slow: weeks to months for a data scientist to build a model
• Lack of AI Talent: ~100 data science “Grandmasters” in the world
• Lack of Trust in AI: black box models
“US alone faces a shortage of 190,000 people with analytical expertise.”
Data Scientists combine skills across various areas of expertise
• Scripting, SQL, Python, R, Scala
• Data Pipelines
• Big Data, Apache Spark
• Mathematical Background
• Computational Science
• Business/Industry Expertise
• Domain Knowledge
• Supply Chain
• CRM
• Financials
• Networking
Diagram labels: Statistician, Computer Science, Domain Expert, Unicorn
What is AutoML?
Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine
learning to real-world problems.
Confidential
AutoML: Automated Data Science and ML Workflows
Workflow diagram (highly iterative process): Data Integration → Data Quality and Transformation → Modeling Table (Features + Target) → Model Building → Model
What are the benefits of AutoML?
✓ Cost Reductions
✓ Faster Deployment: Increased revenues and customer satisfaction
✓ More models in production, more automation, better employee engagement
  ➢ Increased productivity for data scientists
  ➢ Democratization of machine learning reduces demand for data scientists
✓ Can help with machine learning and AI interpretability (e.g., for regulatory bodies)
✓ Can help audit the quality and accuracy of existing models
AutoML Vendors – an example
H2O.ai is a Recognized Leader in AI and ML
• 2018 Gartner Magic Quadrant for Data Science and Machine Learning Platforms
• Forrester Wave: Notebook-Based Predictive Analytics And Machine Learning Solutions, Q3 2018
• Top 3 Artificial Intelligence (AI) and Machine Learning (ML) Software Solution
“Technology leadership … with a distinguished vision”
“the quasi-industry standard”
“its vision of creating an AI and ML tool that ultimately aims to allow almost everyone within the business to create their own predictive models”
“H2O.ai’s future is automated machine learning”
“its bright future is in Driverless AI”
21 day free trial for Driverless AI
H2O Driverless AI Delivers “Expert Data Scientist in a Box”
• Award-winning software
• Created and supported by world-renowned AI experts from H2O.ai
• Empowers companies to accomplish AI and ML with a single platform
• Performs the function of an expert data scientist and adds more power to both novice and expert teams
• Details and highlights insights and interpretability with easy to understand results and visualizations
Supervised Learning
Training Data
Age  Income    Last Month Payment  Payment Default
47   $183,342  Yes                 False
29   $84,823   No                  True
58   $95,853   Yes                 False
63   $43,824   Yes                 True

Test Data
Age  Income    Last Month Payment  Payment Default
61   $73,679   Yes                 ?
73   $54,428   No                  ?
59   $90,453   Yes                 ?
43   $83,041   Yes                 ?

What’s the pattern? Can we create a model to predict ‘Default’?
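The question can be made concrete with a minimal sketch. It is not how Driverless AI builds models: it fits a one-rule "decision stump" to the four training rows from the slide and applies the rule to the test rows. The Yes/No to 1/0 encoding and the names `fit_stump` and `predict` are assumptions for illustration, and with only four rows the learned rule is a toy, not a credible credit model.

```python
# Rows from the slide: (age, income, paid_last_month), with Yes=1, No=0 (assumed encoding).
TRAIN_X = [(47, 183342, 1), (29, 84823, 0), (58, 95853, 1), (63, 43824, 1)]
TRAIN_Y = [False, True, False, True]  # Payment Default
TEST_X = [(61, 73679, 1), (73, 54428, 0), (59, 90453, 1), (43, 83041, 1)]
FEATURES = ["age", "income", "paid_last_month"]


def fit_stump(X, y):
    """Find the single feature/threshold rule with the fewest training errors."""
    best = None  # (errors, feature_index, threshold, label_below)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between neighbouring observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for label_below in (True, False):
                errors = sum(
                    (label_below if row[j] < threshold else not label_below) != target
                    for row, target in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, label_below)
    return best[1:]


def predict(stump, X):
    """Apply the learned rule to new rows."""
    j, threshold, label_below = stump
    return [label_below if row[j] < threshold else not label_below for row in X]


stump = fit_stump(TRAIN_X, TRAIN_Y)
predictions = predict(stump, TEST_X)
print(f"rule: default={stump[2]} when {FEATURES[stump[0]]} < {stump[1]}")
print("predicted defaults for test rows:", predictions)
```

The stump is chosen here only because it is the smallest model that still "finds a pattern"; an AutoML tool would compare far richer models on far more data.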
Supervised Learning Techniques
Regression: How much will a customer spend?
Classification: Will a customer make a purchase? Yes or No
Figures: regression plot (y against X); classification plot (yes/no classes over features xi and xj)
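The two techniques differ mainly in the kind of target they predict. A minimal sketch with made-up numbers (the data and the helper names `fit_line`, `fit_centroids` and `classify` are assumptions, not from the slides): a least-squares line predicts a continuous spend, and a nearest-centroid rule predicts a yes/no purchase.

```python
from statistics import mean

# Hypothetical data: monthly visits, spend (regression target), purchased (class target).
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 42.0, 48.0]
purchased = [False, False, True, True, True]


def fit_line(x, y):
    """Ordinary least squares for one feature: y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    variance = sum((a - x_bar) ** 2 for a in x)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def fit_centroids(x, labels):
    """Nearest-centroid classifier: store the mean feature value of each class."""
    return {
        label: mean(a for a, lab in zip(x, labels) if lab == label)
        for label in set(labels)
    }


def classify(centroids, value):
    """Return the class whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


slope, intercept = fit_line(visits, spend)
centroids = fit_centroids(visits, purchased)

print("regression: predicted spend for 6 visits =", round(slope * 6 + intercept, 2))
print("classification: will a 2-visit customer buy?", classify(centroids, 2))
```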
H2O Driverless AI: How it Works
1. Drag and drop data: Ingest data from cloud, big data and desktop systems (SQL, Local, Amazon S3, HDFS, Google BigQuery, Azure Blob Storage, Snowflake)
2. Automatic Visualization: Understand the data shape, outliers, missing values, etc.
3. Automatic Machine Learning: Use best practice model recipes and the power of high performance computing to iterate across thousands of possible models including advanced feature engineering and parameter tuning
   • Model Recipes: i.i.d. data, Time-series, more on the way
   • Advanced Feature Engineering + Algorithm + Model Tuning
   • Survival of the Fittest
   • Powered by GPU Acceleration
4. Automatic Scoring Pipelines: Deploy ultra-low latency Python or Java Automatic Scoring Pipelines that include feature transformations and models
Diagram: Dataset (X, Y) → Modelling → Automatic Scoring Pipeline and Machine Learning Interpretability → Deploy Low-latency Scoring to Production
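The idea of iterating across many candidate models and keeping the fittest can be sketched in a few lines. This is a plain exhaustive search over a handful of hypothetical pipelines, standing in for the much larger search the slide describes; it is not the Driverless AI algorithm. The data, the transform names and the threshold grid are all assumptions for illustration.

```python
from itertools import product

# Hypothetical 1-D data: the label is True when the value is far from zero.
TRAIN = [(-3.0, True), (-2.5, True), (-1.0, False), (0.0, False),
         (0.5, False), (2.0, True), (3.0, True)]
VALID = [(-2.8, True), (-0.4, False), (1.0, False), (2.6, True)]

# "Feature engineering" candidates and "parameter tuning" candidates.
TRANSFORMS = {
    "identity": lambda v: v,
    "absolute": abs,
    "square": lambda v: v * v,
}
THRESHOLDS = [0.5, 1.5, 3.0, 5.0]


def accuracy(transform, threshold, rows):
    """Share of rows where 'transform(value) > threshold' matches the label."""
    hits = sum((transform(value) > threshold) == label for value, label in rows)
    return hits / len(rows)


def search(train, valid):
    """Score every (transform, threshold) pipeline and rank by validation accuracy."""
    leaderboard = []
    for (name, transform), threshold in product(TRANSFORMS.items(), THRESHOLDS):
        valid_score = accuracy(transform, threshold, valid)
        train_score = accuracy(transform, threshold, train)
        leaderboard.append((valid_score, train_score, name, threshold))
    # Highest validation accuracy first; training accuracy breaks ties.
    return sorted(leaderboard, key=lambda entry: entry[:2], reverse=True)


leaderboard = search(TRAIN, VALID)
best_valid, best_train, best_name, best_threshold = leaderboard[0]
print(f"kept pipeline: {best_name} > {best_threshold} "
      f"(validation accuracy {best_valid:.2f})")
```

Ranking on held-out rows rather than training rows is the design choice that matters here: it keeps the search from rewarding pipelines that merely memorise the training data.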
Model Documentation
Industry Leading Interpretability for Trust and Compliance
• Interpretability for debugging, not just for regulators
• Get reason codes and model interpretability in plain English
• K-LIME, LOCO, partial dependence and more
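Of the techniques listed, partial dependence is the simplest to illustrate: force one feature to a fixed value for every row, average the model's predictions, and repeat over a grid of values. A minimal sketch, assuming a hypothetical scoring function `default_probability` and made-up rows; it is not the Driverless AI implementation.

```python
from statistics import mean

# Hypothetical rows: (age, income in thousands).
ROWS = [(47, 183.0), (29, 85.0), (58, 96.0), (63, 44.0)]


def default_probability(age, income):
    """Stand-in for a trained model; any callable that scores one row would do."""
    score = 0.9 - 0.004 * income - 0.002 * age
    return min(1.0, max(0.0, score))


def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite one feature, keep the rest
            predictions.append(model(*modified))
        curve.append((value, mean(predictions)))
    return curve


curve = partial_dependence(default_probability, ROWS, 1, [50.0, 100.0, 150.0])
for income, average in curve:
    print(f"income={income:>6.1f}k  average predicted default probability={average:.3f}")
```

Reading the curve from left to right shows how the model's average output responds to the chosen feature, which is the kind of plain-language explanation the slide refers to.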
H2O Walk through…
Scenario: Use AutoML to build and deploy an AI model that will predict whether a customer will default on a payment.
Materials: IBM AC922, H2O Driverless AI, a labelled training data set with the payment history of 25,000 clients and their payment status (true/false), and a test data set to validate.
Output: A trained model pipeline for deployment and a human-readable report showing how it was built.
The data…
Predict this!