Accelerating AI Deployment with H2O Driverless AI on IBM Power9
Jo-fai (Joe) Chow, Data Science Evangelist, H2O.ai
[email protected] | @h2oai
Join the Conversation #OpenPOWERSummit
CONFIDENTIAL
H2O.ai Overview
Company: Founded in Silicon Valley in 2012. Funded: $75M. Investors: Wells Fargo, NVIDIA, Nexus Ventures, Paxion Ventures
Products:
• H2O Open Source Machine Learning (14,000 organizations)
• H2O Driverless AI: Automatic Machine Learning
Leadership: Leader in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms
Team: 120 AI experts (Kaggle Grandmasters, distributed computing, visualization)
Global: Mountain View, London, Prague, India
A Growing Customer Base
“H2O.ai's reference customers gave it the highest overall score for sales relationship and overall service and support” - Gartner MQ 2018
Industries: Financial; Insurance; Media & Marketing; Telcos; Industrial; Retail; Healthcare; Advisory, Accounting & Government
Growing Worldwide Open Source Community
• 14,000 companies using H2O
• 155,000 data scientists
• 116K Meetup members
• H2O World NYC, London, SF: thousands attending live and online
H2O.ai is a Leader in the 2018 Gartner Data Science and Machine Learning Platforms Magic Quadrant
• Technology leader with the greatest completeness of vision
• Recognized for its mindshare, partner network and status as a quasi-industry standard for machine learning and AI
• H2O.ai customers gave it the highest overall score among all the vendors for sales relationship and account management, customer support (onboarding, troubleshooting, etc.) and overall service and support
Get the Gartner Magic Quadrant here
“Confidential and property of H2O.ai. All rights reserved”
Partner Ecosystem
• Strategic Partners
• Cloud Providers
• HW Vendors
• System Integrators
• Value Added Resellers
• Data Stores
H2O.ai Product Suite
• H2O: in-memory, distributed machine learning algorithms with the H2O Flow GUI
• Sparkling Water: H2O AI open source engine integration with Spark
• H2O4GPU: lightning-fast machine learning on GPUs
• Driverless AI: automatic feature engineering, machine learning and interpretability
Open Source:
• 100% open source: Apache V2 licensed
• Built for data scientists: interface using R, Python, or H2O Flow (interactive notebook interface)
• Enterprise Support subscriptions
Driverless AI:
• Enterprise software
• Built for domain users, analysts & data scientists: GUI-based interface for end-to-end data science
• Fully automated machine learning from ingest to deployment
• User licenses on a per-seat basis (annual subscription)
Why Driverless AI?
Driverless AI: Automates Data Science and ML Workflows
H2O Team
Origin of R Package `ggplot2`
Automatic Visualization: scagnostics (scatterplot diagnostics) and other techniques automatically generate the most relevant visualizations for each dataset
H2O Team
Kaggle Grandmasters (and their highest rank): 1st, 4th, 13th, 25th, 33rd, 48th
About 80,000 Kagglers
181st: hoping to get closer to them at some point…
Secret Sauce: 1) Grandmaster Feature Engineering
Numerical/Categorical Interactions, Target Encoding, Clustering, Dimensionality Reduction, Weight of Evidence, etc.
Time-Series: Lags and historical aggregates with causality constraints
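To make three of these recipes concrete, here is a minimal sketch in plain Python: out-of-fold target encoding, weight of evidence, and lag/rolling features that only look backwards in time. It is an illustration under stated assumptions, not Driverless AI's implementation: the function names, the round-robin fold assignment, the smoothing constants and the WoE sign convention (events over non-events; some texts use the reverse) are all choices made here for illustration.

```python
import math
from collections import defaultdict


def target_encode_oof(categories, targets, n_folds=5, smoothing=10.0):
    """Out-of-fold mean target encoding with additive smoothing (illustrative).

    Row i is assigned to fold i % n_folds and encoded using statistics from
    the other folds only, so its own target never leaks into its feature.
    """
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        total, seen = 0.0, 0
        for i in range(n):
            if i % n_folds != fold:  # training rows for this fold
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                total += targets[i]
                seen += 1
        prior = total / seen if seen else 0.0
        for i in range(fold, n, n_folds):  # held-out rows for this fold
            c = categories[i]
            denom = counts[c] + smoothing
            # Shrink the category mean toward the fold prior; unseen categories get the prior.
            encoded[i] = (sums[c] + smoothing * prior) / denom if denom > 0 else prior
    return encoded


def weight_of_evidence(categories, targets, eps=0.5):
    """WoE(c) = ln(share of all events in c / share of all non-events in c).

    eps is an assumed additive smoothing term that avoids log(0).
    """
    events, non_events = defaultdict(float), defaultdict(float)
    for c, y in zip(categories, targets):
        if y == 1:
            events[c] += 1
        else:
            non_events[c] += 1
    levels = set(categories)
    total_events = sum(events.values()) + eps * len(levels)
    total_non = sum(non_events.values()) + eps * len(levels)
    return {
        c: math.log(((events[c] + eps) / total_events) / ((non_events[c] + eps) / total_non))
        for c in levels
    }


def lag_features(series, lags=(1, 7)):
    """Lag k for row t is series[t - k]; rows without enough history get None
    rather than borrowing a value from the future (causality constraint)."""
    return {
        f"lag_{k}": [series[t - k] if t - k >= 0 else None for t in range(len(series))]
        for k in lags
    }


def rolling_mean_past(series, window):
    """Historical aggregate: mean of the previous `window` values, excluding row t."""
    out = []
    for t in range(len(series)):
        past = series[max(0, t - window):t]
        out.append(sum(past) / len(past) if past else None)
    return out
```

For example, `rolling_mean_past([1, 2, 3, 4], 2)` gives `[None, 1.0, 1.5, 2.5]`: each value uses only earlier observations, which is what keeps time-series features from leaking the target at scoring time.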
Secret Sauce: 2) Grandmaster Pipeline Tuning + Validation
Example: Driverless AI on the BNP Paribas dataset, 3-GPU workstation
• 19,000 features tested
• 1,000 models trained
• Evolutionary strategies
• Reliable generalization estimates (overfitting avoidance)
• Massively parallel processing (multi-CPU, multi-GPU)
• 1 final optimal scoring pipeline
DOI: 10.1126/science.aaa9375
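The combination of evolutionary search and honest validation can be sketched with a textbook (mu + lambda) evolution strategy whose fitness is a K-fold cross-validation error. This is a generic technique chosen for illustration, not Driverless AI's actual search: the names `evolve` and `kfold_mse_ridge`, the 1-D ridge model, the synthetic data and all default parameters are assumptions.

```python
import random


def evolve(fitness, init, sigma=0.5, mu=5, lam=20, generations=50, decay=0.95, seed=0):
    """Minimal (mu + lambda) evolution strategy that MINIMIZES `fitness`.

    Parents and offspring compete together (elitism), so the best candidate
    found so far is never lost; the Gaussian mutation step shrinks each generation.
    """
    rng = random.Random(seed)
    parents = [list(init)]
    for _ in range(generations):
        offspring = [
            [x + rng.gauss(0.0, sigma) for x in rng.choice(parents)]
            for _ in range(lam)
        ]
        parents = sorted(parents + offspring, key=fitness)[:mu]
        sigma *= decay
    return parents[0], fitness(parents[0])


def kfold_mse_ridge(xs, ys, log10_alpha, k=5):
    """Mean K-fold validation MSE of 1-D ridge regression y ~ w * x (no intercept).

    Each validation fold is scored with a weight fitted on the other folds only,
    which is what makes the estimate a generalization estimate.
    """
    alpha = 10.0 ** log10_alpha
    n = len(xs)
    errors = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        valid = [i for i in range(n) if i % k == fold]
        w = sum(xs[i] * ys[i] for i in train) / (sum(xs[i] ** 2 for i in train) + alpha)
        errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in valid) / len(valid))
    return sum(errors) / k


# Hypothetical synthetic data: y = 3x + Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [3.0 * x + data_rng.gauss(0.0, 0.5) for x in xs]

# Search over log10(alpha), starting from a deliberately heavy regularization.
best, best_loss = evolve(lambda p: kfold_mse_ridge(xs, ys, p[0]), init=[2.0])
print(f"log10(alpha) = {best[0]:.2f}, CV MSE = {best_loss:.4f}")
```

Because the parent pool is elitist, the returned cross-validated loss can never be worse than that of the starting point; the same loop works unchanged for any vector of numeric hyperparameters.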
Statistical Learning vs Deep Learning - We Do Both!
• Statistical learning (GLM/CART/RF/GBM/XGBoost, K-Means/PCA/SVD): typically better for structured data (CSV, SQL, transactional). Reference: https://web.stanford.edu/~hastie/Papers/ESLII.pdf
• Deep learning (TensorFlow): typically better for unstructured data (images, video, audio, text). Reference: http://www.deeplearningbook.org
Accuracy
• Automatic feature engineering to increase accuracy: AlphaGo for AI
• Automatic Kaggle Grandmaster recipes in a box for solving a wide variety of use cases
• Automatic machine learning to find and tune the right ensemble of models
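One published way to "find the right ensemble" is greedy forward ensemble selection with replacement (Caruana et al., 2004): repeatedly add whichever model most lowers the validation error of the averaged prediction. The sketch below illustrates that technique only; it is not a description of how Driverless AI builds its ensembles, and the function name and the MSE criterion are assumptions.

```python
def greedy_ensemble(predictions, y_valid, rounds=10):
    """Greedy forward ensemble selection with replacement (illustrative).

    predictions: dict of model name -> list of validation predictions.
    Returns (weights per model, validation MSE of the final ensemble).
    A model may be picked several times, which acts as an integer weight.
    """
    names = list(predictions)
    n = len(y_valid)

    def mse(pred):
        return sum((p - y) ** 2 for p, y in zip(pred, y_valid)) / n

    chosen, current = [], [0.0] * n
    for _ in range(rounds):
        best_name, best_pred, best_err = None, None, float("inf")
        for name in names:
            # Running average of the chosen models plus this candidate.
            cand = [(c * len(chosen) + q) / (len(chosen) + 1)
                    for c, q in zip(current, predictions[name])]
            err = mse(cand)
            if err < best_err:
                best_name, best_pred, best_err = name, cand, err
        chosen.append(best_name)
        current = best_pred
    weights = {name: chosen.count(name) / len(chosen) for name in names}
    return weights, mse(current)
```

Two models that err in opposite directions, e.g. one always predicting 1 and one always predicting 0 for a target of 0.5, end up with equal weight, because their average is more accurate than either alone.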
Interpretability
• Interpretability for debugging, not just for regulators
• Get reason codes and model interpretability in plain English
• K-LIME, LOCO, partial dependence and more
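The idea behind LOCO (leave one covariate out) and per-row reason codes can be sketched as "how much does the prediction move when one input is neutralized?". This is a simplified variant chosen for brevity: it replaces a feature with its column mean instead of refitting the model without it, and the names `loco_importance` and `reason_codes` are hypothetical, not Driverless AI's MLI API.

```python
def loco_importance(predict, rows, feature_names):
    """Simplified LOCO-style global importance (assumption: no refitting).

    For each feature, replace it with its column mean in every row and report
    the mean absolute change in the model's prediction, largest first.
    """
    n = len(rows)
    base = [predict(r) for r in rows]
    scores = {}
    for f in feature_names:
        mean_f = sum(r[f] for r in rows) / n
        changed = [predict({**r, f: mean_f}) for r in rows]
        scores[f] = sum(abs(a - b) for a, b in zip(base, changed)) / n
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


def reason_codes(predict, row, means, top=3):
    """Per-row reason codes: the prediction change from resetting each feature
    to its mean, phrased as plain-English strings, largest effect first."""
    base = predict(row)
    deltas = {f: base - predict({**row, f: m}) for f, m in means.items()}
    ranked = sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]
    return [f"{f} {'raised' if d > 0 else 'lowered'} the score by {abs(d):.2f}"
            for f, d in ranked]
```

With a toy model `predict = lambda r: 2 * r["x"]`, only `x` receives non-zero importance, and a row with above-average `x` gets the reason code `"x raised the score by ..."`.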
Deployment: Auto-Generated Pipelines
Driverless AI = AI to do AI
Binary Classification
Live Demo
https://www.kaggle.com/c/bnp-paribas-cardif-claims-management
Driverless AI Experiment: Live Demo
Deployment: Scoring Pipeline Example
• Requires a valid license
• Pipelines generated from the Driverless AI experiment
• New data (raw features only, no target)
• Fast, practical scoring speed in milliseconds (including all feature engineering and scoring steps)
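Conceptually, a standalone scoring pipeline replays the fitted feature engineering on raw inputs and then applies the model, so callers send only raw columns. The sketch below shows that shape with a hand-written logistic model and a per-row timing helper. Everything in it is hypothetical: the frozen artifacts (`TARGET_ENCODING`, `PRIOR`, `WEIGHTS`, `BIAS`) are made-up illustrative numbers, and this is not the format or API of the pipelines Driverless AI exports.

```python
import math
import time

# Hypothetical "frozen" artifacts standing in for what an experiment would export.
TARGET_ENCODING = {"A": 0.72, "B": 0.31}  # category -> encoded value (illustrative)
PRIOR = 0.5                               # fallback for unseen categories
WEIGHTS = {"cat_te": 1.8, "log_amount": 0.4}
BIAS = -1.2


def transform(raw):
    """Feature engineering replayed at scoring time: raw columns in, features out."""
    return {
        "cat_te": TARGET_ENCODING.get(raw["category"], PRIOR),
        "log_amount": math.log1p(raw["amount"]),
    }


def score(raw):
    """Full pipeline on one raw row (no target): features, then logistic model."""
    feats = transform(raw)
    z = BIAS + sum(WEIGHTS[k] * v for k, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def time_per_row_ms(rows, repeats=1000):
    """Average wall-clock milliseconds per scored row, including feature engineering."""
    start = time.perf_counter()
    for _ in range(repeats):
        for r in rows:
            score(r)
    return (time.perf_counter() - start) * 1000.0 / (repeats * len(rows))


print(score({"category": "A", "amount": 120.0}))
print(time_per_row_ms([{"category": "B", "amount": 35.0}]))
```

Keeping the transforms and the model in one callable is what lets scoring latency be measured end to end, as on the slide, rather than for the model step alone.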
Python API: Running Driverless AI with a Script
Driverless AI on IBM Power
docs.h2o.ai
Driverless AI Delivers “Expert Data Scientist in a Box”
• Created and supported by world-renowned AI experts
• Empowers companies to accomplish AI and ML with a single platform
• Performs the function of an expert data scientist and adds more power to both novice and expert teams
• Highlights insights and provides interpretability with easy-to-understand results and visualizations
21-day free trial for Driverless AI
Our Flagship Community Event: H2O AI World is finally coming to London!
29th & 30th Oct, London
+ More real-world use cases
+ All H2O Kaggle Grandmasters
+ Hands-on training
• More info, code, and slides: bit.ly/h2o_meetups
• Contact: [email protected] | @matlabulous | github.com/woobe
Thanks!