H2O World - Python Pipelines - Spencer Aiello
TRANSCRIPT
![Page 1: H2O World - Python Pipelines - Spencer Aiello](https://reader036.vdocuments.net/reader036/viewer/2022081517/587155771a28ab8e5b8b50d1/html5/thumbnails/1.jpg)
To Production and Beyond
Spencer Aiello
The Problem
• Goal:
  o Move from prototype to production
• Roadblock:
  o Prototyping environment cages your:
    • Feature preprocessing
    • Models
    • Ideas
The Problem
• Even if your code is beautiful:
The Problem
• You cannot drag-n-drop into a new environment.
• Translation may be difficult; humans make mistakes.
A Solution
H2O gives you wings:
• Export Preprocessing
• Export Models
H2OAssembly
o Build Rich Feature Preprocessing Assembly Lines
  • Clean, reduce, and expand datasets by composing any of the 100s of primitives available in H2O
  • Build hygienic processing assembly lines that can be applied to new batches of data
  • Export your feature preprocessing steps as a plain old Java object and apply them to streaming tuples
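The assembly-line idea above can be sketched in plain Python. This is a conceptual illustration only: the `Assembly` class, the step helpers (`select`, `strip_percent`, `divide`), and the sample rows are all hypothetical and are not the H2OAssembly API. It shows named steps composed once and then applied to any new batch of rows.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Step = Tuple[str, Callable[[Row], Row]]


class Assembly:
    """Hypothetical stand-in for an assembly line; not the H2OAssembly API."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps

    def apply_row(self, row: Row) -> Row:
        out = dict(row)  # copy so the caller's row is left untouched
        for _name, fn in self.steps:
            out = fn(out)
        return out

    def apply_batch(self, rows: Iterable[Row]) -> List[Row]:
        # The same composed steps run on every batch, old or new.
        return [self.apply_row(r) for r in rows]


def select(cols: List[str]) -> Callable[[Row], Row]:
    """Reduce: keep only the named columns."""
    return lambda row: {c: row[c] for c in cols}


def strip_percent(col: str) -> Callable[[Row], Row]:
    """Clean: turn a string such as '10.5%' into the float 10.5."""
    def step(row: Row) -> Row:
        new = dict(row)
        new[col] = float(str(row[col]).rstrip("%"))
        return new
    return step


def divide(col: str, divisor: float) -> Callable[[Row], Row]:
    """Rescale a numeric column by a constant divisor."""
    def step(row: Row) -> Row:
        new = dict(row)
        new[col] = row[col] / divisor
        return new
    return step


# Hypothetical column names and values, for illustration only.
assembly = Assembly([
    ("select", select(["int_rate", "loan_amnt"])),
    ("int_rate_to_float", strip_percent("int_rate")),
    ("loan_amnt_thousands", divide("loan_amnt", 1000)),
])

batch = [
    {"id": 1, "int_rate": "10.5%", "loan_amnt": 5000},
    {"id": 2, "int_rate": "7.25%", "loan_amnt": 12000},
]
result = assembly.apply_batch(batch)
```

Because each step is a plain function of a single row, the same list of steps can in principle be applied one record at a time, which is the property the export-to-streaming bullet relies on.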
H2OAssembly
Python
Java
Live Demo
• Lending Club Data: Predict Interest Rate
  o Four-part dataset of loan data
  o 500K rows, 52 columns
  o Preprocess 5 columns within a 16-step assembly
  o Build a simple GBM to predict interest rate
  o Export everything into a Storm topology
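The final demo step moves preprocessing and model into a stream processor, where records arrive one tuple at a time. A rough sketch of that per-tuple flow follows, in plain Python rather than Storm or Java. Everything here is an assumption made for illustration: `preprocess`, `predict`, and `score_stream` are hypothetical names, the `term` field and its values are made up, and the scoring rule is a placeholder rather than a trained GBM.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]
Features = Dict[str, float]


def preprocess(record: Record) -> Features:
    """Hypothetical exported preprocessing: parse '36 months' into 36.0."""
    months = float(str(record["term"]).split()[0])
    return {"term_months": months}


def predict(features: Features) -> float:
    """Placeholder scoring rule standing in for an exported model (not a GBM)."""
    return 10.0 + 0.5 * (features["term_months"] / 12.0)


def score_stream(
    records: Iterable[Record],
    prep: Callable[[Record], Features],
    model: Callable[[Features], float],
) -> Iterator[float]:
    """Score records lazily, one at a time, as a stream stage would."""
    for record in records:
        yield model(prep(record))


# Made-up incoming tuples, for illustration only.
incoming = [{"term": "36 months"}, {"term": "60 months"}]
scored = list(score_stream(incoming, preprocess, predict))
```

Passing the preprocessing and the model in as arguments keeps the streaming stage independent of how either was built, which is the point of exporting both from the prototyping environment.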
Live Demo
Storm Topology