H2O World - Munging, Modeling, and Pipelines Using Python - Hank Roark
![Page 1: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/1.jpg)
MUNGING, MODELING, AND PIPELINES USING PYTHON
Hank Roark
![Page 2: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/2.jpg)
COMMUNITY FEEDBACK
Pythonic Interface to H2O, R interface parity
Rapid learning and iteration
Leverage existing knowledge and skills
Interface cleanly with PyData ecosystem
More Environments, esp. PySpark
Python Pipelines to Production
![Page 3: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/3.jpg)
EXAMPLE FROM THE IOT
Domain: Prognostics and Health Management
Machine: Turbofan Jet Engines
Data Set: A. Saxena and K. Goebel (2008). "Turbofan Engine Degradation Simulation Data Set", NASA Ames Prognostics Data Repository
Predict Remaining Useful Life from Partial Life Runs
Six operating modes, two failure modes, manufacturing variability
Training: 249 jet engines run to failure
Test: 248 jet engines
![Page 4: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/4.jpg)
WHY THIS EXAMPLE?
GETTING READY FOR BRONTOBYTES
![Page 5: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/5.jpg)
LOADING DATA
![Page 6: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/6.jpg)
SUMMARY STATISTICS
![Page 7: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/7.jpg)
FEATURE ENGINEERING
Calculate Total Cycles For Each Unit
![Page 8: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/8.jpg)
FEATURE ENGINEERING
Append To Original Frame
![Page 9: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/9.jpg)
FEATURE ENGINEERING
Create New Feature of Cycles Remaining
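The three feature-engineering steps above (total cycles per unit, append to the original frame, cycles remaining) can be sketched in plain Python. This is a minimal sketch of the logic only, not the H2O calls shown on the slides; the column names `unit` and `cycle` and the toy rows are assumptions made for illustration.

```python
from collections import defaultdict

# Toy rows standing in for the engine data; column names are assumptions.
rows = [
    {"unit": 1, "cycle": 1}, {"unit": 1, "cycle": 2}, {"unit": 1, "cycle": 3},
    {"unit": 2, "cycle": 1}, {"unit": 2, "cycle": 2},
]

def add_cycles_remaining(rows):
    """Return new rows with total_cycles and cycles_remaining added."""
    # Step 1: total cycles for each unit (group by unit, take the max cycle).
    total_cycles = defaultdict(int)
    for row in rows:
        total_cycles[row["unit"]] = max(total_cycles[row["unit"]], row["cycle"])
    # Steps 2 and 3: append the total to every row, then subtract the
    # current cycle to get the cycles remaining until failure.
    enriched = []
    for row in rows:
        total = total_cycles[row["unit"]]
        enriched.append({**row, "total_cycles": total,
                         "cycles_remaining": total - row["cycle"]})
    return enriched

enriched = add_cycles_remaining(rows)
```

Because the training engines are run to failure, the last observed cycle of each unit has zero cycles remaining, which is what makes this column usable as a regression target.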
![Page 10: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/10.jpg)
EXPLORATORY DATA ANALYSIS
Boolean Indexing
![Page 11: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/11.jpg)
EXPLORATORY DATA ANALYSIS
Sample the data to local memory
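Boolean indexing and sampling can be illustrated without a cluster. The sketch below is plain Python rather than the H2O frame operations on the slides; the row layout, the `sample_rows` helper, and the seed are hypothetical choices for illustration.

```python
import random

# Toy rows: two units observed for five cycles each (assumed layout).
rows = [{"unit": u, "cycle": c, "cycles_remaining": 5 - c}
        for u in (1, 2) for c in range(1, 6)]

# Boolean indexing: build a True/False mask, then keep the rows where it is True.
mask = [row["cycles_remaining"] == 0 for row in rows]
end_of_life = [row for row, keep in zip(rows, mask) if keep]

def sample_rows(rows, n, seed=0):
    """Return a reproducible random subset small enough to plot locally."""
    rng = random.Random(seed)  # seeded so repeated calls give the same sample
    return rng.sample(rows, min(n, len(rows)))

local_sample = sample_rows(rows, n=4)
```

Sampling before plotting keeps the full data set where it lives and moves only a small subset into the local Python session.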
![Page 12: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/12.jpg)
EXPLORATORY DATA ANALYSIS
Use your favorite visualization tools (Seaborn!)
Ugh, where are trends over time
Plot annotations: Time; Zero Remaining Useful Life
![Page 13: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/13.jpg)
MODEL BASED DATA ENRICHMENT
Sensor measurements appear in clusters
Corresponding to operating mode!
![Page 14: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/14.jpg)
MODEL BASED DATA ENRICHMENT
Use H2O k-means to find cluster centers
![Page 15: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/15.jpg)
MODEL BASED DATA ENRICHMENT
Enrich existing data with operating mode membership
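The idea of finding cluster centers and then tagging each row with its cluster can be sketched with a small k-means in plain Python. This is not H2O's k-means implementation; the farthest-point initialization is a choice made here for reproducibility, and the two-dimensional toy points are assumptions standing in for the real measurements.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def init_centers(points, k):
    """Deterministic start: repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center update until stable."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[i]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]

# Toy measurements forming three well-separated groups (assumed data).
points = [(0.0, 0.1), (0.1, 0.0), (10.0, 10.1), (10.1, 10.0), (20.0, 0.1), (20.1, 0.0)]
centers, labels = kmeans(points, k=3)

# Enrichment: attach the cluster index to each row as its operating mode.
enriched = [{"measurement": p, "operating_mode": lab} for p, lab in zip(points, labels)]
```

The cluster index is only an arbitrary label; what matters downstream is that rows from the same operating mode share the same value.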
![Page 16: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/16.jpg)
MORE FEATURE ENGINEERING
For non-constant sensor measurements within an operating mode,
Standardize each sensor measurement by operating mode
Based on the training data
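Standardizing by operating mode with training statistics can be sketched as follows. This is a minimal plain-Python illustration, not the H2O code from the slide; the column names `mode` and `s1` and the helper names are hypothetical, and using the population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

def fit_mode_stats(train_rows, sensor):
    """Per-operating-mode (mean, standard deviation) computed from training rows only."""
    by_mode = defaultdict(list)
    for row in train_rows:
        by_mode[row["mode"]].append(row[sensor])
    return {mode: (mean(vals), pstdev(vals)) for mode, vals in by_mode.items()}

def standardize(rows, sensor, stats):
    """Return new rows with the sensor replaced by its z-score within its mode."""
    out = []
    for row in rows:
        mu, sigma = stats[row["mode"]]
        # The slide restricts this step to non-constant sensors; as a guard,
        # a sensor that is constant within a mode (sigma == 0) is mapped to 0.0.
        z = (row[sensor] - mu) / sigma if sigma > 0 else 0.0
        out.append({**row, sensor: z})
    return out

train = [{"mode": 0, "s1": 10.0}, {"mode": 0, "s1": 12.0},
         {"mode": 1, "s1": 100.0}, {"mode": 1, "s1": 104.0}]
test = [{"mode": 1, "s1": 106.0}]

stats = fit_mode_stats(train, "s1")           # learned from training data only
train_std = standardize(train, "s1", stats)
test_std = standardize(test, "s1", stats)     # test rows reuse the training statistics
```

Reusing the training statistics for the test rows avoids leaking information from the test set into the features.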
![Page 17: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/17.jpg)
TRENDS OVER TIME!
Before H2O Munging
Ready for H2O Learning
Time (x-axis of both plots)
![Page 18: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/18.jpg)
MODELING
Configure an Estimator
![Page 19: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/19.jpg)
MODELING
Train an Estimator
![Page 20: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/20.jpg)
MODEL EVALUATION
Evaluate Performance at a glance in Python
![Page 21: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/21.jpg)
MODEL EVALUATION
Evaluate Performance at a glance in H2O Flow
![Page 22: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/22.jpg)
MODEL EVALUATION
Evaluate Performance at a glance graphically in Python
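For a regression target such as remaining useful life, an at-a-glance summary typically reports a few standard error metrics. The sketch below computes them in plain Python; it is not the output of H2O's model performance report, and the `regression_metrics` helper and the toy values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Mean squared error, its root, mean absolute error, and R^2."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)                      # sum of squared errors
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)     # total sum of squares
    return {
        "mse": sse / n,
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - sse / sst,
    }

# Toy remaining-useful-life values and predictions (assumed data).
metrics = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```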
![Page 23: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/23.jpg)
CROSS VALIDATION
Set Up Hyperparameter Search Options
![Page 24: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/24.jpg)
CROSS VALIDATION
Configure full grid search
![Page 25: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/25.jpg)
CROSS VALIDATION
Execute grid search
![Page 26: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/26.jpg)
CROSS VALIDATION
Evaluate results & model selection
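The four steps above (search options, full grid, execution, selection) can be sketched in plain Python. This is not H2O's grid search API; the one-feature ridge model, the `degree` and `lam` hyperparameters, and the toy data are all assumptions chosen to keep the example small.

```python
import itertools

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train_idx, test_idx)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def fit_slope(xs, ys, degree, lam):
    """Ridge fit of y ~ slope * x**degree with penalty lam (no intercept)."""
    feats = [x ** degree for x in xs]
    return sum(f * y for f, y in zip(feats, ys)) / (sum(f * f for f in feats) + lam)

def cv_mse(xs, ys, params, n_folds=3):
    """Cross-validated mean squared error for one hyperparameter combination."""
    total = 0.0
    for train, test in kfold_indices(len(xs), n_folds):
        slope = fit_slope([xs[i] for i in train], [ys[i] for i in train], **params)
        total += sum((ys[i] - slope * xs[i] ** params["degree"]) ** 2 for i in test)
    return total / len(xs)

def grid_search(xs, ys, options):
    """Evaluate every combination in the full Cartesian grid, best first."""
    names = list(options)
    results = []
    for values in itertools.product(*(options[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_mse(xs, ys, params), params))
    return sorted(results, key=lambda result: result[0])

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]                                # toy target: y = 2x
options = {"degree": [1, 2], "lam": [0.0, 1.0, 10.0]}     # search options
results = grid_search(xs, ys, options)                    # execute the full grid
best_mse, best_params = results[0]                        # model selection
```

A full grid evaluates every combination, so its cost grows with the product of the option counts (here 2 x 3 = 6 models, each fitted once per fold).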
![Page 27: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/27.jpg)
MORE CONTROL – SCIKIT PIPELINES
Create Pipelines
Hyperparameter Options
Cross validation strategy
Hyperparameter Search Strategy
Fit
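The five steps listed above map directly onto scikit-learn's own classes. The sketch below uses only scikit-learn components on synthetic data; the `Ridge` model stands in for whatever estimator the slide used, and the step names, grid values, and the choice of an exhaustive `GridSearchCV` as the search strategy are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the engine features (assumption).
X, y = make_regression(n_samples=120, n_features=5, noise=5.0, random_state=0)

# 1. Create the pipeline: preprocessing followed by a model.
pipeline = Pipeline([("standardize", StandardScaler()), ("model", Ridge())])

# 2. Hyperparameter options, addressed as <step name>__<parameter>.
param_grid = {"model__alpha": [0.01, 1.0, 100.0]}

# 3. Cross validation strategy.
cv = KFold(n_splits=3, shuffle=True, random_state=0)

# 4. Hyperparameter search strategy: exhaustive grid search here.
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="neg_mean_squared_error")

# 5. Fit: every grid point is trained and scored on every fold.
search.fit(X, y)
best_alpha = search.best_params_["model__alpha"]
```

Because the scaler sits inside the pipeline, it is refitted on each training fold, so the held-out fold never influences the preprocessing.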
![Page 28: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/28.jpg)
DATA PIPELINES USING H2OASSEMBLY
Typical Data Preparation
Add some structure
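The idea of giving data preparation some structure, as a named sequence of steps applied in order, can be sketched in plain Python. This is a hypothetical stand-in and not the H2OAssembly API; `select_columns`, `add_column`, `run_steps`, and the column names are all invented for illustration.

```python
import math

def select_columns(names):
    """Step factory: keep only the named columns."""
    return lambda rows: [{name: row[name] for name in names} for row in rows]

def add_column(name, fn):
    """Step factory: add a column computed from each row."""
    return lambda rows: [{**row, name: fn(row)} for row in rows]

# The pipeline is data: an ordered list of (step name, transformation) pairs.
steps = [
    ("select", select_columns(["unit", "sensor"])),
    ("log_sensor", add_column("log_sensor", lambda row: math.log(row["sensor"]))),
]

def run_steps(steps, rows):
    """Apply each step in order, feeding each step's output into the next."""
    for _name, step in steps:
        rows = step(rows)
    return rows

raw = [{"unit": 1, "sensor": 1.0, "unused": "x"},
       {"unit": 2, "sensor": math.e, "unused": "y"}]
prepared = run_steps(steps, raw)
```

Describing preparation as a list of steps, rather than as ad hoc statements, is what lets the same sequence be replayed on new data.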
![Page 29: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/29.jpg)
H2OASSEMBLY TO PRODUCTION
Java for Production Scoring
Python
![Page 30: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/30.jpg)
MORE ENVIRONMENTS
PySparkling Water = Python + Spark + H2O
Python + Sparkling Water
![Page 31: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/31.jpg)
COMMUNITY FEEDBACK
Pythonic Interface to H2O, R interface parity
Rapid learning and iteration
Leverage existing knowledge and skills
Interface cleanly with PyData ecosystem
More Environments, esp. PySpark
Python Pipelines to Production
![Page 32: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/32.jpg)
RESULTS
H2O Python Framework:
H2OFrame & H2OEstimators
H2OAssembly for Data Prep Pipelines
Python, Jupyter Notebooks, Pandas, Scikit-Learn Integration
PySparkling Water
![Page 33: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/33.jpg)
RESOURCES
• Python booklet
• Tibshirani release
• Python documentation
• Github examples
• Jupyter Notebook of Example
![Page 34: H2O World - Munging, modeling, and pipelines using Python - Hank Roark](https://reader035.vdocuments.net/reader035/viewer/2022062503/586f79421a28ab10258b6f0b/html5/thumbnails/34.jpg)
THANK YOU