DataFrame Validation in Python · 2018-06-08 · TDDA Applying Test Driven Development (TDD)...
DataFrame Validation In Python
Sounds Familiar? Credit: Anaconda, Inc.
AnacondaCON 2018
https://www.youtube.com/watch?v=UXd0EDy7aTY
About Me
Motivation
Why do we need data validation?
Data Quality Dimensions
Valid Accurate Complete
Consistent Uniform Unique
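A few of these dimensions can be checked directly in pandas. The sketch below is only an illustration: the column names, the 0 to 120 age range, the upper-case country convention, and the mapping from each dimension to a single check are all assumptions, not something taken from the slides.

```python
import pandas as pd


def quality_report(df):
    """Return one boolean per data quality dimension (hypothetical checks)."""
    return {
        # Complete: the required column has no missing values.
        "complete": bool(df["age"].notna().all()),
        # Unique: the identifier column has no repeated values.
        "unique": bool(df["id"].is_unique),
        # Valid: non-missing ages fall in an assumed plausible range.
        "valid": bool(df["age"].dropna().between(0, 120).all()),
        # Uniform: country codes share one representation (assumed upper-case).
        "uniform": bool(df["country"].str.isupper().all()),
    }


# Hypothetical sample with one deliberate problem per dimension.
dirty = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "age": [34, None, 29, 150],
    "country": ["US", "us", "DE", "FR"],
})
dirty_report = quality_report(dirty)

clean = pd.DataFrame({
    "id": [1, 2, 3],
    "age": [34, 41, 29],
    "country": ["US", "DE", "FR"],
})
clean_report = quality_report(clean)
```

Accuracy and consistency usually need an external source of truth or cross-table rules, so they are left out of this sketch.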
It can happen to all of us
1 Perfect World
2 Model Deterioration
3 Accidental Discovery
4 Ignorance Is Bliss
Let’s See Some Tools?
Voluptuous
Voluptuous is a Python data validation library.
• Simplicity.
• Support for complex data structures.
• Useful error messages.
Engarde
• Great for flat files like CSV
• As decorators, which are most useful in .py scripts
• Interactively at the interpreter
https://github.com/TomAugspurger/engarde
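The decorator style can be illustrated with plain pandas. The sketch below does not use Engarde's own API: `require_no_missing` and `check_no_missing` are hypothetical helpers written for this example, and the column names are assumptions.

```python
import functools

import pandas as pd


def check_no_missing(df, columns):
    """Hypothetical check: raise if any of `columns` contains missing values."""
    missing = df[columns].isna().sum()
    bad = missing[missing > 0]
    assert bad.empty, f"missing values in: {list(bad.index)}"
    return df  # returning the frame lets the check sit inside a .pipe() chain


def require_no_missing(columns):
    """Hypothetical decorator: validate the DataFrame a function returns."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return check_no_missing(func(*args, **kwargs), columns)
        return wrapper
    return decorator


# Script style: the check runs every time the function is called.
@require_no_missing(["price"])
def add_total(df):
    return df.assign(total=df["price"] * df["qty"])


clean = pd.DataFrame({"price": [2.0, 3.0], "qty": [1, 2]})
totals = add_total(clean)

# Interactive style: apply the same check to an existing frame with .pipe().
checked = clean.pipe(check_no_missing, ["price", "qty"])
```

Calling `add_total` on a frame with a missing `price` raises `AssertionError`, so a bad input stops the script instead of flowing silently into later steps.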
TDDA
Applying Test Driven Development (TDD) principles to data analysis.
• Correctness
• Regression detection
• Specification, Design and Documentation
• Refactoring
• Portability
Test Driven Data Analysis
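The regression detection idea can be sketched with pandas alone: run the analysis on known input and compare the result with a stored reference output. This is not the `tdda` library's API. The `summarise` function, the column names, and the inline reference CSV are hypothetical stand-ins for a real analysis step and a checked-in reference file.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal


def summarise(df):
    """Analysis step under test (hypothetical): total sales per region."""
    return df.groupby("region", as_index=False)["sales"].sum()


# Reference output, stored inline here; in practice it would be a file
# produced by a run whose results were checked by hand.
REFERENCE_CSV = "region,sales\nnorth,30\nsouth,5\n"

raw = pd.DataFrame({
    "region": ["north", "south", "north"],
    "sales": [10, 5, 20],
})
actual = summarise(raw)
expected = pd.read_csv(io.StringIO(REFERENCE_CSV))

# Raises AssertionError if the analysis output drifts from the reference.
# check_dtype=False because a CSV round trip does not preserve every dtype.
assert_frame_equal(actual, expected, check_dtype=False)
```

If a later refactoring changes the totals, the comparison fails and the regression is caught before the results are used.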
Credit - Practical Data Cleaning with Python
http://kjamistan.com/
“Quality is never an accident; it is always the result of intelligent effort.”
─ John Ruskin
https://github.com/pyotam/Dataframe-Validation