Clean code in Jupyter notebooks
TRANSCRIPT
![Page 1: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/1.jpg)
@KNerush @Volodymyrk
Clean Code in Jupyter notebooks, using Python
5th of July, 2016
![Page 2: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/2.jpg)
Volodymyr (Vlad) Kazantsev
Head of Data @ Product Madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree
![Page 3: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/3.jpg)
Why do we end up with messy IPython notebooks?
Coding
Stats
Business
![Page 4: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/4.jpg)
Who are Data Scientists, really?
Coding
Stats
Business
“In a nutshell, coding is telling a computer to do something using a language it understands.”
Data Science with Python
![Page 5: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/5.jpg)
It is not going to production anyway!
![Page 6: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/6.jpg)
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand” - Kent Beck, 1999
WTF! How am I supposed to validate this??
Sorry, but how can I calculate 7-day retention?
![Page 7: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/7.jpg)
From Prototype to ... The Data Science Spiral
Ideas & Questions
Data Analysis
Insights
Impact
![Page 8: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/8.jpg)
You do it for your own good..
Re-run all A/B test analyses for the last few months, by tomorrow
Ideas & Questions
Data Analysis
Insights
Impact
![Page 9: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/9.jpg)
Part 2: What can Data Scientists learn from Software Engineers?
![Page 10: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/10.jpg)
Robert C. Martin, a.k.a. “Uncle Bob”
https://cleancoders.com/
![Page 11: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/11.jpg)
“Clean Code”?
“Pleasingly graceful and stylish in appearance or manner” - Bjarne Stroustrup, inventor of C++
“Clean code reads like well written prose” - Grady Booch, creator of UML
“.. each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki and XP
![Page 12: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/12.jpg)
One does not simply start writing clean code..
“First make it work, then make it right, then make it fast and small” - Kent Beck, co-inventor of XP and TDD
“Leave the campground cleaner than you found it” - The Boy Scouts of America (applied to programming by Uncle Bob)
- Runs all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimizes classes and methods
Ron Jeffries, author of Extreme Programming Installed
![Page 13: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/13.jpg)
“I'm not a great programmer; I'm just a good programmer with great habits.” - Kent Beck
![Page 14: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/14.jpg)
“There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton
long_descriptive_names
Avoid: x, i, stuff, do_blah()
Pronounceable and Searchable
revenue_per_payer vs. arpdpu
Avoid encodings, abbreviations, prefixes, suffixes.. if possible
bonus_points_on_iphone vs. cns_crm_dip
Add meaningful context
daily_revenue_per_payer
Don’t be lazy. Spend time naming and renaming things.
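A minimal sketch of the naming advice, assuming a made-up list of `(player, revenue)` purchases. Both functions compute the same number; only the names differ. The data and the function bodies are illustrative, not taken from the slides.

```python
# Hypothetical purchase records for one day: (player_id, revenue_in_usd).
daily_purchases = [("anna", 4.99), ("ben", 0.0), ("carl", 9.99), ("anna", 1.99)]


def arpdpu(x):
    # Hard to pronounce, hard to search for, and x / s / p explain nothing.
    s = sum(i[1] for i in x)
    p = {i[0] for i in x if i[1] > 0}
    return s / len(p)


def daily_revenue_per_payer(purchases):
    """Total revenue divided by the number of distinct paying players."""
    total_revenue = sum(revenue for _, revenue in purchases)
    paying_players = {player for player, revenue in purchases if revenue > 0}
    return total_revenue / len(paying_players)


print(daily_revenue_per_payer(daily_purchases))
```

The second version can be read aloud and found with a text search, which is the point of "Pronounceable and Searchable".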
![Page 15: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/15.jpg)
“each routine turns out to be pretty much what you expected” - Ward Cunningham
Small
Do one thing
One Level of Abstraction
Have only a few arguments (one is best)
Less important in Python, with named arguments.
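One possible reading of these points, sketched for the 7-day retention question from earlier in the talk. The session format `(player, day_number)` and both function names are assumptions made for the example.

```python
def players_active_on(sessions, day):
    """Return the set of players with at least one session on `day`."""
    return {player for player, session_day in sessions if session_day == day}


def retention_rate(sessions, *, install_day, return_day):
    """Share of players active on `install_day` who came back on `return_day`."""
    installed = players_active_on(sessions, install_day)
    if not installed:
        return 0.0
    returned = players_active_on(sessions, return_day)
    return len(installed & returned) / len(installed)


# Hypothetical sessions: (player, day number since launch).
sessions = [
    ("anna", 0), ("ben", 0), ("carl", 0), ("dora", 0),
    ("anna", 7), ("carl", 7), ("erin", 7),
]

# Keyword-only arguments keep the call readable even with several parameters.
seven_day_retention = retention_rate(sessions, install_day=0, return_day=7)
print(seven_day_retention)  # 0.5
```

Each function does one thing, and `retention_rate` stays at one level of abstraction by delegating the filtering to `players_active_on`.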
![Page 16: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/16.jpg)
Use good names
Avoid obvious comments.
Dead Commented-out Code
ToDo, licenses, history, markup for documentation and other nonsense
But there are exceptions..
“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” Kent Beck
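A small sketch of that refactoring, assuming a player is a plain dict with hypothetical keys `purchases` and `days_since_last_login`. The first version needs a comment; the second lets the names do the explaining.

```python
# Before: the comment has to explain what the expression means.
def is_eligible_v1(player):
    # player paid at least once and was active in the last 7 days
    return player["purchases"] > 0 and player["days_since_last_login"] <= 7


# After: the names carry the explanation, so the comment becomes superfluous.
def is_payer(player):
    return player["purchases"] > 0


def is_recently_active(player, max_idle_days=7):
    return player["days_since_last_login"] <= max_idle_days


def is_eligible(player):
    return is_payer(player) and is_recently_active(player)


loyal_payer = {"purchases": 3, "days_since_last_login": 2}
lapsed_payer = {"purchases": 1, "days_since_last_login": 30}
print(is_eligible(loyal_payer), is_eligible(lapsed_payer))  # True False
```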
![Page 17: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/17.jpg)
// When I wrote this, only God and I understood what I was doing
// Now, God only knows
![Page 18: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/18.jpg)
// sometimes I believe compiler ignores all my comments
![Page 19: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/19.jpg)
/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}
![Page 20: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/20.jpg)
“Long functions is where classes are trying to hide” - Robert C. Martin
Small
Do one thing
SOLID, Design Patterns, etc.
![Page 21: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/21.jpg)
Code conventions
A team should produce code in one style, as if it were written by one person
Team conventions over language conventions, over personal ones
Automate style formatting
![Page 22: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/22.jpg)
Part 3: How to write Clean Code in Python?
(i.e. this is not Java)
![Page 23: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/23.jpg)
PEP 8 -- Style Guide for Python Code
https://www.python.org/dev/peps/pep-0008/
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Example:
Good:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Bad:
foo = long_function_name(var_one, var_two,
    var_three, var_four)
![Page 24: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/24.jpg)
Google Python Style Guide
https://google.github.io/styleguide/pyguide.html
![Page 25: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/25.jpg)
My favourite!
This is not Java or C++
Functions are first-class objects
Duck-typing as an interface
No setters/getters
Itertools, zip, enumerate
etc.
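A few of these idioms in one short sketch. The data, the `Player` class and the helper names are made up for illustration.

```python
import io
from itertools import accumulate

days = ["Mon", "Tue", "Wed"]
revenue = [120.0, 95.5, 130.25]

# zip pairs parallel sequences; enumerate supplies the counter,
# so there is no need for range(len(...)).
for day_number, (day, amount) in enumerate(zip(days, revenue), start=1):
    print(day_number, day, amount)

# itertools.accumulate yields running totals.
cumulative_revenue = list(accumulate(revenue))


# Functions are first-class objects: pass them around like any other value.
def apply_to_each(transform, values):
    return [transform(value) for value in values]


upper_days = apply_to_each(str.upper, days)


# Duck typing as an interface: anything with a .readline() method will do.
def first_line(source):
    return source.readline().strip()


# No setters/getters: plain attributes, plus @property for derived values.
class Player:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self):
        return f"{self.first_name} {self.last_name}"


print(cumulative_revenue, upper_days, Player("Ada", "Lovelace").full_name)
```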
![Page 26: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/26.jpg)
Part 4: How to write Clean Python Code in Jupyter Notebook?
![Page 27: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/27.jpg)
Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data
![Page 28: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/28.jpg)
How big should a notebook file be?
![Page 29: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/29.jpg)
How big should a notebook file be?
Hypothesis - Data - Interpretation
![Page 30: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/30.jpg)
Keep your notebooks small!
(4-10 cells each)
![Page 31: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/31.jpg)
Tip 1: break a fat notebook into many small ones
Example:
1_data_preparation.ipynb
    df.to_pickle('clean_data_1.pkl')
2_linear_model.py
    df = pd.read_pickle('clean_data_1.pkl')
3_ensamble.py
    df = pd.read_pickle('clean_data_1.pkl')
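The same hand-off, sketched with only the standard library so it runs anywhere. A plain list of dicts stands in for the DataFrame, and the temporary directory and sample rows are assumptions; with pandas, `df.to_pickle` and `pd.read_pickle` play the roles of `pickle.dump` and `pickle.load`.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical working directory shared by the small notebooks.
workdir = Path(tempfile.mkdtemp())
clean_data_path = workdir / "clean_data_1.pkl"

# --- 1_data_preparation: clean the raw data once and save the result ---
raw_rows = [
    {"player": "anna", "revenue": "4.99"},
    {"player": "ben", "revenue": None},  # incomplete row, dropped below
]
clean_rows = [
    {"player": row["player"], "revenue": float(row["revenue"])}
    for row in raw_rows
    if row["revenue"] is not None
]
with clean_data_path.open("wb") as handle:
    pickle.dump(clean_rows, handle)

# --- 2_linear_model / 3_ensamble: start from the saved file, not from raw data ---
with clean_data_path.open("rb") as handle:
    rows_for_modelling = pickle.load(handle)

print(rows_for_modelling)
```

Each downstream notebook then starts from the same cleaned file instead of repeating the preparation cells.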
![Page 32: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/32.jpg)
Tip 2: shared library
Data access
Common plotting functionality
Report generation
Misc. utils
acme_data_utils/
    Data_access.py
    plotting.py
    setup.py
    tests/
![Page 33: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/33.jpg)
Tip 3: Don’t just be pythonic. Be IPythonic
Don’t hide “secret sauce” inside imported module
BAD:
Good:
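A hypothetical illustration of the contrast, not the slide's own example. `load_daily_revenue` stands in for plumbing that could live in a shared library; `run_secret_analysis` stands in for an imported function that hides the actual analysis.

```python
# Stand-in for a shared-library helper (hypothetical): plumbing only.
def load_daily_revenue():
    return {"2016-07-01": 120.0, "2016-07-02": 95.5, "2016-07-03": 130.25}


# BAD: the whole analysis is a black box; the notebook shows only a function call.
def run_secret_analysis():
    revenue = load_daily_revenue()
    return max(revenue, key=revenue.get)


best_day_hidden = run_secret_analysis()

# Good: the plumbing is imported, but the analytical step stays visible in the cell.
daily_revenue = load_daily_revenue()
best_day = max(daily_revenue, key=daily_revenue.get)

print(best_day_hidden, best_day)
```

Both variants give the same answer; only the second lets a reader validate how it was obtained without leaving the notebook.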
![Page 34: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/34.jpg)
“Clean code reads like well written prose” - Grady Booch
![Page 35: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/35.jpg)
A good Jupyter notebook reads like well written prose
![Page 36: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/36.jpg)
How big should one Cell be?
![Page 37: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/37.jpg)
Tip 4: each cell should have one logical output
One “idea - execution - output” triplet per cell
Import Cell: expected output is no import errors
CMD+SHIFT+P
![Page 38: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/38.jpg)
Tip 5: write tests .. in jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
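A sketch of a test cell that needs nothing beyond the standard library. It uses `unittest` rather than pytest-ipynb, whose interface is not shown here; `retention_rate` is a hypothetical function under test.

```python
import unittest


def retention_rate(installed_players, returned_players):
    """Share of installed players who also appear among returned players."""
    installed = set(installed_players)
    if not installed:
        return 0.0
    return len(installed & set(returned_players)) / len(installed)


class RetentionRateTest(unittest.TestCase):
    def test_half_of_players_return(self):
        self.assertEqual(retention_rate({"anna", "ben"}, {"anna"}), 0.5)

    def test_no_installs_means_zero_retention(self):
        self.assertEqual(retention_rate(set(), {"anna"}), 0.0)


# Load the test case explicitly so the cell runs the same way inside a notebook,
# where unittest's default command-line discovery does not apply.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RetentionRateTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

The test report is the cell's one logical output, so a failing test is visible right next to the code it checks.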
![Page 39: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/39.jpg)
Tip 6: ..to the cloud
![Page 40: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/40.jpg)
Code Smells .. in ipynb
- Cells can’t be executed in order (with runAll and Restart&RunAll)
- Prototype (check ideas) code is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function
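The first smell can be partly spotted from the saved file. An .ipynb file is JSON, and each code cell stores an `execution_count`. The sketch below only checks that those counts increase from top to bottom; it is a rough heuristic under that assumption, not a substitute for Restart & Run All. The function names and the tiny notebooks are made up for the demo.

```python
import json


def executed_in_order(notebook_json):
    """Return True if the saved execution counts of code cells strictly increase."""
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    # Cells that were never run have no count; skip them.
    executed = [count for count in counts if count is not None]
    return all(earlier < later for earlier, later in zip(executed, executed[1:]))


def make_notebook(execution_counts):
    """Build a minimal, hypothetical notebook document for the demo."""
    cells = [
        {
            "cell_type": "code",
            "execution_count": count,
            "source": "pass",
            "outputs": [],
            "metadata": {},
        }
        for count in execution_counts
    ]
    return json.dumps({"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0})


tidy_notebook = make_notebook([1, 2, 3])
messy_notebook = make_notebook([5, 2, 9])

print(executed_in_order(tidy_notebook))   # True
print(executed_in_order(messy_notebook))  # False
```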
![Page 41: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/41.jpg)
Tip 7: Run notebook from another notebook!
analysis.ipynb
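Inside IPython, the `%run` magic accepts an .ipynb file name and executes it like an IPython script. The sketch below shows the underlying idea in plain Python, under the assumption that the notebook contains only ordinary Python code cells (no magics): read the notebook JSON and execute its code cells in a shared namespace. `run_notebook_code` and the in-memory helper notebook are hypothetical.

```python
import json


def run_notebook_code(notebook_json, namespace=None):
    """Execute the code cells of a notebook document and return the namespace."""
    namespace = {} if namespace is None else namespace
    notebook = json.loads(notebook_json)
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # The notebook format allows source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        exec(source, namespace)
    return namespace


# A tiny in-memory notebook standing in for a helper .ipynb file.
helper_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Helpers"], "metadata": {}},
        {
            "cell_type": "code",
            "source": ["def double(value):\n", "    return value * 2\n"],
            "metadata": {}, "outputs": [], "execution_count": None,
        },
        {
            "cell_type": "code",
            "source": "answer = double(21)",
            "metadata": {}, "outputs": [], "execution_count": None,
        },
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 0,
})

shared = run_notebook_code(helper_notebook)
print(shared["answer"])  # 42
```

Only run notebooks you trust this way, since `exec` runs whatever the cells contain.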
![Page 42: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/42.jpg)
Make Data Product from notebooks!
![Page 43: Clean code in Jupyter notebooks](https://reader037.vdocuments.net/reader037/viewer/2022102413/5884b3fe1a28ab76798b7535/html5/thumbnails/43.jpg)
Summary: How to organise a Jupyter project
1. Notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.