leveraging open technologies pragmatically within a traditionally closed ecosystem | anacondacon...
Leveraging open technologies pragmatically within a traditionally closed ecosystem
Dharhas Pothina US Army Engineer Research and Development Center
Some History
• 5 years university research
• 10 years state government
• 3 years federal government
Started out with mostly in-house codebases plus proprietary tools and some scripting for automation / data transformation
my workflow circa 2008
• bash
• perl
• awk/sed
• fortran
• c
artisanal data science workflows are fragile and ineffective
Image credit: Quilted Northern April Fools
Why Python?
Transitioning was easy
• I could understand the programs I read
• Had the scientific libraries I needed
• Could interoperate with everything in my processing pipeline
• Had powerful data structures and language features
• Great community support
I tried learning Java 3 times in my career. Python was nicer.
Python Scales
Easy things are easy
Complex things are sensible
Hard things are possible
Non Technical User/Analyst
Data Scientist/Engineer
Software Developer
PYTHON IS OPTIMIZED FOR HUMAN PRODUCTIVITY RATHER THAN MACHINE PRODUCTIVITY
Image credit: Bea de los Arcos (CC BY-ND 2.0)
Why Open? It gets the job done.
open + people productive environment + low friction
Image credit: Sonny Abesamis (CC BY 2.0)
Closed Ecosystems are Resource Limited
• Limited staff
• Limited time
• Limited expertise
• Limited funding
so stop building your own machine learning library and use your limited resources on mission-critical activities instead
Reduce License Friction*
• impacts development speed
• impacts agility/trying new things
• impacts deployment
• impacts scaling
whenever possible avoid proprietary tools*
* If you work for state/federal agencies, or anywhere with a long procurement process
internal teams cannot match the resources of the open data science community (neither can commercial vendors)
Build a layer, not an internal platform
Internal Software
or you will own that puppy…
Image credit: Marcos Leal (CC BY 2.0)
Risks
Be very selective
• Bus Factor + Code Complexity
• Software Ecosystem
• Code Quality
• Python 3 compatibility
• Continuous Integration
• Cross Platform Compatibility
• License – BSD, MIT, Apache
understand your dependencies
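One way to start understanding your dependencies is to list what each installed package declares as its license. The sketch below is an illustration, not something shown in the talk: it uses only the standard library's `importlib.metadata`, and the helper names (`license_of`, `is_permissive`, `audit`) and the simple keyword match against BSD, MIT and Apache are assumptions made for the example. Declared metadata can be missing or wrong, so treat the output as a prompt for review rather than a legal answer.

```python
import re
from importlib import metadata

# License families the slide lists as acceptable.
# Matching on these keywords is an assumption made for this sketch.
PERMISSIVE = {"bsd", "mit", "apache"}


def license_of(meta):
    """Return the license a package declares in its metadata, or 'UNKNOWN'."""
    declared = (meta.get("License-Expression") or meta.get("License") or "").strip()
    if declared:
        # Some packages paste the full license text here; keep the first line.
        return declared.splitlines()[0].strip()
    # Fall back to trove classifiers such as
    # "License :: OSI Approved :: MIT License".
    for classifier in meta.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            return classifier.split("::")[-1].strip()
    return "UNKNOWN"


def is_permissive(license_text):
    """True if any whole word in the text names a permissive license family."""
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", license_text)}
    return bool(words & PERMISSIVE)


def audit():
    """Map each installed distribution name to its declared license."""
    report = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:
            continue  # skip distributions with unreadable metadata
        name = meta.get("Name")
        if name:
            report[name] = license_of(meta)
    return report


if __name__ == "__main__":
    for name, lic in sorted(audit().items()):
        flag = "ok" if is_permissive(lic) else "REVIEW"
        print(f"{flag:6} {name}: {lic}")
```

Run inside the environment you deploy, this prints one line per package, and anything marked REVIEW deserves a closer look before you depend on it.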
Packaging is hard (we use )
Packaged by Continuum
Packaged by Community
Internal, Secret & Export Restricted
Should you make internal code open?
• Can (but may not) gain you external contributors
• Takes effort
• Refactoring/Clean Up
• Documentation
• Legal Review
• Tests/Continuous Integration
• Social contract
• Gains you the open infrastructure ecosystem – CI, GitHub, conda-forge, etc.
most of the steps you need to convert a tool to be open are the same ones needed to make it useful across your own organization