Lessons Learned from Testing Machine Learning Software
Christian Ramírez
@chrix2 @formiik
About me
• Computer Engineer
• M.Sc. (Astronomy)
• Ex-Googler
• Python lover
• Currently a formiiker (principal researcher at formiik and formiiklabs)
About formiik and how we use ML
Formiik is a platform designed to improve the productivity of on-site staff.
– Route optimization
– Workload balancing
– Ranking
– Time series anomaly detection
– Image recognition (deep learning)
Lesson 0
• Corollary number 2 by @dx
• “In theory, there is no difference between theory and practice. But, in practice, there is.”
Yogi Berra
Introduction
• In machine learning, computers apply statistical learning techniques to automatically identify patterns in data.
• A core objective of a learner is to generalize from its experience.
• WHAT?
Lesson 1
• Forget all you know about testing
Traditional software programming
• In general, we do this:
– Given a specification of a function
» f(x)
– We implement the function to meet the specification
» f(x)=y
Traditional software programming
• How we test, basically:
– Inputs a=[X1,X2,…,Xn]
– Expected results b=[Y1,Y2,…,Yn]
– We use assertions to validate the specification
– f(Xi)==Yi
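A minimal sketch of this style of test, in Python. The function `f` (squaring a number), the helper `check_specification`, and the sample values are hypothetical, chosen only for illustration; the slides do not name a specific function.

```python
def f(x):
    """Hypothetical specification: return the square of x."""
    return x * x

# Inputs a = [X1, ..., Xn] and expected results b = [Y1, ..., Yn].
inputs = [0, 1, 2, -3]
expected = [0, 1, 4, 9]

def check_specification(func, xs, ys):
    """Assert func(Xi) == Yi for every pair; return the number of checks."""
    for x, y in zip(xs, ys):
        assert func(x) == y, f"f({x}) returned {func(x)}, expected {y}"
    return len(xs)

checked = check_specification(f, inputs, expected)
```

Every check here is an exact equality, which only works because the specification fixes one correct output per input.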
Machine learning software programming
• In most cases
• We give examples
– Pairs (Xi,Yi)
– Induce f() such that
• y≈f(x) for the given pairs, and f generalizes well to unseen x
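A rough sketch of what "induce f() from pairs" can look like, and why the test changes. It assumes a very simple learner (a straight line fitted by ordinary least squares), a made-up ground truth y = 2x + 1 with small noise, and hypothetical names such as `fit_line` and `noisy_example`; none of these come from the slides.

```python
import random

def fit_line(pairs):
    """Induce f(x) = a*x + b from (Xi, Yi) pairs by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

rng = random.Random(0)  # fixed seed so the run is repeatable

def noisy_example(x):
    # Hypothetical ground truth y = 2x + 1 plus small uniform noise.
    return (x, 2 * x + 1 + rng.uniform(-0.1, 0.1))

train = [noisy_example(x) for x in range(20)]
unseen = [noisy_example(x + 0.5) for x in range(20, 30)]

f = fit_line(train)

# No f(Xi) == Yi here: we can only ask that the error stays within a tolerance,
# both on the given pairs and on x values the learner never saw.
train_error = max(abs(f(x) - y) for x, y in train)
unseen_error = max(abs(f(x) - y) for x, y in unseen)
```

The assertion becomes a bound such as `unseen_error < 0.5`, and choosing that tolerance is itself a judgment call rather than part of a specification.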
Machine learning software programming
• In other cases
• We give examples
– Input (Xi)
– No Yi are given to the learning algorithm, leaving it on its own to find structure in its input
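As an illustration of the case without Yi, here is a small sketch that uses a two-cluster, one-dimensional k-means. The algorithm choice, the function name `two_means`, and the data are assumptions made for this example only.

```python
def two_means(values, iterations=10):
    """Minimal 1-D k-means with k=2; returns one cluster label per value."""
    centers = [min(values), max(values)]  # simple deterministic start
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Move each center to the mean of its assigned values.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

# Hypothetical inputs Xi with no Yi: two loose groups of numbers.
data = [1.0, 1.2, 0.8, 1.1, 9.0, 9.3, 8.7, 9.1]
labels = two_means(data)

# There are no expected labels to compare against, so the test checks a
# property of the structure found: each group lands in a single cluster,
# and the two groups land in different clusters.
low_group = set(labels[:4])
high_group = set(labels[4:])
```

The test asserts a property of the output (the groups are separated) instead of specific label values, since which cluster is called 0 or 1 is arbitrary.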
Testing (ML) software
• Traditional software is modular
– We can decompose it and understand it
– Each module has inputs and outputs that can be defined and isolated
Testing (ML) software
• Machine learning systems appear to be monolithic
– Everything depends on everything else
– Changing any one thing changes everything else
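One way to see "changing any one thing changes everything else" is with a tiny least-squares model. This sketch assumes two correlated input features and a target that is exactly their sum; the data and the helper names `fit_two_features` and `fit_one_feature` are hypothetical.

```python
def fit_two_features(x1, x2, y):
    """Least-squares weights (w1, w2) for y ≈ w1*x1 + w2*x2, no intercept."""
    a11 = sum(a * a for a in x1)
    a12 = sum(a * b for a, b in zip(x1, x2))
    a22 = sum(b * b for b in x2)
    b1 = sum(a * t for a, t in zip(x1, y))
    b2 = sum(b * t for b, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    # Solve the 2x2 normal equations with Cramer's rule.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def fit_one_feature(x1, y):
    """Least-squares weight w1 for y ≈ w1*x1, no intercept."""
    return sum(a * t for a, t in zip(x1, y)) / sum(a * a for a in x1)

# Hypothetical data: x2 closely tracks x1, and the target is their sum.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [a + b for a, b in zip(x1, x2)]

w1_both, w2_both = fit_two_features(x1, x2, y)
w1_alone = fit_one_feature(x1, y)
```

With both features the weight on `x1` is close to 1; drop `x2` and the weight on `x1` moves to roughly 2, because it now absorbs the signal `x2` used to carry. Nothing about `x1` changed, yet its learned weight did, which is why a module-by-module test of a learned model is hard to define.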
Lesson 2
• Learn about machine learning
– Kinds of learning
– Coding
Lesson 3
• The models will learn what you teach them to learn.
– Test models, not requirements
– Adversarial examples
– Algorithms don't align with reality
– Spurious correlations
• Corollary number 4 “human beings are so fools to programming” by @dx
Lesson 4
• No matter what you have read on blogs, you need a good level of mathematical knowledge
• Remember again corollary number 2
Lesson 5
• Build your own ecosystem
– Frameworks
– GPU or CPU?
Lesson 6
• Identify how much data you really need
– More data or better models?
Lesson 7
• You need a new toolset
– Experimental data scientist
Lesson 8
• Be a storyteller
– Have a goal
– Manage expectations: this is not magic
Summary