How I Became a Data Scientist
TRANSCRIPT
Let’s tell a story
● Just to prove that I can talk about things other than Kaggle
● Today’s goal, as always, is to entertain, not enlighten.
● Apologies for presuming to call myself an “expert data scientist”.
What It Takes to be a Good Data Scientist
● Domain knowledge
● Coding skills
● Math/Stats
But maybe equally (or even more) important:
● Ask the right question
● Tell a good story
How Much Math(/Stats) is Required?
● Math is an extremely broad field
● Personally I am good at numerical problems but bad at algebra
● Guesstimating has always been my strength, rather than the “precise answer”
● Having good intuition is more helpful than being able to prove theorems
Majored in Engineering but...
Always wanted to be a “Data Scientist”
● Unfortunately that didn’t exist at that time
Three useful things learned in college
● Linear algebra
● Programming
● Teamwork (a.k.a. party with your friends)
Even after Y2K, there were plenty of IT jobs
● By chance I got a job as a software developer
● By chance it was in insurance
○ Arguably insurance has the best data to practice data science on
○ Very noisy
○ High variety
○ Not too small and not too big
The Most Useful Things Learned Doing IT
● It is NOT how to program!
○ My coding skill probably degenerated
● Be interested in learning the domain
○ I learned my “domain expertise” here
● Speak the “business language”
○ Terminology is very important
● How to talk to IT folks
What to do when bored with your job?
● Career switch!
● The following approach isn’t recommended:
Wanna be a chef? I’ve never cooked before but you can trust me
Lesson learned in switching careers
● It is counterproductive to talk about how good you would be at something that you haven’t done before
● Use cases / stories
● Find the right mentor/sponsor
Don’t Laugh, but Almost Became an Actuary
● Why?
○ Actuaries were doing “data science” way before “data scientist” became a job title
○ My wife is an actuary
○ I am good at taking exams
● Why not?
○ Data Science came along before I finished all the exams
Became “Expert Data Scientist”
● It is both easy and hard to transform from “some IT guy who wants to be a (predictive) modeler” to “expert data scientist”
○ The trick is to get new colleagues
● At that time it was called “predictive modeler”
● “Legitimized” by Kaggle
What I Learned being a “Practitioner”
● The most important insight:
○ Asking the right question is more important than getting the perfect answer
● The right “form” of question:
○ What will/can you do differently if you have a prediction of [????]
If We Finish here...
● Then we would have made a very common mistake in data analysis
○ All we have is an anecdote
● Enemies and friends of Data Science
○ “Anecdotal” vs “general”
○ “Co-occurrence” vs “correlation”
○ “Correlation” vs “causality”
An Example
Owen was good at math and became a data scientist
(1,000 people) were good at math and became data scientists
An Example
              Became Data Scientist
Good@Math     Yes        No         %Became DS
Yes           1,000      99,000     1%
No            10,000     90,000     10%
%Good@Math    9%         52%
An Example
              Became Data Scientist
Good@Math     Yes        No         %Became DS
Yes           1,000      9,000      10%
No            1,000      99,000     1%
%Good@Math    50%        8.3%
An Example
● We found something!
○ People who are good at math have a 10 times better chance of becoming data scientists!
● Is this good enough? Depending on your use case:
○ Probably good enough to make up some math interview questions for DS
○ But not necessarily good enough to say “let’s teach kids more math so that more of them become data scientists”
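
The "10 times" figure can be checked from the counts in the two example tables. Here is a minimal Python sketch of that arithmetic. The helper name `became_ds_rates` is made up for illustration and is not part of the talk; the counts are copied from the tables.

```python
def became_ds_rates(good_yes, good_no, other_yes, other_no):
    """Return (%Became DS for good-at-math, %Became DS for others, ratio).

    Each argument is a cell count from a 2x2 table:
    rows are Good@Math (Yes/No), columns are Became Data Scientist (Yes/No).
    """
    rate_good = good_yes / (good_yes + good_no)
    rate_other = other_yes / (other_yes + other_no)
    return rate_good, rate_other, rate_good / rate_other


# Counts copied from the two example tables.
# Both contain the same "anecdote" cell: 1,000 people who were
# good at math and became data scientists.
table_a = became_ds_rates(1_000, 99_000, 10_000, 90_000)
table_b = became_ds_rates(1_000, 9_000, 1_000, 99_000)

for name, (good, other, ratio) in [("first table", table_a),
                                   ("second table", table_b)]:
    print(f"{name}: good at math {good:.1%}, others {other:.1%}, "
          f"ratio {ratio:.1f}x")
```

With the second table's counts the ratio comes out at about 10, which matches the slide. With the first table's counts it is about 0.1, so the same 1,000-person anecdote is compatible with the opposite conclusion.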