Girls Who Code and Do Data Science
@EstherVasiete Data Scientist
July 12th, 2016 Girls Who Code Summer Immersion Program
About me • Born and raised in Barcelona • Bachelor’s Degree in Electrical Engineering
About me • Studied abroad in the UK - Best time of my life - Developed an interest in image processing and computer vision - Also developed an interest in machine learning; I just didn’t know it then
About me • Did my Master’s at CU Boulder • Officially, I received my diploma in EE - Unofficially, I like to think of it as a CS degree - I managed to cross-list most of my courses, and my thesis advisor, so that I could feed my growing interest in machine learning
About me • Once I graduated, I moved to San Francisco - My first data science gig
So what is machine learning?
How does this… …become this?
By recognizing this
Sensors + Other Structured and Unstructured Data
How can a machine learn that a cat is a cat? What about these?
The cat from Shrek • Hairless cat • Baby panther and baby tiger
Can your model generalize to new, unseen data?
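One common way to check generalization is to hold out part of the labeled data, train only on the rest, and measure accuracy on the held-out part, which the model never saw. The sketch below assumes examples are (features, label) pairs; the function names, the toy data, and the majority-class baseline model are illustrative assumptions, not something from the talk.

```python
import random
from collections import Counter

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def majority_class_model(train):
    """A deliberately simple baseline model: always predict the most common label."""
    top_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: top_label

def accuracy(model, examples):
    """Fraction of (features, label) pairs the model labels correctly."""
    if not examples:
        raise ValueError("need at least one example to evaluate")
    return sum(model(x) == y for x, y in examples) / len(examples)

# Hypothetical data: one numeric feature per example.
examples = [((i,), "cat" if i % 2 else "not_cat") for i in range(20)]
train, test = train_test_split(examples)
model = majority_class_model(train)
print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

A baseline like this gives a floor: a real model is only interesting if it beats it on the held-out set, not just on the training set.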
The importance of data
Messy data – the norm, not the exception
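A small illustration of what "messy" can mean in practice: the same label spelled inconsistently and rows with missing values. This is a minimal sketch under assumed conventions (records are (features, label) pairs with dict features; the field name `weight_kg` is hypothetical), and it simply drops incomplete rows.

```python
def clean_records(records):
    """Normalize label text and drop records with missing values.

    Each record is a (features, label) pair where `features` is a dict.
    """
    cleaned = []
    for features, label in records:
        if label is None or any(value is None for value in features.values()):
            continue  # drop incomplete rows instead of guessing the missing values
        cleaned.append((features, label.strip().lower()))
    return cleaned

# Hypothetical raw records: inconsistent casing, stray spaces, missing values.
raw = [
    ({"weight_kg": 4.0}, " Cat"),
    ({"weight_kg": None}, "cat"),
    ({"weight_kg": 30.0}, "DOG "),
    ({"weight_kg": 3.5}, None),
]
print(clean_records(raw))  # [({'weight_kg': 4.0}, 'cat'), ({'weight_kg': 30.0}, 'dog')]
```

Dropping rows is only one choice; when data is scarce, filling in (imputing) missing values is often preferable.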
Basic Machine Learning Framework
Training examples → Machine Learning Algorithm → Cat Model
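The framework can be made concrete with one of the simplest learning algorithms, a nearest-centroid classifier: the "algorithm" averages the training examples of each label, and the resulting "model" labels a new example by its closest average. The two features (weight in kg, whisker length in cm) and all values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from math import dist

def train_nearest_centroid(examples):
    """Learning algorithm: compute the mean feature vector of each label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    centroids = {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }

    def model(features):
        """Trained model: predict the label whose centroid is closest (Euclidean)."""
        return min(centroids, key=lambda label: dist(features, centroids[label]))

    return model

# Hypothetical training examples: (weight_kg, whisker_length_cm) -> label
training_examples = [
    ((4.0, 7.0), "cat"), ((3.5, 6.5), "cat"), ((5.0, 7.5), "cat"),
    ((30.0, 4.0), "not_cat"), ((25.0, 3.5), "not_cat"), ((20.0, 5.0), "not_cat"),
]
cat_model = train_nearest_centroid(training_examples)
print(cat_model((4.2, 6.8)))  # cat
```

Because the features have different scales, a real pipeline would usually standardize them first; it is omitted here to keep the sketch short.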
In all industries, billions of data points represent opportunities for data science
• Gene Sequencing: the cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014
• Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive
• Stock Market
• Social Media: Facebook uploads 250 million photos each day
• Oil Exploration: oil rigs generate 25,000 data points per second
• Video Surveillance
• Medical Imaging
• Mobile Sensors
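Several of the figures quoted above can be sanity-checked with simple arithmetic. The smart-meter line assumes the baseline is one reading per month of 30 days; the slide does not state its baseline, so treat that comparison as an assumption.

```latex
\begin{aligned}
\text{Smart meter: } & 4\,\tfrac{\text{readings}}{\text{hour}} \times 24\,\tfrac{\text{hours}}{\text{day}} \times 30\,\text{days} = 2{,}880 \approx 3{,}000 \text{ readings per monthly reading} \\
\text{Oil rig: } & 25{,}000\,\tfrac{\text{points}}{\text{s}} \times 86{,}400\,\tfrac{\text{s}}{\text{day}} = 2.16 \times 10^{9} \text{ points per day} \\
\text{Genome cost: } & \$100\text{M} \,/\, \$1\text{K} = 10^{5}\text{-fold drop from 2001 to 2014}
\end{aligned}
```

The oil-rig line alone shows why "billions of data points" is not an exaggeration: a single rig passes two billion points in one day.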
https://www.washingtonpost.com/posteverything/wp/2015/06/05/the-auto-industry-discriminates-against-women-so-i-quit-my-engineering-job-to-become-a-mechanic/
You can also transform a male-dominated industry with data science.
On-Board Diagnostics
Diagnostic Trouble Codes (DTCs)
Unscheduled repairs
AB1029 – Power steering pump replacement
CT3408 – Wheel alignment
Data Sources for Predictive Maintenance
Vehicle Data: VIN • Timestamp • DTC Code • Odometer • Speed • Acceleration • Engine Temperature • Engine Torque • GPS Coordinates • etc.
Car Repairs Data: VIN • Date vehicle in • Date vehicle out • Repair code • Parts replaced • Warranty claims • Repair comments
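The two data sources share the VIN, so combining them starts with a join on that key. The sketch below is a minimal in-memory inner join under assumed record layouts: the dict keys (`vin`, `timestamp`, `dtc`, `repair_code`) and the short placeholder VINs are hypothetical, and real VINs are 17 characters long.

```python
from collections import defaultdict

def join_on_vin(vehicle_events, repairs):
    """Inner-join repair records with each vehicle's telemetry events by VIN."""
    events_by_vin = defaultdict(list)
    for event in vehicle_events:
        events_by_vin[event["vin"]].append(event)
    joined = []
    for repair in repairs:
        events = events_by_vin.get(repair["vin"], [])
        if events:  # skip repairs with no matching vehicle data
            ordered = sorted(events, key=lambda event: event["timestamp"])
            joined.append({**repair, "events": ordered})
    return joined

# Hypothetical records with placeholder VINs and integer timestamps.
vehicle_events = [
    {"vin": "V1", "timestamp": 2, "dtc": "B"},
    {"vin": "V1", "timestamp": 1, "dtc": "P"},
    {"vin": "V2", "timestamp": 5, "dtc": "U"},
]
repairs = [
    {"vin": "V1", "repair_code": "CT3408"},
    {"vin": "V3", "repair_code": "AB1029"},
]
rows = join_on_vin(vehicle_events, repairs)
print(rows)
```

At production scale the same join would typically run in a database or a distributed engine rather than in Python dictionaries.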
Predicting Job Type from Diagnostic Trouble Codes (DTCs)
[Figure: timeline of one vehicle's DTC events (codes of types B, P, C, and U) interleaved with repair jobs (Job Type: Transmission, Transmission, Engine, Regular check)]
Can the DTCs observed here predict this Job Type?
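One way to turn the timeline into training data is to collect, for each job, the DTCs observed after the previous job and up to this one. (In OBD-II, the first letter of a DTC names the system: P for powertrain, B for body, C for chassis, U for network communication.) This windowing rule and the function name `dtc_windows` are assumptions for illustration; the talk does not specify how windows were defined.

```python
from bisect import bisect_right
from collections import Counter

def dtc_windows(dtc_events, jobs):
    """Build one training example per job from the DTCs seen since the previous job.

    dtc_events: list of (timestamp, dtc) pairs sorted by timestamp.
    jobs: list of (timestamp, job_type) pairs sorted by timestamp.
    Returns a list of (Counter of DTCs, job_type) pairs.
    """
    times = [timestamp for timestamp, _ in dtc_events]
    examples = []
    previous_job_time = float("-inf")
    for job_time, job_type in jobs:
        start = bisect_right(times, previous_job_time)  # first DTC after the previous job
        end = bisect_right(times, job_time)  # one past the last DTC at or before this job
        window = Counter(dtc for _, dtc in dtc_events[start:end])
        examples.append((window, job_type))
        previous_job_time = job_time
    return examples

# Hypothetical single-vehicle timeline (integer timestamps for simplicity).
dtc_events = [(1, "B"), (2, "P"), (3, "C"), (5, "U"), (6, "B"), (9, "P")]
jobs = [(4, "Transmission"), (7, "Engine"), (10, "Regular check")]
for window, job_type in dtc_windows(dtc_events, jobs):
    print(job_type, dict(window))
```

Binary search keeps each window lookup logarithmic in the number of DTC events, which matters when a vehicle reports thousands of codes.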
Predicting Job Type: a multi-class classification problem
Vehicle Features → one of many repair codes: DF1210, DF1215, DF2980, AB1029, AB1622, AB1625, AB8622, CT3402, CT3408, CT3560, CT2409
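As one possible multi-class classifier (not necessarily the model used in the talk), multinomial Naive Bayes treats the DTCs before a job like words in a document and picks the repair code with the highest posterior score. The class name, the DTC letters, and the training examples are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesJobClassifier:
    """Multinomial Naive Bayes: P(job) * prod over DTCs of P(dtc | job)^count."""

    def fit(self, examples):
        """examples: list of (Counter of DTCs, repair_code) pairs."""
        self.class_counts = Counter(label for _, label in examples)
        self.dtc_counts = defaultdict(Counter)
        self.vocab = set()
        for dtcs, label in examples:
            self.dtc_counts[label].update(dtcs)
            self.vocab.update(dtcs)
        self.n_examples = len(examples)
        return self

    def log_score(self, dtcs, label):
        counts = self.dtc_counts[label]
        total = sum(counts.values())
        # Add-one smoothing; the extra +1 reserves mass for DTCs never seen in training.
        denominator = total + len(self.vocab) + 1
        score = math.log(self.class_counts[label] / self.n_examples)  # log prior
        for dtc, n in dtcs.items():
            score += n * math.log((counts[dtc] + 1) / denominator)
        return score

    def predict(self, dtcs):
        """Return the repair code with the highest posterior score."""
        return max(self.class_counts, key=lambda label: self.log_score(dtcs, label))

# Hypothetical training windows: DTC counts -> repair code.
train = [
    (Counter({"P": 2, "U": 1}), "DF1210"),
    (Counter({"P": 3}), "DF1210"),
    (Counter({"C": 2}), "CT3408"),
    (Counter({"C": 1, "B": 1}), "CT3408"),
    (Counter({"B": 2}), "AB1029"),
]
clf = NaiveBayesJobClassifier().fit(train)
print(clf.predict(Counter({"P": 2})))  # DF1210
```

Working in log space avoids numeric underflow when a window contains many DTCs.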
Hierarchical Classification Framework
Vehicle Features → code family → repair code
• DF: DF1210, DF1215, DF2980
• AB: AB1029, AB1622, AB1625, AB8622
• CT: CT3402, CT3408, CT3560, CT2409
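A hierarchical classifier splits the decision in two: first predict a coarse group of repair codes, then choose a specific code using a classifier trained only on that group. The sketch assumes the group is the two-letter prefix of the code (DF, AB, CT), uses a simple 1-nearest-neighbor base classifier over sets of DTCs, and uses hypothetical DTC tokens such as "P1"; all of these are illustrative choices.

```python
from collections import defaultdict

def train_nearest_neighbor(examples):
    """Base classifier: 1-nearest neighbor using Jaccard similarity of DTC sets."""
    def predict(dtcs):
        def similarity(other):
            union = dtcs | other
            return len(dtcs & other) / len(union) if union else 1.0
        return max(examples, key=lambda example: similarity(example[0]))[1]
    return predict

def family_of(repair_code):
    """Assumed hierarchy: the two-letter prefix of a repair code is its family."""
    return repair_code[:2]

def train_hierarchical(examples, make_classifier=train_nearest_neighbor):
    """Level 1 predicts the code family; level 2 picks a code within that family."""
    family_model = make_classifier([(dtcs, family_of(code)) for dtcs, code in examples])
    by_family = defaultdict(list)
    for dtcs, code in examples:
        by_family[family_of(code)].append((dtcs, code))
    code_models = {family: make_classifier(exs) for family, exs in by_family.items()}

    def predict(dtcs):
        return code_models[family_model(dtcs)](dtcs)

    return predict

# Hypothetical training examples: set of DTCs -> repair code.
examples = [
    (frozenset({"P1", "P2"}), "DF1210"),
    (frozenset({"P1"}), "DF1215"),
    (frozenset({"C1"}), "CT3408"),
    (frozenset({"C1", "C2"}), "CT3402"),
    (frozenset({"B1"}), "AB1029"),
]
predict = train_hierarchical(examples)
print(predict(frozenset({"C1", "C2"})))  # CT3402
```

The appeal of the hierarchy is that each classifier faces fewer classes, and the second-level models only have to separate closely related repairs.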
Takeaways
• Diagnostic Trouble Codes (DTCs) are not always symptomatic of an ensuing repair.
• Hence, rule-based approaches that map DTCs to repairs have been challenging to construct.
• A machine learning approach may be better suited to inferring the relationship between groups of DTCs and repairs.
• Become a mechanic and solve a few car repairs, or become a data scientist and solve millions!
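The first takeaway can be quantified: for a single-DTC rule "DTC → repair", count how often the repair actually follows. The sketch below learns, for each DTC, its most frequent outcome and the share of windows in which that outcome occurred; the window format and all data are hypothetical.

```python
from collections import Counter, defaultdict

def learn_rules(windows):
    """For each DTC, find its most frequent outcome and how often that outcome follows.

    windows: list of (set_of_dtcs, outcome) pairs, where outcome is a repair
    code or None when no repair followed.
    """
    outcomes = defaultdict(Counter)
    for dtcs, outcome in windows:
        for dtc in dtcs:
            outcomes[dtc][outcome] += 1
    rules = {}
    for dtc, counts in outcomes.items():
        best_outcome, n = counts.most_common(1)[0]
        rules[dtc] = (best_outcome, n / sum(counts.values()))
    return rules

# Hypothetical observation windows.
windows = [
    ({"P1"}, "DF1210"), ({"P1"}, "DF1210"), ({"P1", "B1"}, "DF1210"),
    ({"P1"}, None), ({"B1"}, None), ({"B1"}, None),
]
rules = learn_rules(windows)
print(rules["P1"])  # ('DF1210', 0.75)
```

Even the best single-code rule here is wrong a quarter of the time, and "B1" is most often followed by no repair at all; a model that looks at groups of DTCs together can capture interactions that such per-code rules miss.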
Other Data Science Use Cases for Connected Cars
Other Data Science Use Cases
• Automated essay scoring
• Drug/chemical discovery & analysis
• Recommendation systems
• Fraud detection
blog.pivotal.io/data-science-pivotal/case-studies/pivotal-for-good-with-crisis-text-line-using-text-analytics-to-better-serve-at-risk-teens
blog.pivotal.io/data-science-pivotal/features/pivotal-for-good-with-crisis-text-line-a-first-look
Data Scientist Profile
Ask me anything @EstherVasiete