Irene Gonzálvez, Product Manager at Spotify
Big Data, Big Quality?
Irene Gonzálvez, Product Manager, Data Infrastructure
Music Streaming Service
Launched in 2008
Premium and Free Tiers
Available in 61 Countries
Over 140M Monthly Active Users
More than 30M Songs
Over 1 billion plays per day
Data enables recommendations, advertising, label and artist payments and more
Good-Quality Data First
Data quality problems cost US businesses $600B a year!
Source: The Data Warehousing Institute
Data Quality Dimensions
● Timeliness
● Correctness
● Completeness
● Consistency
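The four dimensions can each be turned into a measurable check. Below is a minimal Python sketch, assuming a hypothetical play-event record (`PlayEvent`) and illustrative rules: completeness as the share of rows with both IDs present, correctness as a domain rule on `ms_played`, timeliness as a maximum allowed lag, and consistency as agreement with a row count reported upstream. None of these names or rules come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PlayEvent:
    # Hypothetical record shape, for illustration only.
    user_id: Optional[str]
    track_id: Optional[str]
    ms_played: int
    event_time: datetime


def completeness(rows: list[PlayEvent]) -> float:
    """Share of rows where every required field is populated."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.user_id and r.track_id) / len(rows)


def correctness(rows: list[PlayEvent]) -> float:
    """Share of rows passing a domain rule (assumed: play time is non-negative)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.ms_played >= 0) / len(rows)


def is_timely(rows: list[PlayEvent], now: datetime, max_lag: timedelta) -> bool:
    """True if the newest event is no older than max_lag."""
    if not rows:
        return False
    return now - max(r.event_time for r in rows) <= max_lag


def is_consistent(rows: list[PlayEvent], reported_count: int) -> bool:
    """True if our row count matches the count an upstream system reported (assumed rule)."""
    return len(rows) == reported_count


now = datetime(2018, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    PlayEvent("u1", "t1", 30_000, now - timedelta(minutes=5)),
    PlayEvent(None, "t2", 12_000, now - timedelta(minutes=10)),  # missing user_id
    PlayEvent("u3", "t3", -1, now - timedelta(minutes=20)),      # invalid play time
    PlayEvent("u4", "t4", 45_000, now - timedelta(minutes=30)),
]
print(completeness(rows))                         # 0.75
print(correctness(rows))                          # 0.75
print(is_timely(rows, now, timedelta(hours=1)))   # True
print(is_consistent(rows, reported_count=4))      # True
```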
Datamon, Data Counters, MetriLab
TC4D: Test Certified for Data
Level 1: Set-up, monitoring, alerting and documentation
Level 2: Data management and Unit tests
Level 3: Build your defenses
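Level 2 mentions unit tests. One way to read that is testing each pipeline transformation against small, hand-written inputs, just as with application code. The sketch below uses Python's standard `unittest` on a hypothetical deduplication step, `dedupe_plays`; the function and its test cases are illustrative assumptions, not part of TC4D.

```python
import unittest


def dedupe_plays(events: list[dict]) -> list[dict]:
    """Hypothetical pipeline step: keep the first event seen for each event_id."""
    seen: set[str] = set()
    result = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            result.append(event)
    return result


class DedupePlaysTest(unittest.TestCase):
    def test_removes_duplicate_event_ids(self):
        events = [
            {"event_id": "a", "track_id": "t1"},
            {"event_id": "a", "track_id": "t1"},
            {"event_id": "b", "track_id": "t2"},
        ]
        self.assertEqual([e["event_id"] for e in dedupe_plays(events)], ["a", "b"])

    def test_empty_input_yields_empty_output(self):
        self.assertEqual(dedupe_plays([]), [])

    def test_preserves_input_order(self):
        events = [{"event_id": "b"}, {"event_id": "a"}]
        self.assertEqual(dedupe_plays(events), events)


if __name__ == "__main__":
    # exit=False runs the tests without calling sys.exit() afterwards.
    unittest.main(argv=["data-unit-tests"], exit=False)
```

Running such tests in CI before a pipeline change ships catches logic regressions before they reach production data.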
What’s next? Build an algorithm library for anomaly detection (ML4ALL)
● Provide the infrastructure to ‘plug&play’ more algorithms
● Provide parameter recommendations to tweak the algorithms
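A minimal sketch of the ‘plug&play’ idea, assuming a simple name-to-function registry: each detector takes a series (for example, daily row counts) plus a `threshold` and returns one flag per point. Two standard techniques are plugged in: a z-score detector and a median absolute deviation (MAD) detector using the common 0.6745 scaling and 3.5 cutoff. `recommend_threshold` is a hypothetical take on parameter recommendations: it picks the smallest candidate threshold whose alert rate on history stays within a budget. This is not how ML4ALL is actually built.

```python
import statistics
from typing import Callable, Sequence

Detector = Callable[..., list[bool]]
DETECTORS: dict[str, Detector] = {}  # hypothetical registry: name -> detector


def register(name: str) -> Callable[[Detector], Detector]:
    """Decorator that adds a detector to the registry under `name`."""
    def decorator(fn: Detector) -> Detector:
        DETECTORS[name] = fn
        return fn
    return decorator


@register("zscore")
def zscore_detector(values: Sequence[float], threshold: float = 3.0) -> list[bool]:
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [False] * len(values)
    return [abs(v - mean) / stdev > threshold for v in values]


@register("mad")
def mad_detector(values: Sequence[float], threshold: float = 3.5) -> list[bool]:
    """Flag points whose modified z-score (based on medians) exceeds `threshold`."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - median) / mad > threshold for v in values]


def recommend_threshold(
    name: str,
    history: Sequence[float],
    max_alert_rate: float = 0.01,
    candidates: Sequence[float] = (2.0, 2.5, 3.0, 3.5, 4.0, 5.0),
) -> float:
    """Smallest candidate threshold whose alert rate on `history` is within budget."""
    if not history:
        raise ValueError("history must not be empty")
    detector = DETECTORS[name]
    for threshold in sorted(candidates):
        flags = detector(history, threshold=threshold)
        if sum(flags) / len(flags) <= max_alert_rate:
            return threshold
    # No candidate meets the budget: fall back to the most conservative one.
    return max(candidates)


history = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]  # e.g. daily row counts
print([i for i, f in enumerate(DETECTORS["zscore"](history)) if f])  # []
print([i for i, f in enumerate(DETECTORS["mad"](history)) if f])     # [8]
print(recommend_threshold("mad", history, max_alert_rate=0.1))      # 2.0
```

With one spike in ten points, the z-score detector at its default threshold misses the outlier because the spike inflates the standard deviation, while the median-based MAD detector still catches it. Swapping algorithms by name makes such comparisons cheap.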
What’s next? Spotify-wide strategy
● Define metrics that show when a dataset qualifies as ‘good’ quality.
● Identify which datasets are critical/central to Spotify and bring them to ‘good’ quality.
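One hypothetical way to make ‘good’ quality measurable is a per-dataset scorecard: each dimension gets a score between 0 and 1, and datasets marked critical must clear stricter minimums. The tier names, dimensions, threshold values, and the `payments` example below are illustrative assumptions, not Spotify’s actual criteria.

```python
from dataclasses import dataclass, field

# Hypothetical minimum score per dimension, by dataset tier.
THRESHOLDS: dict[str, dict[str, float]] = {
    "standard": {"completeness": 0.95, "correctness": 0.95, "timeliness": 0.90},
    "critical": {"completeness": 0.99, "correctness": 0.99, "timeliness": 0.98},
}


@dataclass
class DatasetScorecard:
    name: str
    tier: str  # a key of THRESHOLDS
    # Dimension -> score in [0, 1], e.g. timeliness as the share of partitions delivered on time.
    scores: dict[str, float] = field(default_factory=dict)


def failing_dimensions(card: DatasetScorecard) -> list[str]:
    """Dimensions below the tier's minimum; a missing score counts as failing."""
    minimums = THRESHOLDS[card.tier]
    return sorted(dim for dim, minimum in minimums.items() if card.scores.get(dim, 0.0) < minimum)


def is_good_quality(card: DatasetScorecard) -> bool:
    return not failing_dimensions(card)


payments = DatasetScorecard(
    "payments", "critical", {"completeness": 0.999, "correctness": 0.995, "timeliness": 0.97}
)
print(failing_dimensions(payments))  # ['timeliness']
print(is_good_quality(payments))     # False
```

The same scores would pass under the `standard` tier, which is the point of tiering: the bar rises with how central a dataset is.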
Lesson #1: Think big. Understand your org’s pain points.
Lesson #2: Start small. And start NOW!
Lesson #3: Data Quality is not an add-on
Insights can ONLY be as good as the data
Data will increase 10x by 2025
Source: International Data Corporation (IDC)
1 ZB = 1 trillion GB
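For scale, the zettabyte figure follows from decimal (SI) prefixes, where a gigabyte is $10^9$ bytes. With binary units (ZiB, GiB) the ratio would instead be $2^{40} \approx 1.1 \times 10^{12}$.

```latex
1\ \text{ZB} = 10^{21}\ \text{bytes}
            = 10^{12} \times 10^{9}\ \text{bytes}
            = 10^{12}\ \text{GB}
            = 1\ \text{trillion GB}
```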
Critical data: 20%
Hypercritical data: 10%
Q&A
Irene Gonzálvez, Product Manager, Spotify
irene@spotify.com