schibsted media from data collection to data action · 2019. 4. 10. · from data collection to...
-
From Data Collection to Data Action
Ludwig Krokstedt
Analyst, Media Insights
2018.03.xx Data Innovation Summit, Stockholm, Sweden
SCHIBSTED MEDIA
-
About Me
Ludwig Krokstedt
Analyst, Media Insights
Schibsted Media
Stockholm, Sweden
-
Empowering people in their daily lives
About Schibsted
Publishing Marketplaces Growth
… and too many to name
-
To contribute to a functioning democracy by closing the gap between what citizens know and what they
need to know about the world around them
Our Purpose
-
1.4+ billion online events every day
More than 2 PB of data stored in literally millions of files
-
So to access the data, I just need to understand S3, Spark, Parquet, Scala,
and Luigi? No problemo!
- Some data scientist, probably
-
But of course not everyone is a data scientist (#sad), so sites integrate the
tools that allow people and systems to make data-driven decisions
-
increased complexity
increased fragmentation
increased cost
difficult to correlate and compare
difficult to ensure compliance
-
Democratize access to relevant data while maintaining security and
privacy
Our solution
-
Standardize and ensure quality
Provide data to people with the tools that match the needs of their role
Secure and ensure privacy compliance
How we get there
-
Do you really trust that the data you’re receiving today leads you into making the right decisions?
Quality
We didn’t.
-
Four strategic initiatives to ensure high quality data across media
One tracker online
One schema in media (as much as is possible)
Standardization and data modeling forum with dedicated focus groups
Real-time data quality testing
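A minimal sketch of what testing events against one shared schema could look like. The field names, the `EVENT_SCHEMA` dictionary, and both functions are hypothetical illustrations; the slides do not show the actual schema or test tooling.

```python
from typing import Any

# Hypothetical shared schema: field name -> required Python type.
# The real schema used in the talk is not shown in the slides.
EVENT_SCHEMA: dict[str, type] = {
    "event_id": str,
    "event_type": str,
    "site": str,
    "timestamp": float,
}


def validate_event(event: dict[str, Any]) -> list[str]:
    """Return a list of quality problems; an empty list means the event passes."""
    problems = []
    for field, expected in EVENT_SCHEMA.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    return problems


def failure_rate(events: list[dict[str, Any]]) -> float:
    """Share of events in a batch (or stream window) that fail validation."""
    if not events:
        return 0.0
    failed = sum(1 for event in events if validate_event(event))
    return failed / len(events)
```

In a streaming setup, a check like `failure_rate` could run per time window and raise an alert when the rate crosses a threshold.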
-
We all have different needs in terms of data and insights, and we should adhere to them
Get the right tools to the right person
-
Provide data to people with the tools that match the needs of their role
Rebuilding the data pipelines with tools that can support the different user needs
Bringing back control to the data consumer through the introduction of the Snowflake data warehouse
New real-time visualization tooling for product managers
-
Source systems
CREATE
Privacy Broker
PULSE
Real-time data pipeline
Batch data pipeline
Amplitude
Snowflake
Offline Analysis & Machine Learning
Reporting and aggregations
Marketing cohorts
Real-time applications
Algorithmic front page, recommendations, propensity models, etc.
Routing layer
Warehouse layer
Business layer
-
Bringing back control to the data consumers
What we needed
● The ETL infrastructure around data can’t be cumbersome.
● The data is constantly evolving. We need something that can be flexible.
● Data in all aggregation levels should be accessible by most (if not all) technical users.
● We want to pay for what we use, not more. Sometimes we query 1 TB of data, sometimes we query 1 kB.
● The local businesses know their business best. Central or specialized resources can never be a bottleneck for local initiatives.
● Transferability of data and insights should be easy.
-
● Redshift
● Spark jobs
● Oracle Business Object
● … and many other singular tools and technologies
● Not enough unicorn engineers!
Bringing back control to the data consumers
What we had
-
This...
• Complex infrastructure
• Unreliable
• Unforgiving UI
• High skill cap
• Hard to reuse results
• Time consuming
• Change sensitive
• Not scalable at runtime
Became this.
• ELT architecture
• 100% uptime so far...
• Easy overview
• SQL or BI tool knowledge
• Caching and unlimited storage
• Near instant access
• Schema on read
• Runtime scalable
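To illustrate the "ELT" and "schema on read" points, here is a minimal sketch in plain Python: raw events are loaded untouched, and a schema is applied only when they are read. The sample events and the function names are hypothetical and are not taken from the talk or from any Snowflake API.

```python
import json

# Raw events are loaded as-is (the "EL" in ELT); nothing is dropped at load time.
# These sample rows are invented for illustration.
RAW_EVENTS = [
    '{"event_type": "view", "site": "site-a", "article_id": "123"}',
    '{"event_type": "view", "site": "site-b", "article_id": "456", "device": "mobile"}',
    '{"event_type": "click", "site": "site-a"}',
]


def read_with_schema(raw_rows, fields):
    """Apply a schema at read time: pick the requested fields, None if absent."""
    for row in raw_rows:
        record = json.loads(row)
        yield {name: record.get(name) for name in fields}


def count_by(raw_rows, field):
    """The "T" step, done on demand: count rows per value of one field."""
    counts = {}
    for record in read_with_schema(raw_rows, [field]):
        key = record[field]
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Because the raw rows are kept, a field that appears later (such as `device` here) can be queried without reloading or migrating earlier data.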
-
Providing data with the tools that match the needs of their role
The tools we offer, by need for detail / data complexity (low to high) and latency:
Real-time: Amplitude (low), Kafka applications (high)
Not real-time: Tableau + Snowflake (low), Snowflake SQL (high)
-
Data being accessible at multiple points means security takes on new importance and complexity
Roles + Access Control
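A minimal sketch of role-based access control, assuming each role maps to a set of datasets it may read. The role names, dataset names, and `can_read` function are hypothetical; the slides do not describe the actual roles or grants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    readable_datasets: frozenset


# Hypothetical roles and datasets, for illustration only.
ROLES = {
    "product_manager": Role("product_manager", frozenset({"aggregated_reports"})),
    "analyst": Role("analyst", frozenset({"aggregated_reports", "event_level"})),
}


def can_read(role_name: str, dataset: str) -> bool:
    """Deny by default: unknown roles and unlisted datasets are refused."""
    role = ROLES.get(role_name)
    return role is not None and dataset in role.readable_datasets
```

Denying by default keeps a newly added access point closed until a role is explicitly granted to it.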
-
Privacy
full GDPR compliance
for all users, identified and anonymous
manage consents and opt-outs
event broker for take-out and deletion
-
Schibsted account
Privacy broker
System 1
System 2
Delegate system 1
3rd party system 1
Alerts
Signals
Callbacks
Privacy broker
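A minimal sketch of the event-broker idea: a take-out or deletion request is fanned out to every subscribed system, and each system reports back through a callback. The `PrivacyBroker` class, its methods, and the event names are hypothetical; the slides do not show the real interface.

```python
from typing import Callable


class PrivacyBroker:
    """Hypothetical in-memory broker that fans privacy requests out to systems."""

    def __init__(self) -> None:
        # event type -> list of (system name, handler) pairs
        self._handlers: dict[str, list[tuple[str, Callable[[str], bool]]]] = {}

    def subscribe(self, event_type: str, system_name: str,
                  handler: Callable[[str], bool]) -> None:
        """Register a system's callback for one kind of privacy event."""
        self._handlers.setdefault(event_type, []).append((system_name, handler))

    def publish(self, event_type: str, user_id: str) -> dict[str, bool]:
        """Send the request to each subscriber; return system -> acknowledged."""
        return {
            name: handler(user_id)
            for name, handler in self._handlers.get(event_type, [])
        }
```

A usage example under the same assumptions:

```python
store = {"u1": "profile data", "u2": "profile data"}
broker = PrivacyBroker()
broker.subscribe("deletion", "system_1", lambda uid: store.pop(uid, None) is not None)
acks = broker.publish("deletion", "u1")  # each system confirms whether it deleted the user
```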
-
Collection ➔ Action
standardize collection
democratize access
ensure security and privacy
enable action and innovation
-
Thanks!