schibsted media from data collection to data action · 2019. 4. 10. · from data collection to...

  • From Data Collection to Data Action

    Ludwig Krokstedt, Analyst, Media Insights

    2018.03.xx, Data Innovation Summit, Stockholm, Sweden

    SCHIBSTED MEDIA

  • About Me

    Ludwig Krokstedt, Analyst, Media Insights

    Schibsted Media, Stockholm, Sweden

  • Empowering people in their daily lives

    About Schibsted

    Publishing Marketplaces Growth

    … and too many to name

  • To contribute to a functioning democracy by closing the gap between what citizens know and what they need to know about the world around them

    Our Purpose

  • 1.4+ billion online events every day

    More than 2 PB of data stored in literally millions of files

  • So to access the data, I just need to understand S3, Spark, Parquet, Scala, and Luigi? No problemo!

    - Some data scientist, probably

  • But of course not everyone is a data scientist (#sad), so sites integrate the tools that allow people and systems to make data-driven decisions

  • increased complexity

    increased fragmentation

    increased cost

    difficult to correlate and compare

    difficult to ensure compliance

  • Democratize access to relevant data while maintaining security and

    privacy

    Our solution

  • Standardize and ensure quality

    Provide data to people with the tools that match the needs of their role

    Secure and ensure privacy compliance

    How we get there

  • Do you really trust that the data you’re receiving today leads you into making the right decisions?

    Quality

    We didn’t.

  • Four strategic initiatives to ensure high quality data across media

    One tracker online

    One schema in media (as far as possible)

    Standardization and data modeling forum with dedicated focus groups

    Real-time data quality testing
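    The slides do not show how real-time data quality testing is implemented. As a minimal sketch of the idea, assuming a hypothetical shared event schema (the field names below are illustrative, not Schibsted's actual schema), each incoming event could be checked against the schema while a running failure rate is tracked:

```python
from dataclasses import dataclass

# Hypothetical shared schema: field name -> required Python type.
# These field names are assumptions for illustration only.
EVENT_SCHEMA = {
    "event_id": str,
    "event_type": str,
    "site": str,
    "timestamp": str,
}


def validate_event(event):
    """Return a list of problems; an empty list means the event passes."""
    problems = []
    for name, expected in EVENT_SCHEMA.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            problems.append(f"wrong type for field: {name}")
    return problems


@dataclass
class QualityMonitor:
    """Keeps running pass/fail counts as events stream in."""
    passed: int = 0
    failed: int = 0

    def check(self, event):
        problems = validate_event(event)
        if problems:
            self.failed += 1
        else:
            self.passed += 1
        return problems

    def failure_rate(self):
        total = self.passed + self.failed
        return self.failed / total if total else 0.0
```

    In a setup like this, a rising failure rate for one site would point to a tracker or schema problem soon after it starts, before the bad data reaches reports.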

  • We all have different needs in terms of data and insights, and we should cater to them

    Get the right tools to the right person

  • Provide data to people with the tools that match the needs of their role

    Rebuilding the data pipelines with tools that can support the different user needs

    Bringing back control to the data consumer through the introduction of the Snowflake data warehouse

    New real-time visualization tooling for product managers

  • Source systems

    CREATE

    Privacy Broker

    PULSE

    Real-time data pipeline

    Batch data pipeline

    Amplitude

    Snowflake

    Offline Analysis & Machine Learning

    Reporting and aggregations

    Marketing cohorts

    Real-time applications

    Algorithmic front page, recommendations, propensity models, etc.

    Routing layer | Warehouse layer | Business layer

  • ● The ETL infrastructure around data can’t be cumbersome.

    ● The data is constantly evolving. We need something that is flexible.

    ● Data in all aggregation levels should be accessible by most (if not all) technical users.

    ● We want to pay for what we use, not more. Sometimes we query 1 TB of data, sometimes we query 1 kB.

    ● The local businesses know their business best. Central or specialized resources can never be a bottleneck for local initiatives.

    ● Transferability of data and insights should be easy.

    Bringing back control to the data consumers

    What we needed

  • ● Redshift

    ● Spark jobs

    ● Oracle Business Object

    ● … and many other singular tools and technologies

    ● Not enough unicorn engineers!

    Bringing back control to the data consumers

    What we had

  • This... ➔ Became this.

    • Complex infrastructure ➔ ELT architecture

    • Unreliable ➔ 100% uptime so far...

    • Unforgiving UI ➔ Easy overview

    • High skill cap ➔ SQL or BI tool knowledge

    • Hard to reuse results ➔ Caching and unlimited storage

    • Time consuming ➔ Near instant access

    • Change sensitive ➔ Schema on read

    • Not scalable at runtime ➔ Runtime scalable
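    "Schema on read" is what makes the new setup less change sensitive: raw events are loaded as they are, and fields are extracted only at query time. A minimal sketch of the principle in plain Python (not Snowflake itself; the sample rows and field names are hypothetical):

```python
import json

# Raw events stored exactly as received. Note that the rows do not share
# the same set of fields, and nothing was migrated to accommodate that.
RAW_ROWS = [
    '{"event_type": "pageview", "site": "a", "article": {"id": "x1"}}',
    '{"event_type": "pageview", "site": "b"}',
    '{"event_type": "click", "site": "a", "element": "button"}',
]


def read_field(raw, path, default=None):
    """Extract a dotted path such as 'article.id' from one raw JSON row."""
    value = json.loads(raw)
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


def count_by(rows, path):
    """Count rows per value of the given path, applying the schema at read time."""
    counts = {}
    for raw in rows:
        key = read_field(raw, path)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

    A field that appears in tomorrow's events can be queried immediately with `read_field`, while older rows simply return the default, so upstream changes do not break the load step.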

  • Providing data with the tools that match the needs of their role

    The tools we offer:

    Need for detail / data complexity: Low to High

    Latency: Real-time / Not real-time

    Amplitude, Kafka applications, Tableau + Snowflake, Snowflake SQL

  • Data being accessible at multiple points means security takes on new importance and complexity

    Roles + Access Control
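    The slide names roles and access control without detailing the model. As a rough sketch of role-based access, assuming hypothetical role and dataset names (none of them come from the talk), each role is granted a set of datasets and every read is checked against that grant:

```python
# Hypothetical grants: role -> datasets that role may read.
ROLE_GRANTS = {
    "product_manager": {"aggregates"},
    "analyst": {"aggregates", "events"},
    "data_scientist": {"aggregates", "events", "raw"},
}


def can_read(role, dataset):
    """True only if the role exists and has been granted the dataset."""
    return dataset in ROLE_GRANTS.get(role, set())


def visible_datasets(role):
    """Sorted list of datasets a role may read; empty for unknown roles."""
    return sorted(ROLE_GRANTS.get(role, set()))
```

    Unknown roles get no access by default, which keeps the failure mode safe when data is exposed at several points.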

  • Privacy

    full GDPR compliance for all users, identified and anonymous

    manage consents and opt-outs

    event broker for take-out and deletion

  • Schibsted account

    Privacy broker

    System 1

    System 2

    Delegate system 1

    3rd party system 1

    Alerts

    Signals

    Callbacks

    Privacy broker
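    The diagram suggests a broker that fans privacy events out to connected systems and receives callbacks. The actual design is not shown, so the following is only a sketch under assumptions: the class name, event type strings, and system names are hypothetical, and the downstream systems are stand-in in-memory stores.

```python
class PrivacyBroker:
    """Fans a privacy event out to every subscribed system and collects acks."""

    def __init__(self):
        # event type -> {system name: handler taking a user id, returning bool}
        self._handlers = {}

    def subscribe(self, event_type, system, handler):
        self._handlers.setdefault(event_type, {})[system] = handler

    def publish(self, event_type, user_id):
        """Call each subscribed handler and return {system: acknowledged}."""
        handlers = self._handlers.get(event_type, {})
        return {system: handler(user_id) for system, handler in handlers.items()}


def make_deleter(store):
    """Build a deletion handler for a dict-backed stand-in system."""
    def delete(user_id):
        store.pop(user_id, None)
        return user_id not in store  # acknowledge once the user is gone
    return delete


# Two stand-in systems holding user data.
system_1 = {"u1": {"reads": 12}, "u2": {"reads": 3}}
system_2 = {"u1": {"segment": "sports"}}

broker = PrivacyBroker()
broker.subscribe("deletion", "system_1", make_deleter(system_1))
broker.subscribe("deletion", "system_2", make_deleter(system_2))
acks = broker.publish("deletion", "u1")
```

    Collecting one acknowledgement per system gives the broker a record of which systems have completed a deletion, which is the kind of evidence a compliance process would need.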

  • Collection ➔ Action

    standardize collection

    democratize access

    ensure security and privacy

    enable action and innovation

  • Thanks!

    [email protected]
