meetup duchess 20160119 - leboncoin de la data
TRANSCRIPT
LEBONCOIN DE LA DATA
Stéphanie Baltus – Head of Data Engineering – @steph_baltus
Meetup Duchess France @ TheFamily – 01/19/2016
■ About leboncoin
■ Data, data everywhere!
■ To infinity and beyond…
PLAN
ABOUT LEBONCOIN
LEBONCOIN... AND FRIENDS
■ A Schibsted Media Group company
■ Since 2006
■ 320+ people
■ Located in Paris, Montceau-Les-Mines, Reims
■ 2014 Revenue: 150+M€
IN A FEW WORDS
NOT JUST A WEBSITE
■ Classified ads:
■ Professional
■ Personal
■ Premium offer:
■ Highlight products
■ Ad import tools
■ Ad display
NOT JUST A CLASSIFIED ADS COMPANY
DATA, DATA EVERYWHERE
■ Building a team
■ Provide a daily batch DWH:
■ Website traffic (sort of)
■ Ad activity & validation
■ Sales & Coin usage
■ User information
■ Support
■ Try near-real time processing
A BIT OF HISTORY
SO, WE DID SOME BI STUFF (2012-2015)
IT LOOKS LIKE THIS
■ A lot of uncovered scope
■ Incremental load only
■ Inability to reload historical data: stuck with data from 2013 to today
■ A business team unable to query the database
■ A lot of "no!" when asking for new features
■ Vertical scalability only
■ No way to share data with the product (website, app, CRM, …)
IT WORKS! BUT…
TO INFINITY AND BEYOND!
■ Share data services with the website and apps
■ Build a unique source of truth
■ Provide raw data to our analysts
■ Provide real time data
■ Cover all the data scope of leboncoin
THE FUTURE
FUNCTIONAL ARCHITECTURE
DATA ARCHITECTURE: DUMBO STYLE
ONE STACK TO RULE THEM ALL
■ Centralized data cleaning / streamlining
■ Extended analytics apps
■ Ad and customer indexes
■ Ad import web service
■ Data lake indexing through a Bloom filter
■ Anomaly detection
SOME IMPLEMENTATIONS
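The data lake indexing item above relies on a Bloom filter: a compact bit array that answers "definitely not present" or "possibly present" without scanning the lake. A minimal self-contained sketch of the idea (class and parameter names are hypothetical, not leboncoin's actual implementation):

```python
# Minimal Bloom filter sketch -- illustrative only, not leboncoin's code.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive k deterministic bit positions from the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A `False` answer guarantees the key was never indexed, so the expensive lake scan can be skipped; `True` may be a false positive, with a rate tunable via `size_bits` and `num_hashes`.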
■ Goal: help the sysadmin team catch bots that crawl our website and apps to steal our ads or users' phone numbers ⇒ anomaly detection
■ How:
■ Use HTTP logs (150 GB per day)
■ Build KPIs and feature vectors
■ Apply a logistic regression to identify suspicious sessions
■ Next steps:
■ Test a k-means algorithm
CATCH 'EM ALL!
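The pipeline above (per-session KPI vectors from HTTP logs, then a logistic regression) can be sketched with scikit-learn. The feature choices and values are made up for illustration, and scikit-learn stands in for whatever library was actually used in production:

```python
# Illustrative bot-detection sketch: hypothetical per-session KPIs,
# not the actual features or stack used at leboncoin.
from sklearn.linear_model import LogisticRegression

# Hypothetical KPIs per session: [requests per minute, distinct-page ratio]
X_train = [
    [2.0, 0.90],    # human-like: slow, varied browsing
    [3.5, 0.80],
    [400.0, 0.05],  # bot-like: very fast, hammering the same URLs
    [350.0, 0.10],
]
y_train = [0, 0, 1, 1]  # 0 = legitimate session, 1 = bot

model = LogisticRegression()
model.fit(X_train, y_train)

# Score an unseen session: a prediction of 1 flags it as suspicious.
label = model.predict([[380.0, 0.07]])[0]
```

`predict_proba` would additionally give P(bot) per session, which is what lets a team set a review threshold instead of a hard cut-off.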
■ A unified view of the data
■ Home-built data extractor + Spark MDM jobs
■ Build a next-generation BI app
■ Spark ETL + Redshift
■ Share the built information with other apps
■ Spark ETL + ES + Kafka
DIVE INTO DATA SHARING
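The sharing pattern above, one ETL feeding several consumers (Redshift, Elasticsearch, Kafka), boils down to extract → transform → fan-out. A plain-Python sketch of that shape, with hypothetical field names, since the real jobs run on Spark:

```python
# Extract-transform-share sketch. Plain Python stands in for Spark;
# "ad_id"/"category" are hypothetical field names.

def extract(raw_rows):
    # Simulate pulling raw ad events and dropping malformed ones.
    return [r for r in raw_rows if r.get("ad_id") is not None]

def transform(rows):
    # Streamline/clean: normalize the category field.
    return [
        {"ad_id": r["ad_id"], "category": r["category"].strip().lower()}
        for r in rows
    ]

def publish(rows, sinks):
    # Fan the unified view out to every consumer (DWH, search index, bus).
    for sink in sinks:
        sink.extend(rows)

warehouse, search_index = [], []
raw = [
    {"ad_id": 1, "category": " Immobilier "},
    {"ad_id": None, "category": "Auto"},  # malformed event, filtered out
]
publish(transform(extract(raw)), [warehouse, search_index])
```

Keeping cleaning in one transform step is what makes every downstream consumer see the same "source of truth" instead of each re-cleaning the data its own way.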
NOW IT LOOKS LIKE THIS
■ Becoming production-ready
■ New app, new services
■ More machine learning oriented apps
■ Feeding the website
■ Recruiting
WHAT'S NEXT?
QUESTIONS?