NRT Event Processing with Snowplow
NRT Event Processing
Outline
• Introduction
• Our Snowplow Setup
• Example NRT Use Cases
• Radio Campaign
• Telephony System
Simply Business
• Largest UK business insurance provider
• More than 400,000 policyholders
• Using BML, tech and data to disrupt the business insurance market
Data ’n’ Analytics
• 5 Data Engineers
• 3 Business Intelligence Developers
• 3 Data Analysts
• 1 Data Scientist
• 1 Director of Data Science
• And hiring! :-)
Our Snowplow Setup
Snowplow Setup
Trackers → Collector → Enrichment → Modeling → Storage
• Trackers, collectors and storage are 100% upstream Snowplow
• Enrichment:
• Spark apps that use scala-common-enrich as a library
• We add our own enrichments after the default ones
• We perform NRT identity stitching and sessionization
• Modeling: mix of Spark and SQL jobs
• Storage: Spark apps that use scala-hadoop-shred as a library
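To make the sessionization bullet concrete, here is a minimal sketch of gap-based sessionization in plain Python. It is not the Spark enrichment described on the slide: the 30-minute inactivity timeout and the field names `user_id`, `ts` and `session_idx` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed inactivity gap; the talk does not state the timeout used.
SESSION_TIMEOUT_S = 30 * 60


def sessionize(events, timeout_s=SESSION_TIMEOUT_S):
    """Add a per-user 'session_idx' to each event.

    events: iterable of dicts with 'user_id' and 'ts' (epoch seconds).
    A new session starts when the gap since the user's previous
    event exceeds timeout_s.
    """
    by_user = defaultdict(list)
    for event in events:
        by_user[event["user_id"]].append(event)

    out = []
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e["ts"])
        session_idx = 0
        last_ts = None
        for event in user_events:
            if last_ts is not None and event["ts"] - last_ts > timeout_s:
                session_idx += 1
            out.append({**event, "session_idx": session_idx})
            last_ts = event["ts"]
    return out
```

In a streaming job the same rule would be applied incrementally against per-user state rather than over a full list.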
Why?
• We wanted a near real-time pipeline, but the Kinesis Client Library (KCL) was too rigid:
• We would have to provision, set up and monitor the machines ourselves
• Configuration is difficult for complex DAGs
• In contrast, Spark:
• Once set up, the cluster is a PaaS
• Allows streaming, batch, ML and graph workloads
• Allows analysts and data scientists to use Python
Radio Campaign
The Radio Campaign
• We’re running a radio campaign in Birmingham, Manchester and London
• People who start a quote from our radio landing pages get a £25 discount
The Banner
• The questionnaire to get quotes can be quite long to complete
• We wanted to reassure our customers that they would get the discount
• We wanted to display a banner at the top of every page of the questionnaire
Our Infrastructure
[Diagram: Scala Stream Collector, Kinesis, Spark Stream NRT Enrichment, MongoDB, Visitor API, Quoting App (HTTP)]
On average, it takes 2.5s for an event to be available in the Visitor API
Benefits of NRT Snowplow
• Our quoting app does not need to know about marketing, user landing pages, etc.
• Our MongoDB collection of active sessions’ events becomes a view of our event log
• It can be reused for many other use cases: analytics on read!
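As an illustration of "analytics on read", the banner decision can be derived from a session's events at query time instead of being stored by the quoting app. The sketch below is an assumption about how such a check might look, not the actual Visitor API: `event` and `page_urlpath` are standard Snowplow fields, while the numeric `ts` field and the `/radio/` path prefix are hypothetical.

```python
def show_radio_banner(session_events, radio_prefix="/radio/"):
    """Return True if the session's first page view was a radio landing page.

    session_events: list of dicts with 'event', 'ts' and 'page_urlpath'.
    radio_prefix is a hypothetical URL path prefix for radio landing pages.
    """
    page_views = sorted(
        (e for e in session_events if e.get("event") == "page_view"),
        key=lambda e: e["ts"],
    )
    if not page_views:
        return False
    return page_views[0].get("page_urlpath", "").startswith(radio_prefix)
```

Because the rule is evaluated on read, a new use case only needs a new function over the same events.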
Telephony System
Telephony System
• We have a call center in Northampton with around 200 consultants
• We used an off-the-shelf telephony system
• It worked well for a long time, but:
• It was not well integrated with our systems
• It was quite rigid: we couldn’t adapt it to all our needs
• Reports were only daily, and they contained only aggregated data
Telephony System
• We decided to replace it with a home-grown, Twilio-based solution
• Components:
• Contact Strategy Manager
• Voice Channel Manager
• Communication is event-based
• We transform those events into Snowplow’s unstructured events
• A Spark Streaming app inserts the events into Redshift every 2 minutes
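A minimal sketch of the translation step, assuming a telephony event arrives as a name plus a payload dict. The outer `unstruct_event` envelope is Snowplow's self-describing JSON format; the vendor `com.example.telephony`, the event name and the payload fields are hypothetical, since the talk does not show the real schemas.

```python
import json

# Snowplow's envelope schema for unstructured (self-describing) events.
UNSTRUCT_EVENT_SCHEMA = (
    "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0"
)


def to_unstruct_event(name, payload, vendor="com.example.telephony", version="1-0-0"):
    """Wrap a telephony event in a Snowplow unstructured-event envelope.

    vendor, name and version identify a hypothetical Iglu schema.
    """
    return {
        "schema": UNSTRUCT_EVENT_SCHEMA,
        "data": {
            "schema": f"iglu:{vendor}/{name}/jsonschema/{version}",
            "data": payload,
        },
    }


if __name__ == "__main__":
    event = to_unstruct_event("call_started", {"call_id": "c-1", "consultant": "alice"})
    print(json.dumps(event, indent=2))
```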
The Infrastructure
[Diagram: Contact Strategy Manager, Voice Channel Manager, Event Translator, Scala Stream Collector, Kinesis, Spark Stream NRT Enrichment, Kinesis, Spark Stream Shredder, Redshift, Looker]
Events
Example call when viewed as a sequence of events:
Benefits of NRT Snowplow
• Event Sourcing is great for reporting and analytics: it ensures that data quality remains high
• Team managers now have an NRT view of what teams are doing
• You can aggregate and drill down on the data as appropriate
• Leveraging our data platform: Snowplow pipeline, Redshift & Looker
• Leveraging our existing skills: everyone knows how to use Looker
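The aggregate-and-drill-down point can be sketched in plain Python: per-call durations are the drill-down view, and talk time per consultant is an aggregate built from the same events. In the talk this is done with Redshift and Looker; the event types `call_started` and `call_ended` and the fields `call_id`, `consultant` and `ts` are hypothetical.

```python
from collections import defaultdict


def call_durations(events):
    """Return {call_id: seconds} for calls that have both a start and an end."""
    started_at = {}
    durations = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        if event["type"] == "call_started":
            started_at[event["call_id"]] = event["ts"]
        elif event["type"] == "call_ended" and event["call_id"] in started_at:
            durations[event["call_id"]] = event["ts"] - started_at[event["call_id"]]
    return durations


def talk_time_by_consultant(events):
    """Aggregate completed call durations per consultant."""
    consultant_of = {
        e["call_id"]: e["consultant"] for e in events if e["type"] == "call_started"
    }
    totals = defaultdict(int)
    for call_id, seconds in call_durations(events).items():
        totals[consultant_of[call_id]] += seconds
    return dict(totals)
```

Calls still in progress have no `call_ended` event yet, so they are simply absent from both results.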
Sum Up
The Infrastructure
[Diagram: Scala Stream Collector, Kinesis, Spark Stream NRT Enrichment, MongoDB, Kinesis, Spark Stream Shredder, Redshift, Visitor API, Looker, Applications]
NRT Benefits
• We can dynamically alter the website while the user is still using it
• We can provide insights on live processes
• Multiple uses to improve conversion:
• Instant inclusion/exclusion from remarketing lists
• Abandoned cart emails/calls
• Social proofing (3 more people are also watching…)
• …
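As one possible reading of the abandoned-cart use case, the sketch below flags sessions whose latest event is older than an idle threshold and is not a completion event. The 15-minute threshold, the `quote_completed` event type and the field names are assumptions for illustration only.

```python
def abandoned_sessions(last_event_by_session, now, idle_s=15 * 60):
    """Return sorted session ids that look abandoned.

    last_event_by_session: {session_id: {'type': str, 'ts': epoch seconds}}
    A session counts as abandoned when its latest event is not the
    hypothetical 'quote_completed' and is older than idle_s seconds.
    """
    return sorted(
        session_id
        for session_id, event in last_event_by_session.items()
        if event["type"] != "quote_completed" and now - event["ts"] > idle_s
    )
```

The resulting ids could then feed an email or call queue, or a remarketing list update.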
Questions?