Rebuilding Web Tracking Infrastructure for Scale

Stephen Oakley, Principal Engineer, Marketo

Upload: hadoop-summit

Post on 07-Jan-2017



What is Marketo?


Page 3Marketo Proprietary and Confidential | © Marketo, Inc. 05/02/2023

What is Web Tracking at Marketo?
• Ingest web page visits and clicks on customer’s website
• Trigger campaigns in response to web activity
• Trigger real-time personalization of web experience
• Provide lead level analytics for known leads
• Provide aggregate analytics for all lead activity
• Typically known leads < 10% of all traffic


Legacy Web Tracking Infrastructure


Legacy Problems
• Throughput limitations: 2 million activities per day
• Processing delays can be on the order of hours
• Large customers cause web server brownouts
• Web reporting does not scale
• Fixed-sized clusters prohibit horizontal scaling
• Brittle infrastructure prevents feature development


The Vision


Orion Initiative
• Increase scale to support IoT for Marketers
• Support billions of marketing activities each day
• Trigger on activities in near real time (< 2 minutes at the 99th percentile)
• Reduce operational costs
• Improve multitenancy and QoS


Requirements


Business Requirements
• 200 MM activities per customer per day
• Near real-time web activity processing (SLA of < 1 minute lag)
• Improve cost efficiency
• Improve flexibility for feature enhancements


Technical Requirements
• Multitenancy support with brownout protections
• Infrastructure must scale horizontally
• Decouple web processing from downstream processing
• Anonymous leads should cost next to nothing to track


Architecture & Design


Why HBase + Phoenix?
• Horizontally scalable
• Leverages the Hadoop cluster for storage and scaling
• Provides secondary indices for query patterns through Phoenix
• Natural integration with JDBC and Spark JDBC RDDs


Why Spark Streaming?
• Micro-batching provides sink-side efficiencies
• This is especially important with MySQL touchpoints
• Great integration with Kafka
• No strict real-time processing requirements
• Great community and industry adoption
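
The sink-side benefit of micro-batching can be illustrated with a small, language-neutral sketch. This is not Spark code and not Marketo's implementation: the `activity` table, the event shape, and the in-memory SQLite database (standing in for a MySQL touchpoint) are all assumptions made for illustration. The point is only that the sink sees one statement batch and one commit per micro-batch rather than one per event.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # (lead_id, url): hypothetical event shape


def micro_batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Yield consecutive lists of at most batch_size events."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def write_batched(conn: sqlite3.Connection, events: Iterable[Event], batch_size: int) -> int:
    """Write events with one executemany and one commit per batch.

    Returns the number of batches, i.e. the number of round trips to the sink.
    """
    batches = 0
    for batch in micro_batches(events, batch_size):
        conn.executemany("INSERT INTO activity (lead_id, url) VALUES (?, ?)", batch)
        conn.commit()
        batches += 1
    return batches


def demo() -> Tuple[int, int]:
    """Write 10 events in batches of 4; return (batches, rows stored)."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL sink
    conn.execute("CREATE TABLE activity (lead_id TEXT, url TEXT)")
    events = [(f"lead-{i % 3}", f"/page/{i}") for i in range(10)]
    batches = write_batched(conn, events, batch_size=4)
    (rows,) = conn.execute("SELECT COUNT(*) FROM activity").fetchone()
    return batches, rows
```

With these assumed numbers, ten events reach the sink in three commits instead of ten; larger batches trade a little latency for fewer round trips.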


Multitenancy
• One topic per customer (sized by volume)
• Traffic storms are isolated to a single customer
• Fairness/throttling is easy to control
• Spark Streaming job consumes from many topics
• Allows us to turn a customer off under error conditions
• See “Elastic Streaming” by Neelesh Shastry, Spark Summit
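
A minimal sketch of how per-customer topics make isolation, throttling, and switch-off straightforward. Everything here is hypothetical: the `web-activity.<customer>` naming scheme, the `TenantRegistry` class, and the per-batch cap are illustrative assumptions, not the talk's actual design or any Kafka or Spark API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


def topic_for(customer_id: str) -> str:
    """Hypothetical naming scheme: one topic per customer."""
    return f"web-activity.{customer_id}"


@dataclass
class TenantRegistry:
    """Tracks which customers are consumed and how much each may take per batch."""

    max_per_batch: Dict[str, int] = field(default_factory=dict)
    disabled: Set[str] = field(default_factory=set)

    def add(self, customer_id: str, cap: int) -> None:
        self.max_per_batch[customer_id] = cap

    def disable(self, customer_id: str) -> None:
        """Turn a customer off, e.g. under error conditions."""
        self.disabled.add(customer_id)

    def active_topics(self) -> List[str]:
        """Topics the streaming job should subscribe to right now."""
        return sorted(
            topic_for(c) for c in self.max_per_batch if c not in self.disabled
        )

    def take(self, customer_id: str, pending: List[str]) -> List[str]:
        """Return at most the customer's cap of pending events (none if disabled)."""
        if customer_id in self.disabled or customer_id not in self.max_per_batch:
            return []
        return pending[: self.max_per_batch[customer_id]]
```

Because each customer has its own topic and its own cap, a traffic storm for one customer only grows that customer's backlog; the other customers' batches are unaffected.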


Making Spark Streaming Performant
• Coalesce small partitions for the same customer
• Aggressive caching of metadata (mostly from MySQL)
• Heavily leverage Scala future composition for parallelism
• Persist RDDs that are used for multiple outputs (e.g. write to Kafka and Activity Service)
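
Three of these ideas (metadata caching, futures for parallel outputs, and computing a batch once for several sinks) can be sketched outside Spark. The talk uses Scala futures and persisted RDDs; the Python below is only an analogy using `functools.lru_cache` and `concurrent.futures`, and the names `customer_metadata`, `enrich`, `fan_out`, and the sink functions are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable, Dict, List

LOOKUPS = {"count": 0}  # counts how often the slow metadata store is hit

Batch = List[Dict[str, str]]


@lru_cache(maxsize=1024)
def customer_metadata(customer_id: str) -> str:
    """Stand-in for a slow metadata query (MySQL in the talk); cached per customer."""
    LOOKUPS["count"] += 1
    return f"plan-for-{customer_id}"


def enrich(batch: Batch) -> Batch:
    """Attach cached metadata to every event in the batch."""
    return [{**event, "plan": customer_metadata(event["customer"])} for event in batch]


def fan_out(batch: Batch, sinks: List[Callable[[Batch], int]]) -> List[int]:
    """Enrich once, then write to all sinks in parallel.

    Materialising `enriched` a single time plays the role that persisting an
    RDD plays in Spark: each output reuses the result instead of recomputing it.
    """
    enriched = enrich(batch)
    with ThreadPoolExecutor(max_workers=len(sinks)) as pool:
        futures = [pool.submit(sink, enriched) for sink in sinks]
        return [f.result() for f in futures]  # results in sink order
```

A caller would pass one function per output, for example a hypothetical Kafka writer and an Activity Service writer, each returning the number of events it accepted.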


Making Anonymous Traffic Cheap
• High costs of web traffic in legacy system
• MySQL storage for all traffic
• Downstream processing of all events (even anonymous)
• V2 only processes and stores known traffic in MySQL
• Defer triggering for anonymous data until promotion
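
One way to read "defer triggering until promotion" is sketched below. The `Tracker` class, the cookie-keyed buffer, and the choice to replay buffered events when a visitor becomes a known lead are assumptions for illustration; the slides do not say how the deferred events are stored or replayed.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class Tracker:
    """Buffers anonymous activity cheaply and triggers only for known leads."""

    def __init__(self) -> None:
        # cookie -> events; stands in for a cheap store outside MySQL
        self.anonymous: DefaultDict[str, List[str]] = defaultdict(list)
        # cookie -> lead id, for visitors that have been identified
        self.known: Dict[str, str] = {}
        # (lead_id, event) pairs handed to downstream campaign triggering
        self.triggered: List[Tuple[str, str]] = []

    def track(self, cookie: str, event: str) -> None:
        lead_id = self.known.get(cookie)
        if lead_id is None:
            self.anonymous[cookie].append(event)  # store only, no triggering
        else:
            self.triggered.append((lead_id, event))

    def promote(self, cookie: str, lead_id: str) -> None:
        """Visitor becomes a known lead: replay the deferred events in order."""
        self.known[cookie] = lead_id
        for event in self.anonymous.pop(cookie, []):
            self.triggered.append((lead_id, event))
```

Under this reading, the large anonymous majority of traffic never reaches the expensive downstream path unless the visitor is later identified.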

Impact and Results
• Rolled out to our highest volume customers
• Processing latencies < 30s (at 99.9th %)
• Allowed key customers to scale from ~2MM/day to > 20 MM/day

Future Work
• Mitigations of straggler effects on processing delays
• Adding sessionization for web reporting
• Scaling Kafka topics as customers increase volume
• Globally distributed ingestion for a single customer


We’re Hiring! Http://Marketo.Jobs

Q & A