Architecture for real-time and batch big data analytics
AppsFlyer Who?!
The world's leading Mobile Measurement Platform
Founded in 2011 by Reshef Mann and Oren Kaniel
Just completed Round B funding - total of $28M
Processing 3.5B daily events (it was 1.9B just 2 months ago and 250M at the start of 2014!)
13 people in the development team (we were just 6 people 12 months ago!)
The Way We Were
• We had no real concept of “Big Data” - we were just occupied with making the system work
• Even though we were ignorant of the future, we tried to adhere to a few abstractions:
- Small isolated services
- Central concept of message delivery via a message bus
- Different tech for different tasks
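A minimal sketch of the "small isolated services around a message bus" abstraction, in Python (the language of the original services). The in-process `MessageBus` class, the `events` channel name, and both handler functions are hypothetical stand-ins for illustration; the real system used Redis as its bus.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MessageBus:
    """Toy in-process pub/sub bus (hypothetical stand-in for a real broker)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: dict) -> int:
        """Deliver the message to every current subscriber; return how many got it."""
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Two isolated "services": each owns its own state and only sees bus messages.
raw_store: List[dict] = []
install_counts: Dict[str, int] = defaultdict(int)


def raw_report_service(event: dict) -> None:
    raw_store.append(event)


def aggregation_service(event: dict) -> None:
    install_counts[event["app_id"]] += 1


bus = MessageBus()
bus.subscribe("events", raw_report_service)
bus.subscribe("events", aggregation_service)
delivered = bus.publish("events", {"app_id": "com.example.app", "type": "install"})
```

Neither service knows the other exists; the only coupling is the channel name and the message shape.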
First Creaks In The System
• CouchDB that served raw reports via views couldn't keep up with view generation
• Python processes that read from the message bus (via pub sub) couldn't keep up with the amount of data
• The split between aggregated and raw data was good, but caused discrepancies because each service failed at a different time
• If the message bus (Redis) failed, every other service failed with it - a single point of failure
Part 1 of The Solution
• Migrate raw reports from CouchDB to Google's BigQuery (easiest solution at the time)
• Rewrite some of the Python services in a new language that:
– Deals better with strings and allocation of memory
– Can help us scale out
– Has a great ecosystem
– Is functional
Why Clojure?
Sequence-based processing capabilities fit the data flow we had visualized for AppsFlyer (processing the event stream)
Enforces use of the FP paradigm more strictly than Scala
REPL-based development
Easy and common Java interop
JVM!
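The deck shows no code. As a rough illustration only, the sequence-style processing of an event stream described above can be approximated in Python with generators; the `app_id,event_type` line format and all function names here are assumptions made for the sketch, not AppsFlyer's actual pipeline (which was written in Clojure).

```python
from typing import Dict, Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Turn 'app_id,event_type' lines into dicts, lazily, one at a time."""
    for line in lines:
        app_id, event_type = line.strip().split(",")
        yield {"app_id": app_id, "type": event_type}


def only_installs(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Keep only install events."""
    return (e for e in events if e["type"] == "install")


def app_ids(events: Iterable[Dict[str, str]]) -> Iterator[str]:
    """Project each event down to its app id."""
    return (e["app_id"] for e in events)


# Each stage is a small pure transformation; composing them describes the flow.
stream = ["app1,install", "app2,click", "app1,click", "app3,install"]
installed = list(app_ids(only_installs(parse(stream))))
```

Nothing is computed until `list(...)` pulls events through the stages, so the same composition works on an unbounded stream.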
The Hurdles (or how 2 stupid mistakes can bite you in the ass 2 years down the road)
• Python's proprietary serialization
• Python custom data structure as the base message in the system
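A small sketch of why a Python-only wire format becomes a hurdle once other languages join the system. The deck does not name the serializer; `pickle` is used here as an assumed example of a Python-specific format, and the message fields are hypothetical.

```python
import json
import pickle

# Hypothetical event message built from plain data only.
message = {"app_id": "com.example.app", "type": "install", "ts": 1400000000}

# pickle produces a Python-specific binary format: convenient between Python
# processes, but a JVM service has no standard way to read it.
pickled = pickle.dumps(message)
from_pickle = pickle.loads(pickled)

# JSON is plain text that any language on the bus can parse.
encoded = json.dumps(message, sort_keys=True)
decoded = json.loads(encoded)
```

Both round trips succeed inside Python; the difference only shows up when the consumer is written in something else.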
How We Model
Small isolated services
Each service has a single business responsibility
Each service encapsulates its own data (if it has any) and exposes it over a well-known interface
Data objects are always POCO/POJO (simple data structures represented in JSON or EDN)
Preference for queues and buffers that pass isolated data for total async processing
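The modeling rules above can be sketched as follows, assuming a hypothetical `InstallCounter` service: one responsibility, private data behind a small interface, plain JSON messages, and a bounded queue between producer and consumer. The class name, queue size, and message fields are illustrative assumptions.

```python
import json
import queue
import threading
from typing import Dict


class InstallCounter:
    """Hypothetical service with one responsibility: count installs per app."""

    def __init__(self, inbox: "queue.Queue") -> None:
        self._inbox = inbox
        self._counts: Dict[str, int] = {}  # private data, never shared directly
        self._lock = threading.Lock()

    def run(self) -> None:
        """Consume JSON messages until a None sentinel arrives."""
        while True:
            raw = self._inbox.get()
            if raw is None:
                break
            event = json.loads(raw)
            if event.get("type") == "install":
                with self._lock:
                    app_id = event["app_id"]
                    self._counts[app_id] = self._counts.get(app_id, 0) + 1

    def get_count(self, app_id: str) -> int:
        """The service's well-known interface to its own data."""
        with self._lock:
            return self._counts.get(app_id, 0)


inbox: "queue.Queue" = queue.Queue(maxsize=1000)  # bounded buffer between services
service = InstallCounter(inbox)
worker = threading.Thread(target=service.run)
worker.start()

events = [
    {"app_id": "a", "type": "install"},
    {"app_id": "a", "type": "click"},
    {"app_id": "a", "type": "install"},
]
for event in events:
    inbox.put(json.dumps(event))  # only plain data crosses the boundary
inbox.put(None)
worker.join()
```

The producer never touches `_counts`; it only puts serialized data on the queue and, if needed, asks through `get_count`.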
How We Test
No QA team
Each new service can read from the event stream to its heart's content - regular Kafka consumer behavior
Each new service handles real-life traffic and real-life load because it's connected to the event stream
Test DB, if needed, is easy to spin up on the cloud
Once deemed ready, just throw the switch to on
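A toy model of why a new service can safely read production traffic: on a log-based stream each consumer group tracks its own offset, so a candidate service reading the log does not take events away from the production one. This `EventLog` class is an in-memory illustration, not a Kafka client, and the group names are hypothetical. As a simplification, a new group here starts from the beginning of the log.

```python
from typing import Dict, List


class EventLog:
    """Toy append-only log with per-consumer-group offsets (not a Kafka client)."""

    def __init__(self) -> None:
        self._records: List[dict] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: dict) -> None:
        self._records.append(record)

    def poll(self, group: str, max_records: int = 100) -> List[dict]:
        """Return the next batch for this group and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = EventLog()
for i in range(5):
    log.append({"id": i})

# The production service drains the log ...
prod = log.poll("production-service")
# ... and the candidate service still sees every event, at its own pace.
shadow_first = log.poll("candidate-service", max_records=2)
shadow_rest = log.poll("candidate-service")
prod_again = log.poll("production-service")
```

"Throwing the switch" then amounts to letting the candidate's output be used; the way it reads the stream does not change.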
How We Orchestrate
Service discovery via Consul
Mesos
Each service is a single uberjar in a single Docker container
DBs get their own dedicated machines
Preference for ring-based architecture for DBs
Marathon
Deployment and Monitoring
Jenkins build server
Deployment via the in-house Santa tool
Consul health checks
StatsD for the JVM and application metrics
Sensu
End-to-end flows
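For the application-metrics part, a minimal sketch of emitting StatsD metrics by hand. StatsD accepts plain-text packets of the form `name:value|type` over UDP (conventionally port 8125); the metric names, host, and helper functions below are hypothetical, and a real service would more likely use an existing client library.

```python
import socket


def statsd_packet(name: str, value, metric_type: str) -> bytes:
    """Format one StatsD metric: 'c' counter, 'g' gauge, 'ms' timer."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(packet: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to an assumed local StatsD daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


counter = statsd_packet("events.processed", 1, "c")
timer = statsd_packet("events.latency", 12, "ms")
# send_metric(counter)  # uncomment when a StatsD daemon is listening
```

Because the transport is UDP, a missing or slow metrics daemon does not block the service that emits the metric.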