flurry analytic backend - processing terabytes of data in real-time

www.flurry.com

November 14, 2013

Anthony Watkins, Senior Director of Developer Relations

Processing Terabytes of Data in Real-Time

@flurrymobile

@antwatkins

Flurry is a leading mobile advertising and analytics provider

Audience

AppCircle Applications: 10,000+

Devices/month: 300M

Conversions/month: 120M

AppSpot Applications: 2,500+

Devices/month: 250M

Impressions/month: 7.5B

Analytics Applications: 400,000

Devices/month: 1.2B

Data points/month: 1.9T

Topics

• Why Flurry switched from a MapReduce framework to pipeline processing

• How Flurry uses Kafka in data processing

• Tuning of Kafka to work in Flurry’s environment

• Flurry monitoring and error handling of streams

The Path to Real-Time Processing

The Why

Past Processing Model

Device Reports

NoSQL DataStore

Collectors

MapReduce

(jobs)

External

Action

Flurry Analytics MapReduce Architecture

Agent Portal Data Log Processor

Developer

Portal Metrics Computer

Hadoop/HBase

Binary Encoded

Raw Data

Log Archive

Metrics Table

(Cube)

Normalized

Data Storage

User Profile

Hadoop Map/Reduce

Web Layer Metrics Processing

Data Collection and Processing in MR

MapReduce

(jobs)

Data Collection and Processing in MR

Device Reports

MapReduce

(jobs)

Job Time

Startup Time

Flurry Kafka

The Move to Kafka

About Kafka

Origin

November 2010 June 2011 November 2012

About Kafka

Producer Producer Producer

Kafka Broker

Consumer Consumer Consumer

About Kafka

Kafka Broker

* Partition image courtesy of http://kafka.apache.org/images/log_anatomy.png
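The partition diagram referenced above shows an append-only log in which each message is addressed by its position. A minimal in-memory sketch of that idea follows; the class name `PartitionLog` and its methods are hypothetical illustrations, not Kafka's API, and the sequential integer offsets are a simplifying assumption.

```python
class PartitionLog:
    """Toy append-only log: each message gets the next sequential offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # New messages only ever go at the end; existing entries never change.
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset, max_messages=10):
        # A reader asks for messages starting at an offset it tracks itself.
        return self._messages[offset:offset + max_messages]

    @property
    def end_offset(self):
        # Offset that the next appended message will receive.
        return len(self._messages)


log = PartitionLog()
offsets = [log.append(f"report-{i}") for i in range(5)]
batch = log.read(2, max_messages=2)
```

Because readers supply their own starting offset, the log itself needs no per-reader state, which is what lets many consumers read the same partition at their own pace.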

About Kafka

Producer 1 Producer N Producer 2

Kafka Cluster

Broker 1

Broker 2

Consumer Group

C1 C2 C3

Why Kafka for Flurry

Device Reports

MapReduce

(jobs) Kafka

Startup

Introducing the Data Log Consumer (DLC)

Agent Portal Data Log Consumer

Developer

Portal Metrics Computer

Hadoop/HBase

Binary Encoded

Metrics Table

(Cube)

Normalized

Data Storage

User Profile

Hadoop Map/Reduce

Web Layer Metrics Processing

Tuning Kafka for Flurry

Challenges

• ZooKeeper timeouts

• Completely async service

• Default fsync interval

• Commit threshold from local environments

How Flurry Uses Kafka

Infrastructure and Setup

Consumer Group

C1 C2 C… C325

Kafka Cluster

B1 B2 B3

Broker

P1 P2 P… P400
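The setup above shows 325 consumers reading from 400 partitions. The slide does not say how partitions are divided among consumers, so the sketch below assumes a simple range-style split (and assumes 400 is the total partition count for the topic). The function `range_assign` and the consumer names are hypothetical.

```python
def range_assign(num_partitions, consumers):
    """Give each consumer a contiguous block of partitions.

    Every consumer gets num_partitions // len(consumers) partitions, and
    the first (num_partitions % len(consumers)) consumers get one extra.
    """
    consumers = sorted(consumers)
    base, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    start = 0
    for index, consumer in enumerate(consumers):
        count = base + (1 if index < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# Hypothetical consumer IDs C001..C325, matching the slide's count.
consumers = [f"C{i:03d}" for i in range(1, 326)]
assignment = range_assign(400, consumers)

with_two = sum(1 for parts in assignment.values() if len(parts) == 2)
with_one = sum(1 for parts in assignment.values() if len(parts) == 1)
print(with_two, with_one)
```

Under this assumption, 400 = 325 × 1 + 75, so 75 consumers would own two partitions each and the remaining 250 would own one. Having more partitions than consumers leaves room to add consumers later without repartitioning.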

Flurry Monitoring / Error Handling

Monitoring

• Alerts

• Consumer Failure

• Broker Failure

Error Handling
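The slide lists alerts and failure cases without saying what is measured. One signal commonly used to detect a stalled or failed consumer is lag: how far a consumer's committed offset trails the end of each partition. The sketch below is an illustration under that assumption, not a description of Flurry's monitoring; the function names, offsets, and threshold are all hypothetical.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: messages written but not yet committed as consumed.

    A partition with no committed offset is treated as starting from 0.
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


def partitions_to_alert(lag, threshold):
    """Partitions whose lag exceeds the alert threshold, in sorted order."""
    return sorted(p for p, behind in lag.items() if behind > threshold)


# Hypothetical snapshot: partition 2 has no committed offset at all,
# as might happen if its consumer died before committing.
end_offsets = {0: 1500, 1: 900, 2: 4000}
committed = {0: 1490, 1: 900}

lag = consumer_lag(end_offsets, committed)
alerts = partitions_to_alert(lag, threshold=1000)
```

A steadily growing lag on a single partition points at its consumer, while lag growing on every partition hosted by one broker points at that broker.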

Next Steps: 0.8

Data Log Consumer

Kafka Cluster

Broker 1

Broker 2

P1’ P3’ P0’ P2’

Next Steps: Extended Pipeline

Input Data

NoSQL DataStore

Real-Time Batch

Collectors

Consumer/

Producer

Systems

MapReduce

(jobs)

External

Action

External

Action

Next Steps: Topics and Consumer Groups

Infrastructure and Setup

Consumer Group 2

C1’ C2’ C… CN’

Topic 1

Consumer Group 1

C1 C2 C… CN

Consumer Group N

C1’’ C2’’ C… CN’’

Topic 2
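The diagram above has several consumer groups reading the same topics. The property that makes this work is that each group tracks its own read position, so one group's progress does not affect another's. A minimal single-partition sketch follows; the `Topic` class and group names are hypothetical and this is not Kafka's client API.

```python
class Topic:
    """Toy single-partition topic with an independent offset per group."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.group_offsets = {}  # group name -> next offset to read

    def poll(self, group, max_messages=2):
        # Each group resumes from its own offset; unknown groups start at 0.
        start = self.group_offsets.get(group, 0)
        batch = self.messages[start:start + max_messages]
        self.group_offsets[group] = start + len(batch)
        return batch


topic = Topic(["m0", "m1", "m2", "m3"])

first = topic.poll("group-1")    # group-1 reads the first two messages
second = topic.poll("group-1")   # ...then continues where it left off
other = topic.poll("group-2", max_messages=3)  # group-2 starts from 0
```

Here `group-2` still receives `m0` even though `group-1` has already consumed the whole topic, which is how a new downstream system can be attached as an additional group without disturbing existing ones.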

anthony@flurry.com

blog.flurry.com

@flurrymobile

@antwatkins

Thank you
