big data in the advertising industry (by michael dewhirst) - big data tech hangout - 2013.10.26


Upload: innovecs

Post on 15-Jan-2015


Category:

Education



DESCRIPTION

On Saturday, 26 October, the second external meeting of the Tech Hangout Community took place at Creative Space 12, a cultural and educational centre in Kiev. The event was held under the motto «Discover the value of Big Data!» Tech Hangout is an event organised by developers, for developers, to share knowledge and experience. The format is a 30-minute talk on a previously agreed topic, followed by a roundtable discussion of the same length. The initiative proved so popular that Tech Hangout soon gained its own logo, blog and Facebook group for discussing the talks. Join the discussion: https://www.facebook.com/groups/techhangout/ Read us: http://hangout.innovecs.com/

TRANSCRIPT

Page 1: Big Data in the advertising industry (by Michael Dewhirst) - Big Data Tech Hangout - 2013.10.26

Big Data in the advertising industry

Michael Dewhirst

Captify CTO; StrikeAd, DevZeroG co-founder

freediver, rock climber, photographer


Who am I?

Born in Moscow, Russia

UK from 1991

Working in Kiev (from London) since 1999

In IT/Software (professionally) since 1994

Ex Java, HTML/JS, ABAP/SAP, .NET (shhh..), Notes, etc developer

Working with Big Data since 2010

Freediving and rock climbing when not working


Companies

StrikeAd (2010-2013: CTO, Co-founder)

Mobile advertising media DSP / trading platform

Processing tens of billions of requests/month

Several “Big Data” solutions in place

Launched in 2010 (co-founded)

Captify (2013-now: CTO)

Search re-targeting company

Processing tens of billions of requests/month

Complex “dual” traffic and data workflow

Launched R&D department 2 months ago


Why is Big Data so key?

Pretty much everything in a business revolves around data and understanding it, and there is exponentially more data to understand every day


What is Big Data

What is big data and what solutions can be classed as such?


What is Big Data

“Internet scale” / Billions of transactions a month

2000-5000+ QPS (queries per second)
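As a rough cross-check of how "billions of transactions a month" relates to the QPS range above, the conversion can be worked through in a few lines. This is only a back-of-the-envelope sketch: it assumes a 30-day month and perfectly even traffic, and the monthly volumes are illustrative values, not figures from the talk.

```python
# Assumption: a 30-day month with traffic spread evenly over every second.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_qps(requests_per_month: float) -> float:
    """Average queries per second for a given monthly request volume."""
    return requests_per_month / SECONDS_PER_MONTH


# Illustrative monthly volumes (hypothetical, chosen to bracket the slide's range).
for monthly in (5e9, 10e9, 13e9):
    print(f"{monthly:.0e} requests/month ~ {average_qps(monthly):,.0f} QPS on average")
```

Under these assumptions, 10 billion requests a month averages out at roughly 3,858 QPS. Real traffic is not flat, so peak QPS will sit well above the monthly average.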


What is Big Data

Processing time of under a second per transaction

Usually sub-100ms


What is Big Data

Ability to aggregate, report and analyse processed data

in near real time or real-time


What data?

Ad slots

Impressions

Clicks

Actions/conversions

Tracking pixels

Data feeds / databases

User ID

IP address

GPS lat long

Site category

Site URL

Age

Gender

Income

Connection type (mobile / wifi)

etc


The Challenge


The Challenge

A lot of volume

which needs retrospective access quickly

(s)


Architecture, Design,

Solutions


Typical architecture

Modules/components:

1. Load Balancing

2. Actual processing

distributed identical workers

3. Logging

4. ETL (Extract Transform Load)

Processing logs, summarising/aggregating by keys

5. Aggregated data

6. “Big DataBase” (sometimes x2)

7. Machine learning
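The ETL step in the list above (processing logs and aggregating them by keys) can be illustrated with a minimal sketch. The tab-separated log format, the field names and the sample records are all hypothetical; the talk does not describe the actual log layout.

```python
from collections import Counter

# Hypothetical raw log lines: event type, site, country (tab-separated).
LOG_LINES = [
    "impression\tsite-a\tUK",
    "impression\tsite-a\tUK",
    "click\tsite-a\tUK",
    "impression\tsite-b\tUA",
]


def aggregate(lines):
    """Extract the fields of each line and count events per (site, country, event) key."""
    totals = Counter()
    for line in lines:
        event, site, country = line.rstrip("\n").split("\t")
        totals[(site, country, event)] += 1
    return totals


summary = aggregate(LOG_LINES)
for key, count in sorted(summary.items()):
    print(key, count)
```

The aggregated counts are far smaller than the raw logs, which is what makes them practical to load into a reporting database.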


Big Data specific features

Load balancing

By geo - routing requests to nearest data centre

By load - usually round robin evenly distributing traffic between available nodes

DNS or software based (or both)
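Round-robin distribution, as mentioned above, can be sketched in software as follows. The class and the worker names are hypothetical and only show the rotation idea; production balancers would also handle health checks and nodes joining or leaving.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out nodes in a fixed rotation so each receives an equal share of requests."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(nodes)

    def pick(self):
        """Return the node that should serve the next request."""
        return next(self._rotation)


# Hypothetical worker names, for illustration only.
balancer = RoundRobinBalancer(["worker-1", "worker-2", "worker-3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```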


Big Data specific features

Storage RW/RO

In-mem only for real time data (sub 100ms access)

On disk for near-line, non-“realtime” access


Big Data specific features

Storage - in-mem (fast) - Sharding

Splitting data across several nodes (e.g. “A-C” - node1; “D-F” - node2, etc) - whole DB does not fit in one server memory

Hashing request data to determine storage node

2 tier architecture:

1) Load balancing tier evenly distributing traffic between available nodes - each LB is identical

2) Data storage tier, only processing relevant requests, each node only stores its chunk/shard of entire “spread out” DB
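The hashing step described above, where request data is hashed to find the storage node, might look like the sketch below. The node names, the choice of MD5 and the simple modulo placement are assumptions made for illustration; the slides do not say which hash or placement scheme was used.

```python
import hashlib

# Hypothetical storage nodes, each holding one shard of the in-memory DB.
NODES = ["node1", "node2", "node3", "node4"]


def shard_for(key: str, nodes=NODES) -> str:
    """Map a key (e.g. a user ID) to the node that stores its shard.

    A stable hash is used instead of Python's built-in hash(), which is
    randomised per process for strings and would route the same key differently
    on different load balancers.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Every identical load balancer computes the same node for the same key.
for user_id in ("user-1", "user-2", "user-3"):
    print(user_id, "->", shard_for(user_id))
```

One known trade-off of plain modulo placement is that changing the number of nodes remaps most keys; consistent hashing is a common alternative when nodes are added or removed often.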


Sharding architecture


Dynamic scaling

Cloud-based hosting charges are usually time-based

Local continental data centres are needed

Traffic usually fluctuates significantly during the day, week, month and year

Cloud based hosting allows quick server/instance commissioning / decommissioning

Instances can be added as traffic trends grow and removed as they drop to save cost
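The scaling rule above can be expressed as a small capacity calculation. This is a sketch only: the per-instance capacity, the 20% headroom, the minimum of two instances and the hourly traffic figures are all hypothetical parameters, not values from the talk.

```python
import math


def instances_needed(current_qps, qps_per_instance, headroom=0.2, minimum=2):
    """Number of instances to run for the current traffic level.

    headroom: spare capacity kept above current traffic (assumed 20%).
    minimum:  floor on the instance count, so a quiet period never leaves
              a single point of failure (assumed value).
    """
    target_qps = current_qps * (1 + headroom)
    return max(minimum, math.ceil(target_qps / qps_per_instance))


# Hypothetical traffic over a day (QPS) and an assumed capacity of 500 QPS per instance.
for hour, qps in [("03:00", 800), ("09:00", 2600), ("13:00", 5100), ("23:00", 1200)]:
    print(hour, qps, "QPS ->", instances_needed(qps, 500), "instances")
```

Because hosting is billed by time, removing instances during the quiet hours is where the saving comes from.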


Other areas

Automatic node updating (there can be hundreds to manage)

Monitoring and alerting (load, space, errors, etc)

Burn in - testing new code on a small cluster before upgrading whole network

Good security - firewalls, local user/file access, etc

Avoid having single points of failure

Old log near-line storage (e.g. Amazon Glacier)


Architecture, design, solutions

Any other “modules”?


Machine learning


What is machine learning?

Automated, algorithmic statistical data analysis and pattern detection


What?!


Used in advertising?

To help find repeatable actions with lowered risk and high expected outcome certainty


Meaning...

Finding links between ad properties to buy more clicks or actions, e.g.

ad shown on site a, during lunch time, ad size 320x600, user from London, etc - CPC likelihood of 10%

user with iPhone, in Central Kiev, having been to dance club sites - 30% likelihood of conversion to taxi advertising
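The simplest version of "finding links between ad properties" is to count how often each combination of properties led to a click. The sketch below does exactly that and nothing more; it is not the speaker's method, and the history records and the `min_impressions` threshold are invented for illustration. Real systems typically replace raw counting with a statistical model.

```python
from collections import defaultdict

# Hypothetical history: (site, time of day, city, was the ad clicked?)
HISTORY = [
    ("site-a", "lunch", "London", True),
    ("site-a", "lunch", "London", False),
    ("site-a", "lunch", "London", False),
    ("site-a", "lunch", "London", False),
    ("site-b", "evening", "Kiev", False),
    ("site-b", "evening", "Kiev", False),
    ("site-c", "morning", "London", True),
]


def click_rates(history, min_impressions=2):
    """Observed click rate per property combination.

    Combinations seen fewer than min_impressions times are skipped,
    because a rate based on one or two impressions is mostly noise.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for site, daypart, city, was_clicked in history:
        key = (site, daypart, city)
        shown[key] += 1
        if was_clicked:
            clicked[key] += 1
    return {key: clicked[key] / shown[key] for key in shown if shown[key] >= min_impressions}


rates = click_rates(HISTORY)
for key, rate in sorted(rates.items()):
    print(key, f"{rate:.0%}")
```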


Vendors and solutions


Vendors and solutions

Apache Hadoop

Nginx

Erlang, OTP, etc

Aerospike

MongoDB

Amazon Redshift

Google BigQuery

Dynamo

PostgreSQL

Memcache

XtremeData


Vendors and solutions

DynDNS

Neustar DNS

Neustar Quova Geo DB

Amazon Route 53

Amazon Load Balancing


Real world examples

• Companies who have big data at their core

Google AdX / DoubleClick

Online and mobile Advertising Exchange

Ad serving

Criteo


Conclusions

A complex, specialised industry and software development sub-category

Technically challenging by an order of magnitude

NOT only for “special” people - anybody can get in - I did

Genuinely interesting to work in


Questions?


The end

Thank you!