Dale Wong - Spark GraphX Demo

A Different Look at Ad Targeting: A Swarm of Ads

Uploaded by: dalewong108

Posted on 18-Aug-2015

Nature-Inspired Algorithm for AdTech Model Exploration

• Pages are linked by similarity, forming a network of branches

• Ads are like butterflies, drifting towards attractors

• The flight of the butterflies is a function of local attraction to pages, plus some randomness to escape local minima

• System converges to ads hovering around relevant pages
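
The flight rule above can be sketched as a single move for one ad. This is only an illustration under stated assumptions, not the deck's implementation: the names `nextPage`, `attraction`, and `jumpProb` are hypothetical, attraction is assumed to be one score per page, and the random part is assumed to be a uniform jump to a neighboring page.

```scala
import scala.util.Random

object ButterflyStep {
  type PageId = Long

  /** Pick the next page for one ad (hypothetical model, not from the slides).
    * With probability `jumpProb` the ad moves to a uniformly random neighbor,
    * which is the randomness that lets it leave a local optimum. Otherwise it
    * moves to the most attractive page among its current page and neighbors.
    */
  def nextPage(
      current: PageId,
      neighbors: Map[PageId, Seq[PageId]],
      attraction: Map[PageId, Double],
      jumpProb: Double,
      rng: Random
  ): PageId = {
    val adjacent = neighbors.getOrElse(current, Seq.empty)
    if (adjacent.isEmpty) current // isolated page: nowhere to go
    else if (rng.nextDouble() < jumpProb) adjacent(rng.nextInt(adjacent.length))
    else (current +: adjacent).maxBy(p => attraction.getOrElse(p, 0.0))
  }
}
```

With `jumpProb = 0.0` the move is purely greedy, so an ad stays put once no neighbor is more attractive than its current page. Raising `jumpProb` trades convergence speed for more exploration.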

Similarity Graph

vertex = page

edge = similarity

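// Keep each unordered pair of pages once (a._1 is the page id)
val allPairs = pages.cartesian(pages).filter { case (a, b) => a._1 < b._1 }

// Two pages are similar if their word lists (._2) share at least one word
val similarPairs = allPairs.filter { case (page1, page2) =>
  page1._2.intersect(page2._2).length >= 1
}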
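
The same pairing logic can be tried on plain Scala collections without a cluster. The sketch below assumes each page is an `(id, words)` pair, as the `._1` and `._2` accessors on the slide suggest. The shared-word count returned as an edge weight is an added assumption, not something shown in the deck.

```scala
object SimilarityEdges {
  // Assumed page shape: (page id, list of words on the page)
  type Page = (Long, Seq[String])

  /** Undirected similarity edges as (smallerId, largerId, sharedWordCount). */
  def similarPairs(pages: Seq[Page]): Seq[(Long, Long, Int)] =
    for {
      (idA, wordsA) <- pages
      (idB, wordsB) <- pages
      if idA < idB // keep each unordered pair once and drop self-pairs
      shared = wordsA.distinct.intersect(wordsB.distinct).length
      if shared >= 1 // same threshold as the slide: at least one common word
    } yield (idA, idB, shared)
}
```

Like `cartesian`, this compares every page with every other page, so the work grows quadratically with the number of pages.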

Data Set

• Kaggle 2012 Challenge: Click-Thru Rate Prediction

• Actual data provided by a Chinese search company

• CSV files

• 26M search queries

  • each query has its list of words

  • e.g. “data scientist”

• 4M ads

  • each ad has its list of words

  • e.g. “Insight Data Engineering Program”

Ads Converge on Similar Pages Over Time

Butterfly Simulation is a Good Fit for Spark GraphX

• Many parallel computations of localized operations

• ad migration

• attraction propagation

• select ad for page request
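
To illustrate what "localized" means for one of these operations, here is a possible form of attraction propagation. The deck does not give the update rule, so the one below is an assumption: each page keeps the larger of its own score and a decayed copy of its best neighbor's score. The names `propagate` and `decay` are hypothetical.

```scala
object AttractionPropagation {
  type PageId = Long

  /** One synchronous round of propagation (assumed rule, not from the slides).
    * Each page reads only its own score and its neighbors' scores, so every
    * page can be updated independently of the others.
    */
  def propagate(
      neighbors: Map[PageId, Seq[PageId]],
      attraction: Map[PageId, Double],
      decay: Double
  ): Map[PageId, Double] =
    attraction.map { case (page, own) =>
      val bestNeighbor = neighbors
        .getOrElse(page, Seq.empty)
        .map(n => attraction.getOrElse(n, 0.0))
        .foldLeft(0.0)((a, b) => math.max(a, b))
      page -> math.max(own, decay * bestNeighbor)
    }
}
```

Because no page needs data beyond its immediate neighbors, a round like this maps naturally onto one message exchange along the graph's edges.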

Spark GraphX Google Pregel API

MapReduce for each vertex:

1. Send Messages (for each edge)

• send msgs to neighbors

2. Merge Messages

• merge msgs to same vertex

3. Vertex Program

• process incoming msgs

[Diagram labels: Vertex Program, Send Message, Merge Messages]
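
The three steps can be mimicked on an in-memory graph to see how they fit together. This is a local stand-in written with plain Scala collections, not GraphX itself. The parameter names `sendMsg`, `mergeMsg`, and `vprog` follow the roles GraphX's Pregel operator uses, but the signatures here are simplified assumptions, and `MiniPregel.superstep` is a hypothetical helper.

```scala
object MiniPregel {
  type VertexId = Long

  /** One superstep over an in-memory graph (simplified local sketch). */
  def superstep[V, M](
      vertices: Map[VertexId, V],
      edges: Seq[(VertexId, VertexId)],
      sendMsg: (VertexId, V, VertexId, V) => Seq[(VertexId, M)],
      mergeMsg: (M, M) => M,
      vprog: (VertexId, V, M) => V
  ): Map[VertexId, V] = {
    // 1. Send Messages: called once per edge, may address either endpoint
    val sent: Seq[(VertexId, M)] = edges.flatMap { case (src, dst) =>
      sendMsg(src, vertices(src), dst, vertices(dst))
    }
    // 2. Merge Messages: combine everything addressed to the same vertex
    val inbox: Map[VertexId, M] = sent.groupBy(_._1).map { case (id, msgs) =>
      id -> msgs.map(_._2).reduce(mergeMsg)
    }
    // 3. Vertex Program: update vertices that received a message
    vertices.map { case (id, value) =>
      id -> inbox.get(id).map(m => vprog(id, value, m)).getOrElse(value)
    }
  }
}
```

As a usage example, sending each endpoint's value across every edge, merging with `max`, and keeping the larger value in the vertex program spreads the maximum value one hop per superstep.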

Need to Adapt Programming Model

• Adapt Vertex-centric algorithm to Edge-centric API

• Replicate each vertex’s data onto its neighbors

• Replication implemented as an initialization phase Pregel cycle

• Localizes vertex calculations

• With a Spark GraphX cluster, network bandwidth is more of a concern than storage
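
The replication phase described above can be sketched as one send/merge/store cycle. This is a plain-collections illustration under assumptions, not the deck's GraphX code: `Replicated` and `replicate` are hypothetical names, and each vertex is assumed to carry a single data value.

```scala
object ReplicateNeighbors {
  type VertexId = Long

  /** Vertex state after initialization: its own data plus a local copy of
    * each neighbor's data (hypothetical structure).
    */
  final case class Replicated[D](own: D, neighborCopies: Map[VertexId, D])

  def replicate[D](
      data: Map[VertexId, D],
      edges: Seq[(VertexId, VertexId)]
  ): Map[VertexId, Replicated[D]] = {
    // Send: each edge carries both endpoints' data to the opposite endpoint
    val msgs: Seq[(VertexId, Map[VertexId, D])] = edges.flatMap { case (a, b) =>
      Seq(b -> Map(a -> data(a)), a -> Map(b -> data(b)))
    }
    // Merge: union the per-neighbor maps that arrive at the same vertex
    val inbox = msgs.groupBy(_._1).map { case (id, ms) =>
      id -> ms.map(_._2).reduce(_ ++ _)
    }
    // Vertex program: store the merged copies next to the vertex's own data
    data.map { case (id, own) =>
      id -> Replicated(own, inbox.getOrElse(id, Map.empty[VertexId, D]))
    }
  }
}
```

After this cycle a vertex can run its calculation from `neighborCopies` alone. The cost is extra storage per vertex, which fits the slide's point that storage matters less than network traffic.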

About Dale

• Bachelor's in Computer Science, UC Berkeley

• Algorithm development for semiconductor design

• Algorithm development for genomic analysis

• Co-founder of three startups

• 18 US patents granted

• I hate vacationing in nature