HOW ROGERS MEDIA TURNS BIG DATA INTO REAL-TIME CUSTOMER INSIGHTS SAS GLOBAL FORUM 2014 03.24.2014


Page 1: SAS GLOBAL FORUM 2014...–40+ million records by 600+ traits; anonymized data (non PII) –With HPA no need for oversampled data! - Data preparation for the data for modeling example:

HOW ROGERS MEDIA TURNS BIG DATA

INTO REAL-TIME CUSTOMER INSIGHTS

SAS GLOBAL FORUM 2014

03.24.2014

Page 2

AGENDA…

–Who Is Rogers Media?

–What Were the Business Initiatives and Challenges

–What Were the Technical Challenges?

–Big Data Journey

–Results

–Lessons Learned

–Q&A

Page 3

ABOUT OUR TEAM… AUDIENCE SOLUTIONS

Chris Dingle (@cdingle)

–Sr. Dir. Audience Solutions

–Teams

–We’re in Canada; I play ice hockey on Sunday evenings

Page 4

ABOUT ROGERS MEDIA

–Great Brands

–Media advertising revenue a priority

–Audience Strategy the future

2013 CONSOLIDATED REVENUE BY SEGMENT (%)

Page 5

AUDIENCE RETROSPECTIVE

"Half the money I spend on advertising is wasted; the trouble is I don't know which half."

– John Wanamaker

A popular saying illustrating how difficult it is to reach potential customers using traditional advertising, attributed to legendary Philadelphia retailer John Wanamaker in the 1870s.

Page 6

AUDIENCE PROGRAMMATIC ADVERTISING


Page 7

ROGERS MEDIA AUDIENCE PLATFORM

UNIQUE & VALIDATED

15MM CANADIANS

5,000+ ATTRIBUTES

A blend of online & offline data

Subscription, ecommerce, loyalty programs, etc.

BIG FLEXIBLE

CUSTOM SOLUTIONS

CAMPAIGN OPTIMIZATION

Page 8

TYING IT TOGETHER: AUDIENCES

INPUTS

Predicted segments (prospects, customer profiles)

CLIENT SEGMENTS

PROSPECTS

BEHAVIOURAL

Will work with Client, Marketing to obtain the necessary inputs to start

Filtering criteria as defined by Client will be applied to create modelled segments

Utilize internal & client data sources to best determine lifestyle values and personality characteristics

Utilize internal data sources such as online click stream to provide additional behavioural insight

OPTIMIZATION: Apply sophisticated predictive analytical techniques to further ‘optimize’ the Client customer base, thus increasing response while reducing the COA

AUDIENCE TYING IT TOGETHER

Page 9

PRIMARY AUDIENCE

Client Life Prospects

Research Study

Segment

Prospects

Email Retargeting

Audiences

Premium Inventory

1. Begin by segmenting the client’s primary audience that is recommended based on the target group provided.

2. Through the Rogers Trade Desk, execute the campaign programmatically, focussing on the intersection between the primary audience and overlaid segments of interest to profile.

3. This includes validated demographic data and in-house market research.

4. Finally, this online and offline data will be hybridized with modelled segments of conversion propensity programs, derived from actual conversion data on top Rogers loyalty products.

DYNAMIC Standard Display, Facebook Exchange (FBX), Pre-roll.

AUDIENCE DELIVERY

Page 10

AUDIENCE BUSINESS CHALLENGES

1. UNDERSTAND AUDIENCE: Having the largest volume of data sets, audience segments/profiles in Canada while leading the Canadian marketplace in privacy and governance

2. FIND AUDIENCE: Being leaders in identifying and targeting audiences across channels, platforms and devices

3. ENGAGE AUDIENCE: Driving engagement across platforms and formats

4. MEASURE AUDIENCE: Exceeding client expectations with transparent reporting and the most accurate attribution models

Page 11

THE JOURNEY TO BIG DATA

- From the beginning:

- Selection process and criteria on choosing the right partner(s)

- The decision

- SAS and Hortonworks’ role in this journey

- Current set-up

Page 12

THE JOURNEY TO BIG DATA - ORGANIZATION

- Approach: Split Big Data opportunities into two initiative teams

- Telecommunication data storage rationalization -> Longer time frame

- Audience Platform: Lean / Fast time frame -> This Presentation

- Share learnings across teams

Page 13

AUDIENCE PLATFORM - OPEN SOURCE DATA

― From initial meeting to installed cluster: 5 days; Wow! [Q2 2013]

― Company wide standard determined [Q1 2014]

― Hortonworks as a leader:

― Partnerships: Teradata, Microsoft, Talend, etc.

― Management team: Open Source track record

― Highest number of Apache code committers

― What about SAS? (accelerating the roadmap)

― The Forrester Wave: Q1 2014

Page 14

AUDIENCE PLATFORM – THE DATA LAKE

- Land massive click stream log files:

- 100+ M records / day;

- 30 million unique IDs / month

- Cost effective / competitive

- Lean methodology

- Landed data always available if requirements should change

- Data definition on read

- Interestingly: have yet to install purchased Columnar database

- Adoption of the Data Lake framework
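The "data definition on read" point above can be illustrated with a minimal Python sketch. It is not the Rogers pipeline: the tab-delimited layout, the field names, and the sample rows are all hypothetical, chosen only to show that raw lines are landed untouched and a schema is applied when a question is asked.

```python
import csv
import io

# Raw log lines are landed exactly as received; nothing is parsed at write time.
# (Hypothetical sample rows and layout, for illustration only.)
RAW_CLICKSTREAM = (
    "2014-03-01T10:15:00\tu123\t/home/wireless/phones\n"
    "2014-03-01T10:16:30\tu456\t/home/tv/packagespricing\n"
    "2014-03-01T10:17:05\tu123\t/myrogers\n"
)

# Two hypothetical schemas applied to the same landed bytes.
PAGE_VIEW_SCHEMA = ("timestamp", "visitor_id", "page")
VISITOR_ONLY_SCHEMA = ("timestamp", "visitor_id")


def read_with_schema(raw_text, schema):
    """Parse tab-delimited raw text, keeping only the leading fields the schema names."""
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    # zip() stops at the shorter sequence, so a narrower schema ignores extra columns.
    return [dict(zip(schema, row)) for row in reader]


page_views = read_with_schema(RAW_CLICKSTREAM, PAGE_VIEW_SCHEMA)
visitors = {row["visitor_id"] for row in read_with_schema(RAW_CLICKSTREAM, VISITOR_ONLY_SCHEMA)}
```

If requirements change, only the schema tuple changes; the landed data is read again rather than reloaded.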

Page 15

AUDIENCE PLATFORM – THE DATA LAKE

Page 16

AUDIENCE REFERENCE ARCHITECTURE

Modeled after reference architectures including: Cisco FlexPod, etc.

Consists of the following:

• Fabric & Switch Interconnects

• Compute & Memory

• NetApp E Series Storage

Page 17

AUDIENCE PLATFORM – ANALYTICS VENDOR

Lean methodology: Analysts UI

- SAS R&D Lead: anything is possible

- Looked into R derivatives; (MPP closed source vendors); and Apache Mahout

- Benchmark with machine learning: Vowpal Wabbit: SGD, LDA

- For enterprise wide organization adoption, settled on SAS EM

- Wanted lead clients to be able to interact with the analytics

- Democratization of analytics innovation important

Page 18

TECHNICAL CHALLENGES: ANALYTICS

SAS + Hadoop

SAS HPA:

- Data stays in place

- Modeling and Scoring takes place in Hadoop

Jobs

Setup Modeling, Validation and Scoring Jobs

Model Development Validation, Scoring

SAS Desktop / Server

Page 19

AUDIENCE PLATFORM – ANALYTICS VENDOR

Key Benefits:

– Eliminate data movement (ETL) between the data store (Hadoop) and the analyst SAS Server/Desktop for dynamic discovery

– Explore and visualize the impact of many variables and traits on key performance metrics

– Use 100% of the data for analysis and visualization instead of smaller random samples (over sampling)

– Streamlined report distribution
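To make concrete what using 100% of the data avoids: when a model is trained on an oversampled set, its predicted probabilities are usually adjusted back to the population event rate before use. The sketch below applies the standard prior-correction formula in Python. The function name and the rates are hypothetical and do not come from the presentation.

```python
def correct_for_oversampling(p_sample, population_rate, sample_rate):
    """Map a probability from a model trained on an oversampled set back to the population scale.

    p_sample:        predicted event probability from the oversampled model
    population_rate: true event rate in the full population
    sample_rate:     event rate in the oversampled training set
    """
    event = p_sample * population_rate / sample_rate
    non_event = (1.0 - p_sample) * (1.0 - population_rate) / (1.0 - sample_rate)
    return event / (event + non_event)


# Hypothetical numbers: 1% response rate in the population, training set balanced 50/50.
adjusted = correct_for_oversampling(0.5, population_rate=0.01, sample_rate=0.5)
```

When the training data is the full population, the two rates are equal and the correction leaves the score unchanged, which is why the adjustment step disappears.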

Page 20

TECHNICAL CHALLENGES: SAS + HORTONWORKS HADOOP

[Architecture diagram. Hortonworks Data Platform: HDFS, MapReduce, Hive & HCatalog, Pig, HBase; Load & Extract: Sqoop, Flume, NFS, WebHDFS; Operate: Ambari, Oozie. SAS components: SAS® LASR™ Analytic Server & SAS® High-Performance Analytics (MPI based); Base SAS & SAS/ACCESS® Interface to Hadoop™; In-Memory Data Access; SAS Metadata. Client tools: SAS® Display Manager, SAS® Visual Analytics, SAS® Enterprise Miner™, SAS® Data Integration, SAS® Enterprise Guide®. Users: SAS® User, Next-Generation SAS® User.]

Page 21

RESULTS #1: LOOKALIKE MODELING

- Use case 1: ‘Lookalike Model’:

–40+ million records by 600+ traits; anonymized data (non PII)

–With HPA no need for oversampled data!

- Data preparation for the modeling example:

- Data is pulled from the Primary Hadoop Cluster into memory on the SAS HPA Cluster through the embedded process (EP).

- Several summarization steps are applied using SAS high-performance procedures in the SAS HPA cluster. The SAS Compute Server in the SAS Client Application cluster does additional processing on the data. The summary and transposition steps produce the Analytical Base Table (SAS HDAT) from which analytics are performed.
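The transposition step can be pictured with a small Python sketch. It is not the SAS procedure used by the team: the long-format input, the function name, and the pairing of IDs with traits are hypothetical. It only shows how one row per (ID, trait) observation becomes one wide row per ID with a 0/1 column per trait.

```python
from collections import defaultdict

# Hypothetical long-format input: one row per (anonymized ID, trait ID) observation.
LONG_ROWS = [
    ("id_001", 455544),
    ("id_001", 465357),
    ("id_002", 455545),
    ("id_002", 520123),
    ("id_003", 455544),
]


def build_base_table(long_rows):
    """Transpose long (id, trait) rows into one wide 0/1 indicator row per id."""
    traits_by_id = defaultdict(set)
    for record_id, trait_id in long_rows:
        traits_by_id[record_id].add(trait_id)

    # One column per distinct trait, in a stable sorted order.
    columns = sorted({trait_id for _, trait_id in long_rows})
    table = {
        record_id: [1 if trait_id in traits else 0 for trait_id in columns]
        for record_id, traits in traits_by_id.items()
    }
    return columns, table


columns, base_table = build_base_table(LONG_ROWS)
```

At 40+ million records by 600+ traits the same reshaping would have to run in a distributed engine rather than a single Python process; the sketch only shows the shape of the result.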

Page 22

RESULTS #1: COMPUTE/MEMORY RESOURCE

Page 23

RESULTS #1: TRANSPOSED SAMPLE VIEW

Page 24

RESULTS #1: TRAITS CORRELATION VIEW

Page 25

RESULTS #1: TRAITS DISTRIB. BY TARGET

Page 26

RESULTS #1: EM DIAGRAM WORKFLOW

Page 27

RESULTS #1: MODEL RESULTS COMPARISON

Page 28

RESULTS #1: TRAITS SELECTED IN MODEL

Trait ID  Trait Name

442515 O&O - Search - Referrer Domain - live.com

455544 Email Opt-in

455545 Email Opt-out

462895 rogers.com account info - Signed In

463110 rogers.com - page - myrogers

463286 O&O - home:wireless:phones

463289 O&O - home:wireless:payasyougo

463290 O&O - home:wireless:travel

463294 O&O - home:tv:packagespricing

465357 LR-Gender-Female

465358 LR-Gender-Male

465361 LR-Email Opt-in

465362 LR-Email Opt-out

472910 L'actualite subscription (offline and email)

472926 Male

472930 Non business address

472931 Business

496718 LR - MAGAZINE_PAPER - 1

520074 Rogers Rewards Site Visitor

520123 Fido.ca EN Visitors

550541 TSC Special Offers NL Subscriber

613366 Fido.ca FR Visitors

Page 29

RESULTS #2: CAMPAIGN OPTIMIZATION

- Use case 2: ‘Campaign Optimization’:

- Multivariate – interactions

- Goal: without over-sampling

- Observations: 40 Million+

- Traits, parameters: 100s

- Tradeoff: oversample

User Habits Segmentation (visitor, conversion, recency, preference etc.)

Behavior Typology Segmentation (Sport Fun, Video Shopper etc.)

Value Segmentation (premium click, low value click etc.)

Intersection between segments

Category/Sub-category Segmentation

Life stage segmentation (age, gender, income, adults married 35-55 years with kids etc.)

Page 30

RESULTS #2: CAMPAIGN OPTIMIZATION

Page 31

RESULTS #2: RESPONSE OPTIMIZATION

Page 32

RESULTS #2: CAMPAIGN OPTIMIZATION

Page 33

RESULTS #2: CAMPAIGN OPTIMIZATION ROC

Page 34

LESSONS LEARNED ALONG THE WAY

• Before You Start….

– Collaboration: Tom Sawyer…

– Organizational buy-in is important

– Establish data governance and consumer privacy compliance

• When You Start….

– Create the team

– Need people to learn: Hive and Pig

– Passionate and Innovative

Page 35

LESSONS LEARNED ALONG THE WAY

― Lessons learned: SAS HPA:

– HPA environment needs to have sufficient disk allocated (not 1:1 disk/memory as originally expected) to allow flexibility for analysts to persist datasets and optimize model building process

– Mid Tier (App/meta/mid-tier) still plays an important role, as not all procs are available on HPA. Short wish list: proc Transpose, interactive decision trees

– For current architectures, separation of the SAS Hadoop cluster and the primary Hadoop Data Lake cluster is a tactical approach to optimized infrastructure (and license). Looking for future versions of YARN to see the two clusters merge into a heterogeneous set of data applications and hardware platforms to form mature data lake architectures.

Page 36

LESSONS LEARNED ALONG THE WAY

― Watch the timing and memory parameters:

― set mapreduce.task.timeout to 900000 (milliseconds) or higher for SAS 9.4 M1 (short term solution)

― identified a 1536 MB limit; keep configurations consistent across clusters

― There are a multitude of data management tasks that need to be considered when embarking on these kinds of analytical projects. HDP + SAS makes it faster by shortening the time from ingest to analytical model considerably. Over time, data management practices need to be put into place.
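The advice to watch timing parameters and keep configurations consistent across clusters could be checked with a short script. The Python sketch below parses Hadoop-style `<configuration>` XML and reports differences. The cluster names, the inline XML, and the helper functions are hypothetical. The slide does not say which property the 1536 MB limit applies to, so `mapreduce.map.memory.mb` is used purely as an illustrative second key.

```python
import xml.etree.ElementTree as ET

MIN_TASK_TIMEOUT_MS = 900000  # threshold quoted on the slide


def load_properties(xml_text):
    """Read name/value pairs from Hadoop-style <configuration> XML."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


def check_clusters(configs, keys):
    """Return a list of problems: keys that differ across clusters, timeouts below the minimum."""
    problems = []
    for key in keys:
        values = {cluster: props.get(key) for cluster, props in configs.items()}
        if len(set(values.values())) > 1:
            problems.append(f"{key} differs across clusters: {values}")
    for cluster, props in configs.items():
        timeout = props.get("mapreduce.task.timeout")
        if timeout is None or int(timeout) < MIN_TASK_TIMEOUT_MS:
            problems.append(
                f"{cluster}: mapreduce.task.timeout is {timeout}, "
                f"expected >= {MIN_TASK_TIMEOUT_MS}"
            )
    return problems


def make_config(timeout_ms, map_memory_mb):
    """Build a small hypothetical mapred-site.xml body for the example."""
    return (
        "<configuration>"
        f"<property><name>mapreduce.task.timeout</name><value>{timeout_ms}</value></property>"
        f"<property><name>mapreduce.map.memory.mb</name><value>{map_memory_mb}</value></property>"
        "</configuration>"
    )


# Hypothetical clusters: the data lake has the raised timeout, the SAS cluster does not.
configs = {
    "data_lake": load_properties(make_config(900000, 1536)),
    "sas_hpa": load_properties(make_config(600000, 1536)),
}
problems = check_clusters(configs, ["mapreduce.task.timeout", "mapreduce.map.memory.mb"])
for line in problems:
    print(line)
```

In practice the XML would be read from each cluster's configuration files rather than built inline.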

Page 37

LESSONS LEARNED ALONG THE WAY - TEAM


― Big Data Analytic talent can be acquired or developed

― When looking for talent keep in mind ‘Boomerang’ Employees

― Recruit by offering interesting and challenging work –Big Data SAS

― Lean: already know the organization.

― Audience team has hired three!

― hbr.org, boomerang-employees, Feb 2014

― Ref: boomerangs.org

Page 38

THANK YOU

Page 39

AUDIENCE PRIVACY COMPLIANCE

• Rogers Senior Regulatory, Privacy Counsel and the Rogers legal team partnered to create a Privacy Compliance framework with respect to online behavioral advertising

– The framework is driven by Canadian federal regulations and industry best practices

• Ad Choices Canada member - the industry framework

– Similar to the best practices in the U.S., the framework gives the customer complete flexibility to opt out

• All data exposed to advertising platforms is anonymous and aggregated into segments to ensure that no personally identifiable information (PII) is exposed