HOW ROGERS MEDIA TURNS BIG DATA
INTO REAL-TIME CUSTOMER INSIGHTS
SAS GLOBAL FORUM 2014
03.24.2014
AGENDA…
–Who Is Rogers Media?
–What Were the Business Initiatives and Challenges?
–What Were the Technical Challenges?
–Big Data Journey
–Results
–Lessons Learned
–Q&A
ABOUT OUR TEAM… AUDIENCE SOLUTIONS
Chris Dingle (@cdingle)
–Sr. Dir. Audience Solutions
–Teams
–We’re in Canada, I play ice hockey Sunday evenings
ABOUT ROGERS MEDIA
–Great Brands
–Media advertising revenue a priority
–Audience Strategy the future
2013 CONSOLIDATED REVENUE BY SEGMENT (%)
AUDIENCE RETROSPECTIVE
"Half the money I spend on advertising is
wasted; the trouble is I don't know which half."
– John Wanamaker
A popular saying illustrating how difficult it is to
reach potential customers using traditional
advertising, attributed to legendary Philadelphia
retailer John Wanamaker in the 1870s.
ROGERS MEDIA AUDIENCE PLATFORM
UNIQUE &
VALIDATED
15MM
CANADIANS
5,000+
ATTRIBUTES
A blend of online
& offline data
Subscription,
ecommerce, loyalty
programs, etc.
BIG FLEXIBLE
CUSTOM
SOLUTIONS
CAMPAIGN
OPTIMIZATION
TYING IT TOGETHER: AUDIENCES
INPUTS
Predicted segments (prospects, customer profiles)
CLIENT SEGMENTS
PROSPECTS
BEHAVIOURAL
Will work with Client, Marketing to obtain the necessary inputs to start
Filtering criteria as defined by Client will be applied to create modelled segments
Utilize internal & client data sources to best determine lifestyle values and personality characteristics
Utilize internal data sources such as online click stream to provide additional behavioural insight
OPTIMIZATION
Apply sophisticated predictive analytical techniques to further ‘optimize’ the Client customer base, thus increasing response while reducing the COA
AUDIENCE TYING IT TOGETHER
PRIMARY AUDIENCE
Client Life Prospects
Research Study
Segment
Prospects
Email Retargeting
Audiences
Premium Inventory
1. Begin by segmenting the client’s primary audience, as recommended based on the target group provided.
2. Through the Rogers Trade Desk, execute the campaign programmatically, focussing on the intersection between the primary audience and overlaid segments of interest to profile.
3. This includes validated demographic data and in-house market research.
4. Finally, this online and offline data will be hybridized with modelled segments of conversion propensity programs, derived from actual conversion data on top Rogers loyalty products.
DYNAMIC Standard Display, Facebook Exchange (FBX), Pre-roll.
AUDIENCE DELIVERY
AUDIENCE BUSINESS CHALLENGES
1. UNDERSTAND AUDIENCE
Having the largest volume of data sets, audience segments/profiles in Canada while leading the Canadian marketplace in privacy and governance
2. FIND AUDIENCE
Being leaders in identifying and targeting audiences across channels, platforms and devices
3. ENGAGE AUDIENCE
Driving engagement across platforms and formats
4. MEASURE AUDIENCE
Exceeding client expectations with transparent reporting, the most accurate attribution models
THE JOURNEY TO BIG DATA
- From the beginning:
- Selection process and criteria on choosing the right partner(s)
- The decision
- SAS and Hortonworks’ role in this journey
- Current set-up
THE JOURNEY TO BIG DATA - ORGANIZATION
- Approach: Split Big Data opportunities into two initiative teams
- Telecommunication data storage rationalization -> Longer time frame
- Audience Platform: Lean / Fast time frame -> This Presentation
- Share learnings across teams
AUDIENCE PLATFORM - OPEN SOURCE DATA
― From initial meeting to installed cluster: 5 days; Wow! [Q2 2013]
― Company wide standard determined [Q1 2014]
― Hortonworks as a leader:
― Partnerships: Teradata, Microsoft, Talend, etc.
― Management team: Open Source track record
― Highest number of Apache code committers
― What about SAS? (accelerating the roadmap)
― The Forrester Wave: Q1 2014
AUDIENCE PLATFORM – THE DATA LAKE
- Land massive click stream log files:
- 100+ M records / day;
- 30 million unique IDs / month
- Cost effective / competitive
- Lean methodology
- Landed data always available if requirements should change
- Data definition on read
- Interestingly: have yet to install purchased Columnar database
- Adoption of the Data Lake framework
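A minimal Python sketch of the "data definition on read" idea above. The log lines, field names and the read_with_schema helper are hypothetical, not the team's actual format or code: raw records are landed untouched, and the field definition is applied only when the data is read, so a changed requirement needs a new read rather than a reload.

```python
import json

# Hypothetical raw click stream lines, landed in the lake exactly as received.
RAW_LINES = [
    '{"id": "u1", "url": "/home/wireless/phones", "ts": "2013-06-01T10:00:00"}',
    '{"id": "u2", "url": "/home/tv/packagespricing", "ts": "2013-06-01T10:01:00", "ref": "live.com"}',
    '{"id": "u1", "url": "/myrogers", "ts": "2013-06-01T10:05:00"}',
]


def read_with_schema(lines, fields):
    """Apply a field definition at read time.

    Only the requested fields are kept; fields missing from a record
    come back as None. The landed lines themselves are never modified.
    """
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# First requirement: page views per ID.
views = list(read_with_schema(RAW_LINES, ["id", "url"]))

# Requirements change later: the referrer is now needed.
# The same landed data is simply read again with a new definition.
referrers = list(read_with_schema(RAW_LINES, ["id", "ref"]))
```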
AUDIENCE PLATFORM – THE DATA LAKE
AUDIENCE REFERENCE ARCHITECTURE
Modeled after reference architectures
including: Cisco Flexpod, etc.
Consists of the following:
• Fabric & Switch Interconnects
• Compute & Memory
• NetApp E Series Storage
AUDIENCE PLATFORM – ANALYTICS VENDOR
Lean methodology: Analysts UI
- SAS R&D Lead: anything is possible
- Looked into R derivatives; (MPP closed source vendors); and
Apache Mahout
- Benchmark with machine learning: Vowpal Wabbit: SGD, LDA
- For enterprise wide organization adoption, settled on SAS EM
- Wanted lead clients to be able to interact with the analytics
- Democratization of analytics innovation important
TECHNICAL CHALLENGES: ANALYTICS
SAS + Hadoop
SAS HPA:
- Data stays in place
- Modeling and Scoring takes place in Hadoop
Jobs
Setup Modeling, Validation and Scoring Jobs
Model Development Validation, Scoring
SAS Desktop / Server
AUDIENCE PLATFORM – ANALYTICS VENDOR
Key Benefits:
– Eliminate Data movement (ETL) between Data Store (Hadoop) and Analyst SAS
Server/Desktop for dynamic discovery
– Explore and Visualize the impact of many variables and traits on Key
Performance metrics
– Use 100% of the data for Analysis and Visualization instead of smaller random
samples (over sampling)
– Streamlined report distribution
TECHNICAL CHALLENGES:
SAS + HORTONWORKS HADOOP
[Architecture diagram. Labelled components:]
– Users: SAS® User, Next-Generation SAS® User
– SAS clients: SAS® Display Manager, SAS® Enterprise Guide®, SAS® Data Integration, SAS® Enterprise Miner™, SAS® Visual Analytics
– SAS Metadata
– Base SAS & SAS/ACCESS® Interface to Hadoop™
– In-Memory Data Access
– SAS® LASR™ Analytic Server & SAS® High-Performance Analytics (MPI Based)
– HORTONWORKS DATA PLATFORM: HDFS, MapReduce, Hive & HCatalog, Pig, HBase
– LOAD & EXTRACT: Sqoop, Flume, NFS, WebHDFS
– OPERATE: Ambari, Oozie
RESULTS #1: LOOKALIKE MODELING
- Use case 1: ‘Lookalike Model’:
–40+ million records by 600+ traits; anonymized data (non PII)
–With HPA no need for oversampled data!
- Data preparation for the modeling example:
- Data is pulled from the Primary Hadoop Cluster into memory on the SAS HPA Cluster, through the embedded process (EP).
- Several steps of summarization are applied, leveraging SAS high-performance procedures in the SAS HPA cluster. The SAS Compute Server in the SAS Client Application cluster does additional processing on the data.
- After the summary and transposition steps -> Analytical Base Table (SAS HDAT) from which analytics are performed.
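The summarization and transposition steps can be illustrated with a small Python sketch. This is not the SAS HPA code used on the project: the input rows and the build_abt helper are hypothetical, and only the trait IDs are borrowed from the traits list in this deck. It counts observations per (ID, trait) and then turns them into one row per ID with one column per trait, which is the shape of an analytical base table.

```python
from collections import defaultdict

# Hypothetical long-format input: one row per (anonymized ID, trait ID) observation.
observations = [
    ("a1", 455544),  # Email Opt-in
    ("a1", 463286),  # O&O - home:wireless:phones
    ("a2", 463286),
    ("a2", 520123),  # Fido.ca EN Visitors
    ("a1", 463286),
]


def build_abt(rows):
    """Summarize and transpose long-format rows into a wide table."""
    # Summarization: count observations per (id, trait) pair.
    counts = defaultdict(int)
    for anon_id, trait in rows:
        counts[(anon_id, trait)] += 1

    # Transposition: one row per ID, one column per trait (0 when absent).
    traits = sorted({trait for _, trait in rows})
    ids = sorted({anon_id for anon_id, _ in rows})
    table = {i: [counts.get((i, t), 0) for t in traits] for i in ids}
    return traits, table


traits, abt = build_abt(observations)
for anon_id, row in abt.items():
    print(anon_id, dict(zip(traits, row)))
```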
RESULTS #1: COMPUTE/MEMORY RESOURCE
RESULTS #1: TRANSPOSED SAMPLE VIEW
RESULTS #1: TRAITS CORRELATION VIEW
RESULTS #1: TRAITS DISTRIB. BY TARGET
RESULTS #1: EM DIAGRAM WORKFLOW
RESULTS #1: MODEL RESULTS COMPARISON
RESULTS #1: TRAITS SELECTED IN MODEL
Trait Name
442515 O&O - Search - Referrer Domain - live.com
455544 Email Opt-in
455545 Email Opt-out
462895 rogers.com account info - Signed In
463110 rogers.com - page - myrogers
463286 O&O - home:wireless:phones
463289 O&O - home:wireless:payasyougo
463290 O&O - home:wireless:travel
463294 O&O - home:tv:packagespricing
465357 LR-Gender-Female
465358 LR-Gender-Male
465361 LR-Email Opt-in
465362 LR-Email Opt-out
472910 L'actualite subscription (offline and email)
472926 Male
472930 Non business address
472931 Business
496718 LR - MAGAZINE_PAPER - 1
520074 Rogers Rewards Site Visitor
520123 Fido.ca EN Visitors
550541 TSC Special Offers NL Subscriber
613366 Fido.ca FR Visitors
RESULTS #2: CAMPAIGN OPTIMIZATION
- Use case 2: ‘Campaign Optimization’:
- Multivariate – interactions
- Goal: without over-sampling
- Observations: 40 Million+
- Traits, parameters: 100s
- Tradeoff: oversample
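One way to see the oversampling tradeoff: a model trained on a sample where responders are over-represented produces inflated scores, which then have to be adjusted back to the population response rate. The Python sketch below uses the standard prior-correction formula; the rates are made-up illustration values, not figures from this project, and correct_for_oversampling is a hypothetical helper name. Modeling on all observations avoids this extra step.

```python
def correct_for_oversampling(p_sample, population_rate, sample_rate):
    """Adjust a score from an oversampled model back to the population scale.

    p_sample:        probability predicted by the model trained on the sample
    population_rate: true share of responders in the full population
    sample_rate:     share of responders in the oversampled training data
    """
    event = p_sample * population_rate / sample_rate
    non_event = (1.0 - p_sample) * (1.0 - population_rate) / (1.0 - sample_rate)
    return event / (event + non_event)


# Assumed illustration values: 1% responders in the population,
# oversampled to 50% responders for training.
adjusted = correct_for_oversampling(0.9, population_rate=0.01, sample_rate=0.5)
print(round(adjusted, 4))
```

A score equal to the sample response rate maps back to the population response rate, which is a quick sanity check on the adjustment.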
User Habits Segmentation
(visitor, conversion, recency, preference etc.)
Behavior Typology
Segmentation
(Sport Fun, Video Shopper etc.)
Value Segmentation
(premium click, low value click etc.)
Intersection between segments
Category/Sub-category
Segmentation
Life stage segmentation
(age, gender, income, adults married 35-55 years with kids etc.)
RESULTS #2: CAMPAIGN OPTIMIZATION
RESULTS #2: RESPONSE OPTIMIZATION
RESULTS #2: CAMPAIGN OPTIMIZATION
RESULTS #2: CAMPAIGN OPTIMIZATION ROC
LESSONS LEARNED ALONG THE WAY
• Before You Start….
– Collaboration: Tom Sawyer…
– Organizational buy in is important
– Establish data governance and consumer privacy compliance
• When You Start….
– Create the team
– Need people to learn: Hive and Pig
– Passionate and Innovative
LESSONS LEARNED ALONG THE WAY
― Lessons learned: SAS HPA:
– HPA environment needs to have sufficient disk allocated (not 1:1 disk/memory as originally expected) to allow flexibility for analysts to persist datasets and optimize model building process
– Mid Tier (App/meta/mid-tier) still plays an important role, as not all procs are available on HPA. Short wish list: PROC TRANSPOSE, interactive decision trees
– For current architectures, separating the SAS Hadoop cluster from the primary Hadoop Data Lake cluster is a tactical approach to optimizing infrastructure (and licensing). Looking to future versions of YARN to see the two clusters merge into a heterogeneous set of data applications and hardware platforms, forming a mature data lake architecture.
LESSONS LEARNED ALONG THE WAY
― Watch the timing and memory parameters:
― set mapreduce.task.timeout to 900000 or higher for SAS 9.4 M‐1 (short term solution)
― identified a 1536mb limit; keep configurations consistent across clusters
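For reference, mapreduce.task.timeout is expressed in milliseconds, so 900000 corresponds to 15 minutes. The Python sketch below renders the setting as a Hadoop-style property element; the property_element helper is hypothetical, and where the property is actually set (for example mapred-site.xml or a cluster management tool) depends on the installation.

```python
import xml.etree.ElementTree as ET

TASK_TIMEOUT_MS = 900000  # value from the slide, in milliseconds


def property_element(name, value):
    """Build a Hadoop-style <property> element with <name> and <value> children."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return prop


timeout_prop = property_element("mapreduce.task.timeout", TASK_TIMEOUT_MS)
timeout_xml = ET.tostring(timeout_prop, encoding="unicode")
timeout_minutes = TASK_TIMEOUT_MS / 60000

print(timeout_xml)
print(timeout_minutes, "minutes")
```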
― There are a multitude of data management tasks that need to be considered when embarking on this kind of analytical project. HDP + SAS makes it faster by considerably shortening the time from ingest to analytical model. Over time, data management practices need to be put into place.
LESSONS LEARNED ALONG THE WAY - TEAM
― Big Data Analytic talent can be acquired or developed
― When looking for talent keep in mind ‘Boomerang’ Employees
― Recruit by offering interesting and challenging work –Big Data SAS
― Lean: already know the organization.
― Audience team has hired three!
― hbr.org, boomerang-employees, Feb 2014
― Ref: boomerangs.org
THANK YOU
AUDIENCE PRIVACY COMPLIANCE
• Rogers Senior Regulatory, Privacy Counsel and Rogers legal team partnered to
create a Privacy Compliance framework with respect to online behavioral
advertising
– The framework is driven by Canadian federal regulations and industry best
practices
• Ad Choices Canada member - the industry framework
– Similar to the best practices in the U.S., the framework gives the customer complete
flexibility to opt out
• All data exposed to advertising platforms is anonymous and aggregated into
segments to ensure that no personally identifiable information (PII) is exposed