
Posted on 19-Jan-2015


Early Lessons Learned in Applying Big Data To TV Advertising

IAB ITV for Agencies Day

Dave Morgan, CEO, Simulmedia

2

About Us

We are a New York-based start-up, venture-backed by Avalon Ventures, Union Square Ventures and Time Warner.

Our 35-person team has veterans of:

Television is still the most powerful advertising medium in the world. While addressability will come, we’re not waiting for it. We’ve taken a few strategies we learned from the Internet and are applying them to linear TV advertising today.

Through partnerships with major data providers, we have assembled the world’s largest set of actionable television data.

We sell television advertising. With inventory in over 106 million US households, we can cost-effectively extend reach into high-value target audiences across virtually any advertiser category. We use big data and science to do this.

Who We Are

Where We Have Been

What We Believe

How We Do It

How We Make Money

3

Why Did We Leave The Web?

Television remains the dominant consumer medium

(a) Nielsen US TV viewing audience. Traditional live-only TV based on average monthly viewing during 1Q2011. Internet and online video based on average monthly consumption during July 2011. Video on demand based on consumption during May 2011.

4

TV Spend Is Increasing

Source: MAGNAGLOBAL

5

Audience Is Fragmenting

Source: Nielsen via TVbythenumbers.com

6

Campaign Reach Is Declining

Source: Simulmedia analysis of data from SQAD, Nielsen and TVB

Impossible for measurement and planning tools to keep pace

Highly Confidential

Big Data

8

Big Data Is Driving Growth

“We are on the cusp of a tremendous wave of innovation, productivity and growth, as well as new modes of competition and value-capture – all driven by Big Data.” - McKinsey Global Institute, May 2011

“For CMOs, Big Data is a very big deal.” - Alfredo Gangotena, CMO, Mastercard, July 2011

9

Size Is Relative

1 byte × 1,000 = 1 kilobyte; × 1,000 = 1 megabyte; × 1,000 = 1 gigabyte; × 1,000 = 1 terabyte; × 1,000 = 1 petabyte; × 1,000 = 1 exabyte

10

Size Is Relative

Telegram = 100 bytes

Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm

11

Size Is Relative

Page of an Encyclopedia = 100 kilobytes

Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm

12

Size Is Relative

Pickup truck bed full of paper = 1 gigabyte

Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm

13

Size Is Relative

Entire print collection of the Library of Congress = 10 terabytes

Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm

14

Size Is Relative

All hard drives produced in 1995 = 20 petabytes

Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm

15

Size Is Relative

All printed material = 200 petabytes

Data © 1997-2011, James S. Huggins http://www.jamesshuggins.com/h/tek1/how_big.htm
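The scale on these slides steps by a decimal factor of 1,000 at each unit. The helper below expresses a raw byte count on that same ladder and checks it against the reference sizes quoted above. The function name `human_size` and the short unit labels are illustrative, not from the deck. Binary units (×1,024, KiB/MiB) would give slightly different numbers.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_size(n_bytes: float) -> str:
    """Express a byte count on the decimal (x1000) ladder used on the slides."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000 (or at EB).
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000
    raise AssertionError("unreachable: the loop always returns at the last unit")


# Reference sizes quoted on the slides (Huggins), in bytes.
EXAMPLES = {
    "Telegram": 100,
    "Page of an encyclopedia": 100 * 10**3,
    "Pickup truck bed full of paper": 10**9,
    "Library of Congress print collection": 10 * 10**12,
    "All hard drives produced in 1995": 20 * 10**15,
    "All printed material": 200 * 10**15,
}

if __name__ == "__main__":
    for name, size in EXAMPLES.items():
        print(f"{name}: {human_size(size)}")
```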

16

But Big Data Is More Than Size

What happened? Why did it happen? | BIG DATA: What’s going to happen next?

Time: Past | Future

Focus: Reporting | Prediction

Supports: Human decisions | Machine decisions

Data: Structured, aggregated | Unstructured, unaggregated

Human Skills: Dashboards, Excel | Discovery, visualization, statistics & physics

17

Accelerating The Push To Big Data

Hadoop, cloud computing, Facebook, Yahoo, quants, BitTorrent, machine learning, Stanford, Large Hadron Collider, Wal-Mart, text processing, Amazon S3 & EC2, open source intelligence, NoSQL, social media, Google, commodity hardware, Hive, fraud detection, trading desks, MapReduce, natural language processing

18

What Can It Mean For TV Advertising?

Big data drove the rise of web & search advertising:

• Accumulation of a high volume of direct measurement of media consumption
• Better predictions about consumer interests
• Real-time return path
• Automation
• Interim step toward addressability
• More diligence around consumer privacy
• Media buyers and sellers rethinking their approach to audience packaging, campaign planning, technology, data assembly and people

19

Post-Modern Architecture

Have we reached the limits of classic data storage architecture?

Data Warehouses
• Yahoo!: 700 TB [1]
• Australian Bureau of Statistics: 250 TB [1]
• AT&T: 250 TB [1]
• Nielsen: 45 TB [1]
• Adidas: 13 TB [1]
• Wal-Mart: 1 PB [2]

Data Lakes
• Facebook: 30 PB [3] (7x compression)
• Yahoo: 22 PB [4]
• Google: ???

[1] Oracle F1Q10 Earnings Call, September 16, 2009, transcript
[2] Stair, Principles of Information Systems, 2009, p. 181
[3] Dhruba Borthakur, Facebook, December 2010, http://www.facebook.com/note.php?note_id=468211193919
[4] Simulmedia estimate

20

Our Idea of Big Data

Set Top Boxes

• 17+ million boxes

• Completely anonymous viewing
• Live
• DVR
• VOD
• Pay channels

Program

• 3 different sets of schedule data

• Proprietary metadata

Public

• US census
• Military
• Business

Ad Occurrence

• What ads ran?

• Where did they run?

Client Proprietary

• Business Development Indices (BDI)

• Commercial Development Indices (CDI)

• Regional sales data

Nielsen Ratings

• All Minute Respondent Level Data (AMRLD)

Bringing the data set together in a single platform

Our (comparatively modest) data set:
• 200 TB (approx. 7x compression)
• 113,858,592 daily events
• Approximately 402,301 weekly ads
• Doubling capacity every 6 months

…and we don’t yet load every data point across all data sets.
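The slide says capacity doubles every 6 months. The projection below assumes that growth stays steadily exponential from the 200 TB baseline; that is an assumption, not a claim from the deck. It also shows the average event rate implied by 113,858,592 daily events. Function names are illustrative.

```python
def projected_capacity_tb(current_tb: float, months_ahead: float, doubling_months: float = 6.0) -> float:
    """Capacity after `months_ahead` months if it doubles every `doubling_months` months."""
    return current_tb * 2 ** (months_ahead / doubling_months)


def average_events_per_second(daily_events: int) -> float:
    """Average ingest rate implied by a daily event count (ignores peaks)."""
    return daily_events / (24 * 60 * 60)


if __name__ == "__main__":
    for months in (0, 6, 12, 24):
        print(f"{months:>2} months out: {projected_capacity_tb(200, months):,.0f} TB")
    print(f"Average ingest: {average_events_per_second(113_858_592):,.0f} events/second")
```

Twelve months out, the doubling rule gives 800 TB. The daily event count averages roughly 1,318 events per second. Event traffic is not uniform across the day, so peak rates will be higher than this average.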

21

Rethinking Media Data Architecture

Commodity Hardware
• No clouds allowed (ISO compliance)
• Expect hardware failure

Open Source Software
• Learn from those who have done it
• Participate in the Open Source community

Write Your Own Software
• ELT (Extract, Load, Transform)
• Meddle
• Machine learning

Science
• Advanced statistical techniques
• Experimentation

Applying big data to television required us to rethink what our technical architecture should be.
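ELT differs from classic ETL in ordering. Raw records are loaded into the store untouched, and transformations run inside the store afterwards, so they can be revised without re-extracting the source data. The sketch below illustrates that ordering with Python's built-in `sqlite3` as a stand-in store. The table names, columns and sample tune events are hypothetical, and the deck does not say what Simulmedia's pipeline actually used.

```python
import sqlite3

# Extract (stand-in): hypothetical raw set top box tune events, (box_id, network, start, end) in UTC.
RAW_EVENTS = [
    ("box-1", "CBS", "2011-03-20T20:00:00", "2011-03-20T20:30:00"),
    ("box-2", "CBS", "2011-03-20T20:10:00", "2011-03-20T21:00:00"),
    ("box-1", "FOX", "2011-03-20T20:30:00", "2011-03-20T21:00:00"),
]


def extract_load_transform(raw_rows):
    conn = sqlite3.connect(":memory:")
    # Load: raw records land unchanged in a staging table (no cleanup on the way in).
    conn.execute(
        "CREATE TABLE raw_tune_events (box_id TEXT, network TEXT, start_utc TEXT, end_utc TEXT)"
    )
    conn.executemany("INSERT INTO raw_tune_events VALUES (?, ?, ?, ?)", raw_rows)
    # Transform: runs inside the store after loading, so it can be rerun or
    # changed later without touching the source data again.
    conn.execute(
        """
        CREATE TABLE network_minutes AS
        SELECT network,
               COUNT(DISTINCT box_id) AS boxes,
               SUM(CAST(strftime('%s', end_utc) AS INTEGER)
                   - CAST(strftime('%s', start_utc) AS INTEGER)) / 60 AS minutes
        FROM raw_tune_events
        GROUP BY network
        """
    )
    return conn


if __name__ == "__main__":
    conn = extract_load_transform(RAW_EVENTS)
    for row in conn.execute("SELECT network, boxes, minutes FROM network_minutes ORDER BY network"):
        print(row)
```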

22

Some Wrinkles In The Matrix

• No standards for set top boxes
• Channel mapping
• Time synchronization
• On/off rules
• ….

Consult the sages. Build the team.
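The deck names channel mapping, time synchronization and on/off rules as problems without giving its solutions. The sketch below shows one hypothetical way to handle each for a single tune event:

- a lookup keyed by (headend, channel number), because the same number can carry different networks on different systems;
- a per-box clock offset;
- a cap on implausibly long unattended sessions.

The channel map, the 4-hour cap and all names are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Hypothetical channel maps: the same channel number carries different
# networks on different cable headends.
CHANNEL_MAP = {
    ("headend-A", 4): "NBC",
    ("headend-B", 4): "CBS",
}

# Hypothetical on/off rule: a single uninterrupted tune longer than this is
# assumed to be a box left on after the TV was switched off.
MAX_UNATTENDED = timedelta(hours=4)


@dataclass
class TuneEvent:
    headend: str
    channel_number: int
    start: datetime  # as reported by the box's own clock
    end: datetime


def clean_event(event: TuneEvent, clock_offset: timedelta) -> Optional[Tuple[str, datetime, datetime]]:
    """Map the channel, correct the box clock, and apply the on/off cap."""
    network = CHANNEL_MAP.get((event.headend, event.channel_number))
    if network is None:
        return None  # unmapped channel: hold back for manual review
    # Time synchronization: shift box time onto a reference clock.
    start = event.start + clock_offset
    end = event.end + clock_offset
    # On/off rule: truncate implausibly long unattended sessions.
    end = min(end, start + MAX_UNATTENDED)
    return network, start, end
```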

23

The People We Needed

• New core skills for everyone in the company: pattern recognition, visualization, technology, experimentation
• Where do you find hard-to-find tech skills? You don’t find them. You make them.
• A dedicated Science team
• Non-traditional researchers (brain imaging, bioinformatics, economic modeling, genetics)
• People who watch a lot of television

A different approach required different skill sets.


10 Lessons We’ve Learned

25

Some Things To Know, First

• Live viewing unless otherwise noted
• Time-shifting lessons are a whole other presentation
• Time-shifting + live viewing lessons are a whole other other presentation
• Video on demand is a whole other other other presentation
• We name names and provide numbers where clients and data partners permit
• Client confidentiality is important to us
• None of this work would’ve been possible without the help of our clients and partners

Read me… This box will contain important information about the graphs on each page.


60% of TV Viewers Watch 90% of TV

27

Where The Other 40% Are

Chart callouts: ratio of heavier-viewer impressions to lighter-viewer impressions, by network.

Networks with relatively more lighter-viewer impressions: OXYGEN 7.4, WE 7.6, PLANET GREEN 7.7, OVATION 7.8, STYLE 7.8, MTV2 7.8, SUNDANCE 7.9, IFC 7.9

Networks with relatively fewer lighter-viewer impressions: TCM 13.6, HALLMARK 13.7, ADSWIM 14.0, NICKNITE 14.3, CNBC 15.7, FOX NEWS 18.0

Vertical: ratio of heavier-viewer to lighter-viewer impressions. Horizontal: lower-rated to higher-rated networks. Callouts: the ratio is the number of heavier-viewer impressions you would deliver to reach a lighter viewer on a given network. Sources: Nielsen & Simulmedia’s a7

28

Where The Other 40% Are

To capture light viewers, media planning and measurement tools must quickly apply new methods to emerging data sets
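The deck does not show how the heavier-to-lighter ratio is computed. The sketch below makes three assumptions:

- "heavier viewers" are the top 60% of households by viewing minutes, echoing the "60% watch 90%" headline;
- every impression is tagged with a network and a household ID;
- networks with no lighter-viewer impressions are skipped, since their ratio would be unbounded.

The cutoff, names and structure are illustrative assumptions.

```python
from collections import Counter


def split_heavy_light(minutes_by_household, heavy_share=0.6):
    """Top `heavy_share` of households by viewing minutes are 'heavier'; the rest are 'lighter'."""
    ranked = sorted(minutes_by_household, key=minutes_by_household.get, reverse=True)
    cutoff = round(len(ranked) * heavy_share)
    return set(ranked[:cutoff]), set(ranked[cutoff:])


def viewing_share(minutes_by_household, households):
    """Fraction of all viewing minutes accounted for by `households`."""
    total = sum(minutes_by_household.values())
    return sum(minutes_by_household[h] for h in households) / total


def heavy_to_light_ratio(impressions, heavy, light):
    """impressions: iterable of (network, household). Heavier-viewer impressions per lighter-viewer impression."""
    heavy_counts, light_counts = Counter(), Counter()
    for network, household in impressions:
        if household in heavy:
            heavy_counts[network] += 1
        elif household in light:
            light_counts[network] += 1
    # Only networks that reached at least one lighter viewer get a ratio.
    return {n: heavy_counts[n] / light_counts[n] for n in light_counts}
```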


Quality Control Is A Full-Time Job

30

When Data Goes Missing

Automation of error checking/quality control is essential.

Reuse the data to solve other problems.

We occasionally observe missing data. Three choices:
• Pick up the phone
• Estimate missing fields
• Work around the missing data

Source: Simulmedia’s a7

Time series of SYFY network. 10645 observations from 2010.02.28 at 7:00pm Eastern to 2010.10.14 at 12:30pm Eastern
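Two of the steps above can be automated: detecting gaps in a regular time series and estimating the missing fields. The sketch below does both, using plain linear interpolation between the nearest observed neighbours as the estimator. The slide does not say which estimator was used, the 30-minute step in the demo is an assumption, and the function names are illustrative.

```python
from datetime import datetime, timedelta


def find_gaps(series, step):
    """Timestamps missing from a series that should have one value every `step`.

    Assumes observations sit on a regular grid that starts at the first one.
    """
    times = sorted(series)
    missing = []
    t = times[0]
    while t <= times[-1]:
        if t not in series:
            missing.append(t)
        t += step
    return missing


def fill_linear(series, step):
    """Estimate each missing value by linear interpolation between observed neighbours."""
    observed = sorted(series)
    filled = dict(series)
    for t in find_gaps(series, step):
        before = max(s for s in observed if s < t)
        after = min(s for s in observed if s > t)
        weight = (t - before) / (after - before)  # timedelta / timedelta -> float
        filled[t] = series[before] + weight * (series[after] - series[before])
    return dict(sorted(filled.items()))


if __name__ == "__main__":
    start = datetime(2010, 2, 28, 19, 0)  # start of the SYFY series on the slide
    step = timedelta(minutes=30)          # assumed sampling interval
    series = {start: 1.0, start + step: 2.0, start + 3 * step: 4.0}
    print(find_gaps(series, step))
    print(fill_linear(series, step))
```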


More Data Really Is Better

32

Disambiguation: The Madonna Problem

Pop icon? OR religious icon?

33

The Revolution of Simple Methods

More data beats better algorithms.

The best-performing algorithm underperforms the worst algorithm when the worst one is given an order of magnitude more data.

Simple algorithms at very large scale can help better predict audience movement.

Peter Norvig | Internet Scale Data Analysis | June 21, 2010

Original graph sourced from: Banko & Brill, 2001. Mitigating the paucity-of-data problem: exploring the effect of training corpus size on classifier performance for natural language processing

34

Packaging Reach

Peter Norvig | Internet Scale Data Analysis | June 21, 2010

Very large data sets better predict TV audience movements
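The slides don't name the simple methods used. A first-order transition count is one hypothetical example of a "simple algorithm at very large scale" for audience movement. It counts how often viewers move from one network to the next and predicts the most frequent next network. The model has almost no sophistication; its accuracy comes from the volume of observed transitions. Names and sample sessions are illustrative.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """sessions: iterable of network sequences (one per household viewing session)."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, current):
    """Most frequently observed next network, or None if `current` was never seen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


def transition_probability(counts, current, nxt):
    """Empirical P(next = nxt | current); 0.0 for networks never seen."""
    seen = counts.get(current, Counter())
    total = sum(seen.values())
    return seen[nxt] / total if total else 0.0
```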

35

The Cost Of More Data

• All data online. All the time.
• Less expensive hardware
• Extremely flexible
• More expensive talent
• Physicists & statisticians ain’t cheap
• Hard-to-find programmers
• Not everything meets your needs
• Evolving technologies in mission-critical functions

More data drives better results, but there are costs.


The Data Isn’t Biased Just Because It Comes From A Set Top Box

37

Applying Simple Methods At Scale

Sources: Nielsen & Simulmedia’s a7

Regression analysis of Nielsen Household Cume Rating against Simulmedia’s a7 cume rating. 20 Primetime Network shows with HAWAII FIVE-0. Fall 2010.

High correlation of a7 measures and Nielsen estimates.

Either bias is insignificant or Nielsen data and our data share the same bias.

Multiple methods yield similar results
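The comparison above is a regression of Nielsen household cume rating against a7 cume rating. The deck does not include its code. The sketch below is a textbook ordinary least squares fit with the coefficient of determination r², assuming one a7 value and one Nielsen value per show. For a single predictor, r² equals the squared Pearson correlation. The function name and inputs are illustrative.

```python
from statistics import fmean


def ols_fit(x, y):
    """Fit y ≈ slope * x + intercept by ordinary least squares; also return r²."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by x
    return slope, intercept, r_squared
```

An r² near 1 means the a7 measure tracks Nielsen closely up to a linear rescaling. It says nothing about whether both share a common bias, which is the caveat the slide raises.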

38

And Then We Kept Going

Two samples:
1. Sample 1: Fall 2010: 20 primetime broadcast series launches + promos
2. Sample 2: Jan 2011: 15 primetime cable series premieres + promos (plus one multi-season/year primetime broadcast premiere + promos)

• Hand-selected programs
• Mix of genres
• Mix of new vs. returning shows

How we sliced it:
• Entire a7 data set
• Cross-correlated individual data sets contained in the a7 aggregate data set
• Aggregate across geographies (DMA to DMA)

Observations:
• Sample 1 average r² > 0.85
• Sample 2 average r² > 0.93

We measured program tune-in, spot tune-in, campaign reach and campaign rating using multiple slices of our data set, across two different sample sets and time frames.


Addressability Is Here

40

Closing The Loop On Program Promotion

Sources: Simulmedia’s a7

Spring 2010 broadcast premiere promotion. Horizontal: Left to right moves back in time. 0 is the premiere time. Vertical: Conversion rate is measured in percent. Size of the bubble represents total conversions for a given spot.

41

Closing The Loop On Program Promotion

Sources: Simulmedia’s a7

Spring 2010 broadcast premiere promotion. Horizontal: Left to right moves back in time. 0 is the premiere time. Vertical: Conversion rate is measured in percent. Size of the bubble represents total conversions for a given spot.

42

Closing The Loop

Long-held beliefs and rules of thumb in planning may or may not be supported by data.

TV marketers now have more options for show promotion.
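The bubble charts plot, for each promo spot, hours before the premiere against a conversion rate, with bubble size showing total conversions. The deck does not define "conversion." The sketch below assumes a conversion is a household exposed to the spot that later watched the premiere, and it credits such a household to every spot it saw (no de-duplicated attribution). Class, function and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PromoSpot:
    spot_id: str
    aired_at: datetime
    exposed: set = field(default_factory=set)  # household ids that saw the spot


def spot_conversions(spots, premiere_at, premiere_viewers):
    """Per spot: (spot id, hours before premiere, conversion rate %, conversions)."""
    rows = []
    for spot in spots:
        converted = spot.exposed & premiere_viewers
        rate = 100 * len(converted) / len(spot.exposed) if spot.exposed else 0.0
        hours_before = (premiere_at - spot.aired_at).total_seconds() / 3600
        rows.append((spot.spot_id, hours_before, rate, len(converted)))
    return rows
```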


Nielsen’s Ratings Are Good (Surprisingly Good)

44

Time Series: Broadcast: CBS

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))

Across 60 networks, Nielsen’s large-sample measurements correlate highly with a7 measures.

45

Time Series: Broadcast: Fox

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))

46

Time Series: Broadcast: ABC

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))

47

Time Series: Cable: Investigation Discovery

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))

48

Time Series: Cable: Golf

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))

49

Time Series: Cable: Bravo

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))

50

Time Series: Cable: ESPN2

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))

51

Time Series: Cable: Speed

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))
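The hourly charts plot Nielsen estimates and a7 measurements as z-scores. This standardization puts two sources measured in different units on one scale, so their shapes can be compared. The sketch below shows that standardization and the Pearson correlation, computed as the mean product of paired z-scores with population standard deviations. Function names are illustrative, and the Multiple Imputation step for hours without a Nielsen estimate is not shown.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Standardize a series to mean 0 and (population) standard deviation 1."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson_r(x, y):
    """Pearson correlation as the mean product of the two z-scored series."""
    zx, zy = z_scores(x), z_scores(y)
    return fmean(a * b for a, b in zip(zx, zy))
```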


…but…

53

When You Look Closer

Sources: Nielsen & Simulmedia’s a7

Hour by hour time series Mar 20 to April 8, 2011. Z score plots with Nielsen estimates in red. Simulmedia measurements in blue. Where Nielsen provided no estimate, estimates were imputed using Multiple Imputation (Rubin (1987))

54

High Frequency Time Series: ABC Family

Sources: Nielsen & Simulmedia’s a7

Sample graph from high-frequency (second- and minute-level) time series analysis of 45 networks on January 19th, 2011. Panels: Simulmedia a7 sample (second-by-second to minute-by-minute) and Nielsen sample (minute-by-minute).

Volatility in dayparts, low-rated networks, demographics…. Unrated networks “don’t exist.” We did NOT look at local.


Women Are More Different Than Men

56

Gender Driven Geographic Variation

Viewing by zip code among women across markets is more varied than men in the same zip codes

Panels: Women 18-54 | Men 18-54

Fraction of view time for ages 18-54 relative to view time for all TV viewers in week 2, plotted against the same fraction for week 1 (last two weeks in January). Three markets: Philadelphia (blue), Atlanta (red) and Chicago (green). Each point represents a zip code in one of these markets. Source: Simulmedia’s a7

57

Gender Driven Geographic Variation

Planning tactics for female-targeted campaigns should differ from those for male-targeted campaigns.

P.S. This is also a good case for geo-based creative versioning.
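The scatter plots compare each zip code's 18-54 share of viewing time in one week with the next. One hypothetical way to put a number on "more varied" is the mean absolute week-over-week change in that share, computed separately for women and men. The metric, names and sample values below are illustrative assumptions, not the deck's method.

```python
def demo_share(demo_minutes, total_minutes):
    """Per zip code: demo viewing minutes as a fraction of all viewing minutes."""
    return {z: demo_minutes.get(z, 0) / t for z, t in total_minutes.items() if t > 0}


def mean_week_over_week_change(week1, week2):
    """Mean absolute change in a zip code's demo share, over zip codes present in both weeks."""
    zips = week1.keys() & week2.keys()
    return sum(abs(week2[z] - week1[z]) for z in zips) / len(zips)
```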


Privacy Matters

59

Privacy By Design

• All marketing data companies need to care
• Make consumer privacy protection part of the business from the beginning
• Anonymous, aggregated data only
• No personal data, or data that can be related to particular individuals or devices
• Broad marketing segmentations, not profiling
• No sensitive data

Don’t be creepy.
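"Anonymous, aggregated data only" can be enforced at the point where data leaves a system. The sketch below reports only distinct-device counts per broad (segment, network) cell and suppresses cells too small to be safely reported. This is one common safeguard, not a complete anonymization scheme. The 50-device threshold, field names and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical minimum cell size; cells with fewer distinct devices are suppressed.
MIN_CELL_SIZE = 50


def aggregate_viewing(records, min_cell_size=MIN_CELL_SIZE):
    """records: dicts with 'device_id', 'segment' and 'network' keys.

    Only (segment, network) -> distinct-device counts leave this function;
    device ids are used for counting and then discarded, and small cells
    that could point to particular households are dropped.
    """
    devices_per_cell = defaultdict(set)
    for record in records:
        devices_per_cell[(record["segment"], record["network"])].add(record["device_id"])
    return {
        cell: len(devices)
        for cell, devices in devices_per_cell.items()
        if len(devices) >= min_cell_size
    }
```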


Mass Reach Is Indiscriminate

61

Fragmentation Effects On Frequency

Source: Nielsen & Simulmedia’s a7

Each segment was above 70% reach but the frequency distribution was nearly identical

Percent of audience reached for a major animated motion picture campaign, 2011, two weeks prior to release. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad views for that segment.

62

Fragmentation Effects On Frequency

Source: Nielsen & Simulmedia’s a7

Fragmentation is affecting all high reach campaigns.

Percent of audience reached for insurance advertisers, September to October 2010. Approximately 8,000 ads. Each stacked bar is a different audience segment. Each color within the stacked bar represents the frequency of ad views for that segment.
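The stacked bars combine two standard measures. Reach is the percentage of the audience exposed at least once. The frequency distribution is the percentage of the audience exposed exactly 1, 2, … times, with an open-ended top bucket. The sketch below computes both from a list of impressions tagged with household IDs. The 5+ top bucket and all names are assumptions for illustration.

```python
from collections import Counter


def reach_and_frequency(impressions, audience_size, max_bucket=5):
    """impressions: one household id per ad impression delivered.

    Returns (reach %, {frequency bucket: % of audience}); the last bucket is
    open-ended, i.e. it counts households with max_bucket or more exposures.
    """
    exposures = Counter(impressions)  # household -> number of ad views
    reach = 100 * len(exposures) / audience_size
    buckets = Counter(min(n, max_bucket) for n in exposures.values())
    distribution = {b: 100 * buckets[b] / audience_size for b in range(1, max_bucket + 1)}
    return reach, distribution
```

Two segments can both show reach above 70% while their bucket percentages look nearly identical, which is the pattern the slide describes.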

63

Fragmentation Effects On Frequency

The TV advertising market can’t continue to support this.


40% Of The Audience Is Getting 85% Of The Impressions

65

Fragmentation Rears Its Head Again

Source: Nielsen & Simulmedia’s a7

Average frequency and share of total campaign impressions, by quintile of the total US television audience:

Quintile | Average frequency | % of total impressions
1 | 0.0 | 0.0%
2 | 1.4 | 3.6%
3 | 4.3 | 10.8%
4 | 9.1 | 23.0%
5 | 24.8 | 62.6%

Campaign impressions are increasingly concentrated against heavy viewers.

Percent of audience reached for a different major animated motion picture campaign, 2011, two weeks prior to release. The stacked bar represents quintiles of the total US television audience. Blue labels are the average frequency per quintile; red labels are the % of total campaign impressions per quintile.
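Per-quintile figures like these can be derived from household-level exposure counts. The sketch below assumes one count per household across the whole audience (unreached households count as 0), ranks households from least to most exposed, and splits them into five equal-sized groups. It then reports each group's average frequency and share of all impressions. The data and function name are illustrative.

```python
def quintile_stats(frequencies):
    """frequencies: ad exposures per household for the whole audience (needs >= 5 households
    and at least one impression). Returns [(avg frequency, share of impressions)] per
    quintile, least to most exposed."""
    ranked = sorted(frequencies)
    n = len(ranked)
    total = sum(ranked)
    stats = []
    for q in range(5):
        chunk = ranked[q * n // 5:(q + 1) * n // 5]
        stats.append((sum(chunk) / len(chunk), sum(chunk) / total))
    return stats
```

Summing the shares of the top two quintiles gives the "40% of the audience gets X% of impressions" figure. On the slide's table, 23.0% + 62.6% = 85.6%.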

66

Fragmentation Effects on Frequency

Advertisers won’t continue to support this


What Happens Next?

68

Choices

If fragmentation is causing declining campaign reach and frequency imbalances, marketers must make choices:
• Reduce reach
• Do nothing
• Use other channels
• Stabilize or improve reach
• Re-aggregate audiences using big data

What do you think?

70

About Our Science Team

Krishna Balasubramanian, Chief Scientist
• Previously: Chief Scientist, Tacoda. Chief Scientist, Real Media.
• Doctoral Candidate, Physics (Condensed Matter Physics). The Ohio State University
• MS, Computer & Information Systems. The Ohio State University
• MSc, Physics. Indian Institute of Technology, Kanpur

Yuliya Torosjan, Scientist
• Previously: Clinical Research (Brain Imaging), Mount Sinai College of Medicine
• MA, Statistics. Columbia University
• BSE, Computer Science & Engineering. University of Pennsylvania
• BA, Psychology. University of Pennsylvania

Mario Morales, Scientist
• Previously: Lecturer, Bioinformatics, New York University. Senior Consultant, Weiser LLP.
• MS, Statistics. Hunter College
• MS, Bioinformatics. New York University

Dr. Sidd Mukherjee, Scientist
• Previously: Visiting Scholar (Atomic Scattering experiments), The Ohio State University
• Postdoctoral research, Heat capacity of Helium-4. Pennsylvania State University
• PhD, Physics (Thesis: Measurements of Diffuse and Specular Scattering of 4He Atoms from 4He Films). Ohio State University
• MS, Computer & Information Systems. The Ohio State University
• BSc, Physics & Mathematics. University of Bombay
