HP Discover: Real-Time Insights from Big Data
DESCRIPTION
Slides from HP Discover Europe, 10-12 December 2013, covering systems architecture, use cases, and real-time interactive visualization.
TRANSCRIPT
Billions of Rows, Millions of Insights
Right Now
Developing a Landscape for Real Time Information
Spil Games: A leader in online gaming
• 180 million monthly and 12 million daily players
• >50 websites, localized in 15 languages
• A rich source of data about traffic, content, and consumers
• Battling changing consumer expectations on content delivery (the Netflix effect)
Big data created big paradigm shifts
Traditional data:
• Highly consistent
• Highly connectable
• Inflexible
• Slow
Big data:
• Open
• Adaptive/Evolving
• Inconsistent
You always need both
Traditionally, we define data based on what we expect
With big data, we capture first and define later
Capture → Explore → Define → Apply + Track
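The capture-first, define-later workflow above can be sketched in a few lines. This is a minimal illustration, not Spil's actual pipeline; the event fields (`user`, `game`, `ms_played`) are hypothetical:

```python
import json

# Capture: store raw events verbatim, with no schema imposed up front.
raw_events = [
    json.dumps({"user": "u1", "game": "bubble", "ms_played": 4200}),
    json.dumps({"user": "u2", "country": "NL", "game": "mahjong"}),
]

# Explore: inspect which fields actually occur in the captured data.
fields = set()
for line in raw_events:
    fields.update(json.loads(line).keys())

# Define: pick a schema after exploration; fields absent in an event become None.
schema = ["user", "game", "ms_played"]
rows = [tuple(json.loads(line).get(col) for col in schema) for line in raw_events]

print(sorted(fields))  # every field seen in the raw capture
print(rows)            # structured rows, ready to load and track
```

The point of the sketch: the raw capture loses nothing (the unexpected `country` field is still there to explore), while the defined schema can evolve after the fact.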
What is big data?
Big data also brings new challenges: the four Vs
• VELOCITY
• VARIETY
• VERACITY
• VALUE: the only V that matters
Velocity: What is real time?
Traditional ETL:
• Once a day
• Once a week
• Delayed
"Real Time":
• Faster than human perception
• <200 milliseconds
"In Time": Information is available fast enough to influence decisions
• While in the shop/on the site (minutes)
• While the query runs (seconds)
• While the page loads (milliseconds)
The Velocity Continuum
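The continuum can be made concrete with a small classifier. The tier boundaries below are illustrative, taken loosely from the slide's examples (<200 ms for perception; page/query/visit timescales for "in time"):

```python
def velocity_tier(latency_ms: float) -> str:
    """Classify a data-delivery latency on the velocity continuum
    (thresholds are illustrative, not from the talk)."""
    if latency_ms < 200:
        return "real time (faster than human perception)"
    if latency_ms < 1_000:
        return "in time: while the page loads"
    if latency_ms < 60_000:
        return "in time: while the query runs"
    if latency_ms < 3_600_000:
        return "in time: while the visitor is on the site"
    return "traditional ETL (delayed)"

print(velocity_tier(150))         # real time
print(velocity_tier(5_000))       # in time: while the query runs
print(velocity_tier(86_400_000))  # a daily batch: traditional ETL
```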
How big data drives value at Spil
Informing Decisions:
• Day-to-day business reporting
• Analytical reporting for self-service analysis
• Business analytics for advising decisions
• Descriptive models to explain our business
• Customer Lifetime Value
• Marketing ROI
Making Decisions:
• Customer content recommendations
• Email campaign targeting
• Site learning and optimization
• System monitoring and alerting
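As one concrete "making decisions" example, content recommendations can start as simply as co-occurrence counting. This is a toy sketch with hypothetical game names, not Spil's recommender:

```python
from collections import Counter
from itertools import combinations

# Hypothetical play histories: which games each player touched.
histories = [
    ["bubble", "mahjong", "solitaire"],
    ["bubble", "mahjong"],
    ["mahjong", "solitaire"],
]

# Count how often each ordered pair of games co-occurs in one player's history.
co_occurrence = Counter()
for games in histories:
    for a, b in combinations(sorted(set(games)), 2):
        co_occurrence[(a, b)] += 1
        co_occurrence[(b, a)] += 1

def recommend(game: str, n: int = 2) -> list[str]:
    """Games most often played alongside `game`."""
    scored = [(cnt, other) for (g, other), cnt in co_occurrence.items() if g == game]
    return [other for cnt, other in sorted(scored, reverse=True)[:n]]

print(recommend("bubble"))  # ['mahjong', 'solitaire']
```

In production the same counts would be computed over billions of events in Map/Reduce and the scores pushed to the production databases, but the decision logic is this simple at its core.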
The pieces needed for a big data stack

Unstructured data intake:
• Scalable
• Schemaless or adaptive schema
• Resilient

Unstructured data storage:
• Cheap
• Flexible schema
• Easy management

Structured data storage:
• High query performance
• Denormalized
• Scalable; high concurrency

SELECT A, B, SUM(C) FROM X GROUP BY 1, 2

Human interface layer:
• Highly flexible
• Simple to use
• In-tool metadata

Predictive analytics tools:
• Not memory constrained
• Flexible inputs/outputs
• Easy iteration
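A toy end-to-end wiring of these pieces, with SQLite standing in for the structured store (the talk's warehouse is Vertica) and JSON lines standing in for the unstructured intake; field names are invented for illustration:

```python
import json
import sqlite3

# Unstructured intake + storage: raw JSON lines, schemaless.
raw = [
    '{"country": "NL", "game": "bubble", "plays": 3}',
    '{"country": "NL", "game": "mahjong", "plays": 1}',
    '{"country": "DE", "game": "bubble", "plays": 2}',
]

# Structured storage: a denormalized table (SQLite stands in for the warehouse).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE plays (country TEXT, game TEXT, plays INTEGER)")
db.executemany(
    "INSERT INTO plays VALUES (:country, :game, :plays)",
    [json.loads(line) for line in raw],
)

# Human interface layer: the kind of aggregate query shown on the slide,
# grouping by column ordinals 1 and 2.
rows = db.execute(
    "SELECT country, game, SUM(plays) FROM plays GROUP BY 1, 2 ORDER BY 1, 2"
).fetchall()
print(rows)  # [('DE', 'bubble', 2), ('NL', 'bubble', 3), ('NL', 'mahjong', 1)]
```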
The nuts and bolts of our big data tech
Why we chose our tech
• Affordable
• Highly available and resilient
• Extremely fast development due to SQL
• Excellent query performance = lazy optimization
• Right price
• Easy (and fun!) development
• Excellent library availability
• Industry standard for Map/Reduce
• Cheap storage for the "data lake"
• Easy integration with existing tech
How much data do we handle?
Ingestion:
• Through Map/Reduce: 1.4 billion events/day (200 million rows/day into DWH)
• Through ETL: 100-200 million rows/day into DWH
Persistence:
• Map/Reduce: 20 billion rows
• Vertica: 50 billion rows
• Long-term storage: all of 2013's events
Usage:
• Predictive models: >500 million scores per day
• ETLs to production DBs: >10 models
• Reporting: 150 dashboards, 80 data sources
• Queries: >2,000 per day
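For a sense of scale, the daily volumes translate into these average per-second rates (back-of-the-envelope only; real traffic is bursty, so peaks are far higher):

```python
# Back-of-the-envelope rates implied by the volumes above.
events_per_day = 1_400_000_000   # events/day through Map/Reduce
rows_per_day = 200_000_000       # rows/day landing in the DWH
seconds_per_day = 24 * 60 * 60

avg_events_per_second = events_per_day / seconds_per_day
avg_rows_per_second = rows_per_day / seconds_per_day

print(f"{avg_events_per_second:,.0f} events/second on average")  # 16,204
print(f"{avg_rows_per_second:,.0f} rows/second into the DWH")    # 2,315
```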
What it drives for us every day
Demographic Prediction
Multivariate Testing/Site Optimization
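A core building block of multivariate testing is stable variant assignment: the same visitor must see the same combination of factor levels on every visit. A common way to get that without storing state is hash-based bucketing; the factors and levels below are hypothetical:

```python
import hashlib

# Hypothetical multivariate test: two factors, each with two levels (a 2x2 design).
factors = {
    "button_color": ["green", "orange"],
    "thumbnail_size": ["small", "large"],
}

def assign(user_id: str, factor: str, levels: list[str]) -> str:
    """Deterministically bucket a user into one level of a factor,
    so the same visitor always sees the same variant."""
    digest = hashlib.sha256(f"{factor}:{user_id}".encode()).hexdigest()
    return levels[int(digest, 16) % len(levels)]

variant = {f: assign("user-42", f, levels) for f, levels in factors.items()}
print(variant)  # one cell of the 2x2 design, stable for this user
```

Hashing the factor name together with the user id keeps assignments independent across factors, which is what lets one test measure main effects of each factor separately.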
Q&A + Demo