Big Data Analytics from a Practitioner's View

Post on 12-Jul-2015

Big Data Analytics from a Practitioner's View

Sep 2013

Raghu Kashyap

About Raghu Kashyap

Areas of Responsibility

Data Insights Group (Site analytics, Competitive Intelligence, Big Data)

Orbitz India, supporting Analytics and BI teams

US, Europe, Australia (APAC)

Personal

Director – Data Insights Group

Strong background in technology (13 years), with passion for and experience in analytics (4 years) and big data (3.5 years)

Masters in Computer Science

Golf, traveling, helping non-profit organizations, spending time with my wife and 2 boys

Twitter: @ragskashyap

Blog: http://kashyaps.com

Email: raghu.kashyap@orbitz.com

Orbitz Worldwide

Challenges

Lack of multi-dimensional capabilities

Heavy investment in tools

Precision vs Accuracy

Data Governance

Challenges (continued)

No data unification or uniform platform across organizations and business units

No easy data extraction capabilities

Hadoop history at OWW

Web Analytics & Big Data

OWW generates a couple of million air and hotel searches every day.

Massive amounts of data: over a hundred GB of log data per day.

This data is expensive and difficult to store and process using existing data infrastructure.

Love Thy Hadoop

Long-term storage for very large data sets.

Open access to developers and analysts.

Allows for ad-hoc querying of data and rapid deployment of reporting applications.

Hadoop Growth

Hadoop Cluster

Treemap of HDFS storage

Approach with Hadoop and ETL

Raw logs

Flat files

Event Model

Map Reduce

ETL

External Tables

Data Warehouse (Greenplum)

GP Connector
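The MapReduce stage of the flow above, which turns raw log lines into aggregated events, can be illustrated with a minimal sketch. It is plain Python rather than a real Hadoop job, and the tab-separated log layout (timestamp, session id, event type) and every name in it are assumptions made for illustration, not the format or code used at OWW.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (event_type, 1) for each well-formed log line.

    Assumes a hypothetical layout: timestamp <TAB> session_id <TAB> event_type.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return  # skip malformed lines instead of failing the whole run
    _timestamp, _session_id, event_type = fields[:3]
    yield event_type, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the emitted values per key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map and reduce over an iterable of raw log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


# Made-up sample lines in the assumed layout.
sample = [
    "2013-09-01T10:00:00\ts1\thotel_search",
    "2013-09-01T10:00:05\ts1\tair_search",
    "2013-09-01T10:00:09\ts2\thotel_search",
    "garbage line",
]

print(run(sample))
```

In a setup like the one on the slide, the aggregated output would be written back as flat files that the warehouse reads through external tables; that step is outside this sketch.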

Opportunities

Machine Learning

Site Analytics Data

PPC bidding efficiencies

Internal log analysis. Hgrep

MVT testing

Advanced Analytics

Show me the money

EFX – Every Friggin X

PPC bidding efficiencies

MAC vs. PC

Marketing Channel optimization

Orbitz.com

Direct

Paid - Brand

Paid - Non Brand

SEO - Brand

SEO - Non Brand

Email

Meta

Travel Research

Affiliates

Display Ads

Hotel Rate Cache optimization

Data is collected as part of RCDC.

Includes every live rate search (aka burst) performed by our hotel stack.

Raw data: ~200 GB, compressed, 10^8 records.

Extraction: <40 GB compressed, 10^9 records.

MVT

Analyze behavioral and test data from our MVT testing

DWH Log analysis

• Analysis of Greenplum DB logs within Hadoop to analyze the data usage patterns.

• Impact analysis

• Hadoop usage for the last 30 days of DB log analysis.
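One way to read "data usage patterns" from database logs is to count how often each table is referenced in logged SQL statements. The sketch below is a minimal illustration under that assumption. The sample statements and table names are invented, and the regular expression is a rough heuristic rather than a SQL parser, so it does not reflect the actual Greenplum log format or the analysis run at OWW.

```python
import re
from collections import Counter
from typing import Iterable

# Heuristic: a table name is whatever identifier follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(statements: Iterable[str]) -> Counter:
    """Count table references across logged SQL statements."""
    usage: Counter = Counter()
    for sql in statements:
        for table in TABLE_REF.findall(sql):
            usage[table.lower()] += 1
    return usage


# Hypothetical statements, as they might appear after extraction from DB logs.
sample_statements = [
    "SELECT * FROM bookings.hotel b JOIN dim.date d ON b.date_id = d.id",
    "select count(*) from bookings.hotel",
    "SELECT 1",
]

usage = table_usage(sample_statements)
for table, hits in usage.most_common():
    print(table, hits)
```

Tables that never show up in a window such as the last 30 days of logs are candidates for archiving, and heavily referenced tables show where a schema change would have the widest impact.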

HIPPO is your best friend

• Expect organizational resistance from unanticipated directions

• You can do wonders in the analytics area if you get buy-in.

Lessons Learnt

Analytics using Big Data comes with a price.

Data Governance

Senior Leadership buy-in

"I can't tell you the key to success, but the key to failure is trying to please everyone." -Ed Sheeran

How to capitalize on Big Data?

Learn from people who have already done this.

DO NOT reinvent the wheel

Buy vs. Build balance

Build once and leverage in multiple places.

Go where clients don't want to go or can't go in terms of execution.

What matters to Practitioners?

Things change dramatically in the world of analytics

Being Agile is very important

Dashboards and Reports can take you only to a certain level

Buy-in from key groups is important

Grow business and impress Boss

Thank you
