Page 1: Web Analytics: Challenges in Data Modeling

CHALLENGES IN DATA MODELING

WEB ANALYTICS

Page 2: Web Analytics: Challenges in Data Modeling

AGENDA

• Introduction to Web Analytics
  • Data Sources, Data Capture
  • Vocabulary

• Data Modeling Basics
  • Relational vs. Dimensional
  • Normalization, De-normalization, Aggregation

• Web Analytics + Data Modeling
  • Four-tiered Data Model for Web data
  • Challenges

• Q & A

Page 3: Web Analytics: Challenges in Data Modeling

INTRODUCTION

• Anne Marie Macek
  • Senior Manager, Data Strategy
  • Consumer Insight and Revenue Strategy
  • Marriott International

• 30+ years Data Modeling and Reporting
• 14+ years Data Warehousing and Business Intelligence
• 4+ years Web Analytics Data and Reporting
• MBA, Management Information Systems
• BS, Mathematics and Computer Science

Page 4: Web Analytics: Challenges in Data Modeling

EXPERIENCE

• Data Modeling:
  • Flat Files, IMS/DB, DB2, Oracle, Netezza
  • MS Access, Borland Paradox
  • Cognos PowerPlay, MS Analysis Services, Cognos 10.2 Dynamic Cubes

• Reporting:
  • COBOL, Focus, SAS, Actuate
  • Cognos BI Suite

• Business Functions:
  • eCommerce, Revenue Management, Sales & Marketing
  • Human Resources, Finance

Page 5: Web Analytics: Challenges in Data Modeling

DEFINITION

• Web analytics is the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web usage.

Source: Wikipedia

Page 6: Web Analytics: Challenges in Data Modeling

OBJECTIVES

• Website Performance
  • Conversion Rate ($ sales / # visits)
  • Trends over time
  • In Response to Campaigns

• Website Optimization
  • Customer Behavior
  • Technological Trends

• Integration
  • Customer Lifetime Value / Segmentation

• Personalization
  • Proactive display of pertinent information
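The conversion rate above can be sketched in a few lines of Python. This is only an illustration using the slide's definition ($ sales / # visits); the dates, figures, and the name `conversion_rate` are hypothetical.

```python
# Hypothetical daily totals: (date, number of visits, sales in dollars).
daily = [
    ("2014-03-01", 1200, 54000.0),
    ("2014-03-02", 1500, 60000.0),
    ("2014-03-03", 0, 0.0),
]


def conversion_rate(sales, visits):
    """Dollars of sales per visit, as defined on the slide.

    Returns None when there were no visits, to avoid dividing by zero.
    """
    return sales / visits if visits else None


# Trend over time: one rate per day.
rates = {day: conversion_rate(sales, visits) for day, visits, sales in daily}

for day, rate in rates.items():
    print(day, rate)
```

Computing the rate per day (or per campaign) rather than once overall is what makes the "trends over time" and "in response to campaigns" views possible.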

Page 7: Web Analytics: Challenges in Data Modeling

DATA SOURCES

• Click-stream Data
• Search Engine Optimization (SEO)
• Campaign Classification
• Email Campaigns
• Advertising Impressions
• 3rd Party Marketing Data
• IP Geolocation
• Competitive Analysis
• Customer Information
• Multi-channel Analysis
• Outcome Data

Page 8: Web Analytics: Challenges in Data Modeling

CLICKSTREAM COLLECTION

• Web Log Files
  • Rudimentary data collected on company’s web server
    • Page name, IP address, browser, date/time
  • Does not screen out search engine robots

• JavaScript Tagging (Google Analytics, Omniture, WebTrends)
  • As page loads, data is sent to 3rd party for collection
  • Assigns a cookie to the user
  • Can implement custom tags on specific pages
  • Does not count pages served from cache

• Packet Sniffers (Cloudmeter Pion, Tealeaf CX Connect)
  • Software or hardware layer installed on web servers
  • Parsing raw data and ensuring PII is protected can be complex
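Because web log files do not screen out search engine robots, that filtering has to happen downstream. A minimal sketch, assuming each log hit has already been parsed into the four fields listed above. The marker list, the sample hits, and the name `is_robot` are hypothetical, and a simple substring check is not a complete robot filter.

```python
# Assumed, non-exhaustive substrings that often appear in robot user agents.
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_robot(browser):
    """Return True if the browser (user agent) string looks like a robot."""
    ua = browser.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


# Hypothetical parsed web log hits: page name, IP address, browser, date/time.
hits = [
    {"page": "/home", "ip": "203.0.113.5",
     "browser": "Mozilla/5.0 (Windows NT 6.1)", "time": "2014-03-01T10:00:00"},
    {"page": "/home", "ip": "198.51.100.7",
     "browser": "Googlebot/2.1 (+http://www.google.com/bot.html)",
     "time": "2014-03-01T10:00:02"},
    {"page": "/search", "ip": "203.0.113.5",
     "browser": "Mozilla/5.0 (Windows NT 6.1)", "time": "2014-03-01T10:01:10"},
]

# Keep only the hits that do not look like robot traffic.
human_hits = [hit for hit in hits if not is_robot(hit["browser"])]
print(len(hits), len(human_hits))
```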

Page 9: Web Analytics: Challenges in Data Modeling

CLICKSTREAM ANALYSIS

• Number of Visitors
  • Total vs. Unique
  • New vs. Repeat

• Source of Visit (Session)
  • External Link (Campaign Analysis / Attribution)
  • Direct

• Searches Performed On Site
  • Keywords
  • Sort Order of Results

• Page Analysis
  • Specific Actions Performed
    • Order (Booking)
    • Signup for Membership, Credit Card, Event

• Abandonment (Bounce Rate)
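A few of the measures above can be sketched from visit-level data. The records and field names here are hypothetical, and two simplifying assumptions are made: a "repeat" visitor is one with more than one visit in the sample, and a "bounce" is a visit with exactly one page view.

```python
from collections import Counter

# Hypothetical visits: visitor_id stands in for the cookie assigned to the user.
visits = [
    {"visit_id": 1, "visitor_id": "a", "pages": 1},
    {"visit_id": 2, "visitor_id": "a", "pages": 4},
    {"visit_id": 3, "visitor_id": "b", "pages": 1},
    {"visit_id": 4, "visitor_id": "c", "pages": 3},
]

# Total vs. unique: every visit counts once, every visitor counts once.
total_visits = len(visits)
unique_visitors = len({v["visitor_id"] for v in visits})

# Repeat visitors (assumption: more than one visit within this sample).
visits_per_visitor = Counter(v["visitor_id"] for v in visits)
repeat_visitors = sum(1 for n in visits_per_visitor.values() if n > 1)
single_visit_visitors = unique_visitors - repeat_visitors

# Bounce rate (assumption: a bounce is a single-page visit).
bounces = sum(1 for v in visits if v["pages"] == 1)
bounce_rate = bounces / total_visits

print(total_visits, unique_visitors, repeat_visitors, bounce_rate)
```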

Page 10: Web Analytics: Challenges in Data Modeling

BRINGING CLICKSTREAM IN-HOUSE

• Control/Consolidate Business Rules
• Integration with Corporate Systems of Record
• Single Version of the Truth

• Integration with Other Web Data Sources
• Enable more “intelligent” metrics
• Not all visits are a conversion opportunity

• Shift from “visit analysis” to “customer analysis”
• Enable advanced statistical and predictive modeling
• Multi-touch Attribution
• Pay Per Click (PPC) Keyword Bid Optimization

Page 11: Web Analytics: Challenges in Data Modeling

CLICKSTREAM CHALLENGES

• “Clickstream data … is delightfully complex, ever changing, and full of mysterious occurrences.” Avinash Kaushik, Web Analytics: An Hour a Day

• Volume
  • Cons: It’s big
  • Pros: It’s incremental

• Fairly Unstructured
• Exceptions to every rule
• Mobile App vs. Mobile Web vs. Desktop
• Rapidly Changing
• Most queries require trending YTD + 2 years’ history
• Few “natural” metrics; most require count (distinct)
• How do I model this data?

Page 12: Web Analytics: Challenges in Data Modeling

DATA WAREHOUSE APPROACHES

Bill Inmon

• DW is Central Repository of all Enterprise Data
• “Top Down”
• Relational Model (3NF)
• Feeds Functional Data Marts
• Huge Undertaking

Ralph Kimball

• DW is the “Virtual” Integration of Various Functional Data Marts
• “Bottom Up”
• Dimensional Model
• Quicker to Develop
• Silo-ed and Redundant

Page 13: Web Analytics: Challenges in Data Modeling

RELATIONAL MODEL

Source: sqlservercentral.com

Page 14: Web Analytics: Challenges in Data Modeling

DIMENSIONAL MODELS

[Diagrams: Star Schema, Snowflake Schema]

Source: Wikipedia

Page 15: Web Analytics: Challenges in Data Modeling

NORMALIZATION

• Removes redundancy and dependency from data structures.

• 1NF: Remove Repeating Groups
• 2NF: Remove Partial Key Dependencies
• 3NF: Remove Dependencies Among Attributes

• Tutorial: http://phlonx.com/resources/nf3/

• Data Warehouses require some De-Normalization to improve query performance
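A small sketch of the 3NF step and the de-normalization that reverses it. The booking and hotel data are hypothetical. Here `hotel_city` depends on `hotel_id` rather than on the booking key, so it moves to its own table; joining it back gives the pre-joined, query-friendly shape.

```python
# Hypothetical flat (un-normalized) booking rows: hotel_city repeats per hotel.
flat_bookings = [
    {"booking_id": 1, "hotel_id": "H1", "hotel_city": "Bethesda", "amount": 200.0},
    {"booking_id": 2, "hotel_id": "H1", "hotel_city": "Bethesda", "amount": 150.0},
    {"booking_id": 3, "hotel_id": "H2", "hotel_city": "Orlando", "amount": 300.0},
]

# Normalize: hotel attributes are stored once per hotel, and bookings keep
# only the hotel key.
hotels = {}
bookings = []
for row in flat_bookings:
    hotels[row["hotel_id"]] = {"hotel_city": row["hotel_city"]}
    bookings.append({
        "booking_id": row["booking_id"],
        "hotel_id": row["hotel_id"],
        "amount": row["amount"],
    })

# De-normalize: join the hotel attributes back onto each booking so a query
# can read everything from one row without a join.
denormalized = [{**b, **hotels[b["hotel_id"]]} for b in bookings]

print(len(hotels), len(bookings), len(denormalized))
```

The normalized form stores each city once (cheaper to load and correct), while the de-normalized form repeats it on every row (cheaper to query).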

Page 16: Web Analytics: Challenges in Data Modeling

ECOMMERCE DATA WAREHOUSE

• Native Source Model
• Fact Model
• BI Model
• Aggregate Model

Page 17: Web Analytics: Challenges in Data Modeling

NATIVE SOURCE MODEL

Plus

• In-database copy of the source data
• Stores data elements we are not yet ready to model further
• Maintains details for research purposes
• Prevents repeating historical conversion

Minus

• Huge
• Unstructured
• Not normalized (at all)
• Not useful for analysis or reporting

Page 18: Web Analytics: Challenges in Data Modeling

NATIVE SOURCE MODEL

Page 19: Web Analytics: Challenges in Data Modeling

FACT MODEL

Plus

• “Snow-relational”
• Nearly Normalized (optimized for load)
• Multiple Fact & Extension Tables (manage I/O)
• Granular (click row)
• Contains keys to integrate with enterprise data

Minus

• Complex load including propagation and look-back
• Use requires non-filtered joins of massive tables
• Difficult to use for analysis, cannot be used for reporting

Page 20: Web Analytics: Challenges in Data Modeling

FACT MODEL

Page 21: Web Analytics: Challenges in Data Modeling

BI MODEL

Plus

• “Star-flake” Model
• De-normalized (optimized for query)
• Pre-joined
• Granular (click row)
• Integrated with enterprise data at load time
• Useful for detailed analysis

Minus

• Complex load process
• It’s still big!
• Corrections to Fact Model data issues require re-build or complex conversion processes
• Difficult to use for reporting

Page 22: Web Analytics: Challenges in Data Modeling

BI MODEL

Page 23: Web Analytics: Challenges in Data Modeling

AGGREGATE MODEL

Plus

• Star Schema (simple)
• De-normalized (optimized for query)
• Aggregated
• Fast query performance
• Great for pre-determined reports

Minus

• Corrections to Fact Model data issues and embedded dimensions require re-build
• Count distincts only available for pre-determined dimensions
• Limited use for analysis
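The count-distinct limitation can be illustrated with a short sketch. The click rows and dimension names are hypothetical. Distinct counts stored at one grain cannot be summed to a coarser grain, because the same visitor appears in more than one cell.

```python
from collections import defaultdict

# Hypothetical click rows: (date, device, visitor_id).
clicks = [
    ("2014-03-01", "desktop", "a"),
    ("2014-03-01", "mobile", "a"),
    ("2014-03-01", "desktop", "b"),
    ("2014-03-02", "desktop", "a"),
    ("2014-03-02", "mobile", "c"),
]

# Aggregate table: unique visitors pre-computed per (date, device).
visitors_by_cell = defaultdict(set)
for day, device, visitor in clicks:
    visitors_by_cell[(day, device)].add(visitor)
aggregate = {cell: len(ids) for cell, ids in visitors_by_cell.items()}

# Rolling the aggregate up by summing over-counts visitors seen in several cells.
summed_from_aggregate = sum(aggregate.values())

# The correct overall figure needs the granular click rows.
true_unique_visitors = len({visitor for _, _, visitor in clicks})

print(summed_from_aggregate, true_unique_visitors)
```

Visitor "a" is counted in three cells, so the summed figure overstates the true unique count. This is why a distinct count is only reliable for the dimension combinations the aggregate was built for.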

Page 24: Web Analytics: Challenges in Data Modeling

AGGREGATE MODEL

Page 25: Web Analytics: Challenges in Data Modeling

QUESTIONS?

• Thank You!

