Web Mining: An Overview Of Web Analytics with Examples Donghui Wu, Ph.D. Oracle Corporation April 16 th 2003



Web Mining: An Overview Of Web Analytics with Examples

Donghui Wu, Ph.D.

Oracle Corporation

April 16th 2003


Agenda

• Web Mining Overview

• Basic Web Analysis Problems

• Data Warehouse Solutions

• Oracle 9iAS Clickstream Intelligence Demo
– Site Configuration Excerpts
– Site Basic Statistics Examples
– Business Scenario Examples


Web Mining

Web Mining, generally speaking, is the activity of applying data mining principles and processes to the Web domain. It may tackle the World Wide Web as a whole, or focus on a particular Web site (server) or group of Web sites (servers).

In this talk, we will limit the scope to Web usage and pattern analysis or, more specifically, Web Log Mining at the enterprise (Web site) level. In industry, it is also referred to as Web Analytics.


Web Analytics

• Web Analytics is the monitoring and reporting of Web site usage so that enterprises can better understand the complex interactions between Web visitor actions and Web site offers, and leverage that insight to optimize the site for increased customer loyalty and sales.
– From Web Analytics: Making Business Sense of Online Behavior, Aberdeen Group, June 2002


Web Mining and Privacy

• Privacy is always a concern in data mining projects.

• When analyzing or mining visitors' online behavior, and in particular when profiling visitors or users, privacy is a major concern.

• Usually only aggregated information is analyzed, not that of individual visitors or users.


Web Log Data Sources (1)

• Web Server Log
– This is the log kept at the Web server. It is easy to get and the most widely analyzed.
– It is logged at the destination. The analysis is about a particular Web server or servers.
– One Web server can host many Web sites, and one Web site may be served by multiple Web servers.

• Proxy Server Log
– If the Web connection goes through a proxy, every request is logged at the proxy server as well.
– It is logged at the origin. The analysis is about a group of users, e.g. all users within a company.


Web Log Data Sources (2)

• Client Side Browser Log
– Embedded client-side collection. It requires sending simple JavaScript with the response to the browser. The script collects browser information and client-side visitor activity, e.g. mouse movement, and sends it to a collector server for analysis.

• Application Log
– A Web application usually has its own logs, at various levels of detail and for various purposes.


Server Log, Proxy Log, and Browser Log


Web Server Log Analysis and Mining

• From now on, we limit our subject to Web Server Log Analysis and Mining only.

• The emphasis is on Enterprise Web Analytics.

• We will use a fictional site, drugdepo.com, as the sample site, and Oracle 9iAS Clickstream Intelligence to produce the sample analysis.


Web Analytics Task Categories

• Site Activity and Operation
– Site traffic, performance and status

• Usage Mining
– Visitor behavior analysis, referrer analysis, path analysis

• User Profiling/Clustering
– Visitor profiling, visitor segmentation
– User profiling, user segmentation


Web Analytics Tasks for Business Users

• Content effectiveness evaluation

• Online marketing campaign analysis

• Target marketing analysis

• Personalization and recommendation

• Cross-sell and up-sell opportunities

• Many more…


Data Mining Techniques in Web Analytics

The following data mining techniques may be applied to solve those problems:

• Association Rule Mining

• Clustering / Segmentation
– Visitor / User
– Pages

• Visitor/User Profiling


Web Mining Difficulties

• Data size is huge
– For a site with 1 million hits per day, the raw log file can be 500 MB to 1 GB per day, depending on the Web server configuration

• Bad records
– There are many bad records due to server errors.

• Lack of exact information
– In many cases, heuristics have to be applied


Web Server Log Format

• NCSA Common Log Format

• NCSA Extended Common Log Format

• W3C Extended Log File Format

For more information, see the W3C website.


NCSA Common Log Format

The following is a line from an Apache server log. It follows the NCSA Common Log Format, extended with the referrer and browser fields (the extended, or combined, variant). The fields are separated by spaces:

Host Ident Authuser Time Request Status BytesSent Referrer Browser

24.69.48.18 - 709697D0CE694757E034080020CB1B7C [01/Nov/2000:23:59:05 -0800] "GET /products/forms/pdf/256629.pdf HTTP/1.0" 206 308928 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
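
As an illustration, a line like the one above could be split into its nine fields with a regular expression. This is only a sketch: the group names and the `parse_log_line` helper are assumptions made for this example, and a production parser would need to handle more malformed records than this one does.

```python
import re
from datetime import datetime

# One named group per field listed on the slide. The group names are
# illustrative choices, not part of any log-format standard.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<browser>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields, or None for a record that does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # treat as a bad record
    record = match.groupdict()
    # Timestamp looks like 01/Nov/2000:23:59:05 -0800
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the bytes field means nothing was sent.
    record["bytes_sent"] = 0 if record["bytes_sent"] == "-" else int(record["bytes_sent"])
    return record

sample = ('24.69.48.18 - 709697D0CE694757E034080020CB1B7C '
          '[01/Nov/2000:23:59:05 -0800] '
          '"GET /products/forms/pdf/256629.pdf HTTP/1.0" 206 308928 '
          '"-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"')
record = parse_log_line(sample)
print(record["host"], record["status"], record["bytes_sent"])
```

Returning None for lines that do not match gives a simple way to count and set aside bad records before analysis.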


Dynamic Page and Parameters

• In the previous example, the requested page is a static page.

• For dynamic pages, e.g. ASP, JSP, etc., the request has two parts: the static URL stem and the query, separated by “?”

• The query string consists of “parameter=value” pairs.

• Parameters provide detailed info of the request.
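
A minimal sketch of this split, using Python's standard `urllib.parse` module. The request URL and its parameter names below are hypothetical, invented for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

def split_request_url(url):
    """Split a requested URL into its static stem and a parameter dict."""
    parts = urlsplit(url)
    # parse_qs maps each parameter to a list, since a name may repeat.
    params = parse_qs(parts.query, keep_blank_values=True)
    return parts.path, params

# Hypothetical dynamic-page request (not taken from a real log).
stem, params = split_request_url("/catalog/product.jsp?category=allergy&item=1042")
print(stem)
print(params)
```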


Web Log Mining Task Types

• Web Log Analyzer
– Provides simple statistics, e.g. number of visitors, page views, sessions, etc. for a given time period

• Web Log Mining
– Web usage mining and pattern analysis

• E-commerce, Personalization and CRM
– Integrates and mines data across the enterprise


Related Terms

• Hits
– A hit is a single URL request in the server log

• Page Views (Page Impressions)
– A page view may require multiple requests, e.g. several .gif or .jpeg requests plus one .html request

• Data Sent

• Visitors (identified and unidentified visitors)

• Users (authenticated visitors)

• Sessions


Data Filtering

For data analysis purposes, the following data preparation steps are often applied:

• Remove .gif, .jpeg and other non-essential requests from the raw data

• Other filtering may also be applied, depending on the task at hand.

• Page construction rules are applied to consolidate records
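
The first filtering step can be sketched as follows. The extension list is an assumption: the slide names only .gif and .jpeg, and which other requests count as non-essential depends on the site. The sample requests are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions treated as embedded resources rather than pages. Only .gif
# and .jpeg come from the slide; the rest of this set is an assumption.
NON_PAGE_EXTENSIONS = {".gif", ".jpeg", ".jpg", ".png", ".css", ".js", ".ico"}

def is_page_request(url):
    """True if the request looks like a page rather than an embedded file."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension not in NON_PAGE_EXTENSIONS

def count_hits_and_page_views(urls):
    """Every request is a hit; only page requests count as page views."""
    hits = len(urls)
    page_views = sum(1 for url in urls if is_page_request(url))
    return hits, page_views

# Hypothetical requests: one page with two images, plus one dynamic page.
requests = [
    "/index.html",
    "/images/logo.gif",
    "/images/banner.jpeg",
    "/ask_expert.jsp?topic=allergy",
]
hits, page_views = count_hits_and_page_views(requests)
print(hits, page_views)
```

This also shows why hits and page views differ: the four hits above amount to only two page views.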


Basic Processing

• Parse the log and resolve the following:
– Client IP address
– Visitor ID
– User ID
– Browser and OS
– Request
– Session


Basic Tasks

For any Web analytics task, the following must be resolved before any analysis is possible:

• Visitor identification

• User identification / matching

• Session Construction

• Path Completion


Visitor Identification Methods

• Client Hostname or IP Address only

• IP Address + Browser String

• Query String Parameter

• Cookie Value

• Visitor Field


IP Method Limitations

• Single IP / Multiple Users
– A single proxy server can serve many users.

• Multiple IP / Single User
– A single user may use multiple machines over time, or even within one session. For example, AOL dynamically assigns an IP address to every request.

• Always configure your Web server to use a cookie or query string parameter if possible
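
One way to combine these identification methods is to prefer a cookie value and fall back to IP address plus browser string only when no cookie is present. The sketch below assumes each parsed record is a dict with `host`, `browser` and an optional `cookies` mapping, and that the cookie is called `visitor_id`. All of these names are hypothetical.

```python
def visitor_key(record, cookie_name="visitor_id"):
    """Build a visitor key: cookie value if present, else IP + browser string."""
    cookies = record.get("cookies", {})
    if cookie_name in cookies:
        return ("cookie", cookies[cookie_name])
    # Heuristic fallback: it still merges users behind one proxy who run
    # the same browser, and splits one user seen under several IPs.
    return ("ip+browser", record["host"], record["browser"])

# Hypothetical records: same proxy IP, different browsers, no cookies.
a = {"host": "24.69.48.18", "browser": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"}
b = {"host": "24.69.48.18", "browser": "Mozilla/4.7 [en] (WinNT; U)"}
# Hypothetical records: one cookie seen from two different IP addresses.
c = {"host": "10.0.0.1", "browser": "Mozilla/4.0", "cookies": {"visitor_id": "abc123"}}
d = {"host": "10.0.0.2", "browser": "Mozilla/4.0", "cookies": {"visitor_id": "abc123"}}

print(visitor_key(a) != visitor_key(b))  # browser string separates them
print(visitor_key(c) == visitor_key(d))  # cookie keeps them together
```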


Session Identification

• Visitor ID and Timeout Period
– Once the Visitor ID is constructed, the requests with the same Visitor ID are sequenced by timestamp, i.e. the time each request was made. If the time difference between two consecutive requests is more than, say, 30 minutes, the sequence is broken into two sessions.

• Query String Parameter
– In the request query string

• Cookie Value

• Session Field
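
The visitor-ID-and-timeout method can be sketched as below. The input shape, a list of `(visitor_id, timestamp, url)` tuples, and the function name `build_sessions` are assumptions for this example. The 30-minute default is the value suggested on the slide.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def build_sessions(requests, timeout=timedelta(minutes=30)):
    """Group (visitor_id, timestamp, url) tuples into sessions.

    Returns a list of (visitor_id, [(timestamp, url), ...]) pairs.
    """
    by_visitor = defaultdict(list)
    for visitor_id, timestamp, url in requests:
        by_visitor[visitor_id].append((timestamp, url))

    sessions = []
    for visitor_id, hits in by_visitor.items():
        hits.sort(key=lambda hit: hit[0])  # sequence by timestamp
        current = [hits[0]]
        for previous, hit in zip(hits, hits[1:]):
            # A gap longer than the timeout closes the current session.
            if hit[0] - previous[0] > timeout:
                sessions.append((visitor_id, current))
                current = []
            current.append(hit)
        sessions.append((visitor_id, current))
    return sessions

# Hypothetical requests from two visitors.
day = datetime(2000, 11, 1)
requests = [
    ("v1", day.replace(hour=10, minute=0), "/index.html"),
    ("v1", day.replace(hour=10, minute=10), "/specials.jsp"),
    ("v1", day.replace(hour=11, minute=0), "/index.html"),  # 50-minute gap
    ("v2", day.replace(hour=10, minute=5), "/ask_expert.jsp"),
]
sessions = build_sessions(requests)
print(len(sessions))
```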


User Identification

• Web Server Authentication

• Query String Parameter

• Cookie Value
– A cookie is a small text file, stored on the user’s PC, that holds information about a visitor


Web Analytics Solution Types

• Simple Web Log Analyzer
– Many free ones, simple parsing and counting
– WebTrends Web Log Analyzer

• Data Warehouse Solutions
– WebTrends E-commerce Server
– Oracle 9iAS Clickstream Intelligence

• Hosting Solutions
– Digimine

• Consulting Solutions
– Many companies specialize in customized Web Log and Application Log analysis


Web Log Analyzer

• Web Log Analyzer
– Reports simple site usage measures, e.g. number of hits, number of visitors, page sequence, etc.

• Methodology: simple parsing and counting

• Small and fast, but produces only simple static reports, usually with a large margin of error


Data Warehouse Solutions

• Load Server Log into Data Warehouse

• Integrate with other data, e.g. sales

• Support interactive query and OLAP

• More accurate analysis and data mining results

• Expensive


Simplified DW Schema: Dimensions

• Date

• Time

• Visitor

• User

• Browser

• Client Host


Simplified DW Schema: Dimensions

• Date
• Time of Day
• Browser
• Client Host
• User
• Visitor
• Page
• Server
• Site
• Event
• Referrer
• Search


Simplified DW Schema: Facts

• Impression (page view)
– Browser
– Client Host
– Visitor
– User
– Page
– Time to Serve
– Referrer
– Status
– Event
– Server
– Session ID

• Session Fact
– Session Date
– Session Time
– Session Visitor ID
– Session User ID
– Session Duration
– # of Impressions
– Data Sent
– First Impression ID
– Last Impression ID
– First referrer
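
To make the relationship between the two fact tables concrete, the sketch below rolls the impressions of one session up into a session-fact row. It is not the Clickstream Intelligence schema: the `Impression` class keeps only a few of the attributes listed above, the field names are invented for this example, and session duration is taken as last impression time minus first, which is an assumption because the time spent on the last page is not in the log.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Impression:
    """A reduced, illustrative impression row (field names are assumptions)."""
    impression_id: int
    session_id: str
    visitor_id: str
    user_id: str
    timestamp: datetime
    page: str
    bytes_sent: int
    referrer: str

def session_fact(impressions):
    """Aggregate the impressions of one session into a session-fact row."""
    ordered = sorted(impressions, key=lambda imp: imp.timestamp)
    first, last = ordered[0], ordered[-1]
    return {
        "session_date": first.timestamp.date(),
        "session_time": first.timestamp.time(),
        "session_visitor_id": first.visitor_id,
        "session_user_id": first.user_id,
        # Assumption: duration runs from the first to the last impression.
        "session_duration": last.timestamp - first.timestamp,
        "impressions": len(ordered),
        "data_sent": sum(imp.bytes_sent for imp in ordered),
        "first_impression_id": first.impression_id,
        "last_impression_id": last.impression_id,
        "first_referrer": first.referrer,
    }

# Hypothetical impressions, deliberately out of order.
rows = [
    Impression(2, "s1", "v1", "u1", datetime(2000, 11, 1, 10, 5), "/specials.jsp", 200, "-"),
    Impression(1, "s1", "v1", "u1", datetime(2000, 11, 1, 10, 0), "/index.html", 100,
               "http://www.lycos.com/"),
]
fact = session_fact(rows)
print(fact["impressions"], fact["data_sent"], fact["session_duration"])
```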


Impression Fact


Session Fact


ETL Process and external data

The ETL process can be customized to support business analysis according to:

• Web server log format

• External customer data

• External sales data and marketing data

• Other external data sources


Demo and Scenarios


Oracle 9iAS Clickstream Intelligence

[Architecture diagram. Labels: Collector Server, Loader, Oracle Warehouse Builder, Staging, Star Schema, Partitioning, Oracle 9i]


Agenda

• Configuration

• Basic Site Statistics

• Business Scenarios


DrugDepo Site Configuration


Site Basic Statistics

Site: DrugDepo.com

Start Date: October 1

End Date: October 10


Business Scenario Examples


Scenario 1: Determining Content Effectiveness


Scenario 1: Determining Content Effectiveness

• Questions: The marketing director of DrugDepo, Shelley Green, would like to know the following:

1. How do visitors find DrugDepo's Web site?

2. Did visitors find what they were looking for?


Discovery

Shelley uses the following Clickstream Intelligence reports:

• Search Analysis: Top Referring Searches

• Search Analysis: Top Local Searches


Top 5 Referring Searches

The top 5 referring searches (searches through search engines such as Google, Yahoo, Lycos, etc.) that bring visitors to DrugDepo are:

• health care products
• ask expert
• pharmacy
• baby care
• promotion
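
A report like this can in principle be derived from the referrer field of the log: when a visitor arrives from a search engine, the search phrase is usually carried in the referring URL's query string. The sketch below is not how Clickstream Intelligence computes the report. The host-to-parameter table is an assumption (engines differ in which parameter holds the phrase), and the sample referrers are invented.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed query parameter carrying the search phrase, per engine host.
SEARCH_PARAMS = {
    "www.google.com": "q",
    "search.yahoo.com": "p",
    "www.lycos.com": "query",
}

def search_phrase(referrer):
    """Return the lower-cased search phrase in a referrer URL, or None."""
    parts = urlsplit(referrer)
    param = SEARCH_PARAMS.get(parts.hostname)
    if param is None:
        return None  # not a known search engine
    values = parse_qs(parts.query).get(param)
    return values[0].lower() if values else None

def top_referring_searches(referrers, n=5):
    """Count search phrases across referrers and return the n most common."""
    phrases = (search_phrase(referrer) for referrer in referrers)
    return Counter(phrase for phrase in phrases if phrase).most_common(n)

# Hypothetical referrer values.
referrers = [
    "http://www.google.com/search?q=ask+expert",
    "http://search.yahoo.com/search?p=pharmacy",
    "http://www.google.com/search?q=Ask+Expert",
    "-",
    "http://www.altmedicine.com/links.html",
]
top = top_referring_searches(referrers)
print(top)
```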


Top 5 Local Searches

The top 5 local searches are:

• ask expert

• specials

• allergy

• baby food

• heart attack


Possible Actions

Shelley is considering the following:

• Expanding the content of the “Ask Expert” column.

• Positioning it prominently on the DrugDepo home page.

• Offering baby-related articles and items on the site, since there is also quite a high interest in baby care, baby food and related areas.


Scenario 2: Maximizing Online Marketing Effectiveness


Online Marketing Effectiveness

The marketing director of DrugDepo, Shelley Green, would like to know the following:

• Who are DrugDepo’s top external referrers?

• What are the top searches by search engines?


Discovery

• Shelley uses the following Clickstream Intelligence reports:
– Referring URLs: Top External Referrers
– Search Analysis: Top Searches by Search Engine


Top referrers

The following are the top referrers of DrugDepo:

• www.allergylearninglab.com
• www.healthwatchlab.com
• www.altmedicine.com
• www.lycos.com
• www.webclinic.com
• search.yahoo.com
• hotbot.lycos.com
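
A ranking of external referrers could be produced by counting the host part of each referrer URL and skipping the site's own hosts. This is a sketch under assumptions: the host name `www.drugdepo.com`, the function name and the sample referrers are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def top_external_referrers(referrers, own_hosts, n=5):
    """Count referrer hosts, ignoring empty referrers and the site itself."""
    counts = Counter()
    for referrer in referrers:
        host = urlsplit(referrer).hostname  # None for "-" or an empty value
        if host and host not in own_hosts:
            counts[host] += 1
    return counts.most_common(n)

# Hypothetical referrer values from a parsed log.
referrers = [
    "http://www.allergylearninglab.com/links.html",
    "http://www.allergylearninglab.com/resources.html",
    "http://www.lycos.com/",
    "http://www.drugdepo.com/index.html",  # internal navigation, excluded
    "-",
]
ranking = top_external_referrers(referrers, own_hosts={"www.drugdepo.com"})
print(ranking)
```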


Popular Search Phrases

The popular search phrases by search engines are:

• www.lycos.com – ask expert, health care products, promotion, arnica, pharmacy …

• search.yahoo.com – health care products, ask expert, pharmacy …

• hotbot.lycos.com – health care products, pharmacy


Possible Actions

• Consider making Allergy Learning Lab, Health Watch Lab and Alt Medicine preferred partner Web sites because they are driving a lot of visitors to DrugDepo’s Web site.

• Consider purchasing popular keywords or search phrases from Lycos and Yahoo because they are effective in driving visitors to the site.