
http://Ka.rsten-Winkler.de

 


The Unfortunately Incomplete WUM Tutorial

This short tutorial shows you how to start the Web Utilization Miner WUM, create a new demo mining base, import the first demo log file that comes with this distribution, create the visitors' sessions contained in this log file, build the aggregated log and execute your first MINT query with the MINT query processor. It covers the basic techniques that you should know before mining your own log files with WUM.

Advanced techniques for using WUM are covered by the second part of the tutorial. It is strongly recommended that you work your way through both parts of the tutorial before starting your own mining session.

It is assumed that you have successfully installed the Web Utilization Miner on your system and modified all necessary configuration files. If you have not installed WUM yet, please refer to the Installation Guide that is part of this User Documentation and complete the installation of this mining software first. The demo version of WUM is supposed to be pure Java. Therefore, it should run without difficulties on all Java Virtual Machines supporting Java 1.2.2 or higher. Please note that this Web Utilization Miner is a beta version intended for use in research and education. The WUM team would really appreciate receiving all kinds of bug reports and feature suggestions for the future development of this software. Simply drop us an e-mail. Good luck exploring WUM, the Web Utilization Miner.

Alternatively, you may be interested in reading what others write about WUM:

Felix Schendel. Web-Usage-Mining: Analyse vorhandener Technologien und kombinierter Einsatz für kennzahl- und effizienzorientierte Analyse von Server-Logfiles (Web usage mining: analysis of existing technologies and their combined use for key-figure- and efficiency-oriented analysis of server log files). Project documentation, Department of Business, Hochschule Wismar. Wismar, Germany, January 2004. In German.

How to Start WUM

UNIX and Linux: Open a new X-Terminal and make sure that your current working directory is the bin/ subdirectory of the WUM_HOME directory. In the given example, the environment variable JAVA_HOME is set to /usr/local/jdk1.2.2 and WUM_HOME is set to /users/kwinkler/WUM.v60. The miner can be started as a background process by executing the shell script wumgui.
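The launch step above can also be scripted. The following Python sketch is not part of the WUM distribution; the helper names wumgui_command and start_wum_in_background are hypothetical, and the sketch assumes only what the text states: JAVA_HOME and WUM_HOME are set, and the script wumgui lives in WUM_HOME/bin and is started from that directory.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def wumgui_command(env: dict) -> tuple[list[str], Path]:
    """Return (argv, working directory) for the wumgui script.

    Hypothetical helper: assumes wumgui is located in WUM_HOME/bin.
    """
    for var in ("JAVA_HOME", "WUM_HOME"):
        if not env.get(var):
            raise ValueError(f"environment variable {var} is not set")
    bin_dir = Path(env["WUM_HOME"]) / "bin"
    return [str(bin_dir / "wumgui")], bin_dir


def start_wum_in_background(env: dict | None = None) -> subprocess.Popen:
    """Start wumgui without waiting for it, similar to './wumgui &' in a shell."""
    env = dict(os.environ if env is None else env)
    argv, cwd = wumgui_command(env)
    # Popen returns immediately, so the miner keeps running in the background.
    return subprocess.Popen(argv, cwd=cwd, env=env)
```

With the values from the example, wumgui_command returns the script path /users/kwinkler/WUM.v60/bin/wumgui and the working directory /users/kwinkler/WUM.v60/bin.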

Karsten Winkler http://ka.rsten-winkler.de/hypknowsys/wum/wumTutorial.html 30/07/2014 10:28


Windows 95/98/NT: Open the Windows Explorer by right-clicking the Start icon of your task bar and selecting Explorer, open the home directory of WUM by browsing the tree view of your file system, and finally double-click the icon corresponding to the file startwum.pif.

Using Linux and the K Desktop Environment, the main frame of the Web Utilization Miner may look like this. The main window of WUM can be resized or moved on your desktop without difficulties.

How to Create a New Mining Base


Each mining project requires a mining base within WUM. A mining base contains descriptive information as well as an ObjectStore PSE Pro database and various other files created by the miner during the mining process. In order to create a new mining base for this tutorial, please open the File menu and select Create Mining Base.

There are five text fields for the parameters of the new mining base. Each mining base must have a unique name that may include blank spaces and numbers. The corresponding web server URL can optionally be stored for future use.

Each mining base must have its own directory to store the database and other related files. It is recommended to create a subdirectory in the directory data for each new mining base before starting the miner. The mining base of this tutorial will be stored in the existing directory data/demoWebSite. Click on the button (Directory) ... to open a file dialog of your operating system. To choose the necessary directory data/demoWebSite, select it in the dialog and click OK. Alternatively, the name of an existing directory can be entered in the corresponding text field.


After selecting or entering the home directory of the new mining base, the current dialog should look more or less like this.

Additionally, the local directory containing the log files of your Web server must be specified. The demo log file AccessLog.txt is stored in the same directory as the database. Therefore, click on the button (Log Files:) ... to open the file dialog of your operating system, open the directory data/demoWebSite and finally click OK to select the log file directory.


After checking the entered parameters, please click the button OK in order to create the new mining base for this tutorial. Clicking Cancel would abort the creation of a new mining base. In this case, the focus would be returned to the main window of WUM.

After successfully creating a new mining base, the title of the main window contains the name of the new mining base in brackets. The new mining base is now open and can be used for further operations. There can be only one open mining base at a time. The ObjectStore PSE Pro database consists of three files WUM.MiningBase.* that are stored in the same directory. Please do not edit, modify or delete these files.

Please note that the underlying ObjectStore PSE Pro is a single-user database only. The DBMS of ObjectStore PSE Pro uses a locking mechanism to ensure that each mining base is accessed by exactly one user at a time. The database of an open mining base is locked by creating a subdirectory WUM.MiningBase.odx in its home directory.

If the previous mining session ended abnormally, WUM can delete the lock directory in order to start the miner. Before unlocking a database by force, make sure that no other user is working with the corresponding mining base.
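WUM itself can remove a stale lock directory. If you want to inspect the lock state from outside the miner, a minimal Python sketch could look like this. It assumes, as described above, that a mining base is locked exactly when the subdirectory WUM.MiningBase.odx exists in its home directory; is_locked and force_unlock are hypothetical helpers, not WUM functions.

```python
from __future__ import annotations

import shutil
from pathlib import Path

LOCK_DIR_NAME = "WUM.MiningBase.odx"  # lock subdirectory named in the tutorial


def is_locked(mining_base_home: str | Path) -> bool:
    """True if the home directory of the mining base contains the lock subdirectory."""
    return (Path(mining_base_home) / LOCK_DIR_NAME).is_dir()


def force_unlock(mining_base_home: str | Path, confirmed_no_other_user: bool) -> bool:
    """Delete a stale lock directory; return True if one was removed.

    The caller must first make sure that nobody else works with this mining base.
    """
    if not confirmed_no_other_user:
        raise RuntimeError("refusing to unlock: another user might be working with this mining base")
    lock = Path(mining_base_home) / LOCK_DIR_NAME
    if lock.is_dir():
        shutil.rmtree(lock)  # removes the lock directory and anything inside it
        return True
    return False
```

The explicit confirmed_no_other_user flag mirrors the warning above: deleting the lock while another user has the mining base open would defeat the single-user protection.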

How to Import a Log File


After creating a new mining base or opening an already existing one, HTTP server log files with increasing time stamps can subsequently be imported into the mining base. The import module performs basic data cleaning operations on each log file line and updates the database with data on new visitors and Web pages. In order to import the small demo log file into the tutorial mining base, please open the File menu and select Import Log File.
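Because log files must be imported with increasing time stamps, it can help to check a file before importing it. This Python sketch is an assumption, not part of WUM: it reads the bracketed time stamp used by the Common, Extended and Cookie formats (for example [10/Dec/1999:23:06:31 +0200]), treats equal time stamps as acceptable, and relies on %b matching English month abbreviations, which holds under the default C locale.

```python
import re
from datetime import datetime

# Bracketed time stamp field, e.g. [10/Dec/1999:23:06:31 +0200]
_TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def timestamps_increasing(lines) -> bool:
    """Return True if the time stamps never decrease from one line to the next."""
    previous = None
    for line in lines:
        match = _TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a time stamp; real cleaning would report them
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if previous is not None and ts < previous:
            return False
        previous = ts
    return True
```

Time zone offsets are taken into account, since the parsed datetime objects are timezone-aware.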

The user interface of the import module is depicted in the next picture. There are a few parameters that must be specified by the user before a log file can be imported. Apart from simply entering the log file name and its format, all parameters concerning the data cleaning process should be considered very carefully.

The text field Filename contains the default directory of HTTP server log files. By clicking the button (Filename) ..., you can specify the log file to be imported using the file dialog of your operating system. After choosing the correct file and clicking OK, the complete log file name will be shown in the text field.

WUM currently supports four widespread log file formats. The example log file AccessLog.txt corresponds to the common log file format. Therefore, please check the Common Log File radio button.


The following table contains an example log file line for each log file format supported by WUM:

Common:
picasso.wiwi.hu-berlin.de - - [10/Dec/1999:23:06:31 +0200] "GET /index.html HTTP/1.0" 200 3540

Extended:
picasso.wiwi.hu-berlin.de - - [10/Dec/1999:23:06:31 +0200] "GET /index.html HTTP/1.0" 200 3540 "http://www.berlin.de/" "Mozilla/3.01 (Win95; I)"

Cookie:
picasso.wiwi.hu-berlin.de - - [10/Dec/1999:23:06:31 +0200] "GET /index.html HTTP/1.0" 200 3540 "http://www.berlin.de/" "Mozilla/3.01 (Win95; I)" "VisitorID=10001; SessionID=20001"

MS-IIS:
picasso.wiwi.hu-berlin.de, -, 10.12.99, 23:06:31, W3SVC2, WWW, 100.100.100.100, 547, 444, 0, 200, 0, GET, /index.html, -,
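To make the structure of the first three formats concrete, here is a Python sketch that recognizes a line as Common, Extended or Cookie format and splits it into fields. It is an illustration only, not WUM's import module: the field names are hypothetical, the MS-IIS format is left out, and the patterns assume exactly the layout of the example lines above.

```python
import re

# Common: host ident authuser [time] "request" status bytes
_COMMON = (r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
           r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
           r'(?P<status>\d{3}) (?P<bytes>\d+|-)')
# Extended adds "referrer" "user agent"; Cookie additionally adds "cookie".
_EXTENDED = _COMMON + r' "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
_COOKIE = _EXTENDED + r' "(?P<cookie>[^"]*)"'

# Most specific format first; each pattern must consume the whole line.
PATTERNS = {name: re.compile(pattern + r'\s*$') for name, pattern in
            (("cookie", _COOKIE), ("extended", _EXTENDED), ("common", _COMMON))}


def parse_line(line: str):
    """Return (format name, field dict) for the first matching format, or None."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(line)
        if match:
            return name, match.groupdict()
    return None
```

Applied to the Common example line, parse_line returns the format "common" with host picasso.wiwi.hu-berlin.de, request GET /index.html HTTP/1.0, status 200 and 3540 bytes.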

In order to reduce the number of web pages within the WUM database, HTTP requests can be truncated by cutting off all characters starting at the first occurrence of '#' (HTML anchors) or '?' (CGI parameters). Examples: If the option Truncate Requests: HTML Anchors is enabled, the requests GET /contact.html#address and GET /contact.html#email will both be shortened to GET /contact.html and will therefore be treated as requests concerning the same web page. If the option Truncate Requests: CGI Parameter is enabled, the requests POST /cgi-bin/download.cgi?userid=123&version=a and POST /cgi-bin/download.cgi?userid=456&version=b will both be shortened to POST /cgi-bin/download.cgi.
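The truncation rule can be sketched in a few lines of Python. This is an illustration of the rule as described, not WUM's code; truncate_request is a hypothetical name, and it operates on "METHOD /path" strings like those in the examples.

```python
def truncate_request(request: str, html_anchors: bool = True, cgi_parameters: bool = True) -> str:
    """Cut off everything from the first enabled delimiter ('#' and/or '?')."""
    delimiters = (["#"] if html_anchors else []) + (["?"] if cgi_parameters else [])
    # Positions of the enabled delimiters that actually occur in the request.
    positions = [request.find(d) for d in delimiters if d in request]
    return request[:min(positions)] if positions else request
```

For example, truncate_request("GET /contact.html#address") yields "GET /contact.html". With cgi_parameters=False, a request such as "GET /a.html?x=1#top" is only cut at the anchor and becomes "GET /a.html?x=1".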

The WUM distribution contains a very small log file AccessLog.txt that is to be used in this tutorial. (The tutorial will hopefully be continued at some point in time. Do you want to help?)


Please keep in mind that the import module of WUM performs only basic substring operations on each log file line. Depending on the user's individual mining goals, preprocessing the raw log file with user-specific Perl scripts or similar tools can be extremely useful.
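As an illustration of such preprocessing, here is a Python sketch in place of a Perl script. The cleaning rules are hypothetical examples of mining goals, not recommendations from the WUM team: it drops requests for images, style sheets and scripts, drops unsuccessful requests (status outside 200 to 399), and drops lines it cannot parse. The latin-1 encoding is likewise an assumption about the log files.

```python
import re

# Hypothetical cleaning rules; adapt them to your own mining goals.
_REQUEST = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")


def keep_line(line: str) -> bool:
    """Keep successful page requests; drop embedded resources and unparsable lines."""
    match = _REQUEST.search(line)
    if match is None:
        return False
    path = match.group("path").split("?", 1)[0].lower()  # ignore CGI parameters
    status = int(match.group("status"))
    return not path.endswith(IGNORED_SUFFIXES) and 200 <= status < 400


def preprocess(src_path: str, dst_path: str) -> int:
    """Copy only the kept lines from src_path to dst_path; return how many were written."""
    written = 0
    with open(src_path, encoding="latin-1") as src, \
            open(dst_path, "w", encoding="latin-1") as dst:
        for line in src:
            if keep_line(line):
                dst.write(line)
                written += 1
    return written
```

The cleaned file keeps the original line format, so it can then be imported into WUM like the raw log file.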

How to Analyze a Log File


How to Visualize the Contents


The generated HTML report can be found here.


Image of Complete Aggregated Log

How to Execute MINT Queries


How to Exit from WUM


Remarks and an Additional Example

WUM accepts as input a template, i.e. an ordered list of variables and wildcards, and a conjunction of constraints on the statistics of those variables. It finds all sequences which, taken together, build a pattern (actually a directed acyclic graph) that satisfies the template and the constraints.

Example: We are interested in an event x that occurs after y with a probability of more than 95%. This event y should appear in at least 100 of our sequences. x need not occur immediately after y, but there should be no more than 5 events between y and x. This specification produces the template y [0;5] x where x and y are variables. The wildcard stands for a run of arbitrary events, and the interval [0;5] constrains its length to between zero and five events. The constraints on x and y result in two restrictions:

y.support >= 100

and ( x.support / y.support ) > 0.95

To find the sequences satisfying this template and these constraints, issue the following MINT query:

select t
from node as x y, template y [0;5] x as t
where y.support >= 100
and ( x.support / y.support ) > 0.95

You can use this query in the demo, but you have to reduce the required support of y, because no event in the demo data appears in more than 8 sequences. This was just an example. For the formal definitions and a description of the miner at work, please refer to the publications about the Web Utilization Miner WUM.

When a MINT query is issued, WUM finds all acceptable bindings for the template variables. A binding is a list of events, i.e. of values, bound to the variables. A binding is acceptable if the events comprising it appear in sequences which (1) conform to the template's structure and (2) taken together constitute a group whose statistics satisfy the query constraints.

In the above example, a URL Y.html in the dataset could be bound to variable y. A URL X.html could then be bound to x only if there exists a sequence where X.html appears within 6 positions after Y.html. For the binding to be acceptable, there should be at least 100 sequences containing Y.html, and more than 95% of them should contain X.html at most 6 positions after Y.html. Those sequences contribute the binding (Y.html, X.html).

WUM discovers all acceptable bindings for the query and builds a navigation pattern for each binding. A navigation pattern is a directed acyclic graph comprised of the sequences contributing the binding: the sequences have been merged at their common prefix and at each event of the binding.

The visualization tool of WUM can display a navigation pattern in two ways:

The template tree consists only of the events comprising the binding. The events are annotated with the number of contributing sequences. This format gives an overview of the events that satisfy our query, without information on the surrounding events.

An aggregate tree is a set of subsequences merged on their common prefix. For two consecutive events in the binding, the aggregate tree shows the fragments of the contributing sequences between those two events. Since WUM cannot yet display graphs, a navigation pattern is split into aggregate trees, one per event in the binding. This event is then the root of the aggregate tree.
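Merging subsequences on their common prefix amounts to building a prefix tree whose nodes count the sequences passing through them. The Python sketch below shows this idea under the assumption that the fragments have already been extracted and each starts with the binding event that becomes the root; build_aggregate_tree and render are hypothetical names, not part of WUM's visualization tool.

```python
def build_aggregate_tree(subsequences):
    """Merge subsequences on common prefixes into nested {event: [count, children]}."""
    root = {}
    for seq in subsequences:
        level = root
        for event in seq:
            node = level.setdefault(event, [0, {}])
            node[0] += 1  # one more sequence passes through this node
            level = node[1]
    return root


def render(tree, indent=0):
    """Yield one text line per node, indented by depth, annotated with its count."""
    for event, (count, children) in tree.items():
        yield "  " * indent + f"{event} ({count})"
        yield from render(children, indent + 1)
```

For the fragments [Y.html, a.html, X.html], [Y.html, a.html, b.html] and [Y.html, X.html], the rendered tree has the root Y.html (3), with a child a.html (2) that branches into X.html (1) and b.html (1), and a second child X.html (1).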

 

December 3, 2004