

Data Mining in the Weblog

Dr. Teh Ying Wah
Faculty of Computer Science and Information Technology, University of Malaya

Introduction

In a data warehouse environment, sales managers must deal with very large data sets of sales items, because globalised marketing is both a current and a future trend. To make globalisation possible, sales managers throughout the world must be able to log on to the system. On average, users tolerate a wait of at most 8 seconds, as this is the limit of people's ability to keep their attention focused while waiting. A reasonable response time is therefore critical for a company that is pursuing globalisation. Indexes have emerged as one of the techniques for handling very large data volumes and meeting fast response time requirements in the data warehouse environment.

Literature Review

Current research on query processing covers either automatic or non-automatic selection of query processing techniques (Table 1). Neither approach, however, is suitable for a data warehouse, because data warehouse performance tuning involves too many parameters to select. Microsoft's AutoAdmin and the tuning wizard in Microsoft SQL Server 2000 use the optimiser's estimated cost for every SQL statement. The tuning wizard is not open-source software, so its existing code cannot be changed. This research therefore proposes data mining techniques as an intelligent way to select query processing techniques.

Data Mining Techniques in Indexes

The weblog file records the decision-support queries issued by a high-priority user (such as a manager) from time T1 to time T19, as shown in Fig. 1.

Fig. 1: Weblog
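As a rough illustration of how such a log could feed index selection, the sketch below counts how often each column appears in a `WHERE` clause of the logged queries. The log layout, the table and column names, and the helper `predicate_column_counts` are all assumptions made for this example; the actual format shown in Fig. 1 is not reproduced in this text.

```python
import re
from collections import Counter

# Hypothetical weblog entries: (time slot, SQL text). The real layout of the
# weblog in Fig. 1 is not available here, so this structure is an assumption.
WEBLOG = [
    ("T1", "SELECT * FROM sales WHERE region = 'ASIA'"),
    ("T2", "SELECT * FROM sales WHERE region = 'EU' AND item_id = 7"),
    ("T3", "SELECT * FROM sales WHERE item_id = 9"),
    ("T4", "SELECT * FROM sales WHERE region = 'ASIA'"),
]

# Matches a column name that is directly followed by a comparison operator.
PREDICATE = re.compile(r"(\w+)\s*(?:=|<|>)")


def predicate_column_counts(entries):
    """Count how often each column is used in a WHERE-clause comparison."""
    counts = Counter()
    for _time, sql in entries:
        # Keep only the text after WHERE; queries without one contribute nothing.
        _, _, where = sql.partition(" WHERE ")
        counts.update(PREDICATE.findall(where))
    return counts


counts = predicate_column_counts(WEBLOG)
for column, hits in counts.most_common():
    print(column, hits)
```

Columns that appear most often in predicates are natural index candidates. A real log would need a proper SQL parser rather than a regular expression.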

Table 2 shows a training data set with four data attributes and two classes.

Table 2: Training Data Set

Fig. 2 shows how the data mining technique works with the training data set.

Fig. 2: Decision Tree Model
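The source does not name the decision tree algorithm it uses, so the following is only a minimal sketch of one common choice, an ID3-style tree built with information gain. The attribute names, their values, and the class labels `index` / `no_index` are hypothetical stand-ins for the four attributes and two classes of Table 2, whose actual contents are not reproduced in this text.

```python
import math
from collections import Counter

# Hypothetical training rows: four attributes and a two-valued class label.
# These names and values are illustrative assumptions, not the data of Table 2.
ATTRIBUTES = ["table_size", "selectivity", "query_freq", "has_join"]
ROWS = [
    ({"table_size": "large", "selectivity": "high", "query_freq": "high", "has_join": "yes"}, "index"),
    ({"table_size": "large", "selectivity": "low", "query_freq": "high", "has_join": "no"}, "no_index"),
    ({"table_size": "small", "selectivity": "high", "query_freq": "low", "has_join": "no"}, "index"),
    ({"table_size": "small", "selectivity": "low", "query_freq": "low", "has_join": "yes"}, "no_index"),
    ({"table_size": "large", "selectivity": "high", "query_freq": "low", "has_join": "yes"}, "index"),
    ({"table_size": "small", "selectivity": "low", "query_freq": "high", "has_join": "no"}, "no_index"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    labels = [label for _, label in rows]
    remainder = 0.0
    for value in {feats[attr] for feats, _ in rows}:
        subset = [label for feats, label in rows if feats[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, attrs):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {feats[best] for feats, _ in rows}:
        subset = [(f, l) for f, l in rows if f[best] == value]
        branches[value] = build_tree(subset, rest)
    return (best, branches)


def classify(tree, feats, default="no_index"):
    """Walk the tree for one query; unseen attribute values fall back to default."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches.get(feats[attr], default)
    return tree


tree = build_tree(ROWS, ATTRIBUTES)
new_query = {"table_size": "small", "selectivity": "high", "query_freq": "high", "has_join": "no"}
print(tree[0], classify(tree, new_query))
```

In this toy data the class depends only on `selectivity`, so the tree splits once on that attribute. Real training data would normally yield a deeper tree.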

Evaluation

The test data is based on a web log file from the Transaction Processing Performance Council (TPC) benchmark. Table 3 shows the performance for the TPC-H sample web log file.

Table 3: Performance TPC-H Web Log

Conclusion

Query response times improve greatly after data mining models are applied to indexes.