PERFORMANCE ANALYSIS FOR DATA MART
Thesis submitted in partial fulfillment
for the award of the Degree of
Doctor of Philosophy in Computer Science
by
M. PAULRAJ
Under the Guidance and Supervision of
Dr. P. SIVAPRAKASAM
VINAYAKA MISSIONS UNIVERSITY SALEM, TAMILNADU, INDIA
MARCH 2014
DECLARATION
I, M. PAULRAJ, hereby declare that the thesis entitled
“PERFORMANCE ANALYSIS FOR DATA MART” submitted by me for
the Degree of DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE is a
record of research work carried out by me during 2008 - 2014 under the
guidance and supervision of Dr. P. SIVAPRAKASAM, Associate Professor,
Sri Vasavi College, Erode, and that this work has not formed the basis for
the award of any degree, diploma, associate-ship, fellowship or other titles
in this or any other University or other similar Institution of higher learning.
Station : Signature of the Candidate
Date : (M. PAULRAJ)
CERTIFICATE
I, Dr. P. SIVAPRAKASAM, Associate Professor, Sri Vasavi College,
Erode, certify that the thesis entitled “PERFORMANCE ANALYSIS FOR
DATA MART” submitted for the Degree of DOCTOR OF PHILOSOPHY IN
COMPUTER SCIENCE by M. PAULRAJ is the record of research work
carried out by him during the period 2008 - 2014 under my guidance and
supervision and that this work has not formed the basis for the award of
any degree, diploma, associate-ship, fellowship or other titles in this or any
other University or other similar Institution of higher learning.
Station : Signature of the Supervisor Date :
(Dr. P. SIVAPRAKASAM)
ACKNOWLEDGEMENTS
Firstly, I thank the omnipresent God for answering my prayers and
giving me the strength to plod on despite my constitution. This dissertation
would not have been possible without the guidance, help and support of
the kind people around me who, in one way or another, contributed and
extended their valuable assistance in the preparation and completion of
this study.
I would like to express my deepest gratitude to my research supervisor
Dr. P. Sivaprakasam, Associate Professor, Sri Vasavi College, for his
excellent guidance, motivation and having allowed me complete freedom to
define and explore my own directions in research. His high standards for
research inspired me to do better work than I would have thought I was
capable of.
My heartfelt thanks to Dean of Academics and all faculty members of
Vinayaka Missions University, Salem.
I would like to thank the Management and Principal of Sri Vasavi
College, Erode, for providing me with laboratory, academic and technical
support.
I wish to thank my family members for the enormous amount of
encouragement, wishes, prayers and kind words, which paved the way for
my success. My special thanks to my wife Gloria Mary and my daughters
Nivetha P Raj and Niranjana P Raj for their support and for encouraging
me with their best wishes and prayers.
ACRONYMS
FLIDMA Functional Layer Interfaced Data Mart Architecture
UCI Unique Client Identifier
TDW Trajectory Data Warehouse
FBP Functional Behavior Pattern
OLAP On-Line Analytical Processing
OLTP On-Line Transaction Processing
DTDI Data mart Transition Detection using Interface
IRM Inductive Rule Mining
FBPDM Functional Behavior Pattern in Data Mart
MFDWM Multi-Functional Data Warehousing Model
LDR Layered Data Repository
FBA Functional Behavior Analyses
FDRL Functional Data Repository Layers
CTP Clique Tree Propagation
ETL Extraction, Transformation and Loading
DSS Decision Support System
IRIS Identity Risk and Investigation Solution
IRSA Indiscernibility-based Rough Set Approaches
DRSA Dominance-based Rough Set Approaches
GUI Graphical User Interface
PI Probabilistic Inverted
FRS Fuzzy-Rough Set
DHT Distributed Hash Tables
AIBFC Agglomerative Iterative Bayesian Fuzzy Clustering
FSQL Fuzzy-state Q-learning
ABSTRACT
The rising challenge of storing large volumes of data in business
and corporate environments increases the need for data warehousing.
Data warehousing gathers data at various levels, such as departmental,
operational, and functional. The data are stored as a group of data
repositories with better storage efficiency.
Numerous data warehousing models focus on stacking data more
efficiently and rapidly. Existing data warehouse models proceed with the
construction of data marts but fail in the analysis of functional behavior.
Additionally, extracting data from the warehouse requires a better
understanding of the data repository structure, but the functional
requirements of users are not easily understood by the data warehouse
model. The data warehouse model therefore requires a better technique to
extract the user-demanded data.
The issue of a functional decision support system to extract
user-relevant data is handled by introducing the data mart concept. A data
mart is planned to build separate functional data repository layers based
on the departmental decision support requirements in business and
corporate environments.
The merits of creating a data mart are as follows:
Accesses regularly desired data easily
Generates a combined view for a group of users
Enhances end-user response time
Costs less than employing a complete data warehouse
To offer a better decision support system, a “Functional Layer
Interfaced Data Mart Architecture” is proposed for larger corporate and
enterprise data applications.
The research work involves the analysis of the functional behavior
of the corporate system based on the data mart's operational goal. The
objective of functional behavior analysis is to construct layers of data
storage repositories with relevant data attributes using the Functional
Behavior Pattern in Data Mart (FBPDM). The role of the functional
behavior pattern is to identify the functional activities of the data mart
based on attribute relativity.
The required user-demanded data is extracted from the data mart
through the establishment of an efficient decision support system. A new
technique, Inductive Rule Mining (IRM), is proposed to enhance the
decision support system for the extraction of user-demanded data from the
data mart.
The purpose of decision support with efficient inductive rule mining
on functional data marts is to segregate the layered data repositories and
extract the required information for the user. The newly induced rules
provide supportive knowledge for identifying the information the user
requires.
The performance of the efficient functional behavior pattern and the
well-defined inductive rule mining is measured in terms of attribute
relativity, functional behavior analysis, decisive rule formation, and data
retrieval, showing improvements of about 10% to 15% in comparison with
the multi-functional data warehousing model.
The objective of the performance analysis of the data mart in
functional layered repositories with efficient induction rule mining is to
provide an experiential and scientific foundation for extracting
user-demanded data from the data mart. The effectiveness of the IRM
mechanism is established with benchmark datasets of varying
characteristics from the UCI repository.
The performance metrics of IRM are functional behavior analysis,
attribute relativity, decisive rule formation, and efficient data retrieval.
Evaluation of functional behavior in the data mart and of inductive rule
mining in extracting user-demanded data is performed with the Insurance
Company Benchmark (COIL 2000) data set from the UCI Repository.
The functional behavior of the corporate system is analyzed to build
layers of data storage repositories with relevant data attributes. Inductive
Rule Mining (IRM) improves decision making when retrieving data from the
data mart. Based on the formation of inductive rules, the user-demanded
information is extracted from the insurance company repositories.
TABLE OF CONTENTS
Page No.
CHAPTER 1 INTRODUCTION
1.1 Background 1
1.2 Statement of the Problem 2
1.3 Data Warehousing and Data Mart 3
1.3.1 Basic Concepts in Data Warehouse Technology 4
1.3.2 Data Mart Development and User Feedback 5
1.3.3 Dynamics of Data Mart Development 16
1.4 Decision Making in Data Mart 17
1.4.1 Multi Agents (MA) 18
1.4.2 Report Visualization Agent 19
1.4.3 ETL (Extraction, Transformation and Load) 19
1.4.4 Knowledge Base 19
1.4.5 Decision Making 20
1.5 Inductive Rule Mining 20
1.6 Motivation and Goal of the Study 22
1.7 Organization of the Remainder Study 23
CHAPTER 2 LITERATURE REVIEW
2.1 Introduction 24
2.2 Functional Behavior Pattern for Data Mart Based on Attribute Relativity 24
2.2.1 Behavior Patterns of Human 26
2.2.2 Temporal Patterns Evaluation 27
2.2.3 Real Time Warehouse 29
2.2.4 Real Time Environments 31
2.3 Mining User Demanded Data from Data Mart Using Inductive Rule Mining 41
2.3.1 Attribute Reduction 44
2.3.2 Consistent Metadata and Data Warehouse 50
2.3.3 Traditional Warehouse and Up-to-date Information 54
2.4 Research Gap 55
2.5 Contribution of thesis 56
CHAPTER 3 PROPOSED METHODOLOGY - FUNCTIONAL
BEHAVIOUR PATTERN IN DATA MART
3.1 Introduction 58
3.1.1 Proposed Functional Behavior Pattern with New Technique 60
3.2 Process of Data Mart in Data Warehousing 61
3.3 Impact of Data Warehouse Requirements in Functional Behavior 63
3.4 An Efficient Functional Behavior Pattern of Data Mart Based on Attribute Relativity 72
3.4.1 Process of Data Mart 73
3.4.2 Efficient Analysis for Identifying the Functional Behavior Pattern 77
3.5 Experimental Evaluation 78
3.6 Results and Discussion 80
3.6.1 Data Retrieval 80
3.6.2 Attribute Relativity 83
3.6.3 Functional Behavior Analysis 85
3.6.4 Data Storage Repositories 86
3.6.5 Response Time 88
3.6.6 Data Mart Management 89
3.7 Summary 91
CHAPTER 4
AN EFFICIENT INDUCTIVE RULE MINING NEW TECHNIQUE IN EXTRACTING USER DEMANDED DATA FROM DATA MART
4.1 Introduction 93
4.2 Rule Induction for Supporting Decision Making 94
4.2.1 Decision Making on Inductive Rule Mining 95
4.2.2 Rule Induction for Supporting Decision Making 97
4.3 Efficient Analysis to Extract User Demanded Data in Data Mart Using Inductive Rule Mining 106
4.3.1 Decision Support System in Data Mart 108
4.3.2 Decision Rule Induction for Extracting Information 109
4.4 Experimental Evaluation on Induction Rule mining 112
4.5 Results and Discussion 114
4.5.1 Decision Rules 115
4.5.2 Data Relativity 117
4.5.3 Reliability 119
4.5.4 Running Time 121
4.5.5 Rule Coverage 123
4.6 Summary 125
CHAPTER 5
AN EFFICIENT FUNCTIONAL LAYERED DATA MART BASED ON NEW TECHNIQUE OF INDUCTIVE RULE MINING
5.1 Introduction 127
5.1.1 Attribute Relativity in Insurance Data Mart 130
5.2 Functional Behavior Analysis on Attribute Relativity 132
5.3 Formation of Functional Layered Repositories Based on Insurance Policies 134
5.4 Efficient Technique for Retrieving Data from Functional Layered Insurance Repositories 137
5.5 Decision Making on Functional Layered Insurance Repositories 138
5.6 Efficient Decision Rule patterns 140
5.7 Decision Rule Based on Functional Attribute 142
5.8 Rule Induction with Decision Rules 144
5.9 Efficient Induction Rule Mining in Extracting User Demanded Data 146
5.10 Functional Behavior Analysis 149
5.11 Performance result of Memory Consumption 150
5.12 System Response Time 152
5.13 Execution Time Evaluation 154
5.14 Summary 156
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.1 Conclusion 160
6.2 Future Work 162
BIBLIOGRAPHY 163
LIST OF TABLES
Table No. Title Page No.
3.1 Relevant Attribute Vs. Data Retrieval 81
3.2 Data Mart Vs. Attribute Relativity (%) 83
3.3 No. of Data Marts Vs. Functional Behavior Analysis 85
3.4 No. of Data Vs. Data Storage Repositories 87
3.5 Data Accessibility Vs. Response Time 88
3.6 No. of Data Mart Vs. Data Mart Management 90
4.1 No. of User Queries Vs. Decisive Rules 116
4.2 Data Mart Vs. Extracted Data Relativity 118
4.3 Number of Users Vs. Reliability 120
4.4 Data Extraction Vs. Running Time 122
4.5 Data Mart Vs. Rule Coverage 124
5.1 Tabulation for Attribute Relativity 131
5.2 Tabulation of Functional Behavior Analysis 133
5.3 Tabulation for Functional Layered Repositories 135
5.4 Tabulation for Data Retrieval 137
5.5 Tabulation for Decision Support System 139
5.6 Tabulation for Decision Rule Pattern 141
5.7 Functional Layered Insurance Repositories Vs. Decision Rule 143
5.8 Tabulation for Rule Induction 145
5.9 Data Mart Vs. Extracting User Demanded Data 148
5.10 No. of Data Marts Vs. Analysis of Functional Behavior 149
5.11 Transaction Density Vs. Memory Consumption 151
5.12 No. of Users Vs. System Response Time 153
5.13 Tabulation of No. of Requests Sent Vs. Execution Time 155
LIST OF FIGURES
Figure No. Title Page No.
1.1 Types of Data Warehouse 1
1.2 Top down Flow from Data Warehouse to Data Mart 8
1.3 Bottom up Flow from Data Mart to Data Warehouse 9
1.4 Parallel Model in Data Mart Creation 11
1.5 Top down Model with End User Feedback 12
1.6 Bottom up Flow from Data Mart to Data Warehouse 14
1.7 Parallel Model in Creation of Data Mart with Feedback 15
1.8 Framework of Decision Support System 18
1.9 Inductions in Rule Mining 21
3.1 Data Mart in Data warehouse 62
3.2 Abstraction Levels of Data Warehouse Requirements 66
3.3 Architecture diagram of FBPDM 73
3.4 Process of Data Mart 74
3.5 Relevant Attributes Vs. Data Retrieval 82
3.6 Data Mart Vs. Attribute Relativity (%) 84
3.7 No. of Data Marts Vs. Functional Behavior Analysis 86
3.8 No. of Data Vs. Data Storage Repositories 87
3.9 Data Accessibility Vs. Response Time 89
3.10 No. of Data Mart Vs. Data Mart Management 90
4.1 Roles of DS in Decision Making Process 96
4.2 Decisions Making based on Inductive Rule Mining 105
4.3 Architecture diagram of the IRM 107
4.4 Process of Decision Rule Induction for Extracting User Information 110
4.5 No. of User Queries Vs. Decisive Rules 117
4.6 Data Mart Vs. Extracted Data Relativity 119
4.7 No. of users Vs. Reliability 121
4.8 Data Extraction Vs. Running Time 122
4.9 Data Mart Vs. Rule Coverage 124
5.1 No. of Attributes in Insurance Data Mart Vs. Attribute Relativity 132
5.2 No. of Attribute Relativity Vs. Functional Behavior Analysis 134
5.3 No. of Insurance Policies Vs. Functional Layered Repositories 136
5.4 Functional Layered Insurance Repositories Vs. Data Retrieval 138
5.5 Functional Layered Insurance Repositories Vs. Decision Support System 140
5.6 Functional Layered Insurance Repositories Vs. Decision Rule Pattern 142
5.7 Functional Layered Insurance Repositories Vs. Decision Rule 143
5.8 Decision Rules Vs. Induction 146
5.9 Data Mart Vs. Extracting User Demanded Data 148
5.10 No. of Data Marts Vs. Analysis of functional behavior 150
5.11 Transaction Density Vs. Memory consumption 152
5.12 No. of users Vs. System Response Time 153
5.13 No. of requests sent Vs. Execution Time 155
1. INTRODUCTION
1.1 Background
A data warehouse is a subject-oriented, integrated, time-variant
and non-volatile collection of shared data. Various data warehouse
architectures are used in numerous research applications and business
products. All architectures fall into one of three types, namely,
centralized DW [1], data mart and distributed DW. Figure 1.1 shows the
architectures of these three types.
Figure 1.1 Types of Data Warehouse
Figure 1.1 describes the three types of data warehouse. In a
centralized DW, all enterprise data from the various operational and
functional departments of a project are integrated and loaded into a single
database with a single activity model. A centralized DW [2] is built to
supply knowledge workers with reliable and combined data from the major
project areas. It provides the potential to associate information across the
enterprise. Centralized data warehouses [4], however, are very complex
and time-consuming to implement.
A data mart is a well-focused version of a data warehouse.
Because the data is extremely focused, set-up costs and time are
drastically reduced. The importance of a data mart is measured by the
efficiency with which it delivers a relevant and complete subset [3] of the
data existing in the data source to support decision making. Data in a data
mart is extracted from legacy and operational systems and from the
enterprise data warehouse [5,7].
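The subsetting described above can be sketched in a few lines. This is a minimal illustration only; the departments, attribute names and records below are invented for the example and are not taken from the thesis.

```python
# Hypothetical sketch: deriving a departmental data mart as a
# focused subset of enterprise warehouse records.

def build_data_mart(warehouse_rows, department, attributes):
    """Select only the rows and attributes one department needs."""
    mart = []
    for row in warehouse_rows:
        if row.get("department") == department:
            # Keep only the attributes relevant to this mart.
            mart.append({a: row[a] for a in attributes if a in row})
    return mart

warehouse = [
    {"department": "sales", "customer": "C1", "amount": 250, "region": "south"},
    {"department": "hr", "employee": "E7", "salary": 40000},
    {"department": "sales", "customer": "C2", "amount": 90, "region": "north"},
]

sales_mart = build_data_mart(warehouse, "sales", ["customer", "amount"])
# sales_mart -> [{"customer": "C1", "amount": 250},
#                {"customer": "C2", "amount": 90}]
```

Because the mart carries only the rows and columns its department actually queries, the reduced set-up cost and faster access claimed above follow directly from the smaller working set.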
1.2 Statement of the Problem
Current data warehouse models concentrate on the process of data
mart identification and processing. These models fail in analyzing the
functional behavior of the patterns [17]. No significant importance is given
to the selection of relevant attributes in existing data warehouse models.
Interaction between users and data increases when identifying behavior
patterns, and this interaction affects the choice of actions. Thus, it leads to
a complicated and difficult action-learning problem.
Some existing methods fail to cover all the areas of ETL (Extract,
Transform and Load) and querying [15, 21]. Hence, steps must be taken to
achieve the stated benefits, especially for the querying mechanism.
In a streaming warehouse [26], tables are updated as new data
arrives from multiple streams, but no steps are taken to limit the number of
tables updated concurrently.
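One conventional way to bound concurrent table updates, which the cited streaming warehouses lack, is a counting semaphore. The sketch below is a generic illustration under that assumption, not a mechanism from the cited work; the table names and batch contents are hypothetical.

```python
import threading

# At most two warehouse tables may be updated at the same time.
MAX_CONCURRENT_UPDATES = 2
update_slots = threading.Semaphore(MAX_CONCURRENT_UPDATES)

results = []
results_lock = threading.Lock()

def update_table(table, batch):
    """Apply a batch of stream records to one table, respecting the
    concurrency bound."""
    with update_slots:                 # blocks when two updates are running
        with results_lock:
            results.append((table, len(batch)))

# Five streams each deliver a batch for a different table.
threads = [threading.Thread(target=update_table, args=(f"t{i}", [1, 2]))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All five tables are updated, but never more than two concurrently.
```

The bound trades update latency for predictable load on the warehouse, which is the trade-off the problem statement above points at.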
In the present methods, the user-demanded information is not
effectively retrieved. Some useful techniques are applied to fuzzy rough
sets [33] but not to other forms of rough sets; hence, a new technique
applicable to all types of rough sets is required. Moreover, whenever the
sources are updated, some of the data become outdated.
1.3 Data Warehousing and Data Mart
Data warehouses are a significant asset of any modern business
enterprise, with standard applications in business planning and strategic
analysis. For instance, sales departments use data warehouses to study
the buying patterns of their customers and to shape their business and
distribution plans accordingly. The data warehouse module is a database
developed for analytical processing [19] whose main purpose is to
preserve and analyze historical data.
In the typical case, facts are the sales of an enterprise, and dimensions
enable analysis by product, purchaser, point of sale, time of sale and so
on. In simple warehouses, data marts extract their information directly from
operational databases [7]. In complex situations, the data warehouse
architecture [25] is multilevel, and the data mart information is extracted
from intermediate repositories called operational data stores.
1.3.1 Basic Concepts in Data Warehouse Technology
Operational databases provide a combined view of the application,
mapping each entity of the real world to exactly one concept of the
schema. Therefore, a complex reality corresponds to a complex schema,
whose objective is to capture the complexity of the application domain.
Schemas are frequently expressed at the conceptual level through an
entity-relationship [27, 31] data model.
In contrast, a data warehouse typically manages this complexity by
providing views. Each view of the data is split into a number of simple
schemas called data marts, each focused on a particular analysis activity.
In turn, each data mart represents data by means of a star schema
consisting of a large fact table at the center and a set of smaller dimension
tables placed in a radial pattern [12] around the fact table. The fact table
includes numerical or additive measurements for the purpose of computing
quantitative values about the enterprise. The dimension tables provide full
descriptions of the dimensions of the enterprise.
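The star schema described above can be illustrated with an in-memory SQLite database. The product and time dimensions and the sales figures are invented for the example; only the fact/dimension layout follows the text.

```python
import sqlite3

# A central sales fact table surrounded by two dimension tables.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales  (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    units      INTEGER,   -- additive measure
    revenue    REAL       -- additive measure
);
""")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Widget"), (2, "Gadget")])
cur.executemany("INSERT INTO dim_time VALUES (?, ?)",
                [(1, "Jan"), (2, "Feb")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 1, 10, 100.0), (1, 2, 5, 50.0), (2, 1, 3, 90.0)])

# Typical analytical query: aggregate the fact table along a dimension.
rows = cur.execute("""
    SELECT p.name, SUM(f.units), SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
# rows -> [("Gadget", 3, 90.0), ("Widget", 15, 150.0)]
```

Because the measures in the fact table are additive, any such GROUP BY along a dimension yields a meaningful quantitative summary, which is exactly the purpose attributed to the fact table above.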
The details of data warehouse construction are expressed in terms
of an entity-relationship diagram [13], and this description is the only
requirement for the construction of the data warehouse. Formulating an
integrated entity-relationship schema of its information is more difficult
when the operational database spans different independent legacy
systems.
1.3.2 Data Mart Development
A data mart is a decision support system integrating a subset of the
enterprise data and concentrating on definite functions or activities of the
enterprise. Data marts cover distinct enterprise-related features such as
forecasting the effect of marketing promotions, computing sales
performance, calculating the effect of new product introductions on
enterprise profits, and forecasting the performance of a new enterprise
division. Data marts are precise enterprise-related [42, 15] software
applications.
Second, data marts can be fully developed at a faster pace, and at
the same time it is possible to create models of success and share them
among constituencies interested in data mart applications in general.
Third, a data mart achieves specific functions for a distinct unit, such as a
recognized corporate or organizational task, which provides political
justification for the data mart. Generally, a manager is able to achieve the
best decision support within the specified enterprise budget [25], in
addition to improved technology. These conditions are appropriately
addressed with decision support system (DSS) applications [67, 9, 16].
The first pattern of data mart development is best characterized as
subsets of the data warehouse. These subsets are placed on
comparatively cheap computing platforms [72] that are closer to the user
and are periodically updated from the central data warehouse. In this
pattern, the data mart is the child of the data warehouse and, conversely,
the data warehouse is the parent of the data mart.
The second pattern of development rejects the dominance of the
data warehouse. The data mart is derived independently from the
collected abstract of information that predates both data warehouses and
data marts. The data mart uses data warehousing techniques of
organization and management; it is structurally a data warehouse, just a
smaller one with a specific enterprise function.
The third pattern of development tries to combine the first two
patterns and remove the differences inherent in them. Here, data marts
are developed in parallel with the data warehouse. Both the data
warehouse and the data marts are developed from the collected abstract
of information, but the data marts are independent of data warehouse
development.
The three patterns can be developed either without user feedback
or with the availability of user feedback. Each analysis assumes that the
relationship between data warehouses and data marts is quite static: the
data mart is a subset of the data warehouse, or the data warehouse is an
outgrowth of the data marts, or the data warehouse is a parallel
development with the data marts directed by the data warehouse data
model. Eventually, the data mart is superseded by the data warehouse in
offering a final solution to the abstract-of-information problem. In all three
pattern analyses [52], the role of users in the dynamics of the data
warehouse and data mart relationship is not considered.
Development Models Without User Feedback
The three patterns of data mart development without user feedback
are detailed below, and each is illustrated with a diagrammatic
representation. The alternative models then consider the responsibility of
users in providing feedback for the development of data warehouses and
data marts. Finally, an analysis of the effectiveness of the six patterns of
development is presented in light of a particular view of organizational
reality.
i. Top down Model
The data warehouse is developed from the collection of information
through application of the Extraction, Transformation and Loading (ETL)
process [77]. The data warehouse combines all data in a general format
and a familiar software environment. In theory, all of the organization's
data resources are combined in the data warehouse build, so all data
essential for decision support reside in the data warehouse. It only remains
to distribute the data to information consumers and to present it so that it
constitutes information for them.
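The ETL steps named above can be sketched as three small functions. The source systems, field names and cleaning rules below are hypothetical assumptions for illustration, not the process described in [77].

```python
# Minimal ETL sketch: extract from several operational sources,
# transform into one general format, load into the warehouse.

def extract(sources):
    """Pull raw records from every operational source."""
    for source in sources:
        yield from source

def transform(record):
    """Bring a record into one general format; reject incomplete rows."""
    if record.get("customer") is None:
        return None                       # incomplete data is dropped
    return {
        "customer": record["customer"].strip().upper(),
        "amount": float(record.get("amount", 0)),
    }

def load(records, warehouse):
    """Append cleaned records into the central warehouse store."""
    for raw in records:
        row = transform(raw)
        if row is not None:
            warehouse.append(row)

warehouse = []
crm = [{"customer": " alice ", "amount": "10"}]
billing = [{"customer": "bob", "amount": 5}, {"customer": None}]
load(extract([crm, billing]), warehouse)
# warehouse -> [{"customer": "ALICE", "amount": 10.0},
#               {"customer": "BOB", "amount": 5.0}]
```

The single `transform` step is what gives the warehouse its "general format": every source, however it spells or types its fields, is normalised before loading.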
Figure 1.2 Top down Flow from Data Warehouse to Data Mart
Figure 1.2 describes the top-down pattern of building a data mart.
The task of the data mart is to present suitable subsets of the data
warehouse to customers with specific functional needs. In addition, the
structure of the data is tailored to facilitate better information delivery and
to offer an interface to front-end reporting [61]. The analysis tool [29]
provides the precursor information for enterprise intelligence.
ii. Bottom Up Model
In the bottom-up model, data marts are built from pre-existing
abstracts of information, and the data warehouse is built from the data
marts. In this model, the data marts are designed and implemented in
parallel. Development of this type is expected to contain both redundancy
and significant information gaps from an enterprise point of view. Each
data mart attains a combination of abstracts of information in the service of
the data mart's function.
The combination survives only from the narrow viewpoint of the
enterprise function supporting the data mart. From the enterprise point of
view, new legacy systems are generated by such a process, and these
form new abstracts of information. The only advance made is that the new
abstracts use updated technology; they are no more combined or
consistent than the old abstracts and no more capable of supporting
enterprise-wide functions.
Figure 1.3 Bottom up Flow from Data Mart to Data Warehouse
Figure 1.3 describes the bottom-up flow from data mart to data
warehouse. The right-hand side of Figure 1.3 shows the data mart
abstracts of information used as the basis of data warehouse
incorporation. Building data marts with similar activities supports data
integration. The process is required to eliminate the redundancy in the
data marts and to recognize the issues that arise from isolated data mart
creation. The task of ETL [42] is to integrate the old abstracts of
information into the new data warehouse in order to solve these issues.
The opportunity of using older abstracts of information in this way is not
always foreseen. The flow from data marts to data warehouse is sufficient
to generate a data warehouse with complete coverage of enterprise data
requirements.
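The redundancy elimination that this bottom-up integration requires can be sketched as a keyed union of mart records. The marts, record fields and key choice below are illustrative assumptions, not the ETL procedure of [42].

```python
# Illustrative sketch: merging independently built data marts into
# one warehouse while removing cross-mart redundancy.

def integrate_marts(marts, key_fields):
    """Union the records of several data marts, keeping one copy of
    any record that appears in more than one mart."""
    seen, warehouse = set(), []
    for mart in marts:
        for record in mart:
            key = tuple(record.get(f) for f in key_fields)
            if key not in seen:        # eliminate cross-mart duplicates
                seen.add(key)
                warehouse.append(record)
    return warehouse

sales_mart = [{"customer": "C1", "region": "south"},
              {"customer": "C2", "region": "north"}]
support_mart = [{"customer": "C1", "region": "south"},  # duplicate of sales
                {"customer": "C3", "region": "east"}]

dw = integrate_marts([sales_mart, support_mart], ["customer"])
# dw keeps customer C1 once -> three records in total
```

Choosing the key fields is the hard part in practice: too narrow a key merges records that differ, too wide a key leaves the redundancy in place.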
iii. Parallel Model
The most interesting pattern of development, compared to top-down
and bottom-up, is the parallel development model. The parallel model
views the freedom of the data marts as restricted in two ways. First, the
data mart is directed during parallel development by a data warehouse
data model expressing the business point of view. This same data model
is used as the basis for the ongoing development of the data warehouse,
ensuring that the data marts and the data warehouse remain consistent.
Information gaps and redundancies are planned out and catalogued as
data mart development goes forward.
Figure 1.4 Parallel Model in Data Mart Creation
Figure 1.4 describes the parallel model of data mart creation.
Second, the independence of the data marts is treated as a necessary
and provisional measure on the road to the construction of a data
warehouse. Once that ambition is achieved, the warehouse will supersede
the data marts, which will become true subsets of the fully incorporated
warehouse [52]. From that point on, the data warehouse will nourish
established data marts, create subsets for new data marts, and in general
determine the route of data mart creation and development.
The third pattern begins to face some of the difficulties of the
relationship between the data warehouse and data marts. Unlike the first
pattern, it recognizes that organizational departments and divisions require
decision support in the short term. The model keeps data mart
development independent of data warehouse projects to allow for
outgrowth.
Development Models with User Feedback
All three patterns of data mart development without user feedback
fail to properly consider constant user feedback in response to data mart
and data warehouse activities. Data mart development relies mainly on the
character and quantity of feedback from users. Considering user feedback
for the three patterns of development produces three alternative patterns
of data warehouse and data mart development.
i. Top down Model
In the top-down pattern, user feedback before execution of the data
warehouse is based on participation in the system planning, requirements
analysis, system design, prototyping and system acceptance activities of
the software development process. For the reasons stated before, this
participation is expected to leave gaps in the coverage of domains and
attributes that are causal in character.
Figure 1.5 Top down Model with End User Feedback
Figure 1.5 describes the top-down model with end-user feedback.
The top-down model is subject to departmental user feedback, which
delivers new versions of the top-down data warehouse through
departmental data marts. If this uninterrupted pattern of change toward
departmental modification is implemented, a pattern of regular
development of the data warehouses and data marts is generated.
The pattern involves [35] constant feedback from the center.
ii. Bottom up Model
In the bottom-up pattern, in contrast to the outcome of top-down
development, building from data marts ensures much more complete
coverage of the fundamental and side-effect dimensions. This also means
that, once the data warehouse is executed, the bottom-up model [67, 37]
with feedback has only a small initial gap between user data mart needs
and the data warehouse. Paradoxically, this small gap leads to an
enterprise-level decision to shift to the top-down model for long-term
development once the data warehouse is in place.
Figure 1.6 Bottom up Flow from Data Mart to Data Warehouse
Figure 1.6 illustrates the bottom-up flow from data mart to data
warehouse. The transformation of the bottom-up model to long-term
development risks creating a large gap. But if this danger is negotiated
during data mart development, the initially small gap between the data
warehouse and data mart requirements is reduced even further.
iii. Parallel Model
The parallel model provides the most assured structure for the
development of the data warehouse and data marts. Development starts
with a phase of shared adjustment between the enterprise data model [39]
and the data marts. As long as the center remains open to data mart
feedback, data mart development proceeds with departmental
perspectives on causal and side-effect dimensions and attributes [78].
The pace of data warehouse development is comparatively even. The data
mart is directed by the enterprise data warehouse model in a very real
sense, while the enterprise-level model is directed by the individual and
combined input from the data marts. Nevertheless, the enterprise data
warehouse data model is more than an aggregate of the collected data
mart models' functions.
Figure 1.7 Parallel Model in Creation of Data Mart with Feedback
Figure 1.7 describes the parallel model of building a data mart with
feedback. The complexity of implementing the parallel model lies at the
beginning of development. The model assumes completion of the data
warehouse data model before data mart development starts. Therefore, it
needs fast development of the enterprise-level model and also needs the
data marts to wait until this development is complete.
A total enterprise-level data warehouse model is not necessary to
supervise and assess interdepartmental redundancies and to track
information gaps. Additionally, the enterprise-level model is not necessary
to organize data mart back-ends [20, 21, 25, 33] to guarantee eventual
compatibility. On the other hand, data marts may be synchronized by a
central modeling team and encouraged to proceed with development at all
purposeful speed.
1.3.3 Dynamics of Data Mart Development
The three initial patterns of data mart development are idealistic, as
they fail to collect user feedback for building data marts and data
warehouses. By initiating precise consideration of user feedback, the issue
of the centralized data warehouse and decision support system is
minimized. All three patterns of development handle the key decision in
the construction of the data warehouse, but they remain different choices
even if the same long-term policy of shared change of data marts and
data warehouses is followed after data warehouse development.
The top-down pattern needs a period of considerable adjustment to data
mart requirements after the data warehouse is developed. The purpose of
this period is to restrain centripetal forces and to accommodate the
expected development of independent data marts. The bottom-up model
needs an additional period of ETL processing to support development of
the data warehouse from the data marts. The parallel development model
needs fast development of an enterprise-level data warehouse data model.
This demand for speed is offset by the simultaneous development of the
data marts and the data warehouse, along with the organization of the
enterprise team. Finally, the parallel model with feedback is the most
popular, because it offers both co-ordination and autonomy: it
co-ordinates the development of the data mart models while steering the
enterprise-level data model.
1.4 Decision Making in Data Mart
The main idea is to develop a decision support system (DSS) [66] to cope
with the enormous amounts of data available across a distributed project
lifecycle. The distributed project lifecycle is adopted by a business and its
associates to turn the data into precise and valuable knowledge. Data
retrieval then becomes faster and of better quality, supporting sound
decisions. The system provides several types of data access capabilities
to extract and analyze the data contained in a data warehouse, drawing
on on-line transaction processing (OLTP) and legacy systems to provide
the critical information needed by business decision makers.
Figure 1.8 Framework of Decision Support System
A snapshot of the framework, built on the decision support model, is
shown in Figure 1.8. The framework is composed of multi-agents, a local
database model, a model base and a knowledge base.
1.4.1 Multi Agents (MA)
A multi-agent based system [57] facilitates search across the distributed
database while the decision process delivers on-line and dynamic
information. Different agent types are utilized to realize the system
objective. The agents' roles, capabilities, intelligence, autonomy, support,
communication language, protocol [47] and shared ontology are all
considered.
1.4.2 Report Visualization Agent
The report visualization agent integrates the end results of the decision
making process and presents them in a more understandable and natural
way, making them more accessible to users. The report visualization
agent holds pre-planned report templates and a set of visual illustrations
for presenting the outcome of the decision support operation. It executes
a range of visualization techniques [65] and produces the decision report.
1.4.3 ETL (Extraction, Transformation and Load)
ETL agents handle the process of Extraction, Transformation and Load
(ETL). ETL is shared across the data warehouse, legacy systems and
OLTP systems [23, 25]. ETL agents identify the significant data sources,
extract the data and then perform the essential data cleansing and
transformation. The extracted information is stored for later use. The ETL
agents' job is the process by which business data turns into business
information.
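The extract-cleanse-transform-load flow described above can be sketched as follows; this is a minimal illustration, not the thesis's implementation, and the record fields, cleansing rule and normalization step are invented for the example.

```python
# A minimal ETL sketch: extract raw records, cleanse and transform them,
# then load the results into an in-memory "warehouse" table.
# Field names ("customer", "amount") and rules are illustrative assumptions.

def extract(source):
    """Extract: read raw records from a recognized data source."""
    return list(source)

def transform(records):
    """Cleanse and transform: drop incomplete rows, normalize fields."""
    cleaned = []
    for rec in records:
        if rec.get("amount") is None:   # data cleansing: skip bad rows
            continue
        cleaned.append({
            "customer": rec["customer"].strip().title(),  # normalization
            "amount": float(rec["amount"]),
        })
    return cleaned

def load(warehouse, records):
    """Load: append transformed records to the warehouse table."""
    warehouse.extend(records)
    return warehouse

# Usage: raw operational rows become business information.
raw = [{"customer": " alice ", "amount": "120.5"},
       {"customer": "bob", "amount": None}]
warehouse = load([], transform(extract(raw)))
print(warehouse)   # [{'customer': 'Alice', 'amount': 120.5}]
```

Each stage is a separate function, mirroring the separation of duties among ETL agents described above.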
1.4.4 Knowledge Base
The knowledge base process is carried out by the Knowledge Acquisition
Agent. The acquired knowledge is modified according to the range of data
offered by the ETL agents. Different agents employ different knowledge
acquisition algorithms and feedback methods, as they deal with different
user requests. An agent is capable of evaluating knowledge obtained by
another agent before using it in decision making. Furthermore, the agent
is capable of authenticating or highlighting the knowledge.
1.4.5 Decision Making
The Decision Making Agent is accountable for organizing the different
tasks that need to be performed. It translates the user's specified
high-level goals into exact lower-level tasks and creates a plan of action.
The decision making agent employs knowledge about the task domain
and the facilities of other agents. It requests the services of a group of
agents that work co-operatively and combines the final outcome. Thus,
the decision agent is responsible for performing the actual decision
activity and providing the results.
1.5 Inductive Rule Mining
One of the main approaches of data mining is the automatic induction
of classification rules. A range of methods has been proposed and many
comparative studies have been carried out, but the lack of a broadly
available general platform has made it difficult to carry out sufficiently
in-depth studies to establish the advantage of one method over another.
Inducer is a rule induction workbench that intends to provide a general
platform for examining a range of rule induction algorithms [76].
Additionally, Inducer intends to examine rule induction strategies [78]
against a variety of datasets, without the need for any programming.
Figure 1.9 Inductions in Rule Mining
Figure 1.9 describes induction in rule mining. More accurate results are
generated when rule induction is integrated with association rule mining.
Association rules form a very useful data mining approach; they are
derived from frequent item-sets. Data mining [13, 16] is a challenging
issue in every domain of research. The challenge lies in mining the data
with better accuracy and processing time.
The use of rule induction along with an association rule mining algorithm
provides high accuracy and low processing time. The integration of rule
induction and association mining is able to minimize the number of rules
while covering more of the data. In addition, rule induction is able to
minimize the error rate, with fast processing of large datasets, and the
combined use of rule induction and association mining [63] also
minimizes the time complexity [49].
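The derivation of association rules from frequent item-sets can be illustrated concretely; the transactions, support threshold and confidence threshold below are invented for the sketch and the brute-force enumeration stands in for a real algorithm such as Apriori.

```python
from itertools import combinations

# Frequent item-sets are found first; association rules A -> B are then
# derived from them. Transactions and thresholds are invented examples.
transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "bread"}]
min_support, min_conf = 0.5, 0.7

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# 1. Enumerate frequent item-sets (brute force is fine at this scale).
items = set().union(*transactions)
frequent = [frozenset(c)
            for n in range(1, len(items) + 1)
            for c in combinations(items, n)
            if support(set(c)) >= min_support]

# 2. Derive rules A -> B with confidence = support(A ∪ B) / support(A).
rules = []
for fs in frequent:
    for n in range(1, len(fs)):
        for a in combinations(fs, n):
            a = frozenset(a)
            conf = support(fs) / support(a)
            if conf >= min_conf:
                rules.append((set(a), set(fs - a), round(conf, 2)))

for lhs, rhs, conf in rules:
    print(lhs, "->", rhs, "conf", conf)
```

On this toy data the frequent item-sets yield three rules, among them {milk} -> {bread}; pruning such rule sets further is exactly where rule induction complements association mining.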
(Figure 1.9 shows a taxonomy: Induction, comprising Decision List Induction and Rule Induction.)
1.6 Motivation and Goal of the Study
The rising need for enormous volumes of data in enterprise and
corporate environments increases the demand for data warehousing. Data
warehousing gathers data at various levels, such as departmental,
operational and functional, and stacks it in a collective data repository with
better storage efficiency. Several data warehousing models [72] focus on
loading the data more efficiently and quickly.
Additionally, extraction of data from the warehouse requires a good
understanding of the structure in which the data layers are loaded in the
repository. But the functional requirements of users are not clearly
understood by the data warehouse model. Functional requirements
[60, 61, 62] call for an efficient decision support system to extract the
appropriate user-demanded data from the data warehouse.
The difficulty of accessing data in the data warehouse requires a
better solution. The task of extracting exactly the data a user requires is
tedious, because users need a good understanding of the data structures
stored in the repositories. Therefore, to solve this issue, a data mart is
necessary. An efficient technique for the data mart is essential to handle
the analysis of the functional behavior pattern [63, 74] and to provide a
better decision support system.
To handle the user-demanded information, inductive rule mining is
required. Efficient retrieval of the user-demanded data from the data mart
is achieved by establishing inductive rule mining. The purpose of creating
rule induction is to segregate the layered data repositories [73]. The
decision support system is required to extract the user-demanded data
from the data warehouse along with inductive rule mining.
1.7 Organization of the Remainder of the Study
The organization of the remaining chapters is as follows:
Chapter 2 reviews the literature and discusses techniques for mining
user-demanded data from a data mart and the functional behavior pattern
for a data mart based on attribute relativity. Chapter 3 first presents the
process of the data mart in data warehousing and the impact of data
warehouse requirements on functional behavior. Chapter 4 discusses rule
induction for supporting decision making and inductive rule mining.
Chapter 5 describes the performance analysis of extracting
user-demanded data in a functional layered data mart based on induction
rule mining. Chapter 6 presents the conclusion and the scope for further
studies.
2. LITERATURE REVIEW
2.1 Introduction
A data warehouse (DW or DWH) is a database that is used for
reporting and data analysis. The DW is the central repository of data,
formed by integrating data from one or more dissimilar sources. Data
warehouses normally store current and historical data, which are used for
producing trend reports for senior management. The data are passed
through an operational data store for additional operations before they
are loaded into the DW for reporting.
However, users' functional requirements are difficult for the data
warehouse model to understand. A decision support system is needed to
separate the user-demanded data from the data warehouse. Data marts
are introduced to handle the issue of functional decision support, that is,
extracting user-relevant data. Isolated functional data repository layers
are formed by the data mart, depending on the departmental decision
support requirements found in enterprise and corporate data applications.
2.2 Functional Behavior Pattern for Data Mart Based on Attribute
Relativity
In wireless sensor networks, every component stores some data about
the global state of the system. The system's functionalities, such as
routing messages, retrieving information and sharing load, normally
depend on modeling the global state. Global data mining models, e.g.
decision trees and k-means clustering, are very costly to compute
because of the system's scale and high communication cost. Ran Wolff
et al (2009) presented a two-step method to deal with these costs. First, a
data mining model is monitored with the help of a highly efficient local
algorithm. Second, the local algorithm is employed as a feedback loop
that monitors complex functions of the data (such as k-means clustering).
Association rules are large in number, yet they miss some interesting
rules, and the rules' quality needs further analysis. Decision making
normally uses these rules, which can lead to risky actions. A framework
for discovering domain knowledge reported as coherent rules was
developed by Alex Tze Hiang Sim et al (2010). Based on the properties of
propositional logic, coherent rules are derived, and no background
knowledge is needed for coherent rule generation. With the help of the
discovered coherent rules, association rules [17] can be obtained without
knowledge of the required minimum support threshold.
One application of data mining techniques is web usage mining, in
which click-stream data is mined to extract usage patterns. User behavior
is determined by analyzing those patterns, and determining user behavior
is a demanding research issue in web usage mining. Data preprocessing
is an important task for web log mining. Suneetha K.R and R.
Krishnamoorthi (2009) used a snowflake schema [14] for easy retrieval,
through which the usage pattern of the preprocessed data can be
accessed. On-line analytical processing (OLAP) tools were provided by
the data warehouse for interactive examination of multidimensional data
at varied granularities, to offer effective data mining.
2.2.1 Behavior Patterns of Humans
Behavior patterns of humans involve ambiguity, uncertainty,
complexity and inconsistency, which arise from physical, logical and
emotional factors. Sang Wan Lee et al (2012) therefore designed an
unsupervised learning framework for human behavior patterns [35] that
considers their behavioral characteristics. The proposed framework
involves two steps. In the first step, a cluster validity index is proposed
along with Agglomerative Iterative Bayesian Fuzzy Clustering (AIBFC)
[36], which is used to detect meaningful structure in the data. In the
second step, with the help of the structure detected in the first step, the
sequence of actions is learned using the proposed Fuzzy-state Q-learning
(FSQL) [36] process. Some of the users' interactions may affect their
choice of actions, leading to complicated action learning, which remains a
difficult problem.
Experiments on the behavior patterns of cabdrivers were conducted
by another author. Liang Liu (2010) designed a new method to expose
cabdrivers' operation patterns by examining their continuous digital traces
[25]. A set of important features useful for categorizing cabdrivers is
identified. Cabdrivers' operation patterns are defined, and a comparison
is made among diverse cabdrivers' behaviors.
Based on their daily income, cabdrivers are classified as top drivers
and ordinary drivers. The daily operations of cabdrivers are considered in
order to uncover spatial selection behavior, context-aware spatio-temporal
operation behavior, route choice behavior and operation tactics. The focus
is on the analysis of cabdriver operation patterns based on their digital
traces. The method is, however, empirical, and an analytical method is
employed for GPS-like trace analysis.
2.2.2 Temporal Pattern Extraction
Temporal pattern extraction is a major and challenging task in a
multivariate framework. Wenjing Zhang and Xin Feng (2013) therefore
proposed a new method that employs a Multivariate Reconstructed Phase
Space (MRPS) [15, 71] for determining multivariate temporal patterns.
The univariate reconstructed phase space framework, based on a fuzzy
unsupervised clustering method, is extended by a new technique through
a novel mechanism called data categorization, which depends on the
definition of events.
First, the technique of univariate RPS embedding is extended to
Multivariate RPS (MRPS) embedding by combining each variable's
embedding. Second, the statistical distributions of three latent states,
namely the event state, pattern state and normal state, are estimated by
applying a Gaussian Mixture Model (GMM) to the dataset. A new classifier
and an associated objective function are also introduced that integrate
MRPS temporal pattern modeling with Bayesian discriminative scoring for
temporal pattern classification.
Attacks have not spared data warehousing either, and hence Vijaya
Bhaskar Velpula and Dayanandam Gudipudi (2009) focused on methods
for detecting insider attacks and provided a solution to avoid them.
A behavior-anomaly based system [13] was introduced for detecting insider
attacks. Peer-group profiling, composite feature modeling and real-time
statistical data mining are used by the system to avoid attacks. The
real-time monitoring process is updated with the help of refined analytical
models. The performance of the detection approach is described in the
context of the IBM Identity Risk and Investigation Solution (IRIS).
Normally, users demand fresh, attack-free and up-to-date information
in all fields, and so active warehousing has emerged. During warehouse
transformations, processing and disk overheads arise from the on-line
refreshment of source updates. At the root of many common
transformations in an active warehouse (e.g., surrogate key assignment,
duplicate detection, etc.) is the join operation, which is used frequently.
The join operator performs a join between an update stream and a
persistent disk relation [11] with limited memory, a setting considered by
Abhirup Chakraborty and Ajit Singh (2009). The authors also developed a
partition-based join algorithm to reduce the processing overhead, disk
overhead and delay in producing resultant tuples.
2.2.3 Real Time Warehouse
Compared to active warehousing, the real-time warehouse also faces
many issues, and steps must be taken to solve them. Write-read
contention is a great problem in the deployment of real-time data
warehouses. They need to control continuous flows of updates and
queries, and must satisfy conflicting requirements such as short response
times and high data quality.
If both criteria are considered, a multi-objective optimization problem
arises. The problem is transformed into a knapsack problem with
additional inequalities. Hence, Maik Thiele et al (2009) developed a new
model for combining both objectives with user-provided preferences [19].
First, the objectives are lifted to a more theoretical level, and separate
maximization and minimization problems are formulated. Based on these,
a multi-objective scheduling algorithm is developed which provides the
optimal schedule with respect to the user requirements.
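The reduction of the trade-off to a knapsack problem can be illustrated with a small sketch: choosing which update jobs to apply within a time budget so that the data quality gain is maximized. The job names, gains and costs below are invented numbers; this is not Thiele et al's actual formulation.

```python
# Sketch: selecting update jobs within a time budget to maximize quality
# gain -- a 0/1 knapsack solved by dynamic programming. Job names, gains
# and costs are invented for illustration.
def knapsack(jobs, budget):
    """jobs: list of (name, gain, cost); budget: total time available."""
    # dp[c] = (best gain achievable with total cost <= c, chosen names)
    dp = [(0, [])] * (budget + 1)
    for name, gain, cost in jobs:
        # iterate costs downward so each job is used at most once
        for c in range(budget, cost - 1, -1):
            cand_gain = dp[c - cost][0] + gain
            if cand_gain > dp[c][0]:
                dp[c] = (cand_gain, dp[c - cost][1] + [name])
    return dp[budget]

jobs = [("orders", 6, 3), ("clicks", 5, 2), ("inventory", 2, 2)]
gain, chosen = knapsack([("orders", 6, 3), ("clicks", 5, 2),
                         ("inventory", 4, 2)], budget=4)
print(gain, chosen)   # best quality gain within the time budget
```

Here the optimum skips the single most valuable job in favor of two cheaper ones, which is exactly the kind of trade-off the scheduling model must resolve.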
A data warehouse processes a given set of queries using multiple
materialized views. Materializing all views is impractical because of space
and maintenance cost constraints. The selection of materialized views is
the basic problem in developing a data warehouse with optimal efficiency.
B. Ashadevi and R. Balasubramanian (2009) designed a framework
for selecting the views to materialize [21] under given storage space
constraints. The aim of the framework is to attain an optimal combination
of good query response, low query processing cost and low view
maintenance cost. Using the proposed framework, the maintenance,
storage and query processing costs are optimized by selecting the most
cost-effective views for materialization.
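A common way to approach view selection under a storage constraint is a greedy benefit-per-unit-space heuristic; the sketch below uses invented view names, sizes, benefits and maintenance costs, and is a simplification rather than the framework proposed in [21].

```python
# Greedy materialized-view selection under a storage budget: repeatedly
# pick the view with the best (query benefit - maintenance cost) per unit
# of space. All views, sizes, benefits and costs are invented.
def select_views(views, space_budget):
    """views: list of (name, size, query_benefit, maintenance_cost)."""
    chosen, used = [], 0
    ranked = sorted(
        views,
        key=lambda v: (v[2] - v[3]) / v[1],   # net benefit per unit space
        reverse=True,
    )
    for name, size, benefit, maint in ranked:
        # keep only views with positive net benefit that still fit
        if benefit - maint > 0 and used + size <= space_budget:
            chosen.append(name)
            used += size
    return chosen, used

views = [("sales_by_region", 40, 100, 10),
         ("sales_by_day", 70, 90, 30),
         ("returns_by_sku", 20, 15, 20)]   # negative net benefit: skipped
chosen, used = select_views(views, space_budget=120)
print(chosen, used)
```

The greedy rule is not optimal in general, but it captures the cost/benefit balancing that view-selection frameworks formalize.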
Real-time warehousing can be used to provide the fresher data
demanded by today's businesses, but there are numerous challenges to
achieving it in a true real-time environment. Hence, Janis Zuters (2011)
developed the 'Multi-stage Trickle & Flip' method [20] for data warehouse
refreshment.
'Multi-stage Trickle & Flip' is a data warehouse refreshing method
designed to reduce the competition between the loading and querying
processes during real-time operation. It builds on the 'Trickle & Flip'
principle, and further extension of the loading and querying activities is
needed to make them both more capable. The method does not cover all
areas of ETL (Extract, Transform and Load) and querying, and effort is
needed to attain the maximum benefits, especially for the querying
mechanism.
Advanced steps are involved in solving the problems faced by real-time
data warehouses with the help of ETL. Data warehouses (DWs) are
normally loaded with data at regular time intervals using quick bulk
loading methods. Nowadays, all or some of the new source data are
loaded very quickly into DWs because of near-real-time DWs (right-time
DWs).
Such loading can be performed with regular INSERT statements, but
that results in very low insert speeds. Hence, Christian Thomsen et al
(2008) proposed RiTE ('Right-Time ETL'), a middleware system that
provides a solution by making inserted data available quickly while
retaining bulk-load insert speeds [22]. A main-memory based catalyst is
provided by RiTE, offering fast storage and concurrency control. The
system includes an open source DBMS and supports both producers and
consumers.
2.2.4 Real Time Environments ETL
ETL is normally performed in real-time environments, but the following
authors show that it can also be performed over queue networks. Usually,
data warehouse refreshment is performed in an off-line fashion. Active
data warehousing is an emerging trend in which data warehouses are
updated often because of users' high interest in new data. Alexandros
Karakasidis et al (2005) developed a framework for the execution of
active data warehousing [26]. The framework has several goals: requiring
only slight modifications to the sources' software configuration, imposing
minimal overhead due to the active nature of data propagation, and
allowing the overall configuration of the environment to be modified
easily.
The ETL (Extract, Transform and Load) activities are performed over
queue networks. The performance and tuning of the overall refreshment
process is predicted by applying queueing theory. The downside of the
work is that failures must be avoided by employing safeguarding
techniques and fast resumption algorithms.
Data feeds must be loaded into real-time data warehouses, and
hence Mohammad Hossein Bateni et al (2009) studied scheduling
algorithms [28] for this task. They arise in applications such as IP network
monitoring, on-line financial trading and credit card fraud detection. In
those applications, a large number of data feeds delivered by external
sources are collected by the warehouse.
Data are generated at a constant rate for each table, but at different
rates for different tables. During each data feed, an update is generated
by the arriving new data and then added to the respective table.
A certain percentage of data warehouses fail to meet business
objectives, as mentioned in some surveys. The main reason is that
requirement analysis is usually neglected in real projects. Paolo Giorgini
et al (2008) designed GRAnD to perform requirement analysis for data
warehouses based on the Tropos method [27]. Requirement analysis in
GRAnD involves two distinct aspects, namely organizational modeling,
based on stakeholders, and decisional modeling, based on decision
makers. Decisional modeling deals directly with the information needed
by decision makers. Organizational modeling enables the identification of
facts and supports the supply-driven part of the approach. The work can
be employed within both a demand-driven and a mixed
supply/demand-driven design framework.
Studies have also been made of stream warehouses to provide
efficient outcomes. Queries facilitated by the stream warehouse range
from real-time alerting and diagnostics to long-term data mining.
Continuous data loading from diverse and uncontrolled sources into a
real-time stream warehouse raises many problems; for example, users
need results in a timely manner, but a stable result often requires lengthy
synchronization delays.
So, P. Urmila et al (2012) proposed for stream warehouses a theory
of temporal consistency which permits multiple consistency levels [27].
The update problem of a streaming warehouse is expressed as a
scheduling problem in which jobs are the processes that load new data
into tables, and the aim is to reduce data staleness over time. As soon as
the new data are loaded, applications and triggers defined on the
warehouse take immediate action.
As said earlier, the streaming data warehouse update problem is
expressed as a scheduling problem in which jobs are the processes that
load new data into tables, the aim being to reduce data staleness over
time. Mohan Raj A. and M. N. Sushmitha developed a scheduling
framework to handle the difficulties faced by a stream warehouse, such
as view hierarchies and priorities, heterogeneity of update jobs [24] due
to diverse inter-arrival times, data volumes from diverse sources, and
transient overload.
In streaming data warehouses, update scheduling is developed by
merging the characteristics of traditional data warehouses and data
stream systems. The framework reserves resources for short jobs, which
normally correspond to regularly refreshed tables, and avoids the
inefficiencies associated with partitioned scheduling methods. The only
downside of the proposed approach is that many new data arrive from
multiple streams, yet no steps are taken to limit the number of tables
updated concurrently.
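The scheduling view of the update problem can be sketched with a simple priority rule: always run the update job for the table whose priority-weighted staleness is currently largest. The table names, durations and priorities below are invented, and this is a toy heuristic, not the authors' scheduling framework.

```python
# Sketch: a single-track scheduler that repeatedly runs the update job
# whose table has the largest priority-weighted staleness. Tables,
# durations and priorities are invented for illustration.
def schedule(jobs):
    """jobs: dict table -> (duration, priority). Returns the run order."""
    now = 0.0
    last_refresh = {t: 0.0 for t in jobs}
    order = []
    for _ in range(len(jobs)):
        # pick the pending table with the largest weighted staleness
        table = max(
            (t for t in jobs if t not in order),
            key=lambda t: (now - last_refresh[t]) * jobs[t][1],
        )
        duration, _priority = jobs[table]
        now += duration               # the job occupies the track
        last_refresh[table] = now     # table is fresh once the job ends
        order.append(table)
    return order

jobs = {"orders": (3.0, 5), "clicks": (1.0, 1), "inventory": (2.0, 2)}
print(schedule(jobs))
```

After the first job runs, the higher-priority "inventory" table has accumulated more weighted staleness than "clicks" and is refreshed next, illustrating how staleness-driven scheduling reorders updates.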
E. Malinowski and E. Zimanyi (2008) offer a temporal extension of
the MultiDim model [8]; the extension builds on research in temporal
databases. Diverse temporality types are permitted: valid time,
transaction time and lifespan. These are obtained from source systems,
while loading time is recorded in the data warehouse. Temporal support
is provided for levels, attributes, hierarchies and measures by the
proposed model.
The authors took steps to solve the problems faced by the previous
method, using a TDW. L. Leonardi et al (2009) developed a method to
store and aggregate spatio-temporal patterns with the help of a Trajectory
Data Warehouse (TDW). Frequent patterns, mined from the trajectories of
moving objects that appear in a specific spatial zone at a given temporal
interval, are quickly estimated. The TDW is based on a data cube model
with spatial and temporal dimensions. Such a TDW can be enhanced with
a new measure: a data mining process is carried out on the trajectories,
and frequent patterns are obtained from them. Those patterns are
examined by the user with OLAP queries at different levels of granularity.
In the data warehouse literature, most of the attention is on design,
and little attention is paid to data warehouse testing. Matteo Golfarelli et
al (2009) proposed data-mart-specific testing activities, with an analysis
[7] of what is to be tested and how. A complete approach is developed to
adapt and extend existing testing methodologies, designed particularly for
data warehouse projects. The proposed method is built on a set of tips
and suggestions gathered from real projects, as well as from data
warehouse practitioners. Consequently, a set of appropriate testing
activities is identified, classified and framed within a reference design
methodology.
V. Mallikarjuna Reddy et al (2010) proposed a method to adapt data
warehouse schemas correctly, with new data loaded under the guidance
of a GUI-based Extraction, Transformation and Loading (ETL) procedure
[12] for an active data warehouse.
This idea is achieved using techniques such as table structure
replication and query predicate restrictions to select data. The GUI-based
ETL was mainly presented for continuous data loading in the active data
warehouse, allowing continuous data integration while keeping query
execution time low at the user end of the data warehouse.
But the previous method did not mention anything about DSS. Many
decision support systems (DSS) have been developed in numerous areas
such as medicine, business, agriculture and marketing. Two methods are
proposed by Arvind Selwal for designing a DSS. First, a general method
for designing a DSS [9] is proposed. A large data warehouse (DWH)
involves data mining methods that gather useful information in order to
improve the decision-making process. Second, a decision support system
for Village Economy Development Planning (VEDP-DSS) is developed. It
is used by the District Development Planning Officer (DDPO), the Block
Development Officer (BDO) and the Village Sarpanch for decision making
at the applicable level. VEDP-DSS can be employed for decision making
regarding village development.
An analysis framework is provided by data warehouses and On-Line
Analytical Processing (OLAP) to support the decision-making process. In
numerous application domains, complex analysis tasks involve
geographical information. Many applications exist for the integration of
OLAP and Geographic Information Systems (GIS). However, little effort
has been made to support continuous fields, i.e., phenomena like
temperature, altitude or land use, which are regarded as holding a value
at every point in space and/or time. Alejandro Vaisman and Esteban
Zimányi (2009) extended a conceptual multidimensional model with
continuous fields [10]. This is attained by describing an appropriate data
type that offers diverse operations for manipulating such fields. A query
language based on relational calculus is also defined; it permits
expressing spatial OLAP queries that include continuous fields and is
used to formally define this class of queries.
Supporting decision-making is really difficult, as it depends on the
availability of integrated and organized high-quality information. This
problem is addressed by data warehouses. The data warehouse serves
as an integrated repository for the internal and external data intelligence
that is critical to understand. Business intelligence supports efficient
problem and opportunity identification, critical decision-making, etc.
The analysis used four themes, namely integration, implementation,
intelligence and innovation. Salvatore T. March and Alan R. Hevner
(2007) gave a clear view of applying data warehousing technology [16] to
support management decision-making. A review is made of the
information requirements for decision support through a general systems
theory of management.
Until then, authors had considered only single-dimensional
databases, and so DaWaII (Data Warehouse Integration) came into
existence, a tool that supports the activities involved in the integration of
multidimensional databases. Combining separately developed data
warehouses is basically a hard problem. The tool supports testing the
validity of a matching between heterogeneous dimensions based on a
number of desirable properties.
Actual integration is performed through two diverse approaches
offered by the tool, which was created by Luca Cabibbo et al (2006). The
first approach is a scenario of loosely coupled integration [5]: common
information between the sources is identified, and drill-across queries are
then performed against the original sources. In the second, the sources
are merged, and a materialized view is derived from them; this is referred
to as a scenario of tightly coupled integration, in which the view is
targeted by the queries.
Web users' navigation and interaction patterns are quite complex,
particularly in interactive applications that support user sessions and
profiles. Michel C. Desmarais (2006) presented such a case for an
interactive virtual garment dressing room. The application supports
personalization, user profiles and a view of a multi-site user session
spread over numerous web sites. It is also supported by a data logging
system that produces about 5 GB of compound data per month.
Juan Manuel Pérez et al (2008) reviewed the merging of Data
Warehouse (DW) and Web data. XML technologies used to integrate,
store, query and retrieve Web data [2], and their application to DWs, are
studied. Various distributed DW architectures are studied, with XML
languages used as an integration tool in the systems. The study also
covers Web data repositories, multidimensional database design for XML
data sources, and the XML extensions of On-Line Analytical Processing
mechanisms.
The work also identified the main limitations of, and opportunities for,
combining the DW and Web fields. The main problem is that domain
ontologies cannot help DWs to interoperate in a large-scale scenario or
with other information-provider applications. The survey deals only with
data-centric XML, not with document-centric XML collections.
Active Data Warehousing is used as an alternative to traditional warehousing for applications that need up-to-date information. An active warehouse, refreshed on-line, achieves higher consistency between the data already stored and the latest source updates. Implementing data warehouse transformations faces numerous challenges because of the need for on-line refreshment, which depends on their execution time and the overhead they impose on warehouse processes. N. Polyzotis et al (2008) focus on a frequently encountered operation: joining a fast stream of source updates with a disk-based relation [1] under limited memory. This operation arises in several common transformations such as surrogate key assignment, duplicate detection and identification of newly inserted tuples. A join algorithm named meshjoin (MESHJOIN) is designed to amortize the access cost of the two join inputs. The algorithm is detailed, and a systematic cost model is developed to tune MESHJOIN toward two goals: maximizing throughput for a given memory budget, or minimizing memory consumption for a given throughput.
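The core MESHJOIN idea, keeping each stream tuple in memory for exactly one full cyclic scan of the relation's pages so that it meets every page once, can be sketched as follows. This is a simplified illustration rather than the published algorithm: the admission rate of one stream tuple per probed page and the list-based buffer are assumptions made for brevity.

```python
def mesh_join(stream, relation, pages=2, key=lambda t: t[0]):
    """Toy MESHJOIN sketch: the relation is split into `pages` chunks
    scanned cyclically; each stream tuple stays buffered for one full
    cycle, so it is probed against every page exactly once."""
    size = -(-len(relation) // pages)                  # ceiling division
    rel_pages = [relation[i:i + size] for i in range(0, len(relation), size)]
    n = len(rel_pages)
    stream = iter(stream)
    buffer, out, done, page_idx = [], [], False, 0
    while not done or buffer:
        if not done:                                   # admit one stream tuple
            try:
                buffer.append([next(stream), n])       # [tuple, probes left]
            except StopIteration:
                done = True
        index = {}                                     # hash the current page
        for r in rel_pages[page_idx]:
            index.setdefault(key(r), []).append(r)
        for entry in buffer:                           # probe buffered tuples
            for r in index.get(key(entry[0]), []):
                out.append((entry[0], r))
            entry[1] -= 1
        buffer = [e for e in buffer if e[1] > 0]       # expire after a cycle
        page_idx = (page_idx + 1) % n
    return out
```

Admitting more tuples per page probe would trade memory for throughput, which is exactly the trade-off the MESHJOIN cost model optimizes.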
2.3 Mining User Demanded Data from Data Mart Using Inductive Rule
Mining
OLAP (On-Line Analytical Processing) systems extended with spatial and temporal features have drawn the attention of the GIS (Geographic Information Systems) and database communities. However, there is no accepted definition of a spatio-temporal data warehouse, nor of the functionality such a data warehouse must support. Earlier solutions vary depending on the kind of data to be represented as well as the kind of queries to be expressed. Alejandro Vaisman and Esteban Zimanyi (2008) proposed a conceptual framework that defines spatio-temporal data warehouses [50] with the help of an extensible data type system. They also defined a taxonomy of query classes with increasing expressive power and demonstrated how to express such queries using an extension of the tuple relational calculus with aggregate functions.
Chin-Ang Wu et al (2011) set out to improve the overall data warehouse mining process and developed an intelligent data warehouse mining method. The method includes a schema ontology [49], a schema constraint ontology, a domain ontology and a user preference ontology. Rule mining is employed over the structures of those ontologies, and demonstrations of the rule mining show the benefits of the mining process. A prototype multidimensional association mining system, supported by the ontologies, is also proposed to show how users are helped to build data mining models. It is employed to prevent the formation of ineffective patterns, identify concept-extended rules, and implement an active knowledge rediscovery method.
Jin Soung Yoo and Shashi Shekhar (2009) provide a flexible model that denotes interesting temporal patterns with the help of a user-defined reference sequence. To mine similarity-profiled temporal association patterns [52] efficiently, they proposed an algorithm that exploits properties such as the support time sequence and a lower-bounding distance. But such association mining alone is not very effective, so Jerzy Błaszczyński et al (2010) developed a general rule induction algorithm [40] based on sequential covering that is applicable to variable consistency rough set methods. The algorithm, VC-DomLEM, is employed for both ordered and non-ordered data.
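The sequential covering scheme underlying such rule induction algorithms can be illustrated with a crisp, greatly simplified sketch: grow one conjunctive rule at a time by greedily adding the condition with the best precision, then remove the positive examples the rule covers and repeat. The greedy precision heuristic and the attribute-value representation are illustrative assumptions; VC-DomLEM itself additionally handles variable consistency and dominance relations, which this sketch does not.

```python
def sequential_covering(examples, target):
    """Toy sequential covering: learn conjunctive rules (lists of
    (attribute, value) conditions) for the class `target`, one rule
    per outer iteration, removing covered positives each time."""
    def covers(rule, ex):
        return all(ex.get(a) == v for a, v in rule)

    positives = [e for e in examples if e["class"] == target]
    rules = []
    while positives:
        rule, covered = [], examples
        # Grow: keep adding the condition with the best precision
        # until the rule no longer covers any negative example.
        while any(e["class"] != target for e in covered):
            best = None
            for ex in positives:
                for a, v in ex.items():
                    if a == "class" or (a, v) in rule:
                        continue
                    cov = [e for e in covered if e.get(a) == v]
                    if not cov:
                        continue
                    prec = sum(e["class"] == target for e in cov) / len(cov)
                    if best is None or prec > best[0]:
                        best = (prec, (a, v), cov)
            if best is None:
                break
            rule.append(best[1])
            covered = best[2]
        rules.append(rule)
        positives = [e for e in positives if not covers(rule, e)]
    return rules
```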
Rough set selection also plays a major role in data warehousing. Salvatore Greco et al (2008) presented a generalization of the original definition of rough sets and established variable precision rough sets [44]. The generalization rests on the notions of absolute and relative rough membership. The variable precision rough set model is a parameterized generalization of the rough set model: its aim is to model data relationships in the form of a frequency distribution rather than the full inclusion relation used in the classical definition of rough sets. Thus, in the variable precision rough set model, one or more parameters model the degree to which the condition attribute values confirm the decision attribute value.
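In concrete terms, the precision parameter replaces full inclusion with a rough membership threshold. A minimal sketch follows (crisp equivalence classes; the treatment of the upper approximation as "membership ratio greater than zero" is a simplification of the parameterized model):

```python
def vprs_approximations(universe, attrs, target, beta=0.8):
    """Variable-precision rough set sketch: an equivalence class (same
    values on `attrs`) joins the lower approximation of `target` when
    the fraction of its members inside `target` is at least `beta`,
    and the upper approximation when that fraction is positive."""
    classes = {}
    for obj, desc in universe.items():
        classes.setdefault(tuple(desc[a] for a in attrs), set()).add(obj)
    lower, upper = set(), set()
    for members in classes.values():
        ratio = len(members & target) / len(members)  # rough membership
        if ratio >= beta:
            lower |= members
        if ratio > 0:
            upper |= members
    return lower, upper
```

With beta = 1.0 this degenerates to the classical lower approximation (full inclusion); lowering beta admits classes that are only mostly consistent with the target.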
While the previous work addressed the variable precision rough set model, Jerzy Błaszczyński et al (2009) focused on probabilistic rough set methods [40] that rely on different versions of the rough approximation of a set. In those versions, consistency measures control the assignment of objects to the lower and upper approximations. Some basic properties of rough sets are attractive and require monotonicity. Three types of monotonicity properties are considered: monotonicity with respect to the set of attributes, to the set of objects, and to the dominance relation. Since the consistency measures lack some of these monotonicity properties, new measures have been developed within two types of rough set methods: Variable Consistency Indiscernibility-based Rough Set Approaches (VC-IRSA) and Variable Consistency Dominance-based Rough Set Approaches (VC-DRSA).
Among the many rough set models, the Fuzzy-Rough Set plays a major role. The Fuzzy-Rough Set (FRS) method handles discernibility and fuzziness well. Some researchers have studied the rough approximation of fuzzy sets, while others have studied only attribute reduction or feature selection, which is one application of FRS; less work addresses classifier construction, another application of FRS. Suyun Zhao et al (2010) built a rule-based classifier with the help of a generalized FRS model [37]. The existing FRS is first made robust against misclassification and perturbation by integrating a controlled threshold into the FRS knowledge representation. The "consistence degree" is defined as a critical value and is used to discard redundant attribute values in the database. The consistence degree concept is used to form a discernibility vector, which supports the rule induction algorithms, and the induced rule set can then function as a classifier.
2.3.1 Attribute Reduction
Attribute reduction is an important research topic in fuzzy rough sets. Reducts are computed using the discernibility matrix approach, and it turns out that only the minimal elements of a discernibility matrix are sufficient and necessary for the computation. Considering this fact, Degang Chen et al (2011) proposed a new algorithm that finds reducts based on the minimal elements of the discernibility matrix. The authors specify relative discernibility relations on the conditional attributes, which the fuzzy discernibility matrix uses to characterize the minimal elements. Within the fuzzy rough set framework, two algorithms are developed to compute minimal elements and reducts. The downside of the proposed model is that it applies only to fuzzy rough sets and not to other rough set models; hence, steps should be taken to extend it to other rough sets too.
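The role of minimal elements can be seen even in the crisp setting. The sketch below builds a classical discernibility matrix, keeps only its minimal entries, and searches for a smallest attribute subset hitting all of them; Chen et al.'s contribution is the fuzzy counterpart of this idea, which this crisp illustration does not capture.

```python
from itertools import combinations

def reduct_from_discernibility(objects, attrs, decision):
    """Crisp discernibility-matrix reduction: for every pair of
    objects with different decisions, record the attributes that
    discern them; a reduct is a minimal attribute set intersecting
    every (minimal) matrix entry."""
    entries = []
    for x, y in combinations(objects, 2):
        if x[decision] == y[decision]:
            continue
        diff = frozenset(a for a in attrs if x[a] != y[a])
        if diff:
            entries.append(diff)
    # Keep only minimal entries: supersets are implied, so checking
    # the minimal elements suffices (Chen et al.'s key observation).
    minimal = [e for e in entries if not any(o < e for o in entries)]
    # Smallest attribute subset hitting every minimal entry.
    for k in range(1, len(attrs) + 1):
        for cand in combinations(attrs, k):
            if all(set(cand) & e for e in minimal):
                return set(cand)
    return set(attrs)
```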
Answering queries efficiently is a challenging issue. In uncertain databases, the probabilistic threshold query is common: a result must satisfy the query and also meet the threshold requirement. Jianxin Li et al (2013) examined probabilistic threshold keyword queries (PrTKQ) over probabilistic XML data [45]. The idea of quasi-SLCA is presented first; it is used to produce the results of a PrTKQ under possible-world semantics. A probabilistic inverted (PI) index is also designed to quickly return the qualified answers, with unqualified answers pruned using planned lower/upper bounds. Two efficient and equivalent algorithms are presented: a Baseline Algorithm and a PI index-based Algorithm. To speed up the algorithms, a probability density function is employed.
The user can also specify partial order constraints among diverse categories; for example, the user may require visiting the museum before the restaurant. Previous works focused on the total order query. Jing Li et al (2013) noted that the ideas in existing papers involve repeated computations and do not scale to large data sets, so a new solution is provided for the general optimal route query [43]. The solution relies on two methods, backward search and forward search, and answering other kinds of optimal route queries is also discussed. The route in the optimal route query covers only a subset of the provided categories.
Theoni Pitoura et al (2012) proposed Saturn for large-scale data networks. Saturn is an overlay architecture for processing range queries [44] that ensures load balance and fault tolerance over Distributed Hash Tables (DHTs). Storing consecutive data values in neighboring peers of a DHT is highly advantageous because it accelerates range query processing, but such a placement leads to load imbalances. Saturn handles these issues (range queries, load balancing, fault tolerance) through a novel multiple-ring, order-preserving architecture.
Man Lung Yiu et al (2012) considered a cloud computing setting in which the parallel querying of metric data [65] is outsourced to a service provider. The data may be exposed only to trusted users, not to the service provider or anyone else. The advantage of outsourcing is that it gives the data owner scalability with a minimal initial investment. Privacy must also be ensured because the data may be sensitive, valuable or otherwise confidential. Methods are provided to transform the data before handing it to the service provider, so that similarity queries can be answered on the transformed data, and interesting trade-offs between query cost and accuracy are presented.
Shaoxu Song et al (2011) studied computation caching and sharing within a sequence of inference queries in databases. For probabilistic inference queries, the clique tree propagation (CTP) algorithm was adopted, with materialized views employed to store the intermediate results of previous inference queries. These results can be shared with subsequent queries, which minimizes the time cost. Frequently queried variables are detected from the query workload, and, since different query plans exist, heuristics are provided to estimate their costs and select the optimal plan.
Victoria Nebot et al (2009) designed a Semantic Data Warehouse (SDW) that acts as a repository of ontologies and semantically annotated data resources. An ontology-driven framework [18] is developed to plan multidimensional analysis models for Semantic Data Warehouses. The framework builds an integrated ontology named the Multidimensional Integrated Ontology (MIO), which incorporates the classes, relationships and instances representing analysis dimensions and measures. The main downsides of the implementation are that the data become outdated as the sources are updated, and that the extraction and validation process is time-consuming. The main drawback of the SDW is that its queries are too slow.
Business programs increasingly rely on data warehouses and data mining techniques. In particular, Radio Frequency Identification (RFID) applications employ these techniques and have revolutionized business programs. RFID is involved in many applications such as manufacturing, logistics distribution and the stages of supply chains. The noise and duplicates produced in RFID data confirm the need for a data warehousing system. Barjesh Kochar and Rajender Chhillar (2012) designed a new data cleaning, transformation and loading method that makes the data warehousing system more effective for any RFID application [51]. One significant RFID application is recording the goods in warehouses using RFID tags and readers.
Spatio-temporal data warehousing is a distinct and interesting approach. Mobility data applications are flourishing in many application domains, and researchers have developed concepts, models, theories and tools to capture mobility data and make it convenient for those applications. One particular characteristic complicates mobility data management: for subsequent analysis, mobility data should be stored in a data warehouse. Esteban Zimanyi (2012) proposed the idea of spatio-temporal data warehouses and demonstrated how to define such a warehouse with the help of an extensible set of data types.
Danubianu et al (2009) studied the needs and opportunities involved in implementing a data warehouse in the tourism field, focusing on three dimensions of tourism: economic, social and cultural. A short analysis of the tourism area's information system is carried out, but the technologies involved in the diverse components of the sector are not efficient.
A data warehouse stores a great deal of information, and metadata is necessary to understand it. Metadata increases the level of adoption and the use of the warehouse's data by knowledge workers and decision makers. Data warehouse implementation is more effective when a metadata model is used; without one, the quality of the data warehouse decreases.
2.3.2 Consistent Metadata and Data Warehouse
A warehouse implementation also rests on consistent metadata, which results in a successful warehouse. Nayem Rahman et al (2012) used an ETL (extract, transform, load) based metadata model [42] for the data warehouse. The ETL metadata model provides better subject-area refreshment, especially metadata-driven loads of observation timestamps, and decreases the utilization of database system resources. It also supplies a set of ETL development tools to the developer and hands over a user-friendly batch cycle refresh monitoring tool to the production support team.
Anjana Gosain and Suman Mann (2010) presented a paper based on an object-oriented multidimensional data model. The model describes the data and incorporates aggregation, generalization, multiple path hierarchies, multiplicity [41], etc. Seven operators, important for querying and formatting results, are also presented for the model: intersection, difference, symmetric difference, restriction, union, join and projection. The operator set is minimal: no operator can be expressed in terms of the others, and none can be discarded without sacrificing functionality.
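Treating a cube as a set of cells (tuples of dimension-value pairs), four of the seven operators are plain set operations (`|`, `&`, `-`, `^` for union, intersection, difference and symmetric difference), while restriction, projection and join need a little code. The cell representation below is an illustrative assumption, not the model's formal definition:

```python
def restriction(cube, pred):
    """Restriction (selection): keep cells satisfying a predicate."""
    return {cell for cell in cube if pred(dict(cell))}

def projection(cube, dims):
    """Projection: keep only the named dimensions of each cell."""
    return {tuple((d, v) for d, v in cell if d in dims) for cell in cube}

def join(cube1, cube2, dims):
    """Join: pair cells that agree on the shared dimensions `dims`."""
    def key(cell):
        return tuple(v for d, v in cell if d in dims)
    index = {}
    for c in cube2:
        index.setdefault(key(c), []).append(c)
    return {(c1, c2) for c1 in cube1 for c2 in index.get(key(c1), [])}
```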
Studying the information provided by medical images is a tedious problem, which motivated the annotation of medical images. The diverse annotations obtained from diverse sources are to be modeled with the help of data warehouses, but classical conceptual modeling does not consider the specificity of the annotations.
Mouhamed Gaith Ayadi et al (2013) proposed a conceptual modeling of the data warehouse with the help of the StarUML extensibility mechanism [39]. StarUML is an open source platform that uses the XML language to generate UML profiles. Victoria Nebot and Rafael Berlanga (2011) provided an efficient analysis and exploration of large amounts of semantic data. It combines the inference power of annotation semantics with the analysis facilities offered by OLAP-style aggregations, navigation and reporting. The authors explain how the semantic data must be organized into a well-defined conceptual Multi-Dimensional (MD) schema, so that complicated queries can be declared and computed more effectively. A drawback of the method is that the local approach must be extended to develop the dimension hierarchies, so that the dimension values can be arranged into a defined number of levels or categories. The proposed method also has a performance issue.
DaWaII (Data Warehouse Integration) is a tool that supports the activities involved in integrating multidimensional databases. Combining separately developed data warehouses is fundamentally a hard problem. The tool, developed by Luca Cabibbo et al (2006), supports testing the validity of a matching between heterogeneous dimensions with respect to a number of desirable properties, and offers two approaches for the actual integration. The first is a scenario of loosely coupled integration [5]: common information between the sources is identified, and drill-across queries are then performed against the original sources. The second is a scenario of tightly coupled integration: the sources are merged to derive a materialized view, which the queries then target.
Nittaya Kerdprasop et al (2008) upgraded LCMS functionality to represent the accessible content with an induction capability. The induction method is based on rough set theory. The induced rules are proposed as supportive knowledge to guide content flow planning [29]; they are also employed as decision rules that help content developers manage the content delivered to each individual learner.
Y. Y. Yao summarized the different formulations of standard rough set theory and showed how they extend to different generalized rough set theories [30]. The theory is formulated using constructive and algebraic methods, and connections between them are established. In the constructive method, three definitions of the approximation operators are examined: element based, granule based and subsystem based. The algebraic method is more appropriate for generalizing the theory in a unified manner. The drawback is that further generalizations of the theory, such as probabilistic and decision-theoretic rough sets, are not covered by the proposed method.
As noted earlier, Michel C. Desmarais (2006) presented the case of an interactive virtual garment dressing room: an application supporting personalization, user profiles and a multi-site user session view spread over numerous web sites, backed by a data logging system producing about 5 GB of compound data per month. Analyzing those logs requires more sophisticated processing than a relational language normally provides, and procedural languages and a DBMS also prove difficult to use. The complex log data is therefore analyzed with a stream processing architecture and a specialized language comprising a grammatical parser and a logic programming module.
Useful information can be derived from server logs (the user's history) using web usage mining, through which users' expectations from the Internet can be identified. Web sites collect web access data for this kind of mining. The web usage data details the paths that lead to the accessed web pages (with the help of preferences and higher priorities), and this information is recorded automatically in the access logs of the web server. Poongothai, K. and S. Sathiyabama (2012) developed an induction-based decision rule model designed to generate inferences and reveal implied hidden behavioral aspects in web usage mining, examined over web server and client logs. A fast decision rule induction algorithm is merged, via decision-based rule induction mining, with a technique that converts a decision tree into a simplified rule set.
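The basic step of web usage mining, turning raw access-log entries into per-user page paths and frequent page transitions, can be sketched as below. The two-field log format is a simplifying assumption (real server logs carry timestamps, IPs and status codes), and this sketch stops short of the rule induction stage the authors build on top of such paths:

```python
from collections import Counter

def session_paths(log_lines):
    """Group access-log entries into per-user page paths and count
    the most frequent consecutive page transitions."""
    paths = {}
    for line in log_lines:
        user, page = line.split()        # "user page" per line, in time order
        paths.setdefault(user, []).append(page)
    transitions = Counter()
    for pages in paths.values():
        for a, b in zip(pages, pages[1:]):
            transitions[(a, b)] += 1
    return paths, transitions
```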
2.3.3 Traditional Warehouse and Up-to-date Information
The induction rules can also be employed in an active data warehouse. Active Data Warehousing serves as a substitute for traditional warehousing for applications that need up-to-date information: an active warehouse, refreshed on-line, keeps the stored data consistent with the latest source updates. As discussed earlier, implementing warehouse transformations under on-line refreshment is challenging, and N. Polyzotis et al (2008) address the recurrent operation of joining a fast stream of source updates with a disk-based relation under limited memory. This operation arises in common transformations such as surrogate key assignment, duplicate detection and identification of newly inserted tuples. Their MESHJOIN algorithm amortizes the access cost of the two join inputs, and its systematic cost model supports either maximizing throughput for a given memory budget or minimizing memory consumption for a given throughput.
2.4 Research Gap
In existing DW models, no analysis of the functional behavior is made; only the data mart process is identified and processed. No importance is given to attribute relativity in existing models. User and other data interactions increase when identifying behavior patterns and may affect the choice of actions, leading to complicated action learning, which remains a difficult problem. Regarding security, the financial and business cost of insider attacks is large and cannot be avoided even when computer security policies are in place.
In the existing methods, user-demanded information is not effectively retrieved. So, a new method has to be proposed to retrieve user-demanded information effectively based on attribute relativity, and decision support has to be carried out on functional data marts.
Some useful techniques are applied only to fuzzy rough sets and not to other forms of rough sets; hence, those techniques should be extended to all types of rough sets. Whenever the sources are updated, some of the data become outdated; moreover, in some data warehouses the queries become too slow. In some OLAP-based concepts, performance issues also arise, and in some cases generalizations exist but are not involved in processing.
2.5 Contributions of the Thesis
Accessing data from the data warehouse is a challenging issue: the user must understand the structure in which the data is stored in the repositories. The data mart is introduced to address this issue. A Functional Behavior Pattern (FBP) for the data mart is proposed; it effectively handles the analysis of functional behavior based on attribute relativity and provides a better decision support system. The functional behavior of the system is analyzed to build data storage repositories according to data attributes using the functional behavior pattern (FBP). The user-demanded data path transition is identified in the data mart using interface layers for data movement between different functional data marts. Thus, the proposed Functional Behavior Pattern is capable of analyzing functional behavior.
Inductive rule mining is used to handle the user-demanded information. Decision support is carried out effectively with the help of inductive rule mining on the functional data mart segregates of the layered data repository. With inductive rule mining, the decision support system extracts the required user-demanded data from the data warehouse effectively. Hence, the FBP combined with inductive rule mining provides a better result.
3. PROPOSED METHODOLOGY
FUNCTIONAL BEHAVIOR PATTERN IN DATA MART
3.1 Introduction
Building a data warehouse is an extremely demanding task since it frequently involves many organizational units of a corporation. A data warehouse is a widely queried, intelligent source of data for analysis purposes, principally used to support decision processes. Furthermore, it is a multidimensional model used for the storage of historicized, cleansed, validated, synthesized, operational, internal and external information. The stakeholders of a data warehouse system want to analyze their business processes in a complete and flexible way. Mostly they have a thorough understanding of the business processes they would like to explore and examine. What they actually require is a view of their business processes and their data that allows a comprehensive analysis. Data warehouses are modeled multidimensionally, corresponding to a characteristic view of their users. The analysis view of the business processes is very different from the universal view even though the underlying process is the same.
Hence, it is necessary to elicit requirements from the stakeholders of a data warehouse according to their analysis views. The data warehouse system is extremely dependent on these requirements; many data warehouses are built without properly understanding these needs and fail for that reason. During the requirements description process, system analysts of the IT department or consultants work together with stakeholders and users to describe the necessities for the data warehouse system.
The data warehousing team receives these descriptions, but it repeatedly has trouble understanding the business terminology and finds the descriptions too loose to use for the implementation. Therefore, the data warehousing team writes its system specification from a technological point of view. When the system specification is presented to the users, they do not quite understand it because it is too technical; they are, though, forced to accept it in order to move forward. This approach easily results in a data warehouse system that does not meet the initially defined requirements, because the users, the system analysts and the developers often do not speak the same language. Such communication problems make it tricky to turn a description of an analysis system into a technical specification of a data warehouse system that all parties can understand.
In addition, because of a technical system specification that is not fully understood by the users, a data warehouse system becomes too hard or impractical for future purposes and will not deliver the expected result to the company. In these cases, departments will often develop data marts for their own purposes, which can be regarded as stovepipes and make an enterprise-wide analysis system impossible. The challenge is to model a data warehouse system in a way that is both exact and user friendly. Each symbol describing the analysis process should be intuitive for the user and have distinct semantics, so that the developers can use the description as a general but precise specification of the data warehouse system.
3.1.1 Proposed Functional Behavior Pattern with a New Technique
The growing need for huge volumes of data in enterprise and corporate environments fuels the demand for data warehousing. Data warehousing collects data at different levels, such as departmental, operational and functional, and stores it as a collective data repository with better storage efficiency. Various data warehousing models concentrate on storing the data more efficiently and quickly. In addition, accessing data from the warehouse requires a good understanding of the structure in which the data layers are stored in the repository. But the functional requirements of users are not easily captured by the data warehouse model, which needs an efficient decision support system to extract the required user-demanded data from the warehouse.
The issue of a functional decision support system that extracts user-relevant data is handled by introducing the data mart concept. Data marts build separate functional data repository layers based on the departmental decision support needs of enterprise and corporate data applications. To provide a better decision support system, a Functional Layer Interfaced Data Mart Architecture is proposed for larger corporate and enterprise data applications.
The research work involves analyzing the functional behavior of the corporate system based on its operational goal. The aim of the functional behavior analysis is to build layers of data storage repositories with relevant data attributes using the functional behavior pattern in data mart (FBPDM).
3.2 Process of Data Mart in Data Warehousing
A data warehouse is a database used for exposing and analyzing the data it holds. The data gathered in the warehouse is uploaded from operational systems such as marketplaces and passes through an operational data store for further operations. Before reaching the data warehouse, the data may already be used for reporting. A data warehouse created from integrated data source systems does not need ETL (Extract, Transform and Load) into staging databases or operational data store databases, as presented by Juan Manuel Perez et al. Data warehouses are subdivided into data marts.
Figure 3.1 Data Mart in Data Warehouse
Data marts hold subsets of the data from a warehouse. An overview of a data warehouse with data marts is shown in Figure 3.1. A data mart is the access layer of the data warehouse, used to get data out to the users. It is a division of the data warehouse that is usually oriented to a specific business line. In some instances, each subdivision or business unit is considered the owner of all the hardware, software and data of its data mart. A data mart enables each subdivision to use, operate and extend its data without changing information held in other data marts or the data warehouse; the functional behavior of each data mart is analyzed and used accordingly. The merits of creating a data mart are as follows:
Provides easy, regular access to frequently needed data
Generates a combined view for a group of users
Enhances end-user response time
Costs less than employing a complete data warehouse
Since the data mart provides improved end-user response times, query processing in the data mart consumes less time. The data mart is a subset of the data warehouse in which data is stored according to its behavior and grouped under a name.
The FBPDM analyzes the attributes in the data mart and classifies them based on the functional behavior of the database. In addition, FBPDM manages the data in the data mart by analyzing the functional behavior of the attributes. An experimental evaluation is conducted with benchmark datasets from the UCI repository and compared with the existing multi-functional data warehousing model in terms of the number of functional data attributes, attribute relativity and the analysis of functional behavior.
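As a purely illustrative sketch of the layering idea, the function below partitions warehouse records into functional data-mart layers given a mapping from attribute to functional area. The hand-supplied mapping is an assumption made for illustration; in FBPDM the grouping is derived from the analyzed functional behavior and attribute relativity, which this sketch does not model.

```python
def build_functional_marts(records, attr_functions):
    """Project each warehouse record onto per-area attribute subsets,
    producing one functional data-mart layer per functional area.
    `attr_functions` maps attribute name -> functional area."""
    marts = {}
    for rec in records:
        by_area = {}
        for attr, value in rec.items():
            area = attr_functions.get(attr, "general")
            by_area.setdefault(area, {})[attr] = value
        for area, sub in by_area.items():
            marts.setdefault(area, []).append(sub)
    return marts
```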
3.3 Impact of Data Warehouse Requirements in Functional Behavior
Data warehouses are high-maintenance systems, since reorganizations, product introductions, new pricing schemes, new customers, changes in production systems and so forth all affect the data warehouse. If the data warehouse is to stay current, and being current is crucial for user acceptance, changes to the data warehouse have to be completed promptly. Therefore, the data warehouse system has to evolve with industry trends. The development team of a data warehouse system frequently struggles with this evolutionary behavior because requirements are forever changing. To manage the problem better, an organization has to define the steps the data warehouse team will take to manage its requirements. Documenting these steps enables the members of the organization to carry out the necessary project activities consistently and successfully.
The requirements of a data warehouse system across a wide area of business determine its functional behavior and the available data. The data warehouse determines the accessibility of data, its transformation, organization, aggregation and finally calculation. The data warehouse requirements establish the direction of communication with the stakeholders, raising the level of expectation toward the entrepreneurial goal. The data mart design process is based entirely on a thorough identification of the data warehouse requirements, which are the backbone of all forthcoming project activities. The business specification has a major impact on the success of the data warehouse project.
The business of the stakeholders is enriched by enhancing the data warehouse system according to their expectations. The basis of the data warehouse requirements is to fulfill the expectations of the entrepreneur in line with the enterprise goal. The development team expects the data warehouse system to be built from an accurate, complete and clear specification.
The data warehouse requirements are the foundation of all subsequent project activities and have a major impact on the success of the data warehouse project. Studies have shown that many data warehouse projects do not progress and fail to meet business objectives. On average, data warehouses fail as a consequence of poor communication between IT and business professionals, as well as developers with poor project organization skills and procedures. To reach a successful data warehousing implementation, a great deal of requirements engineering effort and planning is required.
The enterprise requirements include the levels shown in Figure 3.2, which illustrates the abstraction levels of the data warehouse specification. Requirements from the enterprise perspective represent the high-level goals of the administration for the data warehouse system. Generally, business requirements are stored in a document describing the project scope and vision.
Figure 3.2 Abstraction Levels of Data Warehouse Requirements
Accurate system construction requires further clarification of the enterprise requirements from the stakeholders. Finally, the data warehouse team requires a complete transformation of the enterprise requirements into a definite, testable and accurate specification. The transformation of the business specification results in various levels of abstraction. A different perspective of the data warehouse system is exposed at every abstraction level, each with its own set of stakeholders.
The requirements of an enterprise-wide data warehouse system determine its functional behavior and its available information: for example, what data must be available, and how it is transformed, organized, aggregated and scheduled. The requirements allow the stakeholders to communicate the purpose, establish the direction and set the potential information goals for the enterprise.
Stakeholders often articulate their needs as a general view of how the data warehouse system should improve their business. This business view describes the goals and expectations of the stakeholders, which form the foundation of the data warehouse requirements. On the other hand, the development team of a data warehouse system expects a complete, exact and unambiguous specification of the system it has to build, which means further refining the business requirements from the stakeholders. Therefore, it is essential to convert the business requirements into a thorough, testable and complete specification for the data warehouse team.
The general benefit of the data warehouse system is identified from the requirements of the enterprise and the entrepreneur. The business requirements occupy the highest level of the abstraction chain of requirements; they convey the business goals, business prospects and so on. The abstraction levels also describe the user requirements, comprising the functional, information and other requirements in the requirement chain.
i. Enterprise Requirements
The top priority in the requirement chain is given to the enterprise requirements of the data warehouse system. Business requirements describe the effective mission of the administration for the data warehouse system. A document records the requirements explaining the scope of the project. Generally, the enterprise requirements are depicted in a system context diagram describing the services of the data warehouse system. The enterprise specifications identify the general benefit of the data warehouse system based on the expectations of the administration and the users. Enterprise requirements express the business mission, business prospects and so on, extending down to the customer requirements.
ii. Customer Requirements
Customer requirements define the success of customers in utilizing the data warehouse system. They are generally collected from the customers and workers operating the data warehouse system. The customer view covers both the functionality of the data warehouse system and its typical non-functional features. The non-functional features are essential for improving the operation of the data warehouse system. The concepts and scope established in the enterprise requirements are organized by the customer requirements. Customer requirements are represented in terms of use cases, test cases, user profiles and so on, describing the work of the customers involved in the data warehouse system. Customer requirements are more effective than traditional requirements in extracting users' needs for system operation.
iii. System Requirements
System requirements express the data warehouse system requirements at an extremely detailed level. They provide a detailed, complete, fine-grained specification of the requirements as an input for the development team. System requirements support the customer requirements and the enterprise requirements, providing a robust basis for requirement verification. System requirements comprise functional, information and other requirements.
Functional requirements identify the functionality to be developed in the data warehouse system by the development team, enabling the customer to accomplish the task. The functionality defined must completely satisfy the enterprise requirements. Functional requirements capture the intended behavior of the data warehouse system. The planned behavior is expressed in terms of the services, responsibilities or functions the system is required to perform. Functional requirements describe the functionality of the system, analyzed on the basis of the users' requirements.
Information requirements describe the information needs of the business. They represent the data to be accessed and the information to be delivered by the data warehouse, and state the quality, arrival and processing of the data. In addition, the information requirements specify the data combinations needed for the analysis process and the analysis methods used in data processing.
Other requirements, in addition to the functional and information requirements, are defined to cover further relevant aspects of the data warehouse system, such as interface requirements or environmental requirements of a cultural, political and legal nature.
iv. Requirement Attributes
The functional, information or other requirements are enhanced with attributes capturing characteristics, in different dimensions, that are significant to the customers or to the data warehouse team. The requirement attributes represent the properties and behavior of the data warehouse system. They include the principles, policies and conditions to which the data warehouse system has to conform. The external interfaces, performance requirements, design and implementation constraints and quality attributes of the data warehouse system specify the requirement attributes. Requirement attributes are generally attached to the detailed system service requirements. Given the functional specification, the requirement attributes determine the behavior and quality of the data warehouse system. Attributes like data quality and granularity are determined from the given information requirements.
v. Traditional Requirements
Traditional software requirements are differentiated into two types:
1) Functional requirements, and
2) Non-functional requirements.
This traditional differentiation is also useful in a data warehouse system, but the data and information delivered are the main feature of a data warehouse system. Information requirements are subsumed in traditional software requirements, i.e., in the functional and non-functional requirements. The data warehouse is built for the purpose of delivering data, fulfilling the customer requirements and ensuring data quality. Besides, metrics like data transmission and data granularity are highly important for the data warehouse design and more detailed than the requirements of traditional software systems. Therefore, the traditional software requirement types are unsuitable for data warehouse projects.
A data warehouse system is information-centric by nature. To capture the needs and the functional behavior pattern of the data warehouse system, the functional requirements and the information requirements are distinguished explicitly. Generally, information requirements alone are appropriate for the data mart of a data warehouse and are less effective for building the data warehouse as a whole. The separation of functional and information requirements is very beneficial for the data mart of a data warehouse. A collection of all the necessary customer requirements is gathered for designing the data mart. From the collected requirements, the FBPDM selects the relevant attributes in the data mart, termed attribute relativity.
3.4 An Efficient Functional Behavior Pattern of Data Mart based on
Attribute Relativity
The functional behavior pattern of the data mart is efficiently designed for building the data mart components. The data mart is built from the well-known attribute behavior of the data stored in the data warehouse. A data warehouse system maintains the data in an arbitrary manner. In addition, the data warehouse handles the data mart based on the functional behavior of the data in the data warehouse environment. The functional behavior of the data is identified by selecting the relevant attributes of the system. The architecture of the functional behavior pattern in the data mart based on attribute relativity is shown in Figure 3.3.
Figure 3.3 Architecture diagram of FBPDM
Figure 3.3 illustrates the architecture of the functional behavior pattern in the data mart based on attribute relativity: the data warehouse is sub-divided into data marts (DM1, DM2, DM3), the relevant attributes of each data mart are identified, data storage repositories are built, and the functional behavior of each data mart is analyzed and classified.
3.4.1 Process of Data Mart
A data mart is a simple form of a data warehouse system focused on a single subject or functional area such as Finance, Sales or Marketing. Data marts are frequently built by a particular division of the organization. The enterprise specifies the single-subject focus, and data marts normally draw data from only a small number of sources.
The sources are internal operational systems, a central data warehouse or external data. The major steps involved in implementing a data mart are:
i. Design the schema,
ii. Construct the physical storage,
iii. Populate the data mart with data from the source systems,
iv. Access it to make informed decisions, and
v. Manage it over time.
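As a minimal sketch of the first four steps, assuming a relational store such as SQLite (the thesis does not prescribe a platform; the table and column names below are illustrative only):

```python
import sqlite3

# Steps i-ii: design the schema and construct the physical storage
# for a hypothetical Sales data mart (a small star schema).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        price        REAL
    );
    CREATE TABLE sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(product_id),
        quantity   INTEGER,
        sale_date  TEXT
    );
    CREATE INDEX idx_sales_product ON sales(product_id);
""")

# Step iii: populate the data mart from a (here in-line) source system.
con.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [(1, "Soap", 2.5), (2, "Brush", 4.0)])
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                [(1, 1, 10, "2014-01-05"), (2, 2, 3, "2014-01-06")])

# Step iv: access the data mart to support a decision (revenue per product).
total = con.execute("""
    SELECT p.product_name, SUM(s.quantity * p.price)
    FROM sales s JOIN product p ON p.product_id = s.product_id
    GROUP BY p.product_name
""").fetchall()
print(total)
```

Step v, managing the mart over time, would then cover access control, growth and backup, as described below.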
The overall implementation of the data mart is explained as a detailed process with the architecture diagram shown in Figure 3.4 below.
Figure 3.4 Process of Data Mart
[Figure 3.4 shows the flow: data warehouse → collect the necessary data requirements → construct the physical and logical structure of the schema → populate the logical structure with data → access and manage the information.]
A collection of all the necessary customer requirements is gathered for designing the data mart. In addition, the source of the collected data is identified to enhance the data mart design. After identifying the data sources, an appropriate subset of the data is selected. Simultaneously, the physical and logical structure of the data mart is analyzed and designed for better data access. The physical and logical construction of the data mart supports rapid and proficient access to the data in the data mart.
The physical storage is constructed by identifying schema objects, such as the tables and indexes established in the data mart design. Data marts are usually smaller and focus on a particular subject. Here, six numerically valued attributes are taken, and the precise information is extracted from the customer subtype attribute. The physical storage is also built by determining the design structures of the data stored in the data mart. Populating the data mart involves receiving the data from the source, transforming it into the exact layout and loading it into the data mart. The populating step in the design of the data mart comprises the following tasks:
1. Mapping the data sources to the target data structures,
2. Retrieving the data,
3. Transforming the data,
4. Loading the data into the data mart, and
5. Producing and storing metadata.
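The five populating tasks can be sketched, purely illustratively, as a small extract-transform-load routine; the source rows, field mapping and metadata shown are assumptions, not part of the thesis:

```python
import datetime

# 1. Map source fields to target data-mart structures (hypothetical names).
MAPPING = {"prod": "product_name", "amt": "price"}

def retrieve(source):
    """2. Retrieve raw rows from the source system."""
    return list(source)

def transform(rows):
    """3. Transform each row into the target layout."""
    return [{MAPPING[k]: v for k, v in row.items()} for row in rows]

def load(rows, mart):
    """4. Load the transformed rows into the data mart."""
    mart.extend(rows)

def make_metadata(rows):
    """5. Produce and store metadata describing the load."""
    return {"rows_loaded": len(rows),
            "loaded_at": datetime.datetime.now().isoformat()}

source = [{"prod": "Soap", "amt": 2.5}, {"prod": "Brush", "amt": 4.0}]
mart = []
clean = transform(retrieve(source))
load(clean, mart)
meta = make_metadata(clean)
print(mart, meta["rows_loaded"])
```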
The data in the data mart is accessed by querying. The retrieved data is examined and presented in the form of reports, charts and graphs. Normally, the end users utilize a graphical front-end tool to submit queries to the database and to display the outcomes of the queries. Accessing the data involves the following tasks:
1. Setting up an intermediate layer for the front-end tool to use; the intermediate layer acts as an interface, translating database structures and object names into business terms, and
2. Preserving and managing the business interfaces.
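The intermediate layer can be imagined as a simple lookup that translates physical object names into business terms before results are shown to end users; the mappings below are illustrative assumptions:

```python
# Hypothetical business-term interface for a front-end tool.
BUSINESS_TERMS = {
    "tbl_sls_2014": "Sales",
    "prd_nm":       "Product Name",
    "qty_sld":      "Quantity Sold",
}

def to_business_term(physical_name):
    """Translate a database object name into its business term,
    falling back to the physical name when no mapping exists."""
    return BUSINESS_TERMS.get(physical_name, physical_name)

columns = ["prd_nm", "qty_sld"]
print([to_business_term(c) for c in columns])   # ['Product Name', 'Quantity Sold']
```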
The management process of the data mart comprises the following tasks:
1. Providing secure access to the data,
2. Managing the growth of the data and optimizing the system for enhanced performance, and
3. Ensuring the availability of the data even after system failures.
Finally, the data mart is designed efficiently, with data sourced from the user requirements, rapid data retrieval on the specified attributes, data transformation and fast loading of the data into the data mart. The data mart is analyzed and processed over these different steps in order to obtain data with a specified list of attributes.
3.4.2 Efficient Analysis for Identifying the Functional Behavior Pattern
The information in the data mart is managed through the relevant selection of attributes for each data mart. The operational objective of the data mart is identified based on the selection of the relevant attributes. Depending upon the operational objective, the functional behavior of the data mart is analyzed and stored as data repositories in the layer. The procedure below describes the analysis of the functional behavior pattern of the data mart (FBPDM).
Given: Data warehouse (DW)
Step 1: Sub-divide the DW into a set of data marts (DM)
Step 2: Let the data mart be sales
Step 3: The DW is formed using a set of schemas
Step 4: Design the data mart [described in Section 3.4.1]
Step 5: After the formation of the data mart (sales details)
Step 6: Identify the relevant attributes of the data mart, such as product details, price, etc.
Step 7: Store the relevant attributes of the particular data mart in a data repository
Step 8: Based on the relevant attributes,
Step 9: Identify the functional behavior of the data
Step 10: Based on the functional behavior of the data,
Step 11: Classify the data mart
Step 12: End
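The steps above can be sketched in code; the attribute names, the relevance test and the classification rule are all illustrative assumptions, not the thesis's definitions:

```python
# Steps 1-2: a data warehouse sub-divided into data marts; here 'sales' only.
warehouse = {
    "sales": {
        "product_name": "text", "product_id": "id", "price": "numeric",
        "expiry_date": "date", "description": "text",
    }
}

# Step 6: relevance is assumed here to mean attributes tied to the
# operational goal of the sales mart (identification and pricing).
RELEVANT = {"product_id", "price", "description"}

repository = {}                      # Step 7: data repository per mart
for mart, attributes in warehouse.items():
    relevant = {a: t for a, t in attributes.items() if a in RELEVANT}
    repository[mart] = relevant      # store the relevant attributes

    # Steps 9-11: classify the mart by the functional behavior of its
    # relevant attributes (a toy rule: numeric-heavy vs descriptive).
    numeric = sum(1 for t in relevant.values() if t == "numeric")
    behavior = "quantitative" if numeric >= len(relevant) / 2 else "descriptive"
    print(mart, sorted(relevant), behavior)
```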
Assume the data mart is sales. The main attributes of sales are normally the sales details, i.e., product name, product ID, product expiry date, manufactured date, product description, price and so on. The relevant attributes of sales are identified, i.e., product description, ID and price. Based on the selection of the relevant attributes, the functional behavior is analyzed and stored in a data repository for further analysis of the requirements. The functional behavior of the data mart is efficiently analyzed, and the relevant attributes are also taken from the data warehouse. The analysis of the functional behavior pattern is done through the selection of the attributes that are closely related to the data description.
3.5. Experimental Evaluation
The functional behavior pattern is efficiently analyzed for the data mart based on its attribute relativity. The experiments are run on an Intel Pentium IV machine with 2 GB of memory and a 3 GHz dual processor. An experimental study is carried out to estimate the effectiveness and performance of the functional behavior pattern in the data mart based on attribute relativity. The effectiveness of the functional behavior pattern in the data mart is estimated on benchmark datasets from the UCI repository with varying characteristics.
The attributes are chosen depending upon their relativity to the company. The operational objective of the administration is efficiently analyzed, and the functional behavior is identified based on the relevant attributes. The performance of the functional behavior pattern in the data mart (FBPDM) based on attribute relativity is measured in terms of
i) Data Retrieval,
ii) Attribute relativity,
iii) Functional Behavior Analysis,
iv) Data Storage Repositories,
v) Data Mart Management.
Data retrieval involves the task of fast data access from the data mart. Attribute relativity indicates the relevant selection of attributes from the designed data mart. The relevant attributes are primarily chosen depending on the needs of the organization. Attribute relativity specifies the proportion of attributes present in the data mart that are closely related to the operational goal of the organization or data mart.
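Taking that description literally, attribute relativity can be computed as the share of a mart's attributes that relate to the operational goal; this formula is an interpretation of the prose, not one given in the thesis, and the attribute names are illustrative:

```python
def attribute_relativity(mart_attributes, goal_related):
    """Percentage of attributes in the data mart that are closely
    related to the operational goal (an assumed formalisation)."""
    related = [a for a in mart_attributes if a in goal_related]
    return 100.0 * len(related) / len(mart_attributes)

attrs = ["product_id", "price", "description", "expiry_date", "batch_no"]
goal = {"product_id", "price", "description"}
print(attribute_relativity(attrs, goal))   # 60.0
```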
The functional behavior of the data mart is easily analyzed based on attribute relativity. Functional behavior analysis requires functional data attributes for the analysis. Functional data attributes are attributes assigned a function instead of a value; the function is evaluated at the time the attribute value is demanded. The data storage repository is measured in terms of the memory occupied for storing the information of the organization. The management of the data mart covers data protection, organization and ensuring trusted access even after a system failure.
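A functional data attribute, as described here, can be sketched as an attribute that holds a function evaluated only when its value is demanded; the record below and its field names are illustrative assumptions:

```python
from datetime import date

# A product record in which 'shelf_life_days' is a functional data
# attribute: it stores a function instead of a value, and the function
# is invoked only at the time the attribute value is demanded.
record = {
    "product_name": "Soap",
    "manufactured": date(2014, 1, 1),
    "expiry": date(2014, 7, 1),
    "shelf_life_days": lambda r: (r["expiry"] - r["manufactured"]).days,
}

def value_of(rec, attribute):
    """Return an attribute's value, invoking it if it is a function."""
    v = rec[attribute]
    return v(rec) if callable(v) else v

print(value_of(record, "product_name"), value_of(record, "shelf_life_days"))
```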
3.6. Results and Discussion
The functional behavior pattern in the data mart (FBPDM) based on attribute relativity is effective compared to the existing multi-functional data warehousing model (MFDWM). The performance of the functional behavior pattern is measured in terms of data retrieval, attribute relativity, functional behavior analysis, data storage repositories and the management of the data mart, by maintaining the attributes in the data mart in a successful manner.
The functional behavioral pattern of the data mart or organization is efficiently analyzed and viewed with the relevant attributes of the data mart. The experiments are conducted with benchmark data sets to estimate the performance. The tables and graphs below describe the performance of the functional behavior pattern in the data mart based on attribute relativity in comparison to the multi-functional data warehousing model proposed by Neoklis Polyzotis et al.
3.6.1 Data Retrieval
The extraction of the functional data attributes from the data mart enhances the speed of data retrieval using functional behavior patterns. The populating design of the data mart facilitates rapid retrieval of data from the data mart. Fast retrieval of information supports the functional behavior pattern in selecting the best relevant attributes and the functional data attributes.
81
Six relevant attributes are tested in this data mart process. These attributes provide genuine information, whereas the other attributes are eliminated because they are available only as discrete values. A decision rule induction algorithm is used. The high-speed data retrieval efficiency of the FBPDM is demonstrated in the table and graph below.
Table 3.1 Relevant Attribute Vs. Data Retrieval
Relevant Attributes    Data Retrieval (secs)
                       MFDWM    FBPDM
1                      7        2
2                      12       5
3                      16       8
4                      19       10
5                      21       11
6                      25       13
7                      27       15
Table 3.1 describes the retrieval of data from the data mart for analyzing the functional behavior. The outcome of the FBPDM is compared with the multi-functional model in the data warehouse. The data marts 1, 2, 3, … refer to the number of data marts created from a single dataset, reflecting the size and attribute length of each data mart. Five data marts are created from a single dataset. Data retrieval based on the relevant attributes is measured in seconds.
Figure 3.5 Relevant Attributes Vs. Data Retrieval
The graph in Figure 3.5 describes the process of retrieving data from the data mart with the relevant attributes. As the number of relevant attributes grows, data retrieval from the data mart is faster by about 10-20% in the FBPDM in contrast to the multi-functional model in the data warehouse. The physical and logical structure of the data mart is designed for fast data access and retrieval, and this construction supports rapid and proficient retrieval of data from the data mart. The attribute relativity percentage is calculated as under:
3.6.2 Attribute Relativity
The functional behavior pattern identifies the relevant attributes based on the operational goal of the data mart. The functional data attributes are analyzed efficiently based on the relevant attributes of the data repositories. The attribute identification of the data mart decides the attribute relativity. The relevant attribute is selected from the collection of functional attributes, information attributes and other attributes, based on the importance of the individual attribute properties.
The multi-functional data model recognizes the process of the data mart, but the FBPDM efficiently identifies the operational goal of the data mart by selecting the relevant functional data attributes. The attribute relativity is based on the number of attributes that are closely related to the operational objective of the data mart.
Table 3.2 Data Mart Vs. Attribute Relativity (%)
Data Mart    Attribute Relativity (%)
             MFDWM    FBPDM
1            5        10
2            12       13
3            10       18
4            15       20
5            18       23
6            20       28
7            26       36
Table 3.2 illustrates the attribute relativity of the functional data attributes present in the data mart for analyzing the functional behavior. The outcome of the FBPDM is compared with the multi-functional model in the data warehouse. The performance of the FBPDM is evaluated in terms of attribute relativity, measured as a percentage. Based on Table 3.2, the following graph is depicted:
Figure 3.6 Data Mart Vs. Attribute Relativity (%)
The graph in Figure 3.6 describes the attribute relativity of the functional data attributes present in the data mart. Based on the number of attributes available in the data marts, the relevant attributes are selected from the data mart. An attribute relativity higher by about 5-10% is seen in the FBPDM compared to the multi-functional model in data warehousing. The physical storage is built by determining the design structures of the data stored in the data mart, supporting the selection of the relevant attributes.
3.6.3 Functional Behavior Analysis
The operational objective decides the functional behavior pattern of the data mart and is identified based on the selection of the relevant attributes. The information in the data mart is managed through the relevant selection of attributes for each data mart. An analysis of the functional behavior of the data mart is carried out to demonstrate the better performance of the FBPDM. The multi-functional data model fails in analyzing the functional behavior of the data mart, whereas the FBPDM works well in identifying the functional behavior pattern in the data mart. As proof of the efficient working of the FBPDM, a table and graph are given below.
Table 3.3 No. of Data Marts Vs. Functional Behavior Analysis
No. of Data Marts    Functional Behavior Analysis (%)
                     MFDWM    FBPDM
1                    24       45
2                    33       51
3                    38       59
4                    46       63
5                    51       76
6                    62       87
7                    69       99
Table 3.3 shows the functional behavior analysis of the data mart; the outcome of the functional behavior pattern in the data mart is compared with the multi-functional model in the data warehouse. The functional behavior analysis of the data mart is measured as a percentage.
Figure 3.7 No. of Data Marts Vs. Functional Behavior Analysis
The graph in Figure 3.7 describes the efficient analysis of the functional behavior of the data mart. Based on the number of data marts, the analysis of the functional behavior in the data mart is higher by about 20-30% in the FBPDM compared to the multi-functional model in the DW. The attribute relativity supports the determination of the operational goal, enhancing the functional behavior analysis of the data mart.
3.6.4 Data Storage Repositories
The functional behavior pattern in the data mart (FBPDM) aims at building layers of data storage repositories with the relevant data attributes. The efficiency of the data storage repositories is decided by the memory occupied by the data. The allocation of data to the memory locations is decided based on the data capacity of the organization. The functional behavior is stored in the data repositories layer. The data storage repository size is inversely proportional to the data accessibility.
Table 3.4 No. of Data Vs. Data Storage Repositories
Number of Data    Data Storage Repositories (MB)
                  MFDWM    FBPDM
100               16       9
200               21       12
300               29       23
400               34       28
500               45       36
600               56       41
700               72       53
Table 3.4 shows the number of data items versus the data storage repositories, measured in megabytes (MB). Based on Table 3.4, the following graph is depicted:
Figure 3.8 No. of Data Vs. Data Storage Repositories
The graph in Figure 3.8 describes the data storage repositories in the efficient analysis of the functional behavior of the data mart. The data storage is reduced in the FBPDM, increasing the data accessibility by about 10-20% compared to the MFDWM. The intermediate layer for the front-end tool is designed to utilize the database structures in data storage.
3.6.5 Response Time
The time required to access the information in the data is denoted as the response time. The response time is measured in milliseconds (ms) based on the data accessibility. The performance of the functional behavior pattern in the data mart is depicted in comparison to the multi-functional model. A table and graph are given below to prove the efficiency of the FBPDM.
Table 3.5 Data Accessibility Vs. Response Time
Data Accessibility    Response Time (ms)
                      MFDWM    FBPDM
10                    0.76     0.42
20                    0.97     0.71
30                    1.61     1.01
40                    2.42     1.32
50                    3.04     2.12
60                    3.25     2.41
70                    3.76     2.83
Table 3.5 illustrates the data accessibility versus the response time. Based on Table 3.5, the following graph is depicted:
Figure 3.9 Data Accessibility Vs. Response Time
The graph in Figure 3.9 shows the time required to access the information in the data, i.e., the response time. The response time is lower in the FBPDM, increasing the data accessibility by about 10-20% compared to the MFDWM. The business interface, the intermediate layer for the front-end tool, is preserved and managed, reducing the data access time.
3.6.6 Data Mart Management
The data mart management process involves trusted data access, data organization, system optimization and guaranteed data accessibility even after a system failure. The performance of the data mart management is determined in comparison to the multi-functional model. A table and graph are given below to prove the efficiency of the FBPDM.
Table 3.6 No. of Data Mart Vs. Data Mart Management
No. of Data Marts    Data Mart Management (%)
                     MFDWM    FBPDM
10                   54       76
20                   57       79
30                   60       81
40                   63       83
50                   67       86
60                   69       89
70                   72       93
Table 3.6 illustrates the data mart management, measured as a percentage. Based on Table 3.6, the following graph is depicted:
Figure 3.10 No. of Data Mart Vs. Data Mart Management
The graph in Figure 3.10 describes the data mart management for the given number of data marts. Better data mart management, by about 20-25%, is provided in the FBPDM compared to the MFDWM. The data mart is designed efficiently, with data sourced from the user requirements, rapid data retrieval on the specified attributes, data transformation and fast loading of the data into the data mart.
Finally, it is observed that the functional behavior pattern technique efficiently analyzes the functional behavior of the data mart based on the selection of the relevant attributes of the data mart, by building data storage repository layers.
3.7. Summary
The data is stored as a collective data repository with better storage efficiency. The functional behavior pattern in the data mart (FBPDM) achieved the analysis of the functional behavior of the data mart based on attribute relativity. The data mart is designed efficiently, with data sourced from the user requirements, rapid data retrieval on the specified attributes, data transformation and fast loading of the data into the data mart.
In the existing multi-functional models in the DW, the process of the data mart is identified and processed, but the analysis of the functional behavior fails. The issues raised in the existing multi-functional models are efficiently handled by the proposed functional behavior pattern in the data mart based on attribute relativity (FBPDM).
The functional behavior pattern initially identifies the relevant attributes of the particular data mart. The attribute relativity is selected, specifying the detailed description of the operational goal of the data mart, and the analysis of the functional behavior of the data mart is achieved on that basis.
The functional behavior pattern in the data mart, efficiently developed using attribute relativity, performed the functional analysis. The experimental results showed that the functional behavior pattern for the data mart is analyzed efficiently using the metrics of data retrieval, attribute relativity, functional behavior analysis, data storage repositories and management of the data mart, by maintaining the attributes in the data mart successfully.
Publication
[1] “Functional Behavior Pattern For Data Mart Based on Attribute Relativity”, International Journal of Computer Science Issues, Vol. 9, Issue 4, No. 1, July 2012, pp. 278-283.
4. AN EFFICIENT NEW INDUCTIVE RULE MINING TECHNIQUE FOR EXTRACTING USER-DEMANDED DATA FROM DATA MART
4.1 Introduction
A data warehouse is a database for reporting and analyzing the data stored in the repositories. A data mart acts as the access layer of the data warehouse: it brings the data out of the data warehouse to the users. Accessing data in the data warehouse is challenging, as the user requires a good understanding of the data structures stored in the repositories. The data mart is introduced to make data access easier to grasp. Data marts build separate functional data repository layers based on the requirements and needs of the corporate data applications. The data warehouse models fail in providing a good understanding of the functional requirements.
The data warehouse model needs an efficient decision support system to extract the user-demanded data from the data warehouse. The existing functional behavior pattern only identifies the functional activities of the data mart based on attribute relativity; it does not extract the user-demanded information from the repositories. Inductive rule mining (IRM) is proposed to improve the decision support system so as to extract the user-demanded data from the data mart.
The decision support with inductive rule mining on functional data
marts segregates the layered data repository and extracts the required
information for the user. The induced rules are proposed as the
supportive knowledge for identifying the user needed information.
An experimental evaluation is conducted with benchmark datasets from
the UCI repository. The inductive rule mining is compared with the
existing functional behavior pattern for data mart based on attribute
relativity in terms of the number of decision rules, the extracted data
relativity, and the analysis of functional behavior.
4.2 Rule Induction for Supporting Decision Making
Inductive rule mining (IRM) is proposed to improve the decision
support system by extracting the required user demanded data from the
data mart. As the requests from users in a distributed system increase, the
execution time of the IRM technique decreases gradually.
Recently, considerable attention has been paid to utilizing machine
mining techniques as tools for decision support. Inductive rule mining with
decision making methods is applied to a wide variety of problems in the
data warehouse because of its ability to discover user demanded information
from the data mart. The addition of inductive rule mining methods to
conventional decision support systems provides a means for extensively
improving the quality of decision making.
A decision support system employs induction rule mining techniques
to derive knowledge directly from the data mart and to filter that knowledge
continually. Inductive mining is perhaps the most widely used machine
mining technique. Inductive mining algorithms are simple and fast. Another
advantage is that inductive mining generates models that are easy to
understand. Finally, inductive mining algorithms are more accurate
compared with other machine mining techniques.
4.2.1 Decision Making on Inductive Rule Mining
The term Decision Support (DS) is used often and in a variety of
frameworks associated with decision making. Recently, decision support
systems are frequently discussed in relation to Data Warehouses (DW) and
On-Line Analytical Processing (OLAP). Another present trend is to connect
decision support with data mining. A data warehouse sustains a copy of
data from the source operational systems. This architectural separation
presents the chance to maintain data history, to integrate data from various
source systems, and to allow analysis across the enterprise. The quality of
data in the data warehouse is normally improved by presenting consistent
codes and descriptions and by filtering out bad data before loading. The
data warehouse presents the organization's information consistently and
offers a single general data model for all data of interest, regardless of
the data source.
Decision support usually engages the combination of data and
knowledge organization to support humans in creating efficient alternatives.
The online framework supports scalable delivery to well defined individual
decision making. Decision making rests on constantly varying requirements
and so requires a rapid response. Conventional intuitive methods of decision
making are no longer sufficient to deal with such obscure situations. The
role of Decision Support in the Decision Making process is shown in
Figure 4.1.
Figure 4.1 Role of DS in the Decision Making Process
(Node labels in the figure: Decision Making; Decision Systems; Decision
Sciences; Normative; Descriptive; Decision Support.)
A data mart is a decision support system built on a separation of the
enterprise's data, paying attention to exact functions or behaviors of the
enterprise. Data marts serve specific business needs, such as determining
the impact of advertising promotions, determining and forecasting sales
performance, determining the impact of new product introductions on
company profits, or determining and forecasting the performance of a new
business division. Data marts are specific business software applications.
A data warehouse differs from a data mart in that it deals with numerous
subject areas. Data marts are naturally executed and controlled by an
organizational unit such as the corporate Information Technology (IT) group.
To improve the data extraction process in the data mart, the work
includes an inductive rule mining algorithm. Inductive rule mining is an
extensive approach used in the data mart for the extraction of information
from the data storage repositories by forming a set of decisive rules.
4.2.2 Rule Induction for Supporting Decision Making
Rule induction is a part of machine mining involving formal rules to
extract data from a set of observations or from a data warehouse. The
extracted rules either represent a complete model of the data in the form of
a rule set or symbolize local patterns in the data in the form of individual
rules. The general form of each rule is an if-then rule:
IF Conditions THEN Conclusion
The Conditions part contains a conjunction of one or more attribute
tests. The attribute tests include features of the following forms:-
i. The attribute is equal to a possible value, for categorical attributes (Aj = vj).
ii. The attribute is less than a threshold value, for numeric attributes (Aj < v).
The threshold value does not need to correspond to an observed value
of the attribute. The form of the Conclusion part of the rule depends on the
type of the rule. In the supervised rule mining setting, rules are induced
from a labelled dataset. Rules learned in this organized manner are typically
used for classification. The rule mining task is defined as follows: given a
set of training samples, discover a set of rules that can be used for decision
making on new instances. Classification rule inductions are of the form:
IF Conditions THEN Class = Target Variable.
In the above case, the Conclusion consists of the target variable
associated with one of the classes of the sample dataset. Induced rules are
not individual rules, but rather parts of models or rule sets that work
together to classify new instances. Rule sets are either ordered, in the
if-then-else form, or unordered, where each rule votes for a class.
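The ordered rule set form described above can be sketched as follows. This is an illustrative Python fragment, not the thesis implementation; the rules reuse the sample car attributes given later in this chapter, and the first matching rule in the ordered list decides the class.

```python
# Illustrative sketch (not the thesis's code) of an ordered if-then rule set:
# rules are tried top to bottom, and the first rule whose conditions all
# hold decides the class, mimicking the if-then-else form described above.

rules = [
    ({"car": "Swift", "color": "Silver"}, "yes"),   # IF Conditions THEN Class
    ({"car": "Swift", "color": "White"}, "no"),
    ({"car": "Innova"}, "no"),
]

def classify(instance, rules, default="no"):
    """Return the class of the first matching rule, else the default class."""
    for conditions, target in rules:
        if all(instance.get(a) == v for a, v in conditions.items()):
            return target
    return default

print(classify({"car": "Swift", "color": "Silver"}, rules))  # yes
```

An unordered rule set would instead collect the targets of every matching rule and decide by vote.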
In the unsupervised rule mining setting, rules are induced from
unlabelled data. The goal of unsupervised rule mining is to discover
interesting relations between variables or attributes. Association rule
inductions are of the form
IF Conditions THEN Conclusions,
where both the Conditions and the Conclusions are conjunctions of
attribute values or items, depending on the data format. Each rule is an
individual local pattern in the data, not related to other rules. Compared to
classification rule induction mining, the approaches used in association rule
induction mining are usually complete and therefore promise optimality of
results in terms of support and confidence.
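The support and confidence measures mentioned above can be computed as in the following sketch. The transaction data and helper functions are hypothetical illustrations, not part of the thesis implementation.

```python
# Illustrative computation of the support and confidence of an association
# rule IF Conditions THEN Conclusions over a small transaction table.
# The data and helper functions are hypothetical, not the thesis's code.

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(conditions, conclusions, transactions):
    """support(Conditions AND Conclusions) / support(Conditions)."""
    return support(conditions | conclusions, transactions) / support(conditions, transactions)

transactions = [
    {"Swift", "Silver", "approved"},
    {"Swift", "White"},
    {"Innova", "White"},
    {"Swift", "Silver", "approved"},
]

print(support({"Swift"}, transactions))                             # 0.75
print(confidence({"Swift", "Silver"}, {"approved"}, transactions))  # 1.0
```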
Traditional association rule induction mining constructs plenty of
redundant rules, which is due to their individuality. The redundancy is
avoided based on the closed frequent itemsets. The number of non-redundant
rules is substantially smaller than the rule set from the traditional approach,
as closed sets provide compact representations of the data.
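The closed frequent itemset idea can be illustrated as follows: an itemset is closed when no extension of it has the same support, so the closed sets stand in for all their equally supported subsets. The transactions and helper functions below are hypothetical, not from the thesis.

```python
# Illustrative check of the closed frequent itemset property: an itemset is
# closed when no single-item extension has the same support, so closed sets
# compactly represent the redundant, equally supported rules mentioned above.

def support_count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def is_closed(itemset, transactions):
    """True if no single-item extension keeps the support unchanged."""
    base = support_count(itemset, transactions)
    universe = set().union(*transactions) - set(itemset)
    return all(support_count(set(itemset) | {x}, transactions) < base
               for x in universe)

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}]
print(is_closed({"a"}, transactions))       # False: {a, b} has the same support
print(is_closed({"a", "b"}, transactions))  # True
```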
Assume a set of data with well defined attributes and their values,
and a selected nominal attribute called the target attribute. The goal of
inductive rule mining is to induce rules from the dataset that explain the
relation between the target attribute and the other attributes in the data,
deriving user understandable data. In other words, rule induction is a
process of inducing a set of understandable rules from the described data.
The induction rule format restricts the conclusion part of the rules to
values of the target attribute.
The purpose of rule induction is to extract the user demanded data.
The idea of rule induction is to look through the rules to increase
understanding of the data field. The final objective of rule induction is to
permit the user to recognize the development of the essential data.
Differentiating between classes of data characterizes the database.
Efficient decision rule induction algorithms discover emerging data in the
data warehouse. The purpose of emerging data is to capture changes in the
data over time, and subsequent emerging data research has largely focused
on the use of the data discovered from the data warehouse. Formally,
emerging data are association rule inductions with an itemset in the rule
antecedent and a fixed consequent, ItemSet → D1, for a given data set D1
being compared to another data set D2.
The measure of the quality of emerging data is the development rate.
The development rate provides an ordering matching confidence in the
association rule induction. Rule induction mining represents emerging data.
The IRM approach is to extract data in the data mart using rule induction
mining. The method is based on the table structure from which the data
are extracted. The data extraction depends upon the decision support
system, and decision making support increases significantly from one data
set to another.
The measure of the quality of data extraction is likewise the
development rate, and thus the development rate provides an ordering
matching confidence. Jumping data are extracted with support zero in one
data set and support greater than zero in the other data set. A sample
training data set is given below to elaborate the data extraction according
to user demands using induction rule mining.
Car = Swift AND Color = Silver → Approved = yes
Car = Swift AND Color = White → Approved = no
Car = Innova AND Color = White → Approved = no
Car = Swift AND Color = White → Approved = no
In the above training sample dataset, the attributes are car and color,
and possible values exist for each attribute. The decision to approve is
made if the attribute car is Swift and the color is Silver. The induction rule
mining involves the decision making on the basis of attribute value
selection. For instance, if the car model is Innova and the color is changed
to White, the approval for data extraction is denied.
Induction rule mining argues that extracting all the data above a
minimum development rate constraint generates data analyzed by domain
experts. IRM works on selecting the user required data from the data
warehouse. Induction rule mining introduces a query based approach to
extract data, as in microarray data. The method is based on growing the
decision making from which the data are extracted. IRM combines data
search with a statistical procedure based on an exact test to assess the
significance of each emerging datum. Afterwards, based on the data
inferred, extraction is performed using maximum-likelihood linear
discriminant analysis.
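The development rate described above can be sketched as follows. This is a hedged illustration, not the thesis implementation: the sample car data are split into two hypothetical partitions D1 (Approved = yes) and D2 (Approved = no), and the development rate of an itemset is taken as the ratio of its supports in the two sets.

```python
# Hedged sketch of the development (growth) rate of an itemset between two
# data sets: rate = support(itemset, D1) / support(itemset, D2). A "jumping"
# itemset has support zero in one set and non-zero in the other, giving an
# infinite rate. Data and functions are illustrative, not the thesis's code.

def support(itemset, dataset):
    """Fraction of rows of the dataset containing every item of the itemset."""
    return sum(1 for row in dataset if itemset <= row) / len(dataset)

def development_rate(itemset, d1, d2):
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s2 == 0:
        return float("inf") if s1 > 0 else 0.0   # jumping itemset
    return s1 / s2

D1 = [{"Swift", "Silver"}]                                          # Approved = yes
D2 = [{"Swift", "White"}, {"Innova", "White"}, {"Swift", "White"}]  # Approved = no

print(development_rate({"Silver"}, D1, D2))  # inf: jumping itemset
print(development_rate({"Swift"}, D1, D2))   # 1.5
```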
The induction rule mining involves the identification of each sub-group
for user demanded data extraction. The task of sub-group detection is
defined as identifying the properties of the individual data in the sub-group.
The sub-groups of the data are identified in the inductive rule mining, which
provides easy extraction of data in the data mart. A sub-group
characterization is a combination of features shared by a selected set of
individuals, and is seen as the condition part of a rule:
Sub-group characterization → Class.
The sub-group characterization is a special case of the more
general rule induction mining task. Sub-group identification research has
evolved in several directions. On the one hand, exhaustive approaches
guarantee the optimal solution in data extraction given the optimization
criterion or queries. Techniques for induction rule mining together with
appropriate constraints are utilized in data extraction.
Relational sub-group characterization in induction rule mining is
designed for spatial data mining in relational spatial databases. Relational
sub-group identification is attained through properly adapting induction
rule mining and first-order feature construction. Other non-relational
sub-group identification algorithms have been developed, including an
algorithm for exploiting background knowledge in sub-group identification.
In addition, a fuzzy system proposed by Suyun Zhao et al. is utilized for
handling sub-group characterization tasks.
The interestingness of a subgroup depends on its unusualness and
size. The rule quality evaluation on the data mart therefore needs to
combine both the unusualness and the size factors. Weighted relative
accuracy is used by decision rule induction algorithms in a different
formulation and in different variants, as presented by Yuhua Qian et al.
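Weighted relative accuracy, as commonly formulated in sub-group discovery, can be sketched as follows. The thesis gives no concrete formula, so this Python fragment and its sample data are illustrative assumptions only.

```python
# Weighted relative accuracy (WRAcc), a standard sub-group quality measure
# combining size and unusualness: WRAcc = p(cond) * (p(class|cond) - p(class)).
# The formulation and the sample data are illustrative assumptions.

def wracc(examples, condition, target):
    """examples: list of (instance_dict, label) pairs."""
    n = len(examples)
    covered = [(x, y) for x, y in examples if condition(x)]
    if not covered:
        return 0.0
    p_cond = len(covered) / n                                   # size factor
    p_class = sum(1 for _, y in examples if y == target) / n
    p_class_given_cond = sum(1 for _, y in covered if y == target) / len(covered)
    return p_cond * (p_class_given_cond - p_class)              # unusualness

examples = [
    ({"car": "Swift"}, "yes"), ({"car": "Swift"}, "yes"),
    ({"car": "Innova"}, "no"), ({"car": "Innova"}, "no"),
]
print(wracc(examples, lambda x: x["car"] == "Swift", "yes"))  # 0.25
```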
In addition, the generalization quotient is also used by the decision
rule induction algorithms. The sub-group data miner uses the classical
binomial test to verify whether the target data extracted are significantly
different in a sub-group as compared to the whole data warehouse. Different
techniques are used for eliminating redundant sub-groups, and decision rule
mining algorithms are utilized to achieve rule diversity. The data extraction
technique couples induction rule mining with constraints appropriate for
descriptive rules.
Car = Swift → Approved = yes
Color = Silver → Approved = yes
Car = Innova AND Color = White → Approved = no
Company = Hyundai → Approved = yes
Car = Swift AND Color = Silver AND Company = Hyundai → Approved = yes
In the above case, the attributes are car, color and company, and
possible values exist for each attribute. The decision making lies in whether
the attribute car is Swift and the color is Silver. The induction rule mining
involves the decision making on the basis of attribute value selection. For
instance, if the car model is Innova and the color is changed to White, the
approval for data extraction is denied.
The rule induction process is conceived as a search process. A metric
is needed to estimate the quality of the rules found in the data warehouse
and to direct the search towards the best rule. Man Lung Yiu et al.
proposed metrics regarding rules with less efficient processing. The rule
quality evaluation is a key element in rule induction. In real-world
applications, a typical objective of a mining system is to find induction
rules under a rule quality measure that takes both training accuracy and
rule coverage into account, so that the induced rules are both accurate
and reliable. A quality measure is estimated from the available data. All
common measures are based on the number of positive and negative
instances covered by a rule. The decision making system based on
induction rule mining is elaborated in the following Figure 4.2.
Figure 4.2 Decision Making based on Inductive Rule Mining
Figure 4.2 illustrates the decision making based on the inductive
rule mining approach. Each data mart is generated from the data warehouse
by segregating it into individual sections. The data mart is constructed
based on the physical and logical structure. The populating data mart
design handles the tasks of receiving the data from the source, transforming
it into the exact layout, and loading it into the data mart. The attribute
relativity is identified with relevant attribute selection for each data mart.
The functional behavior pattern of the data mart is identified based on the
attribute relativity.
4.3. Efficient Analysis to Extract User Demanded Data in Data Mart
Using Inductive Rule Mining
The work is designed for efficiently extracting the user demanded
information from the data storage repositories using inductive rule mining
(IRM). Extracting the user needed information using IRM follows two sets
of operations. The first operation describes the process of the decision
making system, and the second describes the decision inductive rule mining
for extracting the information based on inductive rules derived from the
large set of repositories. The process of the complete IRM is shown in
Figure 4.3.
Figure 4.3 Architecture diagram of the IRM
Figure 4.3 illustrates the architecture of the Induction Rule
Mining. The data mart is a subset of the data warehouse and consists of a
set of information based on the operational objective of the attributes
present in the traditional organization. After organizing the data mart, the
inductive rule mining is applied to derive a set of rules based on the
functional attributes of the present data. The IRM presents a set of rules
and eases the decision process of extracting the user demanded set of
information. The formation of decision rules with the inductive rule mining
process is computed based on the processes involved in IRM.
4.3.1 Decision Support System in Data Mart
A Decision Support System is certainly an ingredient of decision
making processes. A decision is termed as the selection of one among a
number of alternatives. Decision making signifies the entire procedure of
making the choice, comprising:
i. Evaluating the problem,
ii. Gathering and validating information,
iii. Recognizing choices,
iv. Predicting consequences of decisions, and
v. Computing decisions.
Initially, the occurrence of the problem is evaluated. The set of
information the user demands is collected, and the information is validated
to clarify the correct data extraction. After information validation, the
choices are identified in the process of decision making, and the impact of
each decision, an outcome, is determined in advance. Finally, the decisions
are computed. The decisions are made based on the data mart attribute
selection using inductive rule mining.
The decision making process is done with three main strategies:
1. Intelligence,
2. Design, and
3. Choice.
The work of intelligence is to find the problem and to analyse the
problem generated. The design involves the formulation of solutions,
representation and simulation, and, at last, deciding the choice, i.e., decision
making and implementation.
In the data mart, the process of identifying the functional operational
goal of the system is analyzed and processed. The IRM solves the problem
of determining the analysis of the given validated information in the data
warehouse. The representation of the problem and the decision making
process is implemented with a solution, and IRM presents a simulation of
the information available in the data warehouse.
4.3.2 Decision Rule Induction for Extracting Information
With the promising growth in data warehousing technology, a massive
and expensive resource of information is persuaded with functional decision
rules. With the traditional method, however, the number of generated
decision rules is enormous. IRM intends a diverse strategy of inducing
certain and possible decision rules. The induction process is driven by the
query in the IRM. The information concerning the user requirement is
stored in a table structure, and decision rules are induced by posing a
query on any attribute.
Using the IRM scheme, the number of relevant inductive rules is
minimized. The framework of the inductive rule mining for extracting the
user needed information is shown in Figure 4.4.
Figure 4.4 Process of Decision rule induction for extracting user
information
The framework of decision rule induction for extracting the user
required information is invoked by the user's query. After query processing,
the supporting data structure is updated with the table and attribute names
mined from the query. The number of times an attribute is used is counted
by the column hit in the table.
The counter is sorted in descending order so that the most commonly
used attribute is set in the first row. The most commonly used attribute is
the one referred to most often by user queries. Range queries on the table
are processed efficiently, ensuring data access as per the proposal of
Theoni Pitoura et al. Hence, this ordering is important in producing decision
rules on the attribute value. The approach of inducing decision rules
supported by the most commonly queried attribute is described in the
following algorithm.
An Efficient Decision Rule Induction Algorithm
Input: User's query and a data warehouse
Step 1: Extract table names Ti and attribute names Aj from the query in
the DW.
Step 2: Process the attribute ranking (AR) table and revise the hit
counter recognized for each Ti and Aj.
Step 3: Sort the AR-table on the hit value in descending order.
Step 4: Extract the top row of the AR-table to obtain T1 and A1.
Step 5: Create a decision table A = <U, A, d>, where d = A1, A = the set
of attributes in T1, and U = the set of records in T1.
Step 6: Pre-process A by
i. eliminating attributes whose number of distinct values = |T1|
ii. discretizing attributes with real values
Step 7: Partition U into similarity classes.
Step 8: Generate certain, negative and positive rules.
Step 9: Generalize all decision rules using dimension tables and
hierarchical information from the data warehouse.
Step 10: Include the rules in the knowledge base.
Output: Decision rules
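The steps above can be sketched in code as follows. The thesis implementation is in Java; this is a simplified, hypothetical Python rendering of Steps 1 to 8 in which query parsing, the AR-table, and rule generation are reduced to their simplest form, and the pre-processing and generalization steps (Steps 6 and 9) are omitted.

```python
# Hypothetical, simplified rendering of the decision rule induction algorithm
# (Steps 1-8); the query format, warehouse layout and rule labels are
# illustrative assumptions, not the thesis's Java implementation.
import re
from collections import Counter, defaultdict

def extract_names(query):
    """Step 1 (simplified): table name after FROM, attribute names before '='."""
    table = re.search(r"FROM\s+(\w+)", query, re.I).group(1)
    attrs = re.findall(r"(\w+)\s*=", query)
    return table, attrs

def decision_rules(query, warehouse, ar_counter):
    # Steps 1-2: extract names and revise the hit counter per (table, attribute).
    table, attrs = extract_names(query)
    ar_counter.update((table, a) for a in attrs)
    # Steps 3-4: the most frequently queried pair gives T1 and the decision d = A1.
    (t1, a1), _ = ar_counter.most_common(1)[0]
    records = warehouse[t1]                   # Step 5: decision table over T1
    cond_attrs = [a for a in records[0] if a != a1]
    # Step 7: partition the records into similarity classes on the condition attrs.
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[a] for a in cond_attrs)].append(rec[a1])
    # Step 8: one decision value per class yields a certain rule, else a possible one.
    rules = []
    for cond, decisions in classes.items():
        kind = "certain" if len(set(decisions)) == 1 else "possible"
        rules.append((dict(zip(cond_attrs, cond)), decisions[0], kind))
    return rules

warehouse = {"cars": [
    {"car": "Swift", "color": "Silver", "approved": "yes"},
    {"car": "Swift", "color": "White", "approved": "no"},
    {"car": "Innova", "color": "White", "approved": "no"},
]}
ar = Counter()
rules = decision_rules("SELECT * FROM cars WHERE approved = 'yes'", warehouse, ar)
for conditions, decision, kind in rules:
    print(kind, conditions, "->", decision)
```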
Decision rules are normally used in the classification and prediction of
data obtained in the data mart. A decision rule is nevertheless a dominant
way of data representation. The models formed by decision rules are
characterized and identified, and the possible rules of the data are identified
and referred to by the user queries. The induction based decision rule
algorithm is used to select the hit count of each attribute in the data storage
repositories. The attribute with the highest hit count is selected to form the
similarity classes for the data warehouse. The highest hit attribute reduces
the information to be searched. In addition, the hit attribute provides the
data demanded by the user and categorizes the attributes in the resulting
partitions.
4.4. Experimental Evaluation on Induction Rule mining
The IRM approach to extracting user demanded information is
analyzed based on inductive rule mining and is implemented using Java.
An experimental study is done to estimate the effectiveness and performance
of extracting the user required data from the data mart using inductive rule
mining.
The effectiveness of the IRM is estimated on the Insurance Company
benchmark dataset from the UCI repository, with varying characteristics.
The information about customers consists of 86 variables and includes
product usage data and socio-demographic data derived from zip area
codes. The work, however, involves 10 attributes taken from the dataset for
evaluating the performance of the IRM technique for extracting user
demanded information from the repositories. The attributes used in the
customer dataset are name, age, education, income, marital status, number
of houses, number of children, number of insurance policies, number of
motorcycle / car policies and number of life insurances.
Based on the formation of inductive rules, the user demanded
information is extracted from the data repositories stored in the data mart.
The inductive rules are formed based on the decision rules, and the decision
is made efficiently by choosing the attributes related to the operational goal.
The performance of extracting user demanded information from the data
mart using inductive rule mining is measured in terms of the following:-
i) Decision Rules,
ii) Extracted Data Relativity,
iii) Reliability,
iv) Running Time, and
v) Rule Coverage.
Decision rules are a set of functions; a decision rule maps an
observation to an appropriate action. The decision rules play an important
role in data warehousing concepts to determine the set of rules for
extracting the user demanded information from the data storage
repositories. After the formation of the decisive rules, the user required
data are retrieved from the repositories. The relativity of the retrieved data
is found using the decisive rule induction algorithm in terms of the extracted
data relativity process. Reliability is the term used to identify the reliable
manner of data extraction from the data storage repositories. The time
taken to extract the user required data from the data mart, using inductive
rules, is termed the running time.
4.5. Results and Discussion
The inductive rule mining is compared to the functional behavior
pattern in data mart based on attribute relativity. Extracting user demanded
information in the data mart using inductive rule mining is effective in terms
of the number of decision rules, extracted data relativity, running time, rule
coverage, and reliability of the data mart, by maintaining the decision rules
in the data mart in a successful manner.
The functional behavioral pattern of the data mart for the organization
is efficiently analyzed and viewed. In addition, the use of inductive rule
mining enhances the extraction of user demanded data. The experiments
are conducted with the insurance company benchmark data sets to estimate
the performance of the IRM. The tables and graphs below describe the
performance of extracting user demanded information in the data mart
using inductive rule mining against the existing scheme.
The output of IRM is extracted from the information provided
by the user. For example, if a user selects one attribute, IRM displays
the output based on the available and demanded information.
4.5.1 Decision Rules
Decision rules are a set of functions; a decision rule maps an
observation to an appropriate action. The decision rules play a vital role in
data warehousing concepts, to decide the set of rules for extracting the user
required information from the data storage repositories. Decision rules are
generally used in the classification and prediction of data obtained in the
data mart. A decision rule is straightforward but is a dominant way of data
representation. The models formed by decision rules are characterized and
identified, and the possible rules of the data are identified and referred to
by the user queries.
In table 4.1, the number of user queries ranges from 5 to 35 in steps
of 5, selected for experimental purposes. The queries are issued by the
user, and the reply is provided according to the user demand.
Table 4.1 No. of User Queries Vs. Decisive Rules

No. of user queries    Decision rules (%)
                       FBPDM    IRM
5                      2        6
10                     7        10
15                     11       14
20                     13       18
25                     15       22
30                     21       28
35                     24       31
The table 4.1 describes the decisive rules formed in the data
mart. The FBPDM is used for analyzing the functional behavior of the user
required information. The decision rules based on the user queries in
FBPDM and IRM are measured in terms of percentage.
The outcome of the IRM for extracting user demanded data is
compared with the existing functional behavioral pattern in data mart.
The information concerning the user requirement is stored in a table
structure, and decision rules are induced by posing a query on any
attribute. Based on the table 4.1, a graph is depicted as follows:-
Figure 4.5 No. of User Queries Vs. Decisive Rules
The graph (Figure 4.5) describes the decisive rules formed in the
data mart. The IRM for data mart efficiently derives the decision rules,
and the variance is about 5-15% higher compared to FBPDM.
The inductive rule mining is applied to derive a set of rules based on the
functional attributes of the data present.
4.5.2 Data Relativity
The relativity of the retrieved data is found using the decisive rule
induction algorithm in terms of the extracted data relativity process.
The decision rules are formed on the basis of rule induction, and after the
formation of the decisive rules, the user required data are retrieved from
the repositories.
Data relativity enables corporations to reduce the difficulty and
organize the costs associated with the enterprise. The data relativity allows
corporations to customize workflows more specifically to the needs of the
users, giving enterprises an extra flexible platform. The workflow is
automated and linked to custom validation logic, guaranteeing work
products that are distinguished, justifiable and easily analyzed.
Table 4.2 Data Mart Vs. Extracted Data Relativity

Data Mart    Extracted data relativity (%)
             FBPDM    IRM
5            35       48
10           38       51
15           42       54
20           47       59
25           50       62
30           53       64
35           59       67
The table 4.2 describes the extracted data relativity of the functional
data attributes requested by the user. The data mart acts as the access
layer of the data warehouse. The user requires efficient extraction of data
from the data mart, and the relativity is not the same across user queries
and data marts. The outcome of the IRM for extracting user demanded
data is compared with the functional behavioral pattern in data mart.
Based on the table 4.2, a graph is depicted as follows:-
Figure 4.6 Data Mart Vs. Extracted Data Relativity
The graph in Figure 4.6 describes the extracted data relativity of the
functional data attributes requested by the user. The extracted data
relativity using the data mart is measured in terms of percentage. FBP for
data mart only identifies the process of the data mart, but the IRM for data
mart efficiently extracts the user demanded information, and the variance
is 10-15% higher. Based on the number of decisive rules formed by the
decision inductive rule algorithm, the extracted data relativity present in
the data mart is higher with the IRM.
4.5.3 Reliability
Reliability is the term used to identify the reliable manner of data
extraction from the data storage repositories. Reliability is used to illustrate
the overall stability of a data extraction measure. A measure of data
extraction is said to have high reliability if it produces consistent results
under consistent conditions. Data reliability is the ability of an inductive
rule mining system to perform data extraction, as per the user requirement
functions, under stated query conditions for a specified period of time.
As the number of users increases, the total Hz increases automatically
(Juan Manuel Pérez et al.). For example, a single user gains 2 Hz, i.e.,
Gained Hz = 2, and
Total Hz = Number of users × Gained Hz
Table 4.3 Number of Users Vs. Reliability

No. of users    Reliability (Hz)
                FBPDM    IRM
10              28       44
20              31       50
30              35       54
40              38       58
50              43       64
60              49       68
70              52       73
The Table 4.3 describes the reliability of the IRM in the data mart.
The inductive rule mining (IRM) provides high data reliability, as per the
above table values, compared with the Functional Behavior Pattern in Data
Mart (FBPDM). Based on the table 4.3, a graph is depicted as follows:-
Figure 4.7 No. of users Vs. Reliability
The graph in Figure 4.7 describes the reliability of inductive rule
mining in data extraction. The IRM provides data reliability around
20-25% higher compared to FBPDM. Reliability is measured based on the
user count in terms of Hertz (Hz). The user requested information in the
data mart is extracted based on the decisive rules formed accurately.
Even though the number of users increases, the reliability of the data mart
remains high with the IRM.
4.5.4 Running Time
The running time is measured based on the data set loading time
and the rule induction time. The time taken to execute the inductive rule
mining algorithm in extracting the user demanded data is termed the
running time, and it is measured in seconds. The speed acceleration varies
among the data sets, and the improvement is more significant on large data
sets. To prove the better performance of the IRM in terms of running time,
it is compared with FBPDM.
Table 4.4 Data Extraction Vs. Running Time

Data Extraction    Running Time (seconds)
                   FBPDM    IRM
10                 27       14
20                 31       27
30                 43       34
40                 57       43
50                 64       56
60                 78       68
70                 83       74
The Table 4.4 describes the running time of the IRM in the data
mart. The outcome of the IRM for extracting user demanded data is
compared with the functional behavioral pattern in data mart. Based on the
table 4.4, a graph is depicted as follows:-
Figure 4.8 Data Extraction Vs. Running Time
The graph in Figure 4.8 describes the running time of inductive rule
mining in data extraction. IRM takes around 10-20% less running time
compared to FBPDM. The running time of FBPDM and IRM for data
extraction is measured in seconds. The user requested information in the
data mart is extracted accurately based on the decisive rules formed. Even
though the number of users increases, the running time of the data mart
remains low in IRM.
4.5.5 Rule Coverage
The rule induction process is considered a data search process in the
data mart. The rule coverage metric is required to determine the quality of
rules established in the inductive rule mining, directing the data extraction
towards the best rule. The rule quality measure is a main element in rule
induction.
In real-world applications, a typical purpose of a decision support
system is to find rules that optimize a rule quality criterion, extracting data
with both training accuracy and rule coverage taken into account. Thus, the
inductive rule mining is both accurate and reliable. A quality measure of
rule induction is determined from the obtainable data. All common
measures are based on the number of positive and negative instances
covered by a rule.
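The common measures mentioned above can be sketched as follows. Since the exact criterion is not specified here, this shows two standard ones, coverage and training accuracy, with illustrative names:

```java
// Sketch of rule quality measures based on the positive (p) and
// negative (n) instances a rule covers, as described above.
// The exact measure used in the thesis is not specified, so this
// shows two common ones. Names are illustrative.
public class RuleQuality {

    // Fraction of all instances the rule covers.
    public static double coverage(int p, int n, int totalInstances) {
        return (double) (p + n) / totalInstances;
    }

    // Training accuracy: fraction of covered instances that are positive.
    public static double accuracy(int p, int n) {
        return (double) p / (p + n);
    }

    public static void main(String[] args) {
        // A rule covering 40 positive and 10 negative of 100 instances.
        System.out.println(coverage(40, 10, 100)); // prints 0.5
        System.out.println(accuracy(40, 10));      // prints 0.8
    }
}
```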
The IRM uses inductive rule mining for handling the user demanded
information. The decision support is efficiently done with inductive rule
mining on the functional data mart segregation of the layered data
repository.

Table 4.5 Data Mart Vs. Rule Coverage
Data Mart    FBPDM (%)    IRM (%)
10           38           54
20           44           57
30           52           64
40           58           73
50           62           76
60           74           88
70           87           94
The Table 4.5 illustrates data mart versus rule coverage. The
outcome of the IRM for extracting user demanded data is compared with
Functional behavioral pattern in data mart. The rule coverage based on
the data mart is measured in terms of percentage (%). Based on the
table 4.5, a graph is depicted as follows:-
Figure 4.9 Data Mart Vs. Rule Coverage
The graph in Figure 4.9 describes the rule coverage quality measure
of inductive rule mining. IRM provides better rule induction, by about
10-15%, compared to FBPDM, as the user requested information in the
data mart is extracted accurately based on the decisive rules formed. Even
though the number of users increases, the reliability of the data mart remains
high in IRM.
Finally, IRM facilitates extracting user demanded data in the data
mart using the decision rule induction algorithm. The experiments conclude
the efficient working of IRM in terms of metrics such as decision rules, data
relativity, reliability, running time and rule coverage. Evaluations are done
with the benchmark datasets using the decisive rules based on the
decision rule induction algorithm, and the user demanded information is
taken out in a reliable manner.
4.6. Summary
The functional behavioral pattern in data mart, utilized and
processed in the analysis of data, is unsatisfactory for the extraction of
user demanded data. The issue raised over the functional behavioral pattern
is that it does not efficiently retrieve the user demanded information based
on attribute relativity. IRM uses inductive rule mining for handling the user
demanded information. The decision support is efficiently done with
inductive rule mining on the functional data mart segregation of the layered
data repository.
The framework of decision rule induction for extracting the user
required information is invoked by the users' query. After query processing,
the supporting data structure is restructured with the tables and attribute
names mined from the query. The decision support system is achieved in a
reliable manner to extract the user demanded data from the data
warehouse. IRM is efficiently developed with the inductive rule mining
approach to extract the user demanded information in the data mart.
Experimental results showed that inductive rule mining in the data mart is
analyzed efficiently, and performance is measured in terms of decisive rule
formation, data relativity, reliability, running time and rule coverage, in contrast
to FBP for data mart based on attribute relativity.
At the user's level, the attribute data is selected from the insurance
dataset. Using inductive rule mining based on the user demanded data,
segregation is made in an efficient manner. As a result, data is
easily transferred to the user.
Publication
1. “Mining User Demanded Data From Data Mart Using Inductive Rule
Mining”, in the International Journal of Computer Science Issues on
Volume 9, Issue 4, No.1, July 2012, pp.405-410
5. AN EFFICIENT FUNCTIONAL LAYERED DATA MART
BASED ON NEW TECHNIQUE OF
INDUCTIVE RULE MINING
5.1 Introduction
Data mart is a division of the data warehouse, generally oriented to a
specific business line. Data warehousing collects the data at different levels. Data
are stored as a collective data repository with better storage efficiency.
Various data warehousing models concentrate on storing the data more
efficiently and retrieving data quickly. Data stored in the repository are
segregated into data layers. The functional behavior of the corporate
system is analyzed to build layers of data storage repositories with relevant
data attributes.
Then, the decision support system is applied with inductive rule
mining on functional data in the layered data repository. Inductive rule
mining (IRM) improves the decision making in retrieving data from data
mart. In addition, IRM facilitates better understanding of the structure in
extracting data from the warehouse. The functional behavior pattern in
data mart (FBPDM) with attribute relativity is compared and the result is
analyzed with existing multi-functional data warehousing model (MFDWM).
The inductive rule mining is compared with the functional behavior pattern
in data mart.
The objective of the case study on performance analysis of data mart in
functional layered repositories with induction rule mining is to offer an
experiential and scientific foundation in extracting user required data from
data mart. The effectiveness of the IRM mechanism is estimated on
benchmark datasets from UCI repository with varying characteristics.
Performance of functional behavior in data mart and inductive rule mining
in extracting user demanded data is evaluated with Insurance Company
Benchmark (COIL 2000) data set from UCI Repository data sets.
Insurance Company benchmark data set, used in the CoIL 2000
Challenge, contains information on customers of an insurance company.
The characteristic of data set is multivariate and attribute characteristic is
categorical and integer. The data consists of 86 variables, 9,000 instances
and includes product usage data and socio-demographic data. The data
was supplied by the Dutch data mining company Sentient Machine
Research and is based on a real world business problem. The training set
contains over 5,000 descriptions of customers, including the information on
whether they hold a caravan insurance policy. A test set contains 4,000
customers with or without a caravan insurance policy.
Data warehousing collects the insurance company data. Insurance
company data are stored as a collective data repository with better storage
efficiency. Initially, the functional behavior of the corporate system is
analyzed as policies, based on annual income of the customer. Based on
the number of insurance policies, the insurance company repositories are
constructed in layers including car policies, motorcycle/scooter policies,
agricultural machines policies, life insurances, family accidents insurance
policies, fire policies and boat policies with relevant attributes in each
policy. Based on the customer annual income and career, decision making
system decides the insurance policies.
The closely related attributes such as Customer Subtype, Career,
Income and Third party insurance are chosen from insurance company
organization. The functional behavior of the insurance company is
identified as insurance policies with these attributes. Based on the
formation of inductive rules, the user demanded information is extracted
from the insurance company repositories.
The inductive rules are formed based on the decision rules and the
decision is done efficiently by choosing the attributes related to operational
objective of the insurance company. Usually in the data mart, the
extraction of information is done based on users' queries. Efficient
processing of range queries on a table ensures data access, as proposed by
Theoni Pitoura et al.
data mart consists of various policy forms. The policy form of the requested
customer subtype is retrieved, depending upon the users query.
The performance analysis of data mart in functional layered
repositories is evaluated with induction rule mining. Extracting user
demanded information in data mart using inductive rule mining is
implemented using Java. The experiments run on an Intel P-IV machine
with 2 GB memory and 3 GHz dual processor CPU. An experimental study
is presented to estimate the effectiveness and performance of the
functional behavior pattern and IRM in data mart. Several experiments are
conducted and evaluations are shown to prove the better extraction of
user required data in data mart.
5.1.1 Attribute Relativity in Insurance Data Mart
Data mart is built with a well-known attributes behavior of the
insurance company data stored in the data warehouse. A data warehouse
system maintains the insurance company data in an arbitrary manner.
In addition, data warehouse handles the insurance data mart, customer
profile data mart and policy data mart based on the functional behavior of
the data in the data warehouse environment. For instance, the insurance
data mart is considered for evaluation.
The relevant attributes such as Customer Subtype, Career, Income,
policies and Third party insurance are chosen from insurance data mart.
The functional behavior of the insurance data mart is identified based on
the relevant attributes of the insurance data mart. The below table and
graph describe the performance of the functional behavior pattern in data
mart based on attribute relativity, in comparison to the multi-functional data
warehousing model as proposed by Neoklis Polyzotis et al.
Table 5.1 Tabulation for Attribute Relativity
No. of Attributes in Insurance Data Mart    MFDWM (%)    FBPDM (%)
10                                          10           5
20                                          19           12
30                                          28           16
40                                          37           23
50                                          48           26
60                                          56           32
70                                          64           45
The table 5.1 illustrates the relevant attribute selection from the number of
attributes in the insurance data mart. The attribute relativity based on the
insurance data mart is measured in terms of the percentage. For each
process, different relative attributes with respect to time are taken from the
dataset, and the values are calculated using the given formula. Since
different relative attributes are used each time, the calculated values differ
each time. As the number of attributes increases, the attribute relativity
also increases by the above formula.
Based on the table 5.1, a graph is depicted as follows:-
Figure 5.1 No. of Attributes in Insurance Data Mart Vs. Attribute Relativity
The graph in Figure 5.1 illustrates the attribute relativity of the insurance
company data attributes present in the insurance data mart for analyzing the
functional behavior. A high attribute relativity, of about 25-35%, is seen in
the FBPDM compared to the multi-functional model in data warehousing,
as each individual attribute's character is studied in designing the logical
storage of the insurance data mart, facilitating the selection of relevant attributes.
5.2 Functional Behavior Analysis on Attribute Relativity
The operational objective of insurance company decides the
functional behavior pattern of the insurance data mart. The operational
objective of the insurance data mart is identified, based on the selection of
relevant attributes like Customer Subtype, career, income, policies and
third party insurance. Generally, the information in the insurance data mart,
customer profile data mart and policy data mart is managed with relevant
selection of attributes for each data mart.
Table 5.2 Tabulation of Functional Behavior Analysis
No. of Attribute Relativity    MFDWM (%)    FBPDM (%)
5                              5            1
12                             12           4
16                             16           6
23                             22           8
26                             27           9
32                             30           12
45                             39           15
The table 5.2 describes the functional behavior analysis made on
different sets of attribute relativity. The operational goal of the insurance
company is to register insurance policies. The insurance policies are the
functional behavior of the insurance organization which is decided on the
basis of attribute relativity. The functional behavior of insurance data mart
is mainly to provide insurance policies to different fields. In this case, policy
of customer in the insurance data mart is recognized as functional behavior
of insurance company.
Figure 5.2 No. of Attribute Relativity Vs. Functional Behavior Analysis
The graph in Figure 5.2 describes the efficient analysis of functional
behavior with relevant attributes. The functional behavior analysis based
on the attribute relativity is measured in terms of the percentage. The
functional behavior of insurance data mart with various attribute
relativities is identified, and the variance is 25-45% compared to MFDWM.
The operational goal of the insurance company is recognized on the basis
of attribute relativity in analyzing functional behavior pattern.
5.3 Formation of Functional Layered Repositories based on
Insurance Policies
The functional behavior aims at building layers of data storage
repositories with relevant data attributes using functional behavior pattern
in data mart. A collection of all the necessary customer requirements are
gathered for designing insurance data repositories. In addition, the
insurance data source of the collected data is identified to enhance the
data mart design. After identifying insurance data sources, the physical
and logical structure of data mart is analyzed and designed for better data
accessing.
The physical and the logical construction in connection to the data
mart support rapid and proficient access to the data in insurance data mart.
The functional behavior is recognized on the operational goal of the
organization i.e., based on the insurance company policies, the data are
stored in the repositories as layers. Based on the insurance policies, the
insurance company repositories are constructed in layers containing car
policies, motorcycle/scooter policies, agricultural machines policies, life
insurances, family accidents insurance policies, fire policies and boat
policies with relevant attributes in each policy.
Table 5.3 Tabulation for Functional Layered Repositories
No. of Insurance Policies    MFDWM (kb)    FBPDM (kb)
1                            5             2
2                            8             4
3                            12            6
4                            14            9
5                            17            11
6                            20            13
7                            24            16
The table 5.3 shows tabulation of functional layered insurance
repositories with the insurance policies. The insurance layered repositories
minimize the storage size. The insurance data storage repositories efficiency
is decided, based on the memory occupied by the data in the insurance data
mart. Based on the table 5.3, a graph is depicted as follows:-
Figure 5.3 No. of Insurance Policies Vs. Functional Layered Repositories
The graph in Figure 5.3 describes the functional layered insurance
repositories in efficient analysis of policies as the functional behavior in
data mart. The data storage is reduced in FBPDM increasing data
accessibility by about 10-20% compared to MFDWM. The layered
formation of the insurance data mart enhances efficient data retrieval.
The interface transitional deposit for the front-end tool is designed to
utilize the insurance database structures in data storage. The functional
layered repositories, based on the number of insurance policies, are
measured in terms of Kilobytes (KB).
5.4 Efficient Technique for Retrieving data from Functional Layered
Insurance Repositories
The layered formation of the insurance repositories on the basis of
functional data attributes from the insurance data mart improves the speed of
data retrieval. The populating design of data storage repositories facilitates the
rapid retrieval of data from the insurance data mart. The performance efficiency
of high-speed data retrieval in FBPDM is proved with the table and graph as
follows:-
Table 5.4 Tabulation for Data Retrieval
Functional Layered Insurance Repositories    MFDWM (s)    FBPDM (s)
1                                            0.95         0.46
2                                            1.48         0.89
3                                            2.18         1.56
4                                            2.54         1.89
5                                            2.97         2.11
6                                            3.20         2.63
7                                            3.64         2.97
The table 5.4 describes the retrieval of data from functional layered
insurance repositories. The outcome of the functional behavior pattern in
data mart is compared with a multi-functional model in data warehouse.
Data retrieval is measured in terms of seconds based on the functional
layer insurance repositories. Based on the table 5.4, a graph is
depicted as follows:-
Figure 5.4 Functional Layered Insurance Repositories Vs. Data Retrieval
The graph in Figure 5.4 describes the process of retrieving data from
functional layered insurance repositories with functional attributes. Based on
the number of attribute relativity, the rapid data retrieval from data mart is
high by about 10-20% in the FBPDM in contrast to multi-functional model
in data warehouse. The insurance data storage repositories are constructed
in layers based on the functional behavior policies pattern.
5.5 Decision Making on Functional Layered Insurance Repositories
A decision support system is certainly a part of the decision making
process. A decision is termed as the selection of one among a number of
alternatives. Decision making on functional layered insurance repositories forms
the decision rule induction for extracting user demanded data in the data mart.
Decision making signifies the entire procedure of making the choice.
The set of demanded customer's policies are collected. The
insurance policy information is validated to clarify the correct data
extraction. After insurance policy data validation, the choices are identified
in the process of decision making. Finally, the decision on utilizing the
policy insurance for the customer is reported. The policy insurance
decisions are made, based on the insurance data mart attribute selection.
Decision support system facilitates the inductive rule mining.
Table 5.5 Tabulation for Decision Support System
Functional Layered Insurance Repositories    FBPDM (%)    IRM (%)
1                                            71           89
2                                            69           85
3                                            67           78
4                                            61           74
5                                            58           70
6                                            52           67
7                                            48           62
The table 5.5 describes the decision made on the insurance
repositories. The problem occurrence is solved with decision on insurance
data validation. After policy information validation, the choices are
identified in the process of decision making. The impact of decision making
in insurance repositories is determined beforehand to confirm the data
extraction. Decision Support System based on the Functional Layered
Insurance Repositories is measured in terms of percentage.
Figure 5.5 Functional Layered Insurance Repositories vs. Decision
Support System
The graph in Figure 5.5 illustrates the decision made on the functional
layered insurance repositories to enhance the user demanded data extraction.
The capacity of decision making in IRM is high by about 15-20%
compared to FBPDM. The decision support system works in terms of
intelligence, design and choice. The intelligence finds the problem occurred in
insurance data mart and analyses the problem generated. The design involves
the formulation of solutions and deciding choice in decision making
implementation.
5.6 Efficient Decision Rule patterns
With the promising growth of data warehousing technology, a
massive and expensive resource of information is processed with functional
decision rules. Several decision rule formations are derived from the
insurance data mart, but with the traditional method the number of
produced decision rules is enormous. The decision rule formation on the
basis of insurance repositories is depicted in the table below:-
Table 5.6 Tabulation for Decision Rule Pattern
Functional Layered Insurance Repositories    FBPDM (%)    IRM (%)
1                                            61           69
2                                            52           59
3                                            42           47
4                                            36           41
5                                            33           39
6                                            25           30
7                                            19           28
The table 5.6 shows the decision rule pattern formation based on the
insurance repositories. Decision rule pattern based on the functional
layered insurance repositories is measured in terms of the percentage (%).
Based on the table 5.6 a graph is depicted as follows:-
Figure 5.6 Functional Layered Insurance Repositories vs. Decision
Rule Pattern
The graph in Figure 5.6 illustrates the decision rule pattern in the
functional layered insurance repositories. The decision rule pattern
formation is slightly higher in IRM, by about 5-10%, compared to FBPDM, as
decision rules are formed on the basis of decision making in the decision
support system.
5.7 Decision Rule Based on Functional Attribute
The decision rules form an enormous and costly resource of
information in data warehousing technology with its hopeful development.
The number of decision rules engendered from the insurance repositories
with decision making is enormous. In addition, IRM intends a diverse
strategy of inducing definite and probable decision rules. IRM derives a set
of decision rules based on the functional policy attribute in the insurance
data mart. The decision rule patterned with the decision support system is
enhanced with the decision rule based on the functional attribute.
Table 5.7 Functional layered Insurance Repositories Vs. Decision Rule
Functional Layered Insurance Repositories    FBPDM (%)    IRM (%)
1                                            61           75
2                                            52           67
3                                            42           58
4                                            36           53
5                                            33           50
6                                            25           41
7                                            19           37
The table 5.7 shows the decision rule formation based on the
insurance repositories. Decision rule is measured in terms of percentage
based on the Functional Layered Insurance Repositories. Based on the
table 5.7, a graph is depicted as follows:-
Figure 5.7 Functional Layered Insurance Repositories Vs. Decision Rule
The graph in Figure 5.7 elaborates the decision rule based on the
functional policy attribute in the functional layered insurance repositories.
The decision rule based on the functional policy attribute is higher in IRM,
by about 15-20%, compared to FBPDM, as IRM solves the problem of
decisive analysis of the specified authorized insurance data in the insurance
company data warehouse.
5.8 Rule Induction with Decision Rules
The rule induction process is considered as a policy data search
process in insurance data mart. The rule coverage metric is required to
determine the quality of decision rules established in the inductive rule
mining to direct the data extraction towards the best decision rule. The rule
quality measure is a main element in rule induction.
In real-world applications like an insurance company, a typical purpose of a
decision support system is to find decision rules. The decision rule determined
works by optimizing a rule quality standard to extract insurance data, with both
training accuracy and rule coverage taken into account. Thus, the inductive rule
mining is both accurate and reliable. A quality measure of rule induction is
determined from the obtainable insurance data. All common measures are based
on the number of positive and negative instances covered by a rule.
Table 5.8 Tabulation for Rule Induction
Decision Rules    FBPDM (%)    IRM (%)
1                 61           88
2                 57           82
3                 52           78
4                 46           73
5                 43           70
6                 38           62
7                 34           57
The table 5.8 shows the rule induction in directing the insurance data
extraction towards the best decision rule. The rule induction is measured in
terms of the percentage (%), and the decision rule values vary over the range
1 to 7. Decision rules are generally used in the classification and prediction
of data obtained in the data mart, and they are induced by posing a query
on any attribute. Decision rule induction for extracting the user required
information is invoked from the users' query. Once query processing is
performed, the supporting data structure is restructured with the tables and
attribute names mined from the query.
Based on the table 5.8, a graph is depicted as follows:-
Figure 5.8 Decision Rules Vs. Rule Induction
The graph in Figure 5.8 elaborates the determination of the best
decision rule based on the functional policy attribute in the functional
layered insurance repositories using rule induction. Inducing the best
decision rule based on the functional policy attribute using IRM is higher
by about 20-30% compared to FBPDM. The insurance company information
concerning customer requirements is stored in a table structure, and the decision
rules are induced by posing a query on the policy attribute value.
5.9 Efficient Induction Rule Mining in Extracting User Demanded Data
The framework of the decision rule in induction rule mining for extracting
the user required information is invoked by the users' query. After query
processing, the supporting data structure is restructured with the customer
profile tables and the policy attribute names mined from the query. The
attribute count is calculated from the column hits in the table form. The counter
is sorted in descending order to place the most commonly used attribute in the
first row.
The most commonly used attribute is the one referred to by user queries.
The induction based decision rule algorithm selects the hit attribute for each
policy attribute in the functional layered insurance storage repositories. The
attribute with the highest hit count is selected as the similarity class for the data
warehouse. For example, in the accident insurance policy table, the hit attribute is
vehicle price, referred to by the query 'In table (Accident Insurance policy)
vehicle price ≥ 3 lakhs'. Finally, the query results in the extraction of the user
demanded data from the insurance data mart.
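The hit-count bookkeeping described above, counting column hits per attribute and selecting the most commonly used one first, can be sketched as follows; the class and attribute names are illustrative assumptions, not the thesis implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the hit-count bookkeeping described above: each query
// increments a counter for the attribute (column) it references, and
// the attribute with the highest hit count is selected first.
// Class and attribute names are illustrative assumptions.
public class HitCounter {

    private final Map<String, Integer> hits = new HashMap<>();

    // Record one query hit against an attribute.
    public void recordHit(String attribute) {
        hits.merge(attribute, 1, Integer::sum);
    }

    // Attribute with the highest hit count (most commonly used).
    public String topAttribute() {
        return hits.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        HitCounter counter = new HitCounter();
        counter.recordHit("vehiclePrice");
        counter.recordHit("vehiclePrice");
        counter.recordHit("customerSubtype");
        System.out.println(counter.topAttribute()); // prints vehiclePrice
    }
}
```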
Table 5.9 Data Mart Vs. Extracting User Demanded Data
Data Mart    FBPDM (s)    IRM (s)
1            64           53
2            67           56
3            70           59
4            75           62
5            82           66
6            87           68
7            92           72
The table 5.9 shows the induction rule mining in extracting user
demanded data from the data mart. The formation of decision rules is recognized
by computing the inductive rule mining process, enhancing the decision rule
induction algorithm. Using the IRM scheme, the relevant inductive rules are
minimized. Extracting the user demanded data is measured in terms of
seconds (secs) based on the data mart group.
Figure 5.9 Data Mart Vs. Extracting User Demanded Data
The graph in Figure 5.9 describes the user demanded data
extraction in the data mart. IRM provides better extraction of user
demanded data, by about 15-25%, compared to FBPDM, as the highest
hit attribute reduces the information searched and provides the user
demanded data by categorizing the attributes in the resulting partitions.
5.10 Functional Behavior Analysis
Functional Behavior depends on attribute relativity which specifies
the rate of the attributes present in the data mart, closely related with the
operational goal of organization.
Table 5.10 No. of Data Marts Vs. Analysis of functional behavior
Data Mart    FBPDM (%)    IRM (%)
1            5            10
2            8.4          15.3
3            12.5         19.2
4            15.7         21.5
5            18           24
6            22           28
7            25.6         30.5
Table 5.10 describes the functional behavioral analysis of the data
mart and the outcome of the proposed FBP for data mart is compared with
an existing multi-functional model in DW using attribute relativity.
The analysis of functional behavior based on the data mart is measured in
terms of percentage.
Fig. 5.10 No. of Data Marts Vs. Analysis of functional behavior
The Figure 5.10 describes the efficiency of the analysis of the functional
behavior of the data mart. The proposed Functional Behavior Pattern
chooses the relevant attributes based on the operational goal of the data
mart. Since the relativity of attributes is high, the efficiency of the
proposed FBP for data mart is also high.
The existing multi-functional data model does not analyze the functional
behavior of the data mart, but the proposed FBP does. Based on the number of
data marts, the analysis of functional behavior in the data mart is
45-50% higher in the proposed FBP for data mart in contrast to the existing
multi-functional model in DW.
5.11 Performance result of Memory Consumption
Memory consumption is the amount of space taken to store the data
in the data warehouse. The percentage of total memory contributed to a
mechanism is called the memory consumption rate. The objective of
memory consumption analysis is to decrease the amount of energy required
to offer quality data retrieval from the data warehouse.
Table 5.11 Transaction Density Vs. Memory consumption
Transaction Density    FBPDM (MB)    IRM (MB)
10                     10            8
20                     14            13
30                     16            15
40                     20            18
50                     24            20
60                     26            22
70                     29            26
80                     34            30
Table 5.11 describes the memory consumption of the IRM and FBPDM
models. The outcome of the proposed IRM for the data warehouse is compared
with the existing system based on the transaction density. Transaction density,
also called transaction, refers to an item set in each data. For a pair of an item
set and a transaction set, the density is defined by the average number of items
included in a transaction, for data transacted within a limitation of time.
Transaction density = Number of transaction data × Time (ms)
Memory Consumption is measured in terms of Mega Bytes (MB).
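The transaction density formula stated above can be sketched directly; the class and parameter names are illustrative assumptions:

```java
// Sketch of the transaction density formula stated above:
// Transaction density = Number of transaction data * Time (ms).
// Names are illustrative assumptions.
public class TransactionDensity {

    // Density for a count of transaction data over a time window in ms.
    public static long density(long transactionCount, long timeMs) {
        return transactionCount * timeMs;
    }

    public static void main(String[] args) {
        // 10 transaction data items over 5 ms.
        System.out.println(density(10, 5)); // prints 50
    }
}
```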
Fig 5.11 Transaction Density Vs. Memory consumption
The Figure 5.11 describes the memory consumption of data from the data
warehouse for analyzing the functional behavior of the user needed information.
As the transaction density increases, the memory usage decreases gradually.
Based on the transaction density, the memory consumption of IRM is 25-30%
less when compared to the FBPDM model. The higher the transaction density,
the lower the memory consumption of the data mart. IRM consumes less
memory for the storage of information in the data mart, measured in terms of
Megabytes (MB).
5.12 System Response Time
The interval between the instant at which an operator at a terminal
enters a request for a response from the data warehouse and the instant at
which the first data of the response is received at the terminal is called the
system response time. It is measured in terms of milliseconds (ms).
Table 5.12 No. of users Vs. System Response Time
No. of users    FBPDM (ms)    IRM (ms)
5               1450          789
10              1750          1010
15              2100          1250
20              2560          1350
25              2890          1650
30              3250          1850
35              3680          2010
The system response times in milliseconds are tabulated in
table 5.12. The system response time is measured based on the number of
users, and the results of FBPDM and IRM are compared.
Fig 5.12 No. of users Vs. System Response Time
The performance graph of system response time based on users is
measured for the IRM model. The user requested information in the data mart
is extracted accurately based on the decisive rules formed. Even though
the number of users increases, the system response time of the data mart
is less in IRM; the response time taken is lesser when compared with the
FBPDM. The system response time of IRM is 50-60% less when compared
with FBPDM.
5.13 Execution Time Evaluation
Execution time is the time during which a program is running.
That is, when a program starts running, it is runtime for that program.
Here, the execution time is defined as the time taken to transfer the data from
the source to the destination in the data mart using IRM.
Execution Time = Amount of time taken to send requests / Total no. of users in data mart
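A minimal sketch of the execution time formula above; the class and parameter names are illustrative assumptions:

```java
// Sketch of the execution time formula above:
// Execution Time = time taken to send requests / total users in data mart.
// Names are illustrative assumptions.
public class ExecutionTime {

    // Per-user execution time in seconds.
    public static double executionTime(double requestSendTimeSeconds, int totalUsers) {
        return requestSendTimeSeconds / totalUsers;
    }

    public static void main(String[] args) {
        // 90 seconds of request send time shared among 5 users.
        System.out.println(executionTime(90.0, 5)); // prints 18.0
    }
}
```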
Table 5.13 Tabulation of No. of requests sent Vs. Execution Time
No. of requests sent    FBPDM (s)    IRM (s)
1                       45           18
2                       80           30
3                       125          48
4                       175          60
5                       210          82
6                       255          101
7                       310          122
The table 5.13 describes the execution time in FBPDM and IRM model.
The execution time is measured in terms of seconds.
Fig 5.13 No. of requests sent Vs. Execution Time
Fig 5.13 demonstrates the performance of the execution in IRM.
As the request from the users in distributed system increases, IRM
technique execution time decreases gradually. The result differs based on
the number of request send to the users in data mart. The variance is
approximately 30 - 40% lesser in terms of time when compared to the
FBPDM model.
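The percentage differences quoted in this chapter are relative reductions with respect to FBPDM; a minimal helper (the sample values below are illustrative, not taken from the tables):

```python
def percent_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline

# Illustrative values only: a baseline of 200 units against 100 units.
print(percent_reduction(200.0, 100.0))  # -> 50.0
```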
IRM thus performs efficient extraction of user-demanded data from the
data mart using the decision rule induction algorithm.
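The full decision rule induction algorithm is not reproduced here; the following is a minimal sketch of one-condition rule induction over categorical attributes, with hypothetical insurance-style records rather than actual COIL 2000 fields:

```python
from collections import Counter, defaultdict

def induce_rules(records, target):
    """Induce simple one-condition decision rules: for each value of each
    attribute, predict the majority target class among matching records,
    together with the rule's confidence on the training data."""
    rules = []
    attributes = [a for a in records[0] if a != target]
    for attr in attributes:
        by_value = defaultdict(Counter)
        for rec in records:
            by_value[rec[attr]][rec[target]] += 1
        for value, counts in by_value.items():
            label, freq = counts.most_common(1)[0]
            coverage = sum(counts.values())
            rules.append((attr, value, label, freq / coverage))
    return rules

# Hypothetical insurance-style records (not COIL 2000 data).
data = [
    {"policy": "life", "region": "north", "buys": "yes"},
    {"policy": "life", "region": "south", "buys": "yes"},
    {"policy": "auto", "region": "north", "buys": "no"},
]
for attr, value, label, conf in induce_rules(data, "buys"):
    print(f"IF {attr} = {value} THEN buys = {label} (confidence {conf:.2f})")
```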
Evaluations are done with the Insurance Company Benchmark (COIL
2000) data set from the UCI repository, using the decisive rules based
on the decision rule induction algorithm on the functional layered insurance
repositories. The user-demanded information is extracted in a reliable
manner.
5.14 Summary
The objective of inductive rule mining (IRM) is to improve
decision making while retrieving data from the data mart. IRM
facilitates a better understanding of the structure for efficient
extraction of data from the warehouse. The significant feature of IRM is
that it ensures efficient extraction of information based on the user's
query. As a result, efficient processing of range queries on the tables
ensures data access. Finally, IRM extracts the user-demanded
information in a reliable manner.
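A range query of the kind mentioned above can be sketched as a simple filter over table rows; the column names here are hypothetical:

```python
def range_query(rows, column, low, high):
    """Return the rows whose value in `column` lies in [low, high]."""
    return [r for r in rows if low <= r[column] <= high]

# Hypothetical premium table.
table = [{"id": 1, "premium": 120}, {"id": 2, "premium": 340},
         {"id": 3, "premium": 210}]
print(range_query(table, "premium", 100, 250))  # rows 1 and 3
```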
IRM provides more efficient extraction of user-demanded data,
by about 15-25% compared to FBPDM.
The decision rule based on the functional policy attribute is higher in
IRM by about 15-20% compared to FBPDM, as IRM solves the problem of
decisive analysis of the specified authorized data in the data warehouse.
The following points summarize the performance analysis of extracting
user-demanded data in the functional layered data mart based on inductive
rule mining.
1) A high attribute relativity, of about 25-35%, is seen in FBPDM
compared to the multi-functional model in data warehousing.
2) The functional behavior of the insurance data mart under varying
attribute relativity is identified, and the variance is 25-45% compared
to MFDWM.
3) Data storage is reduced in FBPDM, increasing data accessibility
by about 10-20% compared to MFDWM.
4) Based on attribute relativity, rapid data retrieval from the data mart
is higher by about 10-20% in FBPDM in contrast to the multi-functional
model in the data warehouse.
5) The capacity for decision making in IRM is higher by about 15-20%
compared to FBPDM.
6) Decision rule pattern formation is slightly higher in IRM, by about
5-10% compared to FBPDM, as decision rules are formed on the basis of
decision making in the decision support system.
7) The decision rule based on the functional policy attribute is higher in
IRM by about 15-20% compared to FBPDM, as IRM solves the problem of
decisive analysis of the specified authorized insurance data in the
insurance company data warehouse.
8) Inducing the best decision rule based on the functional policy attribute
using IRM is higher by about 20-30% compared to FBPDM.
9) IRM provides more efficient extraction of user-demanded data, by
about 15-25% compared to FBPDM.
10) Based on the number of data marts, the analysis of functional behavior
in the data mart is 45-50% higher in the proposed FBPDM in contrast to
the existing multi-functional model in the data warehouse.
11) Based on transaction density, the memory consumption of IRM is
25-30% less when compared to the FBPDM model.
12) The system response time of IRM is 50-60% lower when compared
with FBPDM.
13) Based on the number of requests sent by the users in the data mart,
the execution time is approximately 30-40% less when compared to the
FBPDM model.
14) The functional behavior of the corporate system is analyzed to
build layers of data storage repositories with relevant data
attributes. Inductive rule mining (IRM) improves decision making in
retrieving data from the data mart. Based on the formation of inductive
rules, the user-demanded information is extracted from the insurance
company repositories.
6. CONCLUSION AND FUTURE WORK
6.1 Conclusion
A data warehouse is a large database used for revealing and
examining data. The data collected in the warehouse are updated from
operational systems such as the marketplace, and pass through an
operational data store for further operations. The data in the data
warehouse are used for reporting information. A data warehouse
generated from integrated data source systems does not need an Extract,
Transform and Load (ETL) process to produce staging databases or
operational data store databases. Data warehouses are subdivided into
data marts; data marts store subsets of the data from the warehouse.
The functional behavior pattern in the data mart (FBPDM), based on
attribute relativity, analyzes the functional behavior. The functional
behavior identifies the relevant attributes of the particular data mart.
Attribute relativity is selected based on the complete report of the data
mart attaining its operational goal. The functional behavior pattern in the
data mart achieves the best analysis of the functional behavior of the data
mart depending on attribute relativity. The data mart is finally designed
efficiently, with data sources matching user needs, fast data retrieval on
specified attributes, data broadcasting, and fast storing of data into the
data mart.
The effective functional behavioral pattern in the data mart is utilized
and processed in the analysis of data. Inductive rule mining (IRM)
efficiently retrieves the user-demanded data in the data mart based on
attribute relativity. IRM properly applies inductive rules for handling the
user-demanded information. Decision support is efficiently carried out
with inductive rule mining on the functional data mart segregation of the
layered data repository. The framework of decision rule induction uses the
user's query to extract the user-demanded information. Based on the
query processing, the supporting data structure is restructured with the
tables and attribute names. The decision support system achieves reliable
data extraction from the data warehouse.
Experimental results showed that the functional behavioral pattern
for the data mart and inductive rule mining in the data mart are analyzed
efficiently. The performance of the functional behavior pattern and
inductive rule mining, measured in terms of attribute relativity, functional
behavior analysis, decisive rule formation, and data retrieval, is better by
about 10% to 15% in comparison to the multi-functional data warehousing
model.
Finally, IRM performs efficient extraction of user-demanded data
from the data mart using the decision rule induction algorithm.
Evaluations are done with the Insurance Company Benchmark (COIL 2000)
data set from the UCI repository, using the decision rule induction
algorithm on the functional layered insurance repositories, and the
user-demanded information is extracted in a reliable manner.
6.2 Future Work
The functional behavior of the data mart is analyzed, but the analysis
is time consuming. In addition, functional behavior faces possible dangers
in problem behavior. The issue of functional behavior problems is
considered as a future direction, in addition to the time minimization
factor, with the establishment of functional communication training. The
purpose of functional communication training is to understand each
enterprise's communication behaviors as a replacement for functional
behavior. Additionally, functional communication training is able to reduce
the time consumed.
Data extraction is very complex in some subject areas and falls short
of better data retrieval because the implementation is through function
modules. To still work in compliance with the data extraction, the
enterprise content data sources are loaded on a one-to-one basis into the
data mart and are further processed. However, no field extension is done
in the source system of the data warehouse any longer. The data repository
layer is planned with enterprise content data sources to extend better
performance in user-required data extraction. The extraction process is
simple with the establishment of inductive rule mining. A difficulty arises
if the preferred data item is visible but out of reach. In future, a novel
technique is planned to extract the unreachable items in the data mart.
PUBLICATIONS
International Journals
1. “Functional Behavior Pattern for Data Mart Based on Attribute
Relativity”, International Journal of Computer Science Issues,
Volume 9, Issue 4, No. 1, July 2012, pp. 278-283.
2. “Mining User Demanded Data from Data Mart Using Inductive
Rule Mining”, International Journal of Computer Science Issues,
Volume 9, Issue 4, No. 1, July 2012, pp. 405-410.
National Journals
3. “Performance Evaluation of Functional Behavior Pattern”,
National Journal of Engineering Today, Volume VII, May 2010,
pp. 11-16.