PERFORMANCE ANALYSIS FOR DATA MART
Thesis submitted in partial fulfillment
for the award of the Degree of
Doctor of Philosophy in Computer Science
by
M. PAULRAJ
Under the Guidance and Supervision of
Dr. P. SIVAPRAKASAM
VINAYAKA MISSIONS UNIVERSITY SALEM, TAMILNADU, INDIA
MARCH 2014
DECLARATION
I, M. PAULRAJ, hereby declare that the thesis entitled
“PERFORMANCE ANALYSIS FOR DATA MART” submitted by me for
the Degree of DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE is a
record of research work carried out by me during 2008 - 2014 under the
guidance and supervision of Dr. P. SIVAPRAKASAM, Associate Professor,
Sri Vasavi College, Erode, and that this work has not formed the basis for
the award of any degree, diploma, associate-ship, fellowship or other titles
in this or any other University or other similar Institution of higher learning.
Station : Signature of the Candidate
Date : (M. PAULRAJ)
CERTIFICATE
I, Dr. P. SIVAPRAKASAM, Associate Professor, Sri Vasavi College,
Erode, certify that the thesis entitled “PERFORMANCE ANALYSIS FOR
DATA MART” submitted for the Degree of DOCTOR OF PHILOSOPHY IN
COMPUTER SCIENCE by M. PAULRAJ is the record of research work
carried out by him during the period 2008 - 2014 under my guidance and
supervision and that this work has not formed the basis for the award of
any degree, diploma, associate-ship, fellowship or other titles in this or any
other University or other similar Institution of higher learning.
Station : Signature of the Supervisor Date :
(Dr. P. SIVAPRAKASAM)
ACKNOWLEDGEMENTS
Firstly, I thank the omnipresent God for answering my prayers and
giving me the strength to plod on despite my constitution. This dissertation
would not have been possible without the guidance, help and support of
the kind people around me who, in one way or another, contributed and
extended their valuable assistance in the preparation and completion of
this study.
I would like to express my deepest gratitude to my research supervisor
Dr. P. Sivaprakasam, Associate Professor, Sri Vasavi College, for his
excellent guidance, motivation and having allowed me complete freedom to
define and explore my own directions in research. His high standards for
research inspired me to do better work than I would have thought I was
capable of.
My heartfelt thanks to Dean of Academics and all faculty members of
Vinayaka Missions University, Salem.
I would like to thank the Management and Principal of Sri Vasavi
College, Erode, for providing me with laboratory, academic and technical
support.
I wish to thank my family members for the enormous amount of
encouragement, wishes, prayers and kind words, which paved the way for
my success. My special thanks to my wife Gloria Mary and my daughters
Nivetha P Raj and Niranjana P Raj for their support and for encouraging
me with their best wishes and prayers.
ACRONYMS
FLIDMA Functional Layer Interfaced Data Mart Architecture
UCI Unique Client Identifier
TDW Trajectory Data Warehouse
FBP Functional Behavior Pattern
OLAP On-Line Analytical Processing
OLTP On-Line Transaction Processing
DTDI Data mart Transition Detection using Interface
IRM Inductive Rule Mining
FBPDM Functional Behavior Pattern in Data Mart
MFDWM Multi-Functional Data Warehousing Model
LDR Layered Data Repository
FBA Functional Behavior Analyses
FDRL Functional Data Repository Layers
CTP Clique Tree Propagation
ETL Extraction, Transformation and Loading
DSS Decision Support System
IRIS Identity Risk and Investigation Solution
IRSA Indiscernibility-based Rough Set Approaches
DRSA Dominance-based Rough Set Approaches
GUI Graphical User Interface
PI Probabilistic Inverted
FRS Fuzzy-Rough Set
DHT Distributed Hash Tables
AIBFC Agglomerative Iterative Bayesian Fuzzy Clustering
FSQL Fuzzy-state Q-learning
ABSTRACT
The rising challenge of storing large volumes of data in business
and corporate environments increases the need for data warehousing.
Data warehousing gathers data at various levels, such as departmental,
operational, and functional. The data are stored as a group of data
repositories with better storage efficiency.
Numerous data warehousing models focus on stacking data more
efficiently and rapidly. Existing data warehouse models proceed with the
construction of data marts but fail in the analysis of functional behavior.
Additionally, extracting data from the warehouse requires a better
understanding of the data repository structure, but the functional
requirements of users are not easily understood by the data warehouse
model. The data warehouse model therefore requires a better technique to
extract the user-demanded data.
The issue of a functional decision support system to extract
user-relevant data is handled by introducing the data mart concept. A data
mart is planned to build separate functional data repository layers based
on the departmental decision support requirements in business and
corporate environments.
The merits of creating a data mart are as follows:
Accesses regularly desired data easily
Generates a combined view for a group of users
Enhances end-user response time
Costs less than employing a complete data warehouse
To offer a better decision support system, a “Functional Layer
Interfaced Data Mart Architecture” is proposed for larger corporate and
enterprise data applications.
The research work involves the analysis of the functional behavior
of the corporate system based on the data mart's operational goal. The
objective of functional behavior analysis is to construct layers of data
storage repositories with relevant data attributes using the Functional
Behavior Pattern in Data Mart (FBPDM). The role of the functional
behavior pattern is to identify the functional activities of the data mart
based on attribute relativity.
The required user-demanded data is extracted from the data mart
through the establishment of an efficient decision support system. A new
technique, Inductive Rule Mining (IRM), is proposed to enhance the
decision support system for the extraction of user-demanded data from the
data mart.
The purpose of decision support with efficient inductive rule mining
on functional data marts is to segregate the layered data repositories and
extract the required information for the user. The newly induced rules
provide supportive knowledge for identifying the information the user
requires.
The performance of the efficient functional behavior pattern and the
well-defined inductive rule mining is measured in terms of attribute
relativity, functional behavior analysis, decisive rule formation, and data
retrieval, showing improvements of about 10% to 15% in comparison with
the multi-functional data warehousing model.
The objective of the performance analysis of the data mart in
functional layered repositories with efficient induction rule mining is to
provide an experiential and scientific foundation for extracting
user-demanded data from the data mart. The effectiveness of the IRM
mechanism is established with benchmark datasets of varying
characteristics from the UCI repository.
The performance metrics of IRM are functional behavior analysis,
attribute relativity, decisive rule formation, and efficient data retrieval.
Evaluation of functional behavior in the data mart and of inductive rule
mining in extracting user-demanded data is performed with the Insurance
Company Benchmark (COIL 2000) data set from the UCI Repository.
The functional behavior of the corporate system is analyzed to build
layers of data storage repositories with relevant data attributes. Inductive
Rule Mining (IRM) improves decision making when retrieving data from the
data mart. Based on the formation of inductive rules, the user-demanded
information is extracted from the insurance company repositories.
TABLE OF CONTENTS
Page No.
CHAPTER 1 INTRODUCTION
1.1 Background 1
1.2 Statement of the Problem 2
1.3 Data Warehousing and Data Mart 3
1.3.1 Basic Concepts in Data Warehouse Technology 4
1.3.2 Data Mart Development and User Feedback 5
1.3.3 Dynamics of Data Mart Development 16
1.4 Decision Making in Data Mart 17
1.4.1 Multi Agents (MA) 18
1.4.2 Report Visualization Agent 19
1.4.3 ETL (Extraction, Transformation and Load) 19
1.4.4 Knowledge Base 19
1.4.5 Decision Making 20
1.5 Inductive Rule Mining 20
1.6 Motivation and Goal of the Study 22
1.7 Organization of the Remainder Study 23
CHAPTER 2 LITERATURE REVIEW
2.1 Introduction 24
2.2 Functional Behavior Pattern for Data Mart Based on Attribute Relativity 24
2.2.1 Behavior Patterns of Human 26
2.2.2 Temporal Patterns Evaluation 27
2.2.3 Real Time Warehouse 29
2.2.4 Real Time Environments 31
2.3 Mining User Demanded Data from Data Mart Using Inductive Rule Mining 41
2.3.1 Attribute Reduction 44
2.3.2 Consistent Metadata and Data Warehouse 50
2.3.3 Traditional Warehouse and Up-to-date Information 54
2.4 Research Gap 55
2.5 Contribution of thesis 56
CHAPTER 3 PROPOSED METHODOLOGY - FUNCTIONAL
BEHAVIOUR PATTERN IN DATA MART
3.1 Introduction 58
3.1.1 Proposed Functional Behavior Pattern with New Technique 60
3.2 Process of Data Mart in Data Warehousing 61
3.3 Impact of Data Warehouse Requirements in Functional Behavior 63
3.4 An Efficient Functional Behavior Pattern of Data Mart Based on Attribute Relativity 72
3.4.1 Process of Data Mart 73
3.4.2 Efficient Analysis for Identifying the Functional Behavior Pattern 77
3.5 Experimental Evaluation 78
3.6 Results and Discussion 80
3.6.1 Data Retrieval 80
3.6.2 Attribute Relativity 83
3.6.3 Functional Behavior Analysis 85
3.6.4 Data Storage Repositories 86
3.6.5 Response Time 88
3.6.6 Data Mart Management 89
3.7 Summary 91
CHAPTER 4
AN EFFICIENT INDUCTIVE RULE MINING NEW TECHNIQUE IN EXTRACTING USER DEMANDED DATA FROM DATA MART
4.1 Introduction 93
4.2 Rule Induction for Supporting Decision Making 94
4.2.1 Decision Making on Inductive Rule Mining 95
4.2.2 Rule Induction for Supporting Decision Making 97
4.3 Efficient Analysis to Extract User Demanded Data in Data Mart Using Inductive Rule Mining 106
4.3.1 Decision Support System in Data Mart 108
4.3.2 Decision Rule Induction for Extracting Information 109
4.4 Experimental Evaluation on Induction Rule mining 112
4.5 Results and Discussion 114
4.5.1 Decision Rules 115
4.5.2 Data Relativity 117
4.5.3 Reliability 119
4.5.4 Running Time 121
4.5.5 Rule Coverage 123
4.6 Summary 125
CHAPTER 5
AN EFFICIENT FUNCTIONAL LAYERED DATA MART BASED ON NEW TECHNIQUE OF INDUCTIVE RULE MINING
5.1 Introduction 127
5.1.1 Attribute Relativity in Insurance Data Mart 130
5.2 Functional Behavior Analysis on Attribute Relativity 132
5.3 Formation of Functional Layered Repositories Based on Insurance Policies 134
5.4 Efficient Technique for Retrieving Data from Functional Layered Insurance Repositories 137
5.5 Decision Making on Functional Layered Insurance Repositories 138
5.6 Efficient Decision Rule patterns 140
5.7 Decision Rule Based on Functional Attribute 142
5.8 Rule Induction with Decision Rules 144
5.9 Efficient Induction Rule Mining in Extracting User Demanded Data 146
5.10 Functional Behavior Analysis 149
5.11 Performance result of Memory Consumption 150
5.12 System Response Time 152
5.13 Execution Time Evaluation 154
5.14 Summary 156
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.1 Conclusion 160
6.2 Future Work 162
BIBLIOGRAPHY 163
LIST OF TABLES
Table No. Title Page No.
3.1 Relevant Attribute Vs. Data Retrieval 81
3.2 Data Mart Vs. Attribute Relativity (%) 83
3.3 No. of Data Marts Vs. Functional Behavior Analysis 85
3.4 No. of Data Vs. Data Storage Repositories 87
3.5 Data Accessibility Vs. Response Time 88
3.6 No. of Data Mart Vs. Data Mart Management 90
4.1 No. of User Queries Vs. Decisive Rules 116
4.2 Data Mart Vs. Extracted Data Relativity 118
4.3 Number of Users Vs. Reliability 120
4.4 Data Extraction Vs. Running Time 122
4.5 Data Mart Vs. Rule Coverage 124
5.1 Tabulation for Attribute Relativity 131
5.2 Tabulation of Functional Behavior Analysis 133
5.3 Tabulation for Functional Layered Repositories 135
5.4 Tabulation for Data Retrieval 137
5.5 Tabulation for Decision Support System 139
5.6 Tabulation for Decision Rule Pattern 141
5.7 Functional Layered Insurance Repositories Vs. Decision Rule 143
5.8 Tabulation for Rule Induction 145
5.9 Data Mart Vs. Extracting User Demanded Data 148
5.10 No. of Data Marts Vs. Analysis of Functional Behavior 149
5.11 Transaction Density Vs. Memory Consumption 151
5.12 No. of Users Vs. System Response Time 153
5.13 Tabulation of No. of Requests Sent Vs. Execution Time 155
LIST OF FIGURES
Figure No. Title Page No.
1.1 Types of Data Warehouse 1
1.2 Top down Flow from Data Warehouse to Data Mart 8
1.3 Bottom up Flow from Data Mart to Data Warehouse 9
1.4 Parallel Model in Data Mart Creation 11
1.5 Top down Model with End User Feedback 12
1.6 Bottom up Flow from Data Mart to Data Warehouse 14
1.7 Parallel Model in Creation of Data Mart with Feedback 15
1.8 Framework of Decision Support System 18
1.9 Inductions in Rule Mining 21
3.1 Data Mart in Data warehouse 62
3.2 Abstraction Levels of Data Warehouse Requirements 66
3.3 Architecture diagram of FBPDM 73
3.4 Process of Data Mart 74
3.5 Relevant Attributes Vs. Data Retrieval 82
3.6 Data Mart Vs. Attribute Relativity (%) 84
3.7 No. of Data Marts Vs. Functional Behavior Analysis 86
3.8 No. of Data Vs. Data Storage Repositories 87
3.9 Data Accessibility Vs. Response Time 89
3.10 No. of Data Mart Vs. Data Mart Management 90
4.1 Roles of DS in Decision Making Process 96
4.2 Decisions Making based on Inductive Rule Mining 105
4.3 Architecture diagram of the IRM 107
4.4 Process of Decision Rule Induction for Extracting User Information 110
4.5 No. of User Queries Vs. Decisive Rules 117
4.6 Data Mart Vs. Extracted Data Relativity 119
4.7 No. of users Vs. Reliability 121
4.8 Data Extraction Vs. Running Time 122
4.9 Data Mart Vs. Rule Coverage 124
5.1 No. of Attributes in Insurance Data Mart Vs. Attribute Relativity 132
5.2 No. of Attribute Relativity Vs. Functional Behavior Analysis 134
5.3 No. of Insurance Policies Vs. Functional Layered Repositories 136
5.4 Functional Layered Insurance Repositories Vs. Data Retrieval 138
5.5 Functional Layered Insurance Repositories Vs. Decision Support System 140
5.6 Functional Layered Insurance Repositories Vs. Decision Rule Pattern 142
5.7 Functional Layered Insurance Repositories Vs. Decision Rule 143
5.8 Decision Rules Vs. Induction 146
5.9 Data Mart Vs. Extracting User Demanded Data 148
5.10 No. of Data Marts Vs. Analysis of functional behavior 150
5.11 Transaction Density Vs. Memory consumption 152
5.12 No. of users Vs. System Response Time 153
5.13 No. of requests sent Vs. Execution Time 155
1. INTRODUCTION
1.1 Background
A data warehouse is a subject-oriented, integrated, time-variant
and non-volatile collection of shared data. Various data warehouse
architectures are used in numerous research applications and business
products. All architectures fall into one of three types, namely,
centralized DW [1], data mart and distributed DW. Figure 1.1 shows the
architectures of these three types.
Figure 1.1 Types of Data Warehouse
Figure 1.1 describes the three types of data warehouse. In a
centralized DW, all enterprise data from the various operational and
functional departments of a project are integrated and loaded into a single
database with a single activity model. A centralized DW [2] is built to
supply knowledge workers with reliable and combined data from the major
project areas. It provides the potential to associate information across the
enterprise. Centralized data warehouses [4], however, are very complex
and time-consuming to implement.
A data mart is a well-focused version of a data warehouse.
Because the data is extremely focused, set-up costs and time are
drastically reduced. The importance of a data mart is measured by the
efficiency with which it delivers a relevant and complete subset [3] of the
data existing in the data source to support decision making. Data in a data
mart is extracted from legacy and operational systems and from the
enterprise data warehouse [5,7].
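The subsetting described above can be sketched in a few lines. This is a minimal illustration only; the departments, attribute names and records below are invented for the example and are not taken from the thesis.

```python
# Hypothetical sketch: deriving a departmental data mart as a
# focused subset of enterprise warehouse records.

def build_data_mart(warehouse_rows, department, attributes):
    """Select only the rows and attributes one department needs."""
    mart = []
    for row in warehouse_rows:
        if row.get("department") == department:
            # Keep only the attributes relevant to this mart.
            mart.append({a: row[a] for a in attributes if a in row})
    return mart

warehouse = [
    {"department": "sales", "customer": "C1", "amount": 250, "region": "south"},
    {"department": "hr", "employee": "E7", "salary": 40000},
    {"department": "sales", "customer": "C2", "amount": 90, "region": "north"},
]

sales_mart = build_data_mart(warehouse, "sales", ["customer", "amount"])
# sales_mart -> [{"customer": "C1", "amount": 250},
#                {"customer": "C2", "amount": 90}]
```

Because the mart carries only the rows and columns its department actually queries, the reduced set-up cost and faster access claimed above follow directly from the smaller working set.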
1.2 Statement of the Problem
Current data warehouse models concentrate on the process of data
mart identification and processing. These models fail in analyzing the
functional behavior of the patterns [17]. No significant importance is given
to the selection of relevant attributes in existing data warehouse models.
Interaction between users and data increases when identifying behavior
patterns, and this interaction affects the choice of actions. Thus, it leads to
a complicated and difficult action-learning problem.
Some existing methods fail to cover all the areas of ETL (Extract,
Transform and Load) and querying [15, 21]. Hence, steps must be taken to
achieve the stated benefits, especially for the querying mechanism.
In a streaming warehouse [26], tables are updated as new data
arrives from multiple streams, but no steps are taken to limit the number of
tables updated concurrently.
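One conventional way to bound concurrent table updates, which the cited streaming warehouses lack, is a counting semaphore. The sketch below is a generic illustration under that assumption, not a mechanism from the cited work; the table names and batch contents are hypothetical.

```python
import threading

# At most two warehouse tables may be updated at the same time.
MAX_CONCURRENT_UPDATES = 2
update_slots = threading.Semaphore(MAX_CONCURRENT_UPDATES)

results = []
results_lock = threading.Lock()

def update_table(table, batch):
    """Apply a batch of stream records to one table, respecting the
    concurrency bound."""
    with update_slots:                 # blocks when two updates are running
        with results_lock:
            results.append((table, len(batch)))

# Five streams each deliver a batch for a different table.
threads = [threading.Thread(target=update_table, args=(f"t{i}", [1, 2]))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All five tables are updated, but never more than two concurrently.
```

The bound trades update latency for predictable load on the warehouse, which is the trade-off the problem statement above points at.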
In the present methods, the user-demanded information is not
effectively retrieved. Some useful techniques are applied to fuzzy rough
sets [33] but not to other forms of rough sets; hence, a new technique
applicable to all types of rough sets is required. Moreover, whenever the
sources are updated, some of the data become outdated.
1.3 Data Warehousing and Data Mart
Data warehouses are a significant asset of any modern business
enterprise, with standard applications in business planning and strategic
analysis. For instance, sales departments use data warehouses to study
the buying patterns of their customers and to shape their business and
distribution plans accordingly. The data warehouse module is a database
developed for analytical processing [19] whose main purpose is to
preserve and analyze historical data.
In the typical case, facts are the sales of an enterprise, and dimensions
enable analysis by product, purchaser, point of sale, time of sale and so
on. In simple warehouses, data marts extract their information directly from
operational databases [7]. In complex situations, the data warehouse
architecture [25] is multilevel, and the data mart information is extracted
from intermediate repositories called operational data stores.
1.3.1 Basic Concepts in Data Warehouse Technology
Operational databases provide a combined view of the application,
mapping each entity of the real world to exactly one concept of the
schema. Therefore, a complex reality corresponds to a complex schema,
whose objective is to capture the complexity of the application domain.
Schemas are frequently expressed at the conceptual level through an
entity-relationship [27, 31] data model.
In contrast, a data warehouse typically manages this complexity by
providing views. Each view of the data is split into a number of simple
schemas called data marts, each focused on a particular analysis activity.
In turn, each data mart represents data by means of a star schema
consisting of a large fact table at the center and a set of smaller dimension
tables placed in a radial pattern [12] around the fact table. The fact table
includes numerical or additive measurements for the purpose of computing
quantitative values about the enterprise. The dimension tables provide full
descriptions of the dimensions of the enterprise.
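The star schema described above can be illustrated with an in-memory SQLite database. The product and time dimensions and the sales figures are invented for the example; only the fact/dimension layout follows the text.

```python
import sqlite3

# A central sales fact table surrounded by two dimension tables.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales  (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    units      INTEGER,   -- additive measure
    revenue    REAL       -- additive measure
);
""")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Widget"), (2, "Gadget")])
cur.executemany("INSERT INTO dim_time VALUES (?, ?)",
                [(1, "Jan"), (2, "Feb")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 1, 10, 100.0), (1, 2, 5, 50.0), (2, 1, 3, 90.0)])

# Typical analytical query: aggregate the fact table along a dimension.
rows = cur.execute("""
    SELECT p.name, SUM(f.units), SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
# rows -> [("Gadget", 3, 90.0), ("Widget", 15, 150.0)]
```

Because the measures in the fact table are additive, any such GROUP BY along a dimension yields a meaningful quantitative summary, which is exactly the purpose attributed to the fact table above.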
The details of data warehouse construction are expressed in terms
of an entity-relationship diagram [13], and this description is the only
requirement for the construction of the data warehouse. Formulating an
integrated entity-relationship schema of its information is more difficult
when the operational database spans different independent legacy
systems.
1.3.2 Data Mart Development
A data mart is a decision support system integrating a subset of the
enterprise data and concentrating on definite functions or activities of the
enterprise. Data marts cover distinct enterprise-related features such as
forecasting the effect of marketing promotions, computing sales
performance, calculating the effect of new product introductions on
enterprise profits, and forecasting the performance of a new enterprise
division. Data marts are precise enterprise-related [42, 15] software
applications.
Second, data marts can be fully developed at a faster pace, and at
the same time it is possible to create models of success and share them
among constituencies interested in data mart applications in general.
Third, a data mart achieves specific functions for a distinct unit, such as a
recognized corporate or organizational task, which provides political
justification for the data mart. Generally, a manager is able to achieve the
best decision support within the specified enterprise budget [25], in
addition to improved technology. These conditions are appropriately
addressed with decision support system (DSS) applications [67, 9, 16].
The first pattern of data mart development is best characterized as
subsets of the data warehouse. These subsets are placed on
comparatively cheap computing platforms [72] that are closer to the user
and are periodically updated from the central data warehouse. In this
pattern, the data mart is the child of the data warehouse and, conversely,
the data warehouse is the parent of the data mart.
The second pattern of development rejects the dominance of the
data warehouse. The data mart is derived independently from the
collected abstract of information that predates both data warehouses and
data marts. The data mart uses data warehousing techniques of
organization and management; it is structurally a data warehouse, just a
smaller one with a specific enterprise function.
The third pattern of development tries to combine the first two
patterns and remove the differences inherent in them. Here, data marts
are developed in parallel with the data warehouse. Both the data
warehouse and the data marts are developed from the collected abstract
of information, but the data marts are independent of data warehouse
development.
The three patterns can be developed either without user feedback
or with the availability of user feedback. Each analysis assumes that the
relationship between data warehouses and data marts is quite static: the
data mart is a subset of the data warehouse, or the data warehouse is an
outgrowth of the data marts, or the data warehouse is a parallel
development with the data marts directed by the data warehouse data
model. Eventually, the data mart is superseded by the data warehouse in
offering a final solution to the abstract-of-information problem. In all three
pattern analyses [52], the role of users in the dynamics of the data
warehouse and data mart relationship is not considered.
Development Models Without User Feedback
The three patterns of data mart development without user feedback
are detailed below, and each is illustrated with a diagrammatic
representation. The alternative models then consider the responsibility of
users in providing feedback for the development of data warehouses and
data marts. Finally, an analysis of the effectiveness of the six patterns of
development is presented in light of a particular view of organizational
reality.
i. Top down Model
The data warehouse is developed from the collection of information
through application of the Extraction, Transformation and Loading (ETL)
process [77]. The data warehouse combines all data in a general format
and a familiar software environment. In theory, all of the organization's
data resources are combined in the data warehouse build, so all data
essential for decision support reside in the data warehouse. It only remains
to distribute the data to information consumers and to present it so that it
constitutes information for them.
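The ETL steps named above can be sketched as three small functions. The source systems, field names and cleaning rules below are hypothetical assumptions for illustration, not the process described in [77].

```python
# Minimal ETL sketch: extract from several operational sources,
# transform into one general format, load into the warehouse.

def extract(sources):
    """Pull raw records from every operational source."""
    for source in sources:
        yield from source

def transform(record):
    """Bring a record into one general format; reject incomplete rows."""
    if record.get("customer") is None:
        return None                       # incomplete data is dropped
    return {
        "customer": record["customer"].strip().upper(),
        "amount": float(record.get("amount", 0)),
    }

def load(records, warehouse):
    """Append cleaned records into the central warehouse store."""
    for raw in records:
        row = transform(raw)
        if row is not None:
            warehouse.append(row)

warehouse = []
crm = [{"customer": " alice ", "amount": "10"}]
billing = [{"customer": "bob", "amount": 5}, {"customer": None}]
load(extract([crm, billing]), warehouse)
# warehouse -> [{"customer": "ALICE", "amount": 10.0},
#               {"customer": "BOB", "amount": 5.0}]
```

The single `transform` step is what gives the warehouse its "general format": every source, however it spells or types its fields, is normalised before loading.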
Figure 1.2 Top down Flow from Data Warehouse to Data Mart
Figure 1.2 describes the top-down pattern of building a data mart.
The task of the data mart is to present suitable subsets of the data
warehouse to customers with specific functional needs. In addition, the
structure of the data is tailored to facilitate better information delivery and
to offer an interface to front-end reporting [61]. The analysis tool [29]
provides the precursor information for enterprise intelligence.
ii. Bottom Up Model
In the bottom-up model, data marts are built from pre-existing
abstracts of information, and the data warehouse is built from the data
marts. In this model, the data marts are designed and implemented in
parallel. Development of this type is expected to contain both redundancy
and significant information gaps from an enterprise point of view. Each
data mart attains a combination of abstracts of information in the service of
the data mart's function.
The combination survives only from the narrow viewpoint of the
enterprise function supporting the data mart. From the enterprise point of
view, new legacy systems are generated by such a process, and these
form new abstracts of information. The only advance made is that the new
abstracts use updated technology; they are no more combined or
consistent than the old abstracts and no more capable of supporting
enterprise-wide functions.
Figure 1.3 Bottom up Flow from Data Mart to Data Warehouse
Figure 1.3 describes the bottom-up flow from data mart to data
warehouse. The right-hand side of Figure 1.3 shows the data mart
abstracts of information used as the basis of data warehouse
incorporation. Building data marts with similar activities supports data
integration. The process is required to eliminate the redundancy in the
data marts and to recognize the issues that arise from isolated data mart
creation. The task of ETL [42] is to integrate the old abstracts of
information into the new data warehouse in order to solve these issues.
The opportunity of using older abstracts of information in this way is not
always foreseen. The flow from data marts to data warehouse is sufficient
to generate a data warehouse with complete coverage of enterprise data
requirements.
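The redundancy elimination that this bottom-up integration requires can be sketched as a keyed union of mart records. The marts, record fields and key choice below are illustrative assumptions, not the ETL procedure of [42].

```python
# Illustrative sketch: merging independently built data marts into
# one warehouse while removing cross-mart redundancy.

def integrate_marts(marts, key_fields):
    """Union the records of several data marts, keeping one copy of
    any record that appears in more than one mart."""
    seen, warehouse = set(), []
    for mart in marts:
        for record in mart:
            key = tuple(record.get(f) for f in key_fields)
            if key not in seen:        # eliminate cross-mart duplicates
                seen.add(key)
                warehouse.append(record)
    return warehouse

sales_mart = [{"customer": "C1", "region": "south"},
              {"customer": "C2", "region": "north"}]
support_mart = [{"customer": "C1", "region": "south"},  # duplicate of sales
                {"customer": "C3", "region": "east"}]

dw = integrate_marts([sales_mart, support_mart], ["customer"])
# dw keeps customer C1 once -> three records in total
```

Choosing the key fields is the hard part in practice: too narrow a key merges records that differ, too wide a key leaves the redundancy in place.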
iii. Parallel Model
The most interesting pattern of development, compared to top-down
and bottom-up, is the parallel development model. The parallel model
views the freedom of the data marts as restricted in two ways. First, the
data mart is directed during parallel development by a data warehouse
data model expressing the business point of view. This same data model
is used as the basis for the ongoing development of the data warehouse,
ensuring that the data marts and the data warehouse remain consistent.
Information gaps and redundancies are planned out and catalogued as
data mart development goes forward.
Figure 1.4 Parallel Model in Data Mart Creation
Figure 1.4 describes the parallel model of data mart creation.
Second, the independence of the data marts is treated as a necessary
and provisional measure on the road to the construction of a data
warehouse. Once that ambition is achieved, the warehouse will supersede
the data marts, which will become true subsets of the fully incorporated
warehouse [52]. From that point on, the data warehouse will nourish
established data marts, create subsets for new data marts, and in general
determine the route of data mart creation and development.
The third pattern begins to face some of the difficulties of the
relationship between the data warehouse and data marts. Unlike the first
pattern, it recognizes that organizational departments and divisions require
decision support in the short term. The model keeps data mart
development independent of data warehouse projects to allow for
outgrowth.
Development Models with User Feedback
All three patterns of data mart development without user feedback
fail to properly consider constant user feedback in response to data mart
and data warehouse activities. Data mart development relies mainly on the
character and quantity of feedback from users. Considering user feedback
for the three patterns of development produces three alternative patterns
of data warehouse and data mart development.
i. Top down Model
In the top-down pattern, user feedback before execution of the data
warehouse is based on participation in the system planning, requirements
analysis, system design, prototyping and system acceptance activities of
the software development process. For the reasons stated before, this
participation is expected to leave gaps in the coverage of domains and
attributes that are causal in character.
Figure 1.5 Top down Model with End User Feedback
Figure 1.5 describes the top-down model with end-user feedback.
The top-down model is subject to departmental user feedback, which
delivers new versions of the top-down data warehouse through
departmental data marts. If this uninterrupted pattern of change toward
departmental modification is implemented, a pattern of regular
development of the data warehouses and data marts is generated.
The pattern involves [35] constant feedback from the center.
ii. Bottom up Model
In the bottom-up pattern, in contrast to the outcome of top-down
development, building from data marts ensures much more complete
coverage of the fundamental and side-effect dimensions. This also means
that, once the data warehouse is executed, the bottom-up model [67, 37]
with feedback has only a small initial gap between user data mart needs
and the data warehouse. Paradoxically, this small gap leads to an
enterprise-level decision to shift to the top-down model for long-term
development once the data warehouse is in place.
Figure 1.6 Bottom up Flow from Data Mart to Data Warehouse
Figure 1.6 illustrates the bottom-up flow from data mart to data
warehouse. The transformation of the bottom-up model to long-term
development risks creating a large gap. But if this danger is negotiated
during data mart development, the initially small gap between the data
warehouse and data mart requirements is reduced even further.
iii. Parallel Model
The parallel model provides the most assured structure for the
development of the data warehouse and data marts. Development starts
with a phase of shared adjustment between the enterprise data model [39]
and the data marts. As long as the center remains open to data mart
feedback, data mart development proceeds with departmental
perspectives on causal and side-effect dimensions and attributes [78].
The pace of data warehouse development is comparatively even. The data
mart is directed by the enterprise data warehouse model in a very real
sense, while the enterprise-level model is directed by the individual and
combined input from the data marts. Nevertheless, the enterprise data
warehouse data model is more than an aggregate of the collected data
mart models' functions.
Figure 1.7 Parallel Model in Creation of Data Mart with Feedback
Figure 1.7 describes the parallel model of building a data mart with
feedback. The complexity of implementing the parallel model lies at the
beginning of development. The model assumes completion of the data
warehouse data model before data mart development starts. Therefore, it
needs fast development of the enterprise-level model and also needs the
data marts to wait until this development is complete.
A total enterprise-level data warehouse model is not necessary to
supervise and assess interdepartmental redundancies and to track
information gaps. Additionally, the enterprise-level model is not necessary
to organize data mart back-ends [20, 21, 25, 33] to guarantee eventual
compatibility. On the other hand, data marts may be synchronized by a
central modeling team and encouraged to proceed with development at all
purposeful speed.
1.3.3 Dynamics of Data Mart Development
The three initial patterns of data mart development are idealistic, as
they fail to collect user feedback for building data marts and data
warehouses. By initiating precise consideration of user feedback, the issue
of the centralized data warehouse and decision support system is
minimized. All three patterns of development handle the key decision in
the construction of the data warehouse, but they remain different choices
even if the same long-term policy of shared change of data marts and
data warehouses is followed after data warehouse development.
The top-down pattern needs a period of considerable adjustment to data
mart requirements after the data warehouse is developed. The purpose of
this period is to restrain centripetal forces and to accommodate the
expected development of independent data marts. The bottom-up model
needs an additional period of ETL processing to support development of
the data warehouse from the data marts. The parallel development model
needs fast development of an enterprise-level data warehouse data model.
This demand for speed is offset by the simultaneous development of the
data marts and the data warehouse, along with the organization of the
enterprise team. Finally, the parallel model with feedback is the most
popular, because it offers both co-ordination and autonomy: it
co-ordinates the development of the data mart models while steering the
enterprise-level data model.
1.4 Decision Making in Data Mart
The main idea is to develop a decision support system (DSS) [66] to cope
with the enormous amounts of data available across a distributed project
lifecycle. The distributed project lifecycle is adopted by a business and its
associates to turn the data into precise and valuable knowledge. Data
retrieval then becomes faster and of better quality, supporting sound
decisions. The system provides several types of data access capabilities
to extract and analyze the data contained in a data warehouse, drawing
on on-line transaction processing (OLTP) and legacy systems to provide
the critical information needed by business decision makers.
Figure 1.8 Framework of Decision Support System
A snapshot of the framework, built on the decision support model, is
shown in Figure 1.8. The framework is composed of multi-agents, a local
database model, a model base and a knowledge base.
1.4.1 Multi Agents (MA)
A multi-agent based system [57] facilitates search across the distributed
database while the decision process delivers on-line and dynamic
information. Different agent types are utilized to realize the system
objective. The agents' roles, capabilities, intelligence, autonomy, support,
communication language, protocol [47] and shared ontology are all
considered.
1.4.2 Report Visualization Agent
The report visualization agent integrates the end results of the decision
making process and presents them in a more understandable and natural
way, making them more accessible to users. The report visualization
agent holds pre-planned report templates and a set of visual illustrations
for presenting the outcome of the decision support operation. It executes
a range of visualization techniques [65] and produces the decision report.
1.4.3 ETL (Extraction, Transformation and Load)
ETL agents handle the process of Extraction, Transformation and Load
(ETL). ETL is shared across the data warehouse, legacy systems and
OLTP systems [23, 25]. ETL agents identify the significant data sources,
extract the data and then perform the essential data cleansing and
transformation. The extracted information is stored for later use. The ETL
agents' job is the process by which business data turns into business
information.
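The extract-cleanse-transform-load flow described above can be sketched as follows; this is a minimal illustration, not the thesis's implementation, and the record fields, cleansing rule and normalization step are invented for the example.

```python
# A minimal ETL sketch: extract raw records, cleanse and transform them,
# then load the results into an in-memory "warehouse" table.
# Field names ("customer", "amount") and rules are illustrative assumptions.

def extract(source):
    """Extract: read raw records from a recognized data source."""
    return list(source)

def transform(records):
    """Cleanse and transform: drop incomplete rows, normalize fields."""
    cleaned = []
    for rec in records:
        if rec.get("amount") is None:   # data cleansing: skip bad rows
            continue
        cleaned.append({
            "customer": rec["customer"].strip().title(),  # normalization
            "amount": float(rec["amount"]),
        })
    return cleaned

def load(warehouse, records):
    """Load: append transformed records to the warehouse table."""
    warehouse.extend(records)
    return warehouse

# Usage: raw operational rows become business information.
raw = [{"customer": " alice ", "amount": "120.5"},
       {"customer": "bob", "amount": None}]
warehouse = load([], transform(extract(raw)))
print(warehouse)   # [{'customer': 'Alice', 'amount': 120.5}]
```

Each stage is a separate function, mirroring the separation of duties among ETL agents described above.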
1.4.4 Knowledge Base
The knowledge base process is carried out by the Knowledge Acquisition
Agent. The acquired knowledge is modified according to the range of data
offered by the ETL agents. Different agents employ different knowledge
acquisition algorithms and feedback methods, as they deal with different
user requests. An agent is capable of evaluating knowledge obtained by
another agent before using it in decision making. Furthermore, the agent
is capable of authenticating or highlighting the knowledge.
1.4.5 Decision Making
The Decision Making Agent is accountable for organizing the different
tasks that need to be performed. It translates the user's specified
high-level goals into exact lower-level tasks and creates a plan of action.
The decision making agent employs knowledge about the task domain
and the facilities of other agents. It requests the services of a group of
agents that work co-operatively and combines the final outcome. Thus,
the decision agent is responsible for performing the actual decision
activity and providing the results.
1.5 Inductive Rule Mining
One of the main approaches of data mining is the automatic induction
of classification rules. A range of methods has been proposed and many
comparative studies have been carried out, but the lack of a broadly
available general platform has made it difficult to carry out sufficiently
in-depth studies to establish the advantage of one method over another.
Inducer is a rule induction workbench that intends to provide a general
platform for examining a range of rule induction algorithms [76].
Additionally, Inducer intends to examine rule induction strategies [78]
against a variety of datasets, without the need for any programming.
Figure 1.9 Inductions in Rule Mining
Figure 1.9 describes induction in rule mining. More accurate results are
generated when rule induction is integrated with association rule mining.
Association rules form a very useful data mining approach; they are
derived from frequent item-sets. Data mining [13, 16] is a challenging
issue in every domain of research. The challenge lies in mining the data
with better accuracy and processing time.
The use of rule induction along with an association rule mining algorithm
provides high accuracy and low processing time. The integration of rule
induction and association mining is able to minimize the number of rules
while covering more of the data. In addition, rule induction is able to
minimize the error rate, with fast processing of large datasets, and the
combined use of rule induction and association mining [63] also
minimizes the time complexity [49].
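The derivation of association rules from frequent item-sets can be illustrated concretely; the transactions, support threshold and confidence threshold below are invented for the sketch and the brute-force enumeration stands in for a real algorithm such as Apriori.

```python
from itertools import combinations

# Frequent item-sets are found first; association rules A -> B are then
# derived from them. Transactions and thresholds are invented examples.
transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "bread"}]
min_support, min_conf = 0.5, 0.7

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# 1. Enumerate frequent item-sets (brute force is fine at this scale).
items = set().union(*transactions)
frequent = [frozenset(c)
            for n in range(1, len(items) + 1)
            for c in combinations(items, n)
            if support(set(c)) >= min_support]

# 2. Derive rules A -> B with confidence = support(A ∪ B) / support(A).
rules = []
for fs in frequent:
    for n in range(1, len(fs)):
        for a in combinations(fs, n):
            a = frozenset(a)
            conf = support(fs) / support(a)
            if conf >= min_conf:
                rules.append((set(a), set(fs - a), round(conf, 2)))

for lhs, rhs, conf in rules:
    print(lhs, "->", rhs, "conf", conf)
```

On this toy data the frequent item-sets yield three rules, among them {milk} -> {bread}; pruning such rule sets further is exactly where rule induction complements association mining.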
(Figure 1.9 shows a taxonomy: Induction, comprising Decision List Induction and Rule Induction.)
1.6 Motivation and Goal of the Study
The rising need for enormous volumes of data in enterprise and
corporate environments increases the demand for data warehousing. Data
warehousing gathers data at various levels, such as departmental,
operational and functional, and stacks it in a collective data repository with
better storage efficiency. Several data warehousing models [72] focus on
loading the data more efficiently and quickly.
Additionally, extraction of data from the warehouse requires a good
understanding of the structure in which the data layers are loaded in the
repository. But the functional requirements of users are not clearly
understood by the data warehouse model. Functional requirements
[60, 61, 62] call for an efficient decision support system to extract the
appropriate user-demanded data from the data warehouse.
The difficulty of accessing data in the data warehouse requires a
better solution. The task of extracting exactly the data a user requires is
tedious, because users need a good understanding of the data structures
stored in the repositories. Therefore, to solve this issue, a data mart is
necessary. An efficient technique for the data mart is essential to handle
the analysis of the functional behavior pattern [63, 74] and to provide a
better decision support system.
To handle the user-demanded information, inductive rule mining is
required. Efficient retrieval of the user-demanded data from the data mart
is achieved by establishing inductive rule mining. The purpose of creating
rule induction is to segregate the layered data repositories [73]. The
decision support system is required to extract the user-demanded data
from the data warehouse along with inductive rule mining.
1.7 Organization of the Remainder of the Study
The organization of the remaining chapters is as follows:
Chapter 2 reviews the literature and discusses techniques for mining
user-demanded data from a data mart and the functional behavior pattern
for a data mart based on attribute relativity. Chapter 3 first presents the
process of the data mart in data warehousing and the impact of data
warehouse requirements on functional behavior. Chapter 4 discusses rule
induction for supporting decision making and inductive rule mining.
Chapter 5 describes the performance analysis of extracting
user-demanded data in a functional layered data mart based on induction
rule mining. Chapter 6 presents the conclusion and the scope for further
studies.
2. LITERATURE REVIEW
2.1 Introduction
A data warehouse (DW or DWH) is a database that is used for
reporting and data analysis. The DW is the central repository of data,
formed by integrating data from one or more dissimilar sources. Data
warehouses normally store current and historical data, which are used for
producing trend reports for senior management. The data are passed
through an operational data store for additional operations before they
are loaded into the DW for reporting.
However, users' functional requirements are difficult for the data
warehouse model to understand. A decision support system is needed to
separate the user-demanded data from the data warehouse. Data marts
are introduced to handle the issue of functional decision support, that is,
extracting user-relevant data. Isolated functional data repository layers
are formed by the data mart, depending on the departmental decision
support requirements found in enterprise and corporate data applications.
2.2 Functional Behavior Pattern for Data Mart Based on Attribute
Relativity
In wireless sensor networks, every component stores some data about
the global state of the system. The system's functionalities, such as
routing messages, retrieving information and sharing load, normally
depend on modeling the global state. Global data mining models, e.g.
decision trees and k-means clustering, are very costly to compute
because of the system's scale and high communication cost. Ran Wolff
et al (2009) presented a two-step method to deal with these costs. First, a
data mining model is monitored with the help of a highly efficient local
algorithm. Second, the local algorithm is employed as a feedback loop
that monitors complex functions of the data (such as k-means clustering).
Association rules are large in number, yet they miss some interesting
rules, and the rules' quality needs further analysis. Decision making
normally uses these rules, which can lead to risky actions. A framework
for discovering domain knowledge reported as coherent rules was
developed by Alex Tze Hiang Sim et al (2010). Based on the properties of
propositional logic, coherent rules are derived, and no background
knowledge is needed for coherent rule generation. With the help of the
discovered coherent rules, association rules [17] can be obtained without
knowledge of the required minimum support threshold.
One application of data mining techniques is web usage mining, in
which click-stream data is mined to extract usage patterns. User behavior
is determined by analyzing those patterns, and determining user behavior
is a demanding research issue in web usage mining. Data preprocessing
is an important task for web log mining. Suneetha K.R and R.
Krishnamoorthi (2009) used a snowflake schema [14] for easy retrieval,
through which the usage pattern of the preprocessed data can be
accessed. On-line analytical processing (OLAP) tools were provided by
the data warehouse for interactive examination of multidimensional data
at varied granularities, to offer effective data mining.
2.2.1 Behavior Patterns of Humans
Behavior patterns of humans involve ambiguity, uncertainty,
complexity and inconsistency, which arise from physical, logical and
emotional factors. Sang Wan Lee et al (2012) therefore designed an
unsupervised learning framework for human behavior patterns [35] that
considers their behavioral characteristics. The proposed framework
involves two steps. In the first step, a cluster validity index is proposed
along with Agglomerative Iterative Bayesian Fuzzy Clustering (AIBFC)
[36], which is used to detect meaningful structure in the data. In the
second step, with the help of the structure detected in the first step, the
sequence of actions is learned using the proposed Fuzzy-state Q-learning
(FSQL) [36] process. Some of the users' interactions may affect their
choice of actions, leading to complicated action learning, which remains a
difficult problem.
Experiments on the behavior patterns of cabdrivers were conducted
by another author. Liang Liu (2010) designed a new method to expose
cabdrivers' operation patterns by examining their continuous digital traces
[25]. A set of important features useful for categorizing cabdrivers is
identified. Cabdrivers' operation patterns are defined, and a comparison
is made among diverse cabdrivers' behaviors.
Based on their daily income, cabdrivers are classified as top drivers
and ordinary drivers. The daily operations of cabdrivers are considered in
order to uncover spatial selection behavior, context-aware spatio-temporal
operation behavior, route choice behavior and operation tactics. The focus
is on the analysis of cabdriver operation patterns based on their digital
traces. The method is, however, empirical, and an analytical method is
employed for GPS-like trace analysis.
2.2.2 Temporal Pattern Extraction
Temporal pattern extraction is a major and challenging task in a
multivariate framework. Wenjing Zhang and Xin Feng (2013) therefore
proposed a new method that employs a Multivariate Reconstructed Phase
Space (MRPS) [15, 71] for determining multivariate temporal patterns.
The univariate reconstructed phase space framework, based on a fuzzy
unsupervised clustering method, is extended by a new technique through
a novel mechanism called data categorization, which depends on the
definition of events.
First, the technique of univariate RPS embedding is extended to
Multivariate RPS (MRPS) embedding by combining each variable's
embedding. Second, the statistical distributions of three latent states,
namely the event state, pattern state and normal state, are estimated by
applying a Gaussian Mixture Model (GMM) to the dataset. A new classifier
and an associated objective function are also introduced that integrate
MRPS temporal pattern modeling with Bayesian discriminative scoring for
temporal pattern classification.
Attacks have not spared data warehousing either, and hence Vijaya
Bhaskar Velpula and Dayanandam Gudipudi (2009) focused on methods
for detecting insider attacks and provided a solution to avoid them.
A behavior-anomaly based system [13] was introduced for detecting insider
attacks. Peer-group profiling, composite feature modeling and real-time
statistical data mining are used by the system to avoid attacks. The
real-time monitoring process is updated with the help of refined analytical
models. The performance of the detection approach is described in the
context of the IBM Identity Risk and Investigation Solution (IRIS).
Normally, users demand fresh, attack-free and up-to-date information
in all fields, and so active warehousing has emerged. During warehouse
transformations, processing and disk overheads arise from the on-line
refreshment of source updates. At the root of many common
transformations in an active warehouse (e.g., surrogate key assignment,
duplicate detection, etc.) is the join operation, which is used frequently.
The join operator performs a join between an update stream and a
persistent disk relation [11] with limited memory, a setting considered by
Abhirup Chakraborty and Ajit Singh (2009). The authors also developed a
partition-based join algorithm to reduce the processing overhead, disk
overhead and delay in producing resultant tuples.
2.2.3 Real Time Warehouse
Compared to active warehousing, the real-time warehouse also faces
many issues, and steps must be taken to solve them. Write-read
contention is a great problem in the deployment of real-time data
warehouses. They need to control continuous flows of updates and
queries, and must satisfy conflicting requirements such as short response
times and high data quality.
If both criteria are considered, a multi-objective optimization problem
arises. The problem is transformed into a knapsack problem with
additional inequalities. Hence, Maik Thiele et al (2009) developed a new
model for combining both objectives with user-provided preferences [19].
First, the objectives are lifted to a more theoretical level, and separate
maximization and minimization problems are formulated. Based on these,
a multi-objective scheduling algorithm is developed which provides the
optimal schedule with respect to the user requirements.
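The reduction of the trade-off to a knapsack problem can be illustrated with a small sketch: choosing which update jobs to apply within a time budget so that the data quality gain is maximized. The job names, gains and costs below are invented numbers; this is not Thiele et al's actual formulation.

```python
# Sketch: selecting update jobs within a time budget to maximize quality
# gain -- a 0/1 knapsack solved by dynamic programming. Job names, gains
# and costs are invented for illustration.
def knapsack(jobs, budget):
    """jobs: list of (name, gain, cost); budget: total time available."""
    # dp[c] = (best gain achievable with total cost <= c, chosen names)
    dp = [(0, [])] * (budget + 1)
    for name, gain, cost in jobs:
        # iterate costs downward so each job is used at most once
        for c in range(budget, cost - 1, -1):
            cand_gain = dp[c - cost][0] + gain
            if cand_gain > dp[c][0]:
                dp[c] = (cand_gain, dp[c - cost][1] + [name])
    return dp[budget]

jobs = [("orders", 6, 3), ("clicks", 5, 2), ("inventory", 2, 2)]
gain, chosen = knapsack([("orders", 6, 3), ("clicks", 5, 2),
                         ("inventory", 4, 2)], budget=4)
print(gain, chosen)   # best quality gain within the time budget
```

Here the optimum skips the single most valuable job in favor of two cheaper ones, which is exactly the kind of trade-off the scheduling model must resolve.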
A data warehouse processes a given set of queries using multiple
materialized views. Materializing all views is impractical because of space
and maintenance cost constraints. The selection of materialized views is
the basic problem in developing a data warehouse with optimal efficiency.
B. Ashadevi and R. Balasubramanian (2009) designed a framework
for selecting the views to materialize [21] under given storage space
constraints. The aim of the framework is to attain an optimal combination
of good query response, low query processing cost and low view
maintenance cost. Using the proposed framework, the maintenance,
storage and query processing costs are optimized by selecting the most
cost-effective views for materialization.
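A common way to approach view selection under a storage constraint is a greedy benefit-per-unit-space heuristic; the sketch below uses invented view names, sizes, benefits and maintenance costs, and is a simplification rather than the framework proposed in [21].

```python
# Greedy materialized-view selection under a storage budget: repeatedly
# pick the view with the best (query benefit - maintenance cost) per unit
# of space. All views, sizes, benefits and costs are invented.
def select_views(views, space_budget):
    """views: list of (name, size, query_benefit, maintenance_cost)."""
    chosen, used = [], 0
    ranked = sorted(
        views,
        key=lambda v: (v[2] - v[3]) / v[1],   # net benefit per unit space
        reverse=True,
    )
    for name, size, benefit, maint in ranked:
        # keep only views with positive net benefit that still fit
        if benefit - maint > 0 and used + size <= space_budget:
            chosen.append(name)
            used += size
    return chosen, used

views = [("sales_by_region", 40, 100, 10),
         ("sales_by_day", 70, 90, 30),
         ("returns_by_sku", 20, 15, 20)]   # negative net benefit: skipped
chosen, used = select_views(views, space_budget=120)
print(chosen, used)
```

The greedy rule is not optimal in general, but it captures the cost/benefit balancing that view-selection frameworks formalize.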
Real-time warehousing can be used to provide the fresher data
demanded by today's businesses, but there are numerous challenges to
achieving it in a true real-time environment. Hence, Janis Zuters (2011)
developed the 'Multi-stage Trickle & Flip' method [20] for data warehouse
refreshment.
'Multi-stage Trickle & Flip' is a data warehouse refreshing method
designed to reduce the competition between the loading and querying
processes during real-time operation. It builds on the 'Trickle & Flip'
principle, and further extension of the loading and querying activities is
needed to make them both more capable. The method does not cover all
areas of ETL (Extract, Transform and Load) and querying, and effort is
needed to attain the maximum benefits, especially for the querying
mechanism.
Advanced steps are involved in solving the problems faced by real-time
data warehouses with the help of ETL. Data warehouses (DWs) are
normally loaded with data at regular time intervals using quick bulk
loading methods. Nowadays, all or some of the new source data are
loaded very quickly into DWs because of near-real-time DWs (right-time
DWs).
Such loading can be performed with regular INSERT statements, but
that results in very low insert speeds. Hence, Christian Thomsen et al
(2008) proposed RiTE ('Right-Time ETL'), a middleware system that
provides a solution by making inserted data available quickly while
retaining bulk-load insert speeds [22]. A main-memory based catalyst is
provided by RiTE, offering fast storage and concurrency control. The
system includes an open source DBMS and supports both producers and
consumers.
2.2.4 Real Time Environments ETL
ETL is normally performed in real-time environments, but the following
authors show that it can also be performed over queue networks. Usually,
data warehouse refreshment is performed in an off-line fashion. Active
data warehousing is an emerging trend in which data warehouses are
updated often because of users' high interest in new data. Alexandros
Karakasidis et al (2005) developed a framework for the execution of
active data warehousing [26]. The framework has several goals: requiring
only slight modifications to the sources' software configuration, imposing
minimal overhead due to the active nature of data propagation, and
allowing the overall configuration of the environment to be modified
easily.
The ETL (Extract, Transform and Load) activities are performed over
queue networks. The performance and tuning of the overall refreshment
process is predicted by applying queueing theory. The downside of the
work is that failures must be avoided by employing safeguarding
techniques and fast resumption algorithms.
Data feeds must be loaded into real-time data warehouses, and
hence Mohammad Hossein Bateni et al (2009) studied scheduling
algorithms [28] for this task. They arise in applications such as IP network
monitoring, on-line financial trading and credit card fraud detection. In
those applications, a large number of data feeds delivered by external
sources are collected by the warehouse.
Data are generated at a constant rate for each table, but at different
rates for different tables. During each data feed, an update is generated
by the arriving new data and then added to the respective table.
A certain percentage of data warehouses fail to meet business
objectives, as mentioned in some surveys. The main reason is that
requirement analysis is usually neglected in real projects. Paolo Giorgini
et al (2008) designed GRAnD to perform requirement analysis for data
warehouses based on the Tropos method [27]. Requirement analysis in
GRAnD involves two distinct aspects, namely organizational modeling,
based on stakeholders, and decisional modeling, based on decision
makers. Decisional modeling deals directly with the information needed
by decision makers. Organizational modeling enables the identification of
facts and supports the supply-driven part of the approach. The work can
be employed within both a demand-driven and a mixed
supply/demand-driven design framework.
Studies have also been made of stream warehouses to provide
efficient outcomes. Queries facilitated by the stream warehouse range
from real-time alerting and diagnostics to long-term data mining.
Continuous data loading from diverse and uncontrolled sources into a
real-time stream warehouse raises many problems; for example, users
need results in a timely manner, but a stable result often requires lengthy
synchronization delays.
So, P. Urmila et al (2012) proposed for stream warehouses a theory
of temporal consistency which permits multiple consistency levels [27].
The update problem of a streaming warehouse is expressed as a
scheduling problem in which jobs are the processes that load new data
into tables, and the aim is to reduce data staleness over time. As soon as
the new data are loaded, applications and triggers defined on the
warehouse take immediate action.
As said earlier, the streaming data warehouse update problem is
expressed as a scheduling problem in which jobs are the processes that
load new data into tables, the aim being to reduce data staleness over
time. Mohan Raj A. and M. N. Sushmitha developed a scheduling
framework to handle the difficulties faced by a stream warehouse, such
as view hierarchies and priorities, heterogeneity of update jobs [24] due
to diverse inter-arrival times, data volumes from diverse sources, and
transient overload.
In streaming data warehouses, update scheduling is developed by
merging the characteristics of traditional data warehouses and data
stream systems. The framework reserves resources for short jobs, which
normally correspond to regularly refreshed tables, and avoids the
inefficiencies associated with partitioned scheduling methods. The only
downside of the proposed approach is that many new data arrive from
multiple streams, yet no steps are taken to limit the number of tables
updated concurrently.
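The scheduling view of the update problem can be sketched with a simple priority rule: always run the update job for the table whose priority-weighted staleness is currently largest. The table names, durations and priorities below are invented, and this is a toy heuristic, not the authors' scheduling framework.

```python
# Sketch: a single-track scheduler that repeatedly runs the update job
# whose table has the largest priority-weighted staleness. Tables,
# durations and priorities are invented for illustration.
def schedule(jobs):
    """jobs: dict table -> (duration, priority). Returns the run order."""
    now = 0.0
    last_refresh = {t: 0.0 for t in jobs}
    order = []
    for _ in range(len(jobs)):
        # pick the pending table with the largest weighted staleness
        table = max(
            (t for t in jobs if t not in order),
            key=lambda t: (now - last_refresh[t]) * jobs[t][1],
        )
        duration, _priority = jobs[table]
        now += duration               # the job occupies the track
        last_refresh[table] = now     # table is fresh once the job ends
        order.append(table)
    return order

jobs = {"orders": (3.0, 5), "clicks": (1.0, 1), "inventory": (2.0, 2)}
print(schedule(jobs))
```

After the first job runs, the higher-priority "inventory" table has accumulated more weighted staleness than "clicks" and is refreshed next, illustrating how staleness-driven scheduling reorders updates.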
E. Malinowski and E. Zimanyi (2008) offer a temporal extension of
the MultiDim model [8]; the extension builds on research in temporal
databases. Diverse temporality types are permitted: valid time,
transaction time and lifespan. These are obtained from source systems,
while loading time is recorded in the data warehouse. Temporal support
is provided for levels, attributes, hierarchies and measures by the
proposed model.
The authors took steps to solve the problems faced by the previous
method, using a TDW. L. Leonardi et al (2009) developed a method to
store and aggregate spatio-temporal patterns with the help of a Trajectory
Data Warehouse (TDW). Frequent patterns, mined from the trajectories of
moving objects that appear in a specific spatial zone at a given temporal
interval, are quickly estimated. The TDW is based on a data cube model
with spatial and temporal dimensions. Such a TDW can be enhanced with
a new measure: a data mining process is carried out on the trajectories,
and frequent patterns are obtained from them. Those patterns are
examined by the user with OLAP queries at different levels of granularity.
In the data warehouse literature, most of the attention is on design,
and little attention is paid to data warehouse testing. Matteo Golfarelli et
al (2009) proposed data-mart-specific testing activities, with an analysis
[7] of what is to be tested and how. A complete approach is developed to
adapt and extend existing testing methodologies, designed particularly for
data warehouse projects. The proposed method is built on a set of tips
and suggestions gathered from real projects, as well as from data
warehouse practitioners. Consequently, a set of appropriate testing
activities is identified, classified and framed within a reference design
methodology.
V. Mallikarjuna Reddy et al (2010) proposed a method to adapt data
warehouse schemas correctly, with new data loaded under the guidance
of a GUI-based Extraction, Transformation and Loading (ETL) procedure
[12] for an active data warehouse.
This idea is achieved using techniques such as table structure
replication and query predicate restrictions to select data. The GUI-based
ETL was mainly presented for continuous data loading in the active data
warehouse, allowing continuous data integration while keeping query
execution time low at the user end of the data warehouse.
But the previous method did not mention anything about DSS. Many
decision support systems (DSS) have been developed in numerous areas
such as medicine, business, agriculture and marketing. Two methods are
proposed by Arvind Selwal for designing a DSS. First, a general method
for designing a DSS [9] is proposed. A large data warehouse (DWH)
involves data mining methods that gather useful information in order to
improve the decision-making process. Second, a decision support system
for Village Economy Development Planning (VEDP-DSS) is developed. It
is used by the District Development Planning Officer (DDPO), the Block
Development Officer (BDO) and the Village Sarpanch for decision making
at the applicable level. VEDP-DSS can be employed for decision making
regarding village development.
An analysis framework is provided by data warehouses and On-Line
Analytical Processing (OLAP) to support the decision-making process. In
numerous application domains, complex analysis tasks involve
geographical information. Many applications exist for the integration of
OLAP and Geographic Information Systems (GIS). However, little effort
has been made to support continuous fields, i.e., phenomena like
temperature, altitude or land use, which are regarded as holding a value
at every point in space and/or time. Alejandro Vaisman and Esteban
Zimányi (2009) extended a conceptual multidimensional model with
continuous fields [10]. This is attained by describing an appropriate data
type that offers diverse operations for manipulating such fields. A query
language based on relational calculus is also defined; it permits
expressing spatial OLAP queries that include continuous fields and is
used to formally define this class of queries.
Supporting decision-making is really difficult, as it depends on the
availability of integrated and organized high-quality information. This
problem is addressed by data warehouses. The data warehouse serves
as an integrated repository for the internal and external data intelligence
that is critical to understand. Business intelligence supports efficient
problem and opportunity identification, critical decision-making, etc.
The analysis used four themes, namely integration, implementation,
intelligence and innovation. Salvatore T. March and Alan R. Hevner
(2007) gave a clear view of applying data warehousing technology [16] to
support management decision-making. A review is made of the
information requirements for decision support through a general systems
theory of management.
Until then, authors had considered only single-dimensional
databases, and so DaWaII (Data Warehouse Integration) came into
existence, a tool that supports the activities involved in the integration of
multidimensional databases. Combining separately developed data
warehouses is basically a hard problem. The tool supports testing the
validity of a matching between heterogeneous dimensions based on a
number of desirable properties.
Actual integration is performed through two diverse approaches
offered by the tool, which was created by Luca Cabibbo et al (2006). The
first approach is a scenario of loosely coupled integration [5]: common
information between the sources is identified, and drill-across queries are
then performed against the original sources. In the second, the sources
are merged, and a materialized view is derived from them; this is referred
to as a scenario of tightly coupled integration, in which the view is
targeted by the queries.
Web users' navigation and interaction patterns are quite complex,
particularly in interactive applications that support user sessions and
profiles. Michel C. Desmarais (2006) presented such a case for an
interactive virtual garment dressing room. The application supports
personalization, user profiles and a view of a multi-site user session
spread over numerous web sites. It is also supported by a data logging
system that produces about 5 GB of compound data per month.
Juan Manuel Pérez et al (2008) reviewed the merging of Data
Warehouse (DW) and Web data. XML technologies used to integrate,
store, query and retrieve Web data [2], and their application to DWs, are
studied. Various distributed DW architectures are studied, with XML
languages used as an integration tool in the systems. The study also
covers Web data repositories, multidimensional database design for XML
data sources, and the XML extensions of On-Line Analytical Processing
mechanisms.
The work also identified the main limitations of, and opportunities for,
combining the DW and Web fields. The main problem is that domain
ontologies cannot help DWs to interoperate in a large-scale scenario or
with other information-provider applications. The survey deals only with
data-centric XML, not with document-centric XML collections.
Active Data Warehousing is used as an alternative to traditional warehousing for applications that need up-to-date information. An active warehouse, refreshed on-line, achieves higher consistency between the data already stored and the latest source updates. Implementing data warehouse transformations faces numerous challenges because of the need for on-line refreshment, which depends on their execution time and the overhead they impose on warehouse processes. N. Polyzotis et al (2008) focus on a frequently encountered operation: joining a fast stream of source updates with a disk-based relation [1] under limited memory. This operation arises in several common transformations such as surrogate key assignment, duplicate detection and identification of newly inserted tuples. A join algorithm named meshjoin (MESHJOIN) is designed to amortize the access cost of the two join inputs. The algorithm is detailed, and a systematic cost model is developed to tune MESHJOIN toward two goals: maximizing throughput for a given memory budget, or minimizing memory consumption for a given throughput.
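The core MESHJOIN idea, keeping each stream tuple in memory for exactly one full cyclic scan of the relation's pages so that it meets every page once, can be sketched as follows. This is a simplified illustration rather than the published algorithm: the admission rate of one stream tuple per probed page and the list-based buffer are assumptions made for brevity.

```python
def mesh_join(stream, relation, pages=2, key=lambda t: t[0]):
    """Toy MESHJOIN sketch: the relation is split into `pages` chunks
    scanned cyclically; each stream tuple stays buffered for one full
    cycle, so it is probed against every page exactly once."""
    size = -(-len(relation) // pages)                  # ceiling division
    rel_pages = [relation[i:i + size] for i in range(0, len(relation), size)]
    n = len(rel_pages)
    stream = iter(stream)
    buffer, out, done, page_idx = [], [], False, 0
    while not done or buffer:
        if not done:                                   # admit one stream tuple
            try:
                buffer.append([next(stream), n])       # [tuple, probes left]
            except StopIteration:
                done = True
        index = {}                                     # hash the current page
        for r in rel_pages[page_idx]:
            index.setdefault(key(r), []).append(r)
        for entry in buffer:                           # probe buffered tuples
            for r in index.get(key(entry[0]), []):
                out.append((entry[0], r))
            entry[1] -= 1
        buffer = [e for e in buffer if e[1] > 0]       # expire after a cycle
        page_idx = (page_idx + 1) % n
    return out
```

Admitting more tuples per page probe would trade memory for throughput, which is exactly the trade-off the MESHJOIN cost model optimizes.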
2.3 Mining User Demanded Data from Data Mart Using Inductive Rule
Mining
OLAP (On-Line Analytical Processing) systems extended with spatial and temporal features have drawn the attention of the GIS (Geographic Information Systems) and database communities. However, there is no accepted definition of a spatio-temporal data warehouse, nor of the functionality such a data warehouse must support. Earlier solutions vary depending on the kind of data to be represented as well as the kind of queries to be expressed. Alejandro Vaisman and Esteban Zimanyi (2008) proposed a conceptual framework that defines spatio-temporal data warehouses [50] with the help of an extensible data type system. They also defined a taxonomy of query classes with increasing expressive power and demonstrated how to express such queries using an extension of the tuple relational calculus with aggregate functions.
Chin-Ang Wu et al (2011) set out to improve the overall data warehouse mining process and developed an intelligent data warehouse mining method. The method includes a schema ontology [49], a schema constraint ontology, a domain ontology and a user preference ontology. Rule mining is employed over the structures of those ontologies, and demonstrations of the rule mining show the benefits of the mining process. A prototype multidimensional association mining system, supported by the ontologies, is also proposed to show how users are helped to build data mining models. It is employed to prevent the formation of ineffective patterns, identify concept-extended rules, and implement an active knowledge rediscovery method.
Jin Soung Yoo and Shashi Shekhar (2009) provide a flexible model that denotes interesting temporal patterns with the help of a user-defined reference sequence. To mine similarity-profiled temporal association patterns [52] efficiently, they proposed an algorithm that exploits properties such as the support time sequence and a lower-bounding distance. But such association mining alone is not very effective, so Jerzy Błaszczyński et al (2010) developed a general rule induction algorithm [40] based on sequential covering that is applicable to variable consistency rough set methods. The algorithm, VC-DomLEM, is employed for both ordered and non-ordered data.
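The sequential covering scheme underlying such rule induction algorithms can be illustrated with a crisp, greatly simplified sketch: grow one conjunctive rule at a time by greedily adding the condition with the best precision, then remove the positive examples the rule covers and repeat. The greedy precision heuristic and the attribute-value representation are illustrative assumptions; VC-DomLEM itself additionally handles variable consistency and dominance relations, which this sketch does not.

```python
def sequential_covering(examples, target):
    """Toy sequential covering: learn conjunctive rules (lists of
    (attribute, value) conditions) for the class `target`, one rule
    per outer iteration, removing covered positives each time."""
    def covers(rule, ex):
        return all(ex.get(a) == v for a, v in rule)

    positives = [e for e in examples if e["class"] == target]
    rules = []
    while positives:
        rule, covered = [], examples
        # Grow: keep adding the condition with the best precision
        # until the rule no longer covers any negative example.
        while any(e["class"] != target for e in covered):
            best = None
            for ex in positives:
                for a, v in ex.items():
                    if a == "class" or (a, v) in rule:
                        continue
                    cov = [e for e in covered if e.get(a) == v]
                    if not cov:
                        continue
                    prec = sum(e["class"] == target for e in cov) / len(cov)
                    if best is None or prec > best[0]:
                        best = (prec, (a, v), cov)
            if best is None:
                break
            rule.append(best[1])
            covered = best[2]
        rules.append(rule)
        positives = [e for e in positives if not covers(rule, e)]
    return rules
```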
Rough set selection also plays a major role in data warehousing. Salvatore Greco et al (2008) presented a generalization of the original definition of rough sets and established variable precision rough sets [44]. The generalization rests on the notions of absolute and relative rough membership. The variable precision rough set model is a parameterized generalization of the rough set model: its aim is to model data relationships in the form of a frequency distribution rather than the full inclusion relation used in the classical definition of rough sets. Thus, in the variable precision rough set model, one or more parameters model the degree to which the condition attribute values confirm the decision attribute value.
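In concrete terms, the precision parameter replaces full inclusion with a rough membership threshold. A minimal sketch follows (crisp equivalence classes; the treatment of the upper approximation as "membership ratio greater than zero" is a simplification of the parameterized model):

```python
def vprs_approximations(universe, attrs, target, beta=0.8):
    """Variable-precision rough set sketch: an equivalence class (same
    values on `attrs`) joins the lower approximation of `target` when
    the fraction of its members inside `target` is at least `beta`,
    and the upper approximation when that fraction is positive."""
    classes = {}
    for obj, desc in universe.items():
        classes.setdefault(tuple(desc[a] for a in attrs), set()).add(obj)
    lower, upper = set(), set()
    for members in classes.values():
        ratio = len(members & target) / len(members)  # rough membership
        if ratio >= beta:
            lower |= members
        if ratio > 0:
            upper |= members
    return lower, upper
```

With beta = 1.0 this degenerates to the classical lower approximation (full inclusion); lowering beta admits classes that are only mostly consistent with the target.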
While the previous work addressed the variable precision rough set model, Jerzy Błaszczyński et al (2009) focused on probabilistic rough set methods [40] that rely on different versions of the rough approximation of a set. In those versions, consistency measures control the assignment of objects to the lower and upper approximations. Some basic properties of rough sets are attractive and require monotonicity. Three types of monotonicity properties are considered: monotonicity with respect to the set of attributes, to the set of objects, and to the dominance relation. Since the consistency measures lack some of these monotonicity properties, new measures have been developed within two types of rough set methods: Variable Consistency Indiscernibility-based Rough Set Approaches (VC-IRSA) and Variable Consistency Dominance-based Rough Set Approaches (VC-DRSA).
Among the many rough set models, the Fuzzy-Rough Set plays a major role. The Fuzzy-Rough Set (FRS) method handles discernibility and fuzziness well. Some researchers have studied the rough approximation of fuzzy sets, while others have studied only attribute reduction or feature selection, which is one application of FRS; less work addresses classifier construction, another application of FRS. Suyun Zhao et al (2010) built a rule-based classifier with the help of a generalized FRS model [37]. The existing FRS is first made robust against misclassification and perturbation by integrating a controlled threshold into the FRS knowledge representation. The "consistence degree" is defined as a critical value and is used to discard redundant attribute values in the database. The consistence degree concept is used to form a discernibility vector, which supports the rule induction algorithms, and the induced rule set can then function as a classifier.
2.3.1 Attribute Reduction
Attribute reduction is an important research topic in fuzzy rough sets. Reducts are computed using the discernibility matrix approach, and it turns out that only the minimal elements of a discernibility matrix are sufficient and necessary for the computation. Considering this fact, Degang Chen et al (2011) proposed a new algorithm that finds reducts based on the minimal elements of the discernibility matrix. The authors specify relative discernibility relations on the conditional attributes, which the fuzzy discernibility matrix uses to characterize the minimal elements. Within the fuzzy rough set framework, two algorithms are developed to compute minimal elements and reducts. The downside of the proposed model is that it applies only to fuzzy rough sets and not to other rough set models; hence, steps should be taken to extend it to other rough sets too.
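The role of minimal elements can be seen even in the crisp setting. The sketch below builds a classical discernibility matrix, keeps only its minimal entries, and searches for a smallest attribute subset hitting all of them; Chen et al.'s contribution is the fuzzy counterpart of this idea, which this crisp illustration does not capture.

```python
from itertools import combinations

def reduct_from_discernibility(objects, attrs, decision):
    """Crisp discernibility-matrix reduction: for every pair of
    objects with different decisions, record the attributes that
    discern them; a reduct is a minimal attribute set intersecting
    every (minimal) matrix entry."""
    entries = []
    for x, y in combinations(objects, 2):
        if x[decision] == y[decision]:
            continue
        diff = frozenset(a for a in attrs if x[a] != y[a])
        if diff:
            entries.append(diff)
    # Keep only minimal entries: supersets are implied, so checking
    # the minimal elements suffices (Chen et al.'s key observation).
    minimal = [e for e in entries if not any(o < e for o in entries)]
    # Smallest attribute subset hitting every minimal entry.
    for k in range(1, len(attrs) + 1):
        for cand in combinations(attrs, k):
            if all(set(cand) & e for e in minimal):
                return set(cand)
    return set(attrs)
```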
Answering queries efficiently is a challenging issue. In uncertain databases, the probabilistic threshold query is common: a result must satisfy the query and also meet the threshold requirement. Jianxin Li et al (2013) examined probabilistic threshold keyword queries (PrTKQ) over probabilistic XML data [45]. The idea of quasi-SLCA is presented first; it is used to produce the results of a PrTKQ under possible-world semantics. A probabilistic inverted (PI) index is also designed to quickly return the qualified answers, with unqualified answers pruned using planned lower/upper bounds. Two efficient and equivalent algorithms are presented: a Baseline Algorithm and a PI index-based Algorithm. To speed up the algorithms, a probability density function is employed.
The user can also specify partial order constraints among diverse categories; for example, the user may require visiting the museum before the restaurant. Previous works focused on the total order query. Jing Li et al (2013) noted that the ideas in existing papers involve repeated computations and do not scale to large data sets, so a new solution is provided for the general optimal route query [43]. The solution relies on two methods, backward search and forward search, and answering other kinds of optimal route queries is also discussed. The route in the optimal route query covers only a subset of the provided categories.
Theoni Pitoura et al (2012) proposed Saturn for large-scale data networks. Saturn is an overlay architecture for processing range queries [44] that ensures load balance and fault tolerance over Distributed Hash Tables (DHTs). Storing consecutive data values in neighboring peers of a DHT is highly advantageous because it accelerates range query processing, but such a placement leads to load imbalances. Saturn handles these issues (range queries, load balancing, fault tolerance) through a novel multiple-ring, order-preserving architecture.
Man Lung Yiu et al (2012) considered a cloud computing setting in which the parallel querying of metric data [65] is outsourced to a service provider. The data may be exposed only to trusted users, not to the service provider or anyone else. The advantage of outsourcing is that it gives the data owner scalability with a minimal initial investment. Privacy must also be ensured because the data may be sensitive, valuable or otherwise confidential. Methods are provided to transform the data before handing it to the service provider, so that similarity queries can be answered on the transformed data, and interesting trade-offs between query cost and accuracy are presented.
Shaoxu Song et al (2011) studied computation caching and sharing within a sequence of inference queries in databases. For probabilistic inference queries, the clique tree propagation (CTP) algorithm was adopted, with materialized views employed to store the intermediate results of previous inference queries. These results can be shared with subsequent queries, which minimizes the time cost. Frequently queried variables are detected from the query workload, and, since different query plans exist, heuristics are provided to estimate their costs and select the optimal plan.
Victoria Nebot et al (2009) designed a Semantic Data Warehouse (SDW) that acts as a repository of ontologies and semantically annotated data resources. An ontology-driven framework [18] is developed to plan multidimensional analysis models for Semantic Data Warehouses. The framework builds an integrated ontology named the Multidimensional Integrated Ontology (MIO), which incorporates the classes, relationships and instances representing analysis dimensions and measures. The main downsides of the implementation are that the data become outdated as the sources are updated, and that the extraction and validation process is time-consuming. The main drawback of the SDW is that its queries are too slow.
Business programs increasingly rely on data warehouses and data mining techniques. In particular, Radio Frequency Identification (RFID) applications employ these techniques and have revolutionized business programs. RFID is involved in many applications such as manufacturing, logistics distribution and the stages of supply chains. The noise and duplicates produced in RFID data confirm the need for a data warehousing system. Barjesh Kochar and Rajender Chhillar (2012) designed a new data cleaning, transformation and loading method that makes the data warehousing system more effective for any RFID application [51]. One significant RFID application is recording the goods in warehouses using RFID tags and readers.
Spatio-temporal data warehousing is a distinct and interesting approach. Mobility data applications are flourishing in many application domains, and researchers have developed concepts, models, theories and tools to capture mobility data and make it convenient for those applications. One particular characteristic complicates mobility data management: for subsequent analysis, mobility data should be stored in a data warehouse. Esteban Zimanyi (2012) proposed the idea of spatio-temporal data warehouses and demonstrated how to define such a warehouse with the help of an extensible set of data types.
Danubianu et al (2009) studied the needs and opportunities involved in implementing a data warehouse in the tourism field, focusing on three dimensions of tourism: economic, social and cultural. A short analysis of the tourism area's information system is carried out, but the technologies involved in the diverse components of the sector are not efficient.
A data warehouse stores a great deal of information, and metadata is necessary to understand it. Metadata increases the level of adoption and the use of the warehouse's data by knowledge workers and decision makers. Data warehouse implementation is more effective when a metadata model is used; without one, the quality of the data warehouse decreases.
2.3.2 Consistent Metadata and Data Warehouse
A warehouse implementation also rests on consistent metadata, which results in a successful warehouse. Nayem Rahman et al (2012) used an ETL (extract, transform, load) based metadata model [42] for the data warehouse. The ETL metadata model provides better subject-area refreshment, especially metadata-driven loads of observation timestamps, and decreases the utilization of database system resources. It also supplies a set of ETL development tools to the developer and hands over a user-friendly batch cycle refresh monitoring tool to the production support team.
Anjana Gosain and Suman Mann (2010) presented a paper based on an object-oriented multidimensional data model. The model describes the data and incorporates aggregation, generalization, multiple path hierarchies, multiplicity [41], etc. Seven operators, important for querying and formatting results, are also presented for the model: intersection, difference, symmetric difference, restriction, union, join and projection. The operator set is minimal: no operator can be expressed in terms of the others, and none can be discarded without sacrificing functionality.
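Treating a cube as a set of cells (tuples of dimension-value pairs), four of the seven operators are plain set operations (`|`, `&`, `-`, `^` for union, intersection, difference and symmetric difference), while restriction, projection and join need a little code. The cell representation below is an illustrative assumption, not the model's formal definition:

```python
def restriction(cube, pred):
    """Restriction (selection): keep cells satisfying a predicate."""
    return {cell for cell in cube if pred(dict(cell))}

def projection(cube, dims):
    """Projection: keep only the named dimensions of each cell."""
    return {tuple((d, v) for d, v in cell if d in dims) for cell in cube}

def join(cube1, cube2, dims):
    """Join: pair cells that agree on the shared dimensions `dims`."""
    def key(cell):
        return tuple(v for d, v in cell if d in dims)
    index = {}
    for c in cube2:
        index.setdefault(key(c), []).append(c)
    return {(c1, c2) for c1 in cube1 for c2 in index.get(key(c1), [])}
```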
Studying the information provided by medical images is a tedious problem, which motivated the annotation of medical images. The diverse annotations obtained from diverse sources are to be modeled with the help of data warehouses, but classical conceptual modeling does not consider the specificity of the annotations.
Mouhamed Gaith Ayadi et al (2013) proposed a conceptual modeling of the data warehouse with the help of the StarUML extensibility mechanism [39]. StarUML is an open source platform that uses the XML language to generate UML profiles. Victoria Nebot and Rafael Berlanga (2011) provided an efficient analysis and exploration of large amounts of semantic data. It combines the inference power of annotation semantics with the analysis facilities offered by OLAP-style aggregations, navigation and reporting. The authors explain how the semantic data must be organized into a well-defined conceptual Multi-Dimensional (MD) schema, so that complicated queries can be declared and computed more effectively. A drawback of the method is that the local approach must be extended to develop the dimension hierarchies, so that the dimension values can be arranged into a defined number of levels or categories. The proposed method also has a performance issue.
DaWaII (Data Warehouse Integration) is a tool that supports the activities involved in integrating multidimensional databases. Combining separately developed data warehouses is fundamentally a hard problem. The tool, developed by Luca Cabibbo et al (2006), supports testing the validity of a matching between heterogeneous dimensions with respect to a number of desirable properties, and offers two approaches for the actual integration. The first is a scenario of loosely coupled integration [5]: common information between the sources is identified, and drill-across queries are then performed against the original sources. The second is a scenario of tightly coupled integration: the sources are merged to derive a materialized view, which the queries then target.
Nittaya Kerdprasop et al (2008) upgraded LCMS functionality to represent the accessible content with an induction capability. The induction method is based on rough set theory. The induced rules are proposed as supportive knowledge to guide content flow planning [29]; they are also employed as decision rules that help content developers manage the content delivered to each individual learner.
Y. Y. Yao summarized the different formulations of standard rough set theory and showed how they extend to different generalized rough set theories [30]. The theory is formulated using constructive and algebraic methods, and connections between them are established. In the constructive method, three definitions of the approximation operators are examined: element based, granule based and subsystem based. The algebraic method is more appropriate for generalizing the theory in a unified manner. The drawback is that further generalizations of the theory, such as probabilistic and decision-theoretic rough sets, are not covered by the proposed method.
As noted earlier, Michel C. Desmarais (2006) presented the case of an interactive virtual garment dressing room: an application supporting personalization, user profiles and a multi-site user session view spread over numerous web sites, backed by a data logging system producing about 5 GB of compound data per month. Analyzing those logs requires more sophisticated processing than a relational language normally provides, and procedural languages and a DBMS also prove difficult to use. The complex log data is therefore analyzed with a stream processing architecture and a specialized language comprising a grammatical parser and a logic programming module.
Useful information can be derived from server logs (the user's history) using web usage mining, through which users' expectations from the Internet can be identified. Web sites collect web access data for this kind of mining. The web usage data details the paths that lead to the accessed web pages (with the help of preferences and higher priorities), and this information is recorded automatically in the access logs of the web server. Poongothai, K. and S. Sathiyabama (2012) developed an induction-based decision rule model designed to generate inferences and reveal implied hidden behavioral aspects in web usage mining, examined over web server and client logs. A fast decision rule induction algorithm is merged, via decision-based rule induction mining, with a technique that converts a decision tree into a simplified rule set.
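The basic step of web usage mining, turning raw access-log entries into per-user page paths and frequent page transitions, can be sketched as below. The two-field log format is a simplifying assumption (real server logs carry timestamps, IPs and status codes), and this sketch stops short of the rule induction stage the authors build on top of such paths:

```python
from collections import Counter

def session_paths(log_lines):
    """Group access-log entries into per-user page paths and count
    the most frequent consecutive page transitions."""
    paths = {}
    for line in log_lines:
        user, page = line.split()        # "user page" per line, in time order
        paths.setdefault(user, []).append(page)
    transitions = Counter()
    for pages in paths.values():
        for a, b in zip(pages, pages[1:]):
            transitions[(a, b)] += 1
    return paths, transitions
```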
2.3.3 Traditional Warehouse and Up-to-date Information
The induction rules can also be employed in an active data warehouse. Active Data Warehousing serves as a substitute for traditional warehousing for applications that need up-to-date information: an active warehouse, refreshed on-line, keeps the stored data consistent with the latest source updates. As discussed earlier, implementing warehouse transformations under on-line refreshment is challenging, and N. Polyzotis et al (2008) address the recurrent operation of joining a fast stream of source updates with a disk-based relation under limited memory. This operation arises in common transformations such as surrogate key assignment, duplicate detection and identification of newly inserted tuples. Their MESHJOIN algorithm amortizes the access cost of the two join inputs, and its systematic cost model supports either maximizing throughput for a given memory budget or minimizing memory consumption for a given throughput.
2.4 Research Gap
In existing DW models, no analysis of the functional behavior is made; only the data mart process is identified and processed. No importance is given to attribute relativity in existing models. User and other data interactions increase when identifying behavior patterns and may affect the choice of actions, leading to complicated action learning, which remains a difficult problem. Regarding security, the financial and business cost of insider attacks is large and cannot be avoided even when computer security policies are in place.
In the existing methods, user-demanded information is not effectively retrieved. So, a new method has to be proposed to retrieve user-demanded information effectively based on attribute relativity, and decision support has to be carried out on functional data marts.
Some useful techniques are applied only to fuzzy rough sets and not to other forms of rough sets; hence, those techniques should be extended to all types of rough sets. Whenever the sources are updated, some of the data become outdated; moreover, in some data warehouses the queries become too slow. In some OLAP-based concepts, performance issues also arise, and in some cases generalizations exist but are not involved in processing.
2.5 Contributions of the Thesis
Accessing data from the data warehouse is a challenging issue: the user must understand the structure in which the data is stored in the repositories. The data mart is introduced to address this issue. A Functional Behavior Pattern (FBP) for the data mart is proposed; it effectively handles the analysis of functional behavior based on attribute relativity and provides a better decision support system. The functional behavior of the system is analyzed to build data storage repositories according to data attributes using the functional behavior pattern (FBP). The user-demanded data path transition is identified in the data mart using interface layers for data movement between different functional data marts. Thus, the proposed Functional Behavior Pattern is capable of analyzing functional behavior.
Inductive rule mining is used to handle the user-demanded information. Decision support is carried out effectively with the help of inductive rule mining on the functional data mart segregates of the layered data repository. With inductive rule mining, the decision support system extracts the required user-demanded data from the data warehouse effectively. Hence, the FBP combined with inductive rule mining provides a better result.
3. PROPOSED METHODOLOGY
FUNCTIONAL BEHAVIOR PATTERN IN DATA MART
3.1 Introduction
Building a data warehouse is an extremely demanding task since it frequently involves many organizational units of a corporation. A data warehouse is a widely queried, intelligent source of data for analysis purposes, principally used to support decision processes. Furthermore, it is a multidimensional model used for the storage of historicized, cleansed, validated, synthesized, operational, internal and external information. The stakeholders of a data warehouse system want to analyze their business processes in a complete and flexible way. Mostly they have a thorough understanding of the business processes they would like to explore and examine. What they actually require is a view of their business processes and their data that allows a comprehensive analysis. Data warehouses are modeled multidimensionally, corresponding to a characteristic view of their users. The analysis view of the business processes is very different from the universal view even though the underlying process is the same.
Hence, it is necessary to elicit requirements from the stakeholders of a data warehouse according to their analysis views. The data warehouse system is extremely dependent on these requirements; many data warehouses are built without properly understanding these needs and fail for that reason. During the requirements description process, system analysts of the IT department or consultants work together with stakeholders and users to describe the necessities for the data warehouse system.
The data warehousing team receives these descriptions, but it repeatedly has trouble understanding the business terminology and finds the descriptions too loose to use for the implementation. Therefore, the data warehousing team writes its system specification from a technological point of view. When the system specification is presented to the users, they do not quite understand it because it is too technical; they are, though, forced to accept it in order to move forward. This approach easily results in a data warehouse system that does not meet the initially defined requirements, because the users, the system analysts and the developers often do not speak the same language. Such communication problems make it tricky to turn a description of an analysis system into a technical specification of a data warehouse system that all parties can understand.
In addition, because of a technical system specification that is not fully understood by the users, a data warehouse system becomes too hard or impractical for future purposes and will not deliver the expected result to the company. In these cases, departments will often develop data marts for their own purposes, which can be regarded as stovepipes and make an enterprise-wide analysis system impossible. The challenge is to model a data warehouse system in a way that is both exact and user friendly. Each symbol describing the analysis process should be intuitive for the user and have distinct semantics, so that the developers can use the description as a general but precise specification of the data warehouse system.
3.1.1 Proposed Functional Behavior Pattern with a New Technique
The growing need for huge volumes of data in enterprise and corporate environments fuels the demand for data warehousing. Data warehousing collects data at different levels, such as departmental, operational and functional, and stores it as a collective data repository with better storage efficiency. Various data warehousing models concentrate on storing the data more efficiently and quickly. In addition, accessing data from the warehouse requires a good understanding of the structure in which the data layers are stored in the repository. But the functional requirements of users are not easily captured by the data warehouse model, which needs an efficient decision support system to extract the required user-demanded data from the warehouse.
The issue of a functional decision support system that extracts user-relevant data is handled by introducing the data mart concept. Data marts build separate functional data repository layers based on the departmental decision support needs of enterprise and corporate data applications. To provide a better decision support system, a Functional Layer Interfaced Data Mart Architecture is proposed for larger corporate and enterprise data applications.
The research work involves analyzing the functional behavior of the corporate system based on its operational goal. The aim of the functional behavior analysis is to build layers of data storage repositories with relevant data attributes using the functional behavior pattern in data mart (FBPDM).
3.2 Process of Data Mart in Data Warehousing
A data warehouse is a database used for exposing and analyzing the data it holds. The data gathered in the warehouse is uploaded from operational systems such as marketplaces and passes through an operational data store for further operations. Before reaching the data warehouse, the data may already be used for reporting. A data warehouse created from integrated data source systems does not need ETL (Extract, Transform and Load) into staging databases or operational data store databases, as presented by Juan Manuel Perez et al. Data warehouses are subdivided into data marts.
Figure 3.1 Data Mart in Data Warehouse
Data marts hold subsets of the data from a warehouse. An overview of a data warehouse with data marts is shown in Figure 3.1. A data mart is the access layer of the data warehouse, used to get data out to the users. It is a division of the data warehouse that is usually oriented to a specific business line. In some instances, each subdivision or business unit is considered the owner of all the hardware, software and data of its data mart. A data mart enables each subdivision to use, operate and extend its data without changing information held in other data marts or the data warehouse; the functional behavior of each data mart is analyzed and used accordingly. The merits of creating a data mart are as follows:
Provides easy, regular access to frequently needed data
Generates a combined view for a group of users
Enhances end-user response time
Costs less than employing a complete data warehouse
Since the data mart provides improved end-user response times, query processing in the data mart consumes less time. The data mart is a subset of the data warehouse in which data is stored according to its behavior and grouped under a name.
The FBPDM analyzes the attributes in the data mart and classifies them based on the functional behavior of the database. In addition, FBPDM manages the data in the data mart by analyzing the functional behavior of the attributes. An experimental evaluation is conducted with benchmark datasets from the UCI repository and compared with the existing multi-functional data warehousing model in terms of the number of functional data attributes, attribute relativity and the analysis of functional behavior.
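As a purely illustrative sketch of the layering idea, the function below partitions warehouse records into functional data-mart layers given a mapping from attribute to functional area. The hand-supplied mapping is an assumption made for illustration; in FBPDM the grouping is derived from the analyzed functional behavior and attribute relativity, which this sketch does not model.

```python
def build_functional_marts(records, attr_functions):
    """Project each warehouse record onto per-area attribute subsets,
    producing one functional data-mart layer per functional area.
    `attr_functions` maps attribute name -> functional area."""
    marts = {}
    for rec in records:
        by_area = {}
        for attr, value in rec.items():
            area = attr_functions.get(attr, "general")
            by_area.setdefault(area, {})[attr] = value
        for area, sub in by_area.items():
            marts.setdefault(area, []).append(sub)
    return marts
```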
3.3 Impact of Data Warehouse Requirements in Functional Behavior
Data warehouses are high-maintenance systems, since reorganizations, product introductions, new pricing schemes, new customers, changes in production systems and so forth all affect the data warehouse. If the data warehouse is to stay current, and being current is crucial for user acceptance, changes to the data warehouse have to be completed promptly. Therefore, the data warehouse system has to evolve with industry trends. The development team of a data warehouse system frequently struggles with this evolutionary behavior because requirements are forever changing. To manage the problem better, an organization has to define the steps the data warehouse team will take to manage its requirements. Documenting these steps enables the members of the organization to carry out the necessary project activities consistently and successfully.
The requirements of a data warehouse system across a wide area of business determine its functional behavior and the available data. The data warehouse determines the accessibility of data, its transformation, organization, aggregation and finally calculation. The data warehouse requirements establish the direction of communication with the stakeholders, raising the level of expectation toward the entrepreneurial goal. The data mart design process is based entirely on a thorough identification of the data warehouse requirements, which are the backbone of all forthcoming project activities. The business specification has a major impact on the success of the data warehouse project.
The business of the stakeholders is enriched by enhancing the data warehouse system according to their expectations. The basis of the data warehouse requirements is to fulfill the expectations of the entrepreneur in line with the enterprise goal. The development team expects the data warehouse system to be built from an accurate, complete and clear specification.
The data warehouse requirements are the foundation of all subsequent project activities and have a major impact on the success of the data warehouse project. Studies have shown that many data warehouse projects do not progress and fail to meet business objectives. On average, data warehouses fail as a consequence of poor communication between IT and business professionals, as well as developers with poor project organization skills and procedures. To reach a successful data warehousing implementation, a great deal of requirements engineering effort and planning is required.
The enterprise requirements include the levels shown in Figure 3.2, which illustrates the abstraction levels of the data warehouse specification. Requirements from the enterprise perspective represent the high-level goals of the administration for the data warehouse system. Generally, business requirements are stored in a document describing the project scope and vision.
Figure 3.2 Abstraction Levels of Data Warehouse Requirements
Accurate system construction requires further clarification of the enterprise requirements from the stakeholders. Finally, the data warehouse team requires a complete transformation of the enterprise requirements into a definite, testable and accurate specification. The transformation of the business specification results in various levels of abstraction. A different perspective of the data warehouse system is exposed at every abstraction level, each with its own set of stakeholders.
The requirements of an enterprise-wide data warehouse system determine its functional behavior and its available information: for example, what data must be available, and how it is transformed, organized, aggregated and scheduled. The requirements allow the stakeholders to communicate the purpose, establish the direction and set the potential information goals for the enterprise.
Stakeholders often articulate their needs as a general view of how the data warehouse system should improve their business. This business view describes the goals and expectations of the stakeholders, which form the foundation of the data warehouse requirements. On the other hand, the development team of a data warehouse system expects a complete, exact and unambiguous specification of the system it has to build, which means further refining the business requirements from the stakeholders. Therefore, it is essential to convert the business requirements into a thorough, testable and complete specification for the data warehouse team.
The general benefit of the data warehouse system is identified from the requirements of the enterprise and the entrepreneur. The business requirements occupy the highest level of the abstraction chain of requirements; they convey the business goals, business prospects and so on. The abstraction levels also describe the user requirements, comprising the functional, information and other requirements in the requirement chain.
i. Enterprise Requirements
The top priority in the requirement chain is given to the enterprise requirements of the data warehouse system. Business requirements describe the effective mission of the administration for the data warehouse system. A document records the requirements explaining the scope of the project. Generally, the enterprise requirements are depicted in a system context diagram describing the services of the data warehouse system. The enterprise specifications identify the general benefit of the data warehouse system based on the expectations of the administration and the users. Enterprise requirements express the business mission, business prospects and so on, extending down to the customer requirements.
ii. Customer Requirements
Customer requirements define the success of customers in utilizing the data warehouse system. They are generally collected from the customers and workers operating the data warehouse system. The customer view covers both the functionality of the data warehouse system and its typical non-functional features. The non-functional features are essential for improving the operation of the data warehouse system. The concepts and scope established in the enterprise requirements are organized by the customer requirements. Customer requirements are represented in terms of use cases, test cases, user profiles and so on, describing the work of the customers involved in the data warehouse system. Customer requirements are more effective than traditional requirements in extracting users' needs for system operation.
iii. System Requirements
System requirements express the data warehouse system requirements at an extremely detailed level. They provide a detailed, complete, fine-grained specification of the requirements as an input for the development team. System requirements support the customer requirements and the enterprise requirements, providing a robust basis for requirement verification. System requirements comprise functional, information and other requirements.
Functional requirements identify the functionality to be developed in the data warehouse system by the development team, enabling the customer to accomplish the task. The functionality defined must completely satisfy the enterprise requirements. Functional requirements capture the intended behavior of the data warehouse system. The planned behavior is expressed in terms of the services, responsibilities or functions the system is required to perform. Functional requirements describe the functionality of the system, analyzed on the basis of the users' requirements.
Information requirements describe the information needs of the business. They represent the data to be accessed and the information to be delivered by the data warehouse, and state the quality, arrival and processing of the data. In addition, the information requirements specify the data combinations needed for the analysis process and the analysis methods used in data processing.
Other requirements, in addition to the functional and information requirements, are defined to cover further relevant aspects of the data warehouse system, such as interface requirements or environmental requirements of a cultural, political and legal nature.
iv. Requirement Attributes
The functional, information or other requirements are enhanced with attributes capturing characteristics, in different dimensions, that are significant to the customers or to the data warehouse team. The requirement attributes represent the properties and behavior of the data warehouse system. They include the principles, policies and conditions to which the data warehouse system has to conform. The external interfaces, performance requirements, design and implementation constraints and quality attributes of the data warehouse system specify the requirement attributes. Requirement attributes are generally attached to the detailed system service requirements. Given the functional specification, the requirement attributes determine the behavior and quality of the data warehouse system. Attributes like data quality and granularity are determined from the given information requirements.
v. Traditional Requirements
Traditional software requirements are differentiated into two types:
1) Functional requirements, and
2) Non-functional requirements.
This traditional differentiation is also useful in a data warehouse system, but the data and information delivered are the main feature of a data warehouse system. Information requirements are subsumed in traditional software requirements, i.e., in the functional and non-functional requirements. The data warehouse is built for the purpose of delivering data, fulfilling the customer requirements and ensuring data quality. Besides, metrics like data transmission and data granularity are highly important for the data warehouse design and more detailed than the requirements of traditional software systems. Therefore, the traditional software requirement types are unsuitable for data warehouse projects.
A data warehouse system is information-centric by nature. To capture the needs and the functional behavior pattern of the data warehouse system, the functional requirements and the information requirements are distinguished explicitly. Generally, information requirements alone are appropriate for the data mart of a data warehouse and are less effective for building the data warehouse as a whole. The separation of functional and information requirements is very beneficial for the data mart of a data warehouse. A collection of all the necessary customer requirements is gathered for designing the data mart. From the collected requirements, the FBPDM selects the relevant attributes in the data mart, termed attribute relativity.
3.4 An Efficient Functional Behavior Pattern of Data Mart based on
Attribute Relativity
The functional behavior pattern of the data mart is efficiently designed for building the data mart components. The data mart is built from the well-known attribute behavior of the data stored in the data warehouse. A data warehouse system maintains the data in an arbitrary manner. In addition, the data warehouse handles the data mart based on the functional behavior of the data in the data warehouse environment. The functional behavior of the data is identified by selecting the relevant attributes of the system. The architecture of the functional behavior pattern in the data mart based on attribute relativity is shown in Figure 3.3.
Figure 3.3 Architecture diagram of FBPDM
Figure 3.3 illustrates the architecture of the functional behavior pattern in the data mart based on attribute relativity: the data warehouse is sub-divided into data marts (DM1, DM2, DM3), the relevant attributes of each data mart are identified, data storage repositories are built, and the functional behavior of each data mart is analyzed and classified.
3.4.1 Process of Data Mart
A data mart is a simple form of a data warehouse system focused on a single subject or functional area such as Finance, Sales or Marketing. Data marts are frequently built by a particular division of the organization. The enterprise specifies the single-subject focus, and data marts normally draw data from only a small number of sources.
The sources are internal operational systems, a central data warehouse or external data. The major steps involved in implementing a data mart are:
i. Design the schema,
ii. Construct the physical storage,
iii. Populate the data mart with data from the source systems,
iv. Access it to make informed decisions, and
v. Manage it over time.
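As a minimal sketch of the first four steps, assuming a relational store such as SQLite (the thesis does not prescribe a platform; the table and column names below are illustrative only):

```python
import sqlite3

# Steps i-ii: design the schema and construct the physical storage
# for a hypothetical Sales data mart (a small star schema).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        price        REAL
    );
    CREATE TABLE sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(product_id),
        quantity   INTEGER,
        sale_date  TEXT
    );
    CREATE INDEX idx_sales_product ON sales(product_id);
""")

# Step iii: populate the data mart from a (here in-line) source system.
con.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [(1, "Soap", 2.5), (2, "Brush", 4.0)])
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                [(1, 1, 10, "2014-01-05"), (2, 2, 3, "2014-01-06")])

# Step iv: access the data mart to support a decision (revenue per product).
total = con.execute("""
    SELECT p.product_name, SUM(s.quantity * p.price)
    FROM sales s JOIN product p ON p.product_id = s.product_id
    GROUP BY p.product_name
""").fetchall()
print(total)
```

Step v, managing the mart over time, would then cover access control, growth and backup, as described below.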
The overall implementation of the data mart is explained as a detailed process with the architecture diagram shown in Figure 3.4 below.
Figure 3.4 Process of Data Mart
[Figure 3.4 shows the flow: data warehouse → collect the necessary data requirements → construct the physical and logical structure of the schema → populate the logical structure with data → access and manage the information.]
A collection of all the necessary customer requirements is gathered for designing the data mart. In addition, the source of the collected data is identified to enhance the data mart design. After identifying the data sources, an appropriate subset of the data is selected. Simultaneously, the physical and logical structure of the data mart is analyzed and designed for better data access. The physical and logical construction of the data mart supports rapid and proficient access to the data in the data mart.
The physical storage is constructed by identifying schema objects, such as the tables and indexes established in the data mart design. Data marts are usually smaller and focus on a particular subject. Here, six numerically valued attributes are taken, and the precise information is extracted from the customer subtype attribute. The physical storage is also built by determining the design structures of the data stored in the data mart. Populating the data mart involves receiving the data from the source, transforming it into the exact layout and loading it into the data mart. The populating step in the design of the data mart comprises the following tasks:
1. Mapping the data sources to the target data structures,
2. Retrieving the data,
3. Transforming the data,
4. Loading the data into the data mart, and
5. Producing and storing metadata.
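The five populating tasks can be sketched, purely illustratively, as a small extract-transform-load routine; the source rows, field mapping and metadata shown are assumptions, not part of the thesis:

```python
import datetime

# 1. Map source fields to target data-mart structures (hypothetical names).
MAPPING = {"prod": "product_name", "amt": "price"}

def retrieve(source):
    """2. Retrieve raw rows from the source system."""
    return list(source)

def transform(rows):
    """3. Transform each row into the target layout."""
    return [{MAPPING[k]: v for k, v in row.items()} for row in rows]

def load(rows, mart):
    """4. Load the transformed rows into the data mart."""
    mart.extend(rows)

def make_metadata(rows):
    """5. Produce and store metadata describing the load."""
    return {"rows_loaded": len(rows),
            "loaded_at": datetime.datetime.now().isoformat()}

source = [{"prod": "Soap", "amt": 2.5}, {"prod": "Brush", "amt": 4.0}]
mart = []
clean = transform(retrieve(source))
load(clean, mart)
meta = make_metadata(clean)
print(mart, meta["rows_loaded"])
```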
The data in the data mart is accessed by querying. The retrieved data is examined and presented in the form of reports, charts and graphs. Normally, the end users utilize a graphical front-end tool to submit queries to the database and to display the outcomes of the queries. Accessing the data involves the following tasks:
1. Setting up an intermediate layer for the front-end tool to use; the intermediate layer acts as an interface, translating database structures and object names into business terms, and
2. Preserving and managing the business interfaces.
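The intermediate layer can be imagined as a simple lookup that translates physical object names into business terms before results are shown to end users; the mappings below are illustrative assumptions:

```python
# Hypothetical business-term interface for a front-end tool.
BUSINESS_TERMS = {
    "tbl_sls_2014": "Sales",
    "prd_nm":       "Product Name",
    "qty_sld":      "Quantity Sold",
}

def to_business_term(physical_name):
    """Translate a database object name into its business term,
    falling back to the physical name when no mapping exists."""
    return BUSINESS_TERMS.get(physical_name, physical_name)

columns = ["prd_nm", "qty_sld"]
print([to_business_term(c) for c in columns])   # ['Product Name', 'Quantity Sold']
```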
The management process of the data mart comprises the following tasks:
1. Providing secure access to the data,
2. Managing the growth of the data and optimizing the system for enhanced performance, and
3. Ensuring the availability of the data even after system failures.
Finally, the data mart is designed efficiently, with data sourced from the user requirements, rapid data retrieval on the specified attributes, data transformation and fast loading of the data into the data mart. The data mart is analyzed and processed over these different steps in order to obtain data with a specified list of attributes.
3.4.2 Efficient Analysis for Identifying the Functional Behavior Pattern
The information in the data mart is managed through the relevant selection of attributes for each data mart. The operational objective of the data mart is identified based on the selection of the relevant attributes. Depending upon the operational objective, the functional behavior of the data mart is analyzed and stored as data repositories in the layer. The procedure below describes the analysis of the functional behavior pattern of the data mart (FBPDM).
Given: Data warehouse (DW)
Step 1: Sub-divide the DW into a set of data marts (DM)
Step 2: Let the data mart be sales
Step 3: The DW is formed using a set of schemas
Step 4: Design the data mart [described in Section 3.4.1]
Step 5: After the formation of the data mart (sales details)
Step 6: Identify the relevant attributes of the data mart, such as product details, price, etc.
Step 7: Store the relevant attributes of the particular data mart in a data repository
Step 8: Based on the relevant attributes,
Step 9: Identify the functional behavior of the data
Step 10: Based on the functional behavior of the data,
Step 11: Classify the data mart
Step 12: End
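The steps above can be sketched in code; the attribute names, the relevance test and the classification rule are all illustrative assumptions, not the thesis's definitions:

```python
# Steps 1-2: a data warehouse sub-divided into data marts; here 'sales' only.
warehouse = {
    "sales": {
        "product_name": "text", "product_id": "id", "price": "numeric",
        "expiry_date": "date", "description": "text",
    }
}

# Step 6: relevance is assumed here to mean attributes tied to the
# operational goal of the sales mart (identification and pricing).
RELEVANT = {"product_id", "price", "description"}

repository = {}                      # Step 7: data repository per mart
for mart, attributes in warehouse.items():
    relevant = {a: t for a, t in attributes.items() if a in RELEVANT}
    repository[mart] = relevant      # store the relevant attributes

    # Steps 9-11: classify the mart by the functional behavior of its
    # relevant attributes (a toy rule: numeric-heavy vs descriptive).
    numeric = sum(1 for t in relevant.values() if t == "numeric")
    behavior = "quantitative" if numeric >= len(relevant) / 2 else "descriptive"
    print(mart, sorted(relevant), behavior)
```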
Assume the data mart is sales. The main attributes of sales are normally the sales details, i.e., product name, product ID, product expiry date, manufactured date, product description, price and so on. The relevant attributes of sales are identified, i.e., product description, ID and price. Based on the selection of the relevant attributes, the functional behavior is analyzed and stored in a data repository for further analysis of the requirements. The functional behavior of the data mart is efficiently analyzed, and the relevant attributes are also taken from the data warehouse. The analysis of the functional behavior pattern is done through the selection of the attributes that are closely related to the data description.
3.5. Experimental Evaluation
The functional behavior pattern is efficiently analyzed for the data mart based on its attribute relativity. The experiments are run on an Intel Pentium IV machine with 2 GB of memory and a 3 GHz dual processor. An experimental study is carried out to estimate the effectiveness and performance of the functional behavior pattern in the data mart based on attribute relativity. The effectiveness of the functional behavior pattern in the data mart is estimated on benchmark datasets from the UCI repository with varying characteristics.
The attributes are chosen depending upon their relativity to the company. The operational objective of the administration is efficiently analyzed, and the functional behavior is identified based on the relevant attributes. The performance of the functional behavior pattern in the data mart (FBPDM) based on attribute relativity is measured in terms of
i) Data Retrieval,
ii) Attribute relativity,
iii) Functional Behavior Analysis,
iv) Data Storage Repositories,
v) Data Mart Management.
Data retrieval involves the task of fast data access from the data mart. Attribute relativity indicates the relevant selection of attributes from the designed data mart. The relevant attributes are primarily chosen depending on the needs of the organization. Attribute relativity specifies the proportion of attributes present in the data mart that are closely related to the operational goal of the organization or data mart.
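Taking that description literally, attribute relativity can be computed as the share of a mart's attributes that relate to the operational goal; this formula is an interpretation of the prose, not one given in the thesis, and the attribute names are illustrative:

```python
def attribute_relativity(mart_attributes, goal_related):
    """Percentage of attributes in the data mart that are closely
    related to the operational goal (an assumed formalisation)."""
    related = [a for a in mart_attributes if a in goal_related]
    return 100.0 * len(related) / len(mart_attributes)

attrs = ["product_id", "price", "description", "expiry_date", "batch_no"]
goal = {"product_id", "price", "description"}
print(attribute_relativity(attrs, goal))   # 60.0
```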
The functional behavior of the data mart is easily analyzed based on attribute relativity. Functional behavior analysis requires functional data attributes for the analysis. Functional data attributes are attributes assigned a function instead of a value; the function is evaluated at the time the attribute value is demanded. The data storage repository is measured in terms of the memory occupied for storing the information of the organization. The management of the data mart covers data protection, organization and ensuring trusted access even after a system failure.
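A functional data attribute, as described here, can be sketched as an attribute that holds a function evaluated only when its value is demanded; the record below and its field names are illustrative assumptions:

```python
from datetime import date

# A product record in which 'shelf_life_days' is a functional data
# attribute: it stores a function instead of a value, and the function
# is invoked only at the time the attribute value is demanded.
record = {
    "product_name": "Soap",
    "manufactured": date(2014, 1, 1),
    "expiry": date(2014, 7, 1),
    "shelf_life_days": lambda r: (r["expiry"] - r["manufactured"]).days,
}

def value_of(rec, attribute):
    """Return an attribute's value, invoking it if it is a function."""
    v = rec[attribute]
    return v(rec) if callable(v) else v

print(value_of(record, "product_name"), value_of(record, "shelf_life_days"))
```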
3.6. Results and Discussion
The functional behavior pattern in the data mart (FBPDM) based on attribute relativity is effective compared to the existing multi-functional data warehousing model (MFDWM). The performance of the functional behavior pattern is measured in terms of data retrieval, attribute relativity, functional behavior analysis, data storage repositories and the management of the data mart, by maintaining the attributes in the data mart in a successful manner.
The functional behavioral pattern of the data mart or organization is efficiently analyzed and viewed with the relevant attributes of the data mart. The experiments are conducted with benchmark data sets to estimate the performance. The tables and graphs below describe the performance of the functional behavior pattern in the data mart based on attribute relativity in comparison to the multi-functional data warehousing model proposed by Neoklis Polyzotis et al.
3.6.1 Data Retrieval
The extraction of the functional data attributes from the data mart enhances the speed of data retrieval using functional behavior patterns. The populating design of the data mart facilitates rapid retrieval of data from the data mart. Fast retrieval of information supports the functional behavior pattern in selecting the best relevant attributes and the functional data attributes.
81
Six relevant attributes are tested in this data mart process. These attributes provide genuine information, whereas the other attributes are eliminated because they are available only as discrete values. A decision rule induction algorithm is used. The high-speed data retrieval efficiency of the FBPDM is demonstrated in the table and graph below.
Table 3.1 Relevant Attribute Vs. Data Retrieval
Relevant Attributes    Data Retrieval (secs)
                       MFDWM    FBPDM
1                      7        2
2                      12       5
3                      16       8
4                      19       10
5                      21       11
6                      25       13
7                      27       15
Table 3.1 describes the retrieval of data from the data mart for analyzing the functional behavior. The outcome of the FBPDM is compared with the multi-functional model in the data warehouse. The data marts 1, 2, 3, … refer to the number of data marts created from a single dataset, reflecting the size and attribute length of each data mart. Five data marts are created from a single dataset. Data retrieval based on the relevant attributes is measured in seconds.
Figure 3.5 Relevant Attributes Vs. Data Retrieval
The graph in Figure 3.5 describes the process of retrieving data from the data mart with the relevant attributes. As the number of relevant attributes grows, data retrieval from the data mart is faster by about 10-20% in the FBPDM in contrast to the multi-functional model in the data warehouse. The physical and logical structure of the data mart is designed for fast data access and retrieval, and this construction supports rapid and proficient retrieval of data from the data mart. The attribute relativity percentage is calculated as under:
3.6.2 Attribute Relativity
The functional behavior pattern identifies the relevant attributes based on the operational goal of the data mart. The functional data attributes are analyzed efficiently based on the relevant attributes of the data repositories. The attribute identification of the data mart decides the attribute relativity. The relevant attribute is selected from the collection of functional attributes, information attributes and other attributes, based on the importance of the individual attribute properties.
The multi-functional data model recognizes the process of the data mart, but the FBPDM efficiently identifies the operational goal of the data mart by selecting the relevant functional data attributes. The attribute relativity is based on the number of attributes that are closely related to the operational objective of the data mart.
Table 3.2 Data Mart Vs. Attribute Relativity (%)
Data Mart    Attribute Relativity (%)
             MFDWM    FBPDM
1            5        10
2            12       13
3            10       18
4            15       20
5            18       23
6            20       28
7            26       36
Table 3.2 illustrates the attribute relativity of the functional data attributes present in the data mart for analyzing the functional behavior. The outcome of the FBPDM is compared with the multi-functional model in the data warehouse. The performance of the FBPDM is evaluated in terms of attribute relativity, measured as a percentage. Based on Table 3.2, the following graph is depicted:
Figure 3.6 Data Mart Vs. Attribute Relativity (%)
The graph in Figure 3.6 describes the attribute relativity of the functional data attributes present in the data mart. Based on the number of attributes available in the data marts, the relevant attributes are selected from the data mart. An attribute relativity higher by about 5-10% is seen in the FBPDM compared to the multi-functional model in data warehousing. The physical storage is built by determining the design structures of the data stored in the data mart, supporting the selection of the relevant attributes.
3.6.3 Functional Behavior Analysis
The operational objective decides the functional behavior pattern of the data mart and is identified based on the selection of the relevant attributes. The information in the data mart is managed through the relevant selection of attributes for each data mart. An analysis of the functional behavior of the data mart is carried out to demonstrate the better performance of the FBPDM. The multi-functional data model fails in analyzing the functional behavior of the data mart, whereas the FBPDM works well in identifying the functional behavior pattern in the data mart. As proof of the efficient working of the FBPDM, a table and graph are given below.
Table 3.3 No. of Data Marts Vs. Functional Behavior Analysis
No. of Data Marts    Functional Behavior Analysis (%)
                     MFDWM    FBPDM
1                    24       45
2                    33       51
3                    38       59
4                    46       63
5                    51       76
6                    62       87
7                    69       99
Table 3.3 shows the functional behavior analysis of the data mart; the outcome of the functional behavior pattern in the data mart is compared with the multi-functional model in the data warehouse. The functional behavior analysis of the data mart is measured as a percentage.
Figure 3.7 No. of Data Marts Vs. Functional Behavior Analysis
The graph in Figure 3.7 describes the efficient analysis of the functional behavior of the data mart. Based on the number of data marts, the analysis of the functional behavior in the data mart is higher by about 20-30% in the FBPDM compared to the multi-functional model in the DW. The attribute relativity supports the determination of the operational goal, enhancing the functional behavior analysis of the data mart.
3.6.4 Data Storage Repositories
The functional behavior pattern in the data mart (FBPDM) aims at building layers of data storage repositories with the relevant data attributes. The efficiency of the data storage repositories is decided by the memory occupied by the data. The allocation of data to the memory locations is decided based on the data capacity of the organization. The functional behavior is stored in the data repositories layer. The data storage repository size is inversely proportional to the data accessibility.
Table 3.4 No. of Data Vs. Data Storage Repositories
Number of Data    Data Storage Repositories (MB)
                  MFDWM    FBPDM
100               16       9
200               21       12
300               29       23
400               34       28
500               45       36
600               56       41
700               72       53
Table 3.4 shows the number of data items versus the data storage repositories, measured in megabytes (MB). Based on Table 3.4, the following graph is depicted:
Figure 3.8 No. of Data Vs. Data Storage Repositories
The graph in Figure 3.8 describes the data storage repositories in the efficient analysis of the functional behavior of the data mart. The data storage is reduced in the FBPDM, increasing the data accessibility by about 10-20% compared to the MFDWM. The intermediate layer for the front-end tool is designed to utilize the database structures in data storage.
3.6.5 Response Time
The time required to access the information in the data is denoted as the response time. The response time is measured in milliseconds (ms) based on the data accessibility. The performance of the functional behavior pattern in the data mart is depicted in comparison to the multi-functional model. A table and graph are given below to prove the efficiency of the FBPDM.
Table 3.5 Data Accessibility Vs. Response Time
Data Accessibility    Response Time (ms)
                      MFDWM    FBPDM
10                    0.76     0.42
20                    0.97     0.71
30                    1.61     1.01
40                    2.42     1.32
50                    3.04     2.12
60                    3.25     2.41
70                    3.76     2.83
Table 3.5 illustrates the data accessibility versus the response time. Based on Table 3.5, the following graph is depicted:
Figure 3.9 Data Accessibility Vs. Response Time
The graph in Figure 3.9 shows the time required to access the information in the data, i.e., the response time. The response time is lower in the FBPDM, increasing the data accessibility by about 10-20% compared to the MFDWM. The business interface, the intermediate layer for the front-end tool, is preserved and managed, reducing the data access time.
3.6.6 Data Mart Management
The data mart management process involves trusted data access, data organization, system optimization and guaranteed data accessibility even after a system failure. The performance of the data mart management is determined in comparison to the multi-functional model. A table and graph are given below to prove the efficiency of the FBPDM.
Table 3.6 No. of Data Mart Vs. Data Mart Management
No. of Data Marts    Data Mart Management (%)
                     MFDWM    FBPDM
10                   54       76
20                   57       79
30                   60       81
40                   63       83
50                   67       86
60                   69       89
70                   72       93
Table 3.6 illustrates the data mart management, measured as a percentage. Based on Table 3.6, the following graph is depicted:
Figure 3.10 No. of Data Mart Vs. Data Mart Management
The graph in Figure 3.10 describes the data mart management for the given number of data marts. Better data mart management, by about 20-25%, is provided in the FBPDM compared to the MFDWM. The data mart is designed efficiently, with data sourced from the user requirements, rapid data retrieval on the specified attributes, data transformation and fast loading of the data into the data mart.
Finally, it is observed that the functional behavior pattern technique efficiently analyzes the functional behavior of the data mart based on the selection of the relevant attributes of the data mart, by building data storage repository layers.
3.7. Summary
The data is stored as a collective data repository with better storage efficiency. The functional behavior pattern in the data mart (FBPDM) achieved the analysis of the functional behavior of the data mart based on attribute relativity. The data mart is designed efficiently, with data sourced from the user requirements, rapid data retrieval on the specified attributes, data transformation and fast loading of the data into the data mart.
In the existing multi-functional models in the DW, the process of the data mart is identified and processed, but the analysis of the functional behavior fails. The issues raised in the existing multi-functional models are efficiently handled by the proposed functional behavior pattern in the data mart based on attribute relativity (FBPDM).
The functional behavior pattern initially identifies the relevant attributes of the particular data mart. The attribute relativity is selected, specifying the detailed description of the operational goal of the data mart, and the analysis of the functional behavior of the data mart is achieved on that basis.
The functional behavior pattern in the data mart, efficiently developed using attribute relativity, performed the functional analysis. The experimental results showed that the functional behavior pattern for the data mart is analyzed efficiently using the metrics of data retrieval, attribute relativity, functional behavior analysis, data storage repositories and management of the data mart, by maintaining the attributes in the data mart successfully.
Publication
[1] “Functional Behavior Pattern For Data Mart Based on Attribute Relativity”, International Journal of Computer Science Issues, Vol. 9, Issue 4, No. 1, July 2012, pp. 278-283.
4. AN EFFICIENT NEW INDUCTIVE RULE MINING TECHNIQUE FOR EXTRACTING USER-DEMANDED DATA FROM DATA MART
4.1 Introduction
A data warehouse is a database for reporting and analyzing the data stored in the repositories. A data mart acts as the access layer of the data warehouse: it brings the data out of the data warehouse to the users. Accessing data in the data warehouse is challenging, as the user requires a good understanding of the data structures stored in the repositories. The data mart is introduced to make data access easier to grasp. Data marts build separate functional data repository layers based on the requirements and needs of the corporate data applications. The data warehouse models fail in providing a good understanding of the functional requirements.
The data warehouse model needs an efficient decision support system to extract the user-demanded data from the data warehouse. The existing functional behavior pattern only identifies the functional activities of the data mart based on attribute relativity; it does not extract the user-demanded information from the repositories. Inductive rule mining (IRM) is proposed to improve the decision support system so as to extract the user-demanded data from the data mart.
The decision support with inductive rule mining on functional data
marts segregates the layered data repository and extracts the required
information for the user. The induced rules are proposed as the
supportive knowledge for identifying the user needed information.
An experimental evaluation is conducted with benchmark datasets from
the UCI repository. The inductive rule mining is compared with the
existing functional behavior pattern for data mart based on attribute
relativity in terms of the number of decision rules, the extracted data
relativity, and the analysis of functional behavior.
4.2 Rule Induction for Supporting Decision Making
Inductive rule mining (IRM) is proposed to improve the decision
support system by extracting the required user demanded data from the
data mart. As the requests from users in a distributed system increase, the
execution time of the IRM technique decreases gradually.
Recently, considerable attention has been paid to utilizing machine
mining techniques as tools for decision support. Inductive rule mining with
decision making methods is applied to a wide variety of problems in the
data warehouse because of its ability to discover user demanded information
from the data mart. The addition of inductive rule mining methods to
conventional decision support systems provides a means for extensively
improving the quality of decision making.
A decision support system employs induction rule mining techniques
to derive knowledge directly from the data mart and to filter that knowledge
continually. Inductive mining is perhaps the most widely used machine
mining technique. Inductive mining algorithms are simple and fast. Another
advantage is that inductive mining generates models that are easy to
understand. Finally, inductive mining algorithms are more accurate
compared with other machine mining techniques.
4.2.1 Decision Making on Inductive Rule Mining
The term Decision Support (DS) is used often and in a variety of
frameworks associated with decision making. Recently, decision support
systems are frequently discussed in relation to Data Warehouses (DW) and
On-Line Analytical Processing (OLAP). Another present trend is to connect
decision support with data mining. A data warehouse sustains a copy of
data from the source operational systems. This architectural separation
presents the chance to maintain data history, to integrate data from various
source systems, and to allow analysis across the enterprise. The quality of
data in the data warehouse is normally improved by presenting consistent
codes and descriptions and by filtering out bad data before loading. The
data warehouse presents the organization's information consistently and
offers a single general data model for all data of interest, regardless of
the data source.
Decision support usually engages the combination of data and
knowledge organization to support humans in creating efficient alternatives.
The online framework supports scalable delivery to well defined individual
decision making. Decision making rests on constantly varying requirements
and so requires a rapid response. Conventional intuitive methods of decision
making are no longer sufficient to deal with such obscure situations. The
role of Decision Support in the Decision Making process is shown in
Figure 4.1.
Figure 4.1 Role of DS in the Decision Making Process
(Node labels in the figure: Decision Making; Decision Systems; Decision
Sciences; Normative; Descriptive; Decision Support.)
A data mart is a decision support system built on a separation of the
enterprise's data, paying attention to exact functions or behaviors of the
enterprise. Data marts serve specific business needs, such as determining
the impact of advertising promotions, determining and forecasting sales
performance, determining the impact of new product introductions on
company profits, or determining and forecasting the performance of a new
business division. Data marts are specific business software applications.
A data warehouse differs from a data mart in that it deals with numerous
subject areas. Data marts are naturally executed and controlled by an
organizational unit such as the corporate Information Technology (IT) group.
To improve the data extraction process in the data mart, the work
includes an inductive rule mining algorithm. Inductive rule mining is an
extensive approach used in the data mart for the extraction of information
from the data storage repositories by forming a set of decisive rules.
4.2.2 Rule Induction for Supporting Decision Making
Rule induction is a part of machine mining involving formal rules to
extract data from a set of observations or from a data warehouse. The
extracted rules either represent a complete model of the data in the form of
a rule set or symbolize local patterns in the data in the form of individual
rules. The general form of each rule is an if-then rule:
IF Conditions THEN Conclusion
The Conditions part contains a conjunction of one or more attribute
tests. The attribute tests include features of the following forms:-
i. The attribute is equal to a possible value, for categorical attributes (Aj = vj).
ii. The attribute is less than a threshold value, for numeric attributes (Aj < v).
The threshold value does not need to correspond to an observed value
of the attribute. The form of the Conclusion part of the rule depends on the
type of the rule. In the supervised rule mining setting, rules are induced
from a labelled dataset. Rules learned in this organized manner are typically
used for classification. The rule mining task is defined as follows: given a
set of training samples, discover a set of rules that can be used for decision
making on new instances. Classification rule inductions are of the form:
IF Conditions THEN Class = Target Variable.
In the above case, the Conclusion consists of the target variable
associated with one of the classes of the sample dataset. Induced rules are
not individual rules, but rather parts of models or rule sets that work
together to classify new instances. Rule sets are either ordered, in the
if-then-else form, or unordered, where each rule votes for a class.
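The ordered rule set form described above can be sketched as follows. This is an illustrative Python fragment, not the thesis implementation; the rules reuse the sample car attributes given later in this chapter, and the first matching rule in the ordered list decides the class.

```python
# Illustrative sketch (not the thesis's code) of an ordered if-then rule set:
# rules are tried top to bottom, and the first rule whose conditions all
# hold decides the class, mimicking the if-then-else form described above.

rules = [
    ({"car": "Swift", "color": "Silver"}, "yes"),   # IF Conditions THEN Class
    ({"car": "Swift", "color": "White"}, "no"),
    ({"car": "Innova"}, "no"),
]

def classify(instance, rules, default="no"):
    """Return the class of the first matching rule, else the default class."""
    for conditions, target in rules:
        if all(instance.get(a) == v for a, v in conditions.items()):
            return target
    return default

print(classify({"car": "Swift", "color": "Silver"}, rules))  # yes
```

An unordered rule set would instead collect the targets of every matching rule and decide by vote.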
In the unsupervised rule mining setting, rules are induced from
unlabelled data. The goal of unsupervised rule mining is to discover
interesting relations between variables or attributes. Association rule
inductions are of the form
IF Conditions THEN Conclusions,
where both the Conditions and the Conclusions are conjunctions of
attribute values or items, depending on the data format. Each rule is an
individual local pattern in the data, not related to other rules. Compared to
classification rule induction mining, the approaches used in association rule
induction mining are usually complete and therefore promise optimality of
results in terms of support and confidence.
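The support and confidence measures mentioned above can be computed as in the following sketch. The transaction data and helper functions are hypothetical illustrations, not part of the thesis implementation.

```python
# Illustrative computation of the support and confidence of an association
# rule IF Conditions THEN Conclusions over a small transaction table.
# The data and helper functions are hypothetical, not the thesis's code.

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(conditions, conclusions, transactions):
    """support(Conditions AND Conclusions) / support(Conditions)."""
    return support(conditions | conclusions, transactions) / support(conditions, transactions)

transactions = [
    {"Swift", "Silver", "approved"},
    {"Swift", "White"},
    {"Innova", "White"},
    {"Swift", "Silver", "approved"},
]

print(support({"Swift"}, transactions))                             # 0.75
print(confidence({"Swift", "Silver"}, {"approved"}, transactions))  # 1.0
```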
Traditional association rule induction mining constructs plenty of
redundant rules, which is due to their individuality. The redundancy is
avoided based on the closed frequent itemsets. The number of non-redundant
rules is substantially smaller than the rule set from the traditional approach,
as closed sets provide compact representations of the data.
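The closed frequent itemset idea can be illustrated as follows: an itemset is closed when no extension of it has the same support, so the closed sets stand in for all their equally supported subsets. The transactions and helper functions below are hypothetical, not from the thesis.

```python
# Illustrative check of the closed frequent itemset property: an itemset is
# closed when no single-item extension has the same support, so closed sets
# compactly represent the redundant, equally supported rules mentioned above.

def support_count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def is_closed(itemset, transactions):
    """True if no single-item extension keeps the support unchanged."""
    base = support_count(itemset, transactions)
    universe = set().union(*transactions) - set(itemset)
    return all(support_count(set(itemset) | {x}, transactions) < base
               for x in universe)

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}]
print(is_closed({"a"}, transactions))       # False: {a, b} has the same support
print(is_closed({"a", "b"}, transactions))  # True
```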
Assume a set of data with well defined attributes and their values,
and a selected nominal attribute called the target attribute. The goal of
inductive rule mining is to induce rules from the dataset that explain the
relation between the target attribute and the other attributes in the data,
deriving user understandable data. In other words, rule induction is a
process of inducing a set of understandable rules from the described data.
The induction rule format restricts the conclusion part of the rules to
values of the target attribute.
The purpose of rule induction is to extract the user demanded data.
The idea of rule induction is to look through the rules to increase
understanding of the data field. The final objective of rule induction is to
permit the user to recognize the development of the essential data.
Differentiating between classes of data characterizes the database.
Efficient decision rule induction algorithms discover emerging data in the
data warehouse. The purpose of emerging data is to capture changes in the
data over time, and subsequent emerging data research has largely focused
on the use of the data discovered from the data warehouse. Formally,
emerging data are association rule inductions with an itemset in the rule
antecedent and a fixed consequent, ItemSet → D1, for a given data set D1
being compared to another data set D2.
The measure of the quality of emerging data is the development rate.
The development rate provides an ordering matching confidence in the
association rule induction. Rule induction mining represents emerging data.
The IRM approach is to extract data in the data mart using rule induction
mining. The method is based on the table structure from which the data
are extracted. The data extraction depends upon the decision support
system, and decision making support increases significantly from one data
set to another.
The measure of the quality of data extraction is likewise the
development rate, and thus the development rate provides an ordering
matching confidence. Jumping data are extracted with support zero in one
data set and support greater than zero in the other data set. A sample
training data set is given below to elaborate the data extraction according
to user demands using induction rule mining.
Car = Swift AND Color = Silver → Approved = yes
Car = Swift AND Color = White → Approved = no
Car = Innova AND Color = White → Approved = no
Car = Swift AND Color = White → Approved = no
In the above training sample dataset, the attributes are car and color,
and possible values exist for each attribute. The decision to approve is
made if the attribute car is Swift and the color is Silver. The induction rule
mining involves the decision making on the basis of attribute value
selection. For instance, if the car model is Innova and the color is changed
to White, the approval for data extraction is denied.
Induction rule mining argues that extracting all the data above a
minimum development rate constraint generates data analyzed by domain
experts. IRM works on selecting the user required data from the data
warehouse. Induction rule mining introduces a query based approach to
extract data, as in microarray data. The method is based on growing the
decision making from which the data are extracted. IRM combines data
search with a statistical procedure based on an exact test to assess the
significance of each emerging datum. Afterwards, based on the data
inferred, extraction is performed using maximum-likelihood linear
discriminant analysis.
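The development rate described above can be sketched as follows. This is a hedged illustration, not the thesis implementation: the sample car data are split into two hypothetical partitions D1 (Approved = yes) and D2 (Approved = no), and the development rate of an itemset is taken as the ratio of its supports in the two sets.

```python
# Hedged sketch of the development (growth) rate of an itemset between two
# data sets: rate = support(itemset, D1) / support(itemset, D2). A "jumping"
# itemset has support zero in one set and non-zero in the other, giving an
# infinite rate. Data and functions are illustrative, not the thesis's code.

def support(itemset, dataset):
    """Fraction of rows of the dataset containing every item of the itemset."""
    return sum(1 for row in dataset if itemset <= row) / len(dataset)

def development_rate(itemset, d1, d2):
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s2 == 0:
        return float("inf") if s1 > 0 else 0.0   # jumping itemset
    return s1 / s2

D1 = [{"Swift", "Silver"}]                                          # Approved = yes
D2 = [{"Swift", "White"}, {"Innova", "White"}, {"Swift", "White"}]  # Approved = no

print(development_rate({"Silver"}, D1, D2))  # inf: jumping itemset
print(development_rate({"Swift"}, D1, D2))   # 1.5
```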
The induction rule mining involves the identification of each sub-group
for user demanded data extraction. The task of sub-group detection is
defined as identifying the properties of the individual data in the sub-group.
The sub-groups of the data are identified in the inductive rule mining, which
provides easy extraction of data in the data mart. A sub-group
characterization is a combination of features shared by a selected set of
individuals, and is seen as the condition part of a rule:
Sub-group characterization → Class.
The sub-group characterization is a special case of the more
general rule induction mining task. Sub-group identification research has
evolved in several directions. On the one hand, exhaustive approaches
guarantee the optimal solution in data extraction given the optimization
criterion or queries. Techniques for induction rule mining together with
appropriate constraints are utilized in data extraction.
Relational sub-group characterization in induction rule mining is
designed for spatial data mining in relational spatial databases. Relational
sub-group identification is attained through properly adapting induction
rule mining and first-order feature construction. Other non-relational
sub-group identification algorithms have been developed, including an
algorithm for exploiting background knowledge in sub-group identification.
In addition, a fuzzy system proposed by Suyun Zhao et al. is utilized for
handling sub-group characterization tasks.
The interestingness of a subgroup depends on its unusualness and
size. The rule quality evaluation on the data mart therefore needs to
combine both the unusualness and the size factors. Weighted relative
accuracy is used by decision rule induction algorithms in a different
formulation and in different variants, as presented by Yuhua Qian et al.
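Weighted relative accuracy, as commonly formulated in sub-group discovery, can be sketched as follows. The thesis gives no concrete formula, so this Python fragment and its sample data are illustrative assumptions only.

```python
# Weighted relative accuracy (WRAcc), a standard sub-group quality measure
# combining size and unusualness: WRAcc = p(cond) * (p(class|cond) - p(class)).
# The formulation and the sample data are illustrative assumptions.

def wracc(examples, condition, target):
    """examples: list of (instance_dict, label) pairs."""
    n = len(examples)
    covered = [(x, y) for x, y in examples if condition(x)]
    if not covered:
        return 0.0
    p_cond = len(covered) / n                                   # size factor
    p_class = sum(1 for _, y in examples if y == target) / n
    p_class_given_cond = sum(1 for _, y in covered if y == target) / len(covered)
    return p_cond * (p_class_given_cond - p_class)              # unusualness

examples = [
    ({"car": "Swift"}, "yes"), ({"car": "Swift"}, "yes"),
    ({"car": "Innova"}, "no"), ({"car": "Innova"}, "no"),
]
print(wracc(examples, lambda x: x["car"] == "Swift", "yes"))  # 0.25
```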
In addition, the generalization quotient is also used by the decision
rule induction algorithms. The sub-group data miner uses the classical
binomial test to verify whether the target data extracted are significantly
different in a sub-group as compared to the whole data warehouse. Different
techniques are used for eliminating redundant sub-groups, and decision rule
mining algorithms are utilized to achieve rule diversity. The data extraction
technique couples induction rule mining with constraints appropriate for
descriptive rules.
Car = Swift → Approved = yes
Color = Silver → Approved = yes
Car = Innova AND Color = White → Approved = no
Company = Hyundai → Approved = yes
Car = Swift AND Color = Silver AND Company = Hyundai → Approved = yes
In the above case, the attributes are car, color and company, and
possible values exist for each attribute. The decision making lies in whether
the attribute car is Swift and the color is Silver. The induction rule mining
involves the decision making on the basis of attribute value selection. For
instance, if the car model is Innova and the color is changed to White, the
approval for data extraction is denied.
The rule induction process is conceived as a search process. A metric
is needed to estimate the quality of the rules found in the data warehouse
and to direct the search towards the best rule. Man Lung Yiu et al.
proposed metrics regarding rules with less efficient processing. The rule
quality evaluation is a key element in rule induction. In real-world
applications, a typical objective of a mining system is to find induction
rules under a rule quality measure that takes both training accuracy and
rule coverage into account, so that the induced rules are both accurate
and reliable. A quality measure is estimated from the available data. All
common measures are based on the number of positive and negative
instances covered by a rule. The decision making system based on
induction rule mining is elaborated in the following Figure 4.2.
Figure 4.2 Decision Making based on Inductive Rule Mining
Figure 4.2 illustrates the decision making based on the inductive
rule mining approach. Each data mart is generated from the data warehouse
by segregating it into individual sections. The data mart is constructed
based on the physical and logical structure. The populating data mart
design handles the tasks of receiving the data from the source, transforming
it into the exact layout, and loading it into the data mart. The attribute
relativity is identified with relevant attribute selection for each data mart.
The functional behavior pattern of the data mart is identified based on the
attribute relativity.
4.3. Efficient Analysis to Extract User Demanded Data in Data Mart
Using Inductive Rule Mining
The work is designed for efficiently extracting the user demanded
information from the data storage repositories using inductive rule mining
(IRM). Extracting the user needed information using IRM follows two sets
of operations. The first operation describes the process of the decision
making system, and the second describes the decision inductive rule mining
for extracting the information based on inductive rules derived from the
large set of repositories. The process of the complete IRM is shown in
Figure 4.3.
Figure 4.3 Architecture diagram of the IRM
Figure 4.3 illustrates the architecture of the Induction Rule
Mining. The data mart is a subset of the data warehouse and consists of a
set of information based on the operational objective of the attributes
present in the traditional organization. After organizing the data mart, the
inductive rule mining is applied to derive a set of rules based on the
functional attributes of the present data. The IRM presents a set of rules
and eases the decision process of extracting the user demanded set of
information. The formation of decision rules with the inductive rule mining
process is computed based on the processes involved in IRM.
4.3.1 Decision Support System in Data Mart
A Decision Support System is certainly an ingredient of decision
making processes. A decision is termed as the selection of one among a
number of alternatives. Decision making signifies the entire procedure of
making the choice, comprising:
i. Evaluating the problem,
ii. Gathering and validating information,
iii. Recognizing choices,
iv. Predicting consequences of decisions, and
v. Computing decisions.
Initially, the occurrence of the problem is evaluated. The set of
information the user demands is collected, and the information is validated
to clarify the correct data extraction. After information validation, the
choices are identified in the process of decision making, and the impact of
each decision, an outcome, is determined in advance. Finally, the decisions
are computed. The decisions are made based on the data mart attribute
selection using inductive rule mining.
The decision making process is done with three main strategies:
1. Intelligence,
2. Design, and
3. Choice.
The work of intelligence is to find the problem and to analyse the
problem generated. The design involves the formulation of solutions,
representation and simulation, and, at last, deciding the choice, i.e., decision
making and implementation.
In the data mart, the process of identifying the functional operational
goal of the system is analyzed and processed. The IRM solves the problem
of determining the analysis of the given validated information in the data
warehouse. The representation of the problem and the decision making
process is implemented with a solution, and IRM presents a simulation of
the information available in the data warehouse.
4.3.2 Decision Rule Induction for Extracting Information
With the promising growth in data warehousing technology, a massive
and expensive resource of information is persuaded with functional decision
rules. With the traditional method, however, the number of generated
decision rules is enormous. IRM intends a diverse strategy of inducing
certain and possible decision rules. The induction process is driven by the
query in the IRM. The information concerning the user requirement is
stored in a table structure, and decision rules are induced by posing a
query on any attribute.
Using the IRM scheme, the number of relevant inductive rules is
minimized. The framework of the inductive rule mining for extracting the
user needed information is shown in Figure 4.4.
Figure 4.4 Process of Decision rule induction for extracting user
information
The framework of decision rule induction for extracting the user
required information is invoked by the user's query. After query processing,
the supporting data structure is updated with the table and attribute names
mined from the query. The number of times an attribute is used is counted
by the column hit in the table.
The counter is sorted in descending order so that the most commonly
used attribute is set in the first row. The most commonly used attribute is
the one referred to most often by user queries. Range queries on the table
are processed efficiently, ensuring data access as per the proposal of
Theoni Pitoura et al. Hence, this ordering is important in producing decision
rules on the attribute value. The approach of inducing decision rules
supported by the most commonly queried attribute is described in the
following algorithm.
An Efficient Decision Rule Induction Algorithm
Input: User's query and a data warehouse
Step 1: Extract table names Ti and attribute names Aj from the query in
the DW.
Step 2: Process the attribute ranking (AR) table and revise the hit
counter recognized for each Ti and Aj.
Step 3: Sort the AR-table on the hit value in descending order.
Step 4: Extract the top row of the AR-table to obtain T1 and A1.
Step 5: Create a decision table A = <U, A, d>, where d = A1, A = the set
of attributes in T1, and U = the set of records in T1.
Step 6: Pre-process A by
i. eliminating attributes whose number of distinct values = |T1|
ii. discretizing attributes with real values
Step 7: Partition U into similarity classes.
Step 8: Generate certain, negative and positive rules.
Step 9: Generalize all decision rules using dimension tables and
hierarchical information from the data warehouse.
Step 10: Include the rules in the knowledge base.
Output: Decision rules
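The steps above can be sketched in code as follows. The thesis implementation is in Java; this is a simplified, hypothetical Python rendering of Steps 1 to 8 in which query parsing, the AR-table, and rule generation are reduced to their simplest form, and the pre-processing and generalization steps (Steps 6 and 9) are omitted.

```python
# Hypothetical, simplified rendering of the decision rule induction algorithm
# (Steps 1-8); the query format, warehouse layout and rule labels are
# illustrative assumptions, not the thesis's Java implementation.
import re
from collections import Counter, defaultdict

def extract_names(query):
    """Step 1 (simplified): table name after FROM, attribute names before '='."""
    table = re.search(r"FROM\s+(\w+)", query, re.I).group(1)
    attrs = re.findall(r"(\w+)\s*=", query)
    return table, attrs

def decision_rules(query, warehouse, ar_counter):
    # Steps 1-2: extract names and revise the hit counter per (table, attribute).
    table, attrs = extract_names(query)
    ar_counter.update((table, a) for a in attrs)
    # Steps 3-4: the most frequently queried pair gives T1 and the decision d = A1.
    (t1, a1), _ = ar_counter.most_common(1)[0]
    records = warehouse[t1]                   # Step 5: decision table over T1
    cond_attrs = [a for a in records[0] if a != a1]
    # Step 7: partition the records into similarity classes on the condition attrs.
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[a] for a in cond_attrs)].append(rec[a1])
    # Step 8: one decision value per class yields a certain rule, else a possible one.
    rules = []
    for cond, decisions in classes.items():
        kind = "certain" if len(set(decisions)) == 1 else "possible"
        rules.append((dict(zip(cond_attrs, cond)), decisions[0], kind))
    return rules

warehouse = {"cars": [
    {"car": "Swift", "color": "Silver", "approved": "yes"},
    {"car": "Swift", "color": "White", "approved": "no"},
    {"car": "Innova", "color": "White", "approved": "no"},
]}
ar = Counter()
rules = decision_rules("SELECT * FROM cars WHERE approved = 'yes'", warehouse, ar)
for conditions, decision, kind in rules:
    print(kind, conditions, "->", decision)
```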
Decision rules are normally used in the classification and prediction of
data obtained in the data mart. A decision rule is nevertheless a dominant
way of data representation. The models formed by decision rules are
characterized and identified, and the possible rules of the data are identified
and referred to by the user queries. The induction based decision rule
algorithm is used to select the hit count of each attribute in the data storage
repositories. The attribute with the highest hit count is selected to form the
similarity classes for the data warehouse. The highest hit attribute reduces
the information to be searched. In addition, the hit attribute provides the
data demanded by the user and categorizes the attributes in the resulting
partitions.
4.4. Experimental Evaluation on Induction Rule mining
The IRM approach to extracting user demanded information is
analyzed based on inductive rule mining and is implemented using Java.
An experimental study is done to estimate the effectiveness and performance
of extracting the user required data from the data mart using inductive rule
mining.
The effectiveness of the IRM is estimated on the Insurance Company
benchmark dataset from the UCI repository, with varying characteristics.
The information about customers consists of 86 variables and includes
product usage data and socio-demographic data derived from zip area
codes. The work, however, involves 10 attributes taken from the dataset for
evaluating the performance of the IRM technique for extracting user
demanded information from the repositories. The attributes used in the
customer dataset are name, age, education, income, marital status, number
of houses, number of children, number of insurance policies, number of
motorcycle / car policies and number of life insurances.
Based on the formation of inductive rules, the user demanded
information is extracted from the data repositories stored in the data mart.
The inductive rules are formed based on the decision rules, and the decision
is made efficiently by choosing the attributes related to the operational goal.
The performance of extracting user demanded information from the data
mart using inductive rule mining is measured in terms of the following:-
i) Decision Rules,
ii) Extracted Data Relativity,
iii) Reliability,
iv) Running Time, and
v) Rule Coverage.
Decision rules are a set of functions; a decision rule maps an
observation to an appropriate action. The decision rules play an important
role in data warehousing concepts to determine the set of rules for
extracting the user demanded information from the data storage
repositories. After the formation of the decisive rules, the user required
data are retrieved from the repositories. The relativity of the retrieved data
is found using the decisive rule induction algorithm in terms of the extracted
data relativity process. Reliability is the term used to identify the reliable
manner of data extraction from the data storage repositories. The time
taken to extract the user required data from the data mart, using inductive
rules, is termed the running time.
4.5. Results and Discussion
The inductive rule mining is compared to the functional behavior
pattern in data mart based on attribute relativity. Extracting user demanded
information in the data mart using inductive rule mining is effective in terms
of the number of decision rules, extracted data relativity, running time, rule
coverage, and reliability of the data mart, by maintaining the decision rules
in the data mart in a successful manner.
The functional behavioral pattern of the data mart for the organization
is efficiently analyzed and viewed. In addition, the use of inductive rule
mining enhances the extraction of user demanded data. The experiments
are conducted with the insurance company benchmark data sets to estimate
the performance of the IRM. The tables and graphs below describe the
performance of extracting user demanded information in the data mart
using inductive rule mining against the existing scheme.
The output of IRM is extracted from the information provided
by the user. For example, if a user selects one attribute, IRM displays
the output based on the available and demanded information.
4.5.1 Decision Rules
Decision rules are a set of functions; a decision rule maps an
observation to an appropriate action. The decision rules play a vital role in
data warehousing concepts, to decide the set of rules for extracting the user
required information from the data storage repositories. Decision rules are
generally used in the classification and prediction of data obtained in the
data mart. A decision rule is straightforward but is a dominant way of data
representation. The models formed by decision rules are characterized and
identified, and the possible rules of the data are identified and referred to
by the user queries.
In table 4.1, the number of user queries ranges from 5 to 35 in steps
of 5, selected for experimental purposes. The queries are issued by the
user, and the reply is provided according to the user demand.
Table 4.1 No. of User Queries Vs. Decisive Rules

No. of user queries    Decision rules (%)
                       FBPDM    IRM
5                      2        6
10                     7        10
15                     11       14
20                     13       18
25                     15       22
30                     21       28
35                     24       31
The table 4.1 describes the decisive rules formed in the data
mart. The FBPDM is used for analyzing the functional behavior of the user
required information. The decision rules based on the user queries in
FBPDM and IRM are measured in terms of percentage.
The outcome of the IRM for extracting user demanded data is
compared with the existing functional behavioral pattern in data mart.
The information concerning the user requirement is stored in a table
structure, and decision rules are induced by posing a query on any
attribute. Based on the table 4.1, a graph is depicted as follows:-
Figure 4.5 No. of User Queries Vs. Decisive Rules
The graph (Figure 4.5) describes the decisive rules formed in the
data mart. The IRM for data mart efficiently derives the decision rules,
and the variance is about 5-15% higher compared to FBPDM.
The inductive rule mining is applied to derive a set of rules based on the
functional attributes of the data present.
4.5.2 Data Relativity
The relativity of the retrieved data is found using the decisive rule
induction algorithm in terms of the extracted data relativity process.
The decision rules are formed on the basis of rule induction, and after the
formation of the decisive rules, the user required data are retrieved from
the repositories.
Data relativity enables corporations to reduce the difficulty and
organize the costs associated with the enterprise. The data relativity allows
corporations to customize workflows more specifically to the needs of the
users, giving enterprises an extra flexible platform. The workflow is
automated and linked to custom validation logic, guaranteeing work
products that are distinguished, justifiable and easily analyzed.
Table 4.2 Data Mart Vs. Extracted Data Relativity

Data Mart    Extracted data relativity (%)
             FBPDM    IRM
5            35       48
10           38       51
15           42       54
20           47       59
25           50       62
30           53       64
35           59       67
The table 4.2 describes the extracted data relativity of the functional
data attributes requested by the user. The data mart acts as the access
layer of the data warehouse. The user requires efficient extraction of data
from the data mart, and the relativity is not the same across user queries
and data marts. The outcome of the IRM for extracting user demanded
data is compared with the functional behavioral pattern in data mart.
Based on the table 4.2, a graph is depicted as follows:-
Figure 4.6 Data Mart Vs. Extracted Data Relativity
The graph in Figure 4.6 describes the extracted data relativity of the
functional data attributes requested by the user. The extracted data
relativity using the data mart is measured in terms of percentage. FBP for
data mart only identifies the process of the data mart, but the IRM for data
mart efficiently extracts the user demanded information, and the variance
is 10-15% higher. Based on the number of decisive rules formed by the
decision inductive rule algorithm, the extracted data relativity present in
the data mart is higher with the IRM.
4.5.3 Reliability
Reliability is the term used to identify the reliable manner of data
extraction from the data storage repositories. Reliability is used to illustrate
the overall stability of a data extraction measure. A measure of data
extraction is said to have high reliability if it produces consistent results
under consistent conditions. Data reliability is the ability of an inductive
rule mining system to perform data extraction, as per the user requirement
functions, under stated query conditions for a specified period of time.
As the number of users increases, the total Hz increases automatically
(Juan Manuel Pérez et al.). For example, a single user gains 2 Hz, i.e.,
Gained Hz = 2, and
Total Hz = Number of users × Gained Hz
Table 4.3 Number of Users Vs. Reliability

No. of users    Reliability (Hz)
                FBPDM    IRM
10              28       44
20              31       50
30              35       54
40              38       58
50              43       64
60              49       68
70              52       73
The Table 4.3 describes the reliability of the IRM in the data mart.
The inductive rule mining (IRM) provides high data reliability, as per the
above table values, compared with the Functional Behavior Pattern in Data
Mart (FBPDM). Based on the table 4.3, a graph is depicted as follows:-
Figure 4.7 No. of users Vs. Reliability
The graph in Figure 4.7 describes the reliability of inductive rule
mining in data extraction. The IRM provides data reliability around
20-25% higher compared to FBPDM. Reliability is measured based on the
user count in terms of Hertz (Hz). The user requested information in the
data mart is extracted based on the decisive rules formed accurately.
Even though the number of users increases, the reliability of the data mart
remains high with the IRM.
4.5.4 Running Time
The running time is measured based on the data set loading time
and the rule induction time. The time taken to execute the inductive rule
mining algorithm in extracting the user demanded data is termed the
running time, and it is measured in seconds. The speed acceleration varies
among the data sets, and the improvement is more significant on large data
sets. To prove the better performance of the IRM in terms of running time,
it is compared with FBPDM.
Table 4.4 Data Extraction Vs. Running Time

Data Extraction    Running Time (seconds)
                   FBPDM    IRM
10                 27       14
20                 31       27
30                 43       34
40                 57       43
50                 64       56
60                 78       68
70                 83       74
The Table 4.4 describes the running time of the IRM in the data
mart. The outcome of the IRM for extracting user demanded data is
compared with the functional behavioral pattern in data mart. Based on the
table 4.4, a graph is depicted as follows:-
Figure 4.8 Data Extraction Vs. Running Time
The graph in Figure 4.8 describes the running time of inductive rule
mining in data extraction. IRM takes around 10-20% less running time
compared to FBPDM. The running time of FBPDM and IRM for data
extraction is measured in seconds. The user requested information in the
data mart is extracted accurately based on the decisive rules formed. Even
though the number of users increases, the running time of the data mart
remains low in IRM.
4.5.5 Rule Coverage
The rule induction process is considered a data search process in the
data mart. The rule coverage metric is required to determine the quality of
rules established in the inductive rule mining, directing the data extraction
towards the best rule. The rule quality measure is a main element in rule
induction.
In real-world applications, a typical purpose of a decision support
system is to find rules that optimize a rule quality criterion, extracting data
with both training accuracy and rule coverage taken into account. Thus, the
inductive rule mining is both accurate and reliable. A quality measure of
rule induction is determined from the obtainable data. All common
measures are based on the number of positive and negative instances
covered by a rule.
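The common measures mentioned above can be sketched as follows. Since the exact criterion is not specified here, this shows two standard ones, coverage and training accuracy, with illustrative names:

```java
// Sketch of rule quality measures based on the positive (p) and
// negative (n) instances a rule covers, as described above.
// The exact measure used in the thesis is not specified, so this
// shows two common ones. Names are illustrative.
public class RuleQuality {

    // Fraction of all instances the rule covers.
    public static double coverage(int p, int n, int totalInstances) {
        return (double) (p + n) / totalInstances;
    }

    // Training accuracy: fraction of covered instances that are positive.
    public static double accuracy(int p, int n) {
        return (double) p / (p + n);
    }

    public static void main(String[] args) {
        // A rule covering 40 positive and 10 negative of 100 instances.
        System.out.println(coverage(40, 10, 100)); // prints 0.5
        System.out.println(accuracy(40, 10));      // prints 0.8
    }
}
```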
The IRM uses inductive rule mining for handling the user demanded
information. The decision support is efficiently done with inductive rule
mining on the functional data mart segregation of the layered data
repository.

Table 4.5 Data Mart Vs. Rule Coverage
Data Mart    FBPDM (%)    IRM (%)
10           38           54
20           44           57
30           52           64
40           58           73
50           62           76
60           74           88
70           87           94
The Table 4.5 illustrates data mart versus rule coverage. The
outcome of the IRM for extracting user demanded data is compared with
Functional behavioral pattern in data mart. The rule coverage based on
the data mart is measured in terms of percentage (%). Based on the
table 4.5, a graph is depicted as follows:-
Figure 4.9 Data Mart Vs. Rule Coverage
The graph in Figure 4.9 describes the rule coverage quality measure
of inductive rule mining. IRM provides better rule induction, by about
10-15%, compared to FBPDM, as the user requested information in the
data mart is extracted accurately based on the decisive rules formed. Even
though the number of users increases, the reliability of the data mart remains
high in IRM.
Finally, IRM facilitates extracting user demanded data in the data
mart using the decision rule induction algorithm. The experiments conclude
the efficient working of IRM in terms of metrics such as decision rules, data
relativity, reliability, running time and rule coverage. Evaluations are done
with the benchmark datasets using the decisive rules based on the
decision rule induction algorithm, and the user demanded information is
taken out in a reliable manner.
4.6. Summary
The functional behavioral pattern in data mart, utilized and
processed in the analysis of data, is unsatisfactory for the extraction of
user demanded data. The issue raised over the functional behavioral pattern
is that it does not efficiently retrieve the user demanded information based
on attribute relativity. IRM uses inductive rule mining for handling the user
demanded information. The decision support is efficiently done with
inductive rule mining on the functional data mart segregation of the layered
data repository.
The framework of decision rule induction for extracting the user
required information is invoked by the users' query. After query processing,
the supporting data structure is restructured with the tables and attribute
names mined from the query. The decision support system is achieved in a
reliable manner to extract the user demanded data from the data
warehouse. IRM is efficiently developed with the inductive rule mining
approach to extract the user demanded information in the data mart.
Experimental results showed that inductive rule mining in the data mart is
analyzed efficiently, and performance is measured in terms of decisive rule
formation, data relativity, reliability, running time and rule coverage, in contrast
to FBP for data mart based on attribute relativity.
At the user's level, the attribute data is selected from the insurance
dataset. Using inductive rule mining based on the user demanded data,
segregation is made in an efficient manner. As a result, data is
easily transferred to the user.
Publication
1. “Mining User Demanded Data From Data Mart Using Inductive Rule
Mining”, in the International Journal of Computer Science Issues on
Volume 9, Issue 4, No.1, July 2012, pp.405-410
5. AN EFFICIENT FUNCTIONAL LAYERED DATA MART
BASED ON NEW TECHNIQUE OF
INDUCTIVE RULE MINING
5.1 Introduction
Data mart is a division of the data warehouse, generally oriented to a
specific business line. Data warehousing collects the data at different levels. Data
are stored as a collective data repository with better storage efficiency.
Various data warehousing models concentrate on storing the data more
efficiently and retrieving data quickly. Data stored in the repository are
segregated into data layers. The functional behavior of the corporate
system is analyzed to build layers of data storage repositories with relevant
data attributes.
Then, the decision support system is applied with inductive rule
mining on functional data in the layered data repository. Inductive rule
mining (IRM) improves the decision making in retrieving data from data
mart. In addition, IRM facilitates better understanding of the structure in
extracting data from the warehouse. The functional behavior pattern in
data mart (FBPDM) with attribute relativity is compared and the result is
analyzed with existing multi-functional data warehousing model (MFDWM).
The inductive rule mining is compared with the functional behavior pattern
in data mart.
The objective of the case study on performance analysis of data mart in
functional layered repositories with induction rule mining is to offer an
experiential and scientific foundation in extracting user required data from
data mart. The effectiveness of the IRM mechanism is estimated on
benchmark datasets from UCI repository with varying characteristics.
Performance of functional behavior in data mart and inductive rule mining
in extracting user demanded data is evaluated with Insurance Company
Benchmark (COIL 2000) data set from UCI Repository data sets.
Insurance Company benchmark data set, used in the CoIL 2000
Challenge, contains information on customers of an insurance company.
The characteristic of data set is multivariate and attribute characteristic is
categorical and integer. The data consists of 86 variables, 9,000 instances
and includes product usage data and socio-demographic data. The data
was supplied by the Dutch data mining company Sentient Machine
Research and is based on a real world business problem. The training set
contains over 5,000 descriptions of customers, including the information on
whether they hold a caravan insurance policy. A test set contains 4,000
customers with or without a caravan insurance policy.
Data warehousing collects the insurance company data. Insurance
company data are stored as a collective data repository with better storage
efficiency. Initially, the functional behavior of the corporate system is
analyzed as policies, based on annual income of the customer. Based on
the number of insurance policies, the insurance company repositories are
constructed in layers including car policies, motorcycle/scooter policies,
agricultural machines policies, life insurances, family accidents insurance
policies, fire policies and boat policies with relevant attributes in each
policy. Based on the customer annual income and career, decision making
system decides the insurance policies.
The closely related attributes such as Customer Subtype, Career,
Income and Third party insurance are chosen from insurance company
organization. The functional behavior of the insurance company is
identified as insurance policies with these attributes. Based on the
formation of inductive rules, the user demanded information is extracted
from the insurance company repositories.
The inductive rules are formed based on the decision rules and the
decision is done efficiently by choosing the attributes related to operational
objective of the insurance company. Usually in the data mart, the
extraction of information is done based on users' queries. Efficient
processing of range queries on a table ensures data access, as proposed by
Theoni Pitoura et al.
data mart consists of various policy forms. The policy form of the requested
customer subtype is retrieved, depending upon the users query.
The performance analysis of data mart in functional layered
repositories is evaluated with induction rule mining. Extracting user
demanded information in data mart using inductive rule mining is
implemented using Java. The experiments run on an Intel P-IV machine
with 2 GB memory and 3 GHz dual processor CPU. An experimental study
is presented to estimate the effectiveness and performance of the
functional behavior pattern and IRM in data mart. Several experiments are
conducted and evaluations are shown to prove the better extraction of
user required data in data mart.
5.1.1 Attribute Relativity in Insurance Data Mart
Data mart is built with a well-known attributes behavior of the
insurance company data stored in the data warehouse. A data warehouse
system maintains the insurance company data in an arbitrary manner.
In addition, data warehouse handles the insurance data mart, customer
profile data mart and policy data mart based on the functional behavior of
the data in the data warehouse environment. For instance, the insurance
data mart is considered for evaluation.
The relevant attributes such as Customer Subtype, Career, Income,
policies and Third party insurance are chosen from insurance data mart.
The functional behavior of the insurance data mart is identified based on
the relevant attributes of the insurance data mart. The below table and
graph describe the performance of the functional behavior pattern in data
mart based on attribute relativity, in comparison to the multi-functional data
warehousing model as proposed by Neoklis Polyzotis et al.
Table 5.1 Tabulation for Attribute Relativity
No. of Attributes in Insurance Data Mart    MFDWM (%)    FBPDM (%)
10                                          10           5
20                                          19           12
30                                          28           16
40                                          37           23
50                                          48           26
60                                          56           32
70                                          64           45
The table 5.1 illustrates the relevant attribute selection from the number of
attributes in the insurance data mart. The attribute relativity based on the
insurance data mart is measured in terms of the percentage. For each
process, different relative attributes with respect to time are taken from the
dataset, and the values are calculated using the given formula. Since
different relative attributes are used each time, the calculated values differ
each time. As the number of attributes increases, the attribute relativity
also increases by the above formula.
Based on the table 5.1, a graph is depicted as follows:-
Figure 5.1 No. of Attributes in Insurance Data Mart Vs. Attribute Relativity
The graph in Figure 5.1 illustrates the attribute relativity of the insurance
company data attributes present in the insurance data mart for analyzing the
functional behavior. A high attribute relativity, of about 25-35%, is seen in
the FBPDM compared to the multi-functional model in data warehousing,
as each individual attribute's character is studied in designing the logical
storage of the insurance data mart, facilitating the selection of relevant attributes.
5.2 Functional Behavior Analysis on Attribute Relativity
The operational objective of insurance company decides the
functional behavior pattern of the insurance data mart. The operational
objective of the insurance data mart is identified, based on the selection of
relevant attributes like Customer Subtype, career, income, policies and
third party insurance. Generally, the information in the insurance data mart,
customer profile data mart and policy data mart is managed with relevant
selection of attributes for each data mart.
Table 5.2 Tabulation of Functional Behavior Analysis
No. of Attribute Relativity    MFDWM (%)    FBPDM (%)
5                              5            1
12                             12           4
16                             16           6
23                             22           8
26                             27           9
32                             30           12
45                             39           15
The table 5.2 describes the functional behavior analysis made on
different sets of attribute relativity. The operational goal of the insurance
company is to register insurance policies. The insurance policies are the
functional behavior of the insurance organization which is decided on the
basis of attribute relativity. The functional behavior of insurance data mart
is mainly to provide insurance policies to different fields. In this case, policy
of customer in the insurance data mart is recognized as functional behavior
of insurance company.
Figure 5.2 No. of Attribute Relativity Vs. Functional Behavior Analysis
The graph in Figure 5.2 describes the efficient analysis of functional
behavior with relevant attributes. The functional behavior analysis based
on the attribute relativity is measured in terms of the percentage. The
functional behavior of insurance data mart with various attribute
relativities is identified, and the variance is 25-45% compared to MFDWM.
The operational goal of the insurance company is recognized on the basis
of attribute relativity in analyzing functional behavior pattern.
5.3 Formation of Functional Layered Repositories based on
Insurance Policies
The functional behavior aims at building layers of data storage
repositories with relevant data attributes using functional behavior pattern
in data mart. A collection of all the necessary customer requirements are
gathered for designing insurance data repositories. In addition, the
insurance data source of the collected data is identified to enhance the
data mart design. After identifying insurance data sources, the physical
and logical structure of data mart is analyzed and designed for better data
accessing.
The physical and the logical construction in connection to the data
mart support rapid and proficient access to the data in insurance data mart.
The functional behavior is recognized on the operational goal of the
organization i.e., based on the insurance company policies, the data are
stored in the repositories as layers. Based on the insurance policies, the
insurance company repositories are constructed in layers containing car
policies, motorcycle/scooter policies, agricultural machines policies, life
insurances, family accidents insurance policies, fire policies and boat
policies with relevant attributes in each policy.
Table 5.3 Tabulation for Functional Layered Repositories
No. of Insurance Policies    MFDWM (kb)    FBPDM (kb)
1                            5             2
2                            8             4
3                            12            6
4                            14            9
5                            17            11
6                            20            13
7                            24            16
The table 5.3 shows tabulation of functional layered insurance
repositories with the insurance policies. The insurance layered repositories
minimize the storage size. The insurance data storage repositories efficiency
is decided, based on the memory occupied by the data in the insurance data
mart. Based on the table 5.3, a graph is depicted as follows:-
Figure 5.3 No. of Insurance Policies Vs. Functional Layered Repositories
The graph in Figure 5.3 describes the functional layered insurance
repositories in efficient analysis of policies as the functional behavior in
data mart. The data storage is reduced in FBPDM increasing data
accessibility by about 10-20% compared to MFDWM. The layered
formation of the insurance data mart enhances efficient data retrieval.
The interface transitional deposit for the front-end tool is designed to
utilize the insurance database structures in data storage. The functional
layered repositories, based on the number of insurance policies, are
measured in terms of Kilobytes (KB).
5.4 Efficient Technique for Retrieving data from Functional Layered
Insurance Repositories
The layered formation of the insurance repositories on the basis of
functional data attributes from the insurance data mart improves the speed of
data retrieval. The populating design of data storage repositories facilitates the
rapid retrieval of data from the insurance data mart. The performance efficiency
of high-speed data retrieval in FBPDM is proved with the table and graph as
follows:-
Table 5.4 Tabulation for Data Retrieval
Functional Layered Insurance Repositories    MFDWM (s)    FBPDM (s)
1                                            0.95         0.46
2                                            1.48         0.89
3                                            2.18         1.56
4                                            2.54         1.89
5                                            2.97         2.11
6                                            3.20         2.63
7                                            3.64         2.97
The table 5.4 describes the retrieval of data from functional layered
insurance repositories. The outcome of the functional behavior pattern in
data mart is compared with a multi-functional model in data warehouse.
Data retrieval is measured in terms of seconds based on the functional
layer insurance repositories. Based on the table 5.4, a graph is
depicted as follows:-
Figure 5.4 Functional Layered Insurance Repositories Vs. Data Retrieval
The graph in Figure 5.4 describes the process of retrieving data from
functional layered insurance repositories with functional attributes. Based on
the number of attribute relativity, the rapid data retrieval from data mart is
high by about 10-20% in the FBPDM in contrast to multi-functional model
in data warehouse. The insurance data storage repositories are constructed
in layers based on the functional behavior policies pattern.
5.5 Decision Making on Functional Layered Insurance Repositories
A decision support system is certainly a part of the decision making
process. A decision is termed as the selection of one among a number of
alternatives. Decision making on functional layered insurance repositories forms
the decision rule induction for extracting user demanded data in the data mart.
Decision making signifies the entire procedure of making the choice.
The set of demanded customer's policies are collected. The
insurance policy information is validated to clarify the correct data
extraction. After insurance policy data validation, the choices are identified
in the process of decision making. Finally, the decision on utilizing the
policy insurance for the customer is reported. The policy insurance
decisions are made, based on the insurance data mart attribute selection.
Decision support system facilitates the inductive rule mining.
Table 5.5 Tabulation for Decision Support System
Functional Layered Insurance Repositories    FBPDM (%)    IRM (%)
1                                            71           89
2                                            69           85
3                                            67           78
4                                            61           74
5                                            58           70
6                                            52           67
7                                            48           62
The table 5.5 describes the decision made on the insurance
repositories. The problem occurrence is solved with decision on insurance
data validation. After policy information validation, the choices are
identified in the process of decision making. The impact of decision making
in insurance repositories is determined beforehand to confirm the data
extraction. Decision Support System based on the Functional Layered
Insurance Repositories is measured in terms of percentage.
Figure 5.5 Functional Layered Insurance Repositories vs. Decision
Support System
The graph in Figure 5.5 illustrates the decision made on the functional
layered insurance repositories to enhance the user demanded data extraction.
The capacity of decision making in IRM is high by about 15-20%
compared to FBPDM. The decision support system works in terms of
intelligence, design and choice. The intelligence finds the problem occurred in
insurance data mart and analyses the problem generated. The design involves
the formulation of solutions and deciding choice in decision making
implementation.
5.6 Efficient Decision Rule patterns
With the promising growth of data warehousing technology, a
massive and expensive resource of information is processed with functional
decision rules. Several decision rule formations are derived from the
insurance data mart, but with the traditional method the number of
produced decision rules is enormous. The decision rule formation on the
basis of insurance repositories is depicted in the table below:-
Table 5.6 Tabulation for Decision Rule Pattern
Functional Layered Insurance Repositories    FBPDM (%)    IRM (%)
1                                            61           69
2                                            52           59
3                                            42           47
4                                            36           41
5                                            33           39
6                                            25           30
7                                            19           28
The table 5.6 shows the decision rule pattern formation based on the
insurance repositories. Decision rule pattern based on the functional
layered insurance repositories is measured in terms of the percentage (%).
Based on the table 5.6 a graph is depicted as follows:-
Figure 5.6 Functional Layered Insurance Repositories vs. Decision
Rule Pattern
The graph in Figure 5.6 illustrates the decision rule pattern in the
functional layered insurance repositories. The decision rule pattern
formation is slightly higher in IRM, by about 5-10%, compared to FBPDM, as
decision rules are formed on the basis of decision making in the decision
support system.
5.7 Decision Rule Based on Functional Attribute
The decision rules form an enormous and costly resource of
information in data warehousing technology with its hopeful development.
The number of decision rules engendered from the insurance repositories
with decision making is enormous. In addition, IRM intends a diverse
strategy of inducing definite and probable decision rules. IRM derives a set
of decision rules based on the functional policy attribute in the insurance
data mart. The decision rule patterned with the decision support system is
enhanced with the decision rule based on the functional attribute.
Table 5.7 Functional layered Insurance Repositories Vs. Decision Rule
Functional Layered Insurance Repositories    FBPDM (%)    IRM (%)
1                                            61           75
2                                            52           67
3                                            42           58
4                                            36           53
5                                            33           50
6                                            25           41
7                                            19           37
The table 5.7 shows the decision rule formation based on the
insurance repositories. Decision rule is measured in terms of percentage
based on the Functional Layered Insurance Repositories. Based on the
table 5.7, a graph is depicted as follows:-
Figure 5.7 Functional Layered Insurance Repositories Vs. Decision Rule
The graph in Figure 5.7 elaborates the decision rule based on the
functional policy attribute in the functional layered insurance repositories.
The decision rule based on the functional policy attribute is higher in IRM,
by about 15-20%, compared to FBPDM, as IRM solves the problem of
decisive analysis of the specified authorized insurance data in the insurance
company data warehouse.
5.8 Rule Induction with Decision Rules
The rule induction process is considered as a policy data search
process in insurance data mart. The rule coverage metric is required to
determine the quality of decision rules established in the inductive rule
mining to direct the data extraction towards the best decision rule. The rule
quality measure is a main element in rule induction.
In real-world applications like an insurance company, a typical purpose of a
decision support system is to find decision rules. The decision rule determined
works by optimizing a rule quality standard to extract insurance data, with both
training accuracy and rule coverage taken into account. Thus, the inductive rule
mining is both accurate and reliable. A quality measure of rule induction is
determined from the obtainable insurance data. All common measures are based
on the number of positive and negative instances covered by a rule.
Table 5.8 Tabulation for Rule Induction
Decision Rules    FBPDM (%)    IRM (%)
1                 61           88
2                 57           82
3                 52           78
4                 46           73
5                 43           70
6                 38           62
7                 34           57
The table 5.8 shows the rule induction in directing the insurance data
extraction towards the best decision rule. The rule induction is measured in
terms of the percentage (%), and the decision rule values vary over the range
1 to 7. Decision rules are generally used in the classification and prediction
of data obtained in the data mart, and they are induced by posing a query
on any attribute. Decision rule induction for extracting the user required
information is invoked from the users' query. Once query processing is
performed, the supporting data structure is restructured with the tables and
attribute names mined from the query.
Based on the table 5.8, a graph is depicted as follows:-
Figure 5.8 Decision Rules Vs. Rule Induction
The graph in Figure 5.8 elaborates the determination of the best
decision rule based on the functional policy attribute in the functional
layered insurance repositories using rule induction. Inducing the best
decision rule based on the functional policy attribute using IRM is higher
by about 20-30% compared to FBPDM. The insurance company information
concerning customer requirements is stored in a table structure, and the decision
rules are induced by posing a query on the policy attribute value.
5.9 Efficient Induction Rule Mining in Extracting User Demanded Data
The framework of the decision rule in induction rule mining for extracting
the user required information is invoked by the users' query. After query
processing, the supporting data structure is restructured with the customer
profile tables and the policy attribute names mined from the query. The
attribute count is calculated from the column hits in the table form. The counter
is sorted in descending order to place the most commonly used attribute in the
first row.
The most commonly used attribute is the one referred to by user queries.
The induction based decision rule algorithm selects the hit attribute for each
policy attribute in the functional layered insurance storage repositories. The
attribute with the highest hit count is selected as the similarity class for the data
warehouse. For example, in the accident insurance policy table, the hit attribute is
vehicle price, referred to by the query 'In table (Accident Insurance policy)
vehicle price ≥ 3 lakhs'. Finally, the query results in the extraction of the user
demanded data from the insurance data mart.
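The hit-count bookkeeping described above, counting column hits per attribute and selecting the most commonly used one first, can be sketched as follows; the class and attribute names are illustrative assumptions, not the thesis implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the hit-count bookkeeping described above: each query
// increments a counter for the attribute (column) it references, and
// the attribute with the highest hit count is selected first.
// Class and attribute names are illustrative assumptions.
public class HitCounter {

    private final Map<String, Integer> hits = new HashMap<>();

    // Record one query hit against an attribute.
    public void recordHit(String attribute) {
        hits.merge(attribute, 1, Integer::sum);
    }

    // Attribute with the highest hit count (most commonly used).
    public String topAttribute() {
        return hits.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        HitCounter counter = new HitCounter();
        counter.recordHit("vehiclePrice");
        counter.recordHit("vehiclePrice");
        counter.recordHit("customerSubtype");
        System.out.println(counter.topAttribute()); // prints vehiclePrice
    }
}
```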
Table 5.9 Data Mart Vs. Extracting User Demanded Data
Data Mart    FBPDM (s)    IRM (s)
1            64           53
2            67           56
3            70           59
4            75           62
5            82           66
6            87           68
7            92           72
The table 5.9 shows the induction rule mining in extracting user
demanded data from the data mart. The formation of decision rules is recognized
by computing the inductive rule mining process, enhancing the decision rule
induction algorithm. Using the IRM scheme, the relevant inductive rules are
minimized. Extracting the user demanded data is measured in terms of
seconds (secs) based on the data mart group.
Figure 5.9 Data Mart Vs. Extracting User Demanded Data
The graph in Figure 5.9 describes the user demanded data
extraction in the data mart. IRM provides better extraction of user
demanded data, by about 15-25%, compared to FBPDM, as the highest
hit attribute reduces the information searched and provides the user
demanded data by categorizing the attributes in the resulting partitions.
5.10 Functional Behavior Analysis
Functional Behavior depends on attribute relativity which specifies
the rate of the attributes present in the data mart, closely related with the
operational goal of organization.
Table 5.10 No. of Data Marts Vs. Analysis of functional behavior
Data Mart    FBPDM (%)    IRM (%)
1            5            10
2            8.4          15.3
3            12.5         19.2
4            15.7         21.5
5            18           24
6            22           28
7            25.6         30.5
Table 5.10 describes the functional behavioral analysis of the data
mart and the outcome of the proposed FBP for data mart is compared with
an existing multi-functional model in DW using attribute relativity.
The analysis of functional behavior based on the data mart is measured in
terms of percentage.
Fig. 5.10 No. of Data Marts Vs. Analysis of functional behavior
The Figure 5.10 describes the efficiency of the analysis of the functional
behavior of the data mart. The proposed Functional Behavior Pattern
chooses the relevant attributes based on the operational goal of the data
mart. Since the relativity of attributes is high, the efficiency of the
proposed FBP for data mart is also high.
The existing multi-functional data model does not analyze the functional
behavior of the data mart, but the proposed FBP does. Based on the number of
data marts, the analysis of functional behavior in the data mart is
45-50% higher in the proposed FBP for data mart in contrast to the existing
multi-functional model in DW.
5.11 Performance result of Memory Consumption
Memory consumption is the amount of space taken to store the data
in the data warehouse. The percentage of total memory contributed to a
mechanism is called the memory consumption rate. The objective of
memory consumption analysis is to decrease the amount of energy required
to offer quality data retrieval from the data warehouse.
Table 5.11 Transaction Density Vs. Memory consumption
Transaction Density    FBPDM (MB)    IRM (MB)
10                     10            8
20                     14            13
30                     16            15
40                     20            18
50                     24            20
60                     26            22
70                     29            26
80                     34            30
Table 5.11 describes the memory consumption of the IRM and FBPDM
models. The outcome of the proposed IRM for the data warehouse is compared
with the existing system based on the transaction density. Transaction density,
also called transaction, refers to an item set in each data. For a pair of an item
set and a transaction set, the density is defined by the average number of items
included in a transaction, for data transacted within a limitation of time.
Transaction density = Number of transaction data × Time (ms)
Memory Consumption is measured in terms of Mega Bytes (MB).
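The transaction density formula stated above can be sketched directly; the class and parameter names are illustrative assumptions:

```java
// Sketch of the transaction density formula stated above:
// Transaction density = Number of transaction data * Time (ms).
// Names are illustrative assumptions.
public class TransactionDensity {

    // Density for a count of transaction data over a time window in ms.
    public static long density(long transactionCount, long timeMs) {
        return transactionCount * timeMs;
    }

    public static void main(String[] args) {
        // 10 transaction data items over 5 ms.
        System.out.println(density(10, 5)); // prints 50
    }
}
```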
Fig 5.11 Transaction Density Vs. Memory consumption
The Figure 5.11 describes the memory consumption of data from the data
warehouse for analyzing the functional behavior of the user needed information.
As the transaction density increases, the memory usage decreases gradually.
Based on the transaction density, the memory consumption of IRM is 25-30%
less when compared to the FBPDM model. The higher the transaction density,
the lower the memory consumption of the data mart. IRM consumes less
memory for the storage of information in the data mart, measured in terms of
Megabytes (MB).
5.12 System Response Time
The interval between the instant at which an operator at a terminal
enters a request for a response from the data warehouse and the instant at
which the first data of the response is received at the terminal is called the
system response time. It is measured in terms of milliseconds (ms).
Table 5.12 No. of users Vs. System Response Time
No. of users    FBPDM (ms)    IRM (ms)
5               1450          789
10              1750          1010
15              2100          1250
20              2560          1350
25              2890          1650
30              3250          1850
35              3680          2010
The system response times in milliseconds are tabulated in
table 5.12. The system response time is measured based on the number of
users, and the results of FBPDM and IRM are compared.
Fig 5.12 No. of users Vs. System Response Time
The performance graph of system response time based on users is
measured for the IRM model. The user requested information in the data mart
is extracted accurately based on the decisive rules formed. Even though
the number of users increases, the system response time of the data mart
is less in IRM; the response time taken is lesser when compared with the
FBPDM. The system response time of IRM is 50-60% less when compared
with FBPDM.
5.13 Execution Time Evaluation
Execution time is the time during which a program is running.
That is, when a program starts running, it is runtime for that program.
Here, the execution time is defined as the time taken to transfer the data from
the source to the destination in the data mart using IRM.
Execution Time = Amount of time taken to send requests / Total no. of users in data mart
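A minimal sketch of the execution time formula above; the class and parameter names are illustrative assumptions:

```java
// Sketch of the execution time formula above:
// Execution Time = time taken to send requests / total users in data mart.
// Names are illustrative assumptions.
public class ExecutionTime {

    // Per-user execution time in seconds.
    public static double executionTime(double requestSendTimeSeconds, int totalUsers) {
        return requestSendTimeSeconds / totalUsers;
    }

    public static void main(String[] args) {
        // 90 seconds of request send time shared among 5 users.
        System.out.println(executionTime(90.0, 5)); // prints 18.0
    }
}
```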
Table 5.13 Tabulation of No. of requests sent Vs. Execution Time
No. of requests sent    FBPDM (s)    IRM (s)
1                       45           18
2                       80           30
3                       125          48
4                       175          60
5                       210          82
6                       255          101
7                       310          122
The table 5.13 describes the execution time in FBPDM and IRM model.
The execution time is measured in terms of seconds.
Fig 5.13 No. of requests sent Vs. Execution Time
Fig 5.13 demonstrates the performance of the execution in IRM.
As the request from the users in distributed system increases, IRM
technique execution time decreases gradually. The result differs based on
the number of request send to the users in data mart. The variance is
approximately 30 - 40% lesser in terms of time when compared to the
FBPDM model.
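The percentage differences quoted in this chapter are relative reductions with respect to FBPDM; a minimal helper (the sample values below are illustrative, not taken from the tables):

```python
def percent_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline

# Illustrative values only: a baseline of 200 units against 100 units.
print(percent_reduction(200.0, 100.0))  # -> 50.0
```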
IRM thus performs efficient extraction of user-demanded data from the
data mart using the decision rule induction algorithm.
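The full decision rule induction algorithm is not reproduced here; the following is a minimal sketch of one-condition rule induction over categorical attributes, with hypothetical insurance-style records rather than actual COIL 2000 fields:

```python
from collections import Counter, defaultdict

def induce_rules(records, target):
    """Induce simple one-condition decision rules: for each value of each
    attribute, predict the majority target class among matching records,
    together with the rule's confidence on the training data."""
    rules = []
    attributes = [a for a in records[0] if a != target]
    for attr in attributes:
        by_value = defaultdict(Counter)
        for rec in records:
            by_value[rec[attr]][rec[target]] += 1
        for value, counts in by_value.items():
            label, freq = counts.most_common(1)[0]
            coverage = sum(counts.values())
            rules.append((attr, value, label, freq / coverage))
    return rules

# Hypothetical insurance-style records (not COIL 2000 data).
data = [
    {"policy": "life", "region": "north", "buys": "yes"},
    {"policy": "life", "region": "south", "buys": "yes"},
    {"policy": "auto", "region": "north", "buys": "no"},
]
for attr, value, label, conf in induce_rules(data, "buys"):
    print(f"IF {attr} = {value} THEN buys = {label} (confidence {conf:.2f})")
```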
Evaluations are done with the Insurance Company Benchmark (COIL
2000) data set from the UCI repository, using the decisive rules based
on the decision rule induction algorithm on the functional layered insurance
repositories. The user-demanded information is extracted in a reliable
manner.
5.14 Summary
The objective of inductive rule mining (IRM) is to improve
decision making while retrieving data from the data mart. IRM
facilitates a better understanding of the structure for efficient
extraction of data from the warehouse. The significant feature of IRM is
that it ensures efficient extraction of information based on the user's
query. As a result, efficient processing of range queries on the tables
ensures data access. Finally, IRM extracts the user-demanded
information in a reliable manner.
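A range query of the kind mentioned above can be sketched as a simple filter over table rows; the column names here are hypothetical:

```python
def range_query(rows, column, low, high):
    """Return the rows whose value in `column` lies in [low, high]."""
    return [r for r in rows if low <= r[column] <= high]

# Hypothetical premium table.
table = [{"id": 1, "premium": 120}, {"id": 2, "premium": 340},
         {"id": 3, "premium": 210}]
print(range_query(table, "premium", 100, 250))  # rows 1 and 3
```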
IRM provides more efficient extraction of user-demanded data,
by about 15-25% compared to FBPDM.
The decision rule based on the functional policy attribute is higher in
IRM by about 15-20% compared to FBPDM, as IRM solves the problem of
decisive analysis of the specified authorized data in the data warehouse.
The following points summarize the performance analysis of extracting
user-demanded data in the functional layered data mart based on inductive
rule mining.
1) A high attribute relativity, of about 25-35%, is seen in FBPDM
compared to the multi-functional model in data warehousing.
2) The functional behavior of the insurance data mart under varying
attribute relativity is identified, and the variance is 25-45% compared
to MFDWM.
3) Data storage is reduced in FBPDM, increasing data accessibility
by about 10-20% compared to MFDWM.
4) Based on attribute relativity, rapid data retrieval from the data mart
is higher by about 10-20% in FBPDM in contrast to the multi-functional
model in the data warehouse.
5) The capacity for decision making in IRM is higher by about 15-20%
compared to FBPDM.
6) Decision rule pattern formation is slightly higher in IRM, by about
5-10% compared to FBPDM, as decision rules are formed on the basis of
decision making in the decision support system.
7) The decision rule based on the functional policy attribute is higher in
IRM by about 15-20% compared to FBPDM, as IRM solves the problem of
decisive analysis of the specified authorized insurance data in the
insurance company data warehouse.
8) Inducing the best decision rule based on the functional policy attribute
using IRM is higher by about 20-30% compared to FBPDM.
9) IRM provides more efficient extraction of user-demanded data, by
about 15-25% compared to FBPDM.
10) Based on the number of data marts, the analysis of functional behavior
in the data mart is 45-50% higher in the proposed FBPDM in contrast to
the existing multi-functional model in the data warehouse.
11) Based on transaction density, the memory consumption of IRM is
25-30% less when compared to the FBPDM model.
12) The system response time of IRM is 50-60% lower when compared
with FBPDM.
13) Based on the number of requests sent by the users in the data mart,
the execution time is approximately 30-40% less when compared to the
FBPDM model.
14) The functional behavior of the corporate system is analyzed to
build layers of data storage repositories with relevant data
attributes. Inductive rule mining (IRM) improves decision making in
retrieving data from the data mart. Based on the formation of inductive
rules, the user-demanded information is extracted from the insurance
company repositories.
6. CONCLUSION AND FUTURE WORK
6.1 Conclusion
A data warehouse is a large database used for revealing and
examining data. The data collected in the warehouse are updated from
operational systems such as the marketplace, and pass through an
operational data store for further operations. The data in the data
warehouse are used for reporting information. A data warehouse
generated from integrated data source systems does not need an Extract,
Transform and Load (ETL) process to produce staging databases or
operational data store databases. Data warehouses are subdivided into
data marts; data marts store subsets of the data from the warehouse.
The functional behavior pattern in the data mart (FBPDM), based on
attribute relativity, analyzes the functional behavior. The functional
behavior identifies the relevant attributes of the particular data mart.
Attribute relativity is selected based on the complete report of the data
mart attaining its operational goal. The functional behavior pattern in the
data mart achieves the best analysis of the functional behavior of the data
mart depending on attribute relativity. The data mart is finally designed
efficiently, with data sources matching user needs, fast data retrieval on
specified attributes, data broadcasting, and fast storing of data into the
data mart.
The effective functional behavioral pattern in the data mart is utilized
and processed in the analysis of data. Inductive rule mining (IRM)
efficiently retrieves the user-demanded data in the data mart based on
attribute relativity. IRM properly applies inductive rules for handling the
user-demanded information. Decision support is efficiently carried out
with inductive rule mining on the functional data mart segregation of the
layered data repository. The framework of decision rule induction uses the
user's query to extract the user-demanded information. Based on the
query processing, the supporting data structure is restructured with the
tables and attribute names. The decision support system achieves reliable
data extraction from the data warehouse.
Experimental results showed that the functional behavioral pattern
for the data mart and inductive rule mining in the data mart are analyzed
efficiently. The performance of the functional behavior pattern and
inductive rule mining, measured in terms of attribute relativity, functional
behavior analysis, decisive rule formation, and data retrieval, is better by
about 10% to 15% in comparison to the multi-functional data warehousing
model.
Finally, IRM performs efficient extraction of user-demanded data
from the data mart using the decision rule induction algorithm.
Evaluations are done with the Insurance Company Benchmark (COIL 2000)
data set from the UCI repository, using the decision rule induction
algorithm on the functional layered insurance repositories, and the
user-demanded information is extracted in a reliable manner.
6.2 Future Work
The functional behavior of the data mart is analyzed, but the analysis
is time consuming. In addition, functional behavior faces possible dangers
in problem behavior. The issue of functional behavior problems is
considered as a future direction, in addition to the time minimization
factor, with the establishment of functional communication training. The
purpose of functional communication training is to understand each
enterprise's communication behaviors as a replacement for functional
behavior. Additionally, functional communication training is able to reduce
the time consumed.
Data extraction is very complex in some subject areas and falls short
of better data retrieval because the implementation is through function
modules. To still work in compliance with the data extraction, the
enterprise content data sources are loaded on a one-to-one basis into the
data mart and are further processed. However, no field extension is done
in the source system of the data warehouse any longer. The data repository
layer is planned with enterprise content data sources to extend better
performance in user-required data extraction. The extraction process is
simple with the establishment of inductive rule mining. A difficulty arises
if the preferred data item is visible but out of reach. In future, a novel
technique is planned to extract the unreachable items in the data mart.
PUBLICATIONS
International Journals
1. “Functional Behavior Pattern for Data Mart Based on Attribute
Relativity”, International Journal of Computer Science Issues,
Volume 9, Issue 4, No. 1, July 2012, pp. 278-283.
2. “Mining User Demanded Data from Data Mart Using Inductive
Rule Mining”, International Journal of Computer Science Issues,
Volume 9, Issue 4, No. 1, July 2012, pp. 405-410.
National Journals
3. “Performance Evaluation of Functional Behavior Pattern”,
National Journal of Engineering Today, Volume VII, May 2010,
pp. 11-16.