

A Data Warehouse Architectural Design Using Proposed Pseudo Mesh Schema

Rajdeep Chowdhury, Assistant Professor, Department of Computer Application, JIS College of Engineering, Kalyani, Nadia – 741235, West Bengal, India

Bikramjit Pal, Assistant Professor, Department of Computer Application, JIS College of Engineering, Kalyani, Nadia – 741235, West Bengal, India

Amitava Ghosh, Student, Department of Computer Application, JIS College of Engineering, Kalyani, Nadia – 741235, West Bengal, India

Mallika De, Professor, Department of Engineering and Technological Studies, University of Kalyani, Kalyani, Nadia – 741235, West Bengal, India

@ICII, 1st – 2nd December, 2012

47th Annual National Convention of Computer Society of India

Presentation Outline

Abstract

Introduction

Proposed Schema

Mathematical Representation

Comparative Study

Conclusions

References

R. Chowdhury, B. Pal, A. Ghosh, M. De ICII 2012 2

Abstract

A set of dimension tables interlinked with one another is presented, with the aim of improving the time complexity of a data warehouse.

Fact tables and dimension tables are used in their normalized form to reduce redundancy.

A pseudo mesh schema architecture is proposed, in which all dimension tables and fact tables are interlinked with one another.

The number of links can be calculated precisely as n(n-1)/2, where n is the number of tables in the structure.

The structure is flexible, since adding or removing one or more databases does not affect the overall schema structure of the data warehouse.

Introduction

A data warehouse is a set of integrated databases, comprising highly summarized data, designed to support decision-making and problem-solving functions.

Data warehousing has become an increasingly popular research topic, in line with modern trends in business organizations.

Data warehouses are designed specifically to facilitate comprehensive reporting and analysis.


As inferred from the literature, a data warehouse is typically built using a star schema or a snowflake schema.

A star schema comprises one or more fact tables connected to dimension tables.

The center of the star schema consists of one or more fact tables, which point to distinct dimension tables.


A snowflake schema is an extension of the star schema in which each point of the star explodes into more points.

In a star schema, fact tables are normalized and dimension tables are un-normalized, which keeps queries simple and response times fast.

In a snowflake schema, both fact tables and dimension tables are normalized, which reduces query performance because more joins are needed.


In a star schema, each dimension is represented by a single dimension table.

In a snowflake schema, that dimension table is normalized into multiple lookup tables, each representing a level in the dimension hierarchy.
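
The difference between the two treatments of a dimension can be sketched with Python's built-in sqlite3 module. The table and column names below (star_product, sf_product, sf_category, fact_sales) and the sample rows are illustrative assumptions, not taken from the presentation. The sketch only shows that the same category total needs one join in the star form and two joins in the snowflake form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star form: one un-normalized table holds the whole product dimension.
    CREATE TABLE star_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_name TEXT
    );
    -- Snowflake form: the same dimension split into one lookup table per level.
    CREATE TABLE sf_category (
        category_id INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE sf_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES sf_category(category_id)
    );
    -- Hypothetical fact table shared by both forms.
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

    INSERT INTO star_product VALUES (1, 'Pen', 'Stationery'), (2, 'Ink', 'Stationery');
    INSERT INTO sf_category VALUES (10, 'Stationery');
    INSERT INTO sf_product VALUES (1, 'Pen', 10), (2, 'Ink', 10);
    INSERT INTO fact_sales VALUES (1, 5.0), (2, 7.5);
""")

# Star: a single join reaches the category name.
star_query = """
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN star_product d ON d.product_id = f.product_id
    GROUP BY d.category_name
"""

# Snowflake: one extra join per normalized level of the hierarchy.
snowflake_query = """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN sf_product p ON p.product_id = f.product_id
    JOIN sf_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
"""

star_rows = conn.execute(star_query).fetchall()
snowflake_rows = conn.execute(snowflake_query).fetchall()
```

Both queries return the same total; the snowflake form stores the category name only once, at the cost of the additional join.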

Both architectures are well accepted by the industry, despite the ambiguity associated with them.

The proposed concept is intended to furnish a suitable mode of communication among n dimension tables.

Proposed Schema

The proposed schema is based on the pseudo mesh architecture.

Each node, that is, each dimension table, is connected to the other dimension table(s) within the data warehouse through views generated from the original fact table.

The number of dimension tables can be increased as required.

With an increase in dimension tables, the data within an existing design does not change.

Only the connectivity between dimension tables increases.

The design is therefore flexible.

Every fragment must include a fact table, which contains a set of keys k1, ..., kn corresponding to the n dimension tables within the data warehouse.


Figure – 1.1: A mesh schema having two dimension tables


The first figure (Figure – 1.1) is an instance of the proposed schema.

A dimension table can be further fragmented, or normalized into the various normal forms, thereby reducing redundancy and clustering similar data into separate tables.

The devised concept is depicted in Figure – 1.1, and is then further fragmented and simplified in Figure – 1.2 and Figure – 1.3.


Figure – 1.2: A mesh schema consisting of three dimension tables and n(n-1)/2 view tables, that is, three view tables


Dimension tables T1 and T2 are further normalized into n smaller tables.

The number of connections grows with the number of dimension tables, counted together with the fact tables.

The number of connections can be calculated as n(n-1)/2.

In this formula, n is the number of nodes.
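
As a worked check of the formula, write L(n) for the number of links among n nodes (the symbol L is introduced here only for illustration). The first line gives the counts for small n. The second line shows that adding one more node to a structure of n nodes adds exactly n new links and leaves the existing links unchanged.

```latex
\[
L(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad L(2) = 1, \quad L(3) = 3, \quad L(4) = 6
\]
\[
L(n+1) - L(n) = \frac{(n+1)\,n}{2} - \frac{n(n-1)}{2} = \frac{n^2 + n - n^2 + n}{2} = n
\]
```

The count therefore grows quadratically in n, while each individual addition costs only n further links.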


The central role of the proposed schema is to improve efficiency and speed up query processing by inserting a view table between two dimension tables.

The table view_fact is a view created from the fact table.

A view is a virtual table that requires no physical storage for its data.

Placing a view table between intermediate nodes improves query processing by avoiding unnecessary searches and storage.
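
A minimal sketch of the view concept, assuming two hypothetical dimension tables (dim_customer, dim_product) and a hypothetical fact table fact_sales; only the name view_fact comes from the presentation. It uses Python's built-in sqlite3 module, defines view_fact over the keys of the fact table, and then relates the two dimension tables through it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fact_sales (
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
    -- view_fact exposes only the keys of the fact table; no rows are copied.
    CREATE VIEW view_fact AS
        SELECT DISTINCT customer_id, product_id FROM fact_sales;

    INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO dim_product VALUES (100, 'Pen'), (200, 'Ink');
    INSERT INTO fact_sales VALUES (1, 100, 5.0), (1, 200, 7.5), (2, 100, 5.0);
""")

# Relate the two dimension tables through the view placed between them.
rows = conn.execute("""
    SELECT c.customer_name, p.product_name
    FROM dim_customer c
    JOIN view_fact v ON v.customer_id = c.customer_id
    JOIN dim_product p ON p.product_id = v.product_id
    ORDER BY c.customer_name, p.product_name
""").fetchall()

# The catalog records view_fact as a view (a stored definition), not a table.
object_type = conn.execute(
    "SELECT type FROM sqlite_master WHERE name = 'view_fact'"
).fetchone()[0]
```

The sketch shows how the link between two dimension tables is expressed; it makes no measurement of query speed.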


Figure – 1.3: The view concept between various dimension tables


Figure – 1.3 illustrates the pseudo mesh schema architecture with a view table between intermediate nodes.

The fact table holds the keys k1, ..., kn, and this set of keys is copied into the view table.

If the schema consists of n dimension tables, then n(n-1)/2 view tables are required to relate each dimension table to all the others.

A view table comprises the set of keys k1, ..., kn, from which a specific key ki has to be chosen to set up the relation between two dimension tables.
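
The count of view tables can be checked by enumerating every unordered pair of dimension tables. The sketch below is only an illustration: the dimension names D1 to D4 and the naming pattern view_<a>_<b> are hypothetical.

```python
from itertools import combinations


def view_tables(dimension_tables):
    """Return one hypothetical view name per unordered pair of dimension tables."""
    return [f"view_{a}_{b}" for a, b in combinations(dimension_tables, 2)]


dims = ["D1", "D2", "D3", "D4"]
views = view_tables(dims)
n = len(dims)

# The number of generated views matches n(n-1)/2.
expected = n * (n - 1) // 2
```

With three dimension tables the same function yields three views, which agrees with the count stated for Figure – 1.2.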


Mathematical Representation

The theoretical explanation can be established mathematically as follows:

Let $K_1, K_2, \ldots, K_n$ be the keys in the fact table $F$.

According to the theory proposed here, the number of possible views that can exist in the schema structure is $n$.

Let $K_i$ be the $i$-th key, where $1 \le i \le n$, which is used to define the fact view $FV_i$.

The fact view can then be used to join two dimension tables $D_{1i}$ and $D_{2i}$.


It can also be represented as:

$$FV_1 \cup FV_2 \cup \cdots \cup FV_n = FV$$

where the $FV_i$ are the fact views and $FV$ is the parent view defined on $F$.
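
One possible reading of the union, sketched under stated assumptions: each fact view FVi is modeled simply as the set of keys it exposes, here the single key Ki, and the parent view FV as the set of all keys of F. The key names and this set-based model are illustrative assumptions rather than the authors' definition.

```python
# Hypothetical key names K1..K4 of a fact table F.
fact_keys = ["K1", "K2", "K3", "K4"]

# Assumption: fact view FV_i is modeled as the set of keys it exposes, {K_i}.
fact_views = {f"FV{i}": {key} for i, key in enumerate(fact_keys, start=1)}

# Assumption: the parent view FV exposes every key of F.
parent_view = set(fact_keys)

# Union of all fact views FV_1 ... FV_n.
union_of_views = set().union(*fact_views.values())
```

Under this model there are n fact views, one per key, and their union recovers the parent view.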

Comparative Study

| No. | Parameters | Snowflake Schema | Star Schema |
|---|---|---|---|
| 1 | Ease of Use | More complex queries, hence less easy to understand | Less complex queries, hence easy to understand |
| 2 | Query Performance or Query Execution Time | More foreign keys, hence more complexity and more query execution time | Fewer foreign keys, hence less complexity and less query execution time |
| 3 | Normalization | Contains normalized tables | Contains un-normalized tables |
| 4 | Number of Joins | Higher number of joins | Lower number of joins |
| 5 | Dimension Tables | Schema may have more than one dimension table for each dimension | Schema contains only a single dimension table for each dimension |
| 6 | Maintenance | No redundant data, and since tables are normalized, it is easier to maintain and update | Schema contains redundant data, and is therefore not so easy to maintain |

Fewer queries are needed between the main dimension tables.

Complex queries may arise in the internal processing of each dimension table because of the larger number of foreign keys.

This can be mitigated through proper query optimization techniques.


Every dimension table is expected to be highly normalized, in line with the architectural design formulated.

A dimension table can be broken down into n sub-dimension tables.

The architectural design comprises only one fact table, with n dimension tables attached to it.


Speed can be optimized because there are direct relationships among the various dimension tables.

The process reduces unnecessary comparisons among the various dimension tables in a data warehouse.

The concept reduces not only query processing time but also storage complexity.


Establishing relationships between dimension tables ordinarily requires some space, and for massive data warehouses this space grows steadily.

Here, the relations among the tables are established through views.

A view is a logical construct and requires no storage space for its data.


Maintaining a data warehouse that uses the mesh schema can be relatively difficult.

As the number of dimension tables increases, the number of view tables also increases.

However, an increase in view tables does not create a space complexity problem.

Rather, it further enhances the direct relations between dimension tables.


Conclusions

The paper gives a clear account of how the proposed pseudo mesh schema is better than the existing schemas for data warehouse design.

Increasing the number of views does not in itself create overhead in the design process.

If the number of dimension tables increases, the complexity of the pseudo mesh also increases.

The number of dimension tables can be controlled by limiting the size of the fact table.


A newer and more innovative version of the pseudo mesh architecture can certainly be created.

The proposed architecture furnishes the end user with an alternative mode of data warehouse design.

The incorporation of conceptual logic into a practical scenario provides a sound foundation for the newer architectural mode.

The nuances of the design and the proposal of the innovative structure support its implementation in business organizations.


The authors are grateful to the Organizing Chair and the Referees of ICII 2012.

47th Annual National Convention of Computer Society of India