DATA WAREHOUSING FUNDAMENTALS
A Comprehensive Guide for IT Professionals
PAULRAJ PONNIAH
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Weinheim / Brisbane / Singapore / Toronto
-
Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
Copyright © 2001 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ@WILEY.COM.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.
This title is also available in print as ISBN 0-471-41254-6.
For more information about Wiley products, visit our web site at www.Wiley.com.
-
To Vimala, my loving wife
and to
Joseph, David, and Shobi, my dear children
-
CONTENTS
Foreword
Preface
Part 1 OVERVIEW AND CONCEPTS
1 The Compelling Need for Data Warehousing
Chapter Objectives
Escalating Need for Strategic Information
The Information Crisis
Technology Trends
Opportunities and Risks
Failures of Past Decision-Support Systems
History of Decision-Support Systems
Inability to Provide Information
Operational Versus Decision-Support Systems
Making the Wheels of Business Turn
Watching the Wheels of Business Turn
Different Scope, Different Purposes
Data Warehousing: The Only Viable Solution
A New Type of System Environment
Processing Requirements in the New Environment
Business Intelligence at the Data Warehouse
Data Warehouse Defined
A Simple Concept for Information Delivery
An Environment, Not a Product
A Blend of Many Technologies
Chapter Summary
Review Questions
Exercises
2 Data Warehouse: The Building Blocks
Chapter Objectives
Defining Features
Subject-Oriented Data
Integrated Data
Time-Variant Data
Nonvolatile Data
Data Granularity
Data Warehouses and Data Marts
How are They Different?
Top-Down Versus Bottom-Up Approach
A Practical Approach
Overview of the Components
Source Data Component
Data Staging Component
Data Storage Component
Information Delivery Component
Metadata Component
Management and Control Component
Metadata in the Data Warehouse
Types of Metadata
Special Significance
Chapter Summary
Review Questions
Exercises
3 Trends in Data Warehousing
Chapter Objectives
Continued Growth in Data Warehousing
Data Warehousing is Becoming Mainstream
Data Warehouse Expansion
Vendor Solutions and Products
Significant Trends
Multiple Data Types
Data Visualization
Parallel Processing
Query Tools
Browser Tools
Data Fusion
Multidimensional Analysis
Agent Technology
Syndicated Data
Data Warehousing and ERP
Data Warehousing and KM
Data Warehousing and CRM
Active Data Warehousing
Emergence of Standards
Metadata
OLAP
Web-Enabled Data Warehouse
The Warehouse to the Web
The Web to the Warehouse
The Web-Enabled Configuration
Chapter Summary
Review Questions
Exercises
Part 2 PLANNING AND REQUIREMENTS
4 Planning and Project Management
Chapter Objectives
Planning Your Data Warehouse
Key Issues
Business Requirements, Not Technology
Top Management Support
Justifying Your Data Warehouse
The Overall Plan
The Data Warehouse Project
How is it Different?
Assessment of Readiness
The Life-Cycle Approach
The Development Phases
The Project Team
Organizing the Project Team
Roles and Responsibilities
Skills and Experience Levels
User Participation
Project Management Considerations
Guiding Principles
Warning Signs
Success Factors
Anatomy of a Successful Project
Adopt a Practical Approach
Chapter Summary
Review Questions
Exercises
5 Defining the Business Requirements
Chapter Objectives
Dimensional Analysis
Usage of Information Unpredictable
Dimensional Nature of Business Data
Examples of Business Dimensions
Information Packages: A New Concept
Requirements Not Fully Determinate
Business Dimensions
Dimension Hierarchies/Categories
Key Business Metrics or Facts
Requirements Gathering Methods
Interview Techniques
Adapting the JAD Methodology
Review of Existing Documentation
Requirements Definition: Scope and Content
Data Sources
Data Transformation
Data Storage
Information Delivery
Information Package Diagrams
Requirements Definition Document Outline
Chapter Summary
Review Questions
Exercises
6 Requirements as the Driving Force for Data Warehousing
Chapter Objectives
Data Design
Structure for Business Dimensions
Structure for Key Measurements
Levels of Detail
The Architectural Plan
Composition of the Components
Special Considerations
Tools and Products
Data Storage Specifications
DBMS Selection
Storage Sizing
Information Delivery Strategy
Queries and Reports
Types of Analysis
Information Distribution
Decision Support Applications
Growth and Expansion
Chapter Summary
Review Questions
Exercises
Part 3 ARCHITECTURE AND INFRASTRUCTURE
7 The Architectural Components
Chapter Objectives
Understanding Data Warehouse Architecture
Architecture: Definitions
Architecture in Three Major Areas
Distinguishing Characteristics
Different Objectives and Scope
Data Content
Complex Analysis and Quick Response
Flexible and Dynamic
Metadata-driven
Architectural Framework
Architecture Supporting Flow of Data
The Management and Control Module
Technical Architecture
Data Acquisition
Data Storage
Information Delivery
Chapter Summary
Review Questions
Exercises
8 Infrastructure as the Foundation for Data Warehousing
Chapter Objectives
Infrastructure Supporting Architecture
Operational Infrastructure
Physical Infrastructure
Hardware and Operating Systems
Platform Options
Server Hardware
Database Software
Parallel Processing Options
Selection of the DBMS
Collection of Tools
Architecture First, Then Tools
Data Modeling
Data Extraction
Data Transformation
Data Loading
Data Quality
Queries and Reports
Online Analytical Processing (OLAP)
Alert Systems
Middleware and Connectivity
Data Warehouse Management
Chapter Summary
Review Questions
Exercises
9 The Significant Role of Metadata
Chapter Objectives
Why Metadata is Important
A Critical Need in the Data Warehouse
Why Metadata is Vital for End-Users
Why Metadata is Essential for IT
Automation of Warehousing Tasks
Establishing the Context of Information
Metadata Types by Functional Areas
Data Acquisition
Data Storage
Information Delivery
Business Metadata
Content Overview
Examples of Business Metadata
Content Highlights
Who Benefits?
Technical Metadata
Content Overview
Examples of Technical Metadata
Content Highlights
Who Benefits?
How to Provide Metadata
Metadata Requirements
Sources of Metadata
Challenges for Metadata Management
Metadata Repository
Metadata Integration and Standards
Implementation Options
Chapter Summary
Review Questions
Exercises
Part 4 DATA DESIGN AND DATA PREPARATION
10 Principles of Dimensional Modeling
Chapter Objectives
From Requirements to Data Design
  Design Decisions
  Dimensional Modeling Basics
  E-R Modeling Versus Dimensional Modeling
  Use of CASE Tools
The STAR Schema
  Review of a Simple STAR Schema
  Inside a Dimension Table
  Inside the Fact Table
  The Factless Fact Table
  Data Granularity
STAR Schema Keys
  Primary Keys
  Surrogate Keys
  Foreign Keys
Advantages of the STAR Schema
  Easy for Users to Understand
  Optimizes Navigation
  Most Suitable for Query Processing
  STARjoin and STARindex
Chapter Summary
Review Questions
Exercises
11 Dimensional Modeling: Advanced Topics
Chapter Objectives
Updates to the Dimension Tables
  Slowly Changing Dimensions
  Type 1 Changes: Correction of Errors
  Type 2 Changes: Preservation of History
  Type 3 Changes: Tentative Soft Revisions
Miscellaneous Dimensions
  Large Dimensions
  Rapidly Changing Dimensions
  Junk Dimensions
The Snowflake Schema
  Options to Normalize
  Advantages and Disadvantages
  When to Snowflake
Aggregate Fact Tables
  Fact Table Sizes
  Need for Aggregates
  Aggregating Fact Tables
  Aggregation Options
Families of STARS
  Snapshot and Transaction Tables
  Core and Custom Tables
  Supporting Enterprise Value Chain or Value Circle
  Conforming Dimensions
  Standardizing Facts
  Summary of Family of STARS
Chapter Summary
Review Questions
Exercises
12 Data Extraction, Transformation, and Loading
Chapter Objectives
ETL Overview
  Most Important and Most Challenging
  Time-consuming and Arduous
  ETL Requirements and Steps
  Key Factors
Data Extraction
  Source Identification
  Data Extraction Techniques
  Evaluation of the Techniques
Data Transformation
  Data Transformation: Basic Tasks
  Major Transformation Types
  Data Integration and Consolidation
  Transformation for Dimension Attributes
  How to Implement Transformation
Data Loading
  Applying Data: Techniques and Processes
  Data Refresh Versus Update
  Procedure for Dimension Tables
  Fact Tables: History and Incremental Loads
ETL Summary
  ETL Tool Options
  Reemphasizing ETL Metadata
  ETL Summary and Approach
Chapter Summary
Review Questions
Exercises
13 Data Quality: A Key to Success
Chapter Objectives
Why is Data Quality Critical?
  What is Data Quality?
  Benefits of Improved Data Quality
  Types of Data Quality Problems
Data Quality Challenges
  Sources of Data Pollution
  Validation of Names and Addresses
  Costs of Poor Data Quality
Data Quality Tools
  Categories of Data Cleansing Tools
  Error Discovery Features
  Data Correction Features
  The DBMS for Quality Control
Data Quality Initiative
  Data Cleansing Decisions
  Who Should be Responsible?
  The Purification Process
  Practical Tips on Data Quality
Chapter Summary
Review Questions
Exercises
Part 5 INFORMATION ACCESS AND DELIVERY
14 Matching Information to the Classes of Users
Chapter Objectives
Information from the Data Warehouse
  Data Warehouse Versus Operational Systems
  Information Potential
  User-Information Interface
  Industry Applications
Who Will Use the Information?
  Classes of Users
  What They Need
  How to Provide Information
Information Delivery
  Queries
  Reports
  Analysis
  Applications
Information Delivery Tools
  The Desktop Environment
  Methodology for Tool Selection
  Tool Selection Criteria
  Information Delivery Framework
Chapter Summary
Review Questions
Exercises
15 OLAP in the Data Warehouse
Chapter Objectives
Demand for Online Analytical Processing
  Need for Multidimensional Analysis
  Fast Access and Powerful Calculations
  Limitations of Other Analysis Methods
  OLAP is the Answer
  OLAP Definitions and Rules
  OLAP Characteristics
Major Features and Functions
  General Features
  Dimensional Analysis
  What are Hypercubes?
  Drill-Down and Roll-Up
  Slice-and-Dice or Rotation
  Uses and Benefits
OLAP Models
  Overview of Variations
  The MOLAP Model
  The ROLAP Model
  ROLAP Versus MOLAP
OLAP Implementation Considerations
  Data Design and Preparation
  Administration and Performance
  OLAP Platforms
  OLAP Tools and Products
  Implementation Steps
Chapter Summary
Review Questions
Exercises
16 Data Warehousing and the Web
Chapter Objectives
Web-Enabled Data Warehouse
  Why the Web?
  Convergence of Technologies
  Adapting the Data Warehouse for the Web
  The Web as a Data Source
Web-Based Information Delivery
  Expanded Usage
  New Information Strategies
  Browser Technology for the Data Warehouse
  Security Issues
OLAP and the Web
  Enterprise OLAP
  Web-OLAP Approaches
  OLAP Engine Design
Building a Web-Enabled Data Warehouse
  Nature of the Data Webhouse
  Implementation Considerations
  Putting the Pieces Together
  Web Processing Model
Chapter Summary
Review Questions
Exercises
17 Data Mining Basics
Chapter Objectives
What is Data Mining?
  Data Mining Defined
  The Knowledge Discovery Process
  OLAP Versus Data Mining
  Data Mining and the Data Warehouse
Major Data Mining Techniques
  Cluster Detection
  Decision Trees
  Memory-Based Reasoning
  Link Analysis
  Neural Networks
  Genetic Algorithms
  Moving into Data Mining
Data Mining Applications
  Benefits of Data Mining
  Applications in Retail Industry
  Applications in Telecommunications Industry
  Applications in Banking and Finance
Chapter Summary
Review Questions
Exercises
Part 6 IMPLEMENTATION AND MAINTENANCE
18 The Physical Design Process
Chapter Objectives
Physical Design Steps
  Develop Standards
  Create Aggregates Plan
  Determine the Data Partitioning Scheme
  Establish Clustering Options
  Prepare an Indexing Strategy
  Assign Storage Structures
  Complete Physical Model
Physical Design Considerations
  Physical Design Objectives
  From Logical Model to Physical Model
  Physical Model Components
  Significance of Standards
Physical Storage
  Storage Area Data Structures
  Optimizing Storage
  Using RAID Technology
  Estimating Storage Sizes
Indexing the Data Warehouse
  Indexing Overview
  B-Tree Index
  Bitmapped Index
  Clustered Indexes
  Indexing the Fact Table
  Indexing the Dimension Tables
Performance Enhancement Techniques
  Data Partitioning
  Data Clustering
  Parallel Processing
  Summary Levels
  Referential Integrity Checks
  Initialization Parameters
  Data Arrays
Chapter Summary
Review Questions
Exercises
19 Data Warehouse Deployment
Chapter Objectives
Major Deployment Activities
  Complete User Acceptance
  Perform Initial Loads
  Get User Desktops Ready
  Complete Initial User Training
  Institute Initial User Support
  Deploy in Stages
Considerations for a Pilot
  When Is a Pilot Data Mart Useful?
  Types of Pilot Projects
  Choosing the Pilot
  Expanding and Integrating the Pilot
Security
  Security Policy
  Managing User Privileges
  Password Considerations
  Security Tools
Backup and Recovery
  Why Back Up the Data Warehouse?
  Backup Strategy
  Setting Up a Practical Schedule
  Recovery
Chapter Summary
Review Questions
Exercises
20 Growth and Maintenance
Chapter Objectives
Monitoring the Data Warehouse
  Collection of Statistics
  Using Statistics for Growth Planning
  Using Statistics for Fine-Tuning
  Publishing Trends for Users
User Training and Support
  User Training Content
  Preparing the Training Program
  Delivering the Training Program
  User Support
Managing the Data Warehouse
  Platform Upgrades
  Managing Data Growth
  Storage Management
  ETL Management
  Data Model Revisions
  Information Delivery Enhancements
  Ongoing Fine-Tuning
Chapter Summary
Review Questions
Exercises
Appendix A. Project Life Cycle Steps and Checklists
Appendix B. Critical Factors for Success
Appendix C. Guidelines for Evaluating Vendor Solutions
References
Glossary
Index
-
FOREWORD
I am delighted to share my thoughts with information technology professionals about my faculty colleague Paulraj Ponniah’s textbook Data Warehousing Fundamentals. In the spring of 1998, Raritan Valley Community College decided to offer a course on data warehousing. This was mainly through the initiative of Dr. Ponniah, who had been teaching our database design and development course for several years. It was very difficult to find a good textbook for a college course on data warehousing. We had to settle for a book that was not quite suitable. In order to make the course effective, Paul had to supplement the book with his own data warehousing seminar materials. Our students, primarily IT professionals from local industries, received the course very well. Now this magnificent textbook on data warehousing comes to you through the foresight and diligent work of Dr. Ponniah, along with the insightful support of the publishers, John Wiley and Sons.
This book has numerous features that make it a winner:
• The order of topics is very logical.
• The choice of topics is quite appropriate for a comprehensive introductory book. The coverage of topics is also very well balanced.
• The subject matter is logically structured, with chapters covering essential components of the data warehousing field. The sequence of topics is well planned to provide a seamless transition from design to implementation.
• Within each chapter, the continuity of topics is excellent.
• None of the topics included in the textbook is superfluous to the basic objectives.
• The material included is technically correct and up-to-date. The figures appropriately enhance and amplify the topics.
• Ample review questions and exercises can be found at the end of each chapter. This is something lacking in most books on data warehousing. These review questions and exercises are pedagogically sound. They are designed to test the knowledge, not the ignorance, of the reader.
-
Dr. Ponniah’s writing style is clear and concise. Because of the simplicity and completeness of this book, I believe it will find a definite market niche, particularly among college students, not-so-technically savvy IT people, and data warehousing mavens.
In spite of a plethora of books on data warehousing by luminaries such as Kimball, Inmon, Barquin, and Singh, this book fulfills a special purpose, and information technology professionals will definitely benefit from reading it. In addition, the book should be well received by college professors for use by students in their data warehousing courses. To put it succinctly, this book fills a void in the midst of plenty.
In summary, Dr. Ponniah has produced a winner for both students and experienced IT professionals. As someone who has been in IT education for many years, I certainly recommend this book to college professors and seminar leaders for their data warehousing courses.
PRATAP P. REDDY, Ph.D.
Professor and Chair of CIS Department
Raritan Valley Community College
North Branch, New Jersey
-
PREFACE
THIS BOOK IS FOR YOU
Are you an information technology professional watching, with great interest, the massive unfolding of the data warehouse movement? Are you contemplating a move into this new area of opportunity? Are you a systems analyst, programmer, data analyst, database administrator, project leader, or software engineer eager to grasp the fundamentals of data warehousing? Do you wonder how many different books you may have to read to learn the basics? Are you lost in the maze of the literature and products on the subject? Do you wish for a single publication on data warehousing, clearly and specifically designed for IT professionals? Do you need a textbook that helps you learn the fundamentals in sufficient depth, not more, not less? If you answered “yes” to any of the above, this book is written specially for you.
This is the one definitive book on data warehousing clearly intended for IT professionals. The organization and presentation of the book are specially tuned for IT professionals. This book does not presume to target anyone and everyone remotely interested in the subject for some reason or another, but is written to address the specific needs of IT professionals like you. It does not tend to emphasize certain aspects and neglect other critical ones. The book takes you over the entire landscape of data warehousing.
How can this book be exactly suitable for IT professionals? As a veteran IT professional with wide and intensive industry experience, as a successful database and data warehousing consultant for many years, and as one who teaches data warehousing fundamentals in the college classroom and in public seminars, I have come to appreciate the precise needs of IT professionals, and in every chapter I have incorporated these requirements of the IT community.
-
THE SCENARIO
Why are companies rushing into data warehousing? Why is there a tremendous surge in interest? Data warehousing is no longer a purely novel idea just for research and experimentation. It has become a mainstream phenomenon. True, the data warehouse is not in every doctor’s office yet, but neither is it confined to only high-end businesses. More than half of all U.S. companies and a large percentage of worldwide businesses have made a commitment to data warehousing.
In every industry across the board, from retail chain stores to financial institutions, from manufacturing enterprises to government departments, and from airline companies to utility businesses, data warehousing is revolutionizing the way people perform business analysis and make strategic decisions. Every company that has a data warehouse is realizing the enormous benefits translated into positive results at the bottom line. These companies, now incorporating Web-based technologies, are enhancing the potential for greater and easier delivery of vital information.
Over the past five years, hundreds of vendors have flooded the market with numerous data warehousing products. Vendor solutions and products run the gamut of data warehousing: data modeling, data acquisition, data quality, data analysis, metadata, and so on. The market is already large and continues to grow.
CHANGED ROLE OF IT
In this scenario, information technology departments of all progressive companies perceive a radical change in their roles. IT is no longer required to create every report and present every screen for providing information to the end-users. IT is now charged with the building of information delivery systems and letting the end-users themselves retrieve information in innovative ways for analysis and decision making. Data warehousing is proving to be just that type of successful information delivery system.
IT professionals responsible for building data warehouses need to revise their mindsets about building applications. They have to understand that a data warehouse is not a one-size-fits-all proposition; they must get a clear understanding of the extraction of data from source systems, data transformations, data staging, data warehouse architecture, infrastructure, and the various methods of information delivery.
In short, IT professionals, like you, must get a strong grip on the fundamentals of data warehousing.
WHAT THIS BOOK CAN DO FOR YOU
The book is comprehensive and detailed. You will be able to study every significant topic in planning, requirements, architecture, infrastructure, design, data preparation, information delivery, deployment, and maintenance. It is specially designed for IT professionals; you will be able to follow the presentation easily because it is built upon the foundation of your background as an IT professional, your knowledge, and the technical terminology familiar to you. It is organized logically, beginning with an overview of concepts, moving on to planning and requirements, then to architecture and infrastructure, on to data design, then to information delivery, and concluding with deployment and maintenance. This progression is typical of what you are most familiar with in your experience and day-to-day work.
The book provides an interactive learning experience. It is not a one-way lecture. You participate through the review questions and exercises at the end of each chapter. For each chapter, the objectives set the theme and the summary provides a list of the topics covered. You can relate each concept and technique to the data warehousing industry and marketplace. You will notice a substantial number of industry examples. Although intended as a first course on fundamentals, this book provides sufficient coverage of each topic so that you can comfortably proceed to the next step of specialization for specific roles in a data warehouse project.
Featuring all the significant topics in appropriate measure, this book is eminently suitable as a textbook for serious self-study, a college course, or a seminar on the essentials. It provides an opportunity for you to become a data warehouse expert.
I acknowledge my indebtedness to the authors listed in the reference section at the end of the book. Their insights and observations have helped me cover the topics adequately. I must also express my appreciation to my students and professional colleagues. Our interactions have enabled me to shape this textbook according to the needs of IT professionals.
PAULRAJ PONNIAH, Ph.D.
Edison, New Jersey
June 2001
-
CHAPTER 1
THE COMPELLING NEED FOR DATA WAREHOUSING
CHAPTER OBJECTIVES
• Understand the desperate need for strategic information
• Recognize the information crisis at every enterprise
• Distinguish between operational and informational systems
• Learn why all past attempts to provide strategic information failed
• Clearly see why data warehousing is the viable solution
As an information technology professional, you have worked on computer applications as an analyst, programmer, designer, developer, database administrator, or project manager. You have been involved in the design, implementation, and maintenance of systems that support day-to-day business operations. Depending on the industries you have worked in, you must have been involved in applications such as order processing, general ledger, inventory, in-patient billing, checking accounts, insurance claims, and so on.
These applications are important systems that run businesses. They process orders, maintain inventory, keep the accounting books, service the clients, receive payments, and process claims. Without these computer systems, no modern business can survive. Companies started building and using these systems in the 1960s and have become completely dependent on them. As an enterprise grows larger, hundreds of computer applications are needed to support the various business processes. These applications are effective in what they are designed to do. They gather, store, and process all the data needed to successfully perform the daily operations. They provide online information and produce a variety of reports to monitor and run the business.
In the 1990s, as businesses grew more complex, corporations spread globally, and competition became fiercer, business executives became desperate for information to stay competitive and improve the bottom line. The operational computer systems did provide information to run the day-to-day operations, but what the executives needed were different kinds of information that could be readily used to make strategic decisions. They wanted to know where to build the next warehouse, which product lines to expand, and which markets they should strengthen. The operational systems, important as they were, could not provide strategic information. Businesses, therefore, were compelled to turn to new ways of getting strategic information.
Data warehousing is a new paradigm specifically intended to provide vital strategic information. In the 1990s, organizations began to achieve competitive advantage by building data warehouse systems. Figure 1-1 shows a sample of strategic areas where data warehousing is already producing results in different industries.
We will now briefly examine a crucial question: why do enterprises really need data warehouses? This discussion is important because unless we grasp the significance of this critical need, our study of data warehousing will lack motivation. So, please pay close attention.
ESCALATING NEED FOR STRATEGIC INFORMATION
While we discuss the clamor by enterprises for strategic information, we need to look at the prevailing information crisis that is holding them back as well as the technology trends of the past few years that are working in our favor, enabling us to provide strategic information. Our discussion of the need for strategic information will not be complete unless we study the opportunities provided by strategic information and the risks facing a company without such information.
Who needs strategic information in an enterprise? What exactly do we mean by strategic information? The executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions. They need information to formulate the business strategies, establish goals, set objectives, and monitor results.
Here are some examples of business objectives:
• Retain the present customer base
• Increase the customer base by 15% over the next 5 years
Organizations achieve competitive advantage:
• Retail: Customer Loyalty, Market Planning
• Financial: Risk Management, Fraud Detection
• Airlines: Route Profitability, Yield Management
• Manufacturing: Cost Reduction, Logistics Management
• Utilities: Asset Management, Resource Management
• Government: Manpower Planning, Cost Control
Figure 1-1 Organizations’ use of data warehousing.