Data-warehousing in the “Real World”

by William John Anthony Pinnington

30th November, 2005

CS636 Advanced Database Technologies

© WJAP 2005

Introduction: What is the purpose of this presentation?

➲ Share my observations and experience of building a data-warehouse (DW) in the real world.

➲ Hints and tips for successful:

● implementation;

● development; and

● support.

Introduction: Why am I qualified to tell you this?

➲ Over seven years' DW experience in industry, working for a division of one of the largest food companies in the UK.

➲ Implemented my first data-warehouse in 2000.

➲ Responsible for its development, maintenance and support.

➲ Implemented models and reporting systems for all business functions and also provided help to group companies embarking on DW projects.

➲ Participation in web forums, discussion groups, user-groups.

➲ My MSc dissertation critiqued the value of End-User Computing (EUC) in a DW context.

➲ However, this discipline is changing very rapidly!

Introduction: What is a data-warehouse?

➲ A data-warehouse is the back-end of a Business Intelligence (BI) system.

➲ Business Intelligence (BI) = Decision Support System (DSS) = Management Information System (MIS), etcetera.

➲ DSS are “interactive computer-based systems intended to help decision makers utilise data and models to identify and solve problems and make decisions” (Power, 1999).

➲ A data-warehouse is essentially a database comprised of objects, primarily tables and stored procedures.

➲ It is designed to be very efficient at serving queries.

➲ It is a central repository for all corporate information.

Introduction: Why is a data-warehouse important?

➲ “BI systems are primarily concerned with helping individuals to extract useful information from data; the information is then used to make decisions” (Whitehorn, 2005).

➲ “The demand for enterprises to provide timely, accurate information to business users has never been greater. In modern decentralised organisations, users at all levels are being given unprecedented responsibility for important decision-making, which requires wider access to corporate data” (Sheina, 2005).

➲ Therefore, providing accurate and timely information, in a manner that decision makers can readily understand, can help make the company more competitive.

Introduction: Data-warehousing is non-trivial

➲ Organisations are not providing their employees with information in a manner which is appropriate for them.

➲ According to Mike Thoma of Actuate, “Empowerment is all about making sure your users have the right information at the right time and in the right form, to take action” (Sheina, 2005).

➲ Simply providing a BI infrastructure and putting blind faith in the end-users to empower themselves is insufficient.

Computing (4th August, 2005)

Section 1: Implementation

Implementation: Why build a data-warehouse?

➲ The purpose of a DW is to support the BI environment.

➲ If you are talking about DW, you are talking about medium-to-large enterprises, and these typically operate an Enterprise Resource Planning (ERP) system.

➲ Typically the ROI from the ERP is attributed to standardisation of business processes (Alvarez, 2002).

➲ The most important supporting actor is exploitation of the knowledge and information contained in the data.

➲ BI is the tool that exploits the latent value contained in data within the ERP system.

Implementation: Simplified BI environment

IBM iSeries: The ERP system, Geac System 21, runs on a mid-range computer. It stores data in objects in a relational database.

Workstation: Users update the ERP system across the WAN from their PC via Client Access.

Client Access, Query, Excel, etc.: Native query running on an emulation session. Client Access allowing file transfer via ODBC.

Implementation: Why a DW is a sensible choice

➲ You can stage data for BI on the same hardware as the ERP, but:

● it is costly;

● it results in contention issues;

● it degrades OLTP performance;

● it forces a compromise on server optimisation; and

● you really want to “query off-load.”

➲ The alternative is to implement a dedicated DW.

Implementation: A mature DW environment

Workstation: Users update the ERP system across the WAN from their PC via Client Access.

IBM iSeries: The ERP system, Geac System 21, runs on a mid-range computer. It stores data in objects in a relational database.

Data Warehouse: The Data Warehouse runs on a Wintel platform. The database software is Microsoft SQL Server.

ETL: The ETL layer is provided by DTS. This uses standard OS/400 functionality to ‘mirror’ data in real time between the ERP and the DW.

BI Server: The BI Server also runs on a Wintel platform. The server runs a number of Cognos BI applications.

Cognos PowerPlay: Cognos PowerPlay is an OLAP BI tool.

Cognos Impromptu: Cognos Impromptu is a traditional query BI tool.

Cognos Planning: Cognos Planning is a financial budgeting and forecasting BI tool.

Implementation: Extract, Transform, Load (ETL)

➲ “The first issue that hit me like a two-by-four was the fact that integrating data from legacy sources is a very non-trivial thing. In the beginning, we thought, ‘Well, you’ve got this source of data over here; you just write a program and you bring the data forward into this data warehouse.’ I’ll never forget saying, ‘Gee, what’s so hard about this?’ Today, there’s a whole industry called ETL that does that” (Hayes, 2003).

➲ Pull or Push—the choice is yours

● Pull (ODBC/OLE-DB/DTS Packages)

● Push (DataMirror Transformation Server/bespoke)
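
To make the pull option concrete, here is a minimal sketch in Python. It is not the setup described on these slides: it uses two in-memory SQLite databases as stand-ins for the ERP and the DW, and the table names, columns and `changed_at` watermark are all hypothetical. The DW side asks the source for rows changed since the last load and upserts them.

```python
import sqlite3

# Hypothetical stand-ins for the ERP (source) and the DW (target).
erp = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

erp.execute(
    "CREATE TABLE sales_order ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, changed_at TEXT)"
)
erp.executemany(
    "INSERT INTO sales_order VALUES (?, ?, ?, ?)",
    [
        (1, "ACME", 120.0, "2005-11-28"),
        (2, "Bloggs", 75.5, "2005-11-29"),
        (3, "ACME", 310.0, "2005-11-30"),
    ],
)
dw.execute(
    "CREATE TABLE dw_sales_order ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, changed_at TEXT)"
)


def pull_changes(source, target, since):
    """Pull rows changed after `since` from the source and upsert them into the target."""
    rows = source.execute(
        "SELECT order_id, customer, amount, changed_at "
        "FROM sales_order WHERE changed_at > ?",
        (since,),
    ).fetchall()
    # INSERT OR REPLACE keeps the load idempotent if the same row is pulled twice.
    target.executemany("INSERT OR REPLACE INTO dw_sales_order VALUES (?, ?, ?, ?)", rows)
    target.commit()
    return len(rows)


loaded = pull_changes(erp, dw, since="2005-11-28")
print(loaded)  # 2
```

In a push design the roles are reversed: the source system (or a replication tool sitting on it) sends each change to the DW as it happens, so the DW never has to poll.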

Implementation: Design and schema considerations

➲ The literature talks of snowflake and star schemas.

➲ I have never seen these followed in practice.

➲ In practice, the DW is simply a repository.

➲ It is a collection of tables to support reports and models.

➲ Share dimensions where possible but remember that duplication is okay—no really, it is!!!
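
As an illustration of the last two points, the sketch below (Python with SQLite, chosen only so that it runs anywhere) shows one reporting table that joins to a shared dimension and another that simply duplicates the product name. Every table, column and product name here is made up for the example.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    -- Shared dimension: one product table that any report can join to.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    INSERT INTO dim_product VALUES (1, 'Pork pie'), (2, 'Sausage roll');

    -- Reporting table that relies on the shared dimension at query time.
    CREATE TABLE rpt_sales (product_id INTEGER, week INTEGER, units INTEGER);
    INSERT INTO rpt_sales VALUES (1, 47, 500), (2, 47, 300), (1, 48, 450);

    -- Reporting table that deliberately duplicates the product name,
    -- so its queries need no join at all.
    CREATE TABLE rpt_stock (product_id INTEGER, product_name TEXT, units_on_hand INTEGER);
    INSERT INTO rpt_stock VALUES (1, 'Pork pie', 80), (2, 'Sausage roll', 120);
""")

# Report 1: join to the shared dimension.
sales_by_product = dw.execute("""
    SELECT p.product_name, SUM(s.units)
    FROM rpt_sales AS s
    JOIN dim_product AS p ON p.product_id = s.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

# Report 2: read the duplicated column directly.
stock = dw.execute(
    "SELECT product_name, units_on_hand FROM rpt_stock ORDER BY product_name"
).fetchall()

print(sales_by_product)
print(stock)
```

The trade-off is the usual one: the duplicated column costs storage and must be refreshed when a product is renamed, but the report that uses it is simpler and avoids a join.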

Section 2: Development

Development: Performance

➲ De-normalisation

➲ Indexing

➲ Tight data-types

➲ Composite primary keys

➲ Null columns

➲ Temporary tables and table variables

➲ Learn how your query optimiser works (re-compilation)

➲ Learn how data is physically stored in your database of choice

➲ READ, READ, READ.
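
One way to start learning how an optimiser behaves is to ask it for its plan before and after a change. The sketch below does this for an index, using SQLite's `EXPLAIN QUERY PLAN` from Python purely because it is easy to run; SQL Server has its own execution-plan tools, and the table and index names here are hypothetical.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
dw.executemany(
    "INSERT INTO fact_sales (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM fact_sales WHERE customer_id = 42"


def plan(conn, sql):
    """Return the optimiser's plan as one string (the detail text is the last column)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Without an index the only option is to scan the whole table.
plan_before = plan(dw, QUERY)

dw.execute("CREATE INDEX idx_fact_sales_customer ON fact_sales (customer_id)")

# With the index in place the plan should name it.
plan_after = plan(dw, QUERY)

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the point to look for is simply whether the index name appears in the second plan.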

Development: Future-proofing

➲ Realise that ERP/BI/3rd Party software changes frequently.

➲ Recognise that the same business questions persist for centuries.

➲ Design your DW to be as agnostic as possible.

➲ Make judicious use of interface files.
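
One reading of the interface-file advice, sketched below in Python, is to put a neutral flat file between the ERP and the DW so that the loader never sees ERP-specific names. This is an interpretation rather than the author's stated method, and the column names (`CUSNO`, `ORDNO`, `ORDVAL`) and the mapping are hypothetical.

```python
import csv
import io

# Hypothetical mapping from one ERP's column names to neutral DW names.
# Changing ERP vendor would mean changing this mapping, not the loader.
ERP_TO_NEUTRAL = {
    "CUSNO": "customer_code",
    "ORDNO": "order_number",
    "ORDVAL": "order_value",
}


def write_interface_file(erp_rows, mapping):
    """Render ERP rows as a CSV interface file that uses neutral column names."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(mapping.values()), lineterminator="\n")
    writer.writeheader()
    for row in erp_rows:
        writer.writerow({neutral: row[erp_name] for erp_name, neutral in mapping.items()})
    return out.getvalue()


def load_interface_file(text):
    """The DW loader reads only the neutral format, never the ERP's own names."""
    return list(csv.DictReader(io.StringIO(text)))


erp_rows = [{"CUSNO": "C001", "ORDNO": "700123", "ORDVAL": "149.99"}]
interface_text = write_interface_file(erp_rows, ERP_TO_NEUTRAL)
dw_rows = load_interface_file(interface_text)

print(interface_text)
print(dw_rows)
```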

Development: Accuracy and data-quality (fix at source is imperative)

➲ Good decisions come from good quality information.

➲ “Data quality is an extremely important issue since quality determines the data’s usefulness as well as the quality of the decisions based on the data” (Turban et al., 2002).

➲ “The business contribution of leadership and management activities depends on the quality of the decisions that are made and, concomitantly, the quality of the data used to make them” (Gendron and D’Onofrio, 2001).

➲ All data must embody an appropriate degree of quality for decisions that will be made on them (Ballou and Tayi, 1999).

➲ It is not only about actual data quality but also about perception because the perception affects the way in which decisions are enacted (Larcker and Parker, 1980).
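
A possible way to act on the "fix at source" principle is sketched below in Python: rows that fail basic checks are not patched inside the DW but collected into an exception report for whoever owns the source data. The two rules and the field names are assumptions made for the example, not checks taken from the slides.

```python
def check_row(row):
    """Return a list of data-quality problems found in one source row."""
    problems = []
    if not row.get("customer_code"):
        problems.append("missing customer_code")
    value = row.get("order_value")
    if value is None or value < 0:
        problems.append("order_value missing or negative")
    return problems


def split_by_quality(rows):
    """Separate clean rows from rows that should go back to the source owner."""
    accepted, rejected = [], []
    for row in rows:
        problems = check_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(row)
    return accepted, rejected


# Hypothetical sample rows as they might arrive from the ERP.
source_rows = [
    {"order_number": "700123", "customer_code": "C001", "order_value": 149.99},
    {"order_number": "700124", "customer_code": "", "order_value": 20.00},
    {"order_number": "700125", "customer_code": "C002", "order_value": -5.00},
]
accepted, rejected = split_by_quality(source_rows)

# The exception report is what gets sent back so the ERP data can be corrected.
for row, problems in rejected:
    print(row["order_number"], "; ".join(problems))
```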

Section 3: Support

Support: Easy things that make support easier

➲ Always leave things as you would hope to find them.

➲ Assume someone else will have to support this in the future.

➲ Stored procedures in logical steps.

➲ Naming conventions (just be consistent).

➲ Documentation through comments in your scripts.

➲ Neat scripts (consistent cases and tabs) [Example]
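
The example shown in the original talk is not part of this transcript. As a stand-in, here is a small sketch of what the points above could look like together: a load broken into named logical steps, one naming convention, comments, and consistently laid-out SQL. It runs against SQLite from Python, and the convention (upper-case keywords, `stg_` and `rpt_` prefixes) is an assumption for illustration.

```python
import sqlite3

# Convention assumed here: SQL keywords in upper case, object names in lower
# case with a prefix (stg_ = staging, rpt_ = reporting).
LOAD_STEPS = [
    ("stage raw orders", """
        CREATE TABLE stg_order (
            order_id  INTEGER,
            customer  TEXT,
            amount    REAL
        )
    """),
    ("insert sample rows", """
        INSERT INTO stg_order (order_id, customer, amount)
        VALUES (1, 'ACME', 120.0),
               (2, 'ACME', 80.0),
               (3, 'Bloggs', 75.5)
    """),
    ("summarise by customer", """
        CREATE TABLE rpt_customer_total AS
        SELECT   customer,
                 SUM(amount) AS total_amount
        FROM     stg_order
        GROUP BY customer
    """),
]


def run_steps(conn, steps):
    """Run each named step in order and return the names of the completed steps."""
    completed = []
    for name, sql in steps:
        conn.execute(sql)
        completed.append(name)
    return completed


dw = sqlite3.connect(":memory:")
completed = run_steps(dw, LOAD_STEPS)
totals = dw.execute(
    "SELECT customer, total_amount FROM rpt_customer_total ORDER BY customer"
).fetchall()

print(completed)
print(totals)
```

Because each step has a name, whoever supports the load later can see from a log which step failed without reading the whole script.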

Support: Recommended activities

➲ Find out where to get good support, e.g. on-line work groups and forums; take the time to participate and you will learn a lot.

➲ Ask for time to re-develop old queries/reporting systems as your skills improve, i.e. don't just develop new systems.

➲ Incorporate new features that come with version upgrades, N.B. this is much easier if you beta-test software.

➲ Periodically (every 2-3 years) compare software from 3 vendors to prove that yours is still the best (or otherwise).

➲ Keep current (but not necessarily bleeding edge) with software.

➲ Remember that new systems will require more (1) disk, (2) memory and (3) CPU, so build this into the budget.

Support: End-User Computing (EUC)

➲ EUC is “the provision of a programming environment, normally by the information systems (IS) department, which allows users to tailor a system to their own needs using some form of programming syntax, having little or no interaction with IS during the process” (Friedman and Cornford, 1989).

➲ “On an individual level, end user computing involves the development and application of computing skills to fulfilling informational needs. As with any learning experience, a circular pattern develops: an end user acquires basic skills, applies these skills to new problems, and thereby develops the competence and confidence to acquire additional skills for more complex problems. Thus, each new end user represents a stream of future demands for training and support—demands that materialise as the user’s computing skills are developed and applied” (Huff et al., 1988).

Support: User Competence (UC)

➲ Munro et al. (1997) argue that UC is multi-faceted; that “it is composed of an individual’s breadth and depth of knowledge of the end user technologies, and his or her ability to creatively apply these technologies (finesse).”

➲ Training helps but doesn't deliver finesse.

➲ Finesse comes with experience and practice.

➲ Staff churn and re-assignment degrade competence over time.

➲ Most users (>80%) will be prevented from developing finesse by the power-users.

Support: User Resistance (UR)

➲ UR is nothing new; “it has plagued the computing community for decades” (Hirschheim and Newman, 1988).

➲ UR can take many forms but, in essence, it falls into three categories:

1. action;

2. inaction; and

3. chicanery.

➲ “Resistance is a complex phenomenon which defies simple prescriptions” (Hirschheim and Newman, 1988).

➲ Observe and use gut-instinct as it is usually correct.

Support: DW and BI is political

➲ It is used to apportion blame—who is performing and who is not.

➲ All new systems change the balance of power.

➲ User participation is often an exercise in bounded-freedom and lip-service (Howcroft and Wilson, 2003).

➲ You just need to be able to tell when the game is being played.

Thought for the day

“It is not always what we know or analyzed before we make a decision that makes it a great decision. It is what we do after we make the decision to implement and execute it that makes it a good decision” (William Pollard).

References

➲ Alvarez, R. (2002), The Myth of Integration: A Case Study of an ERP Implementation, Chapter 2 in Enterprise Resource Planning—Global Opportunities & Challenges, by Hossain, L., Patrick, J. D., and Rashid, M. A. (2002), Idea Publishing Group, Hershey, PA.

➲ Ballou, D. and Tayi, G. (1999), Enhancing data quality in data warehouse environments, Communications of the ACM, 42: pp. 73-78.

➲ Friedman, A. L. and Cornford, D. S. (1989), Computer Systems Development: History, Organization and Implementation, John Wiley and Sons, Chichester.

➲ Gendron, M. S. and D'Onofrio, M. J. (2001), Data Quality in the Healthcare Industry, Data Quality, 7(1), September 2001.

➲ Hayes, F. (2003), The Story So Far: Business Intelligence [Internet], Computer World 14th April, 2003. Available from: http://www.computerworld.com/databasetopics/data/story/0,10801,80227,00.html, (Accessed 18th August, 2005).

➲ Hirschheim, R. and Newman, M. (1988), Information Systems and User Resistance: Theory and Practice, The Computer Journal, 31(5), pp. 398-407.

➲ Huff, S.L., Munro, M.C., and Martin, B.H. (1988), Growth Stages of End User Computing, Communications of the ACM, 31(5), May 1988, pp. 542-550.

➲ Larcker, D. and Parker, L. (1980), Perceived usefulness of information: a psychometric examination, Decision Sciences, 11, pp. 121-134.

➲ Munro, M.C., Huff, S.L., Marcolin, B.L. and Compeau, D.R. (1997), Understanding and measuring user competence, Information & Management, 33, pp. 45-57.

➲ Power, D. J. (1999), Decision Support Systems Glossary [Internet], DSS Resources. Available from: http://dssresources.com/glossary/, (Accessed 18th August, 2005).

➲ Sheina, M. (2005), Controlled Empowerment [Internet], Computer Business Review On-line. Available from: http://www.cbronline.com/content/COMP/magazine/Articles/Data_Warehousing/000145.asp, (Accessed 25th July, 2005).

➲ Turban, E., McLean, E. and Wetherbe, J. (2002), Information Technology for Management: Transforming Business in the Digital Economy, 3rd edition, John Wiley & Sons, Inc., Hoboken, NJ.

➲ Whitehorn, M. (2005), Oops! Sorry!, Business Intelligence, Server Management, May 2005, p.62.
