introduction to data warehouse by sathish yellanki

42
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 1 Welcome To The World of Data Analysis And Business Intelligence

Upload: satishalerts

Post on 20-Jul-2016

40 views

Category:

Documents


4 download

DESCRIPTION

Introduction Notes To Data Warehousing

TRANSCRIPT

Page 1: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 1

Welcome To The World of Data Analysis And Business Intelligence

Page 2: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 2

What is A Data Warehouse?

Page 3: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 3

• A Data Warehouse is A Relational OR Multidimensional Database That is Designed For Query And Analysis.

• Data Warehouse is Not Optimized For Transaction Processing Instead Optimized For Query And Data Analysis.

• Data Warehouses Are Designed To Consolidate Historical And Transactional Data Derived From Multiple Data Sources.

• Data Warehouses Separate Analysis Workload From Transaction Workload And Enable An Organization To Consolidate Data From Several Data Sources.

• Data in A Data Warehouse Usually Stores Many Months OR Years of Data To Support Historical Analysis.

• The Data in A Data Warehouse is Typically Loaded Through An Extraction, Transformation, And Loading (ETL) Process From One OR More Data Sources.

• The Data Sources That Feed The Data For The Data Warehouse Can Be OLTP Applications, Mainframe Applications, OR External Data Providers.

• The End Users of The Data Warehouse Perform Data Analyses That is Often Time-Related.

• Sophisticated Analyses Include Trend Analyses And Data Mining, Which Use Existing Data To Forecast Trends OR Predict Futures.

• The Data Warehouse Typically Provides The Foundation For A Business Intelligence Environment.

Page 4: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 4

What Are The Key Characteristics of A Data Warehouse? • Within The Data Warehouse Some Data is De-Normalized For

Query Simplification And To Improve Analysis Performance. • Large Amounts of Historical Data is Used. • Queries Often Retrieve Large Amounts of Data. • Both Planned And Ad-Hoc Queries Are Common. • The Data Load is Controlled By Quantum And By Time.

What is The Key Point For Successful Data Warehouse? • High Data Throughput is The Key Point For A Successful Data

Warehouse.

What Are The Common Data Warehousing Tasks? • Configure A Database For Use As A Data Warehouse. • Take The Initial Steps in Consolidating The Data. • Begin To Define The Target Objects in The Warehouse. • Define Strategies For Extracting, Transforming, And Loading Data into

The Target. • Consider Performance Issues And Advanced ETL Concepts. • Deploy To Target Schemas And Execute ETL Logic. • Analyze Metadata Dependencies. • Write Efficient SQL. • Refresh The Data Warehouse. • Optimize Operations For Query Analysis. • Eliminate Performance Bottlenecks. • Review Some Basics of Data Warehouse Backup And Recovery. • Review Some Basics of Data Warehouse Security.

Page 5: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 5

What Does An Data Warehouse Environment Includes? • An Extraction, Transportation, Transformation, And Loading

(ETL) Solution. • An Online Analytical Processing (OLAP) Engine. • Client Analysis Tools. • Applications That Manage The Process of Gathering Data And

Delivering it To Business Users.

What is Data Warehouse By Definition? • A Data Warehouse As Per Bill Inmon is Collection of Data in

Support of Management's Decision Making Process Associated With Properties Like • Subject-Oriented • Integrated • Time-Variant • Non-Volatile

Subject Oriented • Data Warehouses Are Designed To Help The End User To

Analyze Data Specific To An Area of The Business.

Integrated • Data Warehouses Must Put Data From Disparate Sources into A

Consistent Format. • Data Warehouses Must Resolve Problems Such As

• Naming Conflicts

• Inconsistencies Among Units of Measure

Page 6: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 6

Nonvolatile • Nonvolatile Means That, Once Entered into The Warehouse,

Data Should Not Change.

• Nonvolatile is Logical Because The Purpose of A Warehouse is To Enable The End User To Analyze What Has Occurred.

• The Final Rule is Historical Data in A Data Warehouse Should Never Be Altered.

Time Variant • In Order To Discover Trends in Business, Analysts Need Large

Amounts of Data.

• This is Very Much in Contrast To Online Transaction Processing (OLTP) Systems, Where Performance Requirements Demand That Historical Data Be Moved To An Archive.

• Historical Data is Kept in A Data Warehouse With A Time Span of 3 Months, 6 Months, 12 Months, OR Even Older Data From A Data Warehouse.

What is Ralph Kimball View in Building Data Warehouse? • A Data Warehouse is A Copy of Transaction Data Specifically

Structured For Query And Analysis As Per The Subject of Analysis in Business Trends, Which is A Functional View of A Data Warehouse.

• Kimball Did Not Address How The Data Warehouse is Built, Rather He Focused on The Functionality of A Data Warehouse.

Page 7: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 7

Contrasting Environments of OLTP And Data Warehousing

OLTP

Complex Data Structures (3NF Databases)

Few

Many

Normalized DBMS

Rare

Data Warehouse

Multidimensional Data Structures

Many

Some

Denormalized DBMS

Common

Data Structures

Indexes

Joins

Duplicated Data

Derived Data And Aggregates

Page 8: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 8

Difference Between Data Warehouses And OLTP Systems Workload • Data Warehouses Are Designed To Accommodate Ad-Hoc Queries. • We Might Not Know The Workload of The Data Warehouse in Advance,

Hence A Data Warehouse Should Be Optimized To Perform Well For A Wide Variety of Possible Query Operations.

• OLTP Systems Support Only Predefined Operations. The Applications Might Be Specifically Tuned OR Designed To Support Only Specified Operations.

Data Modifications • A Data Warehouse is Updated on A Regular Basis By The ETL Process

Which is Run Nightly OR Weekly Using Bulk Data Modification Techniques.

• The End Users of A Data Warehouse Do Not Directly Update The Data Warehouse.

• In OLTP Systems, End Users Routinely Issue Individual Data Modification Statements To The Database.

• The OLTP Database is Always Up To Date, And Reflects The Current State of Each Business Transaction.

Schema Design • Data Warehouses Often Use Denormalized OR Partially Denormalized

Schemas Which is Generally A Star Schema To Optimize Query Performance.

• OLTP Systems Often Use Fully Normalized Schemas To Optimize Update OR Insert OR Delete Performance, And Should Guarantee Data Consistency.

Page 9: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 9

Typical Operations • A Typical Data Warehouse Query Scans Thousands OR Millions of Rows. • A Typical OLTP Operation Accesses Only A Handful of Records.

Historical Data • Data Warehouses Usually Store Many Months OR Years of Data, To

Support Historical Analysis. • OLTP Systems Usually Store Data From Only A Few Weeks OR Months,

Which is Only Historical Data As Needed To Successfully Meet The Requirements of The Current Transaction.

Can We Visualize The Data Warehouse Architecture? • Different Data Warehousing Systems Have Different Types of

Structural Requirements. • Some Data Warehouses May Have An Operational Data

Store(ODS), While Some May Have Multiple Data Marts. • Some Data Warehouses May Have A Small Number of Data

Sources, While Some May Have Dozens of Data Sources. • Because of These Dynamic Differences That Arise A Data

Warehouse is Considered As A Collection of Different Layers Rather Than Specifics of Any One System.

What Are The Different Layers of A Data Warehouse? • Data Source Layer • Data Extraction Layer • Staging Area • ETL Layer • Data Storage Layer

• Data Logic Layer • Data Presentation Layer • Metadata Layer • System Operations Layer

Page 10: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 10

Components of The Data Warehouse Architecture

Metadata Layer

Data Source Layer

Data Extraction

Layer

Staging Area

ETL Layer

Data Storage Layer

Data Logic Layer

Data Presentation

Layer

System Operations Layer

Data Source Layer • Data Source Layer Represents The Different Data Sources That

Feed Data into The Data Warehouse. • The Data Source Can Be of Any Format Which Can Be Plain

Text File, Relational Database, Other Types of Database, Excel File.

Page 11: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 11

• Many Different Types of Data Can Be A Data Source • Operations Data, Such As Sales Data, HR Data, Product Data,

Inventory Data, Marketing Data, Systems Data. • Web Server Logs With User Browsing Data. • Internal Market Research Data. • Third-Party Data, Such As Census Data, Demographics Data, OR

Survey Data.

Data Extraction Layer • In Data Extraction Layer The Data Gets Pulled From The Data

Source into The Data Warehouse System. • In The Data Extraction Layer There is Likely Some Minimal Data

Cleansing, But There is Unlikely Any Major Data Transformation.

Staging Area • Staging Area is The Area Where Data Sits Prior To Being

Scrubbed And Transformed into A Data Warehouse OR Data Mart.

• Having One Common Area Makes it Easier For Subsequent Data Processing OR Integration in Data Mart OR Data Warehouse.

ETL Layer • ETL Layer is The Layer Where Data Gains its "Intelligence", As

Logic is Applied To Transform The Data From A Transactional Nature To An Analytical Nature.

• ETL Layer is Also The Layer Where Data Cleansing is Implemented.

Page 12: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 12

• The ETL Design Phase is Often The Most Time-Consuming Phase in A Data Warehousing Project, To Make The Process More Quick And Productive An ETL Tool is Often Used in This Layer.

Data Storage Layer • Data Storage Layer is The Layer Where The Transformed And

Cleansed Data Actually Resides OR Sits. • Based on Scope And Functionality of The Project, Three Types

of Entities Can Be Found Here • Data Warehouse • Data Mart • Operational Data Store (ODS)

• Depending on The Requirement of Analysis Conducted Upon The System, We May Have Just One of The Three, Two of The Three, OR All of The Three Types.

Data Logic Layer • Data Logic Layer is The Layer Where Business Rules Are Stored. • Business Rules Stored in Data Logic Layer Do Not Affect The

Underlying Data Transformation Rules, But The Business Rules Do Affect The Way The Report Looks Like.

Data Presentation Layer • Data Presentation Layer Refers To The Information That

Reaches The End Users. • Data Presentation Layer Can Be in A Form of A Tabular OR

Graphical Report in A Browser.

Page 13: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 13

• The Report Can Be An Emailed Report That Gets Automatically Generated And Sent Everyday, OR An Alert That Warns Users of Exceptions, Among Others.

• Usually An OLAP Tool AND/OR A Reporting Tool is Used in The Data Presentation Layer.

Metadata Layer • Metadata Layer is The Layer Where Information About The Data

Stored in The Data Warehouse System is Stored. • A Logical Data Model Would Be An Example of Something That's

in The Metadata Layer. • A Metadata Tool is Often Used To Manage Metadata.

System Operations Layer • System Operations Layer Includes Information on How The

Data Warehouse System Operates, Such As • ETL Job Status • System Performance • User Access History

What Are The Different Data Warehouse Architectures? • Data Warehouses And Their Architectures Vary Depending

Upon The Specifics of An Organization's Situation. • Before Deciding The Type of Data Warehouse We Have To Study

The Constructive Requirement of The Client And His Need in Data Analysis.

• The Requirement Phase of The Data Warehouse Decides This.

Page 14: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 14

Three Common Architectures • Basic Data Warehouse Architecture • Staging Area Based Data Warehouse Architecture • Staging Area And Data Marts Integrated Data Warehouse Architecture

Basic Data Warehouse Architecture

Data Source

Operational System

Operational System

Flat Files

Metadata

Raw Data Summary

Data

Warehouse

Analysis

Reporting

Mining

Users

Page 15: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 15

Staging Area Based Data Warehouse Architecture

Data Source

Operational System

Operational System

Flat Files

Metadata

Raw Data Summary

Data

Warehouse

Analysis

Reporting

Mining

Users

Staging Area

Page 16: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 16

Staging Area And Data Marts Integrated Data Warehouse Architecture

Data Source

Operational System

Operational System

Flat Files

Metadata

Raw Data

Summary Data

Warehouse

Analysis

Reporting

Mining

Users

Staging Area

Sales

Purchasing

Inventory

Data Marts

Page 17: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 17

Let Us Understand The Data Warehouse Design

Page 18: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 18

What Are The Steps in Data Warehouse Project Cycle? • Requirement Gathering • Physical Environment Setup • Data Modeling • Extraction, Transformation And Loading(ETL) • OLAP Cube Design • Front End Development • Report Development • Performance Tuning • Query Optimization • Quality Assurance • Rolling Out To Production • Production Maintenance • Incremental Enhancements

What Are Components Set For Building A Data Warehouse? Task Description • Typically Needs To Be Accomplished During This Particular Data

Warehouse Design Phase. Time Requirement • A Rough Estimate of Time This Particular Data Warehouse Task Takes. Deliverables • Documents Fully Describing The Steps And Results of Particular Task,

Important For Consultants To Communicate Results To Clients. Possible Pitfalls • Things To Watch Out For Failure in Implementing Successful Data

Warehouse.

Page 19: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 19

Requirements Gathering

Page 20: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 20

Task Description • Requirements Gathering is The First Step in Which The Project Team

Will Engage on Highest Priority. • End Users May Not Be Familiar With The Data Warehousing Process OR

Concept, Hence The Help of The Business Sponsor is Essential. • Requirement Gathering Can Be Conducted As One-To-One Meetings OR

As Joint Application Development (JAD) Sessions, Where Multiple People Are Talking About The Project Scope in The Same Meeting.

• Requirements Gathering Phase May Identify The End User Reporting OR Analysis Requirements, And The Project Team Should Spend The Remaining Period of Time Trying To Satisfy These Requirements.

• Based on The Information Gathered Above, A Disaster Recovery Plan Needs To Be Developed So That The Data Warehousing System Can Recover From Accidents That Disable The System.

Time Requirement • 2 To 8 weeks.

Deliverables • A List of Reports OR Cubes To Be Delivered To The End Users By The

End of This Current Phase. • An Updated Project Plan That Clearly Identifies Resource Loads And

Milestone Delivery Dates.

Possible Pitfalls • Data From Multiple Sources Can Be Critical For Management. • Political Battles That Center on The Willingness of Information Sharing. • Unwillingness of Certain Groups To Release OR Participate in DWH. • High Level Influence To Make Sure Everyone Cooperates.

Page 21: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 21

Physical Environment Setup

Page 22: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 22

Task Description • Set Up The Physical Servers And Databases And Configure The Servers

And Databases. • It is Necessary To Set Up A Development And A Production

Environment At The Minimum OR Development, Testing, And Production Environments Separately.

• The Different Physical Environments Should Be Set Up And Configured For The Different Processes That Are Initiated And Operated in Data Warehouse, Each Environment Individually For ETL, OLAP Cube, And Reporting.

• It is Best For The Different Environments Are Configured With Their Own Distinct Application And Database Servers.

Time Requirement • The Servers And Database Configurations Should Take Less Than 1

Week.

Deliverables • Hardware OR Software Setup Document For All of The Environments,

Including Hardware Specifications, And Scripts OR Settings For The Software.

Possible Pitfalls • To Save on Capital, Often Data Warehousing Teams Will Decide To Use

Only A Single Database And A Single Server For The Different Environments.

• Instead of Having An Environment Physically Separated The Teams May Go For Logical Separation, Which Can Effect The Production And Development Parallel.

Page 23: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 23

Data Modeling

Page 24: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 24

Task Description • Data Modeling is The Most Important And Crucial Step in The Data

Warehousing Project. The Foundation of The Data Warehousing System Totally Depends on The Data Model.

• A Good Data Model Will Allow The Data Warehousing System To Grow Easily, As Well As Allowing For Good Performance.

• In Data Warehousing Project, The Logical Data Model is Built Based on User Requirements, And Then it is Translated into The Physical Data Model.

• Part of The Data Modeling Process is Mostly Dedicated in Identification of Data Sources in The Actual Relational System.

Time Requirement • 2 To 6 Weeks.

Deliverables • Identification of Data Sources. • Logical Data Model. • Physical Data Model.

Possible Pitfalls • It is Essential To Have A Subject-Matter Expert As Part of The Data

Modeling Team, Which Can Be Costly Resource Some Times. • Subject Matter Expert Can Be A Person Who Can Be An Outside

Consultant OR Someone In-House Who Has Extensive Experience in The Industry, Which Could Be Scarcity.

• The Absence of The Subject Matter Expert Makes it Difficult To Get A Definitive Answer on Many of The Questions That Arise in Reporting Requirement, And The Entire Project Gets Dragged Out.

Page 25: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 25

Extraction, Transformation And Loading(ETL)

Page 26: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 26

Task Description • The ETL (Extraction, Transformation, Loading) Process Typically Takes

The Longest Time To Develop, ETL Process Can Easily Take Up To 50% of The Data Warehouse Implementation Cycle OR Longer.

• The Main Reasons Are • Time To Get The Source Data • Understand The Necessary Columns • Understand The Business Rules • Understand The Logical And Physical Data Models

Time Requirement • 1 To 6 Weeks.

Deliverables • Data Mapping Document • ETL Script / ETL Package In The ETL Tool

Possible Pitfalls • Major Time Because of Lack of Experience Too Little Development Time

is Allocated For The Development Team To Develop The ETL. • End Users Can Excuse The Formatting Mistakes OR Pattern Combination

Mistakes But Cannot Tolerate The Data Which is Wrong by Values. • Lack of Experienced Personnel in The ETL Team Can Make The Process

More Complicated Than Necessary Creating Difference of Opinions. • In ETL Design, The Primary Goal Should Be To Optimize Load Speed

Without Sacrificing on Quality, Which is Many Times Not At All Applied. • Design Goal is To Cover All Possible Future Uses, Including Practical OR

Just A Figment of Someone's Imagination, Costing The Performance of The Entire Data Warehousing System.

Page 27: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 27

OLAP Cube Design

Page 28: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 28

Task Description • The Design of The OLAP Cube Can Be Derived From The Requirement

Gathering Phase. • End Users Have Some Idea on What They Want, But it is Difficult For

Them To Specify The Exact Report OR Analysis They Want To See. • It is Usually A Good Idea To Include Enough Information So That End

User Can Feel That The Data Warehouse is Containing At-Least Some Combination of Data As Expected.

• OLAP Cube is A Data Structure That Keeps A Pattern Based Data Combinations Very Specific To The Requirement of The Reports of The End User.

• The Data Warehousing is An Iterative Process, Hence No One Can Ever Meet All The Requirements All At Once in One Single Data Warehouse.

Time Requirement • 1 To 2 Weeks.

Deliverables • Documentation Specifying The OLAP Cube Dimensions And Measures. • Actual OLAP Cube OR Report.

Possible Pitfalls • We Should Make Sure That The OLAP Cube-Building Process is

Optimized And it Should Be Tested Thoroughly For its Performance And Optimization Before Execution.

• It is Common For The Data Warehouse To Be on The Bottom of The Nightly Batch Load, And After The Loading of The Data Warehouse, There Usually Isn't Much Time Remaining For The OLAP Cube To Be Refreshed.

Page 29: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 29

Front End Development

Page 30: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 30

Task Description • Data Warehouse Has No Recognition And Identification if The Users

Cannot Visualize The Reports, Regardless of its Strength of The OLAP Engine And The Integrity of The Data.

• The Front End Development is An Important Task And Challenging Environment of A Data Warehouse.

• The Reports Should Be Delivered Over The Web, Hence The Front-End Developers Should Conform To The Interne And Standardization To Standard Browsers.

• The Front-End Technology For Development Can Range From An Internal Front-End Development Team Using Scripting Languages Such As ASP, PHP, OR PERL, To Off-The-Shelf Products Such As Seagate Crystal Reports, To The More Higher-Level Products Such As Actuate. OR Business Objects OR COGNOS OR BI Publisher.

Time Requirement • 1 To 4 Weeks.

Deliverables • Front End Deployment Documentation

Possible Pitfalls • End Users Does Not Care The Complexity And Advancement of The

Front-End Technology And Least Cares The Infrastructure. • He Expects That The Information He Needs For Analysis Should Be

Presented in Timely Manner, With Higher Degree of Accuracy And Consistency.

• The Clients Rigidness Can Be A Major Concern For The Development Teams.

Page 31: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 31

Report Development

Page 32: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 32

Task Description • Report Specification Typically Comes Directly From The Requirements

Phase Itself.

• Report Development Plays A Very Important Role in Determining The Success of The Data Warehousing Project As End User Final Delivery Point is Reports.

• The Criticality of Report Design Depends on The Kind of Information And Arrangement The Client Expects in Presentation, The Dynamism Also Plays A Very Important Role Here.

• The Major Tasks of Accomplishments in report Design Are

User Customization

The Report Development Process Need To Consider The Metrics And Filter Standards Upon The Data.

Report Delivery

What Report Delivery Methods Are Needed? in Addition To Delivering The Report To The Web Front End, Other Possibilities That May Include Are Delivery Via Email, Via Text Messaging, OR in Some Form of Spreadsheet.

Access Privileges

Special Attention Needs To Be Paid To Who Should Have What Access To Which Information.

• Report Development Does Not Happen Only During The Implementation Phase of The Data Warehouse, It Can Even Continue After The System Goes into Production State As There May Be Requests For Additional Reports As To The Strategy of The Business Requirements of The Customer.

Page 33: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 33

• The Categories of Report Requests Raised By The Client Can Include Two Different Situations Data is Already Available in The Data Warehouse • In This Case, it is Straightforward To Develop The New Report into

The Front End. • There is No Need To Wait For A Major Production Push in This Case

Before Making New Reports Available. Data is Not Yet Available in The Data Warehouse • This Means That The Request Needs To Be Prioritized And Put into A

Future Data Warehousing Development Cycle.

Time Requirement • 1 To 2 Weeks.

Deliverables • Report Specification Documentation. • Reports Set Up in The Front End OR Reports Delivered To User's

Preferred Channel. • Confirmation Document on The Tasks That Are Completed As To The

Standards of The Client.

Possible Pitfalls • Make Sure The Exact Definitions of The Report Are Communicated To

The Users, Else The Interpretation of The Report Can Be Erroneous. • There Can Be Some Difference of Opinion From The Developers To

Subject Experts in Understanding The State of The Terminology And Jargon of The Business Domains.

• The Type of The Reporting Tool Determines The Complexity of Development.

Page 34: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 34

Performance Tuning

Page 35: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 35

Task Description • Three Major Areas Where A Data Warehousing System Uses

Performance Tuning Extraction, Transformation And Loading • The Data Load is Usually Very High Time-Consuming Process, Data

Warehousing Related Batch Jobs Are Typically of Lower Priority. Query Processing • In A ROLAP Environment OR in A System Where The Reports Are

Run Directly Against The Relationship Database, Query Performance Can Be An Issue.

• It is Ideal For The Data Warehousing Team To Invest Some Time To Tune The Query, Especially The Most Popular Ones.

Report Delivery • End Users May Experience Significant Delay in Receiving Reports

Due To Factors Other Than The Query Performance, Which Include Network Traffic, Server Setup, And The Way Front-End is Built.

Time Requirement • 3 To 5 Days.

Deliverables • Performance Tuning Document Along With Goal And Result

Possible Pitfalls • Make Sure The Development Environment Mimics The Production

Environment As Much As Possible. • Performance Enhancements Seen on Less Powerful Machines

Sometimes Do Not Materialize on The Larger, Production-Level Machines.

Page 36: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 36

Query Optimization

Page 37: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 37

Task Description • For Any Production Database, SQL Query Performance Becomes An

Issue Sooner or Later in The Process of The Project. • Having Long-Running Queries Consumes System Resources Making The

Server And Application Run Slowly, May Lead To Table Locking And Data Corruption Issues.

Principles For Query Optimization • We Should Understand How Our Database is Executing Our Query. • Retrieve As Little Data As Possible Through The Query. • Store Intermediate Results That Are Produced By The Query. • Sometimes Logic For A Query Can Be Quite Complex. Due To The Use of

Sub-Queries, Inline Views, And Union-Type Statements, in Which The Results Are Not Stored in The Database, But Are Immediately Used Within The Query, Leading To Performance Issues.

• To Increase Query Performance Store The Intermediate Results in A Temporary Table, And Break Up The Initial SQL Statement into Several SQL Statements.

• Build An Index on The Temporary Table To Speed Up The Query Performance.

Query Optimization Strategies. • Use Indexing Standards. • Aggregate Table With The Aggregated Data. • Vertical Partitioning • Horizontal Partitioning • De-Normalization • Server Tuning

Page 38: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 38

Quality Assurance

Page 39: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 39

Task Description • Once The Development Team Declares That Everything is Ready For

Further Testing, The QA Team Takes Over The Responsibility. • The QA Team is Always Proposed From The Client. • Usually The QA Team Members Will Know Little About Data

Warehousing, Hence May Resend The Need To Have To Learn Another Tool OR Tools.

• Sometimes The QA Process is Overlooked, Because of Which The Target Time of The Project May Increase.

Time Requirement • 1 To 4 Weeks.

Deliverables • QA Test Plan • QA Verification That The Data Warehousing System is Ready To Go into

Production

Possible Pitfalls • Usually The QA Team Members Know Little About Data Warehousing,

Because of Which The Process Can Become A Vicious Cyle. • Make Sure The QA Team Members Get Enough Education So That They

Can Complete The Testing Themselves. • The Client May Not Have Metrics OR Parameters That Are Definite For

The Quality He Expects. • The Data Warehousing Team Should Know The Standards of The QA

That Are Defined By The Client, Else The Directionality of The Process Can Reach Great Deviations.

• QA Parameters May Be Logical Terms OR Physical By Nature.

Page 40: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 40

Rollout To Production

Page 41: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 41

Task Description • Once The QA Team Gives Assurance on The Standards And Quality, it is

Time For The Data Warehouse System To Go Live. • Depending on The Number of End Users, Roll Out To Production Can

Take Up To A Full Week To Bring Everyone Online. • It is Easy When The End Users Access The Data Warehouse Over The

Web, Making Going Production Easier By Sending An URL Via Email.

Time Requirement • 1 To 3 Days.

Deliverables • Delivery of The Data Warehousing System To The End Users.

Possible Pitfalls • We Have To Take Care of The User Education Needs For Addressing The

System.

Production Maintenance Task Description • Once The Data Warehouse Goes into Production, it Needs To Be

Maintained By Change Requests That May Arise. • Tasks Such As Regular Backup And Crisis Management Become

Important And Should Be Planned Very Properly. • We Have To Consistently Monitor End User Usage of The System. • The Purpose of Monitoring The Data Warehouse

• To Capture Any Runaway Requests So That They Can Be Fixed Before Slowing The Entire System Down.

• To Understand How Much Users Are Utilizing The Data Warehouse For Return-on-investment Calculations And Future Enhancement Considerations.

Page 42: Introduction To Data Warehouse By Sathish Yellanki

Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 42

Time Requirement • It is Always Ongoing

Deliverables • Consistent Availability of The Data Warehousing System To The End

Users.

Possible Pitfalls • Usually By This Time Most of The Developers Could Have Left The

Project, So it is Essential That Proper Documentation Be Left For Those Who Are Handling Production Maintenance.

Incremental Enhancements Task Description • Once The Data Warehousing System Goes Live, There Are Often Needs

For Incremental Enhancements, Which Are Simple Small Changes That Follow The Business Itself.

Deliverables • Change Management Documentation • Actual Change To The Data Warehousing System

Possible Pitfalls • Many Times The Changes Are Simple To Make, it is Very Tempting To

Just Go Ahead And Make The Change in Production. • Many Unexpected Problems Can Pop Up if Changes Are Taken Suddenly. • It is Strongly Recommend That The Typical Life Cycle is Followed

• Development • QA • Production