BIDW Roadmap
Author: Dave Goyal
BIDW Process Roadmap
Overall Process
- Program / Project Planning and Management
- Business Process Definition
- Technical Architecture Design
- Product Selection and Installation
- Dimensional Modeling
Overall Process (contd.)
- Physical Design
- ETL Design and Development
- BI Application Design
- BI Application Development
- Deployment
- Change Management and Maintenance
Program / Project Planning and Management
- Define the Project
- Build the Business Case and Justification
- Plan the Project
- Manage the Project
- Manage the Program
Business Process
- Define Business Process
- Define Requirements using Interviews
- Define Requirements using Facilitated Sessions
Technical Architecture Design
- Back Room Architecture (Source, ETL)
- Presentation Server Architecture (Dimensional Architecture)
- Front Room Architecture (BI)
- Additional Architecture Features (Infrastructure, Metadata, Security)
Product Selection and Installation
- Architecture Plan (DW Architecture Diagram and Application Architecture Document)
- Product Selection (Hardware/OS, DBMS, ETL, BI, Data Profiling, Data Cleansing, etc.)
Dimensional Modeling
- Process Value Chain
- Business Process
- Choose the Business Process
- Declare the Grain
- Identify the Dimensions
- Identify the Facts
- Enterprise Bus Matrix
Physical Design
- High Level Physical Design
- Develop Standards
- Develop the Physical Data Model
- Develop Initial Indexing Plan
- Design OLAP Database
- Design Aggregations
ETL Table Naming Convention
- D_: Dimension Table
- F_: Fact Table
- S_: Source Table. Contains all data copied directly from a source file.
- X_: Extract Table. Contains changed source data only. Changes may come from an incremental extract or be derived from a full extract.
ETL Table Naming Convention 2
- C_: Clean Table. Contains source rows that have been cleaned.
- E_: Error Table. Contains error rows found in source data.
- M_: Master Table. Maintains history of all clean rows.
- T_: Transform Table. Contains the data resulting from a transformation of source data.
ETL Table Naming Convention 3
- I_: Insert Table. Contains new data to be inserted into the dimension table.
- U_: Update Table. Contains changed data to be applied to the dimension table.
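The prefixes listed on the three naming-convention slides can be collected into a lookup so that table names are checked mechanically. This is only a sketch: the function name `table_role` and the lowercase role labels are illustrative assumptions, not part of the original convention.

```python
# Prefixes as listed on the naming-convention slides.
# The lowercase role labels are illustrative, not from the slides.
ETL_PREFIXES = {
    "D_": "dimension",
    "F_": "fact",
    "S_": "source",
    "X_": "extract",
    "C_": "clean",
    "E_": "error",
    "M_": "master",
    "T_": "transform",
    "I_": "insert",
    "U_": "update",
}


def table_role(table_name):
    """Return the role implied by a table's two-character prefix."""
    prefix = table_name[:2].upper()
    if prefix not in ETL_PREFIXES:
        raise ValueError(f"{table_name!r} does not follow the naming convention")
    return ETL_PREFIXES[prefix]
```

For example, `table_role("S_Agents")` returns `"source"`, while a name without a known prefix raises `ValueError`.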
Data Quality
- Avoid NULL strings in dimension tables.
- Specify a default value for NOT NULL columns: 'N/A', 'Not Known', 'Invalid'.
- Dimension primary keys should be auto-generated surrogate keys.
- Allow for data quality rows with keys 0, -1, and -2.
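A minimal sketch of the default-value rule, assuming rows are handled as plain dictionaries before loading. The function name, the column names, and the choice to treat blank strings the same as NULL are assumptions for illustration.

```python
DEFAULT_TEXT = "N/A"  # one of the default values named on the slide


def fill_missing_strings(row, text_columns, default=DEFAULT_TEXT):
    """Return a copy of row with NULL or blank text columns set to a default."""
    cleaned = dict(row)
    for column in text_columns:
        value = cleaned.get(column)
        # Assumption: whitespace-only strings count as missing, like None.
        if value is None or str(value).strip() == "":
            cleaned[column] = default
    return cleaned
```

The input row is left unchanged, so the original source values remain available for error reporting.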
Surrogate Keys
- Always use auto-generated surrogate keys for dimension keys.
- Use the SQL statements SET IDENTITY_INSERT <table> ON and SET IDENTITY_INSERT <table> OFF (SQL Server) to create the 0, -1, and -2 rows for each dimension when it is created:
    0 = INVALID
    -1 = UNKNOWN
    -2 = NOT APPLICABLE
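The reserved rows can be illustrated with a small runnable sketch. It uses Python's built-in sqlite3 module instead of SQL Server, so no identity-insert switch is needed: SQLite accepts explicit values for an INTEGER PRIMARY KEY column and auto-generates the key otherwise. The table name D_Agent and the column names are hypothetical.

```python
import sqlite3

# Reserved data-quality rows from the slide: surrogate key -> meaning.
RESERVED_ROWS = {0: "INVALID", -1: "UNKNOWN", -2: "NOT APPLICABLE"}


def create_dimension(conn, table):
    """Create a dimension table and seed its reserved surrogate-key rows."""
    # The table name is interpolated directly, so it must come from trusted code.
    # INTEGER PRIMARY KEY makes SQLite generate the surrogate key when omitted.
    conn.execute(
        f"CREATE TABLE {table} ("
        "dim_key INTEGER PRIMARY KEY, "
        "natural_key TEXT NOT NULL, "
        "name TEXT NOT NULL)"
    )
    # Supplying explicit keys here stands in for the identity-insert window
    # that SQL Server requires around such inserts.
    conn.executemany(
        f"INSERT INTO {table} (dim_key, natural_key, name) VALUES (?, ?, ?)",
        [(key, label, label) for key, label in RESERVED_ROWS.items()],
    )


conn = sqlite3.connect(":memory:")
create_dimension(conn, "D_Agent")

# A regular row omits dim_key and receives a generated, positive key.
cursor = conn.execute(
    "INSERT INTO D_Agent (natural_key, name) VALUES (?, ?)",
    ("A100", "Sample Agent"),
)
new_key = cursor.lastrowid
```

Fact rows whose dimension value is invalid, unknown, or not applicable can then point at keys 0, -1, or -2 instead of carrying a NULL foreign key.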
ETL Design and Development
- Round Up the Requirements
- Extract Data from Source (3 Steps)
- Clean and Conform Data (5 Steps)
- Delivering Data (13 Steps)
- Managing the ETL Environment (13 Steps)
ETL Roadmap
ETL Implementation Process
- Analyze data quality thoroughly and have options available to resolve issues
- Define data source definitions
- Create High Level S2T (source-to-target) Map
- Create Detail Level S2T Map
- Create Fact Worksheet
ETL Process: Extract
- Extract data to the S_ table (full load)
- Compare the S_ table to the M_ table and load the differences into the X_ table
- Clean the X_ table by removing duplicate rows (de-duplication step)
- Move duplicate rows to the E_ table
- Move non-duplicate clean rows to the C_ table
- Compare C_ to M_: insert new rows into M_ and update M_ with changed rows
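The extract steps above can be sketched in plain Python, with lists of dictionaries standing in for the S_, X_, C_, and E_ tables and a dictionary keyed on the natural key standing in for M_. Several points are assumptions made for the sketch: the key column agent_id, keeping the first occurrence of a duplicate and sending later ones to E_, and overwriting M_ rows in place rather than keeping their history.

```python
def extract_step(s_rows, m_table, key="agent_id"):
    """Run the S_ -> X_ -> C_/E_ -> M_ flow and report what happened."""
    # X_: source rows that are new or differ from the current M_ row.
    x_rows = [row for row in s_rows if m_table.get(row[key]) != row]

    # De-duplicate X_: first occurrence goes to C_, repeats go to E_.
    c_rows, e_rows, seen = [], [], set()
    for row in x_rows:
        if row[key] in seen:
            e_rows.append(row)
        else:
            seen.add(row[key])
            c_rows.append(row)

    # Compare C_ to M_: insert new keys, update changed ones.
    inserted = updated = 0
    for row in c_rows:
        if row[key] in m_table:
            updated += 1
        else:
            inserted += 1
        m_table[row[key]] = dict(row)

    return {"x": x_rows, "c": c_rows, "e": e_rows,
            "inserted": inserted, "updated": updated}
```

Because X_ only holds the differences, unchanged source rows never reach the cleaning or transform stages.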
ETL Process: Transform
- Select and transform from C_ to T_
- Compare T_ with D_ for new and changed rows
- Insert new rows into I_ and changed rows into U_
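A minimal sketch of the compare step, assuming T_ and D_ rows are dictionaries matched on a natural key. The function name and the column names (agent_id, name, region) are hypothetical.

```python
def split_new_and_changed(t_rows, d_rows, key="agent_id", attrs=("name", "region")):
    """Compare T_ rows with D_ rows; return (I_ rows, U_ rows)."""
    current = {row[key]: row for row in d_rows}
    i_rows, u_rows = [], []
    for row in t_rows:
        existing = current.get(row[key])
        if existing is None:
            # Natural key not yet in the dimension: new row for I_.
            i_rows.append(row)
        elif any(existing[attr] != row[attr] for attr in attrs):
            # Key exists but a tracked attribute differs: changed row for U_.
            u_rows.append(row)
        # Unchanged rows go to neither table.
    return i_rows, u_rows
```

Only the attributes listed in `attrs` are compared, so columns such as the surrogate key in D_ do not cause false changes.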
ETL Process: Load
- Insert rows from I_ directly into the D_ table.
- Update rows in D_ from U_ when the dimension is SCD type 1 or 3.
- Insert rows from U_ into D_ when the dimension is SCD type 2.
- Please note: dimension (surrogate) keys are generated during the Load stage.
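The load rules can be sketched as follows, with D_ held as a list of dictionaries. This is a simplified illustration: only SCD types 1 and 2 are handled (type 3, which keeps a previous-value column, is left out), and the is_current flag and column names are assumptions rather than part of the original design.

```python
from itertools import count


def load_dimension(d_rows, i_rows, u_rows, scd_type, key="agent_id"):
    """Apply I_ and U_ rows to D_ in place, generating surrogate keys on load."""
    if scd_type not in (1, 2):
        raise ValueError("this sketch only covers SCD types 1 and 2")
    # Surrogate keys are generated here, continuing from the highest key in D_.
    next_key = count(max((row["dim_key"] for row in d_rows), default=0) + 1)

    # I_ rows are always plain inserts.
    for row in i_rows:
        d_rows.append({**row, "dim_key": next(next_key), "is_current": True})

    for row in u_rows:
        current = next(r for r in d_rows if r[key] == row[key] and r["is_current"])
        if scd_type == 2:
            # SCD 2: expire the current version and insert a new one.
            current["is_current"] = False
            d_rows.append({**row, "dim_key": next(next_key), "is_current": True})
        else:
            # SCD 1: overwrite attributes on the existing row.
            current.update(row)
```

With SCD type 2 the old row keeps its surrogate key, so existing fact rows still point at the version that was current when they were loaded.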
ETL Process: To Remember
- S_, X_, M_, C_, and E_ tables should be named after source tables, such as S_Agents.
- T_, I_, U_, and D_ tables should be named after target tables, such as T_Agent, T_PolicyHolder, etc.
- Source table column sizes should follow the source data formats, except that natural keys should be varchar to accommodate data quality issues.
High Level BIDW System Architecture Model
BI Application Design
- Define the structure of the portal and its webpages
- Define high-level reporting requirements (Dashboards, Scorecards)
- Define analytical reporting requirements (Cubes, Interactive Reports, Ad Hoc Queries)
- Define detailed reporting requirements (Filter-Based Reports, Ad Hoc Queries)
BI Application Layers
BI Application Development
- Set up the development environment
- Set up the issue management system
- Develop all reports
- Test and balance each report against the source system
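One way to read "test and balance" is as a reconciliation of report totals against totals taken from the source system. The sketch below assumes that interpretation. The function name, the premium measure, and the tolerance parameter are illustrative.

```python
def balance_report(source_rows, report_rows, measure, tolerance=0):
    """Compare the total of one measure between source rows and report rows."""
    source_total = sum(row[measure] for row in source_rows)
    report_total = sum(row[measure] for row in report_rows)
    difference = report_total - source_total
    return {
        "source_total": source_total,
        "report_total": report_total,
        "difference": difference,
        # Balanced when the totals agree within the allowed tolerance.
        "balanced": abs(difference) <= tolerance,
    }
```

The report rows may be aggregated differently from the source rows, so the check compares totals rather than row counts.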
Deployment / Maintenance
- Design the version control system
- Define the change management process
- Define the documents to deploy changes from Dev, Test, QA to Production
- Manage and maintain environments