
  • FOR STUDENT REFERENCE ONLY

    TRAINER:- CHRISTOPHER RICHARD

    1

    Data Warehousing
    For the Participants of IBM Bangalore

    Prepared By Christopher Richard

    Data Warehousing System Architect [Microsoft Certified Trainer]

    2

    OBJECTIVES

    This training is for you, the designers, managers, and owners of the data warehouse. It is a field guide, a set of tools, for designing, developing, and deploying data warehouses. Concrete and actionable, the training describes a coherent framework that goes all the way from the original scoping of an overall data warehouse through all the detailed steps of developing and deploying it. Along the way, I hope to give you the perspective and judgment I have accumulated in doing several data warehouse installations and consultation assignments since 1996.


    3

    OBJECTIVES

    Achieve your goals of building a data warehouse more quickly. Build effective data warehouses that match well against those goals, and make fewer mistakes along the way. You will not reinvent the wheel or rediscover previously owned truths. Structure and discipline help in building a large and complex data warehouse.

    4

    Evolution of Data Warehousing

    How Did We Get Here?


    5

    The Progression

    1st data warehouse in 1905 by DuPont Corp; 1st data cube by sales, branch, and date

    1970s - Management Decision Systems developed a product called Express (now Oracle)

    1983 - Metaphor, founded by Ralph Kimball and 2 partners as a standalone DSS; lessons learned - manage information as a corporate resource

    1980 - E. F. Codd - promise of relational databases (data every which way)

    1993 - Inmon - popularisation of the term

    6

    Evolution through the 90s: Reporting, Summarization, EIS applications, OLAP, Data Mining, Intelligent Agents, Active Warehouses


    7

    Data Warehousing Industry

    8

    Data Warehousing Industry


    9

    Introduction

    The data warehouse marketplace has moved beyond its infancy. A data warehouse is continuously evolving and dynamic; it cannot be static. Take a complete lifecycle perspective. At the very least, a data warehouse needs to evolve as fast as the surrounding organization evolves. Adjust your expectations and your techniques from the original idealistic, static view.

    10

    Introduction

    We need design techniques that are flexible and adaptable. We need to be half DBA and half MBA. We need our changes to the data warehouse to always be graceful. There are a number of security topics you simply have to understand if you are going to perform your job responsibly. Welcome to Data Warehousing!!!!


    11

    MESSAGE: Information requirements are increasing - geometrically.

    A goodly chunk of them will have to be met, so build a data warehouse.

    BUT, BEFORE YOU BUILD A DATA WAREHOUSE

    INFORM YOURSELF - if you don't, the DW consultants will steal you blind.

    12

    TO INFORM YOURSELF:

    READ: The Data Warehouse Toolkit

    READ: The Data Warehouse Lifecycle Toolkit

    JOIN: this Data Warehouse Training Program

    ATTEND: one implementation conference

    WATCH: every presentation on data warehousing you can

    SUBSCRIBE: to these listservs

    DW-List: http://www.datawarehousing.com/list.asp

    EduCause: http://www.educause.edu/memdir/cg/cg.html


    13

    The Goals of a Data Warehouse

    The most important assets of an organization are almost always kept in two forms: the operational systems of record and the data warehouse.

    Ultimately, we need to put aside the details of implementation and modeling, and remember what the fundamental goals of the data warehouse are. It:

    Makes an organization's information accessible
    Makes the organization's information consistent
    Is an adaptive and resilient source of information
    Is a secure bastion that protects our information asset
    Is the foundation for decision making
    Is accepted and used by the end user

    14

    The Chess Pieces

    Source System - an operational system of record whose function is to capture the transactions of the business. The main properties of a source system are uptime and availability.

    Data Staging Area - a storage area and set of processes that clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse. No user query services.


    15

    The Chess Pieces

    Presentation Server - the target physical machine on which the data warehouse data is organized and stored for direct querying by end users, report writers, and other applications.

    Dimensional Model - a specific discipline for modeling data that is an alternative to entity relationship (E/R) modeling.

    Business Process - a coherent set of business activities that make sense to the business users of our data warehouses.

    16

    The Chess Pieces

    ROLAP (Relational OLAP) - a storage option or set of user interfaces and applications that give a relational database a dimensional flavor.

    MOLAP (Multidimensional OLAP) - a storage option or set of user interfaces, applications, and proprietary database technology that have a strongly dimensional flavor.

    HOLAP (Hybrid OLAP) - a storage option of both relational and proprietary structure.


    17

    The Chess Pieces

    Data Mart - a logical subset of the complete data warehouse.

    Data Warehouse - the queryable source of data in the enterprise.

    OLAP (On-line Analytic Processing) - the general activity of querying and presenting text and number data from data warehouses, as well as a specifically dimensional style of querying and presenting that is exemplified by a number of OLAP vendors.

    18

    The Chess Pieces

    End User Application - a collection of tools that query, analyze, and present information targeted to support a business need.

    End User Data Access Tool - a client of the data warehouse.

    Ad Hoc Query Tool - a specific kind of end user data access tool that invites the user to form their own queries by directly manipulating relational tables and their joins.


    19

    View the data. Create reports. Ad-hoc queries. Fine tuning. All done? NOT!!

    20

    The Chess Pieces

    Modeling Applications - a sophisticated kind of data warehouse client with analytic capabilities that transform or digest the output from the data warehouse. Modeling applications include: forecasting models, behavior scoring models, allocation models, and data mining tools.

    Metadata - all the information in the data warehouse environment that is not the actual data itself.


    21

    DWH Architecture

    [Diagram: information sources (operational DBs, external sources) feed tools for extraction, cleaning, loading, integration, etc. into the data warehouse, which supports OLAP servers and data marts; client tools include OLAP tools for queries/reports, data mining, and analysis.]

    22

    Two Different Worlds

    OLTP is profoundly different from dimensional data warehousing. Design techniques and design instincts appropriate for transaction processing are inappropriate and even destructive for data warehousing.

    Consistency: OLTP consistency is microscopic - all we care about is that all transactions presented to the system have been accounted for. The data warehouse has a quality assurance perspective: we care enormously that the current load of data is a full and consistent set of data.


    23

    Two Different Worlds

    Transactions: an OLTP system processes thousands or even millions of transactions; the DW will process only one transaction per day. We call it a Production Data Load.

    Users and Managers: OLTP system users turn the wheels of an organization. They almost always deal with one account at a time, and they perform the same task many, many times. Performance is the absolute king of the OLTP system. Reporting is the primary activity of the data warehouse.

    24

    Two Different Worlds

    One Machine or Two: the resource argument is usually sufficient reason to require a second machine. The data warehouse is often a centralized resource where data is integrated from multiple remote OLTP systems. Data must be copied and restructured into the DW.

    The Time Dimension: the OLTP database is a twinkling database. This is the first temporal inconsistency that we avoid in a data warehouse. It is a major burden on the OLTP system to correctly depict old history.


    25

    Two Different Worlds

    The Entity Relationship Data Model: the E/R model, the miracle, drives out redundancy. The closest analogy is to the map of Los Angeles. The E/R model is very symmetric, with a huge number of connection paths between tables. The value of the E/R model is to use the tables individually and in pairs. E/R models are a disaster for querying because they cannot be understood by users and cannot be navigated usefully by DBMS software. The E/R model cannot be used as the basis for an enterprise DW.

    26

    Typical ERDs: a small subset of tables of an existing system


    27

    Northwind Database Model, Relational Format

    [ER diagram of the Northwind database: Categories, Products, Suppliers, Customers, CustomerDemographics, CustomerCustomerDemo, Employees, EmployeeTerritories, Territories, Region, Shippers, Orders, and Order Details, linked through PK/FK columns such as CategoryID, SupplierID, ProductID, CustomerID, CustomerTypeID, EmployeeID, TerritoryID, RegionID, ShipperID, and OrderID.]

    28

    The Dimensional Model

    A simple data cube structure that matches end users' needs for simplicity. The dimensional model is very asymmetric: one large dominant table sits in the center of the schema, and it is the only table with multiple joins. The center table is called the Fact Table; the other tables are called the Dimension Tables.


    29

    Components of a Star Schema

    30

    Star Schema Example


    31

    Northwind Database Star Schema: Orders

    [Star schema diagram: the fact table fctOrders (foreign keys ProductKey, EmployeeKey, CustomerKey, ShipperKey, OrderDateKey, RequiredDateKey, and ShippedDateKey, plus OrderID and shipping attributes) is surrounded by the dimension tables dimCustomers, dimShippers, dimEmployees, dimOrderDetails (denormalized with product, category, and supplier attributes, including ExtendedPrice), and dimDate (day, week, month, quarter, and year attributes).]

    32

    Dimensions in Data Analysis

    In the world of data warehousing, a summarizable numerical value that you use to monitor your business is called a FACT. When looking for numeric information, your first question will be: what fact do you want to see? You could look at, let's say, sales units, sales dollars, defects, etc. Suppose that you ask to see a report of your company's Units Sold. Here's what you get:

    113


    33

    Dimensions in Data Analysis

    Looking at one value doesn't tell you much. You want to break it into something more informative - for example, how has your company done over time? You ask for a monthly report on Units Sold. Here's the new report:

    January February March April
    14      41       33    25

    34

    Dimensions in Data Analysis

    You're still not satisfied with the monthly report. Your company sells more than one product - how did each of those products do over time? You ask for a new report on Units Sold by product and time. Here's the new report:

    [Report: Units Sold broken out by product (Salt Bread, Sweet Bread, Muffins) and month (Jan-Apr).]


    35

    Dimensions in Data Analysis

    Suppose your company sells in two different states and you would like to know how each product is doing each month in each state. You ask for a new report on Units Sold by product, by time, and by state. Here's the new report:

    [Report: Units Sold broken out by product (Salt Bread, Sweet Bread, Muffins) and month (Jan-Apr), repeated for each state (KA, TN).]

    36

    Dimensions in Data Analysis

    Whichever way you lay out your report, it has 3 independent lists of labels.

    The total number of potential values in the report equals the number of unique items in the first independent list of labels (2 states) * the number of unique items in the second independent list of labels (3 products) * the number of unique items in the third independent list of labels (4 months) = 24.

    In place of "independent list of labels," data warehouse designers borrow the term dimension from mathematics.
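The arithmetic above can be checked directly. A minimal sketch, using the state, product, and month labels from the report example:

```python
# Potential values in the report = product of the sizes of its
# independent lists of labels (the dimensions).
states = ["KA", "TN"]                                # 2 states
products = ["Salt Bread", "Sweet Bread", "Muffins"]  # 3 products
months = ["Jan", "Feb", "Mar", "Apr"]                # 4 months

potential_values = len(states) * len(products) * len(months)
print(potential_values)  # 2 * 3 * 4 = 24
```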


    37

    Dimensions in Data Analysis

    Thus our report has 3 dimensions: TIME, STATE, and PRODUCTS. The items in a dimension are called members of that dimension.

    38

    Hierarchies in Data Analysis

    Grouping and aggregating is the way that humans deal with numerous items. Once your company has sold items for over a year, you would like to look at reports by year, quarter, and month. But how do aggregations such as quarters fit into a dimension? Generally you think of members in a dimension as belonging together.


    39

    Hierarchies in Data Analysis

    Do months and quarters belong together? Months and quarters form a hierarchy within the Time dimension, and each degree of summarization is referred to as a level. The members at the lowest level of detail are called leaf members. There are 3 types of hierarchies that you may encounter:

    Balanced hierarchies
    Unbalanced hierarchies
    Ragged hierarchies
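One illustrative way to represent a balanced hierarchy and find its leaf members; the dictionary layout and the `leaf_members` helper are assumptions for this sketch, not part of any particular OLAP tool:

```python
# A balanced hierarchy: every leaf (month) sits at the same depth.
# Levels within the Time dimension: Year -> Quarter -> Month.
time_hierarchy = {
    "1998": {
        "Qtr1": ["Jan", "Feb", "Mar"],
        "Qtr2": ["Apr", "May", "Jun"],
        "Qtr3": ["Jul", "Aug", "Sep"],
        "Qtr4": ["Oct", "Nov", "Dec"],
    }
}

def leaf_members(hierarchy):
    """Collect the lowest-level (leaf) members of a hierarchy."""
    leaves = []
    for node in hierarchy.values():
        if isinstance(node, dict):
            leaves.extend(leaf_members(node))  # descend one level
        else:
            leaves.extend(node)                # reached the leaves
    return leaves

print(len(leaf_members(time_hierarchy)))  # 12 leaf members (months)
```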

    40

    Balanced Hierarchies

    1998
    Qtr1: Jan Feb Mar
    Qtr2: Apr May Jun
    Qtr3: Jul Aug Sep
    Qtr4: Oct Nov Dec


    41

    Unbalanced Hierarchies

    [Org-chart example with leaves at different depths: Sheri; beneath her, Darren and Maya; beneath Maya, Rebecca and Walter; beneath Walter, Brenda and Jonathan.]

    42

    Ragged Hierarchies

    [Example: North America splits into USA, Canada, and Mexico; USA has a North West region containing California, Oregon, and Washington; Canada contains British Columbia directly; Mexico contains Distrito Federal and Zacatecas.]


    43

    Fact Table

    A fact table is a table in the relational data warehouse that stores the detailed values for measures, or facts. For example, a fact table that stores Dollars and Units by state, by product, and by month has five columns. The first 3 columns are key columns; the remaining two are measure values.

    State | Product | Month | Units | Dollars

    44

    Fact Table

    Each column in the fact table should be either a key or a measure.

    The fact table must contain a column for each measure.

    The fact table must contain rows at the lowest level of detail you might want to retrieve for a measure.

    A fact table almost always uses an integer key for each member rather than a descriptive name.

    The key column for a date dimension might be either an integer key or a date.
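These rules can be sketched with the `sqlite3` module from Python's standard library. The State/Product/Month/Units/Dollars shape comes from the slides; the surrogate integer key values and sample rows are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Three integer key columns (grain: state x product x month)
# and two measure columns.
cur.execute("""
    CREATE TABLE fact_sales (
        state_key   INTEGER NOT NULL,
        product_key INTEGER NOT NULL,
        month_key   INTEGER NOT NULL,
        units       INTEGER,
        dollars     REAL,
        PRIMARY KEY (state_key, product_key, month_key)
    )
""")

rows = [
    (1, 1, 1, 3, 6.00),   # e.g. KA, Salt Bread, Jan
    (1, 2, 1, 4, 10.00),  # e.g. KA, Sweet Bread, Jan
    (2, 1, 1, 3, 6.00),   # e.g. TN, Salt Bread, Jan
]
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", rows)

# Measures are additive: total January units across all states/products.
(total_units,) = cur.execute(
    "SELECT SUM(units) FROM fact_sales WHERE month_key = 1"
).fetchone()
print(total_units)  # 10
```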


    45

    Dimension Tables

    A dimension table contains one row for each leaf-level member of the dimension. For example, a product dimension table with 3 products will have 3 rows. In most cases a dimension table also contains a numeric key column that uniquely identifies each member. This column is the primary key and is referenced by the foreign key in the fact table.

    46

    Dimension Tables

    If the dimension is involved in a balanced hierarchy, it will have an additional column that gives the parent for each member. For example, if you have 3 products in a dimension table that belong to particular product subcategories, your table will look like this:

    PROD_ID | Prod_Name       | SubCategory
    …       | Sweet Muffins   | Muffins
    …       | Coconut Muffins | Muffins
    …       | Salt Bread      | Bread


    47

    Star Schema

    When each dimension is stored in a single table, the database's organization is called a star schema design. When a database's dimensions are stored in a chain of tables, the design is called a snowflake design. A relational database must perform time-consuming joins each time a report executes, and a star design requires fewer joins per dimension than a snowflake design.
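The single-hop join of a star design can be illustrated with a tiny schema in `sqlite3`; all table names, rows, and numbers here are illustrative assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              prod_name TEXT, subcategory TEXT);
    CREATE TABLE dim_month   (month_key INTEGER PRIMARY KEY,
                              month_name TEXT);
    CREATE TABLE fact_sales  (product_key INTEGER, month_key INTEGER,
                              units INTEGER);

    INSERT INTO dim_product VALUES (1, 'Salt Bread', 'Bread'),
                                   (2, 'Sweet Muffins', 'Muffins');
    INSERT INTO dim_month   VALUES (1, 'Jan'), (2, 'Feb');
    INSERT INTO fact_sales  VALUES (1, 1, 6), (1, 2, 17), (2, 1, 8);
""")

# Star design: each dimension joins to the fact table in a single hop;
# a snowflake would need extra joins to reach, say, a subcategory table.
report = cur.execute("""
    SELECT p.prod_name, m.month_name, SUM(f.units)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_month   m ON m.month_key   = f.month_key
    GROUP BY p.prod_name, m.month_name
    ORDER BY p.prod_name, m.month_name
""").fetchall()
for row in report:
    print(row)  # e.g. ('Salt Bread', 'Feb', 17)
```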

    48

    Star V/s Snowflake Schema

    [Diagram: the SHIPMENTS fact table (DOLLARS, WEIGHT) joins directly to the CUSTOMER, PRODUCT, PERIOD, and SHIPDATE dimension tables - Star: good. Normalizing PRODUCT further into a D_PROD table chains the dimension across tables - Snowflake: BAD!!!!]


    49

    Star Schema with Sample Data

    50

    Tip

    Sometimes when we are designing a DW, it is unclear whether a numeric data field extracted from a production data source is a fact or an attribute. Simply ask yourself: is the numeric data field a measurement that varies every time we sample it? Or is it a discretely valued description of something that is more or less constant?


    51

    Data Warehouse System

    Data Connection(s) Layer, ETL, Query Tools, Analysis Tools, Presentation Interface, Quality Assurance procedures

    *Politics*

    52

    Basic Processes - Data Warehouse

    Extracting - the first step of getting data into the data warehouse.

    Transformation - once data is extracted into the data staging area, there are many possible transformation steps, including cleaning the data, correcting misspellings, purging selected fields, creating surrogate keys for each dimension, building aggregates, etc.

    Loading and Indexing - loading into the data warehouse.

  • FOR STUDENT REFERENCE ONLY

    TRAINER:- CHRISTOPHER RICHARD

    53

    Consolidation of Disparate Data Sources

    Excel spreadsheets, Access databases, a plethora of other RDBMSs.

    Most of your work will be in the ETL, data staging area. This will make or break your project!

    54

    Basic Processes - Data Warehouse

    Quality Assurance Checking - quality can be checked by running a comprehensive exception report over the entire set of newly loaded data.

    Release/Publishing - the user community must be notified that the new data is ready.

    Updating - modern data marts may well be updated, sometimes frequently: changes in labels, changes in hierarchies, changes in status, and changes in corporate ownership.


    55

    Basic Processes - Data Warehouse

    Querying - querying is a broad term that encompasses all the activities of requesting data from a data mart.

    Data Feedback/Feeding in Reverse - data can also flow in the opposite direction, uphill from the traditional flow we have discussed.

    Auditing - at times it is critically important to know where the data came from and what calculations were performed. For this you can create special audit records.

    56

    Basic Processes - Data Warehouse

    Securing - every data warehouse has an exquisite dilemma: publish the data as widely, to as many users as possible, with the easiest of user interfaces, while at the same time protecting the data from misuse and snoopers.

    Backing Up and Recovering - since data warehouse data is a flow of data from the legacy systems through to the data marts and eventually onto the users' desktops, a real question arises about where to take the necessary snapshots.


    57

    Core Pieces

    Select a reporting tool: it must be simple yet robust for clients; consider performance, server/client workload, and security across server/client layers.

    Select an ETL method: use what you know best; consider ease of maintenance.

    58

    Steps in the Design Process

    It is good to approach the design for a data warehouse in a consistent way. You can achieve this by following four steps in a particular order. Remember, the perspective necessary to actually make these decisions comes from an understanding of the end user requirements and of what is in the legacy data sources that are available to the data warehouse:

    1. Choose a business process to model
    2. Choose the grain of the business process
    3. Choose the dimensions and their attributes
    4. Choose the measured facts


    59

    Database Design Methodology for Data Warehouses

    The Nine-Step Methodology includes the following steps:

    1. Choosing the process
    2. Choosing the grain
    3. Identifying and conforming the dimensions
    4. Choosing the facts
    5. Storing pre-calculations in the fact table
    6. Rounding out the dimension tables
    7. Choosing the duration of the database
    8. Tracking slowly changing dimensions
    9. Deciding the query priorities and the query modes

    60

    Step 1: Choosing The Process

    The process (function) refers to the subject matter of a particular data mart.

    First data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.


    61

    ER Model of an Extended Version of DreamHome

    62

    ER Model of Property Sales Business Process of DreamHome


    63

    Step 2: Choosing The Grain

    Decide what a record of the fact table is to represent.

    Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table.

    Also include time as a core dimension, which is always present in star schemas.

    64

    Grain

    The level of detail at which measures are recorded; it provides meaning to a number stored in the fact table.

    Fact = revenue
    Dimensions = day, sales person, product
    Grain = revenue per day per sales person per product
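The same grain idea can be sketched directly: fact rows are stored at the finest grain (per day, per sales person, per product), and coarser views are just aggregations of those rows. The names and amounts below are illustrative assumptions:

```python
from collections import defaultdict

# Grain: one fact row per day per sales person per product.
# Illustrative rows: (day, sales_person, product, revenue)
facts = [
    ("2024-01-01", "Asha", "Salt Bread", 120.0),
    ("2024-01-01", "Asha", "Muffins",     80.0),
    ("2024-01-01", "Ravi", "Salt Bread",  60.0),
    ("2024-01-02", "Asha", "Salt Bread",  90.0),
]

# Rolling up to a coarser grain (revenue per day) just sums rows;
# the stored grain must be the finest level you ever want to retrieve.
per_day = defaultdict(float)
for day, person, product, revenue in facts:
    per_day[day] += revenue

print(dict(per_day))  # {'2024-01-01': 260.0, '2024-01-02': 90.0}
```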


    65

    Step 3: Identifying and Conforming the Dimensions

    Dimensions set the context for asking questions about the facts in the fact table.

    If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other.

    A dimension used in more than one data mart is referred to as being conformed.

    66

    Star Schemas for Property Sales and Property Advertising


    67

    Step 4: Choosing The Facts

    The grain of the fact table determines which facts can be used in the data mart.

    Facts should be numeric and additive.

    Unusable facts include: non-numeric facts, non-additive facts, and facts at a different granularity from other facts in the table.

    68

    Property Rentals with a Badly Structured Fact Table


    69

    Property Rentals with Fact Table Corrected

    70

    Step 5: Storing Pre-Calculations in the Fact Table

    Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations.
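A common pre-calculation is an extended price computed once at load time, mirroring the ExtendedPrice column that appears in the Northwind star schema earlier; the formula UnitPrice * Quantity * (1 - Discount) and the sample rows are assumptions for this sketch:

```python
# Pre-calculation: store ExtendedPrice at load time so every query
# does not have to repeat the arithmetic.
order_details = [
    # (unit_price, quantity, discount)
    (14.00, 12, 0.00),
    (9.80,  10, 0.05),
]

loaded_rows = []
for unit_price, quantity, discount in order_details:
    extended_price = round(unit_price * quantity * (1 - discount), 2)
    loaded_rows.append((unit_price, quantity, discount, extended_price))

for row in loaded_rows:
    print(row)  # last element is the pre-calculated ExtendedPrice
```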


    71

    Step 6: Rounding Out The Dimension Tables

    Text descriptions are added to the dimension tables.

    Text descriptions should be as intuitive and understandable to the users as possible.

    Usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.

    72

    Step 7: Choosing The Duration Of The Database

    Duration measures how far back in time the fact table goes.

    Very large fact tables raise at least two very significant data warehouse design issues.

    It is often difficult to source increasingly old data.

    It is mandatory that the old versions of the important dimensions be used, not the most current versions. Known as the Slowly Changing Dimension problem.


    73

    Step 8: Tracking Slowly Changing Dimensions

    Slowly changing dimension problem means that the proper description of the old dimension data must be used with old fact data.

    Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.

    74

    Step 8: Tracking Slowly Changing Dimensions

    Three basic types of slowly changing dimensions:

    Type 1, where a changed dimension attribute is overwritten.

    Type 2, where a changed dimension attribute causes a new dimension record to be created.

    Type 3, where a changed dimension attribute causes an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record.
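The three types can be sketched on a toy customer dimension; the row layout, keys, and cities below are assumptions for illustration, not a production implementation:

```python
# Slowly changing dimensions, sketched on a customer dimension row.
# Type 1 overwrites; Type 2 adds a new row with a new surrogate key;
# Type 3 would instead add an alternate column (e.g. old_city) so the
# old and new values live on the same row.

dim_customer = [
    # (surrogate_key, customer_id, city, current)
    (1, "C100", "Bangalore", True),
]

def scd_type1(dim, customer_id, new_city):
    """Type 1: overwrite in place - history is lost."""
    return [(sk, cid, new_city if cid == customer_id else city, cur)
            for sk, cid, city, cur in dim]

def scd_type2(dim, customer_id, new_city):
    """Type 2: expire the current row, add a new row with a new key."""
    next_key = max(sk for sk, *_ in dim) + 1
    updated = [(sk, cid, city, cur and cid != customer_id)
               for sk, cid, city, cur in dim]
    updated.append((next_key, customer_id, new_city, True))
    return updated

print(scd_type1(dim_customer, "C100", "Mysore"))
print(scd_type2(dim_customer, "C100", "Mysore"))
```

With Type 2, old fact rows keep pointing at the old surrogate key, so old facts stay associated with the old dimension description, which is exactly the requirement stated above.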


    75

    Step 9: Deciding The Query Priorities And The Query Modes

    The most critical physical design issues affecting the end users' perception include:

    physical sort order of the fact table on disk;

    presence of pre-stored summaries or aggregations.

    Additional physical design issues include administration, backup, indexing performance, and security.

    76

    Database Design Methodology for Data Warehouses

    Methodology designs a data mart that supports requirements of particular business process and allows the easy integration with other related data marts to form the enterprise-wide data warehouse.

    A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation.


    77

    Fact and Dimension Tables for each Business Process of DreamHome

    78

    Dimensional Model (Fact Constellation) for the DreamHome Data Warehouse


    79

    When I wish upon a Star

    80

    Are You Familiar?

    The Goals of a Data Warehouse
    The Chess Pieces
    Different Worlds: OLTP / Data Warehouse
    Dimensional Model Basics
    Hierarchies in Dimensions
    The Fact Table
    The Star Schema
    The Snowflake Schema
    Basic Processes of a Data Warehouse


    81

    What Is ETL?

    Extract - the process of reading data from a source database.

    Transform - the process of converting extracted data to a form usable by the target database. It occurs by using rules or lookup tables, or by combining the data with other data.

    Load - the process of writing the data into the target database.

    82

    What does ETL do?

    Extracts data from multiple data sources
    Migrates data from one DB to another
    Converts DBs from one format or type to another
    Transforms the data to make it accessible to business analysis
    Forms data marts and data warehouses
    Enables loading of multiple target databases

    It performs at least three specific functions: it reads data from an input source; passes the stream of information through either an engine-based or code-based ETL process to modify, enhance, or eliminate data elements based on the instructions of the job; and writes the resultant data set back out to a flat file, relational table, etc.
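Those three functions can be sketched end to end with standard-library modules; the CSV content, table name, and cleaning rule below are made up for illustration:

```python
import csv
import io
import sqlite3

# Extract: read from an input source (a CSV file, held in memory here).
source = io.StringIO("product,units\nSalt Bread,6\nSweet Bread,17\n,3\n")
extracted = list(csv.DictReader(source))

# Transform: clean the data - drop rows with a missing product name,
# convert the units field to an integer.
transformed = [(r["product"], int(r["units"]))
               for r in extracted if r["product"]]

# Load: write the resultant data set to a relational table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", transformed)

(count,) = con.execute("SELECT COUNT(*) FROM sales").fetchone()
print(count)  # 2 rows survive cleaning
```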


    83

What can ETL be used for?
To acquire a temporary subset of data (like a VIEW) for reports or other purposes.
A more permanent data set may be acquired for other purposes, such as populating a data mart or data warehouse.

    84

    ETL SYSTEM

    Operational Data

External Sources (different vendors, different formats)

    ETL Engine

Extract
Transform

Load
Filter

    Data Warehouse

    Local Data Marts


    OLAP End Users


Data extracted from the data warehouse provides faster processing.


    85

Technical architecture design
Design of the technical environment to enable the logical design.
It is a description of the elements and services of the BI environment.
A map of how the components will fit together and communicate.
Basically, a blueprint by which the team, consultants, and vendors will build the Business Intelligence environment.

    86

Source systems (system | vendor | platform | DBMS):

PA/PM | Siemens/SMS | DEC Alpha Unix | Sybase

Critical Paths | Landacorp | Oracle | HP-Unix

Budgets | Custom | Mainframe | SAS

Cost Reports / Contract Monitoring | MS Windows | MS Excel

    Acquisition Services

    Data Staging Services- Extraction- Transformation- Load- Cleansing

    Data Staging Administration - Job/Process Control- Job/Process Monitoring- Metadata exchange- Data Modeling

Load Files

    Data Staging Area

    Data Warehouse

    Organization Services

    Metadata Services - Source/Target Models- Business Definitions- Audit Statistics- Performance Statistics- ETL Statistics

Metadata Exchange

Metadata Repository

    Consumption Services

Data Mart (OLAP MDB)

Data Mart (RDBMS)

    Data Access Services- Report Library Management- Report Distribution- Report Scheduling- OLAP Cube Refreshing- Query Management- Aggregation Management- Security Verification- Metadata Navigation

    Data Services- Bulk Data Loader- Aggregation Management- Index Management- Audit Statistics- DBA Administrator- Security Administration

Program Evaluation (OLAP MDB)

Performance-Based Budgeting

    RDBMS

Planned Services - Web Reporting - Web OLAP - Data Mining

Data Warehouse Administration - Data Modeling - Data Access Tool Mgmt. - Database Administrator - Data Staging Administration

    The architecture conceptual model


    87

PA/PM | Siemens/SMS | DEC Alpha Unix | Sybase

    Data Staging Services- Extraction- Transformation- Load- Cleansing

Data Staging Administration

    - Job/Process Control- Job/Process Monitoring- Metadata exchange- Data Modeling

Load Files

    Data Staging Area

Metadata Exchange

COSTS | Eclipsys/TSI | Compaq HP-UX | Oracle

BUDGETS | Custom | Mainframe | SAS

PATHWAYS | Landacorp | IBM AIX | Oracle

Organization Services

    Source Systems

    Data acquisition services

    88

    Acquiring the data

    PM/PA EMR AP/MM Home Solucient State

    MR CDR

    Etc.

GL/HR

Internal & External Data

Obstacles to Integration

    " Different data models" Different data definitions" Different data base systems

    " Different computer platforms" Dirty data" Number of operational sources

1. Hand-code extraction, transformation, cleansing, and loading services using the data manipulation language of choice (e.g., SAS, COBOL, MS DTS, Perl); this is the most common approach, especially for proprietary DSS data models.

2. Buy acquisition services from an ETL software vendor and customize them to your environment.

Approaches to Acquisition


    89

ETL attributes = $$$$
Multi-threaded engines (e.g., Informatica, Cognos) or code generation (e.g., ETI, SAS, DataStage)
Number of source/target DBMSs supported
Number of computing platforms supported (1-tier, 2-tier, N-tier)
Change data capture
Breadth of transformation techniques
Metadata driven (what metadata standard?)
Multiple data loading options (incremental, bulk, table management, partitioning)

    90

    CARLETON

INFORMATICA

    ETL technology - horizontal marketplace


    91

    " The large HIS vendors will adopt generic ETL technology and customize the functionality to their application portfolio and data bases.

    " Horizontal ETL vendors MAY develop health care vendor portfolios such as they do for ERP vendors but that will depend on demandand if they survive.

    " DBMS providers will increasingly provide powerful ETL solutions making any third-party tool obsolete, assuming you have a homogenous DBMS implementation.

    " Addressing data quality will be the hardest process and tool set to sell to healthcare organizations.

    " Transitioning from hard-coded interfaces to a metadata driven data acquisition environment will follow the typical healthcare technology adoption cycle, that is, a long time.

    ETL technology predictions

    92

    Data Warehouse

    Metadata Services- Source/Target Models- Business Definitions- Audit Statistics- Performance Statistics- ETL Statistics

    MetadataRepository

    Data Services- Bulk Data Loader- Aggregation Mgmt- Index Management- Audit Statistics- DBA Administrator- Security Admin

    AcquisitionServices

    LoadFiles

    Organization services


    93

ERwin, Embarcadero, or DSS proprietary data models

    Data Staging Services

    - Extraction- Transformation- Load- Cleansing

    LoadFiles

    MetadataExchange

    Source and Target data models are the center of a metadata driven environment.

    Data modeling tools

    94

    Issues that are key to an effective ETL tool

Scheduling and job dependencies: benefits particularly from a graphical environment.

    Session nesting: When developing an ETL session for a particular part of the system, nesting eliminates duplicate development.

    Robust SQL support: Increases speed over using code to read and write to a database.

Version management: enables quick rollback rather than manually making code changes. In many cases, the database's version control may not work on the ETL code.


    95

    Key Issues (Contd)

    Debugging functionality: very useful for developer support.

    ETL should rely on underlying database security.

Transformation capabilities vs. cleansing capabilities: tools are seldom very strong in both.

    Metadata support: must work with the overall metadata strategy.

    96

Current ETL Market Share
Total Market Share: $667 Million


    97

ETL Evaluation
Throughout the following sections, each vendor and its ETL products are evaluated, focusing on the primary differences between the products.

Ascential Software
Formed in July 2001.
Focuses on improving, developing, and perfecting its ETL and back-end tools; has no current plans to enter the BI tool market.

The Ascential DataStage product family:
a highly scalable ETL solution;
uses end-to-end metadata management and data quality assurance functions;
can create and manage scalable, complex data integration for enterprise applications such as CRM, ERP, SCM, BI/analytics, e-business, and data warehouses.

    98


    99

Cognos Corporation
Founded in 1969.
Prefers that all components of the enterprise data warehouse be Cognos products.

DecisionStream easily integrates with Cognos BI tools, but has difficulty integrating with other vendors' products.

DecisionStream is powerful ETL software. It allows users to extract and unite data from disparate sources and deliver coordinated business intelligence across the organization. It includes advanced data merging, aggregation, and transformation capabilities that let users unite data from different sources and transform it into information using best-practice dimensional design.

    100


    101

Informatica PowerConnect
An extension to Informatica PowerCenter and PowerCenterRT data integration software.
Eliminates the need for customers to manually code data extraction programs for their enterprise applications.
Ensures that mission-critical operational data can be effectively used to inform key business decisions across the enterprise.
Allows companies to directly source and integrate ERP, CRM, real-time message queue, mainframe, AS/400, remote data, and metadata with other enterprise data, and deliver it to data warehouses, operational data stores, business intelligence tools, and packaged analytic applications.

    102


    103

Conclusion
Issues analyzed:
development environments
version control
security
metadata exchange standards
cost

The ETL tools presented by Ascential and Informatica are comparable in numerous ways; it would be best to select Informatica as an ETL vendor, as it is the more mature and stable company.

    104

    The Staging Area

    How to Stock Your Data Warehouse Pantry

    Christopher Richard[Data Warehousing System Architect]


    105

All-You-Can-Eat Buffet
Buffet (ODS, DW, DM)
Recipe (business/transformation rules)
Kitchen (ETL)
Ingredients from different suppliers (source systems)
Pantry (Staging Area)
Our topic is the pantry, the Staging Area, because it is the foundation and stepchild of data warehousing.

    106

Why have a pantry?
Minimizing processing on source systems
Extract only once
Data integrity
Source data within own control
Incrementals
Freedom of storage format and abstraction
Audit trail
Persistence of data
Timing flexibility
Processing power
Consistent interface for downstream processes


    107

Minimizing processing on source systems / Extract only once
The Staging Area serves downstream systems, thus limiting the impact on the source system.
Consistent extract methodology.
Central knowledge base of source system extraction expertise.

Data Integrity
Proper timing of different extracts within source system schedules.
Both table-centric and document-centric extraction can be applied as necessary.

    108

Table-centric Vs Document-centric Extraction

Source tables:

Order header: Order Number 1000 | Order Date 2/1/2001 | Order Amount 100.00

Order lines: Order Number 1000 | Line 1 | Product A | Qty 10; Order Number 1000 | Line 2 | Product B | Qty 20

Table-centric staging mirrors the source tables: the header row and the line rows are staged in separate tables, and each staged row carries its own Restart ID.

Document-centric staging denormalizes the document: the header columns (Order Date, Order Amount) are repeated on every line row, and the whole order is staged together under a single Restart ID.


    109

Incremental Source Extraction
Reliable change identifier:
Ever-increasing number
Timestamp
Correlated change identifier
Change log
Don't forget about deletes:
Hard deletes
Soft deletes
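A minimal sketch of incremental extraction driven by an ever-increasing change identifier, with soft deletes carried along in the increment. The `change_seq` column name and record shapes are assumptions for illustration:

```python
# Incremental extraction: pull only rows changed since the last run,
# using an ever-increasing change identifier as the bookmark.

def extract_increment(source_rows, last_seq):
    """Return rows changed since the previous run (including soft deletes)
    plus the new bookmark value to persist for the next run."""
    batch = [r for r in source_rows if r["change_seq"] > last_seq]
    new_last = max((r["change_seq"] for r in batch), default=last_seq)
    return batch, new_last

source = [
    {"id": 1, "change_seq": 10, "deleted": False},
    {"id": 2, "change_seq": 15, "deleted": False},
    {"id": 1, "change_seq": 17, "deleted": True},   # soft delete of id 1
]

batch, bookmark = extract_increment(source, last_seq=12)
```

Only the two rows with a change identifier above the stored bookmark are extracted, and the soft delete travels with the increment so downstream systems can act on it.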

    110

Incrementals Implementation
Cyclic Redundancy Checksum
Calculate for each extracted increment.
True delta identification; should precede all other items.

Data Manipulation Language code [Insert, Update, Delete]
Propagatable after reassessment.

Column Change Bitmap
Easy identification for downstream systems (Type 2 SCD).

Restart Identifier [Bookmark]
An ever-increasing number, unique across the whole Staging Area.
Used to quickly identify the records not yet processed by downstream systems.

Source Key Identifier [1:1 with source key]
An ever-increasing number, unique for a particular source key across the whole Staging Area. Multiple per source key are allowed, to support source key reuse.
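The Column Change Bitmap can be computed with a simple column-by-column comparison. This sketch assumes dictionary-shaped rows and an ordered list of tracked columns:

```python
# Column Change Bitmap: one 0/1 flag per tracked column, '1' = changed,
# so downstream systems (e.g., Type 2 SCD logic) can see what moved.

def change_bitmap(old_row, new_row, columns):
    """Return a string of flags, one per tracked column, in column order."""
    return "".join("1" if old_row.get(c) != new_row.get(c) else "0"
                   for c in columns)

cols = ["product", "product_type", "color"]
old = {"product": "A", "product_type": "Shoe", "color": "Blue"}
new = {"product": "A", "product_type": "Shoe", "color": "Red"}
bitmap = change_bitmap(old, new, cols)
```

With only the color changed, the bitmap is "001", matching the example on the Column Change Bitmap slide.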


    111

Column Change Bitmap Example

Source tables: Product A has Product Type Shoe and Color Blue (the color later changes to Red), and a Price of 50.00 effective 2/1/2001 (later 55.00 effective 5/1/2001).

Staging Area tables:

Product A | Shoe | Red | Change Bitmap 001 | Restart ID 24 (only the color column changed)

Product A | 55.00 | 5/1/2001 | Change Bitmap 011 | Restart ID 49 (the price and effective date columns changed)

Data mart table (merged): Product A | 55.00 | 5/1/2001 | Shoe | Change Bitmap 0011 | Restart ID 24

    112

Audit Trail
Track data lineage:
Track data movement across tables and systems.
Try to tag the data as soon as it enters the stream.

Track data changes:
Track data changes within a table.
Automate data change tracking outside of coding discipline wherever possible.


    113

Audit Trail - Implementation
Propagation of the identifiers to downstream processes:
Restart Identifier
Source Key Identifier
Source System Identifier

Table-specific audit data:
Job Run Identifier
Source extract date & time
Create and change date & time and user
Column Change Bitmap
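Tagging records with these identifiers as they enter the stream might look like the following sketch; the column names mirror the list above, but the function and the sample values are hypothetical:

```python
# Attach lineage identifiers to a record as soon as it enters the stream,
# so every downstream copy can be traced back to its source extract.
from datetime import datetime, timezone

def tag_for_audit(row, source_system, job_run_id, source_key_id, restart_id):
    """Return a copy of the row enriched with audit-trail columns."""
    tagged = dict(row)
    tagged.update({
        "source_system_id": source_system,
        "job_run_id": job_run_id,
        "source_key_id": source_key_id,
        "restart_id": restart_id,
        "extract_ts": datetime.now(timezone.utc).isoformat(),
    })
    return tagged

rec = tag_for_audit({"order_number": 1000}, "PA/PM", 42, 7, 24)
```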

    114

Key learnings from doing
True delta determination is essential for large data volumes and Type II/III Slowly Changing Dimensions.
You will have to compromise functionality for performance.
You will have to compromise data completeness for performance.
Allow staging tables to differ in design from the source tables.
Cookie cutters do work.


    115

Key learnings from doing
Use one sequencer for all surrogate keys.
Implement complete pieces of logic as early in the process stream as possible, so downstream processes can benefit from them in the most timely manner.
Set processing may lead to seeking alternative storage options.
Use a sounding board.

    116

Data Staging
The data staging process is the iceberg of the data warehouse project.
While an iceberg looks formidable from the ship's helm, we often don't gain a full appreciation of its magnitude until we collide with it.
So many challenges are buried in the data sources and the systems they run on that this part of the process invariably takes much more time than you expect.
The concepts and approach in this training apply to both hand-coded staging systems and data staging tools.


    117

Data Staging
Takes data from the operational systems and prepares it for the dimensional model in the data presentation area.
It is a back-room service, not a query service.
Unfortunately, many teams focus on the E and L of ETL. The E does have its challenges, but most of the heavy lifting occurs in the T.

    118

Transformation
Combine data
Deal with quality issues
Identify updated data
Manage surrogate keys
Build aggregates
Handle errors


    119

Getting Started
For once I will skip our primary mantra of "focus on the business requirements" and present our second-favorite aphorism: MAKE A PLAN.

Do we need to use a tool?
You need to decide early.
Do not expect to recoup your investment on the first iteration, due to the learning curve.
A tool provides greater metadata integration and enhanced flexibility, reusability, and maintainability in the long run.

    120

Dimensional Data Staging
Extract dimensional data from operational systems.
Cleanse attribute values:
Name and address parsing
Inconsistent descriptive values
Missing decodes
Overloaded codes with multiple meanings over time
Invalid data
Missing data


    121

Dimensional Data Staging
Manage surrogate key assignments.

Since we maintain surrogate keys in the warehouse, we must maintain a persistent master cross-reference table in the staging area for each dimension.
The cross-reference table keeps track of the surrogate key assigned to an operational key at a point in time, along with the attribute profile.
We interrogate the extracted dimensional source data to determine whether it is a new dimension row, an update to an existing row, or neither.
New records are identified easily because the operational source key is not yet in the master cross-reference table.

    122

Master Dimension Cross-Reference Table (columns):

Surrogate Dimension Key
Operational Source Key
Dimension Attributes 1-N
Dimension Row Effective Date
Dimension Row Expiration Date
Most Recent Dimension Row Indicator
Most Recent Cyclic Redundancy Checksum (CRC)


    123

To quickly determine if rows have changed, we rely on a cyclic redundancy checksum (CRC) algorithm.
If the CRC is identical for the extracted record and the most recent row of the master cross-reference table, then we ignore the extracted record.
If the CRC differs, then we need to study each column to determine what has changed and how the change will be handled (Type 1 / Type 2 / Type 3).
The final step is to update the most recent surrogate key assignment table.
This table consists of operational source keys and their most recently assigned surrogate keys, acting as a fast lookup.

    Dimensional Data Staging
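The CRC comparison against the master cross-reference table can be sketched as follows, using Python's `zlib.crc32` as the checksum; the row shapes and key names are assumptions for illustration:

```python
# CRC-based change detection for dimension staging: classify each extracted
# record as new, changed, or an ignorable duplicate.
import zlib

def row_crc(row, columns):
    """CRC over the concatenated attribute values of a dimension row."""
    payload = "|".join(str(row.get(c, "")) for c in columns)
    return zlib.crc32(payload.encode("utf-8"))

def classify(extracted, crc_by_key, columns):
    """Compare each record's CRC with the most recent CRC on file."""
    new, changed, ignored = [], [], []
    for row in extracted:
        crc = row_crc(row, columns)
        key = row["source_key"]
        if key not in crc_by_key:
            new.append(row)            # source key never seen before
        elif crc_by_key[key] != crc:
            changed.append(row)        # attributes differ: Type 1/2/3 handling
        else:
            ignored.append(row)        # identical CRC: skip
        crc_by_key[key] = crc          # keep the cross-reference current
    return new, changed, ignored

cols = ["name", "city"]
master = {"C1": row_crc({"name": "Ann", "city": "Pune"}, cols)}
batch = [
    {"source_key": "C1", "name": "Ann", "city": "Pune"},     # unchanged
    {"source_key": "C1", "name": "Ann", "city": "Delhi"},    # changed
    {"source_key": "C2", "name": "Bob", "city": "Chennai"},  # new
]
new, changed, ignored = classify(batch, master, cols)
```

A real implementation would go on to assign surrogate keys to the new rows and apply the chosen SCD type to the changed ones; this sketch covers only the compare step.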

    124

Dimensional Table Surrogate Key Management (flow)

Source extract rows are CRC-compared against the master dimension cross-reference table:

No CRC change: ignore the row.

New source rows: assign surrogate keys, set dates/indicator, insert into the master cross-reference table and the dimension table, and update the most recent key assignment table.

Changed rows (Type 2): assign a new surrogate key, set dates/indicator, update the prior most recent row in the cross-reference, insert the new dimension row, and update the most recent key assignment table.

Changed rows (Type 1 or 3): update the existing cross-reference and dimension rows in place.


    125

Dimension Data Staging
Build dimension row load images and publish revised data.
Once the dimension table reflects the most recent extract (and has been confidently quality-assured), it is published to all data marts that use the dimension.

    126

Fact Table Staging
Extract fact data from operational sources.
Receive updated dimensions from the dimension authorities.
Separate the fact data by granularity as required.
Transform the fact data as required.
Replace the operational source keys with surrogate keys, using the most recent surrogate key assignment table created by the dimension authority.
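The surrogate key substitution step can be sketched as a lookup against the most recent key assignment tables; rows whose operational keys cannot be resolved are parked rather than loaded. The dimension names and keys here are invented for illustration:

```python
# Replace operational source keys in fact rows with warehouse surrogate keys,
# using one lookup dict per dimension (the "most recent key assignment" table).

def assign_surrogate_keys(fact_rows, key_maps):
    """Return (loadable rows, suspended rows with unresolved keys)."""
    loaded, suspended = [], []
    for row in fact_rows:
        out = dict(row)
        ok = True
        for dim, lookup in key_maps.items():
            op_key = out.pop(dim + "_source_key")
            sk = lookup.get(op_key)
            if sk is None:            # unresolved key: park for investigation
                ok = False
            out[dim + "_key"] = sk
        (loaded if ok else suspended).append(out)
    return loaded, suspended

maps = {"product": {"P9": 101}, "store": {"S1": 7}}
facts = [
    {"product_source_key": "P9", "store_source_key": "S1", "qty": 10},
    {"product_source_key": "P8", "store_source_key": "S1", "qty": 5},
]
loaded, suspended = assign_surrogate_keys(facts, maps)
```

Parking unresolved rows instead of loading them keeps referential integrity intact in the fact table while the missing dimension record is chased down.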


    127

Fact Table Staging
Add additional keys for known context.
Quality-assure the fact table data.
Construct or update aggregate fact tables.
Bulk load the data.
Alert the users.
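Constructing an aggregate fact table is essentially a group-by over the base-grain facts. This sketch rolls sales up by product; all keys and measures are invented for illustration:

```python
# Build an aggregate fact table by summing a measure over chosen keys.
from collections import defaultdict

def build_aggregate(fact_rows, group_keys, measure):
    """Roll base-grain fact rows up to an aggregate fact table."""
    totals = defaultdict(float)
    for row in fact_rows:
        totals[tuple(row[k] for k in group_keys)] += row[measure]
    # One output row per distinct combination of group keys.
    return [{**dict(zip(group_keys, k)), measure: v}
            for k, v in totals.items()]

base_facts = [
    {"product_key": 101, "date_key": 20030401, "sales": 10.0},
    {"product_key": 101, "date_key": 20030402, "sales": 5.0},
    {"product_key": 102, "date_key": 20030401, "sales": 7.5},
]
by_product = build_aggregate(base_facts, ["product_key"], "sales")
```

Dropping the `date_key` from the group keys coarsens the grain, which is exactly what an aggregate table does for query performance.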

    128


    129

Smarter Business Intelligence: outsmarting to be Number 1

    Informatica Corporation April 23, 2003

    130

    Business Imperatives

    Changing markets forcing products to evolve or innovate

    Changing competitive landscape forcing strategies to change

Changing economies force organizations to contract and be effective

    Changing financial drivers geared towards profitability

    Changing market positioning to leadership to be NUMBER 1!

    Forces all companies to think smarter than ever!



    131

    Business Imperatives

Smarter...
Marketing campaigns
Products and positioning
Go-to-market strategies
Financial investments
Lead-to-sales generation cycle
People!

    132

    Business Imperatives

    The Challenge:

    Making people think smarter

    Expensive!

    Impossible!

    Not worth the effort!


    133

    Business Imperatives

    The Solution:

Business Intelligence Initiatives
Enterprise Data Warehouse Project
Balanced Scorecard Systems
EIS (Executive Information System) Project
Management Cockpit Infrastructure

    Business Analytics Platform

    134

    Business Analytics Solutions Often Include Multiple Tools And Technologies

Data Integration: extract, transform, and load data into the warehouse.

Data Warehouse: organize and store transaction information.

Business Intelligence: provide end users with reports and ad hoc access to the data in the warehouse.


    135

Informatica Business Analytics Suite
Modular plug-and-play approach offers the best of buy and build.

    136

Market Leaders Rely on Informatica
80%+ of the Fortune 100
80%+ of the Dow Jones Industrial Average
Global reach:
Entertainment - the 5 largest
Telecommunications - 13 of the top 14
Financial services - 12 of the top 15
Pharmaceutical - 12 of the top 13
Utilities - 15 of the top 20
Insurance - 16 of the top 21
Manufacturing - 12 of the top 16
All 4 branches of the US Armed Forces


    137

    Boosting productivity

By visually defining mappings and transformations through an easy-to-use GUI, we have been able to significantly reduce data warehouse maintenance and support costs. In fact, we now only have one resource managing a half-terabyte data warehouse.

    Grady BoggsData Warehouse Manager

At Hewlett-Packard, we are always looking for innovative ways to leverage technology to improve productivity, and using Informatica, we have seen an over 75 percent improvement in development productivity and time to market.

    Rudy GarzaData Architect

We have achieved very rapid time-to-deployment with Informatica, and the resulting increase in our operational and analytic capabilities will drive increased value and savings for Deluxe. Through automated replication processes and streamlined workflow, we anticipate a $6 million annual reduction in data-maintenance costs.

    Andy FieldSenior Director

    138

Thrifty improves productivity by over 75%
Challenge:
Systems were difficult to maintain due to a lack of updated and accurate records of how, why, and where data was transferred.
Heavy reliance on code resulted in limited transformation capabilities and little flexibility to deal with changes in business requirements.
Developing a metadata strategy promoting reuse proved to be difficult.

Solution:
Single console for design, development, testing, daily management, scheduling, and smart recovery after failed components.
Simple operation and evolution.
Object-oriented, user-friendly interface with over 100 built-in transformations and a robust visual debugger.
Use of wizards to visually step through error-prone and repetitive tasks.

Results:
Integrated product suite enables rapid development and time to market.
Active and automated metadata solution, promoting reuse.
ROI in under a year.


    139

    Delivering on the Performance Promise

One of the main drivers behind the success of our very high performance, highly scalable enterprise data warehouse has been the performance and scalability of PowerCenter. PowerCenter's performance gives us the confidence to scale our data warehouse into the 10-20 terabyte range in the years ahead.

    Mark CothronData Warehouse Architect

Informatica's performance capabilities and scalability immediately lifted it over the competition. Using Informatica, we have created a multi-terabyte data warehouse, and the analysis and action-enabling information this system provides has given us a competitive advantage that can't be matched.

    Patrick FirouzianDirector

    140

    PepsiCo creates 3 data warehouses in excess of 1 TB

Informatica's performance has been superb, and we have seen drastic improvements with each new release. We are always looking to get information into the hands of our business users more quickly and efficiently, and using Informatica we have over 30 data integration projects, with the largest being a 7-terabyte data warehouse. - Wendy Faegre

    Systems Manager

Results:
Largest data warehouse > 7 TB; easily loads in a 3-hour batch window.
Processes over 60 GB daily and 800 GB monthly.
Throughput exceeds 30 GB/hour.
70% improvement in performance over hand code.


    141

    Informatica Overview

Corporate: Founded 1993; Nasdaq: INFA (1999). Over 800 employees worldwide.

Financials:
2000: $154 million revenue
2001: $197 million revenue
2002: $195 million revenue

Partners: Over 200 sales, marketing, and implementation partners, including i2, PeopleSoft, the Big 5, Siebel, SAP, and Mitsubishi.

Products: Industry-leading solutions for deploying business analytics across the enterprise: data integration, data warehouses, business intelligence, analytic applications.

Customers: Over 1700 worldwide; 80 of the Fortune 100 and 80% of the Dow Jones.