dataware slides2

Upload: sandhu554

Post on 30-May-2018


  • 8/14/2019 Dataware slides2

    1/16

    Good Evening to All


    III & II

    IT


    Characteristics of Data-warehousing

    Goals Of Data-Warehousing

    Architecture Of Data-Warehousing

    The Phases Involved


    Accessibility: Getting the required information whenever it is needed.

    Timeliness: The time taken to submit the report.

    Formats: Output formats such as spreadsheets, graphs, and maps.

    Integrity: The accuracy and reliability of the data.


    A Data Warehouse is a database of data gathered from many systems and intended to support management reporting and decision making.

    This process of gathering data is called Data Warehousing.


    Subject oriented: A Data Warehouse deals with all the subjects of corporate data, e.g. sales, finance, customers.

    Integrated: It integrates data from different database systems (heterogeneous data) into a single homogeneous data set.

    Non-volatile: The Data Warehouse is a read-only database. Its data cannot be overwritten or deleted, so it is non-volatile.

    Time variant: It holds historical data with chronological importance, i.e. historical data is maintained and analysed for future use.


    To provide a reliable, single, integrated source of information.

    To give end users access to their data without relying on reports produced by the Information System (IS) department.

    To allow analysis of corporate data, support predictive models, and improve Business Intelligence.


    Four data structures for the storage of data are:

    1. DATA STORE 1, called Online Transaction Processing (OLTP)

    2. DATA STORE 2, called Integration Layer or Data Warehouse

    3. DATA STORE 3, called Data Mart or High Performance Query Structures (HPQS)

    4. DATA STORE 4, called Online Analytical Processing (OLAP)

    Three data flow paths between the four data structures are:

    1. FLOW 1, from DATA STORE 1 to DATA STORE 2

    2. FLOW 2, from DATA STORE 2 to DATA STORE 3

    3. FLOW 3, from DATA STORE 3 to DATA STORE 4
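
    The stores and flows listed above can be pictured as a small chain. The following is only a minimal Python sketch of that picture; the class names `DataStore` and `Flow` and the helper `path` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    number: int
    role: str


@dataclass(frozen=True)
class Flow:
    name: str
    source: int  # number of the store the data leaves
    target: int  # number of the store the data arrives in


# The four data stores named on the slide.
STORES = [
    DataStore(1, "Online Transaction Processing (OLTP)"),
    DataStore(2, "Integration Layer / Data Warehouse"),
    DataStore(3, "Data Mart (HPQS)"),
    DataStore(4, "Online Analytical Processing (OLAP)"),
]

# The three data flow paths between them.
FLOWS = [
    Flow("FLOW1", 1, 2),
    Flow("FLOW2", 2, 3),
    Flow("FLOW3", 3, 4),
]


def path(flows, start):
    """Follow the flows from a starting store and list the stores visited."""
    next_by_source = {f.source: f.target for f in flows}
    visited = [start]
    while visited[-1] in next_by_source:
        visited.append(next_by_source[visited[-1]])
    return visited


print(path(FLOWS, 1))
```

    Starting from DATA STORE 1, the data passes through every store in order, which is the whole architecture in one line.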


    The architecture is divided into three phases:

    1. Extract Phase

    2. Transform Phase

    3. Loading Phase

    Transfer data: Data Store 1 ---------------> Data Store 2

    There are different mechanisms for extracting that data out of its sources. This is called Data Extraction.


    The art of determining what records to extract from the source system is frequently called Change Data Capture.

    Some general techniques are used to recognize changes to source database tables. They are:

    Timestamps: The lucky among us extract data from systems that timestamp records whenever they are inserted or updated.

    Triggers: Every time a record is inserted into, updated in or deleted from a source table, these triggers write a corresponding message in a log file.

    File compares: One way to identify changes in your data is to compare the file as it appears today to a copy of how it appeared when you last loaded the warehouse.
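
    The three techniques above can be sketched in Python as follows. This is an illustration under stated assumptions, not a production method: the `customer` table, the sample records and the function names are all hypothetical, and the trigger writes to a log table in an in-memory SQLite database rather than to a log file.

```python
import sqlite3
from datetime import datetime


def changed_since(rows, last_load):
    """Timestamps: keep only the rows modified after the previous load."""
    return [r for r in rows if r["modified_at"] > last_load]


def file_compare(previous, current):
    """File compares: diff two snapshots keyed by record id."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return inserted, updated, deleted


# Timestamps (hypothetical sample rows).
rows = [
    {"id": 1, "name": "Asha", "modified_at": datetime(2008, 1, 5)},
    {"id": 2, "name": "Ravi", "modified_at": datetime(2008, 2, 20)},
]
last_load = datetime(2008, 2, 1)
recent = changed_since(rows, last_load)

# File compares (yesterday's copy versus today's file).
previous = {1: "Asha,Hyderabad", 2: "Ravi,Chennai", 3: "Meena,Delhi"}
current = {1: "Asha,Hyderabad", 2: "Ravi,Mumbai", 4: "Kiran,Pune"}
inserted, updated, deleted = file_compare(previous, current)

# Triggers: each change to the source table adds one row to change_log.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE change_log (action TEXT, customer_id INTEGER);
CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN INSERT INTO change_log VALUES ('INSERT', NEW.id); END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN INSERT INTO change_log VALUES ('UPDATE', NEW.id); END;
CREATE TRIGGER customer_del AFTER DELETE ON customer
BEGIN INSERT INTO change_log VALUES ('DELETE', OLD.id); END;
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("UPDATE customer SET name = 'Asha Rao' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 1")
log = conn.execute(
    "SELECT action, customer_id FROM change_log ORDER BY rowid"
).fetchall()

print(recent)
print(inserted, updated, deleted)
print(log)
```

    Note that only the trigger and file-compare techniques see the deleted record here; a timestamp column has nothing left to mark once a row is gone.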


    The Transform phase is where this data is transformed into the required form in DATA STORE 2. Some of the fundamental steps in the Transformation phase are:

    1. Converting heterogeneous data to homogeneous data: The data in DATA STORE 2 comes from the different source systems of DATA STORE 1, so the data is heterogeneous. DATA STORE 2 is called the Integration Layer or Warehouse.

    2. Adding surrogate keys: For example, rather than using the customer number as the key on the CUSTOMER table, you might use a surrogate key that is simply a sequential number generated by your warehouse load programs.

    3. Removing dirty data: Bad records can be handled by:

    a. Ignoring them.

    b. Rejecting bad records, but saving them in a separate file for manual review.

    c. Loading as much of the bad record as possible and pointing out the errors for later.

    4. Normalization: A normalized database is like a flat file that is broken up into smaller files or tables in order to store the data more efficiently.
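
    Steps 1 to 3 above can be sketched together in a few lines of Python. This is a minimal sketch only: the source systems `billing` and `crm`, the field names and the `transform` function are hypothetical, and dirty records are handled with option (b), i.e. set aside for manual review.

```python
from itertools import count


def transform(source_rows):
    """Standardize rows, reject dirty ones and assign surrogate keys."""
    next_key = count(1)  # sequential surrogate key generator
    loaded, rejected = [], []
    for row in source_rows:
        # Step 1: bring differently formatted values into one common form.
        name = (row.get("name") or "").strip().title()
        number = str(row.get("customer_number") or "").strip()
        # Step 3, option (b): reject bad records but keep them for review.
        if not name or not number:
            rejected.append(row)
            continue
        # Step 2: the warehouse key is a plain sequential number,
        # independent of the customer number used by the source system.
        loaded.append({
            "customer_key": next(next_key),
            "customer_number": number,
            "name": name,
            "source": row["source"],
        })
    return loaded, rejected


source_rows = [
    {"source": "billing", "customer_number": 1001, "name": "  asha rao "},
    {"source": "crm", "customer_number": "C-77", "name": "RAVI KUMAR"},
    {"source": "crm", "customer_number": None, "name": "meena"},
]
loaded, rejected = transform(source_rows)
print(loaded)
print(rejected)
```

    The third record has no customer number, so it receives no surrogate key and ends up in the rejected list instead of the warehouse.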


    Transformed data is sent to DATA STORE 3, which is called the DATA MART.

    DEFINITION OF DATA MART: Data Marts are databases that share many of the features of data warehouses but are smaller in scope.

    The LOADING phase can use several schemas. Two of them are:

    Star Schema: Data is maintained in one fact table and multiple dimension tables.

    Snowflake Schema: Data is maintained in the form of normalized dimension tables.

    DATA STORE 3 is also called High Performance Query Structures [HPQS].
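
    A star schema can be tried out with Python's built-in `sqlite3` module. The sketch below is illustrative only; the table names `fact_sales`, `dim_customer` and `dim_product` and the sample rows are assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe who bought and what was bought.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- The single fact table holds the measures plus one key per dimension.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);

INSERT INTO dim_customer VALUES (1, 'Asha', 'Hyderabad'), (2, 'Ravi', 'Chennai');
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO fact_sales VALUES (1, 1, 20.0), (1, 2, 350.0), (2, 1, 40.0);
""")

# A typical data mart query: total sales per product category.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)
```

    In a snowflake variant of the same sketch, `category` would move out of `dim_product` into its own normalized table, at the cost of one more join per query.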

    DATA FLOW 3 is the transfer of data from the High Performance Query Structures to the end-user reporting applications, DATA STORE 4.

    DATA STORE 4 is the data in the end users' hands. This report in the users' hands is the end of the information utility. It is, also, the


    A centralized Data Warehouse Server is maintained at a particular place. The transactions of all the Government Departments are transferred to this centralized Data Warehouse Server. The topology of the network is equated to the architecture of the Data Warehouse, as shown in the figure.


    DWHS: Data Warehousing Server

    OLPS: Online Analytical Processing System

    In the above example, data from three departments is extracted, transformed and sent to the centralized server [DWHS].

    Data Marts can answer most complex queries, and report generation is immediate.

    This data can be checked further for corrections: any incorrect data found in the Data Warehouse can be reported to the government.

    Thus, Data Warehousing can take both the private and public sectors to a top level.
