48810459-informatica-senarios

Upload: ranjit-m

Post on 08-Apr-2018

  • 8/7/2019 48810459-informatica-senarios

    1/6

    How to use the "SUBSTR" function in a mapping

    Explanation: Returns a portion of a string. SUBSTR counts all characters, including blanks, starting at the beginning of the string.

    Syntax

    SUBSTR( string , start [, length ] )

    Example

    SUBSTR( IN_PHONE, 1, 3 )

    This returns the first three characters of IN_PHONE.
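The positional rules of SUBSTR can be sketched in Python. This is only an illustration, not Informatica code: the function name `substr` is hypothetical, and it assumes that position 1 (or 0) is the first character, that a negative start counts back from the end of the string, and that a zero or negative length yields an empty string.

```python
def substr(string, start, length=None):
    """Rough Python emulation of SUBSTR(string, start [, length])."""
    if string is None:
        return None
    n = len(string)
    if start > 0:
        begin = start - 1          # positions are 1-based
    elif start == 0:
        begin = 0                  # 0 is treated like position 1
    else:
        # Negative start counts from the end; clamping at the first
        # character is an assumption made for this sketch.
        begin = max(n + start, 0)
    if length is None:
        return string[begin:]      # no length: return the rest of the string
    if length <= 0:
        return ""
    return string[begin:begin + length]


area_code = substr("809-555-0269", 1, 3)   # first three characters
```

Blanks count as characters, so `substr("a b c", 2, 3)` returns the blank, the `b`, and the following blank.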

    Design a mapping that generates a sequence of numbers using the SETVARIABLE function in an Expression transformation (without using a Sequence Generator)

    Mapping:

    Design a mapping that generates a sequence of numbers without using a Sequence Generator.

    Solution:

    Source: Flat file
    Target: Relational
    Database: Oracle

    Note: uses the SETMAXVARIABLE() function and mapping variables.

    Download : XML FILE

    m_sequence_variablefunction
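The idea behind this solution can be sketched in Python. It is a simulation, not the mapping itself: `MappingVariable`, `set_max`, `add_sequence`, and the `SEQ_NO` column are hypothetical names, and the sketch assumes the mapping variable keeps its last value between runs, so the numbering continues where the previous run stopped.

```python
class MappingVariable:
    """Hypothetical stand-in for a persisted mapping variable."""

    def __init__(self, initial=0):
        self.value = initial

    def set_max(self, candidate):
        # Same idea as SETMAXVARIABLE: keep the larger of the two values.
        self.value = max(self.value, candidate)
        return self.value


def add_sequence(rows, variable):
    """Number each row, starting after the value stored in the variable."""
    counter = variable.value
    numbered = []
    for row in rows:
        counter += 1                 # plays the role of a variable port
        variable.set_max(counter)    # remember the highest number issued
        numbered.append({"SEQ_NO": counter, **row})
    return numbered


seq = MappingVariable()
first_run = add_sequence([{"NAME": "a"}, {"NAME": "b"}], seq)
second_run = add_sequence([{"NAME": "c"}], seq)
```

Because the variable only ever moves upward, a second run cannot reissue a number that an earlier run has already used.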

    DWH

    Design a mapping that moves the first half of the data to one target and the second half to another target. For example, if the source has 20 records, the first 10 go to one target and the other 10 to the second target; if the source has an odd number of records, the first n/2 + 1 (with n/2 rounded down) go to one target and the rest to the second target.

    Mapping: first half to one target and second half to the other target.

    Solution:

    Source: Flat file
    Target: Relational
    Database: Oracle

    Tip: use a stored procedure to count the records.

    Download : XML FILE

    m_firsthalf_secondhalf
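The split rule can be checked with a small Python sketch. The function name `split_halves` is hypothetical; the row count is taken first, as the tip suggests, and the extra record of an odd-sized source goes to the first target.

```python
def split_halves(rows):
    """Return (first_target_rows, second_target_rows)."""
    total = len(rows)                 # count the records first
    first_size = (total + 1) // 2     # n/2 for even n, n/2 + 1 (rounded down) for odd n
    return rows[:first_size], rows[first_size:]


even_first, even_second = split_halves(list(range(1, 21)))   # 20 records
odd_first, odd_second = split_halves(list(range(1, 22)))     # 21 records
```

With 20 records each target receives 10; with 21 records the first target receives 11 and the second 10.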

    REPOSITORY ADMIN CONSOLE

    Actions

    * Create Local or Global Repository
    * Start Repositories
    * Back up Repository
    * Move the copy of the Repository to a different Server
    * Disable the Repository
    * Export connection information
    * Notify Users: a notification message can be sent to all the users connected to the Repository
    * Propagate
    * Register Repositories
    * Restore Repository
    * Upgrade Repository

    Actions

    * Create reusable tasks, Worklets, and Workflows
    * Schedule Workflows
    * Configure tasks

    Workflow

    A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

    Worklets

    A worklet is an object that represents a set of tasks.

    When to create Worklets?

    Create a worklet when you want to reuse a set of workflow logic in several workflows. Use the Worklet Designer to create and edit worklets.

    Where to use Worklets?

    You can run worklets inside a workflow. The workflow that contains the worklet is called the parent workflow. You can also nest a worklet in another worklet.

    WORKFLOW MONITOR

    You can monitor workflows and tasks in the Workflow Monitor. View details about a workflow or task in Gantt Chart view or Task view.

    Actions

    You can run, stop, abort, and resume workflows from the Workflow Monitor.

    You can view the log file and performance data.

    Slowly Changing Dimension

    * It is a dimension that changes slowly over time.

    Slowly Changing Dimension Mapping  Type                       Description

    SCD Type 1                         Slowly Changing Dimension  Inserts new dimensions. Overwrites existing
                                                                  dimensions with changed dimensions.
                                                                  (Shows current data.)

    SCD Type 2 / Version Data          Slowly Changing Dimension  Inserts new and changed dimensions. Creates a
                                                                  version number and increments the primary key
                                                                  to track changes.

    SCD Type 2 / Flag Current          Slowly Changing Dimension  Inserts new and changed dimensions. Flags the
                                                                  current version and increments the primary key
                                                                  to track changes.

    SCD Type 2 / Date Range            Slowly Changing Dimension  Inserts new and changed dimensions. Creates an
                                                                  effective date range to track changes.

    SCD Type 3                         Slowly Changing Dimension  Inserts new dimensions. Updates changed values
                                                                  in existing dimensions. Optionally uses the load
                                                                  date to track changes.
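The difference between overwriting (Type 1) and keeping history (Type 2 / Flag Current) can be sketched in Python. This is a simplified illustration under stated assumptions, not the logic generated by the mapping wizard: the function names and the columns `CUST_ID`, `CITY`, `PK`, and `CURRENT_FLAG` are hypothetical.

```python
def apply_type1(dimension, incoming, key="CUST_ID"):
    """Type 1: insert new rows, overwrite existing ones (current data only).

    dimension is a dict keyed by the natural key.
    """
    for row in incoming:
        dimension[row[key]] = dict(row)
    return dimension


def apply_type2_flag(dimension, incoming, key="CUST_ID"):
    """Type 2 / Flag Current: insert new and changed rows, flag the current one.

    dimension is a list of rows; each row gets an incremented primary key (PK).
    """
    next_pk = max((r["PK"] for r in dimension), default=0) + 1
    for row in incoming:
        current = next(
            (r for r in dimension if r[key] == row[key] and r["CURRENT_FLAG"] == 1),
            None,
        )
        if current is not None:
            if all(current.get(col) == val for col, val in row.items()):
                continue                      # nothing changed, keep the row
            current["CURRENT_FLAG"] = 0       # retire the old version
        dimension.append({"PK": next_pk, **row, "CURRENT_FLAG": 1})
        next_pk += 1
    return dimension


type1 = apply_type1({}, [{"CUST_ID": 1, "CITY": "Pune"}])
apply_type1(type1, [{"CUST_ID": 1, "CITY": "Delhi"}])

type2 = apply_type2_flag([], [{"CUST_ID": 1, "CITY": "Pune"}])
apply_type2_flag(type2, [{"CUST_ID": 1, "CITY": "Delhi"}])
```

After the same change, `type1` holds one row with the new city, while `type2` holds both versions and only the latest one is flagged as current.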

    OLTP                                                OLAP

    Online Transaction Processing                       Online Analytical Processing
    Continuously updates data                           Read-only data
    Tables are in normalized form                       Partially normalized / denormalized tables
    Single-record access                                Multiple records for analysis purposes
    Holds current data                                  Holds current and historical data
    Records are maintained using primary key field      Records are based on surrogate key field
    Tables or records can be deleted                    Records cannot be deleted
    Complex data model                                  Simplified data model

    DATAMART                                            DATA WAREHOUSE

    A scaled-down version of the Data Warehouse that    A database management system that facilitates
    addresses only one subject, such as the Sales       online analytical processing by allowing the data
    Department or the HR Department.                    to be viewed in different dimensions or
                                                        perspectives to provide business intelligence.

    One fact table with multiple dimension tables.      More than one fact table and multiple dimension
                                                        tables.

    [Sales Department] [HR Department]                  [Sales Department, HR Department,
    [Manufacturing Department]                          Manufacturing Department]

    Small organizations prefer a DATAMART               Bigger organizations prefer a DATA WAREHOUSE

    DIMENSION TABLE                                     FACT TABLE

    Provides the context / descriptive information      Provides the measurements of an enterprise.
    for fact table measurements.

    Structure of a dimension: surrogate key, one or     A measurement is an amount determined by
    more other fields that compose the natural key      observation.
    (nk), and a set of attributes.

    A dimension table is smaller than a fact table.     Structure of a fact table: foreign key (fk),
                                                        degenerate dimension, and measurements.

    A schema contains more dimension tables than        A fact table is larger than a dimension table.
    fact tables.

    The surrogate key is used to prevent primary key    A schema contains fewer fact tables than
    (pk) violations (to store historical data).         dimension tables.

    Provides entry points to the data.                  Composed of degenerate dimension fields that act
                                                        as the primary key.

    Field values are in numeric and text                Field values are always in numeric or integer
    representation.                                     form.
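The two structures described above can be illustrated with a tiny star schema built in an in-memory SQLite database from Python. The table and column names (`dim_product`, `fact_sales`, `product_sk`, and so on) are hypothetical and chosen only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_sk   INTEGER PRIMARY KEY,  -- surrogate key
        product_code TEXT,                 -- natural key (nk)
        product_name TEXT                  -- descriptive attribute
    );
    CREATE TABLE fact_sales (
        product_sk INTEGER REFERENCES dim_product(product_sk),  -- foreign key (fk)
        invoice_no TEXT,                   -- degenerate dimension
        amount     REAL                    -- measurement
    );
    """
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "P-100", "Pen"), (2, "P-200", "Ink")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, "INV-1", 2.5), (1, "INV-2", 4.0), (2, "INV-2", 7.0)],
)

# The dimension supplies the descriptive context; the fact table supplies the numbers.
totals = conn.execute(
    """
    SELECT d.product_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_sk = f.product_sk
    GROUP BY d.product_name
    ORDER BY d.product_name
    """
).fetchall()
```

Even in this toy example the fact table has more rows than the dimension table, which matches the size comparison above.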

    DATA MINING VS WEB MINING

    DATA MINING                                         WEB MINING

    Data mining involves using techniques to find       Web mining involves the analysis of Web server
    underlying structure and relationships in large     logs of a Web site.
    amounts of data.

    Data mining products tend to fall into five         The Web server logs contain the entire
    categories: neural networks, knowledge discovery,   collection of requests made by a potential or
    data visualization, fuzzy query analysis, and       current customer through their browser and the
    case-based reasoning.                               responses by the Web server.

    FACT TABLE VS DIMENSION TABLE

    FACT TABLE                                          DIMENSION TABLE

    A fact table in a data warehouse describes the      A table in a data warehouse whose entries
    transaction data. It contains characteristics       describe data in a fact table. Dimension tables
    and key figures.                                    contain the data from which dimensions are
                                                        created. A dimension table is a collection of
                                                        hierarchies and categories along which the user
                                                        can drill down and drill up. It contains only
                                                        textual attributes.

    A data model schema contains fewer fact tables.     A data model schema contains more dimension
                                                        tables.

    RDBMS SCHEMA VS DWH SCHEMA

    RDBMS SCHEMA                                        DWH SCHEMA

    * Used for OLTP systems                             * Used for OLAP systems
    * Traditional and old schema                        * New generation schema
    * Normalized                                        * Denormalized
    * Difficult to understand and navigate              * Easy to understand and navigate
    * Cannot solve extract and complex problems         * Extract and complex problems can be easily solved
    * Poorly modelled                                   * Very good model

    How to find the number of successful, rejected, and bad records in the same mapping

    * First, separate the data using an Expression transformation, which flags each row as 1 or 0. The condition is as follows:

    * IIF(NOT IS_DATE(HIREDATE,'DD-MON-YY') OR ISNULL(EMPNO) OR ISNULL(NAME) OR ISNULL(HIREDATE) OR ISNULL(SEX), 1, 0)

    * FLAG = 1 is considered invalid data and FLAG = 0 is considered valid data. This data is routed to the next transformation using a Router transformation, with two user-defined groups: one with FLAG = 1 for invalid data and the other with FLAG = 0 for valid data.

    * FLAG = 1 data is forwarded to an Expression transformation. Here we use one variable port and two output ports: one to increment the count and the other to flag the row ...
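The flagging and routing steps above can be mimicked in Python to check the logic on sample rows. This is a sketch under stated assumptions: `is_date`, `flag_row`, and `route_and_count` are hypothetical helper names, and the format `%d-%b-%y` only approximates 'DD-MON-YY' (month-name parsing depends on the locale).

```python
from datetime import datetime

REQUIRED = ("EMPNO", "NAME", "HIREDATE", "SEX")


def is_date(value, fmt="%d-%b-%y"):
    """Return True if value parses with the given format (approximates IS_DATE)."""
    if value is None:
        return False
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def flag_row(row):
    """Return 1 for an invalid row, 0 for a valid row, as in the IIF condition."""
    has_null = any(row.get(col) is None for col in REQUIRED)
    return 1 if has_null or not is_date(row.get("HIREDATE")) else 0


def route_and_count(rows):
    """Split rows into valid and invalid groups and count each group."""
    valid, invalid = [], []
    for row in rows:
        (invalid if flag_row(row) == 1 else valid).append(row)
    return valid, invalid, {"valid": len(valid), "invalid": len(invalid)}


sample = [
    {"EMPNO": 7369, "NAME": "SMITH", "HIREDATE": "17-Dec-80", "SEX": "M"},
    {"EMPNO": 7499, "NAME": "ALLEN", "HIREDATE": "1981/02/20", "SEX": "M"},
    {"EMPNO": 7521, "NAME": None, "HIREDATE": "22-Feb-81", "SEX": "F"},
]
valid_rows, invalid_rows, counts = route_and_count(sample)
```

The two lists play the role of the Router groups, and `counts` corresponds to the running totals kept in the variable port.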