Data Warehousing & Informatica By Deepthi.G

Upload: swaroop-vanteru

Post on 25-Oct-2015


Page 1: DWH Informatica Session (1).pdf

Data Warehousing & Informatica

By

Deepthi.G

Page 2: DWH Informatica Session (1).pdf

AGENDA
INTRODUCTION TO DATA WAREHOUSING
DATA WAREHOUSING ARCHITECTURE
STEPS FOR BUILDING A DATA WAREHOUSE
TYPES OF SCHEMAS
CONCEPTS OF DATA WAREHOUSING
LIST OF TOOLS

Page 3: DWH Informatica Session (1).pdf

INTRODUCTION TO INFORMATICA
INFORMATICA ARCHITECTURE
COMPONENTS OF INFORMATICA
WORKING WITH INFORMATICA
INSTALLATION AND CONFIGURATION

Page 4: DWH Informatica Session (1).pdf

As per W. H. Inmon, a data warehouse is:
Subject-Oriented
Integrated
Time-Variant
Non-Volatile

Another name for data warehousing is Decision Support System (DSS).

Page 5: DWH Informatica Session (1).pdf

Subject Oriented Analysis

Transactional Storage (Process Oriented): Entry, Sales Rep, Quantity Sold, Prod Number, Date, Customer Name, Product Description, Unit Price, Mail Address

Data Warehouse Storage (Subject Oriented): Sales, Customers, Products

Page 6: DWH Informatica Session (1).pdf

Integration of Data

Each line shows the Transactional Storage values per application, followed by the single integrated value kept in Data Warehouse Storage.

Encoding: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y -> M, F

Unit of Attributes: Appl. A - pipeline cm; Appl. B - pipeline inches; Appl. C - pipeline mcf -> pipeline cm

Physical Attributes: Appl. A - balance dec(13,2); Appl. B - balance PIC 9(9)V99; Appl. C - balance float -> balance dec(13,2)

Naming Conventions: Appl. A - bal-on-hand; Appl. B - current_balance; Appl. C - balance -> balance

Data Consistency: Appl. A - date (Julian); Appl. B - date (yymmdd); Appl. C - date (absolute) -> date (Julian)
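
The integration idea on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original session: the record layout and the function names are hypothetical, and the way Appl. B's 1/0 and Appl. C's X/Y map onto M/F is an assumption, since the slide does not say which code means which. Only the inches-to-cm factor (2.54) is a fixed fact.

```python
# Hypothetical per-application encodings. Which of 1/0 and X/Y means "M"
# is NOT stated in the slides; the mapping below is assumed for illustration.
GENDER_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {1: "M", 0: "F"},
    "C": {"X": "M", "Y": "F"},
}

CM_PER_INCH = 2.54  # exact by definition


def to_warehouse_gender(app, value):
    """Translate an application-specific gender code to the warehouse code."""
    return GENDER_MAPS[app][value]


def to_warehouse_length_cm(value, unit):
    """Convert a pipeline length to centimetres, the unit kept in the warehouse."""
    if unit == "cm":
        return float(value)
    if unit == "inches":
        return value * CM_PER_INCH
    raise ValueError(f"unsupported unit: {unit}")


# Two made-up source records, one from Appl. A and one from Appl. B.
records = [
    {"app": "A", "gender": "F", "pipeline": 100, "unit": "cm"},
    {"app": "B", "gender": 1, "pipeline": 10, "unit": "inches"},
]

# After integration every record uses the same encoding and the same unit.
integrated = [
    {
        "gender": to_warehouse_gender(r["app"], r["gender"]),
        "pipeline_cm": to_warehouse_length_cm(r["pipeline"], r["unit"]),
    }
    for r in records
]
```

Appl. C's "mcf" is left out on purpose: it is a volume unit, so it cannot be converted to centimetres without more information than the slide gives.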

Page 7: DWH Informatica Session (1).pdf

Volatility of Data

Transactional Storage (Volatile): Record-by-Record Data Manipulation (Insert, Access, Change, Delete)

Data Warehouse Storage (Non-Volatile): Mass Load / Access of Data (Load, Access)

Page 8: DWH Informatica Session (1).pdf

Time Variant Data Analysis

Transactional Storage: Current Data

Data Warehouse Storage: Historical Data

[Chart: Sales ( Region , Year - Year 97 - 1st Qtr). Sales ( in lakhs ) for January, February and March of Year 97, by region: East, West, North]

Page 9: DWH Informatica Session (1).pdf

Data warehouses store large volumes of data which are frequently used by DSS.
A data warehouse is maintained separately from the organization’s operational databases.
Data warehouses are relatively static, with only infrequent updates.
A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous, operational databases.

Page 10: DWH Informatica Session (1).pdf

Data warehousing is the enabling technology that facilitates improved business decision-making.
It’s a process, not a product.
It is a technique for assembling and managing a wide variety of data from multiple operational systems for decision support and analytical processing.

It’s a journey, not a destination.

Page 11: DWH Informatica Session (1).pdf

Data Warehouse Architecture

Source (Oracle, Teradata, DB2, SQL Server) -> Staging Area -> Data Warehouse (Metadata, Raw Data, Summary Data) -> Data Mart -> Analysis, Reporting, Data Mining

Page 12: DWH Informatica Session (1).pdf

Source: The database from which data is extracted.

Ex: Oracle, Teradata, Sybase, DB2

Page 13: DWH Informatica Session (1).pdf

Staging area:

A temporary storage area used for processing data.

Meta Data: Data about the data, or a description of the data.

Page 14: DWH Informatica Session (1).pdf

Data Mart :

A data mart is a data warehouse for a specific domain.

Data marts are of two types:

Independent Data mart

Dependent Data mart

Page 15: DWH Informatica Session (1).pdf

Steps For Building A Data warehouse

Identify key business drivers, sponsorship, and risks.
Survey information needs, identify desired functionality, and define functional requirements for the initial subject area.
Architect the long-term data warehousing architecture.
Evaluate and finalize DW tools and technology.
Conduct a proof of concept.
Design the target database schema.
Build data mapping, extract, transformation, cleansing, and aggregation/summarization rules.
Build the initial data mart, using an exact subset of the enterprise data warehousing architecture, and expand to the enterprise architecture over subsequent phases.
Maintain and administer the data warehouse.

Page 16: DWH Informatica Session (1).pdf

Snow Flake Schema

It is used in the same way as the star schema, but the cube has at least one dimension with two or more levels under at least two hierarchies.
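
A dimension with more than one level is commonly stored as separate, linked tables in a snowflake schema. The sketch below shows that shape using Python's built-in sqlite3 module. All table names, column names and rows are made up for illustration and do not come from the session.

```python
import sqlite3

# In-memory database; every name and value below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Upper level of the product dimension.
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
-- Lower level of the product dimension, pointing at its category.
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES category(category_id)
);
-- Fact table referencing the lowest level of the dimension.
CREATE TABLE sales_fact (
    product_id    INTEGER REFERENCES product(product_id),
    quantity_sold INTEGER
);
INSERT INTO category VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO product VALUES (10, 'Tea', 1), (11, 'Coffee', 1), (12, 'Chips', 2);
INSERT INTO sales_fact VALUES (10, 5), (11, 3), (12, 4);
""")

# Rolling up to the category level needs two joins: fact -> product -> category.
rows = conn.execute("""
SELECT c.category_name, SUM(f.quantity_sold)
FROM sales_fact AS f
JOIN product  AS p ON p.product_id  = f.product_id
JOIN category AS c ON c.category_id = p.category_id
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
```

In a star schema the category name would sit directly in the product table, so the same roll-up would need only one join.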

Page 17: DWH Informatica Session (1).pdf

List Of Tools

ETL TOOLS: Informatica, Ascential DataStage, IBM Visual Warehouse, Oracle Warehouse Builder.

OLAP SERVER: Oracle Express Server, Hyperion Essbase, IBM DB2 OLAP Server, Microsoft SQL Server OLAP Services, Seagate HOLOS, SAS/MDDB.

OLAP TOOLS: Oracle Express Suite, Business Objects, Web Intelligence, SAS, Cognos PowerPlay/Impromptu, KALIDO, MicroStrategy, Brio Query, MetaCube.

DATA WAREHOUSE: Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft SQL Server.

Page 18: DWH Informatica Session (1).pdf

INTRODUCTION TO INFORMATICA

It is an ETL tool:
Extracting data from sources
Performing the transformations
Loading the data into the target
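
The three ETL stages can be sketched in plain Python. This is a generic illustration of extract, transform and load, not Informatica's own API: the CSV text, the table layout and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file or a database.
SOURCE_CSV = "customer_name,unit_price,quantity_sold\n alice ,2.50,4\nbob,10.00,1\n"


def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean the customer name and derive the sale amount."""
    return [
        (
            r["customer_name"].strip().title(),
            float(r["unit_price"]) * int(r["quantity_sold"]),
        )
        for r in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer_name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the target database
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer_name, amount FROM sales ORDER BY customer_name"
).fetchall()
```

Keeping the three stages as separate functions mirrors the way the slide lists them, and lets each stage be checked on its own.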

Page 19: DWH Informatica Session (1).pdf

INFORMATICA ARCHITECTURE

[Diagram: client tools (Repository Manager, Designer, Workflow Manager, Workflow Monitor, Repository Admin Console); Repository Server and Informatica Repository; Informatica Server between Source and Target. Other labels: validation, session, status.]

Page 20: DWH Informatica Session (1).pdf

Components of Informatica

REPOSITORY MANAGER
DESIGNER
SERVER MANAGER

Page 21: DWH Informatica Session (1).pdf

REPOSITORY MANAGER

REPOSITORY SECURITY
FOLDER MANAGEMENT
METADATA REPORTING
REPOSITORY MAINTENANCE

Page 22: DWH Informatica Session (1).pdf

OUTPUT WINDOW

DEPENDENCY WINDOW

ANALYSIS WINDOW

NAVIGATOR WINDOW

Page 23: DWH Informatica Session (1).pdf

REPOSITORY SECURITY

♦ CREATE USERS

♦ CREATE GROUPS

♦ ASSIGN PRIVILEGES

♦ MOVE USERS INTO GROUPS

♦ ASSIGN ADDITIONAL PRIVILEGES TO USERS

Page 24: DWH Informatica Session (1).pdf

REPOSITORY SECURITY

♦ LOCK TYPES ( READ, WRITE, EXEC, FETCH, SAVE )

♦ OBJECT LOCKS ( FOLDERS, SOURCE DEF., TARGET DEF. )

♦ VIEW LOCKS ( EDIT| SHOW LOCKS )

♦ UNLOCKING OBJECTS

Page 25: DWH Informatica Session (1).pdf

FOLDER MANAGEMENT

FOLDER ATTRIBUTES
* OWNER
* PERMISSIONS
* SHARED
* SHORTCUT
* VERSIONS

Page 26: DWH Informatica Session (1).pdf

DESIGNER

SOURCE ANALYZER TO CREATE SOURCE DEFINITIONS
WAREHOUSE DESIGNER TO CREATE TARGET DEFINITIONS
TRANSFORMATION DEVELOPER TO CREATE REUSABLE TRANSFORMATIONS
MAPPLET DESIGNER TO CREATE REUSABLE MAPPINGS
MAPPING DESIGNER TO CREATE SOURCE TO TARGET MAPPINGS

Page 27: DWH Informatica Session (1).pdf

Designer

Mapping = Source + Transformation + Target

Transformation : 2 Types

Active Transformation

Passive Transformation

Page 28: DWH Informatica Session (1).pdf

ACTIVE TRANSFORMATION

Sorter
Rank
Router
Normalizer
Source Qualifier
Joiner
Aggregator
Advanced External Procedure
Update Strategy
Custom Transformation
Transaction Control
Union

PASSIVE TRANSFORMATION

Lookup
Expression
Stored Procedure
Sequence Generator
External Procedure
XML Source Qualifier
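
The slides list the two kinds without defining them. In Informatica, an active transformation can change the number of rows that pass through it, while a passive transformation does not. The sketch below mimics that difference in plain Python. The function names only echo the Aggregator and Expression transformations; they are hypothetical stand-ins, not Informatica code, and the rows are made up.

```python
from collections import defaultdict

# Made-up input rows.
rows = [
    {"region": "East", "sales": 10},
    {"region": "East", "sales": 5},
    {"region": "West", "sales": 20},
]


def expression(rows):
    """Passive: adds a derived column, exactly one output row per input row."""
    return [{**r, "sales_doubled": r["sales"] * 2} for r in rows]


def aggregator(rows):
    """Active: groups rows, so the output row count can differ from the input."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["sales"]
    return [{"region": region, "total_sales": total} for region, total in totals.items()]


passive_out = expression(rows)  # same number of rows as the input
active_out = aggregator(rows)   # one row per region
```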

Page 29: DWH Informatica Session (1).pdf

SERVER MANAGER

CONFIGURE SERVER
CREATE SESSION
START SESSION
MONITOR SESSION
VIEW LOGS
CORRECT SESSION PROBLEMS

Page 30: DWH Informatica Session (1).pdf

Important Bottlenecks:

TEST SQL QUERY
CHECK SESSION LOG FOR ERRORS
CHECK PERFORMANCE DETAILS
REDUCE NUMBER OF RECORDS PROCESSED
INDEX THE SOURCE
REPLACE DEFAULT QUERY WITH AN OPTIMIZED QUERY
DROP INDEXES BEFORE LOADING
CONSIDER INCREASING COMMIT LEVEL.
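
Two of these tips, dropping indexes before loading and raising the commit level, can be sketched with Python's built-in sqlite3 module. This is a generic illustration, not an Informatica session setting: the table, the index name and the `commit_interval` parameter are hypothetical.

```python
import sqlite3


def bulk_load(conn, rows, commit_interval=1000):
    """Load rows with the index dropped, committing once per batch."""
    # Drop the index so it is not maintained on every insert.
    conn.execute("DROP INDEX IF EXISTS idx_sales_region")
    # A larger commit_interval means fewer commits for the same number of rows.
    for start in range(0, len(rows), commit_interval):
        batch = rows[start:start + commit_interval]
        conn.executemany("INSERT INTO sales VALUES (?, ?)", batch)
        conn.commit()
    # Rebuild the index once, after all rows are in place.
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")

# Made-up rows to load.
rows = [("East" if i % 2 else "West", i) for i in range(2500)]
bulk_load(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
indexes = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'").fetchall()
```

Whether this helps in practice depends on the database and the data volume, so it is worth measuring both ways before adopting it.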

Page 31: DWH Informatica Session (1).pdf

INSTALLATION AND CONFIGURATION

SYSTEM REQUIREMENTS :

OPERATING SYSTEM ( WINDOWS 95/98/NT 4.0 )
DISK SPACE ( 120 MB )
RAM ( 32 MB )
CONNECTIVITY ( MERANT ODBC 3.50 )
NETWORK SUPPORT ( TCP/IP OR IPX/SPX )

Page 32: DWH Informatica Session (1).pdf

THANK YOU