Data Warehousing Concepts



Data warehousing concepts By PenchalaRaju.Yanamala

What is Dimensional Modelling

Latest Answer: It is a logical design technique with a visual emphasis; a
dimensional model can contain aggregate tables, dimension tables, fact tables ...

What is the Difference between OLTP and OLAP

Answered by swetha on 2005-03-30 12:00:33: OLTP: current data; short database
transactions; online update/insert/delete; normalization is promoted; high
volume of transactions. Transaction


What is a surrogate key? Where do we use it? Explain with examples.

Latest Answer: We can say a surrogate key is a user-defined primary key.

What are Data Marts

Data Mart is a segment of a data warehouse that can provide data for reporting

and analysis on a section, unit, department or operation in the company, e.g.

sales, payroll, production. Data marts are sometimes

Latest Answer: A Data Mart is a subset of the data warehouse that caters to the
needs of a specific functional domain. Examples of functional domains include
Sales, Finance, Marketing, HR, etc. ...

What are the methodologies of Data Warehousing?

Latest Answer: There are four methods by which one can build a data warehouse:
1. Top-Down (emphasizes the DW).
2. Bottom-Up (emphasizes data marts).
3. Hybrid (emphasizes the DW and data marts; blends the “top-down” and “bottom-up” methods).
4. Federated (emphasizes the need to ...

What is a Data Warehousing?

Data Warehouse is a repository of integrated information, available for queries
and analysis. Data and information are extracted from heterogeneous sources as
they are generated. ... This makes it much


Latest Answer: Data Warehousing is a relational database specially designed for
analysis and query processing rather than for transactional processing. ...

What are the various ETL tools in the Market

Latest Answer: By far, the best ETL tool on the market is Hummingbird Genio.
Hummingbird is a division of OpenText; they make, among other things,
connectivity and ETL software. ...

What is Fact table

Answer posted by Chintan on 2005-05-22 18:46:03: A table in a data warehouse
whose entries describe the measurements of a business process; it is joined to
dimension tables, which contain the data from which dimensions are created.

Latest Answer: The fact table is the one that contains measures of interest at
the most granular level. These values are numeric. Ex: sales amount would be a
measure. Each dimension table has a single-part primary key which corresponds
exactly to one of the components of the multipart ...

What is ODS

Latest Answer: ODS means Operational Data Store. The ODS and the staging layer
are the two layers between the source and the target databases in the data
warehouse. The ODS is used to store the recent data. ...

What are conformed dimensions

Latest Answer: A dimension which can be shared with multiple fact tables is
known as a conformed dimension. ...

What is a lookup table

Latest Answer: Hi, if the data is not available in the source systems, then we
have to get the data from reference tables which are present in the database;
these tables are called lookup tables. For example, while loading the data from
OLTP to OLAP, we have ...

What is ER Diagram


Answered by Puneet on 2005-05-07 04:21:07: ER stands for entity-relationship
diagrams. It is the first step in the design of a data model, which will later
lead to a physical database design of possible

Latest Answer: Entity Relationship Diagrams are a major data modelling tool and
will help organize the data in your project into entities and define the
relationships between them. There are three basic elements in ER models:
entities are the "things" about which we seek ...

What is ETL

Answered by sunitha on 2005-04-28 21:17:53: ETL is extraction, transformation
and loading; ETL technology is used for extracting the information from the
source database and loading it into the target database.

Latest Answer: The data acquisition technique is now called ETL (Extraction,
Transformation and Loading). Extraction: the process of extracting the data
from various sources; sources can be a file system, database, XML file, COBOL
file, ERP, etc. Transformation: transforming the ...
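The extract/transform/load stages described above can be sketched in a few lines of Python. This is a minimal illustration, not a tool-specific implementation; the source rows, field names, and in-memory "warehouse" target are all invented for the example:

```python
# Minimal ETL sketch: extract rows from a source, transform them,
# and load them into a list standing in for a warehouse table.

def extract():
    # In practice the source could be a file system, database, XML, COBOL file, ERP, etc.
    return [{"id": 1, "amount": "100.50"}, {"id": 2, "amount": "75.25"}]

def transform(rows):
    # Example transformation: cast text amounts to numbers and derive a column.
    return [{**r,
             "amount": float(r["amount"]),
             "amount_cents": int(float(r["amount"]) * 100)}
            for r in rows]

def load(rows, target):
    # Load the transformed rows into the target structure.
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0]["amount_cents"])  # 10050
```

Real ETL tools add scheduling, error handling, and restartability around this same three-stage flow.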

What are conformed dimensions

Latest Answer: In integrated schema design, a dimension which can be shared
across multiple fact tables is called a Conformed Dimension. ...

What is conformed fact?

Latest Answer: A fact which can be used across multiple data marts is called a
conformed fact. ...

Can a dimension table contain numeric values?

Latest Answer: Absolutely! For example, a perishable product in a grocery store
might have SHELF_LIFE (in days) as part of the product dimension. This value
may, for example, be used to calculate optimum inventory levels for the product.
Too much inventory, ...

What is a Star Schema

Answer posted by Chintan on 2005-05-22 18:34:55: A relational database

schema organized around a central table (fact table) joined to a few smaller

tables (dimension tables) using foreign key references.


Latest Answer: A data warehouse design that enhances the performance of

multidimensional queries on traditional relational databases. One fact table is

surrounded by a series of related tables. Data is joined from one of the points to

the center, providing a so-called ...
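The fact-to-dimension join pattern described above can be shown with in-memory tables. This is a hedged sketch; the table names, keys, and values are invented for the example:

```python
# Star-schema sketch: one central fact table joined to dimension tables
# by surrogate foreign keys (all names and data are illustrative).

product_dim = {1: {"name": "Soap", "category": "Household"},
               2: {"name": "Milk", "category": "Grocery"}}
store_dim = {10: {"city": "Chennai"}, 20: {"city": "Mumbai"}}

sales_fact = [  # grain: one row per product per store per day
    {"product_key": 1, "store_key": 10, "amount": 120.0},
    {"product_key": 2, "store_key": 10, "amount": 80.0},
    {"product_key": 1, "store_key": 20, "amount": 50.0},
]

# A typical multidimensional query: total sales amount by product category.
totals = {}
for row in sales_fact:
    category = product_dim[row["product_key"]]["category"]
    totals[category] = totals.get(category, 0.0) + row["amount"]

print(totals)  # {'Household': 170.0, 'Grocery': 80.0}
```

In SQL the same query would join the fact table to the product dimension on the foreign key and GROUP BY category.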

What is a Snowflake Schema

Answered by Girinath.S.V.S on 2005-03-17 06:40:48: Snowflake schemas

normalize dimensions to eliminate redundancy. That is, the dimension data has

been grouped into multiple tables instead of one large

Latest Answer: Any schema with extended dimensions (i.e., dimensions with one
or more extensions) is known as a snowflake schema ...

What is a dimension table

Answer posted by Riaz Ahmad on 2005-06-09 14:45:26: A dimension table is a
collection of hierarchies and categories along which the user can drill down
and drill up. It contains only the textual attributes.

Latest Answer: A dimension table contains detail values/data and is short and
wide (i.e., fewer columns and more rows). In data warehousing, analysis is
always based on dimensions. ...

What is data mining

Answered by Puneet on 2005-05-07 04:24:28: Data mining is a process of
extracting hidden trends within a data warehouse. For example, an insurance
data warehouse can be used to mine data for the most high

Latest Answer: Data Mining: in a simpler way, we can define it as DWH (Data
Warehouse) + AI (Artificial Intelligence), used in DSS (Decision Support
Systems) ...

What type of Indexing mechanism do we need to use for a typical

datawarehouse

Answered by on 2005-03-23 01:45:54: bitmap index

Latest Answer: Space requirements for indexes in a warehouse are often

significantly larger than the space needed to store the data, especially for the fact

table, and particularly if the indexes are B*-trees. Hence, you may want to keep
indexing on the fact table to a ...


Differences between star and snowflake schemas

Answered by sudhakar on 2005-05-09 18:32:18: a star schema uses denormalized
dimension tables, but a snowflake schema uses normalized dimensions to avoid
redundancy ...


What is the Difference between E-R Modeling and Dimensional Modeling?

Latest Answer: E-R Modeling is a model for OLTP, optimized for an operational
database, namely inserting, updating and deleting data, and stressing
relational data integrity. Dimensional Modeling is a model for OLAP, optimized
for retrieving data, because it is uncommon to update ...

Why is the fact table in normal form?

Latest Answer: The fact table is the central table in a star schema. The fact
table is kept normalized because it is very big, so we should avoid redundant
data in it. That is why we make different dimensions, thereby making a
normalized star schema model, which helps in query ...

What is junk dimension?what is the difference between junk dimension and

degenerated dimension?

Latest Answer: A junk dimension is also called a garbage dimension. A garbage
dimension is a dimension that consists of low-cardinality columns such as
codes, indicators, statuses, and flags. Attributes in a garbage ...

What are slowly changing dimensions

Latest Answer: The definition of slowly changing dimension is in its name only.

The dimension which changes slowly with time. A customer dimension table
represents customers. When creating a customer, the normal assumption is that
it is independent of time. But what if the address ...

How do you load the time dimension


Latest Answer: Create a procedure to load data into the Time Dimension. The
procedure needs to run only once to populate all the data. For example, the
code below fills it up till 2015. You can modify the code to suit the fields in
your table: create or replace procedure ...
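The one-time load described above (the original answer references a truncated stored procedure) can be sketched in Python. The column names (`time_key`, `full_date`, etc.) are illustrative, not a standard:

```python
# Sketch of a one-time time-dimension load: generate one row per calendar day
# with pre-computed date attributes. Column names are invented for the example.
from datetime import date, timedelta

def build_time_dimension(start, end):
    rows, key, day = [], 1, start
    while day <= end:
        rows.append({
            "time_key": key,                    # surrogate key
            "full_date": day.isoformat(),
            "year": day.year,
            "month": day.month,
            "day_of_week": day.strftime("%A"),
            "is_weekend": day.weekday() >= 5,   # Saturday/Sunday
        })
        key += 1
        day += timedelta(days=1)
    return rows

rows = build_time_dimension(date(2015, 1, 1), date(2015, 1, 7))
print(len(rows), rows[0]["day_of_week"])  # 7 Thursday
```

Because the attributes are derivable from the date alone, the dimension really can be populated once up front, as the answer says.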


Difference between Snow flake and Star Schema. What are situations where

Snow flake Schema is better than Star Schema to use and when the opposite is

true?

What is a linked cube?

A cube can be stored on a single analysis server and then defined as a linked

cube on other Analysis servers. End users connected to any of these analysis

servers can then access the cube. This arrangement

Latest Answer: Hi all, could you please let me know what a Replicate Cube and a
Transparent Cube are? Thanks & regards, Amit Sagpariya ...

What is the datatype of the surrogate key

Latest Answer: It is a system-generated sequence number, an artificial key used
in maintaining history. It comes up while handling slowly changing dimensions ...


For an 80 GB data warehouse, how many records are there in the fact table, if
there are 25 dimension and 12 fact tables?


How is data in the data warehouse stored after it has been extracted and
transformed from heterogeneous sources, and where does the data go from the
data warehouse?

What is the role of surrogate keys in data warehouse and how will u generate

them?


Latest Answer: A surrogate key is a substitution for the natural primary key. We

tend to use our own Primary keys (surrogate keys) rather than depend on the

primary key that is available in the source system. When integrating the data,

trying to work with ...
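As for generating surrogate keys, the common approach is a system-generated sequence that maps each natural key from the source to its own warehouse key. A minimal sketch (the class and key formats are invented for the example):

```python
# Sketch: generating surrogate keys independently of the source's natural keys.
from itertools import count

class SurrogateKeyGenerator:
    def __init__(self):
        self._next = count(1)    # system-generated sequence, like a DB sequence
        self._by_natural = {}    # natural key -> assigned surrogate key

    def key_for(self, natural_key):
        # Assign a new sequence number the first time a natural key is seen;
        # return the same surrogate key on every later lookup.
        if natural_key not in self._by_natural:
            self._by_natural[natural_key] = next(self._next)
        return self._by_natural[natural_key]

gen = SurrogateKeyGenerator()
print(gen.key_for("CUST-001"))  # 1
print(gen.key_for("CUST-002"))  # 2
print(gen.key_for("CUST-001"))  # 1 (same source key maps to the same key)
```

In a real warehouse this mapping lives in the dimension table itself (or a key-map table), and the sequence comes from the database.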

What are the various Reporting tools in the Market

Answered by Hemakumar on 2005-04-12 05:40:50:

Cognos BusinessObjects MicroStrategies Actuate

Latest Answer: Dear friends, you have mentioned so many reporting tools but
missed one open-source tool (Java based), which is Jasper Reports;
unfortunately, I am working on that. ...

What is Normalization, First Normal Form, Second Normal Form , Third Normal

Form

Answer posted by Badri Santhosh on 2005-05-18 09:40:29: Normalization: the
process of decomposing tables to eliminate data redundancy is called
normalization. 1NF: the table should contain

Latest Answer: Normalization: it is the process of efficiently organizing data
in a database. There are two goals of the normalization process: 1. eliminate
redundant data; 2. ensure data dependencies make sense (only storing related
data in a table). First Normal ...
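The first step of the process above (First Normal Form: one atomic value per column) can be illustrated with a tiny sketch. The table contents are invented for the example:

```python
# Sketch: normalizing a table that violates 1NF (a repeating group packed
# into one column) by decomposing it to one atomic value per row.

unnormalized = [
    {"order_id": 1, "items": "soap,milk"},   # comma list = repeating group
    {"order_id": 2, "items": "bread"},
]

# First Normal Form: one row per (order, item) pair, each column atomic.
order_items = [{"order_id": r["order_id"], "item": item}
               for r in unnormalized
               for item in r["items"].split(",")]

print(order_items)
# [{'order_id': 1, 'item': 'soap'}, {'order_id': 1, 'item': 'milk'},
#  {'order_id': 2, 'item': 'bread'}]
```

Second and Third Normal Form continue the same decomposition idea, moving columns out of tables where they depend on only part of the key or on a non-key column.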

What does level of Granularity of a fact table signify

Latest Answer: Granularity means nothing but it is a level of representation of

measures and metrics.The lowest level is called detailed dataand highest level is

called summary dataIt depends of project we extract fact table significanceBye ...

What are non-additive facts

Latest Answer: Non-additive facts are facts that do not participate in
arithmetic calculations. For example, in a stock fact table there will be
opening and closing balances along with qty sold, amt, etc., but opening and
closing balances are never used in arithmetic ...
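The balance example above is the classic semi-additive case: a balance adds up across accounts but not across time. A small sketch (account names and amounts are invented):

```python
# Semi-additive fact sketch: an account balance is additive across accounts
# but NOT across days; across time you take the period-end value instead.

daily_balance = [  # grain: one row per account per day
    {"account": "A", "day": 1, "balance": 100},
    {"account": "A", "day": 2, "balance": 150},
    {"account": "B", "day": 1, "balance": 200},
    {"account": "B", "day": 2, "balance": 180},
]

# Valid: additive across accounts (total balance held on day 2).
total_day2 = sum(r["balance"] for r in daily_balance if r["day"] == 2)

# Invalid: summing one account across days double-counts the same money.
wrong = sum(r["balance"] for r in daily_balance if r["account"] == "A")

# Correct across time: take the closing (latest-day) balance.
closing_a = max((r for r in daily_balance if r["account"] == "A"),
                key=lambda r: r["day"])["balance"]

print(total_day2, wrong, closing_a)  # 330 250 150
```

A factless fact table, by contrast, has no measure column at all; its rows merely record that an event or coverage relationship occurred.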

What is VLDB


Answered by Kiran on 2005-05-06 20:12:19: The perception of what constitutes a

VLDB continues to grow. A one terabyte database would normally be considered

to be a VLDB.

Latest Answer: Very Large Database (VLDB): the term is sometimes used to
describe databases occupying magnetic storage in the terabyte range and
containing billions of table rows. Typically, these are decision support
systems or transaction processing applications serving large ...

What is SCD1 , SCD2 , SCD3

Latest Answer: SCD1, SCD2 and SCD3 are also called Type 1, Type 2 and Type 3
dimensions. Type 1: it never maintains history in the target table; it keeps
the most recently updated record in the database. Type 2: it maintains full
history in the target; it maintains history by ...
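The three strategies above can be sketched side by side. This is a hedged illustration with invented column names (`cust_id`, `city`, `current`, `surr_key`), not a tool-specific implementation:

```python
# Sketch of the three SCD strategies for a changed attribute (here, city).

def scd1(dim_row, new_city):
    # Type 1: overwrite in place; no history kept.
    dim_row["city"] = new_city
    return dim_row

def scd2(dim_rows, natural_key, new_city, next_key):
    # Type 2: expire the current row and insert a new versioned row,
    # so the full history is preserved.
    for r in dim_rows:
        if r["cust_id"] == natural_key and r["current"]:
            r["current"] = False
    dim_rows.append({"surr_key": next_key, "cust_id": natural_key,
                     "city": new_city, "current": True})
    return dim_rows

def scd3(dim_row, new_city):
    # Type 3: keep limited history in an extra "previous value" column.
    dim_row["prev_city"] = dim_row["city"]
    dim_row["city"] = new_city
    return dim_row

customer_dim = [{"surr_key": 1, "cust_id": "C1", "city": "Pune", "current": True}]
scd2(customer_dim, "C1", "Delhi", next_key=2)
print(len(customer_dim), customer_dim[0]["current"], customer_dim[1]["city"])
# 2 False Delhi
```

Note how Type 2 is the case that needs surrogate keys: the same customer now has two dimension rows, distinguished only by `surr_key`.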

Why are OLTP database designs not generally a good idea for a Data

Warehouse

Answer posted by Shri Dana on 2005-04-06 19:04:05: OLTP cannot store

historical information about the organization. It is used for storing the details of

daily transactions while a datawarehouse is a huge

Latest Answer: OLTP databases are generally volatile in nature, which makes
them unsuitable for data warehouses, which we use to store historical data ...

What is a CUBE in datawarehousing concept?

Latest Answer: CUBE is used in DWH for representing multidimensional data

logically. Using the cube, it is easy to carry out certain activity e.g. drill down /

drill up, slice and dice, etc. which enables the business users to understand the

trend of the business. ...

What is the main difference between a schema in an RDBMS and schemas in a Data
Warehouse?

Latest Answer: Difference between OLTP and OLAP schemas:
OLTP schema: normalized; more transactions; less time for query execution;
more users; has insert, delete and update transactions.
OLAP (DWH) schema: denormalized; fewer transactions; ...


What is meant by metadata in the context of a Data Warehouse, and why is it
important?

Latest Answer: Metadata is stored in the repository only, not in the data
warehouse itself; since we place the repository in a database, in that sense it
may appear to be in the warehouse, but it is not stored there directly. ...

What are the data types present in BO? And what happens if we implement a view
in the Designer and the report?

Latest Answer: Hi Venkatesh, dimension, measure and detail are object types;
the data types are character, date and numeric ...


What is the definition of normalized and denormalized view and what are the

differences between them

What is the main difference between Inmon and Kimball philosophies of data

warehousing?

Latest Answer: Ralph Kimball follows the bottom-up approach, i.e., first create
individual Data Marts from the existing sources and then create the Data
Warehouse. Bill Inmon follows the top-down approach, i.e., first create the
Data Warehouse from the existing ...

Explain degenerated dimension in detail.

Latest Answer: A degenerate dimension is a dimension which has only a single
attribute. This dimension is typically represented as a single field in a fact
table. Degenerate dimensions are the fastest way to group similar transactions.
Degenerate dimensions are used when ...

What is the need for a surrogate key? Why is the primary key not used as the
surrogate key?

Latest Answer: Data warehousing depends on the surrogate key, not the primary
key. For example, if you take the product price, it will change over time; the
product number will not change, but the price will, so to maintain the full
historical data ...


How do you connect two fact tables ? Is it possible ?

Latest Answer: The only way to connect two fact tables is by using a conformed
dimension. ...


Explain the flow of data starting with OLTP to OLAP, including staging, summary
tables, facts and dimensions.

What are the Different methods of loading Dimension tables

Latest Answer: The answer to this depends on what kind of dimension we are
loading. If it is not changing, then simply insert. If it is a slowly changing
dimension of Type 1: update else insert (50% of the time); Type 2: only insert
(50% of the time); Type 3: rarely used, as we ...

What are modeling tools available in the Market

Latest Answer: There is one more data modelling tool available in the market,
and that is KALIDO. This is an end-to-end data warehousing tool. It's a unique
and user-friendly tool. ...

What is real time data-warehousing

Latest Answer: Real-time data warehousing means the combination of
heterogeneous databases for query and analysis purposes and for decision-making
and reporting purposes. ...


What are Semi-additive and factless facts and in which scenario will you use

such kinds of fact tables

What is a degenerate dimension table?

Latest Answer: Degenerate dimensions: if a table contains values which are
neither dimensions nor measures, they are called degenerate dimensions. Ex:
invoice id, empno ...


What is a Data Warehousing Hierarchy?

Latest Answer: A hierarchy is an ordered series of related dimension objects
grouped together to perform multidimensional analysis. Multidimensional
analysis is a technique to modify the data so that the data can be viewed from
different perspectives and at different ...

What is the difference between view and materialized view

Latest Answer: A view is a logical reference to a database table, but a
materialized view is an actual table, and we can refresh its data at time
intervals. If you make any change in the database table, that change will be
reflected in the view but not in the materialized view. ...

What are the different architectures of a data warehouse

Latest Answer: Architecture 1: Source => Staging => DWH. Architecture 2:
Source => Staging => Data marts ...

What is hybrid slowly changing dimension

Latest Answer: Hybrid SCDs are a combination of both SCD 2 and SCD 3. For
whatever changes are done in the source, for each and every record there is a
new entry on the target side, whether it is an UPDATE or an INSERT. A new
column is added to provide the previous record's info (generally ...


What is the difference between star schema and snowflake schema? And when do we
use those schemas?

Can you convert a snowflake schema in to star schema?

Latest Answer: Star -> Snowflake, and vice versa, is possible. In a star
schema, when we try to access many attributes or few attributes from a single
dimension table, the performance of the query falls, so we normalize this
dimension table into two or more sub-dimensions. ...

Explain the situations where snowflake is better than star schema

Latest Answer: A snowflake schema is a way to handle problems that do not fit

within the star schema. It consists of outrigger tables which relate to dimensions


rather than to the fact table. The amount of space taken up by dimensions is so
small compared to the ...

What are Aggregate tables

Latest Answer: An aggregate table contains the summary of existing warehouse
data, grouped to certain levels of dimensions. Retrieving the required data
from the actual table, which has millions of records, will take more time and
also affects the server
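The roll-up from a detail fact table to an aggregate table described above can be sketched as follows (table names and the monthly grain are invented for the example):

```python
# Sketch: building an aggregate table (monthly totals) from a detail fact table,
# so that monthly queries scan a few summary rows instead of millions of detail rows.
from collections import defaultdict

detail_fact = [  # grain: one row per individual sale
    {"month": "2005-01", "amount": 100},
    {"month": "2005-01", "amount": 250},
    {"month": "2005-02", "amount": 75},
]

monthly_agg = defaultdict(float)  # grain: one row per month
for row in detail_fact:
    monthly_agg[row["month"]] += row["amount"]

print(dict(monthly_agg))  # {'2005-01': 350.0, '2005-02': 75.0}
```

In practice the aggregate is materialized as its own table (or materialized view) and refreshed as part of the load, trading storage for query speed.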

What is a general purpose scheduling tool

Latest Answer: A scheduling tool is a tool which is used to schedule the data
warehouse jobs. All the jobs which do some processing are scheduled using this
tool, which eliminates manual intervention. ...

Which columns go to the fact table and which columns go the dimension table

Answered by Satish on 2005-04-29 08:20:29: The aggregation or calculated-value
columns will go to the fact table, and detailed information will go to the
dimension table.

Latest Answer: Values before being broken down into attributes go to the fact
table; after being broken down, they go to the dimensions ...

Why should you put your data warehouse on a different system than your

OLTP system

Latest Answer: A DW is typically used most often for intensive querying. Since
the primary responsibility of an OLTP system is to faithfully record ongoing
transactions (inserts/updates/deletes), these operations will be considerably
slowed down by the heavy querying ...

What is the main FUNCTIONAL difference between ROLAP,MOLAP,HOLAP?

(NOT AS A RELATIONAL,MULTI, HYBRID?)

Latest Answer: The FUNCTIONAL difference between these is how the information
is stored. In all cases, the users see the data as a cube of dimensions and
facts. ROLAP: detailed data is stored in a relational database in 3NF, star, or
snowflake form. Queries ...

Is it correct/feasible develop a Data Mart using an ODS?

The ODS is technically designed to be used as the feeder for the DW and other
DMs -- yes. It is to be the source of truth. Read the complete thread at
http://asktom.oracle.com/pls/ask/f?p=4950:8:16165205144590546310::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:30801968442845,

Latest Answer: Hi, according to Bill Inmon's paradigm, an enterprise can have
one data warehouse, and data marts source their information from the data
warehouse. In the data warehouse, information is stored in 3rd normal form.
This data warehouse is built on the ODS. You ...

What are the possible data marts in Retail Sales?

Latest Answer: product information, store, time ...


What is BUS Schema?

Latest Answer: Bus Schema: let us consider/explain this on the x, y axes.
Dimension tables: A, B, C, D, E, F ...


What are the steps to build the datawarehouse

Latest Answer: 1. Understand the business requirements. 2. Once the business
requirements are clear, identify the grains (levels). 3. Once grains are
defined, design the dimension tables with the lower-level grains. 4. Once the
dimensions are designed, design the fact table ...

What is rapidly changing dimension?

Latest Answer: A rapidly changing dimension is a result of poor decisions during

the requirements analysis and data modeling stages of the Data Warehousing

project. If the data in the dimension table is changing a lot, it is a hint that the

design should be revisited. ...

What is data cleaning? how is it done?

Latest Answer: It is a process of identifying and correcting inconsistencies
and inaccuracies ...

Do you need separate space for the Data Warehouse and Data Mart?

Latest Answer: I think the comments made earlier are not specific. We don't
require any separate space for the data mart and the data warehouse, unless
those marts are too big or the client requires it. We can maintain both in the
same schema. ...

What is source qualifier?

Latest Answer: Source Qualifier is a transformation which extracts data from
the source. The Source Qualifier acts as a SQL query when the source is a
relational database, and it acts as a data interpreter if the source is a flat
file. ...

Explain ODS and ODS types.

Latest Answer: It is designed to support operational monitoring. It is a
subject-oriented, integrated database which holds the current, detailed data;
the data here is volatile ...

What is a level of Granularity of a fact table

Latest Answer: It also means that we can have (for example) data aggregated for
a year for a given product, and the data can also be drilled down to monthly,
weekly and daily levels. The lowest level is known as the grain; going down to
details is granularity ...

How are the Dimension tables designed

Latest Answer: Find where data for this dimension are located. Figure out how to

extract this data. Determine how to maintain changes to this dimension (see

more on this in the next section). Change fact table and DW population routines.

...


1. What is incremental loading? 2. What is batch processing? 3. What is a cross
reference table? 4. What is an aggregate fact table?

Give examples of degenerated dimensions

Latest Answer: A degenerate dimension is a dimension key without a
corresponding dimension table. Example: in the PointOfSale Transaction fact
table, we have: Date Key (FK), Product Key (FK), Store ...


What is the difference between a Data Warehouse and Data Warehousing

Latest Answer: A data warehouse is a container to store the historical data,
whereas data warehousing is a process or technique to analyze the data in the
warehouse ...

Summarize the difference between OLTP, ODS AND DATA WAREHOUSE?

Latest Answer: ODS: this is the operational data store, which means the
real-time transactional databases. In data warehousing, we extract the data
from the ODS, transform it in the staging area and load it into the target
data warehouse. I think the earlier comments on the ODS are a little ...


What is the purpose of "Factless Fact Table"? How it is involved in Many to many

relationship?

What is the difference between Data modelling and Dimensional modelling?

Latest Answer: Dimensional modelling is the analysis of the transactional data
(facts) based on master data (dimensions). Data modeling is the process of
creating a data model by applying a data model theory to create a data model
instance. Regards, Sridhar Tirukovela ...

Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you
put your TX logs on?

Latest Answer: RAID 0: makes several physical hard drives look like one hard
drive. No redundancy but very fast. May be used for temporary spaces where loss
of the files will not result in loss of committed data. RAID 1: mirroring;
each hard drive in the ...

What is the life cycle of data warehouse projects

Latest Answer: STRATEGY & PROJECT PLANNING: definition of scope, goals,
objectives & purpose, and expectations; establishment of the implementation
strategy; preliminary identification of project resources; assembling of the
project team; estimation of the project schedule. REQUIREMENTS ...


What is slicing and dicing? Explain with real time usage and business reasons

of it's use

Latest Answer: Hi, slicing and dicing is a feature that helps us in seeing more
detailed information about a particular thing. For example, you have a report
which shows the quarterly performance of a particular product, but you want to
see it ...

What is meant by an Aggregate Fact Table?

A fact table having aggregated calculations like sum, avg, sum(sal) +
sum(comm); these are aggregate fact tables.

Latest Answer: An aggregate fact table stores information that has been
aggregated, or summarized, from a detail fact table. Aggregate fact tables are
useful in improving query performance. Often an aggregate fact table can be
maintained through the use of ...

What is difference between BO, Microstrategy and Cognos

Latest Answer: BO is a ROLAP tool, Cognos is a MOLAP tool and MicroStrategy is
a HOLAP tool ...


What is data validation strategies for data mart validation after loading process

Latest Answer: Data validation is to make sure that the loaded data is accurate
and meets the business requirements. Strategies are the different methods
followed to meet the validation requirements ...

Which automation tool is used in data warehouse testing?

Latest Answer: No tool testing is done in DWH; only manual testing is done.

What are the advantages of data mining over traditional approaches?

Latest Answer: Data mining is used for estimating the future. For example, if
we take a company/business organization, by using the concept of data mining
we can predict the future of the business in terms of revenue (or) employees
(or) customers (or) orders ...

What is the difference between static and dynamic caches?

Latest Answer: A static cache stores the values in memory and it won't change
throughout the running of the session, whereas a dynamic cache stores the
values in memory and changes dynamically during the running of the session;
it is used in SCD types, where the target ...


What is a cube, and why do we create a cube? What is the difference between ETL
and OLAP cubes?


What are the various attributes in the time dimension, if this dimension has to
consider only the date of birth of a citizen of a country?

What are late arriving Facts and late arriving dim ? How does it impacts DW?

Latest Answer: Late-arriving fact table: this rarely happens in practice. For
example, there was an HDFC credit card transaction on 25th Mar 2005, but we
received this record on 14th Aug 2007. During this period there is a
possibility of a change ...

What are the various techniques in ER modelling?

Latest Answer: ER modelling is the first step for any database project, like
Oracle or DB2: 1. Conceptual Modelling; 2. Logical Modelling; 3. Physical
Modelling ...

Explain Bill Inmon's versus Ralph Kimball's Approach to Data Warehousing.

Bill Inmon vs Ralph Kimball In the data warehousing field, we often hear about

discussions on where a person / organization's philosophy falls into Bill Inmon's

camp or into Ralph Kimball's

Latest Answer: Bill Inmon: Data warehouse -> Data mart. Ralph Kimball: Data
mart -> Data warehouse. Cheers, Sithu, [email protected] ...

I want to know how to protect my data over the network. Which software should be used?


Information Packages (IP) are advanced by some authors as a way of building dimensional models, e.g. star schemas. Explain what IPs are and give an example of their use in building a dimensional model.

What are Replicate, Transparent and Linked cubes?


1) What is Data warehouse?

A data warehouse is a relational database used for query analysis and reporting. By definition, a data warehouse is subject-oriented, integrated, non-volatile, and time-variant.

Subject oriented : The data warehouse is organized around a particular subject.

Integrated : Data collected from multiple sources is integrated into a consistent, readable format.

Non volatile : Maintains historical data.

Time variant : Data can be displayed by time period (weekly, monthly, yearly).

2) What is Data mart?

A subset of data warehouse is called Data mart.

3) Difference between Data warehouse and Data mart?

The data warehouse maintains data for the entire organization, and multiple data marts are used within a data warehouse, whereas a data mart maintains only a particular subject.

4) Difference between OLTP and OLAP?

OLTP is Online Transaction Processing. It maintains current transactional data, so inserts, updates and deletes must be fast. OLAP is Online Analytical Processing. It maintains historical data for analysis and reporting, so complex queries must be fast.

5) Explain ODS?

The Operational Data Store is a part of the data warehouse architecture. It maintains only current transactional data. An ODS is subject-oriented, integrated, volatile, and holds current data.

6) Difference between Power Center and Power Mart?

Power Center provides all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data. One repository can serve multiple Informatica servers. Power Mart provides all features except multiple registered servers and data partitioning.

7) What is a staging area?

A staging area is a temporary storage area used for data cleansing and integration rather than transaction processing.

Whenever you put data into the data warehouse, you need to clean and process it first.

8) Explain Additive, Semi-additive, Non-additive facts?

Additive fact: An additive fact can be aggregated by simple arithmetical addition across all dimensions.

Semi-additive fact: A semi-additive fact can be aggregated across some dimensions but not others, e.g. an account balance can be summed across accounts but not across time.

Non-additive fact: A non-additive fact can't be added at all, e.g. a ratio or percentage.
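The contrast between additive and semi-additive facts can be sketched in a few lines of Python (the sample data is illustrative, not from the original):

```python
# Minimal sketch with hypothetical data: "sales_amount" is additive
# (can be summed across every dimension), "account_balance" is
# semi-additive (can be summed across accounts on one day, but not
# across days).
rows = [
    # (day, account, sales_amount, account_balance)
    ("2024-01-01", "A", 100, 500),
    ("2024-01-01", "B", 200, 300),
    ("2024-01-02", "A", 150, 450),
    ("2024-01-02", "B", 50, 350),
]

# Additive: total sales across all days and accounts is meaningful.
total_sales = sum(r[2] for r in rows)

# Semi-additive: total balance is meaningful only within a single day.
balance_jan1 = sum(r[3] for r in rows if r[0] == "2024-01-01")

# Summing balances across days would double-count; across time you
# take a snapshot (e.g. the latest day) instead.
balance_latest = sum(r[3] for r in rows if r[0] == "2024-01-02")
```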

9) What is a factless fact table? Give an example.

A fact table which has no measures; it records only the occurrence of an event, e.g. student attendance.

10)Explain Surrogate Key?

Surrogate Key is a series of sequential numbers assigned to be a primary key for the table.
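The sequential-number idea can be sketched as a small generator that maps natural (business) keys to surrogate keys during a dimension load; the class and key names below are illustrative:

```python
# Minimal sketch of surrogate key assignment during a dimension load:
# natural keys from the source are mapped to system-generated
# sequential integers.
class SurrogateKeyGenerator:
    def __init__(self):
        self.next_key = 1
        self.key_map = {}          # natural key -> surrogate key

    def get_key(self, natural_key):
        # Reuse the surrogate key if this natural key was seen before,
        # otherwise assign the next sequence number.
        if natural_key not in self.key_map:
            self.key_map[natural_key] = self.next_key
            self.next_key += 1
        return self.key_map[natural_key]

gen = SurrogateKeyGenerator()
k1 = gen.get_key("CUST-1001")
k2 = gen.get_key("CUST-1002")
k3 = gen.get_key("CUST-1001")   # same customer -> same surrogate key
```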

11) How many types of approaches are there in DWH?

Two approaches: Top-down (Inmon approach) and Bottom-up (Ralph Kimball approach).

12) Explain Star Schema?

A star schema consists of one or more fact tables and one or more dimension tables related through foreign keys. Dimension tables are de-normalized; the fact table is normalized.

Advantages: Less database space & Simplify queries.
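A minimal star schema can be sketched with sqlite3; the table and column names below are illustrative, not from the original:

```python
# Minimal star-schema sketch: one fact table with foreign keys to two
# de-normalized dimension tables, queried with simple joins.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, date_key INTEGER, amount REAL,
                          FOREIGN KEY(product_key) REFERENCES dim_product(product_key),
                          FOREIGN KEY(date_key)    REFERENCES dim_date(date_key));
INSERT INTO dim_product VALUES (1,'Pen','Stationery'),(2,'Ink','Stationery');
INSERT INTO dim_date    VALUES (10,'2024-01-01','Jan'),(11,'2024-01-02','Jan');
INSERT INTO fact_sales  VALUES (1,10,5.0),(2,10,7.5),(1,11,2.5);
""")

# Typical star-schema query: aggregate the fact, sliced by dimensions.
row = con.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key    = f.date_key
    GROUP BY p.category, d.month
""").fetchone()
```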

13) Explain Snowflake schema?

A snowflake schema normalizes the dimensions to eliminate redundancy: the dimension data is split into multiple related tables instead of one large table. Both dimension and fact tables are normalized.

14) What is a conformed dimension?

If two or more data marts use the same dimension, it is called a conformed dimension: the same dimension can be shared by multiple fact tables.

15) Explain the DWH architecture?

16) What is a slowly growing dimension?

Slowly growing dimensions are dimensions whose data grows without updates to existing rows; new data is only appended to the existing dimension.

17) What is a slowly changing dimension?

Slowly changing dimensions are dimensions whose data grows and whose existing rows are also updated over time.


Type1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 mapping, all rows contain current dimension data.

Use the Type 1 dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.

Type2: The Type 2 dimension mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table.

Use the Type 2 dimension/version data mapping to update a slowly changing dimension when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.

Type3: The Type 3 dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions into the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica server saves existing data in different columns of the same row and replaces the existing data with the updates.
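The Type 2 versioning logic above can be sketched in plain Python (the field names and in-memory list stand in for the dimension table; they are illustrative only):

```python
# Minimal in-memory sketch of Type 2 SCD logic: a change to an existing
# dimension row inserts a new version instead of overwriting it.
dimension = []   # list of dicts: natural key, attribute, version, current flag

def apply_scd2(natural_key, city):
    current = [r for r in dimension if r["key"] == natural_key and r["current"]]
    if not current:                      # brand-new dimension member
        dimension.append({"key": natural_key, "city": city,
                          "version": 1, "current": True})
    elif current[0]["city"] != city:     # changed: expire old, insert new version
        current[0]["current"] = False
        dimension.append({"key": natural_key, "city": city,
                          "version": current[0]["version"] + 1, "current": True})
    # unchanged rows are ignored

apply_scd2("CUST-1", "Pune")
apply_scd2("CUST-1", "Pune")      # no change -> no new row
apply_scd2("CUST-1", "Mumbai")    # change -> version 2 inserted
```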

18) When do you use a dynamic cache?

When your target table is also the lookup table, you go for a dynamic cache. In a dynamic cache, multiple matches return an error, and you can use only the = operator.

19) what is lookup override?

A lookup override replaces the default SQL statement generated for the lookup. You can join multiple sources using a lookup override. By default, the Informatica server adds an ORDER BY clause.

20) Can we pass a null value to a lookup transformation?

The lookup transformation can return null values or match on values equal to null.

21) what is the target load order?

You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.

22) what is default join that source qualifier provides?

Inner equi join.

23) what are the difference between joiner transformation and source qualifier transformation?

You can join heterogeneous data sources with a joiner transformation, which you cannot achieve with a source qualifier transformation.

You need matching keys to join two relational sources in a source qualifier transformation, whereas you don't need matching keys to join two sources in a joiner transformation.

The two relational sources must come from the same database for a source qualifier join, whereas a joiner transformation can join relational sources that come from different databases.

24) what is update strategy transformation?

The update strategy transformation flags rows for insert, update, delete or reject, so when you load the target table you control whether it stores historical data or only current transactional data.

25) Describe the two levels at which the update strategy can be set?

Within a session (treat all source rows the same way, e.g. as inserts) and within a mapping (use the Update Strategy transformation to flag rows for insert, update, delete or reject).

26) what is default source option for update strategy transformation?

Data driven.

27) What is data driven?

The Informatica server follows instructions coded into update strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject. If you do not choose the data driven option setting, the Informatica server ignores all update strategy transformations in the mapping.

28) What are the options in the target session for the update strategy transformation?

Insert

Delete

Update

Update as update

Update as insert

Update else insert

Truncate table.

29) Difference between the source filter and filter?

A source filter filters data from relational sources only, whereas a filter transformation can filter data from any type of source.

30) what is a tracing level?

The amount of information written to the session log file.

-- What are the types of tracing levels?


Normal, Terse, Verbose Data, Verbose Initialization.

-- Explain sequence generator transformation?

-- can you connect multiple ports from one group to multiple transformations?

Yes

31) can you connect more than one group to the same target or transformation?

NO

32) what is a reusable transformation?

A reusable transformation is a single transformation that can be used in multiple mappings. When you need to incorporate it into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes, since each instance of a reusable transformation is a pointer to that transformation. You can change the transformation in the Transformation Developer and its instances automatically reflect these changes. This feature can save you a great deal of work.

-- what are the methods for creating reusable transformation?

Two methods

1) Design it in the transformation developer.

2) Promote a standard transformation from the Mapping Designer. After you add a transformation to the mapping, you can promote it to the status of a reusable transformation.

Once you promote a standard transformation to reusable status, you can demote it to a standard transformation at any time.

If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking Revert.

33)what are mapping parameters and mapping variables?

A mapping parameter represents a constant value that you define before running a session. A mapping parameter retains the same value throughout the entire session.

When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.

Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.


34)can you use the mapping parameters or variables created in one mapping into another mapping?

No, we can use mapping parameters or variables only in the transformations of the same mapping or mapplet in which we created them.

35) Can the mapping parameters or variables created in one mapping be used in a reusable transformation?

Yes, because a reusable transformation is not contained within any mapplet or mapping.

36)How the informatica server sorts the string values in rank transformation?

When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.

37)What is the rank index in rank transformation?

The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica server uses the rank index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.

38)what is the mapplet?

A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings.

39)Difference between mapplet and reusable transformation?

A reusable transformation is a single transformation, whereas a mapplet contains multiple transformations.

40) What is a parameter file?

A parameter file defines the values for parameters and variables.

WORKFLOW MANAGER

41)what is a server?

The power center server moves data from source to targets based on a workflow and mapping metadata stored in a repository.

42)what is a work flow?


A workflow is a set of instructions that describe how and when to run tasks related to extracting, transforming and loading data.

-- what is session?

A session is a set of instructions that describes how to move data from source to target using a mapping.

-- what is workflow monitor?

Use the Workflow Monitor to monitor workflows and to stop the PowerCenter server.

43)explain a work flow process?

The power center server uses both process memory and system shared memory to perform these tasks.

Load Manager process: locks the workflow, reads the workflow tasks, and starts the DTM to run the sessions.

Data Transformation Manager (DTM) process: performs session validations; creates threads to initialize the session, read, write and transform data; and handles pre- and post-session operations.

The default memory allocation is 12,000,000 bytes.

44)What are types of threads in DTM?

The main dtm thread is called the master thread.

Mapping thread.

Transformation thread.

Reader thread.

Writer thread.

Pre-and-post session thread.

45)Explain work flow manager tools?

1) Task developer.

2) Work flow designer.

3) Worklet designer.

46)Explain work flow schedule.

You can schedule a workflow to run continuously, repeat at a given time or interval, or you can manually start a workflow. By default, the workflow runs on demand.


47)Explain stopping or aborting a session task?

If the PowerCenter server is executing a session task when you issue the stop command, it stops reading data but continues processing, writing and committing data to targets.

If the PowerCenter server can't finish processing and committing data, you issue the abort command.

You can also abort a session by using the Abort() function in the mapping logic.

48)What is a worklet?

A worklet is an object that represents a set of tasks. It can contain any task available in the Workflow Manager. You can run worklets inside a workflow, and you can also nest a worklet in another worklet. The Workflow Manager does not provide a parameter file for worklets.

The power center server writes information about worklet execution in the workflow log.

49)what is a commit interval and explain the types?

A commit interval is the interval at which the PowerCenter server commits data to targets during a session. The commit interval is the number of rows you want to use as a basis for the commit point.

Target Based commit: The power center server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.

Source-based commit: the PowerCenter server commits data based on the number of source rows that pass through active sources.

User-defined commit: the PowerCenter server commits data based on transaction boundaries defined in the mapping, e.g. with a Transaction Control transformation.

50)Explain bulk loading?

You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase, Oracle or MS SQL Server database.

When bulk loading, the PowerCenter server bypasses the database log, which speeds performance.

Without writing to the database log, however, the target database can't perform rollback. As a result, you may not be able to perform recovery.

51)What is a constraint based loading?

When you select this option the power center server orders the target load on a row-by-row basis only.


Edit tasks->properties->select treat source rows as insert.

Edit tasks->config object tab->select constraint based

If a session is configured for constraint-based loading and a target table receives rows from different sources, the PowerCenter server reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading where possible, loading the primary key table first and then the foreign key table.

Use constraint-based loading only when the session option Treat Source Rows As is set to Insert.

Constraint-based load ordering allows developers to read the source once and populate parent and child tables in a single process.

52)Explain incremental aggregation?

When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture those changes, you can configure the session to process only the changes. This allows the PowerCenter server to update your target incrementally rather than forcing it to process the entire source and recalculate the same data each time you run the session.

Use incremental aggregation when you can capture new source data each time you run the session. Use a stored procedure or filter transformation to process only the new data.

Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from incremental aggregation; in this case, drop and recreate the target with the complete source data.

53)Processing of incremental aggregation

The first time you run an incremental aggregation session, the PowerCenter server processes the entire source. At the end of the session, it stores the aggregate data from the session run in two files, the index file and the data file, which it creates in a local directory.
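The core idea can be sketched in a few lines of Python; the dictionary below stands in for the persisted aggregate cache files, and the data is illustrative:

```python
# Minimal sketch of incremental aggregation: persisted aggregate state
# (standing in for the index and data cache files) is updated with only
# the new source rows on each run, instead of re-aggregating everything.
saved_totals = {}          # stands in for the aggregate cache on disk

def run_session(new_rows):
    # Apply only the captured incremental changes to the saved aggregates.
    for product, amount in new_rows:
        saved_totals[product] = saved_totals.get(product, 0) + amount
    return dict(saved_totals)

first_run  = run_session([("Pen", 10), ("Ink", 5)])   # full source, first time
second_run = run_session([("Pen", 3)])                # only new rows next time
```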

Transformations.

--- what is transformation?

A transformation is a repository object that generates, modifies, or passes data.

54)what are the type of transformations?


2 types:

1) active

2) passive.

-- explain active and passive transformation?

An active transformation can change the number of rows that pass through it; the number of output rows is less than or equal to the number of input rows.

A passive transformation does not change the number of rows; the number of output rows always equals the number of input rows.

55) Difference between filter and router transformations?

A filter transformation filters the data on a single condition and drops the rows that don't meet the condition.

Dropped rows are not stored anywhere, not even in the session log file.

A router transformation filters the data based on multiple conditions and gives you the option to route rows that don't match any condition to a default group.

56) What are the types of groups in a router transformation?

A router transformation has two kinds of groups: 1. input group 2. output groups.

Output groups are of two types: 1. user-defined groups 2. the default group.
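The router-versus-filter behaviour above can be sketched in Python; the group conditions and sample rows are illustrative only:

```python
# Minimal sketch of router-style filtering: rows are tested against
# several user-defined group conditions; rows matching none go to the
# default group (a filter, by contrast, would simply drop them).
rows = [{"dept": "SALES", "sal": 900},
        {"dept": "HR",    "sal": 400},
        {"dept": "IT",    "sal": 700}]

groups = {"high_paid": lambda r: r["sal"] >= 700,
          "sales":     lambda r: r["dept"] == "SALES"}

routed = {name: [] for name in groups}
routed["DEFAULT"] = []

for r in rows:
    matched = False
    for name, condition in groups.items():
        if condition(r):                 # a row can reach several groups
            routed[name].append(r)
            matched = True
    if not matched:
        routed["DEFAULT"].append(r)      # default group catches the rest
```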

57)difference between expression and aggregator transformation?

An expression transformation calculates single-row values before writing to the target; it executes on a row-by-row basis only.

An aggregator transformation allows you to perform aggregate calculations such as MAX, MIN and AVG, and performs its calculations on groups.

58) How can you improve session performance in an aggregator transformation?

Use sorted input.

59)what is aggregate cache in aggregate transformation?

The aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.

60)explain joiner transformation?


Joiner transformation joins two related heterogeneous sources residing in different locations or files.

-- What are the types of joins in the joiner transformation?

Normal

Master outer

Detail outer

Full outer

61)Difference between connected and unconnected transformations.

Connected transformation is connected to another transformation with in a mapping.

Unconnected transformation is not connected to any transformation with in a mapping.

62)In which conditions we cannot use joiner transformation(limitations of joiner transformation)?

Both pipelines begin with the same original data source.

Both input pipelines originate from the same source qualifier transformation.

Both input pipelines originate from the same normalizer transformation

Both input pipelines originate from the same joiner transformation.

Either input pipeline contains an update strategy transformation.

Either input pipeline contains a sequence generator transformation.

63)what are the settings that u use to configure the joiner transformation?

Master and detail source.

Type of join

Condition of the join

64) What is a lookup transformation?

A lookup transformation is used to look up data in a table or view based on a condition. By default, the lookup is a left outer join.

65)why use the lookup transformation?

To perform the following tasks.


Get a related value. For example, if your source includes an employee ID but you want the employee name in the target, a lookup can retrieve it. Perform a calculation: many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

Update slowly changing dimension tables. You can use a lookup transformation to determine whether records already exist in the target.

66)what are the types of lookup?

Connected and unconnected

67)difference between connected and unconnected lookup?

Connected lookup vs unconnected lookup:

Connected: receives input values directly from the pipeline. Unconnected: receives input values from the result of a :LKP expression in another transformation.

Connected: you can use a dynamic or static cache. Unconnected: you can use only a static cache.

Connected: the cache includes all lookup columns used in the mapping (lookup table columns included in the lookup condition and lookup table columns linked as output ports to other transformations). Unconnected: the cache includes all lookup/output ports in the lookup condition and the lookup/return port.

Connected: can return multiple columns from the same row, or insert into a dynamic lookup cache. Unconnected: you designate one return port (R); it returns one column from each row.

Connected: if there is no match for the lookup condition, the Informatica server returns the default value for all output ports; if you configure dynamic caching, it inserts the row into the cache. Unconnected: if there is no match for the lookup condition, the Informatica server returns NULL.

Connected: passes multiple output values to another transformation; link the lookup/output ports to the other transformation. Unconnected: passes one output value to another transformation; the lookup/output/return port passes the value to the transformation calling the :LKP expression.

Connected: supports user-defined default values. Unconnected: does not support user-defined default values.


68)explain index cache and data cache?

The Informatica server stores lookup condition values in the index cache and output values in the data cache.

69)What are the types of lookup cache?

Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica server processes a lookup transformation configured to use the cache.

Static cache: you can configure a static, or read-only, cache. By default, the Informatica server creates a static cache. It caches the lookup table and the lookup values for each row that comes into the transformation, and does not update the cache while it processes the lookup transformation.

Dynamic cache: if you want to cache the target table and insert new rows into both the cache and the target, you can configure the lookup transformation to use a dynamic cache. The Informatica server dynamically inserts data into the target table.

Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.

70)Difference between static cache and dynamic cache?

Static cache: you cannot insert into or update the cache. Dynamic cache: you can insert rows into the cache as you pass rows to the target.

Static cache: the Informatica server returns a value from the lookup table or cache when the condition is true; when the condition is not true, it returns the default value for connected transformations. Dynamic cache: the Informatica server inserts rows into the cache when the condition is false, which indicates that the row is not in the cache or target table; you can pass these rows to the target table.
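The contrast can be sketched with a plain dictionary standing in for the cache (the key names and helper functions are illustrative only):

```python
# Minimal sketch contrasting static and dynamic lookup caches: with a
# static cache a miss just returns None; with a dynamic cache a miss
# inserts the new row into the cache (and, in a real session, the target).
cache = {"CUST-1": 1}          # natural key -> surrogate key
next_key = 2

def lookup_static(key):
    return cache.get(key)      # the cache is never modified

def lookup_dynamic(key):
    global next_key
    if key not in cache:       # miss: insert the row into the cache
        cache[key] = next_key
        next_key += 1
    return cache[key]

miss   = lookup_static("CUST-2")    # static cache stays unchanged
newkey = lookup_dynamic("CUST-2")   # inserted into the cache
again  = lookup_dynamic("CUST-2")   # now a hit
```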

ORACLE:

71) Difference between primary key and unique key?

A primary key is NOT NULL and unique.

A unique key accepts null values.

72) Difference between INSTR and SUBSTR?

INSTR returns the position of a substring within a string; SUBSTR extracts a substring from a string.


73) What is referential integrity?

Referential integrity ensures that a foreign key value in one table always refers to an existing primary key value in another table.

74) Difference between view and materialized view?

A view is a stored query that fetches data at execution time and occupies no storage; a materialized view physically stores the query result and must be refreshed to pick up changes.

75) What is Redolog file?

The set of redo log files for a database is collectively known as the database's redo log.

76) What is RollBack statement?

A database contains one or more rollback segments to temporarily store undo information. Rollback segments are used to generate read-consistent database information, during database recovery, and to roll back uncommitted transactions.

-- what is table space?

A database is divided into logical storage units called tablespaces. A tablespace is used to group related logical structures together.

-- How to delete duplicate records?

DELETE FROM emp a WHERE a.rowid > (SELECT MIN(b.rowid) FROM emp b WHERE a.empno = b.empno);

-- What are the different types of joins in Oracle?

Self-join, equi-join, outer join.

77) What is outer join?

An outer join returns rows from one table even when there are no matching rows in the common column of the other table.

78) Write a query for the top 5 salaries.

SELECT * FROM emp e WHERE 5 > (SELECT COUNT(*) FROM emp WHERE sal > e.sal);
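The correlated-subquery pattern above can be checked with sqlite3 (the sample salaries are illustrative): a row survives when fewer than 5 rows have a strictly higher salary.

```python
# Verify the top-5 correlated subquery against sample data.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
con.executemany("INSERT INTO emp VALUES (?,?)",
                [("a",100),("b",200),("c",300),("d",400),
                 ("e",500),("f",600),("g",700)])

# For each candidate row e, the inner query counts rows paid more.
top5 = con.execute("""
    SELECT ename, sal FROM emp e
    WHERE 5 > (SELECT COUNT(*) FROM emp WHERE sal > e.sal)
    ORDER BY sal DESC
""").fetchall()
```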

79) what is synonym?

80) --------------------------------

81)

82) What is bit map index and example?

83) What is stored procedure and advantages?

84) Explain cursor and how many types of triggers in oracle?

A cursor is a pointer to the result set of a SQL statement. A trigger is like a stored procedure that executes automatically when a triggering event occurs; Oracle triggers can be row-level or statement-level, and fire BEFORE or AFTER the triggering DML.

85) Difference between function and stored procedure?


A function returns a value; a procedure does not return a value directly (but it can return values through OUT parameters).

86) Difference between replace and translate?

87) Write the query nth max sal

SELECT DISTINCT a.sal FROM emp a WHERE &n = (SELECT COUNT(DISTINCT b.sal) FROM emp b WHERE a.sal <= b.sal);

88) Write the query odd and even numbers?

Odd rows: SELECT * FROM emp WHERE (rowid, 1) IN (SELECT rowid, MOD(ROWNUM, 2) FROM emp);
Even rows: SELECT * FROM emp WHERE (rowid, 0) IN (SELECT rowid, MOD(ROWNUM, 2) FROM emp);

Interview Questions

1. What are the different types of joins?
1. self join 2. equi-join 3. non equi-join 4. cross join 5. natural join 6. full outer join 7. outer join 8. left outer join 9. right outer join

2. What is a sub-query? Types of sub-queries? Use of sub-queries?
A sub-query is a query inside another query, appearing after the WHERE clause of a SELECT statement. There are two types: a) correlated sub-query b) non-correlated sub-query. Sub-queries are used when the result of one query is needed as input to another; Oracle executes them according to the best available execution plan.

3. What is a view? Types of views? Use of views? How to create a view (syntax)?
A view is a stored, parsed SQL statement which fetches records at the time of execution. There are mainly two types of views:

a) Simple view b) Complex view

Apart from that, views can also be subdivided into updatable views and read-only views. Lastly, there is another kind of view called a materialized view. Views are used for the purposes stated below:

a) Security b) Faster response c) Solving complex queries

Syntax is :


CREATE OR REPLACE VIEW view_name ([column1], [column2], ...) AS
SELECT column1, column2, ... FROM table_name
[WHERE condition] [WITH READ ONLY] [WITH CHECK OPTION];
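The syntax can be tried out with sqlite3 (the table, view and column names below are illustrative): the view stores the parsed query, and rows are fetched only when the view is queried.

```python
# Minimal view sketch: the view is queried like a table, and newly
# inserted matching rows automatically appear in it.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER);
INSERT INTO emp VALUES ('a',10,100),('b',20,200),('c',10,300);
CREATE VIEW dept10_emp AS
    SELECT ename, sal FROM emp WHERE deptno = 10;
""")

names = [r[0] for r in con.execute("SELECT ename FROM dept10_emp")]
con.execute("INSERT INTO emp VALUES ('d',10,400)")
names_after = [r[0] for r in con.execute("SELECT ename FROM dept10_emp")]
```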

4. Briefly explain the difference between first, second, third and fourth normal forms.
First normal form: attributes should be atomic.
Second normal form: non-key attributes should be fully functionally dependent on the key attribute.
Third normal form: no transitive dependency between attributes. Suppose 'y' is dependent on 'x' (x->y) and 'z' is dependent on 'y' (y->z); this is a transitive dependency, so we split the table into two so that the result is x->y and y->z.
Boyce-Codd normal form (BCNF): a determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. A relation is in BCNF if, and only if, every determinant is a candidate key.
Fourth normal form: a relation is in 4NF if it is in BCNF and has no non-trivial multi-valued dependencies.

5. Difference between two-tier architecture and three-tier architecture?
Following are the tier types in a client server application:
a. 1-tier application: all the processing is done on one machine and a number of clients are attached to this machine (mainframe applications).
b. 2-tier application: clients and database on different machines. Clients are thick clients, i.e. processing is done at the client side; the application layer is on the clients.
c. 3-tier application: clients are partially thick. Apart from that, there are two more layers: the application layer and the database layer.
d. 4-tier application: some clients may be totally thin, some partially thick, and there are 3 further layers: web layer, application layer and database layer.

6. There are eno and gender columns in a table. eno is the primary key and gender has a check constraint for the values 'M' and 'F'. While inserting the data, M was misspelled as F and F as M. What is the update statement to replace F with M and M with F?

CREATE TABLE temp (
  eno NUMBER CONSTRAINT pk_eno PRIMARY KEY,
  gender CHAR(1) CHECK (gender IN ('M','F'))
);

INSERT INTO temp VALUES ('01','M');
INSERT INTO temp VALUES ('02','M');
INSERT INTO temp VALUES ('03','F');
INSERT INTO temp VALUES ('04','M');
INSERT INTO temp VALUES ('05','M');
INSERT INTO temp VALUES ('06','F');
INSERT INTO temp VALUES ('07','M');
INSERT INTO temp VALUES ('08','F');

COMMIT;


UPDATE temp SET gender =DECODE(gender,'M','F','F','M');

Commit;
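The same swap can be reproduced in sqlite3, which has no DECODE; a CASE expression is the portable equivalent (the three sample rows are illustrative):

```python
# Swap M <-> F in a single UPDATE, mirroring the Oracle DECODE trick.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE temp (
    eno TEXT PRIMARY KEY,
    gender TEXT CHECK (gender IN ('M','F')))""")
con.executemany("INSERT INTO temp VALUES (?,?)",
                [("01","M"), ("02","F"), ("03","M")])

# CASE plays the role of DECODE(gender,'M','F','F','M').
con.execute("UPDATE temp SET gender = CASE gender WHEN 'M' THEN 'F' ELSE 'M' END")
genders = [r[0] for r in con.execute("SELECT gender FROM temp ORDER BY eno")]
```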

7. What is the difference between a co-related sub-query and a nested sub-query?
A co-related sub-query is one in which the inner query is evaluated once for each row processed by the outer query, because it references columns of the outer query. A nested (non-co-related) sub-query is one in which the inner query is evaluated only once and its result is then used by the outer query. For example, a query using EXISTS with a reference to the outer table is co-related; a query comparing against an independent sub-query with = or IN is nested.

8. How to find out the database name from the SQL*Plus command prompt?
SELECT INSTANCE_NAME FROM V$INSTANCE;
SELECT * FROM V$DATABASE;
SELECT * FROM GLOBAL_NAME;

9. What is Normalization?Normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity and scalability.

10. Difference between stored procedure and trigger
A stored procedure is a PL/SQL programming block stored in the database for repeated execution, whereas a trigger is a PL/SQL programming block that is executed implicitly by a data manipulation statement.

11. What is the difference between a single-row sub-query and a scalar sub-query?
A single-row sub-query returns only one row of results and uses a single-row operator; the common operator is the equality operator (=). A scalar sub-query returns exactly one column value from one row. Scalar sub-queries can be used in most places where you would use a column name or expression, such as inside a single-row function as an argument, in an INSERT, in the ORDER BY clause, the WHERE clause, and CASE expressions, but not in the GROUP BY or HAVING clause.

12. TRUNCATE TABLE EMP; DELETE FROM EMP; Will the outputs of the above two commands differ?
Delete command: 1. It's a DML command. 2. Data can be rolled back. 3. It's slower than truncate because it logs each row deletion. 4. Triggers can fire with the delete command.
Truncate command: 1. It's a DDL command. 2. Data cannot be rolled back. 3. It's faster than delete because it does not log rows. 4. Triggers cannot fire with the truncate command.
In both cases only the table data is removed, not the table structure.

13. What is the use of the DROP option in the ALTER TABLE command


The DROP option in the ALTER TABLE command is used to drop columns you no longer need from the table.

The column may or may not contain data. With a single DROP COLUMN clause, only one column can be dropped at a time. The table must have at least one column remaining in it after it is altered. Once a column is dropped, it cannot be recovered.

14. What will be the output of the following query?
SELECT REPLACE(TRANSLATE(LTRIM(RTRIM('!! ATHEN !!','!'), '!'), 'AN', '**'), '*', 'TROUBLE') FROM DUAL;
TROUBLETHETROUBLE
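The nested string functions can be traced step by step in Python to confirm the result (Python's `rstrip`/`lstrip`/`translate`/`replace` behave like the Oracle functions here):

```python
# Trace RTRIM -> LTRIM -> TRANSLATE -> REPLACE on '!! ATHEN !!'.
s = '!! ATHEN !!'
s = s.rstrip('!')                            # RTRIM(...,'!')  -> '!! ATHEN '
s = s.lstrip('!')                            # LTRIM(...,'!')  -> ' ATHEN '
s = s.translate(str.maketrans('AN', '**'))   # TRANSLATE('AN','**') -> ' *THE* '
s = s.replace('*', 'TROUBLE')                # REPLACE('*','TROUBLE')
result = s.strip()                           # drop the surrounding spaces
```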

15. What will be the output of the following query?
SELECT DECODE(TRANSLATE('A','1234567890','1111111111'), '1','YES', 'NO') FROM DUAL;
NO
Explanation: the query checks whether a given string is a numerical digit.

16. How can one transfer LOB and user-defined data from Oracle to the warehouse using the Informatica ETL tool, given that when you select the source data Informatica shows it can take only character data?
A LOB can be transferred as text in Informatica 7.1.2.

17. What are the data validation strategies for data mart validation after the loading process?
Data validation strategies are often heavily influenced by the architecture of the application. If the application is already in production, it will be significantly harder to build the optimal architecture than if the application is still in a design stage. If a system takes a typical architectural approach of providing common services, then one common component can filter all input and output, thus optimizing the rules and minimizing effort. There are three main models to think about when designing a data validation strategy:

1. Accept Only Known Valid Data
2. Reject Known Bad Data
3. Sanitize Bad Data

We cannot emphasize strongly enough that "Accept Only Known Valid Data" is the best strategy. We do, however, recognize that this isn't always feasible for political, financial or technical reasons, and so we describe the other strategies as well. All three methods must check:

1. Data Type
2. Syntax
3. Length

18. In which situations are contexts and aliases used?
Aliases are logical pointers to an alternate table name. This command is dimmed until you select a table within the Structure window. You can define aliases to


resolve the loops that Designer detected in the universe structure. This feature works only if you have defined at least one join and all the cardinalities in the joins have been detected.
Contexts can be used to resolve loops in the universe. You can create contexts manually, or have them detected by Designer. When contexts are useful, Designer suggests a list of contexts that you can create.

19. What is the difference between ETL tools and OLAP tools?
ETL tools are used to extract, transform and load the data into a data warehouse / data mart. OLAP tools are used to create cubes/reports for business analysis from the data warehouse / data mart.

20. What is a data warehousing hierarchy?
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.

Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers.

Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.

When designing hierarchies, you must consider the relationships in business structures; for example, a divisional multilevel sales organization.

Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.

Levels
A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.

Level Relationships
Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information. They define the parent-child relationship between the levels in a hierarchy.


Hierarchies are also essential components in enabling more complex query rewrites. For example, the database can rewrite an existing quarterly sales revenue aggregate into a yearly aggregation when the dimensional dependencies between quarter and year are known.
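A minimal sketch of such a hierarchy-driven roll-up, with made-up monthly sales figures: each value at the month level aggregates into its parent quarter, and each quarter into its parent year.

```python
from collections import defaultdict

# Hypothetical monthly sales keyed by (year, quarter, month).
monthly = {
    (2023, 'Q1', 'Jan'): 100, (2023, 'Q1', 'Feb'): 120,
    (2023, 'Q2', 'Apr'): 90,  (2024, 'Q1', 'Jan'): 130,
}

quarterly = defaultdict(int)
yearly = defaultdict(int)
for (year, quarter, _month), amount in monthly.items():
    quarterly[(year, quarter)] += amount  # month level -> quarter level
    yearly[year] += amount                # quarter level -> year level
```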

21. What data types are present in BO? What happens if we implement a view in the Designer and in a report?
There are three data types: Dimension, Measure and Detail. A view is nothing but an alias, and it can be used to resolve the loops in the universe.

22. What is a surrogate key? Where do we use it? Explain with examples.
A surrogate key is a substitution for the natural primary key.

It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. They can use an Informatica sequence generator, an Oracle sequence, or SQL Server identity values for the surrogate key.

It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users), but not only can these change, indexing on a numerical value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and, as far as the client is concerned, you may display only the AIRPORT_NAME.

Another benefit you can get from surrogate keys (SID) is:

Tracking the SCD - Slowly Changing Dimension.

Let me give you a simple, classical example:

On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that is what would be in your Employee Dimension). This employee has a turnover allocated to him on Business Unit 'BU1'. But on the 2nd of June the Employee 'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the old turnover should belong to Business Unit 'BU1'.

If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to Business Unit 'BU2', even what actually belongs to 'BU1'.


If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' + 'BU2.'

You could consider the Slowly Changing Dimension as an enlargement of your natural key: the natural key of the Employee was Employee Code 'E1', but for you it becomes Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2'. The difference with the natural-key enlargement process is that you might not have all parts of your new key within your fact table, so you might not be able to do the join on the new enlarged key -> so you need another id.
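The 'E1' example above can be sketched as a tiny Type 2 dimension update in Python (the row layout and the counter standing in for a sequence generator are illustrative only):

```python
def scd2_insert(dim, emp_code, bu):
    """Close the current row for emp_code, if any, and add a new row
    with a fresh surrogate key (SID)."""
    for row in dim:
        if row['emp_code'] == emp_code and row['current']:
            row['current'] = False
    sid = len(dim) + 1  # stand-in for a sequence generator
    dim.append({'sid': sid, 'emp_code': emp_code, 'bu': bu, 'current': True})
    return sid

dim = []
sid_jan = scd2_insert(dim, 'E1', 'BU1')  # 1st of January: E1 in BU1
sid_jun = scd2_insert(dim, 'E1', 'BU2')  # 2nd of June: E1 moves to BU2
# Facts loaded before June carry sid_jan; facts loaded after carry sid_jun,
# so old turnover stays with BU1 and new turnover goes to BU2.
```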

23. What is a linked cube?
A cube can be stored on a single Analysis Server and then defined as a linked cube on other Analysis Servers. End users connected to any of these Analysis Servers can then access the cube. This arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple Analysis Servers. Linked cubes can be connected using TCP/IP or HTTP. To end users a linked cube looks like a regular cube.

24. What is meant by metadata in the context of a data warehouse, and why is it important?
In the context of a data warehouse, metadata means information about the data. This information is stored in the Designer repository.

25. What is the main difference between a schema in an RDBMS and schemas in a data warehouse?
RDBMS Schema:
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot easily solve extract and complex problems
* Poorly modeled
DWH Schema:
* Used for OLAP systems
* New-generation schema
* De-normalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model

26. What is Dimensional Modelling?
Dimensional Modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - fact tables and dimension tables. The fact table contains the facts/measurements of the business, and the dimension table


contains the context of the measurements, i.e., the dimensions on which the facts are calculated.

27. What is real-time data warehousing?
In real-time data warehousing, your warehouse contains completely up-to-date data and is synchronized with the source systems that provide the source data. In near-real-time data warehousing, there is a minimal delay between source data being generated and being available in the data warehouse. Therefore, if you want to achieve real-time or near-real-time updates to your data warehouse, you'll need to do three things:

1. Reduce or eliminate the time taken to get new and changed data out of your source systems.

2. Eliminate, or reduce as much as possible, the time required to cleanse, transform and load your data.

3. Reduce as much as possible the time required to update your aggregates.
Starting with version 9i, and continuing with the latest 10g release, Oracle has gradually introduced features into the database to support real-time, and near-real-time, data warehousing. These features include:

• Change Data Capture
• External tables, table functions, pipelining, and the MERGE command
• Fast refresh materialized views

28. What is a lookup table?
When a table is used to check for the presence of some data prior to loading some other data (or the same data) into another table, the table is called a LOOKUP table.

29. What type of indexing mechanism do we need to use for a typical data warehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.

30. What does the level of granularity of a fact table signify?
In simple terms, the level of granularity defines the extent of detail. As an example, let us look at geographical levels of granularity. We may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY and STREET. In this case, the finest level of granularity is STREET.

31. What is data mining?
Data mining is a process of extracting hidden trends from within a data warehouse. For example, an insurance data warehouse can be used to mine data for the most high-risk people to insure in a certain geographical area.

32. What is a degenerate dimension?
The values of a dimension which are stored in the fact table are called a degenerate dimension. These dimensions do not have dimension tables of their own.


For example, Invoice_no and Invoice_line_no in the fact table will be degenerate dimension columns, provided you do not have a dimension called Invoice.

33. How do you load the time dimension?
Every data warehouse maintains a time dimension. It would be at the most granular level at which the business runs (e.g. week, day of the month and so on). Depending on the data loads, these time dimensions are updated: a weekly process gets updated every week and a monthly process every month. The time dimension in a DWH must be loaded manually; we load data into the time dimension using PL/SQL scripts.

34. What is an ER Diagram?
ER stands for entity relationship diagram. It is the first step in the design of a data model which will later lead to a physical database design of possibly an OLTP or OLAP database.

35. What is the difference between a Snowflake and a Star Schema?
A star schema contains the dimension tables mapped around one or more fact tables. It is a denormalised model. There is no need to use complicated joins, and queries return results quickly.

A snowflake schema is the normalised form of a star schema. It contains in-depth joins, because the tables are split into many pieces. We can easily make modifications directly in the tables, but we have to use complicated joins since we have more tables, and there will be some delay in processing the query.

36. What is a CUBE in the data warehousing concept?
Cubes are logical representations of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

37. What are non-additive facts?
A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition; a common example of this is sales. Non-additive facts cannot be added at all; an example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others; an example of this is inventory levels, where you cannot tell what a level means simply by looking at it.
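The averages case can be shown with two made-up groups of daily sales: the sums combine correctly across the groups, but averaging the per-group averages gives the wrong overall average.

```python
region_a = [10, 20, 30]   # daily sales, region A (hypothetical figures)
region_b = [40, 50]       # daily sales, region B

# Additive fact: sums combine correctly across groups.
total = sum(region_a) + sum(region_b)               # same as summing all rows

# Non-additive fact: averages do not combine by simple arithmetic.
avg_a = sum(region_a) / len(region_a)               # 20.0
avg_b = sum(region_b) / len(region_b)               # 45.0
naive_avg = (avg_a + avg_b) / 2                     # 32.5 -- wrong
true_avg = total / (len(region_a) + len(region_b))  # 30.0
```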

38. How are dimension tables designed?
Most dimension tables are designed using normalization principles up to 2NF. In some instances they are further normalized to 3NF.


39. What are semi-additive and factless facts, and in which scenarios will you use such kinds of fact tables?
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example, suppose Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add it up across all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up all current balances for a given account for each day of the month does not give us any useful information).

A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn't sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university

40. What are the different methods of loading dimension tables?
Conventional load: Before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): All the constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints, and the bad data is not indexed.

41. What are aggregate tables?
An aggregate table contains a summary of existing warehouse data grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this we can aggregate the table to a certain required level and use it. These tables reduce the load on the database server, increase the performance of the query, and return results very quickly.
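A minimal sketch with SQLite from Python: a detail-level sales table is rolled up once into an aggregate table, and reports then read the small summary instead of scanning the detail rows (the table and column names are made up for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_day TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2024-01-01', 'pen', 5), ('2024-01-02', 'pen', 7),
        ('2024-01-01', 'ink', 3), ('2024-01-03', 'ink', 4);

    -- Aggregate table: pre-summarized to the product level.
    CREATE TABLE sales_by_product AS
        SELECT product, SUM(amount) AS total
        FROM sales GROUP BY product;
""")

# Queries hit the small aggregate table instead of the detail table.
totals = dict(conn.execute("SELECT product, total FROM sales_by_product"))
```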

42. What is active data warehousing?
An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of those customer touches, which encourages customer loyalty and thus secures an organization's bottom line. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.

43. Why is denormalization promoted in Universe design?
In a relational data model, for normalization purposes, some lookup tables are not merged into a single table. In dimensional data modeling (star schema),


these tables would be merged into a single table called a DIMENSION table for performance and for slicing data. Because the tables are merged into one large dimension table, complex intermediate joins are avoided; dimension tables are directly joined to fact tables. Though redundancy of data occurs in the DIMENSION table, its size is typically only about 15% of the FACT table. That is why denormalization is promoted in Universe design.

44. What is a metadata extension?
Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions. Informatica Client applications can contain the following types of metadata extensions:
Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions. You can also change the values of user-defined extensions.

45. What is metadata?
Metadata is data that is used to describe other data. Data definitions are sometimes referred to as metadata. Examples of metadata include schema, table, index, view and column definitions.

46. What are the types of metadata that are stored in the repository?
• Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.
• Target definitions. Definitions of database objects or files that contain the target data.
• Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
• Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.
• Reusable transformations. Transformations that you can use in multiple mappings.
• Mapplets. A set of transformations that you can use in multiple mappings.
• Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

47. What is Informatica Metadata and where is it stored?


Informatica Metadata contains all the information about the source tables, target tables, the transformations, so that it will be useful and easy to perform transformations during the ETL process. The Informatica Metadata is stored in Informatica repository.

48. Define the Informatica repository.
The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets. The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version. Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and Client tools use.

49. What is the PowerCenter repository?
The PowerCenter repository allows you to share metadata across repositories to create a data mart domain. In a data mart domain, you can create a single global repository to store metadata used across an enterprise, and a number of local repositories to share the global metadata as needed.

50. What is the Metadata Reporter?
It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter you can access information about your repository without having knowledge of SQL, the transformation language, or the underlying tables in the repository.

51. What does the Metadata Application Programming Interface (API) allow you to do?
A. Repair damaged data dictionary entries.
B. Delete data dictionary information about database objects you no longer need.
C. Extract data definition commands from the data dictionary in a variety of formats.
D. Prepare pseudocode modules for conversion to Java or PL/SQL programs with a Metadata code generator.
Answer: C. The Metadata API (DBMS_METADATA) lets you extract data definition commands from the data dictionary in a variety of formats.

52. Why do you use repository connectivity?
Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether or not the session and users are valid. All the metadata of sessions and mappings is stored in the repository.

53. If I make any modifications to my table in the back end, are they reflected in the Informatica Warehouse or Mapping Designer or Source Analyzer?
No. Informatica is not concerned with the back-end database; it displays only the information that is stored in the repository. If you want back-end changes reflected in the Informatica screens, you have to re-import from the back end over a valid connection and replace the existing definitions with the imported ones.


54. What is the difference between Informatica, the PowerCenter Server, the Repository Server and the repository?
The PowerCenter Server runs the scheduled sessions, at which time data is loaded from source to target. The repository contains all the definitions of the mappings done in the Designer.

55. What tasks does the Load Manager process perform?
Manages session and batch scheduling: When you start the Informatica Server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica Server. When you configure the session, the Load Manager maintains the list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process.
Locking and reading the session: When the Informatica Server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the session again and again.
Reading the parameter file: If the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: When the session starts, the Load Manager checks whether or not the user has the privileges to run the session.
Creating log files: The Load Manager creates a log file containing the status of the session.

56. What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session. Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica Server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.

57. What are the rank caches?
During the session, the Informatica Server compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.

58. What is the status code?
Status codes provide error handling for the Informatica Server during the session. The stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. This value cannot be seen by the user; it is only used by the Informatica Server to determine whether to continue running the session or to stop.

59. What tasks does the Source Qualifier perform?
1. Join data originating from the same source database.
2. Filter records when the Informatica Server reads source data.


3. Specify an outer join rather than the default inner join.
4. Specify sorted records.
5. Select only distinct values from the source.
6. Create a custom query to issue a special SELECT statement for the Informatica Server to read source data.

60. What are parameter files? Where do we use them?
A parameter file is any text file where you can define a value for a parameter defined in the Informatica session. This parameter file can be referenced in the session properties; when the Informatica session runs, the values for the parameters are fetched from the specified file. For example, $$ABC is defined in the Informatica mapping and the value for this variable is defined in a file called abc.txt as:
[foldername_session_name]
$$ABC='hello world'
In the session properties you can give abc.txt in the parameter file name field.

61. What is a mapping, session, worklet, workflow, mapplet?
Mapping - represents the flow and transformation of data from source to target.
Mapplet - a group of transformations that can be called within a mapping.
Session - a task associated with a mapping to define the connections and other configurations for that mapping.
Workflow - controls the execution of tasks such as commands, emails and sessions.
Worklet - a workflow that can be called within a workflow.

62. What is the difference between PowerCenter and PowerMart?
PowerCenter: can connect to single and multiple repositories, is generally used in big enterprises, and provides ERP support.
PowerMart: can connect to only a single repository.

63. Can Informatica load heterogeneous targets from heterogeneous sources?
Yes.

64. What are snapshots? A snapshot is a table that contains the results of a query of one or more tables or views, often located on a remote database.

65. What are materialized views?
A materialized view is a view in which the data is also physically stored. With the ordinary view concept, the database stores only the query, and each time the view is called it extracts data from the base tables; in a materialized view, the query results are stored in their own table.

66. What is partitioning? Partitioning is a part of physical data warehouse design that is carried out to improve performance and simplify stored-data management. Partitioning is done to break up a large table into smaller, independently-manageable components because it: 1. reduces work involved with addition of new data.


2. reduces work involved with purging of old data.
67. What are the types of partitioning?
Two types of partitioning are:
1. Horizontal partitioning.
2. Vertical partitioning (reduces efficiency in the context of a data warehouse).

68. What is a full load and an incremental or refresh load?
A full load is the entire data dump taking place the very first time. Subsequently, to keep the target data synchronized with the source data, there are two further techniques:
Refresh load - the existing data is truncated and reloaded completely.
Incremental load - the delta, or difference between target and source data, is loaded at regular intervals. The timestamp of the previous delta load has to be maintained.
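A sketch of the incremental technique: keep the timestamp of the previous delta load and extract only the source rows changed after it (the row layout and dates are invented for illustration).

```python
# Timestamp maintained from the previous delta load.
last_load = '2024-01-10'

source_rows = [
    {'id': 1, 'updated': '2024-01-05'},   # already loaded
    {'id': 2, 'updated': '2024-01-12'},   # changed since last load
    {'id': 3, 'updated': '2024-01-15'},   # new since last load
]

# Delta = rows changed after the previous load; ISO dates compare as strings.
delta = [r for r in source_rows if r['updated'] > last_load]

# Persist the new high-water mark for the next incremental run.
last_load = max(r['updated'] for r in source_rows)
```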

69. What are the modules in PowerMart?
1. PowerMart Designer
2. Server
3. Server Manager
4. Repository
5. Repository Manager

70. What is a staging area? Do we need it? What is the purpose of a staging area?
The staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We basically need a staging area to hold the data and perform data cleansing and merging before loading the data into the warehouse.

71. How do you determine which records to extract?
When addressing a table, some dimension key must reflect the need for a record to get extracted. Mostly it will be from the time dimension (e.g. date >= 1st of current month) or a transaction flag (e.g. Order Invoiced Status). A foolproof approach would be adding an archive flag to each record, which gets reset when the record changes.

72. What are the various transformations available?
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
External Transformation

73. What is a three-tier data warehouse?
A three-tier data warehouse contains three tiers: a bottom tier, a middle tier and a top tier. The bottom tier deals with retrieving related data or information from various information repositories by using SQL. The middle tier contains two types of servers:


1. ROLAP server
2. MOLAP server
The top tier deals with the presentation or visualization of the results.

74. How can we use mapping variables in Informatica? Where do we use them?
After creating a variable, we can use it in any expression in a mapping or a mapplet. They can also be used in the source qualifier filter, user-defined joins or extract overrides, and in the expression editor of reusable transformations. Their values can change automatically between sessions.

75. Techniques of error handling - ignoring, rejecting bad records to a flat file, loading the records and reviewing them (default values).
Records are rejected either at the database, due to constraint key violations, or by the Informatica Server when writing data into the target table. These rejected records can be found in the bad files folder, where a reject file is created for each session; there we can check why a record has been rejected. The bad file contains a row indicator in the first column and a column indicator in the second column. The row indicators are of four types: D - valid data, O - overflowed data, N - null data, T - truncated data. Depending on these indicators we can make changes to load the data successfully to the target.

76. How do we call shell scripts from Informatica?
You can use a Command task to call shell scripts in the following ways:
1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task.

77. What are active transformations / passive transformations?
An active transformation can change the number of rows output after a transformation, while a passive transformation does not change the number of rows and passes through the same number of rows that it was given as input.

78. How do you use mapping parameters, and what is their use?
In the Designer you will find the mapping parameters and variables options, and you can assign values to them there. As for their use: suppose you are doing incremental extractions daily and your source system contains a day column. Every day you would have to go to that mapping and change the day so that the particular data is extracted; doing that manually is tedious. That is where mapping parameters and variables come in: once you assign a value to a mapping variable, it changes between sessions.
79. How do you delete duplicate rows in a flat file source? Is there any option in Informatica?
Use a Sorter transformation; it has a "distinct" option - make use of it.
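The Sorter's distinct behaviour can be sketched in Python: sort the rows, then keep one copy of each (the function name and sample rows are illustrative only).

```python
def sorter_distinct(rows):
    """Mimic a Sorter transformation with the distinct option:
    sort the rows, then drop consecutive duplicates."""
    out = []
    for row in sorted(rows):
        if not out or row != out[-1]:
            out.append(row)
    return out

rows = [('a', 1), ('b', 2), ('a', 1), ('c', 3), ('b', 2)]
print(sorter_distinct(rows))  # [('a', 1), ('b', 2), ('c', 3)]
```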
