WIPRO INTERVIEW QUESTIONS


DBMS & Data Modeling

1. What is a Multi-Dimensional Database? How does it fit into DW? (2) + (4)

Answer: A schema in which multiple dimension tables are linked to one or more fact tables is a multi-dimensional model.

Listed below are the benefits the user gets from a multi-dimensional database for analysis, which is why dimensional modeling is used for a DW.

Dimensions are:
• Descriptive data
• Usually textual (not numeric)
• The source of constraints for queries
• The way business users understand data
• The entry point for data access

Dimensional modeling is:
• Predictable - all query constraints come from dimensions
• Able to withstand changes in user behavior - supports future, unplanned queries
• Supported by aggregation utilities
• Based on standard approaches to modeling
• Extensible

2. What is the difference between a dimensional data model and a normal data model? (5)

Answer: Normal modeling (OLTP):
• For transactional systems
• Used to model minute relations between data elements
• Very complex models
• Difficult to query
• Eliminates data redundancy

Dimensional modeling (DW):
• Simpler to design
• Denormalized form
• Easier to build analysis queries
• More intuitive, understandable for users

3. What are conformed dimensions? Explain the need for conformed dimensions? (4)

Answer: They are dimensions that are shared, in identical form, by two or more fact tables.

Definition:


Conformed dimension is a dimension which is common to more than one fact table in Dimensional Modeling.

For example, while creating data marts for Sales and Promotions using dimensional modeling, we need a relationship to both fact tables. To achieve this, we identify a common dimension that relates the two facts (Sales and Promotions), so that information from both facts can be retrieved by referencing a conformed dimension - the Time dimension - whose time hierarchy applies to both facts.

Conclusion: Most of the time the Time dimension acts as the conformed dimension when designing or creating data marts through dimensional modeling.
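As an illustration (the table and column names below are assumed, not taken from the question), a minimal sketch of a Time dimension conformed across Sales and Promotions fact tables:

CREATE TABLE time_dim (
  time_key      NUMBER PRIMARY KEY,   -- surrogate key
  calendar_date DATE,
  cal_month     VARCHAR2(20),
  cal_quarter   VARCHAR2(2),
  cal_year      NUMBER(4)
);

CREATE TABLE sales_fact (
  time_key    NUMBER REFERENCES time_dim (time_key),
  product_key NUMBER,
  sales_amt   NUMBER
);

CREATE TABLE promotions_fact (
  time_key   NUMBER REFERENCES time_dim (time_key),
  promo_key  NUMBER,
  promo_cost NUMBER
);

-- Because both fact tables reference the same time_dim, Sales and Promotions
-- can be queried together (drilled across) through the conformed Time dimension.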

4. What is a bus architecture and how is it implemented? (3)

Answer: A bus architecture is an array of data marts integrated by conformed dimensions and conformed facts. This is Kimball's view of the EDW. It is implemented step by step, building one data mart at a time. Prior to building the data marts, the conformed dimensions and facts should be established. Data marts cater to particular departmental needs, whereas the bus architecture caters to the needs of the entire organization.

5. What is Market Basket Analysis? How will you arrive at the combination? (5)

Answer: Identifying the combinations of products that sell together.

Start at the top of the merchandise hierarchy, which we assume is Department. Calculate the market basket counts for all pairs of departments. If there are 20 departments, then counts for up to 400 pairs are calculated. Rank the results by total market basket counts. The most desirable results of this first iteration are the records near the top of the list, where the dollars or the units from the two departments are comparable.

6. Consider the relation CAR_SALE (Car#, Date_sold, Salesman#, Commission%, Discount_amt). Assume that a car may be sold by multiple salesmen and hence {Car#, Salesman#} is the primary key. Additional dependencies are Date_sold → Discount_amt and Salesman# → Commission%. Based on the given primary key, identify the normal form of the relation. If the relation is not in BCNF, normalize it successively into BCNF. (10)

Answer : CAR_SALE (Car#, Date_sold, Salesman#, Commission%, Discount_amt).

First Normal Form: if a table of data meets the definition of a relation, it is in first normal form.
- Every relation has a unique name.
- Every attribute value is atomic (single-valued).
- Every row is unique.
- Attributes in tables have unique names.


- The order of the columns is irrelevant.
- The order of the rows is irrelevant.
The given relation CAR_SALE is in 1NF.

Second Normal Form: the table should be in 1NF with no partial functional dependencies. A partial functional dependency exists when one or more non-key attributes are functionally dependent on part of the primary key; every non-key attribute must be determined by the entire key, not just part of it. Here Salesman# → Commission% and Car# → Date_sold (and hence Discount_amt) depend on only part of the key, so decompose:

Table1: Salesman# (PK), Commission%
Table2: Car# (PK), Date_sold, Discount_amt
Table3: Car#, Salesman# (composite PK, recording the sale itself)

Third Normal Form: the table should be in 2NF with no transitive dependencies. A transitive dependency is a functional dependency between non-key attributes. In Table2, Car# → Date_sold → Discount_amt is transitive, so split it:

Table2: Car# (PK), Date_sold
Table4: Date_sold (PK), Discount_amt

BCNF: 3NF where every determinant is a candidate key. In each resulting table every determinant is a candidate key, so the decomposition is in BCNF.
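A minimal DDL sketch of this decomposition (the data types and the ASCII-friendly names car_no, salesman_no and commission_pct are assumed for illustration):

CREATE TABLE car_sale (
  car_no      NUMBER,
  salesman_no NUMBER,
  PRIMARY KEY (car_no, salesman_no)
);

CREATE TABLE car_date (
  car_no    NUMBER PRIMARY KEY,
  date_sold DATE
);

CREATE TABLE date_discount (
  date_sold    DATE PRIMARY KEY,
  discount_amt NUMBER
);

CREATE TABLE salesman_commission (
  salesman_no    NUMBER PRIMARY KEY,
  commission_pct NUMBER
);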

7. What are the different types of partitions? (6)

Answer : Partition Types :

1) Range Partition : Range partitioning maps rows to partitions based on ranges of column values. Range partitioning is defined by the partitioning specification for a table or index.

2) Hash Partition : Hash partitioning uses a hash function on the partitioning columns to stripe data into partitions. Hash partitioning allows data that does not lend itself to range partitioning to be easily partitioned for performance reasons (such as parallel DML, partition pruning, and partition-wise joins).

3) Composite Partition : Composite partitioning partitions data using the range method and, within each partition, sub-partitions it using the hash method. This type of partitioning supports historical operations (such as adding new range partitions) at the partition level, and parallelism (parallel DML) and data placement at the sub-partition level.
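A brief sketch of the three partition types in Oracle DDL (the sales, customers and orders tables, their columns, and the partition boundaries are assumed for illustration):

-- Range partitioning on a date column
CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p_q1  VALUES LESS THAN (TO_DATE('01-APR-2004', 'DD-MON-YYYY')),
  PARTITION p_q2  VALUES LESS THAN (TO_DATE('01-JUL-2004', 'DD-MON-YYYY')),
  PARTITION p_max VALUES LESS THAN (MAXVALUE)
);

-- Hash partitioning on a column with no natural ranges
CREATE TABLE customers (
  cust_id NUMBER,
  name    VARCHAR2(100)
)
PARTITION BY HASH (cust_id) PARTITIONS 4;

-- Composite (range-hash) partitioning
CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE,
  cust_id    NUMBER
)
PARTITION BY RANGE (order_date)
SUBPARTITION BY HASH (cust_id) SUBPARTITIONS 4 (
  PARTITION o_2004 VALUES LESS THAN (TO_DATE('01-JAN-2005', 'DD-MON-YYYY')),
  PARTITION o_max  VALUES LESS THAN (MAXVALUE)
);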

8. What are the types of refresh in the materialized view? (4)

Answer: Oracle maintains the data in materialized views by refreshing them after changes are made to their master tables. The refresh method can be a) incremental (fast refresh) or b) complete.


Incremental (fast refresh): only the changes made since the last refresh are applied to the existing materialized view data.

Complete refresh: the materialized view is rebuilt from scratch.

For materialized views that use the fast refresh method, a materialized view log or direct loader log keeps a record of changes to the master tables. Materialized views can be refreshed on demand or at regular time intervals. Alternatively, materialized views in the same database as their master tables can be refreshed whenever a transaction commits its changes to the master tables.
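A minimal sketch of a fast-refreshable materialized view (it assumes an emp master table with a primary key; all names are illustrative):

-- A materialized view log on the master table records changes for fast refresh
CREATE MATERIALIZED VIEW LOG ON emp WITH PRIMARY KEY;

CREATE MATERIALIZED VIEW emp_mv
  REFRESH FAST ON DEMAND
  AS SELECT empno, ename, deptno FROM emp;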

9. What are the 2 types of BUILD parameters in a materialized view? (2)

Answer : Two types of Build Parameters are

1) The BUILD IMMEDIATE clause populates the materialized view during creation (the default).
2) The BUILD DEFERRED clause creates the structure only; the view must then be populated later using the DBMS_MVIEW package. Use BUILD DEFERRED to populate the materialized view outside office hours, to avoid affecting normal operations.
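A small sketch, assuming a hypothetical summary over the emp table, of BUILD DEFERRED followed by a later population with DBMS_MVIEW:

CREATE MATERIALIZED VIEW dept_sal_mv
  BUILD DEFERRED
  REFRESH COMPLETE ON DEMAND
  AS SELECT deptno, SUM(sal) AS total_sal FROM emp GROUP BY deptno;

-- Populate it later (e.g. after office hours); 'C' requests a complete refresh
BEGIN
  DBMS_MVIEW.REFRESH('DEPT_SAL_MV', 'C');
END;
/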

10. If you are going for New Fact Tables for Aggregates, Explain the Strategies that can be followed. Provide at least two strategies. (6)

Answer:
1) LOST DIMENSION AGGREGATES - Lost dimension aggregates are created by completely excluding one or more dimensions when summarizing a fact table.

2) SHRUNKEN DIMENSION AGGREGATES - Shrunken dimension aggregates have one or more dimensions replaced by shrunken or rolled versions of themselves.

3) COLLAPSED DIMENSION AGGREGATES - Collapsed dimension aggregates are created when dimensional keys have been replaced with high-level dimensional attributes, resulting in a single, fully denormalized summary table.



LEVEL-1 Between 15 And 25 Marks

LEVEL-2 Between 26 And 40 Marks

LEVEL-3 Above 40 Marks


PL / SQL

1. What will happen after the COMMIT statement? (5)

DECLARE
  CURSOR c1 IS
    SELECT empno, ename FROM emp FOR UPDATE;
  eno   emp.empno%TYPE;
  ename emp.ename%TYPE;
BEGIN
  OPEN c1;
  LOOP
    FETCH c1 INTO eno, ename;
    EXIT WHEN c1%NOTFOUND;
    -----
    COMMIT;
  END LOOP;
END;

Answer: A cursor whose query contains SELECT ... FOR UPDATE is closed after a COMMIT/ROLLBACK, so the next fetch in the loop fails.

A cursor whose query is a plain SELECT ... does not get closed by a COMMIT/ROLLBACK (this is the answer for a cursor having no FOR UPDATE clause).

2. Which one is the better programming method? (4)

a) IF func1(emp_no) AND ( sal < 450 ) THEN ... END IF;

b) IF ( sal < 450 ) AND func1(emp_no) THEN ... END IF;

c) Both

Answer: b. PL/SQL uses short-circuit evaluation, so placing the cheap comparison (sal < 450) first means the function call func1(emp_no) is skipped whenever the comparison is false.

3. What happens if a procedure that updates a column of table X is called in a database trigger of the same table ? (7)

Answer: The table mutates; a mutating-table error (ORA-04091) is raised.

Note: If mutating triggers are explained in detail, along with the reasons to avoid them, award 10 marks.

4. I have one validation in a trigger and another validation in integrity constraints. Which validation will be performed first? (7)


Answer: The trigger validation will be completed first, and then the integrity constraint validation will be completed.

5. What is PRAGMA EXCEPTION_INIT? Explain its usage. (5)

Answer: PRAGMA EXCEPTION_INIT tells the compiler to associate a named exception with an Oracle error number, so that a specific Oracle error can be trapped and handled by name. Usage: PRAGMA EXCEPTION_INIT (exception_name, oracle_error_number).
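A minimal usage sketch (the exception name and the choice of ORA-00054 are illustrative):

DECLARE
  resource_busy EXCEPTION;
  PRAGMA EXCEPTION_INIT(resource_busy, -54);   -- associate with ORA-00054
BEGIN
  LOCK TABLE emp IN EXCLUSIVE MODE NOWAIT;
EXCEPTION
  WHEN resource_busy THEN
    dbms_output.put_line('Table is locked by another session.');
END;
/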

6. What will happen to the following code? (3)

BEGIN
  CREATE TABLE emp (emp_no NUMBER, emp_name CHAR(10));
END;

a) It will give a runtime error.
b) No errors.
c) It will give a compile-time error.

Answer: C. DDL statements cannot be used directly inside a PL/SQL block, so the block fails to compile; DDL must be issued through dynamic SQL (e.g. EXECUTE IMMEDIATE) instead.

7. How to know that somebody modified the code? (5)

Answer: Code for stored procedures, functions and packages is stored in the Oracle data dictionary. One can detect code changes by looking at the LAST_DDL_TIME column in the USER_OBJECTS dictionary view.
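For example, a query along these lines (illustrative only) lists recently changed stored code:

SELECT object_name, object_type, last_ddl_time
  FROM user_objects
 WHERE object_type IN ('PROCEDURE', 'FUNCTION', 'PACKAGE', 'PACKAGE BODY')
 ORDER BY last_ddl_time DESC;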

8. What is the result of the following code? (2)

DECLARE
  a VARCHAR2(10) := NULL;
  b VARCHAR2(10) := NULL;
BEGIN
  IF (a = b) THEN
    dbms_output.put_line('condition success');
  ELSE
    dbms_output.put_line('condition failed');
  END IF;
END;

Answer: It will display 'condition failed', because NULL = NULL evaluates to NULL rather than TRUE, so the ELSE branch executes.

9. What is Raise_application_error ? (3)

Answer: RAISE_APPLICATION_ERROR is a procedure in the DBMS_STANDARD package that allows user-defined error messages to be issued from a stored sub-program or database trigger.
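A minimal sketch of its use (the procedure, parameter and error number -20001 are illustrative):

CREATE OR REPLACE PROCEDURE check_sal (p_sal IN NUMBER) IS
BEGIN
  IF p_sal < 0 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
  END IF;
END;
/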

10. I want to execute an operating system command from PL/SQL. How can this be done? (7)

Answer: From Oracle 8 onwards, one can call external 3GL code in a dynamically linked library (DLL or shared object). One simply writes a library in C/C++ to do whatever is required; registering this C/C++ function with PL/SQL (as an external procedure) makes it executable.

11. What is pinning in PL/SQL? How to pin? (6)

Answer: Another way to improve performance is to pin frequently used packages in the shared memory pool. When a package is pinned, it is not aged out by the least recently used (LRU) algorithm that Oracle normally uses. The package remains in memory no matter how full the pool gets or how frequently you access the package.

BEGIN
  DBMS_SHARED_POOL.KEEP('PROCESS_DATE', 'P');
END;

12. Will the following code work? (3)

CREATE FUNCTION fun1 (a IN NUMBER(5), b OUT NUMBER(3))
RETURN NUMBER IS
BEGIN
  b := 10;
  a := a + b;
  RETURN 1;
END;

Answer: It won't work. Formal parameters cannot carry size constraints such as NUMBER(5), and an IN parameter (a) cannot be assigned to.

13. What is the difference between a UNIQUE constraint and a PRIMARY KEY constraint? (2)

Answer: A column defined as UNIQUE can contain NULLs, while a column defined as PRIMARY KEY cannot contain NULLs.

14. How would you optimise the following code? (4)

DECLARE
  TYPE nlist IS VARRAY(20) OF NUMBER;
  dept_type nlist := nlist(10, 30, 70, ...);
BEGIN
  ...
  FOR i IN dept_type.FIRST .. dept_type.LAST LOOP
    ...
    UPDATE emp SET sal = sal * 1.10 WHERE deptno = dept_type(i);
  END LOOP;
END;

Answer: Use a bulk bind (FORALL) to perform the updates in one shot.
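A minimal FORALL (bulk bind) rewrite of the loop above; the collection values are illustrative:

DECLARE
  TYPE nlist IS VARRAY(20) OF NUMBER;
  dept_type nlist := nlist(10, 30, 70);
BEGIN
  FORALL i IN dept_type.FIRST .. dept_type.LAST
    UPDATE emp SET sal = sal * 1.10 WHERE deptno = dept_type(i);
END;
/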

15. Consider a report-generation block in PL/SQL. The record set for the report is populated into a driving table from 5 source tables. Before populating the data, all existing records must be removed from the report table. Which option would you use to remove the data from the report table? (2)

Answer: The best way is to TRUNCATE the table instead of DELETEing from it. TRUNCATE is always faster than DELETE.

Note: If the candidate justifies that no rollback is necessary in this case, award 4.

16. Consider the scenario of moving employee details from one table to another. For each employee in the source table, insert the details into the target table; if that employee is already present in the target table, handle the exception. Write the pseudo code. (4)

Answer: Use an inner block inside a FOR loop. Inside the inner block do the INSERT, and catch DUP_VAL_ON_INDEX in the exception section of the inner block.
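A short pseudo-code sketch, assuming hypothetical source/target tables emp_source and emp_target with illustrative columns:

BEGIN
  FOR rec IN (SELECT empno, ename FROM emp_source) LOOP
    BEGIN
      INSERT INTO emp_target (empno, ename) VALUES (rec.empno, rec.ename);
    EXCEPTION
      WHEN DUP_VAL_ON_INDEX THEN
        NULL;   -- employee already present in the target; skip (or update) it
    END;
  END LOOP;
END;
/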

17. I want to lock some records in a table for my processing, but I don't want to wait for those records if they are already locked by another session. How can this be done? (5)

Answer: Use the NOWAIT option in the cursor (SELECT ... FOR UPDATE NOWAIT).
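For illustration (the department filter and 5% raise are assumed), a cursor that raises ORA-00054 instead of waiting for locked rows:

DECLARE
  CURSOR c_emp IS
    SELECT empno, sal
      FROM emp
     WHERE deptno = 10
       FOR UPDATE NOWAIT;
BEGIN
  FOR r IN c_emp LOOP
    UPDATE emp SET sal = r.sal * 1.05 WHERE CURRENT OF c_emp;
  END LOOP;
END;
/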

LEVEL-1 Between 20 And 30 Marks

LEVEL-2 Between 31 And 60 Marks

LEVEL-3 Above 60 Marks


SQL QUERIES - PERFORMANCE TUNING

1. Mention a few optimizer hints used in SQL Queries ? (8)

Answer:
1) COST
2) RULE
3) FIRST_ROWS
4) ALL_ROWS
5) INDEX (table/alias name, index name)
6) NESTED_LOOPS
7) PARALLEL
8) USE_MERGE

Note: Award marks based on the number of answers.

2. What are the 2 methods of finding / analysing the execution path taken by a SQL Query ? (4)

Answer: 1) EXPLAIN PLAN, and 2) SQL TRACE followed by TKPROF.
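A brief sketch of both methods (the sample statement is illustrative; DBMS_XPLAN.DISPLAY is available from 9i Release 2 onwards, otherwise query PLAN_TABLE or run utlxpls.sql):

EXPLAIN PLAN FOR
  SELECT * FROM emp WHERE deptno = 10;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- SQL trace + TKPROF
ALTER SESSION SET SQL_TRACE = TRUE;
-- run the statement, then format the resulting trace file from the OS prompt:
-- tkprof <trace_file>.trc report.prf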

3. Given a query, What would be the approach to tune the same ? (5)

Answer:
Step 1 - Understand the data that needs to be retrieved.
Step 2 - Get the filter conditions right.
Step 3 - Have a step-by-step approach to getting the required output.
Step 4 - Look at the technical aspects of the SQL, such as joins, indexes, hints and best practices.

4. Mention a few configurable parameters in init.ora (8)

Answer: The parameters and their maximum values are given below.
Database size: 512 PB (up from 32 TB)
Maximum datafiles: 65,533
Datafiles per tablespace: 1,022
Blocks per datafile: 4,000,000
Block size: 32 KB
Maximum datafile size: 128 GB (4,000,000 * 32 KB)
Maximum tablespace size: 128 TB (1,022 * 128 GB)
DB_BLOCK_BUFFERS: unlimited
SGA for 32-bit OS: 4 GB
Table columns: 1,000 (up from 256)

5. Can we force the Oracle optimizer to use an index ? (5)

Answer: Yes, this is possible by using the hint /*+ INDEX(table_name index_name) */.
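For example (the index name emp_name_idx is assumed):

SELECT /*+ INDEX(e emp_name_idx) */ *
  FROM emp e
 WHERE e.ename = 'JOHN';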

6. Give a scenario where it would be better to go for a full table scan instead of the query using an index range scan ? (5)

Answer: A QUERY RETURNING MORE THAN 20% OF THE TABLE DATA IS A GOOD CANDIDATE FOR FULL TABLE SCAN.

7. How do we ensure that a query uses a key to hit one partition and not all the partitions? (8)


Answer: In the predicate (WHERE) clause of the given SQL, make sure all the partition key columns are used, so that partition pruning can take place.

8. Explain Cost / Rule based query processing. (6)

Answer: The cost of executing a query determines the path of execution. The cost is derived from the statistics available on all the objects we are interested in.

Rule-based processing uses a predetermined set of execution steps decided by the optimizer for a given SQL.

9. Is it better to use the operators >= and <= instead of BETWEEN? If the answer to the question is "YES", please explain. (4)

Answer: A BETWEEN clause is internally converted into <= and >= conditions, so writing >= and <= directly saves the time needed to parse and convert the query internally. Hence the BETWEEN clause is best avoided.

10. Who decides the path of query execution ? (3)

Answer: The Oracle optimizer.
Note: If the candidate is aware of the 9i behaviour in which the cost-based optimizer is used by default, award 5.

11. On what basis the optimizer decides the path of execution ? (5)

Answer:Based on the statistics available about the table and indexes under consideration.

12. A function is used in the WHERE clause of a SQL statement, e.g. LTRIM(column_name). Will the corresponding index on the column be used for data retrieval and processing? (4)

Answer: At run time, i.e. during query execution, the index will not be used. The index will be used only if it was created as a function-based index.
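A minimal sketch of a matching function-based index (the index name is illustrative; on older releases the cost-based optimizer and the QUERY_REWRITE_ENABLED setting are prerequisites):

CREATE INDEX emp_ltrim_name_idx ON emp (LTRIM(ename));

-- A predicate written the same way can now use the index
SELECT * FROM emp WHERE LTRIM(ename) = 'JOHN';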

13. Explain the difference between using DISTINCT and GROUP BY in a SQL query. (3)

Answer: A DISTINCT does a grouping internally and then returns the unique values, which means a GROUP BY achieves the same result; hence there can be a performance benefit in using GROUP BY (and HAVING) rather than a DISTINCT clause in a SQL query.

14. What might happen to a query execution if statistics are not available for the tables and indexes in question? (6)

Answer: The optimizer may fall back to rule-based execution.
Note: We think the optimizer would proceed with whatever (possibly wrong) statistics are available; please cross-check.

15. Give a scenario where we can use a bitmap index. Why? (4)

Answer: Columns whose cardinality is low are good candidates for a bitmap index. In other words, columns with only a few distinct values, such as gender or a status flag, are good examples.
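For example (the table and column names are assumed):

CREATE BITMAP INDEX cust_gender_bix ON customers (gender);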

16. How are the statistics of a table built? (4)

Answer: ANALYZE TABLE / ANALYZE INDEX with ESTIMATE STATISTICS or COMPUTE STATISTICS are the mechanisms used to gather statistics for the objects under consideration.
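A short sketch of both the ANALYZE commands and the DBMS_STATS package that later releases prefer (object names are illustrative):

ANALYZE TABLE emp COMPUTE STATISTICS;
ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 10 PERCENT;
ANALYZE INDEX pk_emp COMPUTE STATISTICS;

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT',
                                tabname => 'EMP',
                                cascade => TRUE);
END;
/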

17. Consider the query

update table1 t1
   set (t1.col1, t1.col2) = (select t2.col1, t2.col2
                               from table2 t2
                              where t2.col3 = t1.col3)
 where t1.col4 = 'Y';

What will happen if, for any of the records, the subquery's WHERE condition finds no match? How do you resolve this? (6)

Answer: For any record where the subquery's WHERE condition finds no match, col1 and col2 are set to NULL for that record (and the statement fails if the subquery returns more than one row). The mechanism to resolve this is given below.

update table1 t1
   set (t1.col1, t1.col2) = (select t2.col1, t2.col2
                               from table2 t2
                              where t2.col3 = t1.col3)
 where exists (select 'x' from table2 t2 where t2.col3 = t1.col3)
   and t1.col4 = 'Y';

Note: In the question, mention whether it is the inner or the outer WHERE condition that fails.

18. Is it optimal to have a large number of indexes on a table in OLTP? If not, why? (4)

Answer: It is always better to have only the optimal set of indexes on an OLTP table. As the number of indexes increases, performance becomes an issue during INSERT/UPDATE/DELETE operations.

19. Consider an emp table having an index on emp_name. Will the following query do a full scan or an index scan? (5)

select * from emp where nvl(emp_name , 'JOHN') = 'PETER';

Answer: The index will not be used because of the NVL function. If the index had been created as a function-based index on the NVL expression, it would be used.

20. Consider an emp table which has an index on emp_id. Will the following query do a full scan or an index scan? (5)


select * from emp where emp_name = 'JOHN' and emp_id = 256;

Answer: The index on emp_id will be used (an index scan), since there is an equality predicate on the indexed column; the position of emp_id within the WHERE clause does not influence the optimizer.

21. What are the advantages and disadvantages of using set operators? And what are the different set operators available in Oracle? (6)

Answer: Set operators can be used when you want to retrieve data without a WHERE predicate, since an index referenced in a WHERE predicate will not be considered. The advantage of using set operators is parallel processing; the disadvantage is that an index on the WHERE-clause columns will not be used.

The different set operators are:
1) UNION
2) INTERSECT
3) MINUS

LEVEL-1 Between 35 And 45 Marks

LEVEL-2 Between 46 And 80 Marks

LEVEL-3 Above 80 Marks

ORACLE FEATURES & ARCHITECTURE & ADMINISTRATION CONCEPTS :-

1. What are Clusters? (5)


Answer: Clusters are groups of one or more tables physically stored together because they share common columns and are often used together.

2. What is Row Chaining ? (5)

Answer: In some circumstances, all of the data for a row in a table may not fit in a single data block. When this occurs, the data for the row is stored in a chain of data blocks (one or more) reserved for that segment.

3. What is the use of Control File ? (3)

Answer:When an instance of an ORACLE database is started, its control file is used to identify the database and redo log files that must be opened for database operation to proceed.

4. What are the different type of Segments ? (3)

Answer:Data Segment, Index Segment, Rollback Segment and Temporary Segment.

5. When Does DBWR write to the database ? (6)

Answer: DBWR writes when more data needs to be read into the SGA and too few database buffers are free. The least recently used data is written to the data files first. DBWR also writes when a checkpoint occurs.

6. What is the function of Dispatcher (Dnnn) ? (7)

Answer: The dispatcher (Dnnn) process is responsible for routing requests from connected user processes to available shared server processes and returning the responses back to the appropriate user processes. At least one dispatcher process is created for every communication protocol in use.

7. Will the Optimizer always use COST-based approach if OPTIMIZER_MODE is set to "Cost'? (5)

Answer: The presence of statistics in the data dictionary for at least one of the tables accessed by the SQL statement is necessary for the optimizer to use the cost-based approach. Otherwise the optimizer chooses the rule-based approach.

8. What are the values that can be specified for OPTIMIZER_GOAL parameter of the ALTER SESSION Command? (6)

Answer:CHOOSE , ALL_ROWS ,FIRST_ROWS and RULE.

9. What is the effect of setting the value "CHOOSE" for OPTIMIZER_GOAL, parameter of the ALTER SESSION Command ? (6)

Answer: The optimizer chooses the cost-based approach and optimizes with the goal of best throughput if statistics exist in the data dictionary for at least one of the tables accessed by the SQL statement. Otherwise the optimizer chooses the rule-based approach.

10. What is the use of SNAPSHOT LOG ? (5)

Answer:A snapshot log is a table in the master database that is associated with the master table. ORACLE uses a snapshot log to track the rows that have been updated in the master table. Snapshot logs are used in updating the snapshots based on the master table.

11. What is Full Backup ? (3)

Answer: A full backup is an operating system backup of all the data files, on-line redo log files and control files that constitute the ORACLE database, plus the parameter file.

12. Which parameter in Storage clause will reduce no. of rows per block? (7)

Answer: The PCTFREE parameter. This is used to reserve a certain amount of space in each block for future expansion (updates) of rows.

13. How will you monitor rollback segment status ? (7)

Answer: Querying the DBA_ROLLBACK_SEGS view

14. What database administration utilities are available? (6)

Answer:SQL * DBA - This allows DBA to monitor and control an ORACLE database.

SQL * Loader - It loads data from standard operating system files (Flat files) into ORACLE database tables.

Export (EXP) and Import (imp) utilities allow you to move existing data in ORACLE format to and from ORACLE database.

15. What is a trace file and how is it created ? (6)

Answer: Each server and background process can write to an associated trace file. When an internal error is detected by a process or user process, it dumps information about the error to its trace file. This can be used for tuning the database.

16. What are the different methods of backing up oracle database ? (3)

Answer:
- Logical Backups
- Cold Backups
- Hot Backups (Archive log)

Logical backup involves reading a set of database records and writing them into a file. The Export utility is used for taking the backup and the Import utility is used to recover from it.

Cold backup is taking a backup of all physical files after a normal shutdown of the database. We need to take:
- All data files
- All control files
- All on-line redo log files
- The init.ora file (optional)

Hot backup is taking a backup of the archive log files while the database is open. For this the ARCHIVELOG mode should be enabled. The following files need to be backed up:
- All data files
- All archived redo log files
- All control files

17. Name the ORACLE background processes. (6)

Answer:
DBWR - Database Writer
LGWR - Log Writer
CKPT - Checkpoint
SMON - System Monitor
PMON - Process Monitor
ARCH - Archiver
RECO - Recoverer
Dnnn - Dispatcher
LCKn - Lock
Snnn - Server

18. What is the use of the PGA? (4)

Answer: The PGA is an area of memory that supports the execution of a user process; it holds bind variable information, sort areas, and other aspects of cursor handling.

19. What are functions of PMON ? (4)

Answer: The Process Monitor (PMON) performs process recovery when a user process fails. PMON is responsible for cleaning up the cache and freeing resources that the failed process was using. PMON also checks on dispatcher and server processes and restarts them if they have failed.

20. Can we change the DB block size after a database is created? (cannot rate)

Answer: It depends on the version of Oracle (8i/9i). The standard block size chosen at creation cannot be changed; from 9i onwards, additional block sizes can be used for new tablespaces alongside it.

21. Which export option will generate code to create an initial extent that is equal to the sum of the sizes of all the extents currently allocated to an object? (4)
A. FULL
B. DIRECT
C. COMPACT
D. COMPRESS

Answer: D (DBA question)

22. What are two reasons for changing user quotas on a tablespace? (3)

A. A datafile becomes full.
B. A user encounters slow response time from the application.
C. Tables owned by a user exhibit rapid and unanticipated growth.
D. Database objects are reorganized and placed in a different tablespace.

Answer: C, D

23. A DBA performs the query:

SELECT tablespace_name, max_blocks FROM dba_tablespace_quotas WHERE username = 'JERRY';

That returns the result:

TABLESPACE_NAME   MAX_BYTES
DATA01            -1

What does -1 indicate? (5)

A. Tablespace DATA01 has been dropped.
B. Tablespace DATA01 has no free space.
C. The user has no quotas on tablespace DATA01.
D. The user has an unlimited quota on tablespace DATA01.
E. The user has exceeded his or her quota on the tablespace DATA01.

Answer: D

24. Consider the following command to create the user 'peter':

CREATE USER peter
IDENTIFIED BY pan
TEMPORARY TABLESPACE temp
PASSWORD EXPIRE;

Since no default tablespace was specified, what will happen if this command is executed? (4)
A. The user will not own a home directory.
B. The user peter will be created using the TEMP tablespace as the default.
C. The user peter will be created using the SYSTEM tablespace as the default.
D. The code will produce an error message; the user peter will not be created.

Answer: C

25. An Oracle user receives the following error:

ORA-01555 SNAPSHOT TOO OLD

What are two possible solutions? (5)
A. Increase the extent size of the rollback segments.
B. Perform media recovery.
C. Increase the number of rollback segments.
D. Increase the size of the rollback segment tablespace.
E. Increase the value of the OPTIMAL storage parameter.

Answer: A, C

26. MINEXTENTS must be at least _____ when a rollback segment is created. (5)
A. 1
B. 2
C. 3
D. 5


Answer: B

27. You are creating a database with a character set other than US7ASCII. Which operating system environment variable needs to be set to specify the directory location of the NLS support files? (6)
A. NLS_LANG
B. ORA_NLS33
C. ORACLE_SID
D. ORACLE_BASE
E. ORACLE_HOME

Answer: B

28. Given the statement:

CREATE DATABASE orc1
LOGFILE GROUP 1 'u01/Oracle/dba/log1a.rdo' SIZE 1M,
        GROUP 2 'u01/Oracle/dba/log2a.rdo' SIZE 1M
DATAFILE 'u01/Oracle/dbs/sys_01.dbf' REUSE;

Which statement is true? (6)
A. The online redo logs will be multiplexed.
B. The file 'u01/Oracle/dbs/sys_01.dbf' already exists.
C. The file 'u01/Oracle/dbs/sys_01.dbf' is used as a parameter file.
D. The control file name is 'u01/Oracle/dbs/sys_01.dbf'.
E. Oracle will determine the optimum size for 'u01/Oracle/dbs/sys_01.dbf'.

Answer: B

29. What is a default role? (3)

A. A role that requires a password.
B. A role that requires no password.
C. A role automatically enabled when the user logs on.
D. A role automatically assigned when the user is created.

Answer: C

30. Which data dictionary view shows the available free space in a certain tablespace? (4)

A. DBA_EXTENTS
B. V$FREESPACE
C. DBA_FREE_SPACE
D. DBA_TABLESPACE
E. DBA_FREE_EXTENTS

Answer: C

31. Which statement about using PCTFREE and PCTUSED is true? (6)

A. Block space utilization can be specified only at the segment level.
B. Block space utilization can be specified only in the data dictionary.
C. Block space utilization parameters can only be specified at the tablespace level.
D. Block space utilization can be specified both at the tablespace level and segment level.

Answer: A

32. Which type of index should be created to spread the distribution of an index across the index tree? (6)
A. B-tree indexes.
B. Bitmap indexes.
C. Reverse-key indexes.
D. Function-based indexes.

Answer: C

33. Which statement about rebuilding indexes is true? (5)

A. The NOSORT option must be used.
B. The new index is built using the table as the data source.
C. A reverse b-tree index can be converted to a normal index.
D. Query performance may be affected because the index is not.

Answer: C

34. Which view will show a list of privileges that are available for the current session to a user? (5)
A. SESSION_PRIVS
B. DBA_SYS_PRIVS
C. DBA_COL_PRIVS
D. DBA_SESSION_PRIVS

Answer: A

35. In which situation is it appropriate to enable the restricted session mode? (7)
A. Creating a table
B. Dropping an index
C. Taking a rollback segment offline
D. Exporting a consistent image of a large number of tables.

Answer: D

36. Which three events are logged in the ALERT file? (6)
A. Socket usage
B. Block corruption errors
C. User session information
D. Internal errors (ORA-600)
E. Database startup activities.

Answer: B, D, E

37. Which data dictionary view displays the database character set? (5)
A. V$DATABASE
B. DBA_CHARACTER_SET
C. NLS_DATABASE_PARAMETERS
D. NLS_DATABASE_CHARACTERSET

Answer : C

LEVEL-1 Between 50 And 94 Marks

LEVEL-2 Between 95 And 141 Marks


LEVEL-3 Above 142 Marks


DBA related Questions (Performance and General DBA activity questions)

Oracle Architecture

1. Which statement best describes the purpose of redo log files? (5)

A. They ensure that log switches are performed efficiently.
B. They allow changes to the database to be asynchronously recorded.
C. They provide a means to redo transactions in the event of a database failure.
D. They record changes that have not yet been committed.

The best answer is C.

2. What is the minimum number of redo log file groups that are required for an Oracle database instance? (3)
A. One
B. Two
C. Three
D. Four

The correct answer is B.

3. Select the three statements that are true about checkpoints: (6)
A. Checkpoints occur when an automatic log switch is performed.
B. Checkpoints occur when the database is shut down with the normal, immediate, or transactional option.
C. The DBA cannot force checkpoints.
D. Checkpoints automatically occur when the DBA performs a manual log switch.
E. Checkpoints are recorded in the alert.log file by default.

The correct answers are A, B, and D.

4. What four parameters most affect SGA size? (5)
A. SGA_MAX_SIZE
B. SHARED_POOL_SIZE
C. DB_CACHE_SIZE
D. LARGE_POOL_SIZE
E. LOG_BUFFERS

The correct answers are B, C, D, and E.

Physical and Logical Schema

5. Select two characteristics of locally managed tablespaces: (6)
A. Extents are managed by the data dictionary.
B. A bitmap in the datafile keeps track of the free or used status of blocks in the datafile.
C. Each segment stored in the tablespace can have a different storage clause.
D. No coalescing is required.
E. UNDO is not generated when allocation or deallocation of extents occurs.

The correct answers are B, D, and E.

6. What type of tablespace is best for managing sort operations? (3)
A. UNDO tablespace
B. SYSTEM tablespace
C. Temporary tablespace
D. Permanent tablespace


The correct answer is C.

Oracle Networking

7. What is the purpose of Oracle Net Services in an Oracle database environment? (3)
A. Process requests within the database
B. Establish the connection between a client and the database
C. Maintain data integrity within the client application
D. Start the listener process

B is the correct answer.

Background Processes and Oracle Configuration

8. When using dynamic service registration, the process monitor (PMON) reads initialization parameters using what file? (3)

A. listener.ora
B. tnsnames.ora
C. sqlnet.ora
D. init.ora

The correct answer is D.

9. What two init.ora file parameters must be set to support dynamic service registration? (6)
A. TNS_ADMIN
B. SERVICE_NAMES
C. SID_NAME
D. INSTANCE_NAME

The correct answer is both B and D.

10. What four statements regarding a dedicated server process environment are true? (6)
A. The user process and server process are separate.
B. Each user process has its own server process.
C. The user and server processes can run on different machines to take advantage of distributed processing.
D. There is a one-to-many ratio between the user and server processes.
E. Even when the user process is not making a database request, the dedicated server exists but remains idle.
F. Processing results are sent from the server processes satisfying the request to a dispatcher process.

The correct answer is A, B, C, and E.

11. What two parameters are required to configure an Oracle shared server process environment? (6)
A. SHARED_SERVERS
B. CIRCUITS
C. MAX_SHARED_SERVERS
D. DISPATCHERS
E. MAX_DISPATCHERS

The correct answer is A and D.

12. In a dedicated server configuration, the contents of the PGA include which of the following three pieces of data? (6)
A. User session data
B. Stack space
C. Cursor state
D. Shared pool and other memory structures

A, B, and C is the correct answer.

13. What command would you execute to decrease the size of the shared pool from 50MB to 20MB? (5)
A. ALTER SESSION set SHARED_POOL_SIZE 50m;
B. ALTER SYSTEM set SHARED_POOL_SIZE = 50m;
C. ALTER SYSTEM set SHARED_POOL_SIZE 20m;
D. ALTER SYSTEM set SHARED_POOL_SIZE = 20m;

The correct answer is D.

14. Oracle Managed Files are established by setting what two of the following parameters? (6)
A. DB_CREATE_FILE_DEST
B. DB_FILE_NAME_CONVERT
C. DB_FILES
D. DB_CREATE_ONLINE_LOG_DEST_n

The correct answers are A and D.

SQL

15. What ALTER SYSTEM command sets the default directory for OMF? (5)
A. ALTER SYSTEM set DB_CREATE_FILE_DEST '/u01/exam_files';
B. ALTER SYSTEM set DB_CREATE_ONLINE_LOG_DEST_1 '/u01/exam_files';
C. ALTER SYSTEM set DB_CREATE_FILE_DEST = '/u01/exam_files';
D. ALTER SYSTEM set DB_CREATE_ONLINE_LOG_DEST_2 = '/u01/exam_files';

The correct answer is C.

16. Which of the following commands will fail if you are using OMF? (4)
A. CREATE TABLESPACE ocp_data size 2m;
B. CREATE TABLESPACE ocp_data datafile '/u01/exam.ora' size 2m;
C. CREATE TABLESPACE ocp_data datafile 2m;
D. CREATE TABLESPACE ocp_data;

The correct answer is A.

17. Which one of the following statements is true regarding PCTFREE? (4)
A. It specifies the minimum percentage of used space that the Oracle server tries to maintain for each data block of the table.
B. It specifies the percentage of space in each block reserved for growth resulting from changes.
C. The default value is 40 percent.
D. It is not used for index segments.

The correct answer is B.

18. Match the STATSPACK activity (from the first set of answers, uppercase A-D) with the correct statement to perform the activity (from the second set of answers, lowercase a-d). (6)

A. Install STATSPACK                   a. $ORACLE_HOME/rdbms/admin/spauto.sql
B. Collect statistics                  b. $ORACLE_HOME/rdbms/admin/spreport.sql
C. Automatically collect statistics    c. $ORACLE_HOME/rdbms/admin/spcreate.sql
D. Produce a report                    d. execute STATSPACK.snap

The correct matches are A and c, B and d, C and a, and D and b.

19. What two memory areas are located within the shared pool? (3)
A. Library cache
B. Large pool
C. Keep buffer pool
D. Dictionary cache
E. Default buffer pool

The correct answers are A and D.

20. What init.ora parameter determines the amount of memory that a server process should use for a sort process? (3)
A. SORT_AREA_SIZE
B. SORT_AREA_RETAINED_SIZE
C. SORT_MEMORY
D. SORT_DISK

The correct answer is A.

21. What two statements are true with regard to when a dedicated server configuration is used? (5)
A. Sort space is part of the shared pool.
B. Sort space is part of the PGA.
C. Sort space is part of the large pool.
D. The parameter SORT_AREA_RETAINED_SIZE can be set dynamically.

The correct answers are B and D.

22. Consider the following SQL statement: (4)

select last_name, first_name, department, salary from employee order by last_name;

As the DBA, assume that you did some research and found the following:

1. There is one index on the employee table that is based on the employee_id column.
2. The user SCOTT who executes the statement has the permanent tablespace USER01 assigned as both his default and temporary tablespaces.
3. The sort operation is too large to fit within the memory space specified by SORT_AREA_SIZE.

What two actions can you take to optimize the query?

A. Assign SCOTT to a temporary tablespace that will be used for sort segments.
B. Increase the value of SORT_AREA_RETAINED_SIZE.
C. Create an index on the last_name column of the employee table.
D. Employ a shared server configuration to allow the sort space to be allocated in the shared pool instead of the PGA.


The correct answers are A and C.

Oracle Objects

23. Select the statements that are true about bitmap index structures: (5)
A. Use for high-cardinality columns.
B. Good for multiple predicates.
C. Use minimal storage space.
D. Best for read-only systems.
E. Updates on key values are relatively inexpensive.
F. Good for very large tables.

The correct answers are B, C, D, and F.

24. Select the statements that are true about materialized views: (5)
A. Fast refreshes apply only to changes made since the last refresh.
B. A complete refresh of a materialized view involves truncating existing data and reinserting all the data based on the detail tables.
C. A view defined with a refresh type of Force always performs a complete refresh.
D. A refresh type of Never suppresses all refreshes of the materialized view.
E. Materialized views use the same internal mechanism as snapshots for refresh.

The correct answers are A, B, D, and E.

25. What are external tables in Oracle? (5)

Answer:External tables allow Oracle to query data that is stored outside the database in flat files. The ORACLE_LOADER driver can be used to access any data stored in any format that can be loaded by SQL*Loader. No DML can be performed on external tables but they can be used for query, join and sort operations. They are useful in the ETL process of data warehouses since the data doesn't need to be staged and can be queried in parallel. They should not be used for frequently queried tables.

Steps to create:
1. A directory object needs to be created.
2. Edit the flat file and put it in the respective path.
3. Create the table.

Syntax:

CREATE TABLE <table_name> ( <column definitions ...> )
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY <directory_name>
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY <char>
    FIELDS TERMINATED BY <char>
    ( <column name 1 ... 2 ...> )
  )
  LOCATION ('<filename with path>')
);
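A concrete (hypothetical) example following the syntax above, assuming a comma-separated file emp.csv placed in /u01/app/data:

CREATE OR REPLACE DIRECTORY ext_dir AS '/u01/app/data';

CREATE TABLE emp_ext (
  empno NUMBER,
  ename VARCHAR2(30)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    ( empno, ename )
  )
  LOCATION ('emp.csv')
);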

LEVEL-1 Between 35 And 50 Marks


LEVEL-2 Between 51 And 80 Marks

LEVEL-3 Above 80 Marks


BI & DW Technologies

1. How do you decide whether to build data marts or an EDW? (5)

Answer: Data marts are logical/physical subsets of the Enterprise Data Warehouse (EDW). This question has more to do with Kimball's view (bottom-up approach) versus Inmon's view (top-down approach) of building an EDW.

In the bottom-up approach (Kimball's view), we build physical data marts (catering to particular departments within the organization, viz. Sales, HR etc.), integrate them with other data marts over a period of time using conformed dimensions, and so build the Bus Architecture (EDW).

In the top-down approach (Inmon's view), we build the bigger picture first, i.e. the EDW (for the entire organization), and later, depending on specific departmental requirements, separate logical/physical subsets of the EDW in the form of data marts.

Criteria for choice of approach:
- If the customer wants quick results involving shorter development iterations, go for a data mart. It will give the customer confidence in using BI and DW as the ideal technology for taking strategic decisions. Data marts are also easier to manage.
- If the customer has the budget and can wait longer for results, go for the EDW. But during the long development cycle of an EDW, business sponsors may change and priorities may also change, leading to termination of the EDW project.

2. What are slowly changing dimensions? What are the various methods of handling them?(5)

Answer: Dimensions which change over time are called Slowly Changing Dimensions.

1) Type 1 - The old record is updated with the new data. History is not maintained.
2) Type 2 - History of changes is maintained by adding new records.
3) Type 3 - A new attribute is created to hold the prior value alongside the current one; only the original (or previous) and current values are maintained.

3. What is an Operational Data Store (ODS)? How is it different from a data warehouse? (5)

Answer: ODS: It is a subject-oriented, integrated, volatile, current-valued collection of data in support of an organization's need for up-to-the-second, operational, integrated, collective information. The ODS is strictly an operational construct and it provides mission-critical data. It does not store summary information because it is dynamic.

An operational data store (ODS) is a type of database often used as an interim area for a data warehouse. Unlike a data warehouse, which contains static data, the contents of the ODS are updated through the course of business operations. An ODS is designed to quickly perform relatively simple queries on small amounts of data (such as finding the status of a customer order), rather than the complex queries on large amounts of data typical of the data warehouse. An ODS is similar to your short-term memory in that it stores only very recent information; in comparison, the data warehouse is more like long-term memory in that it stores relatively permanent information.

Distinctive traits of ODS and DW:
- The ODS has volatile data while the DW contains non-volatile data.
- The ODS contains only current data whereas the DW contains both current and historical data; i.e., the DW contains data that is no more current than 24 hours, but the ODS contains data that may be only seconds old.
- A major difference is that the ODS contains detailed data, but the DW contains both detailed and summary data.
- The types of data in the ODS differ from the DW. The ODS has a system of record, which is a formal identification of the data in the legacy systems that feed the ODS; the DW has summary-based data stored for analysis and reporting.
- The ODS can be a source for the DW, but the DW cannot be a source for the ODS.

4. What are the data warehouse architecture goals? (5)

Answer: There are different kinds of data warehouse architecture that we can design: top-down architecture, bottom-up architecture, and federated architecture. All of these architectures are useful for analysis and business decision making. Some of the goals are:

- Easy analysis, by maintaining a centralized, historical, integrated warehouse database; analysts can write complex queries for decision making according to business requirements.
- Creating data marts specific to a subject area, for maintaining security for end users.
- The data in the warehouse database is not modified; we can only append new sources for multidimensional analysis.
- We can query the data along different dimensions and generate reports in different forms.
- Forecasting from existing data over a time period.
- Mostly useful for analysis at the corporate level rather than transaction processing.
- Extensibility, by anticipating future end-user needs and providing a "roadmap" that reveals where such needs are addressed (e.g. where and how does the financial budget management tool fit into the data warehouse architecture?).
- Reusability, by documenting reusable components, processes, etc. (e.g. after documenting and revising the process of building the first data mart, the process should be reused to build subsequent data marts).
- Improved productivity, by enabling reusability and revealing where specific tools may be necessary to automate data warehouse processes (e.g. how will the incoming data be analyzed and cleansed?).

5. What is a star schema? (2)

Answer: In a star schema there is a central fact table in which the numerical measurements of the business are stored. This is surrounded by dimension tables, which contain the textual descriptions of the dimensions of the business. The dimension tables have only a single join attaching them to the central fact table.

6. What is a snowflake schema? (2)

Answer: This is an extension of the star schema; here low-cardinality redundant attributes are moved out to sub-dimension tables. This is done to save storage space for large dimensions.

7. What is a surrogate key? Justify its usage in a DWH environment (5)

Answer: Surrogate keys are used to uniquely identify a record in a DWH environment. The following situations may arise in an OLTP production environment, resulting in the need for generating/using surrogate keys:

- Production may reuse keys that it has purged but that you are still maintaining.
- Production may make a mistake and reuse a key even when it isn't supposed to. This happens frequently in the world of UPCs in the retail world, despite everyone's best intentions.
- Production may re-compact its key space because it has a need to garbage-collect the production system.
- Production may legitimately overwrite some part of a product description or a customer description with new values but not change the product key or the customer key to a new value.
- Production may generalize its key format to handle some new situation in the transaction system. Now the production keys that used to be integers become alphanumeric. Or perhaps the 12-byte keys have become 20-byte keys.
- The company has made an acquisition and there is a need to merge more than a million new customers into the master customer list. The newly acquired production system has nasty production keys that don't look remotely like the others.

8. How will we capture the information in an Order Fact table with granularity of order - item level in the following scenario -

An order having more than one item of the same nature, e.g. two Times magazines having the same Universal Magazine Code. (3)

Answer: In this scenario, the composite key (based on the combination of dimension table primary keys) will fail to give a unique record. We need a surrogate key defined for the fact table, which can be a simple sequence number. This will serve as the unique identifier of each record in the fact table.
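A minimal sketch of such a surrogate key, using an Oracle sequence (the fact table, column names and sample values are assumed for illustration):

CREATE SEQUENCE order_fact_seq;

CREATE TABLE order_fact (
  order_fact_key NUMBER PRIMARY KEY,   -- surrogate key
  order_key      NUMBER,
  item_key       NUMBER,
  time_key       NUMBER,
  quantity       NUMBER,
  amount         NUMBER
);

INSERT INTO order_fact (order_fact_key, order_key, item_key, time_key, quantity, amount)
VALUES (order_fact_seq.NEXTVAL, 1001, 55, 20040402, 2, 9.90);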

9. What are the Components of typical data warehouse architecture? (5)

Answer:
Note: Below is the detailed explanation for the above question; if the candidate addresses the key points with a relevant explanation, award marks.

Key Component Areas
A complete data warehouse architecture includes data and technical elements. Thornthwaite breaks down the architecture into three broad areas. The first, data architecture, is centered on business processes. The next area, infrastructure, includes hardware, networking, operating systems, and desktop machines. Finally, the technical area encompasses the decision-making technologies that will be needed by the users, as well as their supporting structures.

Data Architecture


As stated above, the data architecture portion of the overall data warehouse architecture is driven by business processes. For example, in a manufacturing environment the data model might include orders, shipping, and billing. Each area draws on a different set of dimensions. But where dimensions intersect in the data model the definitions have to be the same; the same customer who buys is the same one who builds. So data items should have a common structure and content, and involve a single process to create and maintain them.

Thornthwaite says that organizations often ask how data should be represented in the warehouse—entity/relationship or dimensional? “If you have a star schema then use dimensional. Is your detail normalized or dimensional? Will users be querying detail? Then use dimensional.” He adds that most data warehousing experts are in substantial agreement; the [data] sources are typically entity/relationship models and the front end is a dimensional model. The only issue is where you draw the line between the warehouse itself and the data staging area.

As you work through the architecture and present data to your users, tool choices will be made, but many choices will disappear as the requirements are set. For example, he explains that product capabilities are beginning to merge, like MOLAP and ROLAP. “MOLAP is okay if you stay within the cube you've built. It's fast and allows for flexible querying—within the confines of the cube.” Its weaknesses are size (overall and within a dimension), design constraints (limited by the cube structure), and the need for a proprietary data base.

Infrastructure Architecture
With the required hardware platform and boxes, sometimes the data warehouse becomes its own IS shop. Indeed, there are lots of “boxes” in data warehousing, mostly used for databases and application servers.

The issues with hardware and DBMS choices are size, scalability, and flexibility. In about 80 percent of data warehousing projects this isn't difficult; most businesses can get enough power to handle their needs.

In terms of the network, check the data sources, the warehouse staging area, and everything in between to ensure there's enough bandwidth to move data around. On the desktop, run the tools and actually get some data through them to determine if there's enough power for retrieval. Sometimes the problem is simply with the machine, and the desktops must be powerful enough to run current-generation access tools. Also, don't forget to implement a software distribution mechanism.

Technical Architecture
The technical architecture is driven by the meta data catalog. “Everything should be meta data-driven,” says Thornthwaite. “The services should draw the needed parameters from tables, rather than hard-coding them.” An important component of technical architecture is the data staging process, which covers five major areas:

- Extract: data comes from multiple sources and is of multiple types. Data compression and encryption handling must be considered in this area, if it applies.
- Transform: data transformation includes surrogate key management, integration, de-normalization, cleansing, conversion, aggregation, and auditing.
- Load: loading is often done to multiple targets, with load optimization and support for the entire load cycle.
- Security: administrator access and data encryption policies.
- Job control: this includes job definition, job scheduling (time and event), monitoring, logging, exception handling, error handling, and notification.

The staging box needs to be able to extract data from multiple sources, like MVS, Oracle, VM, and others, so be specific when you choose your products. It must handle data compression and encryption, transformation, loading (possibly to multiple targets), and security (at the front end this is challenging, Thornthwaite says). In addition, the staging activities need to be automated. Many vendors' offerings do different things, so he advises that most organizations will need to use multiple products.

A system for monitoring data warehouse use is valuable for capturing queries and tracking usage, and performance tuning is also helpful. Performance optimization includes cost estimation through a “governor” tool, and should include ad hoc query scheduling. Middleware can provide query management services. Tools for all of these and other related tasks are available for the front end, for server-based query management, and for data from multiple sources. Tools are also available for reporting, connectivity, and infrastructure management. Finally, the data access piece should include reporting services (such as publish and subscribe), a report library, a scheduler, and a distribution manager.

10. Explain the role of metadata in data warehousing environment? Who are the users of metadata? (5)

Answer:
Role of metadata:
- Provide a simple catalogue of business metadata descriptions and views
- Document/manage metadata descriptions from an integrated development environment
- Enable DW users to identify and invoke pre-built queries against the data stores
- Design and enhance new data models and schemas for the data warehouse
- Capture data transformation rules between the operational and data warehousing databases
- Provide change impact analysis, and update across these technologies

Users of metadata:
- Technical users - warehouse administrators, application developers
- Business users

11. What are the different phases involved in data warehousing development lifecycle? (5)

Answer: The different phases involved in the data warehousing development life cycle are:
1) Planning
2) Gathering data requirements and modeling
3) Physical database design and development
4) Development and implementation
5) Deployment

12. What are factless fact tables? Mention the different kinds of factless fact tables and explain with an example. (8)


Answer: Factless fact tables are fact tables which have no measured facts. There are two kinds:

1) Fact table to record events. Example: student attendance at a college. The dimensions include:
- Date: one record for each day on the calendar
- Student: one record for each student
- Course: one record for each course taught each semester
- Teacher: one record for each teacher
- Facility: one record for each room, laboratory or athletic field

The grain of the fact table is the individual student attendance event. When the student walks through the door into the lecture, a record is generated. It is clear that these dimensions are all well defined and the fact table record, consisting of just the five keys, is a good representation of the student attendance event.

The only problem is that there is no obvious fact to record each time a student attends a lecture or suits up for physical education. This table records the student attendance process, not a semester grading process.

2) Coverage table - Coverage tables are frequently needed when a primary fact table in a dimensional data warehouse is sparse.

For example, consider a sales fact table. It records the sales details, or sale events, that happened. It cannot tell what did not happen, i.e. "which products were on promotion that did not sell?", because it contains only the products that did sell.

The coverage table comes to the rescue. A record is placed in the coverage table for each product in each store that is on promotion in each time period. The items not on promotion that also did not sell can be left out.

Answering the question, "Which products were on promotion that did not sell?" requires a two-step application. First, consult the coverage table for the list of products on promotion that day in that store. Second, consult the sales table for the list of products that did sell. The desired answer is the set difference between these two lists of products.
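A minimal sketch of this two-step set difference, using illustrative in-memory rows in place of the real coverage and sales fact tables (all table, column and key names here are assumptions, not a specific implementation):

# Step 1: products on promotion in the store on the day (coverage table).
# Step 2: products that actually sold (sales fact table).
# The answer is the set difference between the two.
promotion_coverage = [
    {"product_key": 101, "store_key": 7, "date_key": 20240105},
    {"product_key": 102, "store_key": 7, "date_key": 20240105},
    {"product_key": 103, "store_key": 7, "date_key": 20240105},
]
sales = [
    {"product_key": 101, "store_key": 7, "date_key": 20240105, "units_sold": 12},
]

promoted = {r["product_key"] for r in promotion_coverage
            if r["store_key"] == 7 and r["date_key"] == 20240105}
sold = {r["product_key"] for r in sales
        if r["store_key"] == 7 and r["date_key"] == 20240105}

promoted_but_not_sold = promoted - sold
print(promoted_but_not_sold)   # {102, 103}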

Another application of a coverage table: it is useful for recording the assignment of sales teams to customers in businesses in which the sales teams make occasional very large sales. In such a business, the sales fact table is too sparse to provide a good place to record which sales teams were associated with which customers, even if some of the combinations never resulted in a sale.

13. What are the basic load types required while building a DW? (4)

Answer: Full Load: Truncate the target table and reload all rows from the source. This is the simplest type of load to implement, as no updates are involved and there is no need to do a lookup to the target table to establish whether a row should be updated or inserted. Full loads are mainly used for staging table loads.


Incremental Load: Only load the rows which are either new or have been updated since they were originally loaded. This type of load is more complex to implement and can be achieved in one of two ways

1) Select all rows from the source table and enable the ETL process to establish which rows are to be updated, inserted and ignored. This is implemented through a lookup to the target table.

2) Only process the rows which are new or have been updated since the last extract. This would involve discarding the rows which have not changed since the last extract.
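A simplified sketch of the first approach, assuming an in-memory lookup built from the target table and a hypothetical customer_id key (illustrative only, not a specific ETL tool's behaviour):

# Compare each source row against a lookup of the target table and classify it
# as insert (new), update (changed) or ignore (unchanged).
def classify_rows(source_rows, target_lookup, key="customer_id"):
    inserts, updates, ignores = [], [], []
    for row in source_rows:
        existing = target_lookup.get(row[key])
        if existing is None:
            inserts.append(row)        # new row -> insert
        elif existing != row:
            updates.append(row)        # changed row -> update
        else:
            ignores.append(row)        # unchanged row -> ignore
    return inserts, updates, ignores

source = [{"customer_id": 1, "name": "A"}, {"customer_id": 2, "name": "B"}]
target = {1: {"customer_id": 1, "name": "A"}}   # lookup built from the target table
ins, upd, ign = classify_rows(source, target)
print(len(ins), len(upd), len(ign))   # 1 0 1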

14. What are the basic issues with the Snowflake Schema? (3)

Answer:
• Only a few tools are optimized for this schema
• More complex presentation, as not all tools are optimized for it
• Browsing is slower / report drill-down is also slower
• Problems with multiple joins

15. What are the basic steps to be considered for Dimensional Modeling? (4)

Answer: Identify the Business Process

o A major operational process that is supported by some kind of legacy system(s) from which data can be collected for the purpose of the data warehouse
o Example: orders, invoices, shipments, inventory, sales

Identify the Grain

o The fundamental lowest level of data represented in a fact table for the business process
o Example: individual transactions, individual daily snapshots

Identify the Dimensions

o Choose the dimensions that will apply to each fact table record

Identify the Facts

o Choose the measured facts that will populate each fact table record

What are the various types of facts? Describe and give 2 examples each. (3)

Answer:
Additive Facts: Facts that can be meaningfully added along any dimension. (Ex: dollar sales, unit sales)
Non-Additive Facts: Facts that cannot be added along any dimension. (Ex: order number, item number, unit price)
Semi-Additive Facts: Facts that may be added along certain dimensions but not all. (Ex: customer counts, which are non-additive across the product dimension but additive across the customer dimension; average account balance, which is semi-additive over the Time dimension)
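A small illustration of why account balance is only semi-additive, using made-up numbers: balances can be summed across accounts for a single day, but summing the same account across days is meaningless, so an average is taken over the time dimension instead.

# Illustrative daily balance snapshot (made-up data).
balances = [
    {"date": "2024-01-01", "account": "A", "balance": 100},
    {"date": "2024-01-01", "account": "B", "balance": 250},
    {"date": "2024-01-02", "account": "A", "balance": 120},
    {"date": "2024-01-02", "account": "B", "balance": 230},
]

# Valid: total balance held on 2024-01-01 (summing across accounts).
total_day1 = sum(r["balance"] for r in balances if r["date"] == "2024-01-01")

# Not meaningful: summing account A across days (100 + 120 is not a real balance),
# so average it over the time dimension instead.
a_rows = [r["balance"] for r in balances if r["account"] == "A"]
avg_a = sum(a_rows) / len(a_rows)

print(total_day1, avg_a)   # 350 110.0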

16. What are the different architectural approaches of DM? (4)

Answer:
• Satellite data mart
• Tactical and quickly developed data mart
• Feeder data marts
• Partition data marts

17. When speed and cost are a constraint, what is the architectural approach you will adopt for your DM? (2)

Answer: Tactical and quickly developed data mart.

A tactical data mart is typically completed in 90-120 days rather than the much longer time required for a full-scale data warehouse.

Since the time to develop the system is less, the cost automatically becomes less.

18. Which is a subject-oriented view of a data warehouse? (2)

Answer: A data mart is a subject-oriented view of a data warehouse because it is related to a specific subject area like Accounting, Finance, HR, MM, PM, etc.

19. What is the main difference between Data Warehousing and Business Intelligence? (4)

Answer: The differences are:

DW - a way of storing data and creating information through leveraging data marts. DMs are segments or categories of information and/or data that are grouped together to provide 'information' on that segment or category. A DW does not require BI to work; reporting tools can generate reports directly from the DW.

BI - is the leveraging of DW to help make business decisions and recommendations. Information and data rules engines are leveraged here to help make these decisions along with statistical analysis tools and data mining tools.

You will find that BI is much like ERP in that it can be extremely expensive and invasive to your firm and there is a wide range between the offerings - low end to high end - which facilitates the pricing. There is a long list of tools to select from. There are also services that provide this as an outsource. Some of these services will allow you to eventually 'own' the solution and in-source it at a future date. Like anything else, this comes at a price. This scenario works well for those who do not have a high caliber IT staff and would like to get results with a short ramp up time, basically because the system is already built. Your rules and reports just have to be generated. That is a bit oversimplified, but you get the picture.

LEVEL-1 Between 30 And 45 Marks

LEVEL-2 Between 46 And 60 Marks

LEVEL-3 Above 60 Marks


Data Analysis
Q.No Questions

1. What is data analysis and what are the essential steps involved in data analysis? (4)

Answer: Data analysis is a process wherein data from a source system is analyzed for completeness, integrity and quality; it also identifies missing elements in the source data and codification problems in the data coming from various systems.

Data analysis essentially involves the following steps:
• Identify source systems
• Data profiling, involving the following activities:
o Identify the data patterns across the source systems.
o Identify the business rules associated with the data.
o Identify the missing data elements.
• Prepare a data analysis document, which gives the details of the various data files / tables from the source system, the data integrity rules, and the business rules associated with the data elements in the source system.

2. What is data profiling? What will be the outcome of the data profiling stage in the data analysis process? (4)

Answer: Data profiling is the process of understanding the data, its characteristics, and its relationships with associated data. The following will be the outcome of the data profiling activity:

a. A source (data file / tables) identification document
b. Source to target mapping document
c. Enumeration definition document
d. Data dictionary identifying all the source fields and their descriptions.
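A minimal profiling sketch (column names and sample rows are purely illustrative) computing the kind of field-level statistics such documents are built from - null counts and distinct-value counts per column:

# Per-column profiling: count nulls/blanks and distinct values.
from collections import defaultdict

def profile(rows):
    stats = defaultdict(lambda: {"nulls": 0, "values": set()})
    for row in rows:
        for col, val in row.items():
            if val in (None, ""):
                stats[col]["nulls"] += 1
            else:
                stats[col]["values"].add(val)
    return {col: {"nulls": s["nulls"], "distinct": len(s["values"])}
            for col, s in stats.items()}

sample = [
    {"cust_id": 1, "country": "IN", "gender": "M"},
    {"cust_id": 2, "country": "IN", "gender": None},
    {"cust_id": 3, "country": "",   "gender": "F"},
]
print(profile(sample))
# {'cust_id': {'nulls': 0, 'distinct': 3}, 'country': {'nulls': 1, 'distinct': 1}, ...}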

3. What should be done to analyse the following? (2)
a) heterogeneous systems
b) homogeneous systems
c) large volumes of data

Answer:Data Sampling.

4. What is data quality analysis and what are the activities involved in data quality analysis? (4)

Answer: Data quality analysis is the process of examining inconsistencies in the source data. This involves identifying inconsistencies in the following areas:

1. Codification differences across various source systems (e.g., use of different codes / values to convey the same meaning)

2. Multiple entries of the same entity
3. Out of range values / missing values
4. Identifying uniqueness of keys
5. Ensuring referential integrity
6. Business rules compliance
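A sketch of a few of these checks on illustrative rows - duplicate entries, out-of-range values, key uniqueness, and referential integrity (all names and rules below are assumptions):

# Simple data-quality checks on made-up source rows.
orders = [
    {"order_id": 1, "cust_id": 10, "qty": 5},
    {"order_id": 1, "cust_id": 10, "qty": 5},    # duplicate entry of the same entity
    {"order_id": 2, "cust_id": 99, "qty": -3},   # out-of-range value, unknown customer
]
customers = {10, 11, 12}

ids = [o["order_id"] for o in orders]
duplicate_keys = {i for i in ids if ids.count(i) > 1}               # key uniqueness
out_of_range = [o for o in orders if not (0 < o["qty"] <= 1000)]    # range rule
orphans = [o for o in orders if o["cust_id"] not in customers]      # referential integrity

print(duplicate_keys, len(out_of_range), len(orphans))   # {1} 1 1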


5. What are the scenarios where data analysis will be helpful? (3)

Answer:
1) Data Migration
2) Data Cleansing for DWH
3) Data Modelling

6. What is data cleansing? How is this process carried out? (2)

Answer: Data cleansing is an activity which involves identifying unwanted data in the source systems and defining rules to filter this data before migrating it into the target system. Data cleansing requires processing at field level to eliminate the unwanted data; hence business rules or validations need to be developed for each field that must be filtered during the cleansing process. The data rejected by this process is stored in log files for verification.
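A sketch of field-level cleansing with rejected records written to a log; the validation rules and field names below are assumptions, not rules from any particular system:

# Apply per-field validation rules; pass clean rows through and log rejects with a reason.
rules = {
    "email": lambda v: v is not None and "@" in v,
    "age":   lambda v: v is not None and 0 < v < 120,
}

def cleanse(rows):
    clean, rejects = [], []
    for row in rows:
        failed = [f for f, rule in rules.items() if not rule(row.get(f))]
        if failed:
            rejects.append({"row": row, "reason": "failed: " + ", ".join(failed)})
        else:
            clean.append(row)
    return clean, rejects

clean, rejects = cleanse([{"email": "a@x.com", "age": 30}, {"email": "bad", "age": 200}])
print(len(clean), rejects[0]["reason"])   # 1 failed: email, age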

LEVEL-1 Above 10 Marks

LEVEL-2 Between 15 And 19 Marks


Data Migration
Q.No Questions

1. What is data migration? (2)

Answer: Data migration is a process wherein data from single or multiple source systems needs to be transferred into a target system. The source systems can be of a similar type or heterogeneous in nature. Data from one file or table in the source can be split across multiple tables in the target system. The migration process should have a mechanism to reject erroneous records and load them into an error table for validation or correction.

2. What are the most critical steps involved in data migration? (4)

Answer: The data migration process is critical to any system's functioning. Hence this process should ensure that the data migrated is accurate and clean. Data migration has the following steps:

a) Data Analysis
b) Identify the data to be migrated / extract the data from source systems.
c) Design source and target mappings.
d) Define business rules for transforming the data.
e) Define the values required for mandatory missing fields from the source data.
f) Design error handling and logging mechanism.
g) Identify tools / mechanisms to carry out the migration process.
h) Design data validation rules to validate the data migrated.

3. What would be the suggested approach for migrating data from various source systems? i.e. how to consolidate data from various source systems. (2)

Answer: The data migration process migrates data from a single source or multiple source systems. The data to be migrated needs to be analyzed, and profiling should be done to identify the essential data elements from the source systems. The data from the various source systems can be extracted into data files with a standard defined format. These data files will be loaded into staging tables during the migration process to consolidate the data from multiple source systems. The staging table will have the same structure as the source data file. The data from the staging table will then be processed and transformed before loading into the target tables.
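A sketch of this consolidation idea - two heterogeneous source extracts converted into one standard staging layout before transformation (the file formats and field names are illustrative assumptions):

# Convert records from two different source layouts into a common staging record.
def from_system_a(rec):            # e.g. "CUSTID|NAME" flat-file layout
    cust_id, name = rec.split("|")
    return {"source": "A", "cust_id": cust_id, "cust_name": name.strip()}

def from_system_b(rec):            # e.g. dict rows from an RDBMS extract
    return {"source": "B", "cust_id": str(rec["id"]), "cust_name": rec["customer_name"]}

staging = []
staging += [from_system_a(r) for r in ["101|Asha ", "102|Ravi"]]
staging += [from_system_b(r) for r in [{"id": 201, "customer_name": "Mira"}]]
print(len(staging))   # 3 rows, all in the same staging layout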

4. What is ETL? (3)

Answer:ETL is basically a combination of

Extraction: Extracting the required data from source systems in a format defined during the data analysis.Transformation: Transforming the source data extracted by applying the rules defined during the data profiling stage to load the data into the respective target tables.Loading: Loading data into the target tables and eliminating unwanted data and storing it in the error tables for verification.
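A minimal sketch of how the three steps fit together, using in-memory lists in place of real source and target systems (the rules and field names are assumptions):

# Extract from a source, transform with simple rules, load valid rows into the
# target and rejected rows into an error table.
def extract(source_rows):
    return list(source_rows)

def transform(rows):
    good, errors = [], []
    for r in rows:
        if r.get("amount") is None:
            errors.append({"row": r, "reason": "missing amount"})
        else:
            good.append({"order_id": r["id"], "amount": round(float(r["amount"]), 2)})
    return good, errors

def load(rows, errors, target, error_table):
    target.extend(rows)
    error_table.extend(errors)

target, error_table = [], []
good, errors = transform(extract([{"id": 1, "amount": "10.5"}, {"id": 2, "amount": None}]))
load(good, errors, target, error_table)
print(len(target), len(error_table))   # 1 1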

The ETL process can be carried out by using commercially available off-the-shelf tools or by means of custom-developed tools. The decision to go for a custom tool or a standard application depends on the complexity of the migration process, the data volume, the nature of the source and target systems, and also the budget allocated for the migration process.

5. What is data mapping, and what is its role in the data migration process? (2)

Answer: Data mapping involves providing a mapping between the source data fields and the target data fields. The mapping can be direct, where a source field is moved straight into a target field; through a transformation rule on the field; or by means of data enrichment, wherein a new record is created in the target table based on certain business rules.
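One simple way to capture such a mapping is a specification of target field to (source field, optional transformation rule); a sketch with assumed field names and rules:

# Source-to-target mapping specification: direct moves plus rule-based transformations.
mapping = {
    "customer_id":   ("CUSTNO",   None),                         # direct mapping
    "customer_name": ("CUSTNAME", lambda v: v.strip().title()),  # transformation rule
    "country_code":  ("CTRY",     lambda v: {"IND": "IN", "USA": "US"}.get(v, v)),
}

def apply_mapping(source_row):
    target_row = {}
    for target_field, (source_field, rule) in mapping.items():
        value = source_row.get(source_field)
        target_row[target_field] = rule(value) if rule else value
    return target_row

print(apply_mapping({"CUSTNO": 7, "CUSTNAME": "  asha rao ", "CTRY": "IND"}))
# {'customer_id': 7, 'customer_name': 'Asha Rao', 'country_code': 'IN'}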

6. Which stage of the data migration requires defining business rules? How are they applied to the data being migrated? (3)

Answer: The transformation stage. The business rules defined will be used either to transform the source data as per the target database design or to define validation criteria on the source fields before moving data into the target table fields.

7. How to implement error handling during migration, how to ensure that there is no data loss due to rejections during data migration? (3)

Answer: Error handling and error logging are essential for data migration; they help in identifying the corrupt and unwanted data to be removed from the source and logged into error tables for verification. The error handling should be strong enough to filter the erroneous records from the source data. The rejected records should be stored in the error tables with an appropriate description indicating why they were rejected, together with a reference back to the source data or records.

8. What is a staging area? How is it different from the target tables? (2)

Answer: The staging area is a temporary data area in the target database which holds tables to store the data extracted from the various source systems. The table structure in the staging area is similar to the source file / table structure, whereas the target tables follow the target system's own design.

9. What is transformation? (3)

Answer: In a migration process, it may not be possible to migrate data directly into the target system due to its design, changes in the business requirements, or the need to optimize data usage. Hence it is essential that the source data undergo a few changes to cater to the needs of the target system and the business requirements. The process of applying changes to the source data, or enriching it, before migrating into the target system is called the 'Transformation' process. This process is based on the business rules defined and the default values identified during the data profiling stage.

10. What are the different transformation techniques? Explain each of them. (5)

Answer:The following are the basic techniques used in the transformation process.

a) Structural transformation
In this transformation, there will be a change in the structure of the source record with respect to the target database. This is a record-level transformation process.

b) Content transformation


In content transformation, changes happen in the data values of a record. This is an attribute-level transformation. In this technique, algorithms and transformation rules / tables are used for changing the content of the source to that of the target database.

c) Functional transformation
Functional transformation results in the creation of new records in the target database based on the source data. This happens through data aggregation or by combining one or more new attributes from a single source record or multiple source records.
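A compact sketch contrasting the three techniques on illustrative records (all field names and values are made up):

source = [
    {"cust": "C1", "jan_sales": 100, "feb_sales": 150, "ctry": "IND"},
    {"cust": "C2", "jan_sales": 200, "feb_sales": 50,  "ctry": "USA"},
]

# a) Structural: reshape one wide source record into two target records (record level).
structural = [{"cust": r["cust"], "month": m, "sales": r[f"{m}_sales"]}
              for r in source for m in ("jan", "feb")]

# b) Content: change an attribute value using a translation table (attribute level).
country_map = {"IND": "IN", "USA": "US"}
content = [{**r, "ctry": country_map[r["ctry"]]} for r in source]

# c) Functional: derive a new record by aggregating across source records.
functional = {"total_sales": sum(r["jan_sales"] + r["feb_sales"] for r in source)}

print(len(structural), content[0]["ctry"], functional)   # 4 IN {'total_sales': 500}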

11. Following is the requirement for designing an error handling and logging mechanism in a data migration process.

Data from two source systems, having different data structures for storing the same information, needs to be migrated into a target RDBMS system. The data is very critical to the system; hence there should not be any data loss during the migration process. It is therefore required to capture all the data which is rejected during the migration process, along with the reasons why it was rejected. The error log mechanism should make it possible to trace the rejected data back to its source records by looking into the error table.

Design an error logging and error handling mechanism for the above requirement. (3)

Answer:
1) Traceability
2) Output to be logged to a reject file / table
3) Type of error
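One way to realise these three elements is an error table whose rows carry traceability back to the source record, the rejected data itself, and a classified error type; a sketch with assumed field names:

# Error-log record: source system + source record key (traceability), the rejected
# row (output to a reject table), and a classified error type with a description.
import datetime

def log_reject(error_table, source_system, source_record_id, rejected_row,
               error_type, error_description):
    error_table.append({
        "source_system": source_system,         # which of the two systems it came from
        "source_record_id": source_record_id,   # key to trace the source record back
        "rejected_row": rejected_row,            # the data as received
        "error_type": error_type,                # e.g. MISSING_MANDATORY, BAD_FORMAT
        "error_description": error_description,
        "rejected_at": datetime.datetime.now().isoformat(),
    })

errors = []
log_reject(errors, "SYSTEM_A", "A-000123", {"acct": None, "amt": "12.5"},
           "MISSING_MANDATORY", "acct is required but was null")
print(errors[0]["source_record_id"], errors[0]["error_type"])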

LEVEL-1 Between 10 And 19 Marks

LEVEL-2 Between 20 And 32 Marks