Answers of ADBMS




1.) What is SQL? Write some queries in SQL.

ANS:- Structured Query Language (SQL) is a programming language designed for managing data in relational database management systems (RDBMS). It was originally based upon relational algebra and tuple relational calculus.

Queries in SQL:

1.) create table employee (name varchar2(10), id number, designation varchar2(10));

2.) insert into employee values('&name', &id, '&designation');

3.) select * from employee;

4.) update employee set id=4 where id=3;

5.) delete from employee where name='jkl';

6.) insert into employee values('ghj', 3, 'tyu');

7.) select distinct designation from employee;

8.) alter table employee add salary number;

9.) describe employee;

10.) create view V1 as select name, id from employee;

11.) select name from employee order by id DESC;

12.) alter table employee rename to emp1;

13.) drop table emp1;

2.) What are DDL, DML and DCL?

ANS:-

Data Definition Language (DDL) statements are used to define the database structure or schema. Some examples:

o CREATE - to create objects in the database
o ALTER - alters the structure of the database
o DROP - deletes objects from the database
o TRUNCATE - removes all records from a table, including all space allocated for the records
o COMMENT - adds comments to the data dictionary
o RENAME - renames an object

Data Manipulation Language (DML) statements are used for managing data within schema objects. Some examples:

o SELECT - retrieves data from a database
o INSERT - inserts data into a table
o UPDATE - updates existing data within a table
o DELETE - deletes all or selected records from a table; the space for the records remains
o MERGE - UPSERT operation (insert or update)
o CALL - calls a PL/SQL or Java subprogram
o EXPLAIN PLAN - explains the access path to data
o LOCK TABLE - controls concurrency

Data Control Language (DCL) statements. Some examples:

o GRANT - gives users access privileges to the database
o REVOKE - withdraws access privileges given with the GRANT command
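The three statement families can be seen side by side in one small script. This is an illustrative sketch using Python's built-in sqlite3 module (SQLite is serverless and has no GRANT/REVOKE, so DCL appears only as a comment); the table and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the schema
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, designation TEXT)")
cur.execute("ALTER TABLE employee ADD COLUMN salary NUMERIC")

# DML: manage the data inside the schema objects
cur.execute("INSERT INTO employee (id, name, designation, salary) VALUES (1, 'abc', 'clerk', 1000)")
cur.execute("INSERT INTO employee (id, name, designation, salary) VALUES (2, 'ghj', 'tyu', 2000)")
cur.execute("UPDATE employee SET salary = 1500 WHERE id = 1")
cur.execute("DELETE FROM employee WHERE name = 'ghj'")
rows = cur.execute("SELECT name, salary FROM employee").fetchall()

# DCL controls privileges; in a client/server RDBMS such as Oracle this
# would look like: GRANT SELECT ON employee TO some_user;
print(rows)  # [('abc', 1500)]
```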

3.)What is embedded SQL?

ANS:-

Embedded SQL is a method of combining the computing power of a programming language and the database manipulation capabilities of SQL. The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by host-language calls. The output from the preprocessor is then compiled by the host compiler. This allows programmers to embed SQL statements in programs written in any number of languages such as C/C++, COBOL and Fortran.

Systems that support Embedded SQL

1. IBM DB2

2. Oracle (via the Pro*C/C++ precompiler)

3. PostgreSQL (via ECPG)

Systems that do not support Embedded SQL

1. Microsoft SQL Server (support was dropped in later versions)

2. MySQL

3. Sybase
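True embedded SQL interleaves EXEC SQL statements with host code and relies on a preprocessor, but the underlying idea of combining a host language with SQL can be sketched with a call-level API such as Python's sqlite3 module (a runtime API, not a precompiled one; the table and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount NUMERIC)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50), (2, 75), (3, 20)])

# Host-language control flow around SQL: the variables and the list
# comprehension come from Python, the data access from SQL. The bound
# parameter plays the role a host variable like :threshold would play
# in a true EXEC SQL program.
threshold = 40
big_orders = [row[0] for row in
              conn.execute("SELECT id FROM orders WHERE amount > ?", (threshold,))]
print(big_orders)  # [1, 2]
```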

4.) What is an object-oriented database?

ANS:-

An object database (also called an object-oriented database management system, OODBMS) is a database management system in which information is represented in the form of objects, as used in object-oriented programming. Object databases are different from relational databases, and both belong to the broader category of database management systems.

ODBMS Characteristics

o Encapsulation
o Types and Classes
o Inheritance
o Complex Objects
o Object Identity
o Extensibility
o Overriding and overloading
o Computational completeness
o Secondary storage management
o Concurrency
o Recovery

5.)What is the difference between DBMS and RDBMS?

ANS:-

DBMS vs. RDBMS

• Relationships among tables are maintained in an RDBMS, whereas this is not the case in a DBMS, which is simply used to manage a database.

• DBMS accepts 'flat file' data, meaning there is no relation among different data, whereas an RDBMS does not accept this type of design.

• DBMS is used for simpler business applications whereas RDBMS is used for more complex applications.

• Although the foreign key concept is supported by both DBMS and RDBMS, only an RDBMS enforces the rules.

• An RDBMS solution is required for large sets of data, whereas small sets of data can be managed by a DBMS.

6.)What are CODD rules?

ANS:-

Codd's twelve rules are a set of thirteen rules (numbered zero to twelve) designed to define what is required from a database management system in order for it to be considered relational, i.e., a relational database management system (RDBMS).

The rules

Rule (0): The system must qualify as relational, as a database, and as a management system.

For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

Rule 1: The information rule:

All information in the database is to be represented in only one way, namely by values in column positions within rows of tables.

Rule 2: The guaranteed access rule:

All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

Rule 3: Systematic treatment of null values:


The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model:

The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database's structure (catalog) using the same query language that they use to access the database's data.

Rule 5: The comprehensive data sublanguage rule:

The system must support at least one relational language that:

1. Has a linear syntax,
2. Can be used both interactively and within application programs,
3. Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule:

All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete:

The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical data independence:

Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence:


Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence:

Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence:

The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully:

1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule:

If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, bypassing a relational security or integrity constraint.

7.)What is deadlock?

ANS:-

A deadlock is a situation wherein two or more competing actions are each waiting for the other to finish, and thus neither ever does.

In an operating system, a deadlock is a situation which occurs when a process enters a waiting state because a resource requested by it is being held by another waiting process, which in turn is waiting for another resource. If a process is unable to change its state indefinitely because the resources requested by it are being used by other waiting process, then the system is said to be in a deadlock.


For example, consider two processes P1 and P2 that each need two resources R1 and R2. P1 holds R2 and requires the additional resource R1, while P2 holds R1 and requires the additional resource R2; neither process can continue.

Necessary conditions

A deadlock situation can arise only if all of the following conditions hold simultaneously in a system:

1. Mutual Exclusion: At least one resource must be non-shareable. Only one process can use the resource at any given instant of time.

2. Hold and Wait or Resource Holding: A process is currently holding at least one resource and requesting additional resources which are being held by other processes.

3. No Preemption: The operating system must not de-allocate resources once they have been allocated; they must be released by the holding process voluntarily.

4. Circular Wait: A process must be waiting for a resource which is being held by another process, which in turn is waiting for the first process to release the resource. In general, there is a set of waiting processes, P = {P1, P2, ..., PN}, such that P1 is waiting for a resource held by P2, P2 is waiting for a resource held by P3 and so on till PN is waiting for a resource held by P1.
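The circular-wait condition above can be checked programmatically by looking for a cycle in a wait-for graph. A minimal sketch in Python (the process names are hypothetical):

```python
def has_deadlock(wait_for):
    """Detect a cycle in the wait-for graph via depth-first search.

    wait_for maps each process to the set of processes it waits on."""
    visited, on_stack = set(), set()

    def dfs(p):
        visited.add(p)
        on_stack.add(p)
        for q in wait_for.get(p, ()):
            if q in on_stack:          # back edge -> circular wait
                return True
            if q not in visited and dfs(q):
                return True
        on_stack.discard(p)
        return False

    return any(dfs(p) for p in wait_for if p not in visited)

# P1 waits on P2 and P2 waits on P1: the classic two-process deadlock
print(has_deadlock({"P1": {"P2"}, "P2": {"P1"}}))   # True
print(has_deadlock({"P1": {"P2"}, "P2": set()}))    # False
```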

8.)Explain inheritance in object relational database.

9.)Give an example for SQL functions.

ANS:-

SQL has many built-in functions for performing calculations on data.

SQL Aggregate Functions

SQL aggregate functions return a single value, calculated from values in a column.

Useful aggregate functions:

AVG() - Returns the average value
COUNT() - Returns the number of rows
FIRST() - Returns the first value
LAST() - Returns the last value
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum


SQL Scalar functions

SQL scalar functions return a single value, based on the input value.

Useful scalar functions:

UCASE() - Converts a field to upper case
LCASE() - Converts a field to lower case
MID() - Extracts characters from a text field
LEN() - Returns the length of a text field
ROUND() - Rounds a numeric field to the number of decimals specified
NOW() - Returns the current system date and time
FORMAT() - Formats how a field is to be displayed
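Function names vary by system (UCASE/LCASE/LEN above are Access/older-dialect spellings). A runnable sketch of both families using SQLite, whose equivalents are UPPER/LOWER/LENGTH; the sales table is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount NUMERIC)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("pen", 10), ("book", 30), ("bag", 20)])

# Aggregate functions collapse many rows into a single value
agg = conn.execute(
    "SELECT COUNT(*), SUM(amount), AVG(amount), MIN(amount), MAX(amount) FROM sales"
).fetchone()
print(agg)  # (3, 60, 20.0, 10, 30)

# Scalar functions transform each row's value individually
scalars = conn.execute(
    "SELECT UPPER(product), LENGTH(product) FROM sales ORDER BY product"
).fetchall()
print(scalars)  # [('BAG', 3), ('BOOK', 4), ('PEN', 3)]
```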

10.)What is a Trigger?

ANS:-

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for keeping the integrity of the information in the database. For example, when a new record (representing a new worker) is added to the employees table, new records should also be created in the tables of taxes, vacations, and salaries.

The following are major features of database triggers and their effects:

triggers do not accept parameters or arguments (but may store affected-data in temporary tables)

triggers cannot perform commit or rollback operations because they are part of the triggering SQL statement (only through autonomous transactions)

The four main types of triggers are:

1. Row Level Trigger: This gets executed before or after any column value of a row changes.
2. Column Level Trigger: This gets executed before or after the specified column changes.
3. For Each Row Type: This trigger gets executed once for each row of the result set caused by an insert/update/delete.
4. For Each Statement Type: This trigger gets executed only once for the entire result set, but fires each time the statement is executed.
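The employees-and-salaries example above can be sketched with SQLite, which supports row-level AFTER triggers (the salaries bookkeeping table here is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE salaries  (employee_id INTEGER, amount NUMERIC);

-- Fires automatically after every INSERT on employees: a default
-- salary record is created without the application asking for it.
CREATE TRIGGER add_salary_row
AFTER INSERT ON employees
FOR EACH ROW
BEGIN
    INSERT INTO salaries (employee_id, amount) VALUES (NEW.id, 0);
END;
""")

conn.execute("INSERT INTO employees (id, name) VALUES (1, 'abc')")
rows = conn.execute("SELECT employee_id, amount FROM salaries").fetchall()
print(rows)  # [(1, 0)]
```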

11.) The first DB2 product was released in the year 1983 on the MVS mainframe platform.


12.) What is a view?

ANS:-

A view in Oracle and in other database systems is simply a stored representation of a SQL statement, so that it can easily be re-used. For example, suppose we frequently query the US customers in a customers table.

To create a view use the CREATE VIEW command as seen in this example

CREATE VIEW view_uscustomers
AS
SELECT customerid, customername FROM customers WHERE countryid='US';

This command creates a new view called view_uscustomers. Note that this command does not result in anything actually being stored in the database except for a data dictionary entry that defines this view. This means that every time you query this view, Oracle has to execute the view's defining query against the underlying data. We can query the view like this:

SELECT * FROM view_uscustomers;

Benefits of using Views

Commonality of code being used. Since a view is based on one common set of SQL, this means that when it is called it’s less likely to require parsing.

Security. Views have long been used to hide the tables that actually contain the data you are querying. Also, views can be used to restrict the columns that a given user has access to.

Predicate pushing
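The same pattern can be exercised end to end in SQLite, which is enough to see that a view stores only the query, not the data (the table and rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerid INTEGER, customername TEXT, countryid TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ann", "US"), (2, "Bob", "IN"), (3, "Cid", "US")])

# Only the definition is stored; each SELECT below re-runs it on the base table.
conn.execute("""CREATE VIEW view_uscustomers AS
                SELECT customerid, customername FROM customers
                WHERE countryid = 'US'""")

us = conn.execute("SELECT customername FROM view_uscustomers ORDER BY customerid").fetchall()
print(us)  # [('Ann',), ('Cid',)]

# New base-table rows show up in the view automatically
conn.execute("INSERT INTO customers VALUES (4, 'Dee', 'US')")
count = conn.execute("SELECT COUNT(*) FROM view_uscustomers").fetchone()[0]
print(count)  # 3
```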

13.) What is business intelligence ? How it can be achieved ?

ANS:-

Business intelligence (BI) mainly refers to computer-based techniques used in identifying, extracting, and analyzing business data, such as sales revenue by products and/or departments, or by associated costs and incomes.

BI includes reporting technologies designed to improve the productivity of business analysts and preserve information consistency throughout an organization.

Business intelligence is all about making decisions. Businesses want to make intelligent decisions, and in order to do that they employ people with certain skills and experience, create processes, and use the latest technologies to help achieve their goals.

14.) What are Normal forms?


The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.

The main normal forms are summarized below.

First normal form (1NF): Table faithfully represents a relation and has no repeating groups.

Second normal form (2NF): No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.

Third normal form (3NF): Every non-prime attribute is non-transitively dependent on every candidate key in the table.

Elementary Key Normal Form (EKNF): Every non-trivial functional dependency in the table is either the dependency of an elementary key attribute or a dependency on a superkey.

Boyce–Codd normal form (BCNF): Every non-trivial functional dependency in the table is a dependency on a superkey.

Fourth normal form (4NF): Every non-trivial multivalued dependency in the table is a dependency on a superkey.

Fifth normal form (5NF): Every non-trivial join dependency in the table is implied by the superkeys of the table.

Domain/key normal form (DKNF): Every constraint on the table is a logical consequence of the table's domain constraints and key constraints.

Sixth normal form (6NF): Table features no non-trivial join dependencies at all (with reference to a generalized join operator).

15.) What are the different types of joins in SQL?

ANS:-

An SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each.

TYPES:-

1.) Inner join

An inner join is the most common join operation used in applications and can be regarded as the default join-type. Inner join creates a new result table by combining column values of two tables (A and B) based upon the join-predicate.


2.) Equi-join

An equi-join, also known as an equijoin, is a specific type of comparator-based join, or theta join, that uses only equality comparisons in the join-predicate.

3.) Natural join

A natural join is a type of equi-join where the join predicate arises implicitly by comparing all columns in both tables that have the same column-names in the joined tables. The resulting joined table contains only one column for each pair of equally-named columns.

4.) Cross join

CROSS JOIN returns the Cartesian product of rows from tables in the join. In other words, it will produce rows which combine each row from the first table with each row from the second table.

5.)Outer joins

An outer join does not require each record in the two joined tables to have a matching record. The joined table retains each record—even if no other matching record exists. Outer joins subdivide further into left outer joins, right outer joins, and full outer joins, depending on which table(s) one retains the rows from (left, right, or both).

6.)Left outer join

The result of a left outer join (or simply left join) for table A and B always contains all records of the "left" table (A), even if the join-condition does not find any matching record in the "right" table (B).

7.)Right outer join

A right outer join (or right join) closely resembles a left outer join, except with the treatment of the tables reversed.

8.)Full outer join

Conceptually, a full outer join combines the effect of applying both left and right outer joins.
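The join types can be compared on two tiny tables. This sketch uses SQLite (which supports INNER, CROSS and LEFT OUTER joins directly; RIGHT and FULL OUTER joins historically had to be emulated there); the emp and dept tables are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp  (name TEXT, dept_id INTEGER);
CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
INSERT INTO emp  VALUES ('ann', 1), ('bob', 2), ('cid', NULL);
INSERT INTO dept VALUES (1, 'sales'), (3, 'hr');
""")

# INNER JOIN: only rows with a match on the join predicate survive
inner = conn.execute("""SELECT e.name, d.dept_name
                        FROM emp e JOIN dept d ON e.dept_id = d.dept_id""").fetchall()
print(inner)   # [('ann', 'sales')]

# LEFT OUTER JOIN: every row of the left table survives, NULL-padded
left = conn.execute("""SELECT e.name, d.dept_name
                       FROM emp e LEFT JOIN dept d ON e.dept_id = d.dept_id
                       ORDER BY e.name""").fetchall()
print(left)    # [('ann', 'sales'), ('bob', None), ('cid', None)]

# CROSS JOIN: Cartesian product (3 emp rows x 2 dept rows = 6 rows)
n = conn.execute("SELECT COUNT(*) FROM emp CROSS JOIN dept").fetchone()[0]
print(n)       # 6
```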

16.) How do you select unique rows using SQL?

Using the “DISTINCT” clause. For example, if you run the SQL below in “AdventureWorks”, the first statement will give you distinct values for cities, while the second will give you distinct rows.

select distinct city from person.address


select distinct * from person.address

17.) What is the difference between DELETE and TRUNCATE?

Ans:-

DELETE:- It deletes rows, but the space allocated for the records remains.

TRUNCATE:- It deletes all the rows, and the space allocated for the records is also removed.

18.) What is a subquery?

Ans:-

Subquery or Inner query or Nested query is a query in a query. A subquery is usually added in the WHERE Clause of the sql statement. Most of the time, a subquery is used when you know how to search for a value using a SELECT statement, but do not know the exact value.

Subqueries are an alternate way of returning data from multiple tables.

Subqueries can be used with the following SQL statements, along with comparison operators like =, <, >, >=, <=, etc.

SELECT, INSERT, UPDATE, DELETE

For Example:

1) Usually, a subquery should return only one record, but sometimes it can also return multiple records when used with operators like IN, NOT IN in the where clause. The query would be like,

SELECT first_name, last_name, subject FROM student_details WHERE games NOT IN ('Cricket', 'Football');
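A runnable version of the NOT IN example, with a hypothetical student_details table, in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_details (first_name TEXT, games TEXT);
INSERT INTO student_details VALUES
    ('Rahul', 'Cricket'), ('Anita', 'Chess'), ('Vik', 'Football'), ('Mia', 'Tennis');
""")

# Subquery with NOT IN: the inner SELECT produces the list of values the
# outer query filters against (multiple rows, hence IN/NOT IN, not '=')
rows = conn.execute("""
    SELECT first_name FROM student_details
    WHERE games NOT IN (SELECT games FROM student_details
                        WHERE games IN ('Cricket', 'Football'))
    ORDER BY first_name
""").fetchall()
print(rows)  # [('Anita',), ('Mia',)]
```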

19.) What is Data Mining? Explain the KDD process.

ANS:- Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.


KDD stands for Knowledge Discovery in Databases (knowledge discovery from data); data mining is a particular step in the KDD process. The term KDD refers to the broad process of finding knowledge in data, and emphasizes the "high-level" application of particular data mining methods. The unifying goal of the KDD process is to extract knowledge from data in the context of large databases.

20.) Explain data mining applications in a real-time environment.

ANS:-

1. Data-Mining Applications in Banking and Finance

In banking, data mining is used to model and predict credit fraud, to evaluate risk, to perform trend analysis, and to analyze profitability. In the financial markets, neural networks have been used in stock-price forecasting, in option trading, as well as in forecasting financial disasters.

2. Data-Mining Applications in Retail

The early adoption of data warehousing by retailers has given them a better opportunity to take advantage of data mining. Large retail chains and grocery stores store vast amounts of point-of-sale data that is information rich.

3. Data-Mining Applications in Healthcare

4. Data-Mining Applications in Telecommunications

21.) Data mining techniques

ANS:-

1.) Descriptive data mining techniques: mining is done from the characteristics, relationships, and interconnectedness of the data.

a) Association: Association is one of the best known data mining techniques. In association, a pattern is discovered based on a relationship of a particular item to other items in the same transaction.

b) Classification: Classification is a classic data mining technique based on machine learning. Basically, classification is used to classify each item in a set of data into one of a predefined set of classes or groups. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks and statistics.

c) Clustering: Clustering is a data mining technique that makes meaningful or useful clusters of objects that have similar characteristics, using automatic techniques.

d) Prediction: Prediction, as its name implies, is a data mining technique that discovers relationships between independent variables and relationships between dependent and independent variables.

2.) Predictive data mining techniques: mining is done from the results and previous observations.

a) Decision tree


22.) What is the Apriori algorithm?

ANS:- Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions.

As is common in association rule mining, given a set of itemsets the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.

The purpose of the Apriori Algorithm is to find associations between different sets of data. It is sometimes referred to as "Market Basket Analysis". Each set of data has a number of items and is called a transaction. The output of Apriori is sets of rules that tell us how often items are contained in sets of data.

86.) Difference between Classification and Clustering.

ANS:-

1. Classification is a type of supervised learning (background knowledge is known) and clustering is a type of unsupervised learning (no such knowledge is known).

2. In general, in classification you have a set of predefined classes and want to know which class a new object belongs to. Clustering tries to group a set of objects and find whether there is some relationship between the objects.

3. A new set of data is used in classification, while a set of historical transactions is used in clustering.


23.) Give an example situation where Apriori algorithm will be of great use?

ANS:- In a shopping mall, market basket analysis can reveal, for example, that customers who buy milk also tend to buy bread, so the two can be placed or promoted together.

24.) Define classification. Explain with example

ANS:- Classification predicts which class a given member belongs to.

Ex:- A bank loan officer needs analysis of the data in order to learn which loan applications are "safe" and which are "risky". A marketing manager needs data analysis to make a decision.

25.)What is prediction?

ANS:- Prediction is to predict continuous value for the given input.

26.)What is Clustering?

ANS:- The process of organizing objects into groups whose members are similar.

27.)What is Association?

ANS:- It is able to reveal all interesting relationships, called associations, in a potentially large database. Classification using association rules combines association rule mining and classification, and is therefore concerned with finding rules that accurately predict a single target (class) variable. The key strength of association rule mining is that all interesting rules are found.

28.) Differentiate between a database and a data warehouse

ANS:- All data warehouses are databases, not all databases are data warehouses.

Database:

1. Contains a set of logically related data, which is generally small in size compared to a data warehouse.
2. It is used for database design and process design.
3. Operations performed are insert, delete, update, etc.
4. It is designed (and optimized) to record data.

Data warehouse:

1. A collection of all sorts of data, from which data are extracted according to the customer's needs.
2. It is used for data modeling and database design.
3. Operations performed are load and access.
4. It is designed (and optimized) to respond to analysis questions that are critical for your business.

29.) Explain Join and Prune steps in Apriori Algorithm

ANS:-


Join: The candidate set Ck is generated by joining Lk-1 with itself. Two members l1 and l2 of Lk-1 are joinable if their first (k-2) items are in common. The condition l1[k-1] < l2[k-1] ensures that no duplicates are generated; the itemset formed by joining l1 and l2 is l1[1], l1[2], ..., l1[k-2], l1[k-1], l2[k-1].

Prune: Ck is a superset of Lk; its members may or may not be frequent, but all of the frequent k-itemsets are included in Ck. To reduce the size of Ck, the Apriori property is used: any (k-1)-subset of a frequent k-itemset must itself be frequent, so any candidate with a subset that is not in Lk-1 is removed from Ck.
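The join and prune steps can be sketched in a few lines of Python (the toy transactions and minimum support count are made up):

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Frequent itemsets via the classic join + prune candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets
    level = {frozenset([i]) for i in items
             if sum(i in t for t in transactions) >= min_count}
    frequent = set(level)
    k = 2
    while level:
        # Join: combine pairs of L(k-1) itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates with an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count support against the transactions to obtain Lk
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) >= min_count}
        frequent |= level
        k += 1
    return frequent

txns = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread"}, {"milk", "bread"}]
result = apriori(txns, min_count=3)
print(frozenset({"milk", "bread"}) in result)  # True
```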

30.) What is Slicing, Dicing, Pivoting, Rollup and Rolldown ?

ANS:-

Slicing: The slice operation performs a selection on one dimension of the given cube, resulting in a sub-cube.

Dicing: The dice operation performs a selection on two or more dimensions of the given cube.

Pivoting: The pivot operation performs a rotation of the data axes to provide an alternative presentation.

Rollup: The rollup operation performs aggregation on a data cube, navigating from more detailed data to less detailed data.

Rolldown (drill down): It is the reverse of rollup. It navigates from less detailed data to more detailed data.
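The operations can be illustrated on a tiny cube held as plain Python tuples (the dimension names and figures are invented):

```python
# A fact table: (year, region, product, sales) -- three dimensions plus a measure
facts = [
    (2023, "east", "pen",  10), (2023, "west", "pen",  20),
    (2023, "east", "book", 30), (2024, "east", "pen",  40),
    (2024, "west", "book", 50),
]

# Slice: fix ONE dimension (year = 2023)
slice_2023 = [f for f in facts if f[0] == 2023]

# Dice: fix TWO or more dimensions (year = 2023 AND region = 'east')
dice = [f for f in facts if f[0] == 2023 and f[1] == "east"]

# Rollup: aggregate away the region and product dimensions (total sales per year)
rollup = {}
for year, region, product, sales in facts:
    rollup[year] = rollup.get(year, 0) + sales

print(len(slice_2023))  # 3
print(len(dice))        # 2
print(rollup)           # {2023: 60, 2024: 90}
```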

31-50

1. Explain data warehousing with a real-time example.

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates the analysis workload from the transaction workload and enables an organization to consolidate data from several sources. In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

Some of the applications data warehousing can be used for are:

Decision support

Trend analysis

Financial forecasting

2. Explain ETL (Extraction Transformation Loading)

Extract, transform and load (ETL) is a process in database usage, and especially in data warehousing, that involves:

Extracting data from outside sources
Transforming it to fit operational needs (which can include quality levels)
Loading it into the end target (database or data warehouse)

Short for extract, transform, load: three database functions that are combined into one tool to pull data out of one database and place it into another database.

Extract -- the process of reading data from a database.

Transform -- the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data.

Load -- the process of writing the data into the target database.


ETL is used to migrate data from one database to another, to form data marts and data warehouses, and also to convert databases from one format or type to another.
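A toy ETL pipeline can be written as three small functions. The source rows, cleaning rule and target table below are all invented for illustration:

```python
import sqlite3

def extract():
    """Read raw rows from the 'outside source' (here an in-memory list
    standing in for a CSV export or an operational database)."""
    return [("ann ", "1200"), ("BOB", "950"), ("cid", "")]

def transform(rows):
    """Clean to fit the target's needs: trim and normalize names, cast
    salaries to numbers, and drop rows that fail the quality rule."""
    out = []
    for name, salary in rows:
        if salary.strip():
            out.append((name.strip().lower(), int(salary)))
    return out

def load(rows, conn):
    """Write the cleaned rows into the end target (the warehouse table)."""
    conn.execute("CREATE TABLE IF NOT EXISTS staff (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
loaded = conn.execute("SELECT name, salary FROM staff ORDER BY name").fetchall()
print(loaded)  # [('ann', 1200), ('bob', 950)]
```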

3. How does data mining differ from data warehousing?

The primary differences between data mining and data warehousing are the system designs, the methodology used, and the purpose. Data mining is the use of pattern recognition logic to identify trends within a sample data set and extrapolate this information against the larger data pool. Data warehousing is the process of extracting and storing data to allow easier reporting.

Data mining is a general term used to describe a range of business processes that derive patterns from data. Data mining provides the enterprise with intelligence, and data warehousing provides the enterprise with a memory.

Data warehousing is the process that is used to integrate and combine data from multiple sources into a single unified schema. So it provides the enterprise with a storage mechanism for its huge amount of data. On the other hand, data mining is the process of extracting interesting patterns and knowledge from huge amounts of data. So we can apply data mining techniques to the data warehouse of an enterprise to discover useful patterns.

Data mining is the process of sorting out and analyzing data in a data warehouse or data mart. Data warehousing is the aggregation of data from operational systems (mostly ERP programs) to support data mining or business intelligence applications.

4. What is LDAP?


The Lightweight Directory Access Protocol is an application protocol for accessing and

maintaining distributed directory information services over an Internet Protocol (IP) network.[1]

LDAP is defined in terms of ASN.1 and transmitted using BER. Directory services may provide any

organized set of records, often with a hierarchical structure, such as a corporate electronic mail

directory. Similarly, a telephone directory is a list of subscribers with an address and a phone

number. Short for Lightweight Directory Access Protocol, a set of protocols for accessing

information directories. LDAP is based on the standards contained within the X.500 standard,

but is significantly simpler. And unlike X.500, LDAP supports TCP/IP, which is necessary for any

type of Internet access. Because it's a simpler version of X.500, LDAP is sometimes called

X.500-lite. Although not yet widely implemented, LDAP should eventually make it possible for almost

any application running on virtually any computer platform to obtain directory information, such

as email addresses and public keys. Because LDAP is an open protocol, applications need not

worry about the type of server hosting the directory.

5. Differentiate XML and HTML and XHTML

XML - stands for Extensible Markup Language

HTML - stands for HyperText Markup Language

XHTML - stands for Extensible HyperText Markup Language

HTML enables you to create documents and web pages that can be read by all web browsers. It

contains predefined tags.

XML enables you to store data in a structured format by using meaningful tags (user-defined tags).

XML is a cross-platform, hardware- and software-independent markup language.

XHTML has the same depth of expression as HTML, but also conforms to XML syntax.

XML is a syntax that is very generic and can be used for passing data of many different types for

many different applications. It is really just a data exchange format.

HTML is the markup language used by all browsers. When you view source, you will see html

with a bunch of embedded objects of other types.


XHTML is a combination of XML and HTML.

6. What is the specialty of MySQL?

Open source

Widely used

Written in C and C++.

Tested with a broad range of different compilers.

Works on many different platforms.

Designed to be fully multi-threaded using kernel threads, to easily use multiple CPUs if they are available.

Provides transactional and nontransactional storage engines.

Designed to make it relatively easy to add other storage engines. This is useful if you want to provide an SQL interface for an in-house database.

Uses a very fast thread-based memory allocation system.

Executes very fast joins using an optimized nested-loop join.

Relational Database System

Client/Server Architecture

Supports stored procedures, triggers, and full-text search

Distinguishing features

MySQL implements the following features, which some other RDBMS systems may not:


Multiple storage engines, allowing one to choose the one that is most effective for each table in

the application (in MySQL 5.0, storage engines must be compiled in; in MySQL 5.1, storage

engines can be dynamically loaded at run time):

o Native storage engines (MyISAM, Falcon, Merge, Memory (heap), Federated, Archive,

CSV, Blackhole, Cluster, EXAMPLE, Maria, and InnoDB, which was made the default as of

5.5)

o Partner-developed storage engines (solidDB, NitroEDB, ScaleDB, TokuDB, Infobright

(formerly Brighthouse), Kickfire, XtraDB, IBM DB2).[31] InnoDB used to be a partner-

developed storage engine, but with recent acquisitions, Oracle now owns both MySQL

core and InnoDB.

o Community-developed storage engines (memcache engine, httpd, PBXT, Revision

Engine)

o Custom storage engines

Commit grouping, gathering multiple transactions from multiple connections together to

increase the number of commits per second. (PostgreSQL has an advanced form of this

functionality[32])

7. Explain data cube.

Users of decision support systems often see data in the form of data cubes. The cube is

used to represent data along some measure of interest. Although called a "cube", it can be 2-

dimensional, 3-dimensional, or higher-dimensional. Each dimension represents some attribute

in the database, and the cells in the data cube represent the measure of interest.

Data cubes are multidimensional extensions of 2-D tables, just as in geometry a cube is a

three-dimensional extension of a square. The word cube brings to mind a 3-D object, and we

can think of a 3-D data cube as being a set of similarly structured 2-D tables stacked on top of

one another.

But data cubes aren't restricted to just three dimensions. Most online analytical

processing (OLAP) systems can build data cubes with many more dimensions;

Microsoft SQL Server 2000 Analysis Services, for example, allows up to 64 dimensions.


We can think of a 4-D data cube as consisting of a series of 3-D cubes, though visualizing

such higher-dimensional entities in spatial or geometric terms can be a problem.

In practice, therefore, we often construct data cubes with many dimensions, but

we tend to look at just three at a time. What makes data cubes so valuable is that we can

index the cube on one or more of its dimensions.
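The roll-up idea described above (collapsing a cube along chosen dimensions) can be sketched in a few lines of Python; the facts, dimension names, and figures below are invented purely for illustration:

```python
from collections import defaultdict

# Invented sales facts: (product, region, year) -> units sold.
facts = {
    ("car", "east", 2011): 120,
    ("car", "west", 2011): 90,
    ("truck", "east", 2011): 40,
    ("car", "east", 2012): 150,
    ("truck", "west", 2012): 60,
}

def rollup(facts, keep):
    """Aggregate the cube, keeping only the dimensions named in `keep`.
    Cell keys follow the dimension order (product, region, year)."""
    dims = ("product", "region", "year")
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for cell, measure in facts.items():
        out[tuple(cell[i] for i in idx)] += measure
    return dict(out)

# Collapse region and year: total units per product.
print(rollup(facts, ["product"]))  # {('car',): 360, ('truck',): 100}
```

Calling rollup with a different `keep` list views the same facts along another dimension, which is essentially what an OLAP slice or roll-up does.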

8. OLAP

In computing, online analytical processing (OLAP) is an approach for swiftly answering

multi-dimensional analytical (MDA) queries. OLAP tools enable users to interactively

analyze multidimensional data from multiple perspectives. On-Line Analytical Processing

(OLAP) is a category of software technology that enables analysts, managers and executives

to gain insight into data through fast, consistent, interactive access to a wide variety of

possible views of information that has been transformed from raw data to reflect the real

dimensionality of the enterprise as understood by the user.

OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated

enterprise data supporting end user analytical and navigational activities including:

calculations and modeling applied across dimensions, through hierarchies and/or across

members

trend analysis over sequential time periods

slicing subsets for on-screen viewing

drill-down to deeper levels of consolidation

reach-through to underlying detail data

rotation to new dimensional comparisons in the viewing area

Q. Information gain

The attribute with highest information gain is selected as the splitting attribute for node N .This

attribute minimizes the information needed to classify the tuples in the resulting partitions and


reflects the least impurity in these partitions. This helps minimize the expected number of tests

required to classify a tuple.

Formula-

Info(D) = - Σ (i = 1 to m) p_i log2(p_i)
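As a sketch, the entropy formula above and the information gain of a splitting attribute can be computed as follows; the toy data set is invented for illustration:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(attr) = Info(D) - Info_attr(D): the reduction in entropy after
    partitioning the tuples on the values of `attr`."""
    n = len(labels)
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

# Invented toy data: 'outlook' perfectly separates the two classes.
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(info_gain(rows, labels, "outlook"))  # 1.0 -- a perfect split
```

The attribute with the highest such gain would be chosen as the splitting attribute for node N.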

40. What are the different schemas in a data cube? Which is better? Why?

The foundation of each data warehouse is a relational database built using a dimensional model.

A dimensional model consists of dimension and fact tables and is typically described as star or

snowflake schema.

Star schema resembles a star; one or more fact tables are surrounded by the dimension tables.

Dimension tables aren't normalized - that means even if you have repeating fields such as name

or category no extra table is added to remove the redundancy. For example, in a car dealership

scenario you might have a product dimension that might look like this:

Product_key

Product_category

Product_subcategory

Product_brand

Product_make

Product_model

Product_year

In a relational system such design would be clearly unacceptable because product category (car,

van, truck) can be repeated for multiple vehicles and so could product brand (Toyota, Ford,

Nissan), product make (Camry, Corolla, Maxima) and model (LE, XLE, SE and so forth). So a

vehicle table in a relational system is likely to have foreign keys relating to vehicle category,

vehicle brand, vehicle make and vehicle model. However in the dimensional star schema model

you simply list out the names of each vehicle attribute.


Star schema also contains the entire dimension hierarchy within a single table. Dimension

hierarchy provides a way of aggregating data from the lowest to highest levels within a

dimension. For example, Camry LE and Camry XLE sales roll up to Camry make, Toyota brand

and cars category. In the star schema diagram, the fact table sits at the center, joined to each of the surrounding dimension tables.

Notice that each dimension table has a primary key. The fact table has foreign keys to each

dimension table. Although data warehouse does not require creating primary and foreign keys,

it is highly recommended to do so for two reasons:

Dimensional models that have primary and foreign keys provide superior performance,

especially for processing Analysis Services cubes.

Analysis Services requires creating either physical or logical relationships between fact and

dimension tables. Physical relationships are implemented through primary and foreign keys.

Therefore if the keys exist you save a step when building cubes.

Snowflake schema resembles a snowflake because dimension tables are further normalized or

have parent tables. For example we could extend the product dimension in the dealership

warehouse to have a product_category and product_subcategory tables. Product categories

could include trucks, vans, sport utility vehicles, etc. Product subcategory tables could contain

subcategories such as leisure vehicles, recreational vehicles, luxury vehicles, industrial trucks

and so forth. In the snowflake schema, the product dimension links out to these normalized

product_subcategory and product_category parent tables.

Snowflake schema generates more joins than a star schema during cube processing, which

translates into longer queries. Therefore it is normally recommended to choose the star schema

design over the snowflake schema for optimal performance. Snowflake schema does have an

advantage of providing more flexibility, however. For example, if you were working for an auto

parts store chain you might wish to report on car parts (car doors, hoods, engines) as well as

subparts (door knobs, hood covers, timing belts and so forth). In such cases you could have both

part and subpart dimensions; however, some attributes of subparts might not apply to parts and

vice versa. For example, a thread-size attribute would apply to a tire but

not to the nuts and bolts that go on the tire. If you wish to aggregate your sales by part, you will

need to know which subparts should rollup to each part as in the following:

Dim_subpart

subpart_key

subpart_name

subpart_SKU

subpart_size

subpart_weight

subpart_color

part_key

Dim_part

part_key

part_name

part_SKU

With such a design you could create reports that show you a breakdown of your sales by each

type of engine, as well as each part that makes up the engine.

41. What is ASP?

ASP (Active Server Pages), also known as Classic ASP or ASP Classic, was Microsoft's first server-side script engine for

dynamically generated web pages. Initially released as an add-on to Internet Information

Services (IIS) via the Windows NT 4.0 Option Pack (ca. 1998), it was subsequently included as a


free component of Windows Server (since the initial release of Windows 2000 Server). ASP.NET

has superseded ASP.

ASP 2.0 provided six built-in objects: Application, ASPError, Request, Response, Server, and

Session. Session, for example, represents a session that maintains the state of variables from

page to page[1]. The Active Scripting engine's support of the Component Object Model (COM)

enables ASP websites to access functionality in compiled libraries such as DLLs.

42.Why do we require SQL server ?

Microsoft SQL Server is a relational database server, developed by Microsoft: it is a software

product whose primary function is to store and retrieve data as requested by other software

applications, be it those on the same computer or those running on another computer across a

network (including the Internet). There are at least a dozen different editions of Microsoft SQL

Server aimed at different audiences and for different workloads (ranging from small applications

that store and retrieve data on the same computer, to millions of users and computers that

access huge amounts of data from the Internet at the same time).

True to its name, Microsoft SQL Server's primary query languages are T-SQL and ANSI SQL.

FEATURES:

DATA STORAGE

BUFFER MANAGEMENT

LOGGING AND TRANSACTIONS

CONCURRENCY CONTROL

43. What is a style sheet?

A style sheet describes the presentation of structured documents.

A style sheet is a feature in desktop publishing programs that stores and applies formatting to text.

Style sheets are a form of separation of presentation and content: they create a separate

abstraction that keeps the presentation isolated from the text data.

Style sheets are a common feature in most popular desktop publishing and word processing

programs, including Arbortext, Corel Ventura, Adobe InDesign, Scribus, PageMaker, QuarkXPress

and Microsoft Word, though they may be referred to using slightly different terminology.

Individual styles are created by the user and may include a wide variety of commands that

dictate how a selected portion of text is formatted:

Typeface or font


Boldfacing

Italicizing

Underlining

Justification (left, right, center, justify, force justify)

Space before and after paragraphs

Tab stops and indentation

Type size

Leading

Kerning

Tracking

Color

Borders or strokes

Superscript or subscript

Dropcaps

Letter case

Strike through

Outline font style

44. How ASP differs from JSP?

ASP:-

1. ASP stands for Active Server Pages, a server-side scripting technology made by Microsoft that

uses the Visual Basic language. In short, if you know how to program in Visual Basic, you can

easily make ASP pages.

P.S. ASP.NET, the new generation of ASP, works with several programming languages, mostly

VB.NET and C#.

2. Theoretically you can connect to any database through ADO if you use the required add-on,

though ASP fully supports connecting to an Access database (no server required) and also

supports MS SQL connections.

3. It mostly works on Microsoft's IIS server. Other servers can support ASP as well, but it mostly

runs on the IIS web server on a Windows machine.


So the system, web server, DBMS, and scripting language would all be Microsoft products; in

other words, everything would be a Microsoft-made bundle.

4. Because neither Windows nor IIS is free, ASP is not a free platform.

JSP:

1. JSP, on the other hand, is not a Microsoft technology. It stands for Java Server Pages and was

developed by Sun. It uses Java as the scripting language, so if you know Java you can easily

start creating your website in JSP.

2. JSP is not tied to any particular database; you can connect to any database system simply by

loading the driver for that database and then connecting. (I use MySQL myself when coding in

JSP.)

3. A JSP website is mostly hosted on the Apache Tomcat web server, often on a Linux-based

server, but it also runs on JBoss and IBM application servers. JSP is a good choice if you are

running a Linux or Unix server and should work reliably overall, though it can take a bit of

work to configure Tomcat to work with Apache.

4. Because Linux and Apache Tomcat are free, you can develop your own JSP website without

having to pay a dime.

45. What are the features of Object relational data base ?

Object-relational databases offer higher level of abstraction over the problem domain. They

extend relational databases with object-oriented features to minimise the gap between

relational and object representation of application data, known as the impedance mismatch

problem.

An object-relational database (ORD), or object-relational database management system

(ORDBMS), is a database management system (DBMS) similar to a relational database, but with

an object-oriented database model: objects, classes and inheritance are directly supported in


database schemas and in the query language. In addition, just as with proper relational systems,

it supports extension of the data model with custom data-types and methods.


An object-relational database can be said to provide a middle ground between relational

databases and object-oriented databases (OODBMS). In object-relational databases, the

approach is essentially that of relational databases: the data resides in the database and is

manipulated collectively with queries in a query language; at the other extreme are OODBMSes

in which the database is essentially a persistent object store for software written in an object-

oriented programming language, with a programming API for storing and retrieving objects, and

little or no specific support for querying.

46. Explain Well formed XML. How to create it?

Well formed XML documents simply markup pages with descriptive tags. You don't need to

describe or explain what these tags mean. In other words a well formed XML document does

not need a DTD, but it must conform to the XML syntax rules. If all tags in a document are

correctly formed and follow XML guidelines, then a document is considered as well formed.

Syntax is the Grammar of a language. For a document in XML to be well formed, it must obey

the following most important rules:

XML documents must contain at least one element. In this example "Tootsie" is not well formed,

because it is not marked up as an element within angle brackets.


Well formed: <title>Tootsie</title>

Not well formed: "Tootsie"

XML documents must contain a unique opening and closing tag that contains the whole

document, forming what is called a root element (here, <videocollection>...</videocollection>).

In the following example, the second version is not well formed because it lacks a root element:

Well formed:

<videocollection>
<title>Tootsie</title>
<title>Jurassic Park</title>
<title>Mission Impossible</title>
</videocollection>

Not well formed (no root element):

<title>Tootsie</title>
<title>Jurassic Park</title>
<title>Mission Impossible</title>

All other tags must be nested properly, i.e. there must be an opening and a closing tag and the

tags cannot overlap. Tags that in HTML would normally stand alone, such as <img> or <br>,

are called "empty tags" when used in an XML document. In XML, empty tags look like this: <br/>.

</title... has no closing angle bracket, therefore the tag is not complete!

</title)... has a wrong closing bracket, therefore the tag is not complete!


In the second example the tags are not properly nested.

Well formed:

<videocollection>
<title>Tootsie</title>
<title>Jurassic Park</title>
<title>Mission Impossible</title>
</videocollection>

<videocollection>
<title>Tootsie</title>
</videocollection>

Not well formed (incomplete tags):

<videocollection>
<title>Tootsie</title
<title>Jurassic Park</title)
<title>Mission Impossible</title>
</videocollection >

Not well formed (overlapping tags):

<videocollection>
<title>Tootsie
</videocollection></title>

Tags in XML are case sensitive; that means that <CREW>, <Crew> and <crew> are not the same.

The XML processing instruction must be all lowercase. But keywords in DTDs must be all

UPPERCASE, such as ELEMENT, ATTLIST, #REQUIRED, #IMPLIED, NMTOKEN, ID, etc. However,

your own elements and attributes may be any case you choose, as long as you are consistent.

Well formed: <crew>Sydney Pollak</crew>

Not well formed: <CREW>Sydney Pollak</crew>

Not well formed: <crew>Sydney Pollak</Crew>

Attribute values must always be quoted (as opposed to HTML).


Well formed: <title id="1">Tootsie</title>

Not well formed: <title id="1>Tootsie</title>

Not well formed: <title id=1>Tootsie</title>

These are just some examples of the above-mentioned well-formedness constraints; they cover

only the most important rules and are by no means complete. Please check the XML specification

for complete coverage.

To check whether a document is well formed, you only need to check whether the document is

properly marked up according to the XML syntax rules.

In our example, the following section of XML data is well formed.

<?xml version="1.0"?>

<videocollection>

<title id="1">Tootsie</title>

<genre>comedy</genre>

<year>1982</year>

<language>English</language>

<cast>Dustin Hoffman</cast>

<cast>Jessica Lang</cast>

<cast>Teri Gar</cast>

<cast>Sydney Pollak</cast>

<crew>

<director>Sydney Pollak</director>

</crew>

<title id="2">Jurassic Park</title>

<genre>science fiction</genre>

<year>1993</year>


<language>English</language>

<cast>Sam Neil</cast>

<cast>Laura Dern</cast>

<cast>Jeff Goldblum</cast>

<crew>

<director>Steven Spielberg</director>

</crew>

<title id="3">Mission Impossible</title>

<genre>action</genre>

<year>1996</year>

<language>English</language>

<cast>Tom Cruise</cast>

<cast>Jon Voight</cast>

<cast>Emmanuelle Beart</cast>

<cast>Jean Reno</cast>

<crew>

<director>Brian de Palma</director>

</crew>

</videocollection>

In the chapter on parsers you can check whether this document is really well formed, using a

parser.
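As the text notes, well-formedness can be checked mechanically with any XML parser. A minimal sketch using Python's standard-library ElementTree (which checks well-formedness only, not validity against a DTD), exercising the broken examples shown above:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Parse the text; a ParseError means it is not well formed.
    Note: this checks well-formedness only, not validity against a DTD."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<title>Tootsie</title>"))       # True
print(is_well_formed("<CREW>Sydney Pollak</crew>"))   # False: case mismatch
print(is_well_formed("<title id=1>Tootsie</title>"))  # False: unquoted attribute
```

The same function accepts the full <videocollection> document listed above, since that document satisfies all the well-formedness rules.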

47. Why do we need XML namespaces?

* Namespaces are used primarily to avoid conflicts between element names when mixing XML

languages.

* XML namespaces help contextualize elements and attributes, among other things. They also

offer precise identification for a particular element or attribute.

* Namespaces let you reduce ambiguity when there are duplicates. You could have a <title> tag

that refers to authors and a <title> tag that refers to a salutation, like Mr., Mrs., etc. To

differentiate, you could assign them to different namespaces.


You can also use namespaces when validating documents for conformance to a particular

standard or set of restrictions, where the namespace indicates which schema the

document belongs to.
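A short Python sketch of the duplicate-<title> situation described above; the namespace URIs are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define a <title> element; namespaces keep them apart.
# The URIs below are invented for this example.
doc = """<root xmlns:bk="urn:example:book" xmlns:p="urn:example:person">
  <bk:title>XML in a Nutshell</bk:title>
  <p:title>Mr.</p:title>
</root>"""

root = ET.fromstring(doc)
# ElementTree expands each prefix to {namespace-uri}localname.
book_titles = [e.text for e in root.findall("{urn:example:book}title")]
honorifics = [e.text for e in root.findall("{urn:example:person}title")]
print(book_titles)  # ['XML in a Nutshell']
print(honorifics)   # ['Mr.']
```

Without the namespaces, a query for "title" could not tell the two kinds of element apart.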

48. Which is the most popular protocol that allows cross computer communications?

TCP/IP

50. What do you mean by “Extensible” in XML?

Unlike HTML, which has a fixed set of predefined tags, XML has the additional ability to let you define your own user-defined tags. This property of defining user-defined tags is what "extensible" means.

51. What is an URL?

Uniform Resource Locator: In computing, a uniform resource locator (URL) is a specific character string that constitutes a reference to an Internet resource.

52. What is database or database management system?

Basically, a database is a repository where all the data related to a particular system is stored. The software that manages storing and manipulating the data in the database is the database management system. Oracle is an example of a database management system. It becomes necessary when the data in the database grows large and needs to be retrieved as quickly as possible.

53. What is the difference between files and database? Can files qualify as a database?

A file is a type of storage for data; we can store data, organized into sections, in separate files, usually with a predefined structure and size. A file is a simple kind of data storage. A database, on the other hand, stores data in various

different formats. Files can qualify as a (rudimentary) database.

54. Is XML case sensitive?

Yes, XML is case sensitive.


55. What is valid XML?

A valid XML document is defined by the W3C as a well-formed XML document which also conforms to the rules of a Document Type Definition (DTD) or an XML Schema (XSD), which the W3C supports as an alternative to the DTD.

56. What is XQuery?

XQuery is designed to query XML data - not just XML files, but anything that can appear as XML, including databases. It is basically used to extract data from XML documents.

57. What are Fact tables and Dimensional tables?

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is often located at the center of a star schema or a snowflake schema, surrounded by dimension tables.

Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. Other dimensions might be members of this fact table (such as location/region) but these add nothing to the uniqueness of the fact records.

In data warehousing, a dimension table is one of the set of companion tables to a fact table. Dimension table rows are uniquely identified by a single key field. It is recommended that the key field be a simple integer, because the key value is meaningless and is only used to join fields between the fact and dimension tables.

58. What is Data Purging?

Information sources may be unreliable and may purge data. Purging or sanitising is the removal of sensitive data from a system or storage device with the intent that the data cannot be reconstructed by any known technique. In short, data purging is a technique used to remove from the database data which is sensitive or has been unused for a long time.

59. What is OLAP and OLTP?

a. OLAP: OLAP stands for Online Analytical Processing. Historical data is processed. Few users, such as managers. Complex queries.

b. Types of OLAP:
i. Multidimensional (MOLAP)
ii. Relational (ROLAP)
iii. Hybrid (HOLAP)

c. OLTP: OLTP stands for Online Transaction Processing. Data is up-to-date. More users (thousands). Simple queries are evaluated.

60. What are the primary ways to store data in OLAP?

Multidimensional storage (MOLAP), relational storage (ROLAP), or a hybrid of the two (HOLAP).

61. How do you plan a data warehouse project?

a. Project organization
b. Gathering user requirements
c. Developing a detailed project scope and plan
d. Modeling
e. Data design

62. What are the different stages of data mining?

a. Data integration
b. Data selection
c. Data cleaning
d. Data transformation
e. Data mining
f. Pattern evaluation
g. Decisions

63. How will you optimize a query?

a. Use precise keywords
b. Optimize queries and stored procedures
c. Add, remove, or modify indexes
d. Move queries to stored procedures
e. Remove unneeded views

64. What is forecasting?

a. Forecasting refers to prediction. In data mining, various regression techniques are used for prediction.
i. Regression can be linear or non-linear.
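As an illustration of the linear case, a regression forecast can be computed in closed form; the monthly sales figures below are invented:

```python
def linear_fit(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Invented monthly sales; fit months 1-5, then forecast month 6.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]
a, b = linear_fit(months, sales)
print(a + b * 6)  # 20.0 -- the forecast for month 6
```

Non-linear regression works the same way in spirit, but fits a curve (for example a polynomial or exponential) instead of a straight line.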

65. Explain text mining

a. Text mining refers to the process of deriving high-quality information from text.

b. Text mining usually involves structuring the input text.

c. Software and applications:
i. Text mining methods and software are also being researched and developed by major firms, including IBM and Microsoft, to further automate the mining and analysis processes, and by different firms working in the area of search and indexing in general as a way to improve their results.

66. What is semi-structured data?

It has following properties:

Organised in semantic entities
Similar entities are grouped together
Entities in the same group may not have the same attributes
The order of attributes is not necessarily important
Not all attributes may be required
The size of the same attribute may differ within a group
The type of the same attribute may differ within a group

67. What is meta data?

"Metadata" is an ambiguous term used for two fundamentally different concepts (types). Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at design time the application contains no data. In this case the correct description would be "data about the containers of data".

68. What are data marts?

A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse which is usually oriented to a specific business line or team. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data.

Reasons for creating a data mart

Easy access to frequently needed data
Creates a collective view for a group of users
Improves end-user response time
Ease of creation
Lower cost than implementing a full data warehouse
Potential users are more clearly defined than in a full data warehouse

69. Compare OLTP and Data warehouse databases.

OLTP (online transaction processing) is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transactions in a number of industries, including banking, airlines, mail order, supermarkets, and manufacturing. Probably the most widely installed OLTP product is IBM's CICS (Customer Information Control System).


Today's online transaction processing increasingly requires support for transactions that span a network and may include more than one company.

In computing, a data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting. Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse.

70. Explain the process of candidate generation.

The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support. In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1.

To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 join L1 to generate a candidate set of 2-itemsets, C2. Next, the transactions in D are scanned and the support count for each candidate itemset in C2 is accumulated. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.

The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori property.

In order to find C3, we compute L2 join L2.

C3 = L2 join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.

Now the join step is complete, and the prune step is used to reduce the size of C3. The prune step helps avoid heavy computation due to a large Ck.

Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the four latter candidates cannot possibly be frequent.

For example, let's take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3.

Let's take another example, {I2, I3, I5}, which shows how the pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5} and {I3, I5}.

But {I3, I5} is not a member of L2 and hence is not frequent, which violates the Apriori property. Thus we have to remove {I2, I3, I5} from C3.

Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking every member of the join result during pruning.

Now the transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.

The algorithm uses L3 join L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned since its subset {I2, I3, I5} is not frequent.

Thus C4 = φ, and the algorithm terminates, having found all of the frequent itemsets.
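The join and prune steps described above can be sketched in Python; running it on the L2 from this worked example reproduces the pruned C3:

```python
from itertools import combinations

def apriori_gen(Lk):
    """Generate C_{k+1} from L_k: self-join itemsets sharing their first
    k-1 items, then prune candidates that have an infrequent k-subset
    (the Apriori property). Itemsets are represented as sorted tuples."""
    k = len(next(iter(Lk)))
    joined = set()
    for a in Lk:
        for b in Lk:
            if a[:k - 1] == b[:k - 1] and a[k - 1] < b[k - 1]:
                joined.add(a + (b[k - 1],))
    # Prune step: every k-subset of a candidate must itself be in L_k.
    return {c for c in joined if all(s in Lk for s in combinations(c, k))}

# L2 from the worked example above.
L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
print(sorted(apriori_gen(L2)))  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

The join produces all six candidates listed in the text, and the prune step eliminates the four whose 2-item subsets are not all in L2.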

71. What is the difference between classification and prediction techniques?

Classification: Classification analysis is the organization of data into given classes. Also known as supervised classification, it uses given class labels to order the objects in the data collection. Classification approaches normally use a training set where all objects are already associated with known class labels. The classification algorithm learns from the training set and builds a model, and the model is used to classify new objects. For example, after starting a credit policy, the OurVideoStore managers could analyze the customers' behaviour with respect to their credit, and accordingly label the customers who received credit with three possible labels: "safe", "risky" and "very risky". The classification analysis would generate a model that could be used to either accept or reject credit requests in the future.

Prediction: Prediction has attracted considerable attention given the potential implications of successful forecasting in a business context. There are two major types of prediction: one can either try to predict some unavailable data values or pending trends, or predict a class label for some data. The latter is tied to classification. Once a classification model is built based on a training set, the class label of an object can be foreseen based on the attribute values of the object and the attribute values of the classes. Prediction, however, more often refers to the forecast of missing numerical values, or of increase/decrease trends in time-related data. The major idea is to use a large number of past values to estimate probable future values.
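The idea of learning from a labelled training set can be sketched with one very simple classification technique, 1-nearest-neighbour; the credit data below is hypothetical and only illustrative:

```python
def train_1nn(training_set):
    """'Training' for 1-nearest-neighbour is just storing the labelled examples."""
    return list(training_set)

def classify(model, obj):
    """Label a new object with the class of its closest training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda ex: dist(ex[0], obj))[1]

# Hypothetical credit data: (monthly_income, overdue_payments) -> risk label
training = [((5000, 0), "safe"), ((2500, 2), "risky"), ((1000, 5), "very risky")]
model = train_1nn(training)
print(classify(model, (4800, 1)))  # → safe
```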

72. What is the difference between Databases and Data Warehouses?

Database:

A computer database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships.

A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS); well-known examples are MySQL and Oracle.

Data Warehouse:

A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

Here we are talking about gigabytes of data; a data warehouse is usually at least 20 to 40 GB in size.

Oracle supports Data Warehousing as well.

73. What is a Relational Database?

A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns.
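These ideas can be sketched with Python's built-in sqlite3 module; the employee table and its rows are illustrative only:

```python
import sqlite3

# In-memory relational database: one table (relation) with columns (categories)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, id INTEGER, designation TEXT)")
# Each row is a unique instance of data for the categories defined by the columns
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [("abc", 1, "manager"), ("def", 2, "clerk")])
for row in con.execute("SELECT * FROM employee ORDER BY id"):
    print(row)  # prints ('abc', 1, 'manager') then ('def', 2, 'clerk')
```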

74. What are the different kinds of data on which mining can be performed?

In principle, data mining is not specific to one type of media or data. Data mining should be applicable to any kind of information repository. However, algorithms and approaches may differ when applied to different types of data. Indeed, the challenges presented by different types of data vary significantly. Data mining is being put into use and studied for databases, including relational databases, object-relational databases and object-oriented databases, data warehouses, transactional databases, unstructured and semi-structured repositories such as the World Wide Web, advanced databases such as spatial databases, multimedia databases, time-series databases and textual databases, and even flat files.

75. What are the 7 steps of KDD?

The Knowledge Discovery in Databases (KDD) process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps:

Data cleaning: also known as data cleansing, it is a phase in which noise data and irrelevant data are removed from the collection.

Data integration: at this stage, multiple data sources, often heterogeneous, may be combined in a common source.

Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data collection.

Data transformation: also known as data consolidation, it is a phase in which the selected data is transformed into forms appropriate for the mining procedure.

Data mining: it is the crucial step in which clever techniques are applied to extract potentially useful patterns.

Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified based on given measures.

Knowledge representation: is the final phase in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.

76. What is cluster analysis?

Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
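One common clustering technique is k-means; here is a minimal 1-D sketch (the points and starting centroids are made up for illustration):

```python
def kmeans(points, centroids, iters=10):
    """Minimal 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster."""
    for _ in range(iters):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centroids)

# Two obvious groups near 1.0 and 8.0
print(kmeans([1.0, 1.2, 0.8, 8.0, 8.4, 7.6], centroids=[0.0, 10.0]))  # → [1.0, 8.0]
```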

77. What is minimum support and minimum confidence threshold?

Minimum support is the minimum frequency that an itemset must reach in the data set to be considered frequent, and the minimum confidence threshold is the minimum confidence, expressed as a percentage, that an association rule must satisfy to be selected.
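A minimal sketch of how support and confidence might be computed, over a small hypothetical set of transactions:

```python
# Hypothetical transactions, each a set of purchased items
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs ∪ rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

print(support({"bread", "milk"}))       # 2 of 4 transactions → 0.5
print(confidence({"bread"}, {"milk"}))  # 0.5 / 0.75 ≈ 0.67
```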

78. Why Data Preprocessing is important?

Data pre-processing is an often neglected but important step in the data mining process. The phrase "garbage in, garbage out" is particularly applicable to data mining and machine learning projects. Data-gathering methods are often loosely controlled, resulting in out-of-range values (e.g., Income: −100), impossible data combinations (e.g., Gender: Male, Pregnant: Yes), missing values, etc. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of the data must come first, before running any analysis.

If there is much irrelevant and redundant information present, or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Data pre-processing includes cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. Kotsiantis et al. (2006) present a well-known algorithm for each step of data pre-processing.

79. How will you handle missing values in data?

Eliminate the data object, i.e., drop the record that contains the missing value, or

Estimate the missing value, e.g., fill it in with the attribute mean, median, or most probable value.
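The second strategy, estimating a missing value, can be sketched with simple mean imputation (None marks the missing entry):

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

print(impute_mean([4.0, None, 8.0, 6.0]))  # → [4.0, 6.0, 8.0, 6.0]
```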

80. What are the Binning methods for data smoothing?

Smoothing by bin boundaries

Smoothing by bin mean values

Smoothing by bin median values
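Smoothing by bin means can be sketched as follows, using equal-depth bins; the data values are illustrative:

```python
def smooth_by_bin_means(values, bin_size):
    """Sort, partition into equal-depth bins, and replace each value by its bin mean."""
    data = sorted(values)
    smoothed = []
    for i in range(0, len(data), bin_size):
        bin_ = data[i:i + bin_size]
        mean = sum(bin_) / len(bin_)
        smoothed.extend([mean] * len(bin_))
    return smoothed

# Bins [4, 8, 15], [21, 21, 24], [25, 28, 34] -> means 9, 22, 29
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], bin_size=3))
```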

81. Data Integration and Data Transformation?

Data integration: at this stage, multiple data sources, often heterogeneous, may be combined in a common source.

Data transformation: also known as data consolidation, it is a phase in which the selected data is transformed into forms appropriate for the mining procedure.

82. Strategies for Data Reduction?

1. Data cube aggregation

2. Dimensionality reduction, e.g., remove unimportant attributes

3. Data compression, e.g., string compression, audio/video compression

4. Numerosity reduction, e.g., fit data into models

5. Discretization and concept hierarchy generation

83. Star, Snowflake and Fact Constellation schemas for multidimensional databases. Explain.

The foundation of each data warehouse is a relational database built using a dimensional model. A dimensional model consists of dimension and fact tables and is typically described as star or snowflake schema.

Star schema resembles a star: one or more fact tables are surrounded by the dimension tables. Dimension tables aren't normalized; that means even if you have repeating fields such as name or category, no extra table is added to remove the redundancy. For example, in a car dealership scenario you might have a product dimension that looks like this:

Product_key, Product_category, Product_subcategory, Product_brand, Product_make, Product_model, Product_year

In a relational system such a design would be clearly unacceptable, because the product category (car, van, truck) can be repeated for multiple vehicles, and so could the product brand (Toyota, Ford, Nissan), product make (Camry, Corolla, Maxima) and model (LE, XLE, SE and so forth). So a vehicle table in a relational system is likely to have foreign keys relating to vehicle category, vehicle brand, vehicle make and vehicle model. However, in the dimensional star schema model you simply list out the names of each vehicle attribute.

Star schema also contains the entire dimension hierarchy within a single table. A dimension hierarchy provides a way of aggregating data from the lowest to highest levels within a dimension. For example, Camry LE and Camry XLE sales roll up to the Camry make, the Toyota brand and the cars category. In a star schema diagram, the dimension tables surround the central fact table.

Notice that each dimension table has a primary key, and the fact table has foreign keys to each dimension table. Although a data warehouse does not require creating primary and foreign keys, it is highly recommended to do so for two reasons:

1. Dimensional models that have primary and foreign keys provide superior performance, especially for processing Analysis Services cubes.  

2. Analysis Services requires creating either physical or logical relationships between fact and dimension tables. Physical relationships are implemented through primary and foreign keys. Therefore if the keys exist you save a step when building cubes.
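A minimal star-schema sketch using Python's sqlite3; the dimension and fact tables below are hypothetical, loosely following the dealership example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          product_category TEXT, product_brand TEXT,
                          product_make TEXT, product_model TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds the measures plus foreign keys to each dimension
CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product(product_key),
                         date_key INTEGER REFERENCES dim_date(date_key),
                         sale_amount REAL);
""")
con.execute("INSERT INTO dim_product VALUES (1, 'car', 'Toyota', 'Camry', 'LE')")
con.execute("INSERT INTO dim_date VALUES (10, 2004)")
con.execute("INSERT INTO fact_sales VALUES (1, 10, 25000.0)")
row = con.execute("""SELECT p.product_make, d.year, f.sale_amount
                     FROM fact_sales f
                     JOIN dim_product p USING (product_key)
                     JOIN dim_date d USING (date_key)""").fetchone()
print(row)  # → ('Camry', 2004, 25000.0)
```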

Snowflake schema resembles a snowflake because dimension tables are further normalized, or have parent tables. For example, we could extend the product dimension in the dealership warehouse with product_category and product_subcategory tables. Product categories could include trucks, vans, sport utility vehicles, etc. The product subcategory table could contain subcategories such as leisure vehicles, recreational vehicles, luxury vehicles, industrial trucks and so forth. With this extended product dimension, the snowflake schema links the product dimension to the separate subcategory and category tables.

Snowflake schema generates more joins than a star schema during cube processing, which translates into longer queries. Therefore it is normally recommended to choose the star schema design over the snowflake schema for optimal performance. The snowflake schema does, however, have the advantage of providing more flexibility. For example, if you were working for an auto parts store chain you might wish to report on car parts (car doors, hoods, engines) as well as subparts (door knobs, hood covers, timing belts and so forth). In such cases you could have both part and subpart dimensions; however, some attributes of subparts might not apply to parts and vice versa. For example, a thread size attribute would apply to a tire but not to the nuts and bolts that go on the tire. If you wish to aggregate your sales by part, you will need to know which subparts roll up to each part, as in the following:

Dim_subpart: subpart_key, subpart_name, subpart_SKU, subpart_size, subpart_weight, subpart_color, part_key

Dim_part: part_key, part_name, part_SKU

With such a design you could create reports that show a breakdown of your sales by each type of engine, as well as by each part that makes up the engine.

A fact constellation schema, finally, contains multiple fact tables that share common dimension tables; it can be viewed as a collection of star schemas and is therefore also called a galaxy schema.

84. What are the various OLAP operations on multidimensional data?

OLAP operations

The analyst can understand the meaning contained in the databases using multi-dimensional analysis. By aligning the data content with the analyst's mental model, the chances of confusion and erroneous interpretations are reduced. The analyst can navigate through the database and screen for a particular subset of the data, changing the data's orientations and defining analytical calculations.[6] The user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down/up is sometimes called "slice and dice". Common operations include slice and dice, drill down, roll up, and pivot.

OLAP slicing

Slice: A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.[6] The picture shows a slicing operation: the sales figures of all sales regions and all product categories of the company in the year 2004 are "sliced" out of the data cube.

OLAP dicing

Dice: The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices).[7] The picture shows a dicing operation: The new cube shows the sales figures of a limited number of product categories, the time and region dimensions cover the same range as before.

OLAP Drill-up and drill-down

Drill Down/Up: Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down).[6] The picture shows a drilling operation: There's a better understanding of the sales figures of the product category "Outdoor-Schutzausrüstung" since you now see the sales figures for the single products of this category.

Roll-up: A roll-up involves computing all of the data relationships for one or more dimensions. To do this, a computational relationship or formula might be defined.[6]

OLAP pivoting

Pivot: This operation is also called rotate operation. It rotates the data in order to provide an alternative presentation of data - the report or page display takes a different dimensional orientation.[6] The picture shows a pivoting operation: The whole cube is rotated, giving another perspective on the data.
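The slice and roll-up operations above can be sketched on a small data cube held as a Python dictionary; the sales figures are made up for illustration:

```python
# Cube cells: (year, region, product) -> sales
cube = {(2003, "north", "cars"): 10, (2003, "south", "cars"): 7,
        (2004, "north", "cars"): 12, (2004, "north", "vans"): 5,
        (2004, "south", "cars"): 9,  (2004, "south", "vans"): 4}

def slice_cube(cube, year):
    """Slice: fix one dimension (year) to a single value."""
    return {(r, p): v for (y, r, p), v in cube.items() if y == year}

def roll_up(cube2d):
    """Roll-up: aggregate away the region dimension, summing sales per product."""
    totals = {}
    for (region, product), v in cube2d.items():
        totals[product] = totals.get(product, 0) + v
    return totals

sales_2004 = slice_cube(cube, 2004)
print(roll_up(sales_2004))  # → {'cars': 21, 'vans': 9}
```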

85. Explain 3-Tier data warehousing architecture.

Three-Tier Architecture of Data Warehouse

Client:-

* GUI/Presentation logic
* Query specification
* Data analysis
* Report formatting
* Data access

Application/Data Mart Server:-

* Summarizing
* Filtering
* Metadata
* Multidimensional view
* Data access

DW Server:-

* Data logic
* Data services
* Metadata
* File services