
UNIT II

Data warehousing is used for

1) Validation 2) Tactical reporting 3) Exploration.

Validation

Validation is where the user community validates with data what they already believe to be true. For example, Denver consumers buy products differently than New York City consumers. New York folks tend to purchase a candy bar on a whim (city population buying patterns), whereas Denver folks are less likely to do so (rural population buying patterns). This had been hypothesized for years, but empirical data shows it to be true. Another example is the question "Who are my best customers?" Once we get past the definition of "best", I bet that most of the user community already knows the answer. The biggest part is getting past the definition of "best". The estimate is that about 45% of data warehouse usage is validation.

Tactical Reporting

Tactical reporting is where the user community uses the data for a tactical reason. For example, salesperson Daffy Duck of Acme Corp. is going to visit customer Wylie T. Coyote, and he wants to know what Wylie T. Coyote bought during the last year. There is no comparison of customer Wylie T. Coyote with customer Roadrunner to see if there is anything that might suggest new products to sell. The estimate is that about 40% of data warehouse usage is tactical reporting.

Exploration

Exploration is where you search for ideas or knowledge that you did not know before. This is where data mining techniques (e.g., association, classification, genetic algorithms) and applications (e.g., market basket analysis, fraud detection) come into play. The estimate is that about 15% of data warehouse usage is exploration.

Query and reporting

A querying and reporting tool helps you run regular reports, create organized listings, and perform cross-tabular reporting and querying.

Query and Reporting Tools

1. Production reporting tools: Companies use these to generate regular operational reports or to support high-volume batch jobs such as printing pay cheques.

2. Report writers: Users design and run reports without having to rely on the IS department.

3. Managed Query Tools


These tools shield end users from the complexities of SQL and database structures by inserting a metalayer between the users and the database.

Data Analysis

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

The process of data analysis

(Figure: data science process flowchart)

Data analysis is a process for obtaining raw data and converting it into information useful for decision-making by users.

There are several phases that can be distinguished. The phases are iterative, in that feedback from later phases may result in additional work in earlier phases.

Data requirements

The data necessary as inputs to the analysis are specified based upon the requirements of those directing the analysis or the customers who will use the finished product of the analysis. The general type of entity upon which the data will be collected is referred to as an experimental unit (e.g., a person or population of people). Specific variables regarding a population (e.g., age and income) may be specified and obtained. Data may be numerical or categorical (i.e., a text label for numbers).

Data collection


Data is collected from a variety of sources. The requirements may be communicated by analysts to custodians of the data, such as information technology personnel within an organization. The data may also be collected from sensors in the environment, such as traffic cameras, satellites, recording devices, etc. It may also be obtained through interviews, downloads from online sources, or reading documentation.

Data processing

The phases of the intelligence cycle used to convert raw information into actionable intelligence or knowledge are conceptually similar to the phases in data analysis.

Data initially obtained must be processed or organized for analysis. For instance, this may involve placing data into rows and columns in a table format for further analysis, such as within a spreadsheet or statistical software.

Data cleaning

Once processed and organized, the data may be incomplete, contain duplicates, or contain errors. The need for data cleaning arises from problems in the way that data is entered and stored. Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching, deduplication, and column segmentation. For textual data, spellcheckers can be used to lessen the number of mistyped words, but it is harder to tell whether the words themselves are correct.
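As a minimal sketch, the record matching and deduplication tasks above can be illustrated in plain Python; the record fields and the normalization rule are hypothetical toy choices, not a standard cleaning API:

```python
from collections import OrderedDict

def normalize(record):
    """Canonical form used for matching: lowercased, trimmed name and city."""
    return (record["name"].strip().lower(), record["city"].strip().lower())

def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = OrderedDict()
    for rec in records:
        key = normalize(rec)
        if key not in seen:
            seen[key] = rec
    return list(seen.values())

customers = [
    {"name": "Wylie T. Coyote", "city": "Denver"},
    {"name": "wylie t. coyote ", "city": "denver"},   # same customer, mistyped
    {"name": "Daffy Duck", "city": "New York"},
]
clean = deduplicate(customers)
```

Real cleaning tools use far more robust matching (phonetic keys, edit distance), but the shape of the task is the same: map each record to a canonical key, then collapse duplicates.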

Exploratory data analysis

Once the data is cleaned, it can be analyzed. Analysts may apply a variety of techniques referred to as exploratory data analysis to begin understanding the messages contained in the data. The process of exploration may result in additional data cleaning or additional requests for data, so these activities may be iterative in nature. Descriptive statistics such as the average or median may be generated to help understand the data. Data visualization may also be used to examine the data in graphical format, to obtain additional insight regarding the messages within the data.
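The descriptive statistics mentioned above can be computed with Python's standard library; the daily sales figures here are made-up toy data:

```python
import statistics

daily_sales = [120, 135, 128, 900, 131, 127, 125]  # 900 looks like an outlier

mean = statistics.mean(daily_sales)
median = statistics.median(daily_sales)

# A large gap between mean and median hints at outliers or skew worth
# investigating, possibly feeding back into the data cleaning phase.
gap = mean - median
```

This is exactly the iterative loop the text describes: a summary statistic raises a question, which triggers another look at the raw data.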


Modeling and algorithms

Mathematical formulas or models called algorithms may be applied to the data to identify relationships among the variables, such as correlation or causation. Inferential statistics includes techniques to measure relationships between particular variables.

Data product

A data product is a computer application that takes data inputs and generates outputs, feeding them back into the environment. It may be based on a model or algorithm.

Communication

Once the data is analyzed, it may be reported in many formats to the users of the analysis to support their requirements. The users may have feedback, which results in additional analysis.

OLAP

OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view. For example, a user can request that data be analyzed to display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data attribute (such as product, geographic sales region, and time period) as a separate "dimension." OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into subattributes.
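A multidimensional store can be sketched in Python as a dictionary keyed by a tuple of dimension values; the products and figures below are invented for illustration and this is of course not a real OLAP engine:

```python
# Each cell of the cube is keyed by a (product, region, period) tuple.
cube = {
    ("beach ball", "Florida", "July"): 5000,
    ("beach ball", "Florida", "September"): 3200,
    ("sunscreen",  "Florida", "July"): 4100,
    ("beach ball", "Eastern", "July"): 2700,
}

def intersect(cube, product=None, region=None, period=None):
    """Locate the intersection of dimensions: cells matching every given value."""
    return {
        key: value for key, value in cube.items()
        if (product is None or key[0] == product)
        and (region is None or key[1] == region)
        and (period is None or key[2] == period)
    }

# All products sold in Florida in July.
florida_july = intersect(cube, region="Florida", period="July")
```

The point of the sketch is the data model: every attribute is a separate dimension, and a query is just the intersection of constraints on those dimensions.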

OLAP can be used for data mining or the discovery of previously undiscerned relationships between data items. An OLAP database does not need to be as large as a data warehouse, since not all transactional data is needed for trend analysis. Using Open Database Connectivity (ODBC), data can be imported from existing relational databases to create a multidimensional database for OLAP.


Two leading OLAP products are Hyperion Solutions' Essbase and Oracle's Express Server. OLAP products are typically designed for multiple-user environments, with the cost of the software based on the number of users.

OLAP Operations

Since the OLAP server is based on the multidimensional view of data, we will discuss OLAP operations on multidimensional data.

Here is the list of OLAP operations.

Roll-up
Drill-down
Slice and dice
Pivot (rotate)

Roll-up

This operation performs aggregation on a data cube in either of the following ways:

By climbing up a concept hierarchy for a dimension
By dimension reduction

Consider the following diagram showing the roll-up operation.


The roll-up operation is performed by climbing up a concept hierarchy for the dimension location.

Initially the concept hierarchy was "street < city < province < country". On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country, so the data is grouped into countries rather than cities. When the roll-up operation is performed by dimension reduction, one or more dimensions are removed from the data cube.
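A roll-up along the location hierarchy can be sketched in plain Python (the figures and the city-to-country mapping are toy data; a real OLAP server does this over a stored cube):

```python
from collections import defaultdict

# Sales held at the city level of the hierarchy street < city < province < country.
city_sales = {"Toronto": 395, "Vancouver": 605, "New York": 1087, "Chicago": 440}
country_of = {"Toronto": "Canada", "Vancouver": "Canada",
              "New York": "USA", "Chicago": "USA"}

def roll_up(sales, hierarchy):
    """Aggregate the measure by climbing one level up the concept hierarchy."""
    totals = defaultdict(int)
    for city, amount in sales.items():
        totals[hierarchy[city]] += amount
    return dict(totals)

country_sales = roll_up(city_sales, country_of)
```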

Drill-down

The drill-down operation is the reverse of roll-up. It is performed in either of the following ways:

By stepping down a concept hierarchy for a dimension
By introducing a new dimension

Consider the following diagram showing the drill-down operation:


The drill-down operation is performed by stepping down a concept hierarchy for the dimension time.

Initially the concept hierarchy was "day < month < quarter < year". On drilling down, the time dimension is descended from the level of quarter to the level of month. When the drill-down operation is performed, one or more dimensions are added to the data cube. It navigates from less detailed data to highly detailed data.
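A drill-down from quarter to month can be sketched the same way, assuming the fact data is kept at the month level (toy data; the column positions are a hypothetical layout):

```python
from collections import defaultdict

# Fact rows kept at the month level, so the time hierarchy can be descended.
rows = [
    ("Jan", "Q1", 200), ("Feb", "Q1", 150), ("Mar", "Q1", 250),
    ("Apr", "Q2", 300), ("May", "Q2", 100),
]

def totals_by(rows, level):
    """level 0 = month, level 1 = quarter; drill-down moves from 1 to 0."""
    out = defaultdict(int)
    for row in rows:
        out[row[level]] += row[2]
    return dict(out)

by_quarter = totals_by(rows, 1)   # the less detailed view
by_month = totals_by(rows, 0)     # drill-down: quarter -> month
```

Note that drill-down only works if the more detailed data was captured in the first place; you cannot descend below the grain of the stored facts.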

Slice

The slice operation selects one particular dimension from a given cube and produces a new sub-cube. Consider the following diagram showing the slice operation.


The slice operation is performed for the dimension time using the criterion time = "Q1".

It forms a new sub-cube by fixing a single value along that one dimension.

Dice

The dice operation selects two or more dimensions from a given cube and produces a new sub-cube. Consider the following diagram showing the dice operation:


The dice operation produces a sub-cube based on the following selection criteria, which involve three dimensions:

(location = "Toronto" or "Vancouver") (time = "Q1" or "Q2") (item = "Mobile" or "Modem")
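Both operations can be sketched as filters over a list of cube cells; the sales figures are invented:

```python
cells = [
    {"location": "Toronto",   "time": "Q1", "item": "Mobile", "sales": 605},
    {"location": "Vancouver", "time": "Q1", "item": "Modem",  "sales": 825},
    {"location": "Toronto",   "time": "Q3", "item": "Mobile", "sales": 400},
    {"location": "New York",  "time": "Q1", "item": "Modem",  "sales": 1087},
]

def slice_cube(cells, dimension, value):
    """Slice: fix a single value along one dimension."""
    return [c for c in cells if c[dimension] == value]

def dice_cube(cells, criteria):
    """Dice: allow a set of values along two or more dimensions."""
    return [c for c in cells
            if all(c[dim] in values for dim, values in criteria.items())]

q1 = slice_cube(cells, "time", "Q1")
sub = dice_cube(cells, {"location": {"Toronto", "Vancouver"},
                        "time": {"Q1", "Q2"},
                        "item": {"Mobile", "Modem"}})
```

The difference is visible in the signatures: slice pins one dimension to one value, while dice constrains several dimensions to sets of values.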

Pivot

The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of the data. Consider the following diagram showing the pivot operation.


Here the item and location axes of the 2-D slice are rotated.
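The rotation can be sketched as transposing a dictionary-of-dictionaries view of a 2-D slice (toy numbers):

```python
# A 2-D slice: sales by item (rows) and location (columns).
view = {
    "Mobile": {"Toronto": 605, "Vancouver": 825},
    "Modem":  {"Toronto": 222, "Vancouver": 310},
}

def pivot(table):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_key, cols in table.items():
        for col_key, value in cols.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

by_location = pivot(view)   # now location is the row axis
```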

OLAP vs OLTP

1. OLAP involves historical processing of information; OLTP involves day-to-day processing.

2. OLAP systems are used by knowledge workers such as executives, managers, and analysts; OLTP systems are used by clerks, DBAs, and database professionals.

3. OLAP is used to analyze the business; OLTP is used to run the business.

4. OLAP focuses on information out; OLTP focuses on data in.

5. OLAP is based on the star, snowflake, and fact constellation schemas; OLTP is based on the entity-relationship model.

6. OLAP is subject oriented; OLTP is application oriented.

7. OLAP contains historical data; OLTP contains current data.

8. OLAP provides summarized and consolidated data; OLTP provides primitive and highly detailed data.

9. OLAP provides a summarized, multidimensional view of data; OLTP provides a detailed, flat relational view of data.

10. The number of OLAP users is in the hundreds; the number of OLTP users is in the thousands.

11. The number of records accessed in OLAP is in the millions; in OLTP it is in the tens.

12. The OLAP database size ranges from 100 GB to TB; the OLTP database size ranges from 100 MB to GB.

13. OLAP systems are highly flexible; OLTP systems provide high performance.

Data Mining

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Data mining involves six common classes of tasks:

Anomaly detection (Outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or data errors that require further investigation.

Association rule learning (Dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
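A minimal market basket sketch in Python, counting how often item pairs co-occur (toy baskets; real association-rule miners such as Apriori additionally prune the search by a support threshold):

```python
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
]

# Count every unordered pair of items that appears together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of baskets containing both items.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
```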

Clustering – is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
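A minimal clustering sketch: a one-dimensional k-means that groups values around two centroids without any known labels (toy data and starting centroids; real implementations handle many dimensions and smarter initialization):

```python
def kmeans_1d(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centroids)

ages = [21, 23, 25, 61, 64, 66]
centers = kmeans_1d(ages, centroids=[20, 70])
```

The algorithm discovers the two age groups from similarity alone, which is exactly the "without using known structures" point above.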

Classification – is the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
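A toy classifier in this spirit: it "learns" per-class word counts from labeled mail and scores new text against them (a crude stand-in for something like naive Bayes; all the data and names here are hypothetical):

```python
from collections import Counter

# Labeled training mail (assumed toy data).
train = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "legitimate"),
    ("project status meeting", "legitimate"),
]

# Known structure: how often each word appeared under each label.
word_counts = {"spam": Counter(), "legitimate": Counter()}
for text, label in train:
    word_counts[label].update(text.split())

def classify(text):
    """Generalize the known structure: score new text by per-class word counts."""
    scores = {label: sum(counts[w] for w in text.split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

verdict = classify("money now")
```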

Page 12: sound4lovers.files.wordpress.com€¦  · Web viewUNIT –II. Data warehousing is used for. 1) Validation 2) tactical reporting 3) exploration. Validation. Validation is the where

Regression – attempts to find a function which models the data with the least error.
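For example, simple linear regression finds the least-squares line in closed form:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b, minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 2x + 1, so the fit recovers a=2, b=1
a, b = fit_line(xs, ys)
```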

Summarization – providing a more compact representation of the data set, including visualization and report generation.

Data Mining Algorithms

Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.

Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.

Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.

Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.

Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow.

BI Technologies: Data Mining and OLAP

Data Integration

It is a part of any business intelligence (BI) system. BI is commonly used to support key decision-making processes and can be a surprisingly high-profile endeavor. Additionally, integrating data from disparate sources across an organization has unique challenges.

Data Workflow

Workflow is the series of activities that are necessary to complete a task.

SOA

A service-oriented architecture is essentially a collection of services. These services communicate with each other. The communication can involve either simple data passing or two or more services coordinating some activity. Some means of connecting services to each other is needed.

Service-oriented architectures are not a new thing. The first service-oriented architecture for many people in the past was the use of DCOM or Object Request Brokers (ORBs) based on the CORBA specification.

If a service-oriented architecture is to be effective, we need a clear understanding of the term service. A service is a function that is well-defined, self-contained, and does not depend on the context or state of other services.


A typical interaction flow among the layers of the SOA RA is described below:

1. Service consumers request services using the Integration Layer.
2. The Integration Layer invokes the business process in the Business Process Layer, which uses one or more services.
3. It invokes the Services Layer.
4. The Services Layer binds and invokes Service Components in the Service Component Layer.
5. Service Components in the Service Component Layer invoke Solution Components from the Operational Systems Layer to carry out the service request.
6. The response is sent back up to the service consumer.
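The flow above can be caricatured in Python, with each layer as a function that delegates to the one below it (all names are hypothetical; real SOA layers would be separately deployed services, not local calls):

```python
def operational_system(request):
    # Operational Systems Layer: the actual application/data asset.
    return {"customer": request["id"], "orders": ["beach ball", "sunscreen"]}

def service_component(request):
    # Service Component Layer: realizes a service over the operational asset.
    return operational_system(request)

def services_layer(request):
    # Services Layer: the abstract, invokable business function.
    return service_component(request)

def business_process(request):
    # Business Process Layer: a process may compose several services;
    # here it invokes just one.
    return services_layer(request)

def integration_layer(request):
    # Integration Layer: mediates and routes the consumer's request,
    # then carries the response back up.
    return business_process(request)

response = integration_layer({"id": "C042"})
```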

Layer 1. Operational layer

This layer includes all custom or packaged application assets in the application portfolio running in an IT operating environment, supporting business activities.

The operational layer is made up of existing application software systems; it is therefore used to leverage existing IT investments in implementing an SOA solution.

Layer 2. Service component layer


This layer contains software components, each of which provides the implementation for, realization of, or operation on a service, which is why it is called a service component. Service components reflect the definition of a service, both in its functionality and its quality of service.

Layer 3. Services layer

This layer consists of all the services defined within the SOA. For the purposes of this reference architecture, a service is considered to be an abstract specification of a collection of (one or more) business-aligned IT functions. The specification provides consumers with sufficient detail to invoke the business functions exposed by a provider of the service; ideally this is done in a platform-independent manner.

Layer 4. Business process layer

Compositions and choreographies of services exposed in layer 3 are defined in this layer. We use service composition to combine groups of services into flows, or we choreograph services into flows, thereby establishing applications out of services. These applications support specific use cases and business processes.

The business process layer communicates with the consumer layer (also called the presentation layer) to communicate inputs and results from the various people who use the system (end users, decision makers, system administrators) through Web portals or business-to-business (B2B) programs.

Layer 5. Consumer layer

The consumer layer, or the presentation layer, provides the capabilities required to deliver IT functions and data to end users to meet specific usage preferences. This layer can also provide an interface for application-to-application communication. The consumer layer of the SOA solution stack provides the capability to quickly create the front end of business processes and composite applications to respond to changes in user needs through channels, portals, rich clients, and other mechanisms.

Layer 6. Integration layer

The integration layer is a key enabler for an SOA because it provides the capability to mediate, route, and transport service requests from the service requester to the correct service provider. This layer enables the integration of services through the introduction of a reliable set of capabilities.

Layer 7. Quality of service layer

Inherent in SOA are characteristics that exacerbate existing QoS concerns in computer systems. These characteristics create complications for QoS that clearly require attention within any SOA solution.

The QoS layer provides an SOA with the capabilities required to realize nonfunctional requirements (NFRs). It must also capture, monitor, log, and signal noncompliance with those requirements relating to the relevant service qualities associated with each SOA layer. This layer serves as an observer of the other layers and can emit signals or events when a noncompliance condition is detected or, preferably, when a noncompliance condition is anticipated.


Layer 8. Information architecture and business intelligence layer

The information architecture and business intelligence layer ensures the inclusion of key considerations pertaining to data architecture and information architectures that can also be used as the basis for the creation of business intelligence through data marts and data warehouses.

Layer 9. Governance layer

The governance layer covers all aspects of business operational life-cycle management in SOA. It provides guidance and policies for making decisions about an SOA and managing all aspects of an SOA solution, including capacity, performance, security, and monitoring.

Master Data Management

Master data management is one kind of data management initiative. MDM is a set of processes, infrastructure, and tools to create and maintain a unified (though not necessarily physically single) reference for all non-transaction entities, to ensure that there is a consistent and standard structure and data related to these entities.

MDM includes a host of data integration techniques and also the establishment of standards, which are enforced manually or in an automated way. For example, customer MDM will make sure that customer data either exists in a single place or is synchronized so that all copies of customer master data are congruent and aligned.

Master data management (MDM) is the practice of acquiring, improving, and sharing master data. MDM involves creating consistent definitions of business entities via integration techniques across multiple internal IT systems and often to partners or customers. MDM is enabled by integration tools and techniques for ETL, EAI, EII, and replication. MDM is related to data governance, which aims to improve data’s quality, share it broadly, leverage it for competitive advantage, manage change, and comply with regulations and standards.


External participants may access and update master data through multiple delivery channels. Customers might access and update master data through business systems that provide self-service capabilities for shopping and online banking, or through the use of telephony systems to access and update personal information. Suppliers, trading partners, and business partners participate in business-to-business transactions that involve the exchange of core master data entities such as customer and product data. Agents from multiple branch locations that conduct business on behalf of a company may access and update master data through a business system provided by that company or through a business-to-business transaction.

Business system users typically update and query master data through their respective business systems. Business systems request MDM Services as part of a business transaction or after the transaction has completed, based upon the MDM method of use and implementation style. The decision to access MDM Services as part of a business transaction or after the system has completely processed the transaction is an implementation decision that should be based upon analysis of nonfunctional requirements such as performance and availability. Business systems and partner systems request MDM Services to access master data through capabilities provided in the connectivity and interoperability layer.

Third-party data service providers such as Dun and Bradstreet, Acxiom, and Lexis Nexis can be accessed for additional information about a person or organization information to enrich master data maintained in the MDM System. Data from these organizations may be used to support the initial loading of master data into the MDM System or periodic updates, or may be used on a transactional basis depending on business requirements. Government agencies also provide watch lists required to support regulatory compliance, the war against terror and anti-money laundering.

The connectivity and interoperability layer facilitates business-to-business communications with trading and business partners, system-to-system communications within the enterprise, and communications to external data providers. Many IT organizations have realized the need to reduce the number of point-to-point interfaces between systems in order to reduce complexity and improve maintainability of the enterprise. They have implemented this layer using application integration techniques such as Enterprise Application Integration Hubs that support communications through the use of messaging, or have adopted the use of an enterprise service bus. MDM and Information Integration Services provide information services that can be invoked and choreographed through this layer. The connectivity and interoperability layer represents the enterprise service bus architectural construct, or it can simply be thought of as a layer that provides choreography services and synchronous and asynchronous integration capabilities such as message mediation and routing, publish and subscribe, FTP, and service-oriented integration through the use of Web services. Service integration means that MDM and Information Integration Services can be requested directly from any business system without going through the connectivity and interoperability layer.

Just below the connectivity and interoperability layer in the center of the figure resides the MDM Services Architecture Building Block. It consists of a set of MDM Services that are grouped into the following software components:

Interface Services support a consistent entry point to request MDM Services through techniques such as messaging, method calls, Web services, and batch processing. In order to maintain and apply consistent business logic, the same MDM service that may be requested as part of a transaction should also be invoked during batch processing.

Lifecycle Management Services manage the lifecycle of master data, provide CRUD (create, read, update, and delete) support for master data managed by the MDM System, and apply business logic based upon the context of that data. Data Quality Management Services are called by Lifecycle Management Services to enforce data quality rules and perform data cleansing, standardization, and reconciliation. MDM Event Management Services are called to detect any actions that should be triggered based upon business rules or data governance policies.
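A toy sketch of a Lifecycle Management Service delegating to a Data Quality Management Service; the class and method names and the cleansing rule are hypothetical illustrations, not a real MDM product API:

```python
class DataQualityService:
    """Standardizes values before master data is stored (toy rule set)."""
    def cleanse(self, record):
        return {k: v.strip().title() if isinstance(v, str) else v
                for k, v in record.items()}

class LifecycleService:
    """CRUD over master data; delegates cleansing to the quality service."""
    def __init__(self, quality):
        self.quality = quality
        self.store = {}

    def create(self, key, record):
        self.store[key] = self.quality.cleanse(record)

    def read(self, key):
        return self.store.get(key)

    def update(self, key, changes):
        self.store[key].update(self.quality.cleanse(changes))

    def delete(self, key):
        self.store.pop(key, None)

mdm = LifecycleService(DataQualityService())
mdm.create("C1", {"name": "  wylie t. coyote ", "city": "denver"})
```

The point is the call pattern described above: every create or update passes through the quality service, so business logic stays consistent no matter which channel requested the change.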

Hierarchy and Relationship Management Services manage master data hierarchies, groupings, and relationships that have been defined for master data. These services may also request Identity Analytics Services to discover relationships, such as those between people that are not obvious, and then store that information in the MDM System.

MDM Event Management Services are used to make information actionable and trigger operations based upon events detected within the data. Events can be defined to support data governance policies, such as managing changes to critical data, based upon business rules or time and date scheduled.

Authoring Services provide services to author, approve, manage, customize, and extend the definition of master data as well as the ability to add or modify instance master data, such as product, vendor, and supplier. These services support the MDM collaborative style of use and may be invoked as part of a collaborative workflow to complete the creation, updating, and approval of the information for definition or instance master data.

Data Quality Management Services validate and enforce data quality rules, perform data standardization for both data values and structures, and perform data reconciliation. These services may request Information Integrity Services that are available from the Information Integration Services architecture building block.

Base services are available to support security and privacy, search, audit logging, and workflow. Base services can be implemented to integrate with common enterprise components that support workflow, security, and audit logging.

The Master Data Repository consists of master data, both instance and definition master data, metadata for the MDM System, and history data that records changes to master data. MDM Services can also be used to maintain and control the distribution of reference data that should be maintained at the global level for an organization.



OPERATIONAL DATA STORE

An operational data store (or "ODS") is a database designed to integrate data from multiple sources for additional operations on the data. Unlike a master data store, the data is not passed back to the operational systems; it may be passed on for further operations and to the data warehouse for reporting.

Because the data originates from multiple sources, the integration often involves cleaning, resolving redundancy and checking against business rules for integrity. An ODS is usually designed to contain low-level or atomic (indivisible) data (such as transactions and prices) with limited history that is captured "real time" or "near real time" as opposed to the much greater volumes of data stored in the data warehouse generally on a less-frequent basis.

The general purpose of an ODS is to integrate data from disparate source systems into a single structure, using data integration technologies such as data virtualization, data federation, or extract, transform, and load (ETL). This allows operational access to the data for operational reporting and for master data or reference data management.
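A minimal extract-transform-load pass into an ODS might look like the sketch below: rows from two source systems are pulled, checked against a simple business rule, and loaded into one integrated structure. The source schemas, table name, and rule are assumptions for illustration; sqlite3 stands in for the ODS database.

```python
# Illustrative ETL into an operational data store: integrate rows from
# two disparate sources into a single structure, applying a cleaning rule.

import sqlite3

ods = sqlite3.connect(":memory:")   # stand-in for the ODS database
ods.execute("CREATE TABLE ods_orders (order_id TEXT, amount REAL, source TEXT)")

billing_rows = [("A-1", "19.99"), ("A-2", "-5.00")]   # source system 1
web_rows     = [("W-9", "42.50")]                     # source system 2

def transform(rows, source):
    for order_id, amount in rows:
        amount = float(amount)            # standardize the type
        if amount < 0:                    # business rule: reject negatives
            continue
        yield (order_id, amount, source)

for src_name, rows in [("billing", billing_rows), ("web", web_rows)]:
    ods.executemany("INSERT INTO ods_orders VALUES (?, ?, ?)",
                    transform(rows, src_name))

print(ods.execute("SELECT COUNT(*) FROM ods_orders").fetchone()[0])  # 2
```

The rejected row ("A-2", with a negative amount) shows where integrity checking against business rules fits into the flow.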

An ODS is not a replacement or substitute for a data warehouse, but it may in turn become a source for one.

EII

Enterprise information integration (EII) is the ability to support a unified view of data and information for an entire organization. In a data virtualization application of EII, data abstraction provides a unified interface for viewing all the data within an organization (known as uniform data access) and a single set of structures and naming conventions to represent this data (known as uniform information representation). The goal of EII is to make a large set of heterogeneous data sources appear to a user or system as a single, homogeneous data source.
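The idea of uniform data access can be sketched as a toy data-virtualization layer: two heterogeneous sources (an in-memory record store and a CSV document) are wrapped behind adapters that map their fields to one canonical schema, so a caller queries a single logical view. The adapter and view names are hypothetical.

```python
# A toy data-virtualization layer in the spirit of EII: heterogeneous
# sources are presented to the caller as one homogeneous "customers" view.

import csv, io

class DictSource:
    def __init__(self, rows):
        self.rows = rows
    def fetch(self):
        # Map source-specific field names onto the canonical schema.
        return [{"id": r["cust_no"], "name": r["cust_name"]} for r in self.rows]

class CsvSource:
    def __init__(self, text):
        self.text = text
    def fetch(self):
        return [{"id": r["id"], "name": r["name"]}
                for r in csv.DictReader(io.StringIO(self.text))]

class VirtualView:
    def __init__(self, sources):
        self.sources = sources
    def query(self):
        # Uniform data access: one call spans every underlying source.
        return [row for s in self.sources for row in s.fetch()]

view = VirtualView([
    DictSource([{"cust_no": "1", "cust_name": "Daffy Duck"}]),
    CsvSource("id,name\n2,Wylie T. Coyote\n"),
])
print(view.query())
```

The caller never sees that one source is a dictionary and the other a CSV file; that abstraction is exactly what EII aims to provide at enterprise scale.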

Overview

Data within an enterprise can be stored in heterogeneous formats, including relational databases (which themselves come in a large number of varieties), text files, XML files, spreadsheets and a variety of proprietary storage methods, each with their own indexing and data access methods.

Standardized data access APIs have emerged that offer a specific set of commands to retrieve and modify data from a generic data source. Many applications implement these APIs across various data sources, most notably relational databases. Such APIs include ODBC, JDBC, XQJ, OLE DB, and more recently ADO.NET.
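Python's DB-API (PEP 249) plays the same role as ODBC or JDBC: a generic connect/cursor/execute surface that different database drivers implement, so code written against it is largely independent of the underlying database. The sqlite3 module is used below only because it ships with the standard library.

```python
# The DB-API pattern: a standardized set of commands (connect, cursor,
# execute, fetch) for retrieving and modifying data in a generic source.

import sqlite3

conn = sqlite3.connect(":memory:")   # any DB-API driver exposes connect()
cur = conn.cursor()
cur.execute("CREATE TABLE products (sku TEXT, price REAL)")
cur.execute("INSERT INTO products VALUES (?, ?)", ("ACME-1", 3.50))
conn.commit()

cur.execute("SELECT sku, price FROM products WHERE price < ?", (10,))
print(cur.fetchall())   # [('ACME-1', 3.5)]
```

Swapping in a different DB-API driver (for PostgreSQL, MySQL, etc.) would leave the cursor and execute calls essentially unchanged — which is precisely the value of a standardized data access API.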

Enterprise Information Integration (EII) applies data integration commercially. It aims at:

Combining disparate data sets 


Each data source is disparate and, as such, is not designed to support EII. Therefore, data virtualization and data federation depend upon accidental data commonality to support combining data and information from disparate data sets. Because of this lack of data value commonality across data sources, the returned set may be inaccurate, incomplete, and impossible to validate.

Simplicity of understanding 

Simplicity of deployment 

Even if recognized as a solution to a problem, EII as of 2009 takes time to apply and presents complexities in deployment. A variety of schema-less solutions such as "Lean Middleware" have been proposed, but ease of use and speed of deployment appear inversely proportional to the generality of such systems. Others cite the need for standard data interfaces to speed and simplify the integration process in practice.

Handling higher-order information 

Analysts experience difficulty, even with a functioning information integration system, in determining whether the sources in the database will satisfy a given application. Answering these kinds of questions about a set of repositories requires semantic information such as metadata. The few commercial tools that leverage this information remain in their infancy.

Applications

EII products enable loose coupling between homogeneous-data-consuming client applications and services on one side and heterogeneous data stores on the other. Such client applications and services include desktop productivity tools (spreadsheets, word processors, presentation software, etc.), development environments and frameworks (Java EE, .NET, Mono, SOAP or RESTful web services, etc.), business intelligence (BI), business activity monitoring (BAM) software, enterprise resource planning (ERP), customer relationship management (CRM), business process management (BPM and/or BPEL) software, and web content management (CMS) systems.

Data access technologies

XQuery and XQuery API for Java (XQJ)

Service Data Objects (SDO) for Java, C++, and .NET clients, and any type of data source

Batch Processing

Batch processing is the execution of a series of non-interactive jobs all at one time. The term originated in the days when users entered programs on punch cards: they would give a batch of these programmed cards to the system operator, who would feed them into the computer.

Batch jobs can be stored up during working hours and then executed during the evening or whenever the computer is idle. Batch processing is particularly useful for operations that require the computer or a peripheral device for an extended period of time. Once a batch job begins, it continues until it is done or until an error occurs. Note that batch processing implies that there is no interaction with the user while the program is being executed.

An example of batch processing is the way that credit card companies process billing. The customer does not receive a bill for each separate credit card purchase but one monthly bill for all of that month's purchases. The bill is created through batch processing: all of the data are collected and held until the bill is processed as a batch at the end of the billing cycle.
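The billing example above can be sketched in a few lines: purchases accumulate during the cycle, and a single non-interactive run processes them all at once. Customer names and amounts are invented for illustration.

```python
# The credit-card billing example as a batch job: purchases are held
# until the end of the cycle, then processed together in one pass.

from collections import defaultdict

purchases = [   # accumulated (held) during the billing cycle
    ("wylie", 12.00), ("daffy", 3.50), ("wylie", 7.25), ("daffy", 1.00),
]

def run_billing_batch(purchases):
    bills = defaultdict(float)
    for customer, amount in purchases:   # no user interaction while running
        bills[customer] += amount
    return dict(bills)

print(run_billing_batch(purchases))  # {'wylie': 19.25, 'daffy': 4.5}
```

Each customer ends up with exactly one bill per cycle rather than one per purchase, which is the defining property of the batch approach.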