a report on databases

Upload: shoitashringi5926

Post on 07-Apr-2018


  • 8/3/2019 A Report on Databases

    1/24

    IES::MMS IT for Management

    [Type text] Page 1

    A Report on

    Data Management

    INDEX

    1. Database
       - Introduction
       - Need of databases
    2. Moving from Database to Data Warehouse
    3. Data Warehouse
       - What is data warehousing?
       - Evolution
       - Benefits of a data warehouse
       - Applications of data warehousing
    4. Data marts
    5. Data mining
       - Need for data mining
       - Use of data mining services
       - How does data mining work?
       - Data mining technologies
       - The future of data mining
    6. Decision support system
       - Types of decision support system models
       - Benefits
    7. OLAP
       - Types

    Database

    A database is a collection of related data and interrelated files. A database management system (DBMS), sometimes just called a database manager, is software that lets one or more computer users create and access data in a database. It organizes a related collection of data so that information can be retrieved easily, and it allows users to store, modify, and access that information. Database managers are used by all kinds of people, from teachers to police officers.

    Need of databases:

    1. Redundancies and inconsistencies can be reduced: The data in conventional data systems is often not centralised. Some applications may require data to be combined from several systems, and these systems may well hold data that is redundant as well as inconsistent (that is, different copies of the same data may have different values). Data inconsistencies are often encountered in everyday life. For example, when we communicate a new address to an organisation that we deal with (e.g. a bank), we often find that some of its communications arrive at the new address while others continue to be mailed to the old one. Combining all the data in a database reduces both redundancy and inconsistency. It is also likely to reduce the costs of collecting, storing and updating data. With a DBMS, each data item needs to be recorded only once and is then available for everyone to use.

    2. Better service to the users: A DBMS is often used to provide better service to the users. In conventional systems, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined into one centralised database, the availability of information and its currency are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unforeseen information requests. Centralising the data in a database also often means that users can obtain new and combined information that would otherwise have been impossible to obtain. A DBMS should also allow users who do not know programming to interact with the data more easily. The ability to quickly obtain new and combined information is becoming increasingly important; an organisation running a conventional data processing system would need new programs to be written to meet every new demand.

    3. Flexibility of the system is improved: Changes are often necessary to the contents of data stored in any system. These changes are more easily made in a database than in a conventional system, because they need not have any impact on application programs. Data processing thus becomes more flexible and can respond more quickly to the expanding needs of the business.

    4. Cost of developing, implementing and maintaining systems is lower: It is much easier to respond to unforeseen requests when the data is centralised in a database than when it is stored in conventional file systems. Although the initial cost of setting up a database can be large, the input/output routines normally coded by programmers are now handled by the DBMS, so the time and money spent writing an application program are reduced. Since programmers spend less time writing applications, the time required to implement new applications is also reduced.

    5. Security can be improved: In conventional systems, applications are developed in an ad hoc manner. Often different systems of an organisation access different components of the operational data, and in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralised and it is easier to control who has access to which parts of the database. However, setting up a database can also make it easier for a determined person to breach security.

    6. Integrity can be improved: Since the data of an organisation using a database approach is centralised and is used by a number of users at a time, it is essential to enforce integrity controls. Integrity may be compromised in many ways. For example, a student may be shown to have borrowed books but to have no enrolment, or the salary of a staff member in one department may be coming out of the budget of another department. If a number of users are allowed to update the same data item at the same time, the result of the updates may not be quite what was intended. Controls must therefore be introduced to prevent such errors from occurring because of concurrent updates. However, since all data is stored only once, it is often easier to maintain integrity than in conventional systems.
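    The integrity example in point 6 (a student recorded as borrowing books without being enrolled) can be illustrated with a small sketch. It uses Python's built-in sqlite3 module; the table names, column names, and sample values are assumptions made for illustration and do not come from the report.

```python
import sqlite3

def build_library_db():
    """Create an in-memory database where every loan must refer to an enrolled student."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE loans ("
        " loan_id INTEGER PRIMARY KEY,"
        " student_id INTEGER NOT NULL REFERENCES students(student_id),"
        " book_title TEXT NOT NULL)"
    )
    return conn

def try_loan(conn, student_id, book_title):
    """Return True if the loan was recorded, False if the integrity control rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO loans (student_id, book_title) VALUES (?, ?)",
                (student_id, book_title),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_library_db()
conn.execute("INSERT INTO students (student_id, name) VALUES (1, 'Asha')")  # hypothetical student
enrolled_ok = try_loan(conn, 1, "Database Systems")   # student 1 is enrolled
unenrolled_ok = try_loan(conn, 99, "Data Mining")     # student 99 does not exist
```

    Because the rule lives in the database rather than in each application program, every program that writes loans is subject to the same check.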

    Moving from Database to Data Warehouse

    A data warehouse is often used as the basis for a decision-support system (also referred to, from an analytical perspective, as a business intelligence system). It is designed to overcome some of the problems encountered when an organization attempts to perform strategic analysis using the same database that is used to perform online transaction processing (OLTP).

    A typical OLTP system is characterized by large numbers of concurrent users actively adding and modifying data. The database represents the state of a particular business function at a specific point in time, as in an airline reservation system. However, the large volume of data maintained in many OLTP systems can overwhelm an organization. As databases grow larger with more complex data, response time can deteriorate quickly due to competition for available resources. A typical OLTP system has many users adding new data to the database while fewer users generate reports from it. As the volume of data increases, reports take longer to generate.

    As organizations collect increasing volumes of data by using OLTP database systems, the need to analyze data becomes more acute. Typically, OLTP systems are designed specifically to manage transaction processing and minimize disk storage requirements through a series of related, normalized tables. However, when users need to analyze their data, a myriad of problems often prevents the data from being used:

    - Users may not understand the complex relationships among the tables, and therefore cannot generate ad hoc queries.
    - Application databases may be segmented across multiple servers, making it difficult for users to find the tables in the first place.
    - Security restrictions may prevent users from accessing the detail data they need.
    - Database administrators may prohibit ad hoc querying of OLTP systems, to prevent analytical users from running queries that could slow down the performance of mission-critical production databases.

    By copying an OLTP system to a reporting server on a regularly scheduled basis, an organization can improve response time for reports and queries. Yet a schema optimized for OLTP is often not flexible enough for decision support applications, largely due to the volume of data involved and the complexity of normalized relational tables.

    For example, each regional sales manager in a company may wish to produce a monthly summary of the sales per region. Because the reporting server contains data at the same level of detail as the OLTP system, the entire month's data is summarized each time the report is generated. The result is longer-running queries that lower user satisfaction. Additionally, many organizations store data in multiple heterogeneous database systems. Reporting is more difficult because data is stored not only in different places, but also in different formats.

    Data warehousing and online analytical processing (OLAP) provide solutions to these problems. Data warehousing is an approach to storing data in which heterogeneous data sources (typically from multiple OLTP databases) are migrated to a separate homogeneous data store. Data warehouses provide these benefits to analytical users:

    - Data is organized to facilitate analytical queries rather than transaction processing.
    - Differences among data structures across multiple heterogeneous databases can be resolved.
    - Data transformation rules can be applied to validate and consolidate data when data is moved from the OLTP database into the data warehouse.
    - Security and performance issues can be resolved without requiring changes in the production systems.

    Sometimes organizations maintain smaller, more topic-oriented data stores called data marts. In contrast to a data warehouse, which typically encapsulates all of an enterprise's analytical data, a data mart is typically a subset of the enterprise data targeted at a smaller set of users or business functions.

    Whereas a data warehouse or data mart is the data store for analytical data, OLAP is the technology that enables client applications to access the data efficiently. OLAP provides these benefits to analytical users:

    - Pre-aggregation of frequently queried data, enabling a very fast response time to ad hoc queries.
    - An intuitive multidimensional data model that makes it easy to select, navigate, and explore the data.
    - A powerful tool for creating new views of data based upon a rich array of ad hoc calculation functions.
    - Technology to manage security, client/server query management and data caching, and facilities to optimize system performance based upon user needs.
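    The first two benefits, pre-aggregation and a multidimensional model that can be navigated, can be sketched in a few lines of Python. This is a toy illustration, not how any particular OLAP product works; the sample rows and the function names build_cube and drill_down are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, city, month, units_sold)
SALES = [
    ("New England", "Boston", "March", 120),
    ("New England", "Boston", "April", 90),
    ("New England", "Hartford", "March", 40),
    ("Midwest", "Chicago", "March", 200),
]

def build_cube(rows):
    """Pre-aggregate units once at two levels: (region, month) and (region, city, month)."""
    by_region = defaultdict(int)
    by_city = defaultdict(int)
    for region, city, month, units in rows:
        by_region[(region, month)] += units
        by_city[(region, city, month)] += units
    return by_region, by_city

def drill_down(by_city, region, month):
    """Return the per-city breakdown behind one (region, month) total."""
    return {
        city: units
        for (r, city, m), units in by_city.items()
        if r == region and m == month
    }

by_region, by_city = build_cube(SALES)
region_total = by_region[("New England", "March")]           # answered from the pre-aggregate
city_detail = drill_down(by_city, "New England", "March")    # one level more detailed
```

    Once the totals are built, a question such as "unit sales in New England in March" is a dictionary lookup rather than a scan of the detail rows, which is the point of pre-aggregation.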

    DATA WAREHOUSE

    A data warehouse is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site. Once gathered, the data are stored for a long time, permitting access to historical data. Large companies have presences in many places, each of which may generate a large volume of data. For instance, large retail chains have hundreds or thousands of stores, whereas insurance companies may have data from thousands of local branches. Further, large organizations have a complex internal organization structure, and therefore different data may be present in different locations, on different operational systems, or under different schemas. For instance, manufacturing-problem data and customer-complaint data may be stored on different database systems. Corporate decision makers require access to information from all such sources. Setting up queries on individual sources is both cumbersome and inefficient. Moreover, the sources of data may store only current data, whereas decision makers may need access to past data as well; for instance, information about how purchase patterns have changed in the past year could be of great importance.

    Data warehouses provide a solution to these problems. They give the user a single consolidated interface to data, making decision-support queries easier to write. Moreover, by accessing information for decision support from a data warehouse, the decision maker ensures that online transaction-processing systems are not affected by the decision-support workload.

    A data warehouse maintains its functions in three layers: staging, integration, and access. Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to provide a level of abstraction from users. The access layer is for getting data out for users. This definition of the data warehouse focuses on data storage. The source data is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support.

    Data-warehouse Architecture

    [Figure: Components of a Data Warehouse]

    The diagram shows the architecture of a typical data warehouse, and illustrates the gathering of data, the storage of data, and the querying and data-analysis support. Among the issues to be addressed in building a warehouse are the following:

    When and how to gather data

    In a source-driven architecture for gathering data, the data sources transmit new information, either continually (as transaction processing takes place) or periodically (nightly, for example). In a destination-driven architecture, the data warehouse periodically sends requests for new data to the sources. Unless updates at the sources are replicated at the warehouse via two-phase commit, the warehouse will never be quite up to date with the sources. Two-phase commit is usually far too expensive to be an option, so data warehouses typically have slightly out-of-date data. That, however, is usually not a problem for decision-support systems.
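    A destination-driven refresh can be sketched as a warehouse that periodically asks a source for everything newer than the last record it has seen. The classes Source and Warehouse, the sequence-number scheme, and the record values below are all assumptions made for illustration; a real warehouse would pull from several sources and track one marker per source.

```python
class Source:
    """A stand-in operational source that keeps an append-only log of (sequence_no, record)."""

    def __init__(self):
        self._log = []

    def add(self, record):
        self._log.append((len(self._log) + 1, record))

    def changes_since(self, last_seen):
        """Return all log entries with a sequence number greater than last_seen."""
        return [entry for entry in self._log if entry[0] > last_seen]


class Warehouse:
    """Holds copied records plus a high-water mark of the last sequence number pulled."""

    def __init__(self):
        self.rows = []
        self.last_seen = 0

    def pull(self, source):
        """Destination-driven refresh: request anything new and return how many rows arrived."""
        new_entries = source.changes_since(self.last_seen)
        for seq, record in new_entries:
            self.rows.append(record)
            self.last_seen = seq
        return len(new_entries)


source = Source()
warehouse = Warehouse()
source.add("sale-1")
source.add("sale-2")
first_pull = warehouse.pull(source)
source.add("sale-3")                    # arrives after the refresh
stale_count = len(warehouse.rows)       # warehouse lags the source until the next pull
second_pull = warehouse.pull(source)
```

    Between pulls the warehouse is slightly behind the source, which is the out-of-date window the text describes as acceptable for decision support.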

    What schema to use

    Data sources that have been constructed independently are likely to have different schemas. In fact, they may even use different data models. Part of the task of a warehouse is to perform schema integration, and to convert data to the integrated schema before they are stored. As a result, the data stored in the warehouse are not just a copy of the data at the sources. Instead, they can be thought of as a materialized view of the data at the sources.
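    As a rough sketch of schema integration, suppose two independently built sources record the same kind of sale with different field names, units, and date formats. The source layouts, field names, and values below are invented for illustration; the point is only that each source gets its own conversion into one agreed target schema.

```python
# Hypothetical records from two independently built sources.
STORE_A = [{"cust_name": "R. Mehta", "amount_cents": 125000, "dt": "2011-03-05"}]
STORE_B = [{"customer": "S. Rao", "amount": "980.50", "sale_date": "05/03/2011"}]

def from_store_a(rec):
    """Store A keeps amounts in cents and dates already in YYYY-MM-DD form."""
    return {
        "customer": rec["cust_name"],
        "amount": rec["amount_cents"] / 100,
        "sale_date": rec["dt"],
    }

def from_store_b(rec):
    """Store B keeps amounts as text and dates in DD/MM/YYYY form."""
    day, month, year = rec["sale_date"].split("/")
    return {
        "customer": rec["customer"],
        "amount": float(rec["amount"]),
        "sale_date": f"{year}-{month}-{day}",
    }

def integrate():
    """Convert every source record to the integrated schema before it is stored."""
    return [from_store_a(r) for r in STORE_A] + [from_store_b(r) for r in STORE_B]

integrated = integrate()
```

    After conversion, every row has the same three fields, so a single query can cover both stores.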

    Data cleansing

    The task of correcting and pre-processing data is called data cleansing. Data sources often deliver data with numerous minor inconsistencies that can be corrected. For example, names are often misspelled, and addresses may have street, area, or city names misspelled, or zip codes entered incorrectly. These can be corrected to a reasonable extent by consulting a database of street names and zip codes in each city. Address lists collected from multiple sources may have duplicates that need to be eliminated in a merge-purge operation. Records for multiple individuals in a house may be grouped together so that only one mailing is sent to each house; this operation is called householding.
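    Merge-purge and householding can be sketched as follows. This simplified version only treats records as duplicates when they match after lower-casing and collapsing whitespace; it does not correct misspellings, which would need the reference database of street names mentioned above. The sample records and function names are assumptions made for the example.

```python
def _clean(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def normalize(record):
    """Build a comparison key from a (name, address) record."""
    name, address = record
    return _clean(name), _clean(address)

def merge_purge(records):
    """Keep the first record for each normalized (name, address) key and drop the rest."""
    seen, unique = set(), []
    for rec in records:
        key = normalize(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def household(records):
    """Group names by normalized address so that each address receives one mailing."""
    groups = {}
    for rec in records:
        _, address = normalize(rec)
        groups.setdefault(address, []).append(rec[0])
    return groups

# Hypothetical address list merged from several sources.
RECORDS = [
    ("Anita Shah", "12 Hill Road, Mumbai"),
    ("anita  shah", "12 Hill Road,  Mumbai"),   # duplicate with different case and spacing
    ("Vikram Shah", "12 Hill Road, Mumbai"),
    ("Leela Nair", "4 Lake View, Pune"),
]

unique_records = merge_purge(RECORDS)
households = household(unique_records)
```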

    How to propagate updates

    Updates on relations at the data sources must be propagated to the data warehouse. If the relations at the data warehouse are exactly the same as those at the data source, the propagation is straightforward. If they are not, the problem of propagating updates is basically the view-maintenance problem.

    What data to summarize

    The raw data generated by a transaction-processing system may be too large to store online. However, we can answer many queries by maintaining just summary data obtained by aggregation on a relation, rather than maintaining the entire relation. For example, instead of storing data about every sale of clothing, we can store total sales of clothing by item name and category.
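    The clothing example can be sketched with Python's built-in sqlite3 module: the detail relation holds one row per sale, and the summary relation holds one row per item name and category. The table names, column names, and figures are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detail relation: one row per individual sale (hypothetical data).
conn.execute("CREATE TABLE sales (item_name TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("shirt", "menswear", 500.0),
        ("shirt", "menswear", 450.0),
        ("scarf", "accessories", 200.0),
    ],
)

# Summary relation: totals by item name and category, computed once.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT item_name, category, SUM(amount) AS total_sales, COUNT(*) AS num_sales "
    "FROM sales GROUP BY item_name, category"
)

summary = conn.execute(
    "SELECT item_name, category, total_sales, num_sales "
    "FROM sales_summary ORDER BY item_name"
).fetchall()
```

    Queries about totals per item or category can now be answered from sales_summary alone; questions about individual sales would still need the detail relation.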

    WHAT IS DATA WAREHOUSING?

    Data warehousing is a process of transforming data into information and making it available to users in a timely enough manner to make a difference.

    [Figure: DATA transformed into INFORMATION]

    Hence it is a process or technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible. A data warehouse is a decision support database maintained separately from the organization's operational database.

    EVOLUTION

    60s: Batch reports
    - hard to find and analyze information
    - inflexible and expensive, reprogram every new request

    70s: Terminal-based DSS and EIS (executive information systems)
    - still inflexible, not integrated with desktop tools

    80s: Desktop data access and analysis tools
    - query tools, spreadsheets, GUIs
    - easier to use, but only access operational databases

    90s: Data warehousing with integrated OLAP engines and tools

    Benefits of a Data Warehouse

    A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

    - Maintain data history, even if the source transaction systems do not.
    - Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
    - Improve data, by providing consistent codes and descriptions, and by flagging or even fixing bad data.
    - Present the organization's information consistently.
    - Provide a single common data model for all data of interest regardless of the data's source.
    - Restructure the data so that it makes sense to the business users.
    - Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
    - Add value to operational business applications, notably customer relationship management (CRM) systems.

    APPLICATIONS OF DATA WAREHOUSING

    - Decision support
    - Trend analysis
    - Financial forecasting
    - Churn prediction for telecom subscribers, credit card users, etc.
    - Insurance fraud analysis
    - Call record analysis
    - Logistics and inventory management
    - Agriculture

    Data Marts

    A data mart is typically defined as a subset of the contents of a data warehouse, stored within its own database. A data mart tends to contain data focused at the department level, or on a specific business area. The data can exist at both the detail and summary levels. The data mart can be populated with data taken directly from operational sources, similar to a data warehouse, or with data taken from the data warehouse itself. Because the volume of data in a data mart is less than that in a data warehouse, query processing is often faster.

    Characteristics of a data mart include:

    - Quicker and simpler implementation.
    - Lower implementation cost.
    - Needs of a specific business unit or function met.
    - Protection of sensitive information stored elsewhere in the data warehouse.
    - Faster response times due to lower volumes of data.
    - Distribution of data marts to user organizations.
    - Built from the bottom upward.

    Departmental or regional divisions often determine whether data marts or data warehouses are used. For example, if managers in different sales regions require data from only their region, then it can be beneficial to build data marts containing specific regional data. If regional managers require access to all the organization's data, then a larger data warehouse is usually necessary.

    Although data marts are often designed to contain data relating to a specific business function, there can be times when users need a broader level of business data. However, because this broader-level data is often only needed in summarized form, it is acceptable to store it within each data mart rather than implementing a full data warehouse.

    Hence data marts are needed because organizations, both big and small, need more analysis on a narrower range of data than a data warehouse provides. A data mart delivers value quickly, with less complexity and expense than a data warehouse. Data marts are not data driven; they are a response to real business needs.

    Building a Data Warehouse from Data Marts

    Data warehouses can be built using a top-down or bottom-up approach. Top-down describes the process of building a data warehouse for the entire organization, containing data from multiple, heterogeneous, operational sources. The bottom-up approach describes the process of building data marts for departments, or specific business areas, and then joining them to provide the data for the entire organization. Building a data warehouse from the bottom up, by implementing data marts, is often simpler because it is less ambitious.

    A common approach to using data marts and data warehouses involves storing all detail data within the data warehouse, and summarized versions within data marts. Each data mart contains summarized data per functional split within the business, such as sales region or product group, further reducing the data volume per data mart.

    Data Mart Considerations

    Data marts can be useful additions or alternatives to the data warehouse, but issues to consider before implementation include:

    - Additional hardware and software.
    - Time required to populate each data mart regularly.
    - Consistency with other data marts and the data warehouse.
    - Network access (if each data mart is located in a different geographical region).

    DATA MINING

    What is Data Mining?

    Data mining, or knowledge discovery (the analysis step of the Knowledge Discovery in Databases process, or KDD), is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Generally, data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining tools predict behaviours and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

    Data mining derives its name from the similarities between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find where the value resides. Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

    The Evolution of Data Mining

    Data mining is a natural development of the increased use of computerized databases to store data and provide answers to business analysts.

    Evolutionary Step | Business Question | Enabling Technology
    Data Collection (1960s) | "What was my total revenue in the last five years?" | computers, tapes, disks
    Data Access (1980s) | "What were unit sales in New England last March?" | faster and cheaper computers with more storage, relational databases
    Data Warehousing and Decision Support | "What were unit sales in New England last March? Drill down to Boston." | faster and cheaper computers with more storage, on-line analytical processing (OLAP), multidimensional databases, data warehouses
    Data Mining | "What's likely to happen to Boston unit sales next month? Why?" | faster and cheaper computers with more storage, advanced computer algorithms

    Traditional query and report tools have been used to describe and extract what is in a database. The user forms a hypothesis about a relationship and verifies it or discounts it with a series of queries against the data. For example, an analyst might hypothesize that people with low income and high debt are bad credit risks and query the database to verify or disprove this assumption. Data mining, by contrast, can be used to generate a hypothesis. For example, an analyst might use a neural net to discover a pattern that analysts did not think to try: that people over 30 years old with low incomes and high debt, but who own their own homes and have children, are good credit risks.

    Need for Data Mining

    Nowadays, large quantities of data are being accumulated. The amount of data collected is said to almost double every nine months. Extracting knowledge from massive data is one of the most desired capabilities of data mining. Data can be large in two senses: in terms of size, e.g. image data, or in terms of dimensionality, e.g. gene expression data.

    Usually there is a huge gap between the stored data and the knowledge that could be construed from it. This transition will not occur automatically; that is where data mining comes into the picture. In exploratory data analysis, some initial knowledge about the data is known, but data mining can help provide more in-depth knowledge about the data.

    Manual data analysis has been around for some time now, but it creates a bottleneck for large-scale data analysis.

    Fast-developing computer science and engineering techniques and methodologies generate new demands. Data mining techniques are now being applied to all kinds of data-rich domains, e.g. image mining and gene data analysis.

    What Can Data Mining Do?

    Although data mining is still in its infancy, companies in a wide range of industries, including retail, finance, health care, manufacturing, transportation, and aerospace, are already using data mining tools and techniques to take advantage of historical data. Data mining enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data. By using pattern recognition technologies and statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions and anomalies that might otherwise go unnoticed.

    Use of Data Mining Services

    All businesses depend on data in some way or other. For some, data underlies their core procedures; others base their business proceedings on it. Data mining services are of greatest importance in applications like:

    - Understanding customer needs
    - Product analysis
    - Demand and supply analysis
    - E-commerce trends
    - Telecommunications, etc.

    For businesses, data mining is used to discover patterns and relationships in the data in order to help make better business decisions. Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Specific uses of data mining include:

    - Market segmentation: Identify the common characteristics of customers who buy the same products from your company.
    - Customer churn: Predict which customers are likely to leave your company and go to a competitor.
    - Fraud detection: Identify which transactions are most likely to be fraudulent.
    - Direct marketing: Identify which prospects should be included in a mailing list to obtain the highest response rate.
    - Interactive marketing: Predict what each individual accessing a Web site is most likely interested in seeing.
    - Market basket analysis: Understand what products or services are commonly purchased together, e.g. beer and diapers.
    - Trend analysis: Reveal the difference between a typical customer this month and last.

    Data mining technology can generate new business opportunities by:

    Automated prediction of trends and behaviors: Data mining automates the process of finding predictive information in a large database. Questions that traditionally required extensive hands-on analysis can be directly answered from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

    Automated discovery of previously unknown patterns: Data mining tools sweep through databases and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.

    Using massively parallel computers, companies dig through volumes of data to discover patterns about their customers and products. For example, grocery chains have found that when men go to a supermarket to buy diapers, they sometimes walk out with a six-pack of beer as well. Using that information, it's possible to lay out a store so that these items are closer together.
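    The diapers-and-beer pattern is usually measured with two simple ratios: support (the share of all baskets that contain the items) and confidence (among baskets with the first item, the share that also contain the second). The sketch below computes both on a handful of made-up baskets; the data and function names are assumptions for illustration, and real tools use more efficient algorithms over far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def support(baskets, items):
    """Fraction of baskets that contain every item in `items` (a set)."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Among baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

counts = pair_counts(BASKETS)
beer_diaper_support = support(BASKETS, {"diapers", "beer"})
diapers_to_beer = confidence(BASKETS, {"diapers"}, {"beer"})
```

    In this toy data, half of all baskets contain both items, and two out of the three diaper baskets also contain beer.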

    AT&T, A.C. Nielsen, and American Express are among the growing ranks of companies implementing data mining techniques for sales and marketing. These systems are crunching through terabytes of point-of-sale data to aid analysts in understanding consumer behavior and promotional strategies. Why? To gain a competitive advantage and increase profitability!

    Similarly, financial analysts are plowing through vast sets of financial records, data feeds, and other information sources in order to make investment decisions. Health-care organizations are examining medical records to understand trends of the past so they can reduce costs in the future.

    How does data mining work?

    While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

    Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

    Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

    Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.

    Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
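    The association relationship can be made concrete with a small sketch. The code below is only an illustration, not the method used by any retailer mentioned in this report: the baskets are invented, and the function name pair_rules is hypothetical. It computes two common measures for a rule "a -> b": support (the share of all baskets containing both items) and confidence (the share of baskets containing a that also contain b).

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def pair_rules(baskets):
    """Return {(a, b): (support, confidence)} for every pairwise rule a -> b."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), together in pair_counts.items():
        rules[(a, b)] = (together / n, together / item_counts[a])
        rules[(b, a)] = (together / n, together / item_counts[b])
    return rules


rules = pair_rules(baskets)
support, confidence = rules[("diapers", "beer")]
print(f"diapers -> beer: support={support:.2f}, confidence={confidence:.2f}")
```

    In this toy data, two of the five baskets contain both items, and two of the three diaper baskets also contain beer. Real association mining works on far larger data and usually prunes rare item sets first.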

    Data mining consists of five major elements:

    Extract, transform, and load transaction data onto the data warehouse system.

    Store and manage the data in a multidimensional database system.

    Provide data access to business analysts and information technology professionals.

    Analyze the data by application software.

    Present the data in a useful format, such as a graph or table.
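    The five elements can be sketched end to end in a few lines. This is a toy illustration under stated assumptions: the CSV columns, the sample rows, and the function names (extract, transform, load, analyze, present) are all hypothetical, and an in-memory dictionary stands in for a real multidimensional database.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw transaction export; the column names are assumptions.
RAW = """date,store,item,amount
2011-03-01,North,beer,12.50
2011-03-01,North,diapers,20.00
2011-03-02,South,beer,6.25
2011-03-02,North,beer,12.50
"""


def extract(text):
    # Element 1 (extract): read the raw transaction rows.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Element 1 (transform): keep the needed fields and convert types.
    return [(r["store"], r["item"], float(r["amount"])) for r in rows]


def load(records):
    # Elements 1-2 (load, store): accumulate amounts keyed by two dimensions.
    cube = defaultdict(float)
    for store, item, amount in records:
        cube[(store, item)] += amount
    return dict(cube)


def analyze(cube):
    # Element 4 (analyze): total sales per item across all stores.
    totals = defaultdict(float)
    for (_store, item), amount in cube.items():
        totals[item] += amount
    return dict(totals)


def present(totals):
    # Element 5 (present): format the result as a small text table.
    return "\n".join(
        f"{item:<10}{amount:>8.2f}" for item, amount in sorted(totals.items())
    )


# Element 3 (access): analysts would query `cube`; here we simply hold it.
cube = load(transform(extract(RAW)))
totals = analyze(cube)
print(present(totals))
```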


    Data Mining Technologies

    The analytical techniques used in data mining are often well-known mathematical algorithms and techniques. What is new is the application of those techniques to general business problems, made possible by the increased availability of data and inexpensive storage and processing power. Also, the use of graphical interfaces has led to tools becoming available that business experts can easily use.

    Some of the tools used for data mining are:

    Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

    Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

    Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits, while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.

    Nearest neighbour method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbour technique.

    Rule induction: The extraction of useful if-then rules from data based on statistical significance.

    Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
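    Of the tools listed above, the nearest neighbour method is the simplest to sketch. The example below is a minimal illustration, assuming invented historical records (annual income in thousands and age) and a hypothetical function name knn_classify. A new record is assigned the majority class among the k most similar historical records, with similarity measured here by plain Euclidean distance.

```python
import math
from collections import Counter

# Hypothetical historical records: (income in thousands, age) -> response.
history = [
    ((30, 25), "no"),
    ((35, 30), "no"),
    ((80, 45), "yes"),
    ((90, 50), "yes"),
    ((85, 40), "yes"),
]


def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among its k nearest neighbours."""
    # Sort historical records by Euclidean distance to the new record.
    nearest = sorted(history, key=lambda h: math.dist(record, h[0]))[:k]
    # Count the class labels of the k closest records.
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((82, 42), history))
print(knn_classify((32, 27), history))
```

    In practice the features would normally be rescaled first, since a raw distance lets the attribute with the largest numeric range dominate.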

    Example

    For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. They could also make sure beer and diapers were sold at full price on Thursdays.

    The Future of Data Mining

    In the short term, the results of data mining will be in profitable, if mundane, business-related areas. Micro-marketing campaigns will explore new niches. Advertising will target potential customers with new precision.

    In the medium term, data mining may be as common and easy to use as e-mail. We may use these tools to find the best airfare to New York, root out a phone number of a long-lost classmate, or find the best prices on lawn mowers.

    The long-term prospects are truly exciting. Imagine intelligent agents turned loose on medical research data or on sub-atomic particle data. Computers may reveal new treatments for diseases or new insights into the nature of the universe.


    DECISION SUPPORT SYSTEM

    Defining the Concept

    Decision support system: An interactive, software-based information system intended to help decision makers compile useful information from a combination of raw data, documents, personal knowledge, and business models in order to identify and solve problems and to make decisions.

    DSSs include knowledge-based systems.

    DSS components may be classified as:

    Inputs: Factors, numbers, and characteristics to analyze

    User Knowledge and Expertise: Inputs requiring manual analysis by the user

    Outputs: Transformed data from which DSS "decisions" are generated

    Decisions: Results generated by the DSS based on user criteria

    High-level Decision Support System Requirements:

    Data collection from multiple sources (sales data, inventory data, supplier data, market research data, etc.)

    Data formatting and collation

    A suitable database location and format built for decision-support-based reporting and analysis

    Robust tools and applications to report, monitor, and analyze the data

    Types of Decision Support System Models

    It is important to note that the DSS field does not have a universally accepted model. That is to say, there are many theories vying for supremacy in this broad field. Because there are many working DSS theories, there are many ways to classify DSS.

    For instance, one of the DSS models available is based on the system's relationship with the user. This model distinguishes passive, active, and cooperative DSS models.


    Decision Support Systems that just collect data and organize it effectively are usually called passive models. They do not suggest a specific decision; they only reveal the data.

    An active DSS actually processes data and explicitly shows solutions based upon that data. While there are many systems that can be active, many organizations would be hard pressed to put all their faith into a computer model without any human intervention.

    A cooperative Decision Support System is one in which data is collected, analyzed, and then given to a human who helps the system revise or refine it. Here, a human and a computer component work together to come up with the best solution.

    While the above DSS model considers the user's relationship, another popular DSS model sees the mode of assistance as the underlying basis of the DSS model. This includes the Model Driven DSS, Communications Driven DSS, Data Driven DSS, Document Driven DSS, and Knowledge Driven DSS.

    A Model Driven DSS is one in which decision makers use statistical simulations or financial models to come up with a solution or strategy. Though these decisions are based on models, they do not have to be overwhelmingly data intensive.

    A Communications Driven DSS model is one in which many people collaborate to come up with a series of decisions to set a solution or strategy in motion. This model can be in an office environment or on the web.

    A Data Driven DSS model puts its emphasis on collected data that is then manipulated to fit the decision maker's needs. This data can be internal or external and in a variety of formats. It is important that data is collected and categorized sequentially, for example daily sales, operating budgets from one quarter to the next, inventory over the previous year, etc.

    A Document Driven DSS model uses a variety of documents such as text documents, spreadsheets, and database records to come up with decisions as well as to further manipulate the information to refine strategies.

    A Knowledge Driven DSS model uses special rules, stored in a computer or applied by a human, to determine whether a decision should be made. For instance, many day traders see a stop loss limit as a knowledge driven DSS model. These rules or facts are used in order to make a decision.
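    The stop loss limit mentioned above shows how small a knowledge-driven rule can be. The sketch below is hypothetical: the function name stop_loss_decision and the default 10% threshold are assumptions chosen for illustration, not a trading recommendation.

```python
def stop_loss_decision(purchase_price, current_price, stop_loss_pct=0.10):
    """Return "sell" once the price has fallen to the stop loss limit, else "hold".

    The 10% default threshold is an assumed value for illustration only.
    """
    # The stored rule: sell when the price drops to or below this limit.
    limit = purchase_price * (1 - stop_loss_pct)
    return "sell" if current_price <= limit else "hold"


print(stop_loss_decision(100.0, 95.0))
print(stop_loss_decision(100.0, 89.0))
```

    The rule encodes the trader's knowledge once, and the system then applies it consistently to every new price.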

    The scope in which decisions are made can also be seen as a DSS model. For instance, an organizational, departmental, or single user decision can be seen in the scope-wide model.

    Decision-Support Systems

    Database applications can be broadly classified into transaction processing and decision support. Transaction-processing systems are widely used today, and companies have accumulated a vast amount of information generated by these systems.

    For example, company databases often contain enormous quantities of information about customers and transactions. The size of the information storage required may range up to hundreds of gigabytes, or even terabytes, for large retail chains. Transaction information for a retailer may include the name or identifier (such as credit-card number) of the customer, the items purchased, the price paid, and the dates on which the purchases were made. Information about the items purchased may include the name of the item, the manufacturer, the model number, the color, and the size. Customer information may include credit history, annual income, residence, age, and even educational background.

    Such large databases can be treasure troves of information for making business decisions, such as what items to stock and what discounts to offer. For instance, a retail company may notice a sudden spurt in purchases of flannel shirts in the Pacific Northwest, may realize that there is a trend, and may start stocking a larger number of such shirts in shops in that area. As another example, a car company may find, on querying its database, that most of its small sports cars are bought by young women whose annual incomes are above $50,000. The company may then target its marketing to attract more such women to buy its small sports cars, and may avoid wasting money trying to attract other categories of people to buy those cars. In both cases, the company has identified patterns in customer behavior, and has used the patterns to make business decisions.
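    A query of the kind described in the car company example might look like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database. The table name, columns, age and income bands, and sample rows are all assumptions made for illustration; a real company database would be far larger and structured differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases "
    "(customer TEXT, gender TEXT, age INTEGER, income INTEGER, model TEXT)"
)
# Hypothetical purchase records, invented for illustration.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", "F", 28, 62000, "small sports car"),
        ("c2", "F", 31, 55000, "small sports car"),
        ("c3", "F", 26, 70000, "small sports car"),
        ("c4", "M", 45, 90000, "small sports car"),
        ("c5", "M", 52, 48000, "sedan"),
        ("c6", "F", 40, 58000, "sedan"),
    ],
)

# Count small sports car buyers per customer segment, largest segment first.
query = """
SELECT gender,
       CASE WHEN age < 35 THEN 'under 35' ELSE '35 and over' END AS age_band,
       CASE WHEN income > 50000 THEN 'above 50k' ELSE '50k or below' END
           AS income_band,
       COUNT(*) AS buyers
FROM purchases
WHERE model = 'small sports car'
GROUP BY gender, age_band, income_band
ORDER BY buyers DESC
"""
segments = conn.execute(query).fetchall()
for row in segments:
    print(row)
```

    The first row returned is the dominant segment, which is the pattern a marketing team would then act on.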

    Decision Support Systems delivered by MicroStrategy Business Intelligence

    Business Intelligence (BI) reporting tools, processes, and methodologies are key components of any decision support system and provide end users with rich reporting, monitoring, and data analysis. MicroStrategy provides companies with a unified reporting, analytical, and monitoring platform that forms the core of any Decision Support System. The software exemplifies all of the important characteristics of an ideal Decision Support System:

    Supports individual and group decision making: MicroStrategy provides a single platform that allows all users to access the same information and the same version of the truth, while providing autonomy to individual users and development groups to design reporting content locally.

    Easy to develop and deploy: MicroStrategy delivers an interactive, scalable platform for rapidly developing and deploying projects. Multiple projects can be created within a single shared metadata. Within each project, development teams create a wide variety of re-usable metadata objects. As decision support system deployment expands within an organization, the MicroStrategy platform effortlessly supports an increasing concurrent user base.

    Comprehensive data access: MicroStrategy software allows users to access data from different sources concurrently, leaving organizations the freedom to choose the data warehouse that best suits their unique requirements and preferences.

    Integrated software: MicroStrategy's integrated platform enables administrators and IT professionals to develop data models, perform sophisticated analysis, generate analytical reports, and deliver these reports to end users via different channels (Web, email, file, print, and mobile devices). This eliminates the need for companies to spend considerable effort purchasing and integrating disparate software products in an attempt to deliver a consistent user experience.

    Flexibility: The MicroStrategy SDK (Software Development Kit) exposes its vast functionality through an extensive library of APIs. MicroStrategy customers can choose to leverage the power of the software's flexible APIs to design and deploy solutions tailored to their unique business needs.


    Benefits

    1. Helps automate managerial processes
    2. Improves personal efficiency
    3. Speeds up the process of decision making
    4. Increases organizational control
    5. Encourages exploration and discovery on the part of the decision maker
    6. Speeds up problem solving in an organization
    7. Facilitates interpersonal communication
    8. Promotes learning or training
    9. Generates new evidence in support of a decision
    10. Creates a competitive advantage over competition
    11. Reveals new approaches to thinking about the problem space


    OLAP

    In computing, online analytical processing, or OLAP, is an approach to swiftly answering multi-dimensional analytical (MDA) queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting, and similar areas, with new applications, such as agriculture, emerging. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).

    OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. OLAP consists of three basic analytical operations: consolidation, drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. In contrast, drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the cube and view (dicing) the slices from different viewpoints.
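    The three operations can be sketched on a tiny in-memory cube. This is an illustration only: the sales figures, the dimension names, and the helper functions consolidate and slice_cube are hypothetical, and a plain dictionary stands in for a real OLAP engine. The comments follow the definitions given in the paragraph above.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, office, product) -> units sold.
DIMENSIONS = ("region", "office", "product")
cube = {
    ("West", "Seattle", "shirts"): 120,
    ("West", "Seattle", "boots"): 40,
    ("West", "Portland", "shirts"): 80,
    ("East", "Boston", "shirts"): 50,
    ("East", "Boston", "boots"): 70,
}


def consolidate(cube, keep):
    """Roll up: sum over every dimension not named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(int)
    for key, units in cube.items():
        out[tuple(key[i] for i in idx)] += units
    return dict(out)


def slice_cube(cube, dimension, value):
    """Slice: keep only the cells where `dimension` equals `value`."""
    i = DIMENSIONS.index(dimension)
    return {key: units for key, units in cube.items() if key[i] == value}


# Consolidation: all offices and products rolled up to the region level.
by_region = consolidate(cube, ["region"])
# Slicing: take out the part of the cube that belongs to the West region.
west = slice_cube(cube, "region", "West")
# Drill-down: the products that make up the West region's sales.
west_by_product = consolidate(west, ["product"])
# Dicing: view the same slice from a different viewpoint (by office).
west_by_office = consolidate(west, ["office"])

print(by_region)
print(west_by_product)
print(west_by_office)
```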

    Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases, and relational databases.

    Types

    1) MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP stores its data in optimized multi-dimensional array storage, rather than in a relational database. Therefore it requires the pre-computation and storage of information in the cube, an operation known as processing.

    2) ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information.


    3) HOLAP: There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.
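    The difference between the ROLAP and MOLAP storage styles can be illustrated with a rough sketch. Everything here is an assumption for illustration: the table names, the sample rows, and the use of SQLite and a nested Python list as stand-ins for a relational server and an optimized array store. No vendor's actual implementation is implied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
# Hypothetical base data, invented for illustration.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("West", "shirts", 120),
        ("West", "boots", 40),
        ("East", "shirts", 50),
        ("East", "boots", 70),
        ("West", "shirts", 80),
    ],
)

# ROLAP style: the aggregated information is held in a new relational table.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(units) AS units FROM sales GROUP BY region"
)
rolap = dict(conn.execute("SELECT region, units FROM sales_by_region"))

# MOLAP style: pre-compute ("process") the data into an array whose
# positions correspond to the members of each dimension.
regions = ["East", "West"]
products = ["boots", "shirts"]
array = [[0] * len(products) for _ in regions]
for region, product, units in conn.execute(
    "SELECT region, product, units FROM sales"
):
    array[regions.index(region)][products.index(product)] += units

# Reading an aggregate from the array is a lookup plus a small sum.
molap_west = sum(array[regions.index("West")])

print(rolap)
print(array)
print(molap_west)
```

    A HOLAP arrangement would combine the two, for example keeping the detailed sales table relational while holding only the summary in the array.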


    WE TAKE THIS OPPORTUNITY TO ACKNOWLEDGE THE CONSTANT HELP AND ENCOURAGEMENT GIVEN TO US BY PROF. MRUNAL JOSHI, UNDER WHOSE DIRECTION THE STUDY WAS UNDERTAKEN AND COMPLETED. WE WILL REMAIN GRATEFUL TO HER FOR HER EXPERT GUIDANCE AND INSPIRING ATTITUDE, AND FOR GIVING PROPER DIRECTION TO THE DISSERTATION.