Data mining

An Introduction to Data Mining 

Discovering hidden value in your data warehouse

Overview 

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"

This white paper provides an introduction to the basic technologies of data mining. Examples of profitable applications illustrate its relevance to today's business environment, and a basic description shows how data warehouse architectures can evolve to deliver the value of data mining to end users.

The Foundations of Data Mining 

Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time.


Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

- Massive data collection
- Powerful multiprocessor computers
- Data mining algorithms

Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of respondents are beyond the 50 gigabyte level, while 59% expect to be there by second quarter of 1996.[1] In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods.

In the evolution from business data to business information, each new step has built upon the previous one. For example, dynamic data access is critical for drill-through in data navigation applications, and the ability to store large databases is critical to data mining. From the user's point of view, the four steps listed in Table 1 were revolutionary because they allowed new business questions to be answered accurately and quickly.

| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
|---|---|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (Emerging Today) | "What's likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery |

Table 1. Steps in the Evolution of Data Mining.

The core components of data mining technology have been under development for decades, in research areas such as statistics, artificial intelligence, and machine learning. Today, the maturity of these techniques, coupled with high-performance relational database engines and broad data integration efforts, makes these technologies practical for current data warehouse environments.

The Scope of Data Mining 

Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

- Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data, quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

- Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.

Data mining techniques can yield the benefits of automation on existing software and hardware platforms, and can be implemented on new systems as existing platforms are upgraded and new products developed. When data mining tools are implemented on high performance parallel processing systems, they can analyze massive databases in minutes. Faster processing means that users can automatically experiment with more models to understand complex data. High speed makes it practical for users to analyze huge quantities of data. Larger databases, in turn, yield improved predictions.

Databases can be larger in both depth and breadth:

- More columns. Analysts must often limit the number of variables they examine when doing hands-on analysis due to time constraints. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High performance data mining allows users to explore the full depth of a database, without preselecting a subset of variables.

- More rows. Larger samples yield lower estimation errors and variance, and allow users to make inferences about small but important segments of a population.

A recent Gartner Group Advanced Technology Research Note listed data mining and artificial intelligence at the top of the five key technology areas that "will clearly have a major impact across a wide range of industries within the next 3 to 5 years."[2] Gartner also listed parallel architectures and data mining as two of the top 10 new technologies in which companies will invest during the next 5 years. According to a recent Gartner HPC Research Note, "With the rapid advance in data capture, transmission and storage, large-systems users will increasingly need to implement new and innovative ways to mine the after-market value of their vast stores of detail data, employing MPP [massively parallel processing] systems to create new sources of business advantage (0.9 probability)."[3]

The most commonly used techniques in data mining are:

- Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

- Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).

- Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.

- Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique; a minimal sketch appears after this list.

- Rule induction: The extraction of useful if-then rules from data based on statistical significance.
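The nearest neighbor item above refers to the following sketch: a minimal k-nearest neighbor classifier in plain Python. The customer features, the Euclidean distance metric, and k = 3 are illustrative assumptions, not details from the paper.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among the k most similar
    records in a historical dataset."""
    # history: list of (feature_vector, class_label) pairs
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (age, monthly long distance spend) -> response class.
history = [((34, 80.0), "responder"), ((58, 12.0), "non-responder"),
           ((41, 95.0), "responder"), ((25, 5.0), "non-responder")]
print(knn_classify(history, (38, 70.0)))  # -> responder
```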

Many of these technologies have been in use for more than a decade in specialized analysis tools that work with relatively small volumes of data. These capabilities are now evolving to integrate directly with industry-standard data warehouse and OLAP platforms. The appendix to this white paper provides a glossary of data mining terms.

How Data Mining Works 

How exactly is data mining able to tell you important things that you didn't know, or what is going to happen next? The technique used to perform these feats in data mining is called modeling. Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't. For instance, if you were looking for a sunken Spanish galleon on the high seas, the first thing you might do is research the times when Spanish treasure had been found by others in the past. You might note that these ships often tend to be found off the coast of Bermuda, that the ocean currents there have certain characteristics, and that certain routes were likely taken by the ships' captains in that era. You note these similarities and build a model that includes the characteristics that are common to the locations of these sunken treasures. With these models in hand you sail off looking for treasure where your model indicates it is most likely to be, given a similar situation in the past. Hopefully, if you've got a good model, you find your treasure.

This act of model building is thus something that people have been doing for a long time, certainly before the advent of computers or data mining technology. What happens on computers, however, is not much different from the way people build models. Computers are loaded up with lots of information about a variety of situations where an answer is known, and then the data mining software on the computer must run through that data and distill the characteristics of the data that should go into the model. Once the model is built it can then be used in similar situations where you don't know the answer. For example, say that you are the director of marketing for a telecommunications company and you'd like to acquire some new long distance phone customers. You could just randomly go out and mail coupons to the general population, just as you could randomly sail the seas looking for sunken treasure. In neither case would you achieve the results you desire, and of course you have the opportunity to do much better than random: you could use the business experience stored in your database to build a model.

As the marketing director you have access to a lot of information about all of your customers: their age, sex, credit history and long distance calling usage. The good news is that you also have a lot of information about your prospective customers: their age, sex, credit history, etc. Your problem is that you don't know the long distance calling usage of these prospects (since they are most likely now customers of your competition). You'd like to concentrate on those prospects who have large amounts of long distance usage. You can accomplish this by building a model. Table 2 illustrates the data used for building a model for new customer prospecting in a data warehouse.


|  | Customers | Prospects |
|---|---|---|
| General information (e.g. demographic data) | Known | Known |
| Proprietary information (e.g. customer transactions) | Known | Target |

Table 2. Data Mining for Prospecting.

The goal in prospecting is to make some calculated guesses about the information in the lower right hand quadrant based on the model that we build going from Customer General Information to Customer Proprietary Information. For instance, a simple model for a telecommunications company might be:

98% of my customers who make more than $60,000/year spend more than$80/month on long distance

This model could then be applied to the prospect data to try to tell something about the proprietary information that this telecommunications company does not currently have access to. With this model in hand, new customers can be selectively targeted.
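To illustrate how such a rule can be applied to prospect records, here is a minimal sketch in Python. The prospect records and field names are hypothetical; only the $60,000 income threshold comes from the example rule above.

```python
# Apply the example rule: customers earning more than $60,000/year are
# very likely (98% among existing customers) to be heavy long distance users.
prospects = [
    {"name": "A. Smith", "income": 75_000},
    {"name": "B. Jones", "income": 41_000},
    {"name": "C. Wong",  "income": 88_000},
]

# Select the prospects the model predicts are worth targeting.
targets = [p for p in prospects if p["income"] > 60_000]
for p in targets:
    print("target:", p["name"])  # A. Smith, C. Wong
```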

Test marketing is an excellent source of data for this kind of modeling. Mining the results of a test market representing a broad but relatively small sample of prospects can provide a foundation for identifying good prospects in the overall market. Table 3 shows another common scenario for building models: predicting what is going to happen in the future.

|  | Yesterday | Today | Tomorrow |
|---|---|---|---|
| Static information and current plans (e.g. demographic data, marketing plans) | Known | Known | Known |
| Dynamic information (e.g. customer transactions) | Known | Known | Target |

Table 3. Data Mining for Predictions.

If someone told you that he had a model that could predict customer usage, how would you know if he really had a good model? The first thing you might try would be to ask him to apply his model to your customer base, where you already knew the answer. With data mining, the best way to accomplish this is by setting aside some of your data in a vault to isolate it from the mining process. Once the mining is complete, the results can be tested against the data held in the vault to confirm the model's validity. If the model works, its observations should hold for the vaulted data.
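What the paper calls a "vault" is what practitioners now call a holdout or test set. Below is a minimal sketch of the procedure in Python, assuming the data are a list of (features, label) records and the mined model is a callable; the 20% vault size and the fixed seed are illustrative choices, not from the paper.

```python
import random

def vault_split(records, vault_fraction=0.2, seed=42):
    """Set aside a fraction of the data in a 'vault' untouched by mining."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * vault_fraction)
    return shuffled[cut:], shuffled[:cut]  # (mining set, vault)

def vault_accuracy(model, vault):
    """Check the mined model's predictions against the vaulted answers."""
    hits = sum(model(features) == label for features, label in vault)
    return hits / len(vault)

# Usage: mine on `mining_set`, then confirm validity on `vault`.
# mining_set, vault = vault_split(labeled_records)
# print(vault_accuracy(my_model, vault))
```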

An Architecture for Data Mining 

To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as flexible interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. Figure 1 illustrates an architecture for advanced analysis in a large data warehouse.


Figure 1 - Integrated Data Mining Architecture

The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact, coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems (Sybase, Oracle, Redbrick, and so on) and should be optimized for flexible and fast data access.

An OLAP (On-Line Analytical Processing) server enables a more sophisticated end-user business model to be applied when navigating the data warehouse. The multidimensional structures allow the user to analyze the data as they want to view their business, summarizing by product line, region, and other key perspectives of their business. The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure. An advanced, process-centric metadata template defines the data mining objectives for specific business issues like campaign management, prospecting, and promotion optimization. Integration with the data warehouse enables operational decisions to be directly implemented and tracked. As the warehouse grows with new decisions and results, the organization can continually mine the best practices and apply them to future decisions.

This design represents a fundamental shift from conventional decision support systems. Rather than simply delivering data to the end user through query and reporting software, the Advanced Analysis Server applies users' business models directly to the warehouse and returns a proactive analysis of the most relevant information. These results enhance the metadata in the OLAP Server by providing a dynamic metadata layer that represents a distilled view of the data. Reporting, visualization, and other analysis tools can then be applied to plan future actions and confirm the impact of those plans.

Profitable Applications 

A wide range of companies have deployed successful applications of data mining. While early adopters of this technology have tended to be in information-intensive industries such as financial services and direct mail marketing, the technology is applicable to any company looking to leverage a large data warehouse to better manage its customer relationships. Two critical factors for success with data mining are a large, well-integrated data warehouse and a well-defined understanding of the business process within which data mining is to be applied (such as customer prospecting, retention, campaign management, and so on).

Some successful application areas include:

- A pharmaceutical company can analyze its recent sales force activity and its results to improve targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months. The data needs to include competitor market activity as well as information about the local health care systems. The results can be distributed to the sales force via a wide-area network that enables the representatives to review the recommendations from the perspective of the key attributes in the decision process. The ongoing, dynamic analysis of the data warehouse allows best practices from throughout the organization to be applied in specific sales situations.
- A credit card company can leverage its vast warehouse of customer transaction data to identify customers most likely to be interested in a new credit product. Using a small test mailing, the attributes of customers with an affinity for the product can be identified. Recent projects have indicated more than a 20-fold decrease in costs for targeted mailing campaigns over conventional approaches.

- A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services. Using data mining to analyze its own customer experience, this company can build a unique segmentation identifying the attributes of high-value prospects. Applying this segmentation to a general business database such as those provided by Dun & Bradstreet can yield a prioritized list of prospects by region.

- A large consumer package goods company can apply data mining to improve its sales process to retailers. Data from consumer panels, shipments, and competitor activity can be applied to understand the reasons for brand and store switching. Through this analysis, the manufacturer can select promotional strategies that best reach their target customer segments.

Each of these examples has clear common ground: they leverage the knowledge about customers implicit in a data warehouse to reduce costs and improve the value of customer relationships. These organizations can now focus their efforts on the most important (profitable) customers and prospects, and design targeted marketing strategies to best reach them.


Conclusion 

Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users' ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. Data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.

[1] META Group Application Development Strategies, "Data Mining for Data Warehouses: Uncovering Hidden Patterns," 7/13/95.

[2] Gartner Group Advanced Technologies and Applications Research Note, 2/1/95.

[3] Gartner Group High Performance Computing Research Note, 1/31/95.

Glossary of Data Mining Terms 

analytical model: A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

anomalous data: Data that result from errors (for example, data entry keying errors) or that represent unusual events. Anomalous data should be examined carefully because they may carry important information.

artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

CART: Classification and Regression Trees. A decision tree technique used for classification of a dataset. Provides a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. Segments a dataset by creating 2-way splits. Requires less data preparation than CHAID.

CHAID: Chi Square Automatic Interaction Detection. A decision tree technique used for classification of a dataset. Provides a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. Segments a dataset by using chi square tests to create multi-way splits. Preceded, and requires more data preparation than, CART.

classification: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to specific variable(s) you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with values "Good" and "Bad."

clustering: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.

data cleansing: The process of ensuring that all values in a dataset are consistent and correctly recorded.

data mining: The extraction of hidden predictive information from large databases.

data navigation: The process of viewing different dimensions, slices, and levels of detail of a multidimensional database. See OLAP.

data visualization: The visual interpretation of complex relationships in multidimensional data.

data warehouse: A system for storing and delivering massive quantities of data.

decision tree: A tree-shaped structure that represents a set of decisions. These decisions generate rules for the classification of a dataset. See CART and CHAID.

dimension: In a flat or relational database, each field in a record represents a dimension. In a multidimensional database, a dimension is a set of similar entities; for example, a multidimensional sales database might include the dimensions Product, Time, and City.

exploratory data analysis: The use of graphical and descriptive statistical techniques to learn about the structure of a dataset.

genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

linear model: An analytical model that assumes linear relationships in the coefficients of the variables being studied.

linear regression: A statistical technique used to find the best-fitting linear relationship between a target (dependent) variable and its predictors (independent variables).

logistic regression: A linear regression that predicts the proportions of a categorical target variable, such as type of customer, in a population.

multidimensional database: A database designed for on-line analytical processing. Structured as a multidimensional hypercube with one axis per dimension.

multiprocessor computer: A computer that includes multiple processors connected by a network. See parallel processing.

nearest neighbor: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called a k-nearest neighbor technique.

non-linear model: An analytical model that does not assume linear relationships in the coefficients of the variables being studied.

OLAP: On-line analytical processing. Refers to array-oriented database applications that allow users to view, navigate through, manipulate, and analyze multidimensional databases.

outlier: A data item whose value falls outside the bounds enclosing most of the other corresponding values in the sample. May indicate anomalous data. Should be examined carefully; may carry important information.

parallel processing: The coordinated use of multiple processors to perform computational tasks. Parallel processing can occur on a multiprocessor computer or on a network of workstations or PCs.

predictive model: A structure and process for predicting the values of specified variables in a dataset.

prospective data analysis: Data analysis that predicts future trends, behaviors, or events based on historical data.

RAID: Redundant Array of Inexpensive Disks. A technology for the efficient parallel storage of data for high-performance computer systems.

retrospective data analysis: Data analysis that provides insights into trends, behaviors, or events that have already occurred.

rule induction: The extraction of useful if-then rules from data based on statistical significance.

SMP: Symmetric multiprocessor. A type of multiprocessor computer in which memory is shared among the processors.

terabyte: One trillion bytes.

time series analysis: The analysis of a sequence of measurements made at specified time intervals. Time is usually the dominating dimension of the data.

Data mining

From Wikipedia, the free encyclopedia

Not to be confused with information extraction.


Data mining (the analysis step of the Knowledge Discovery in Databases process, or KDD), a relatively young and interdisciplinary field of computer science,[1][2] is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management.[3]

With recent technical advances in processing power, storage capacity, and interconnectivity of computer technology, data mining is seen as an increasingly important tool by modern business to transform unprecedented quantities of digital data into business intelligence, giving an informational advantage. It is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery. The growing consensus that data mining can bring real value has led to an explosion in demand for novel data mining technologies.[4]

The related terms data dredging, data fishing and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.


Background

The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology have increased data collection, storage and manipulation. As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing. This has been aided by other discoveries in computer science, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s) and support vector machines (1990s). Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns.[5] It has been used for many years by businesses, scientists and governments to sift through volumes of data such as airline passenger trip records, census data and supermarket scanner data to produce market research reports. (Note, however, that reporting is not always considered to be data mining.)

A primary reason for using data mining is to assist in the analysis of collections of observations of behavior. Such data are vulnerable to collinearity because of unknown interrelations. An unavoidable fact of data mining is that the (sub-)set(s) of data being analyzed may not be representative of the whole domain, and therefore may not contain examples of certain critical relationships and behaviors that exist across other parts of the domain. To address this sort of issue, the analysis may be augmented using experiment-based and other approaches, such as choice modelling for human-generated data. In these situations, inherent correlations can be either controlled for, or removed altogether, during the construction of the experimental design.

There have been some efforts to define standards for data mining, for example the 1999 European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining standard (JDM 1.0). These are evolving standards; later versions of these standards are under development. Independent of these standardization efforts, freely available open-source software systems like the R language, Weka, KNIME, RapidMiner, jHepWork and others have become an informal standard for defining data-mining processes. Notably, all these systems are able to import and export models in PMML (Predictive Model Markup Language), which provides a standard way to represent data mining models so that these can be shared between different statistical applications.[6] PMML is an XML-based language developed by the Data Mining Group (DMG),[7] an independent group composed of many data mining companies. PMML version 4.0 was released in June 2009.[7][8][9]

Research and evolution

The premier professional body in the field is the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD).[citation needed] Since 1989 they have hosted an annual international conference and published its proceedings,[10] and since 1999 have published a biannual academic journal titled "SIGKDD Explorations".[11]

Other computer science conferences on data mining include:

- DMIN – International Conference on Data Mining[12]
- DMKD – Research Issues on Data Mining and Knowledge Discovery
- ECDM – European Conference on Data Mining
- ECML-PKDD – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
- EDM – International Conference on Educational Data Mining
- ICDM – IEEE International Conference on Data Mining[13]
- MLDM – Machine Learning and Data Mining in Pattern Recognition
- PAKDD – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining
- PAW – Predictive Analytics World[14]
- SDM – SIAM International Conference on Data Mining

Process

The CRoss Industry Standard Process for Data Mining (CRISP-DM)[15] is a data mining process model that describes commonly used approaches that expert data miners use to tackle problems. It defines six phases: (1) Business Understanding, (2) Data Understanding, (3) Data Preparation, (4) Modeling, (5) Evaluation, and (6) Deployment.[16]

Alternatively, other process models may define three phases as (1) Pre-processing,(2) Data mining, and (3) Results validation.

Pre-processing

Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns already present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined in an acceptable timeframe. A common source for data is a data mart or data warehouse. Pre-processing is essential to analyse multivariate data sets before data mining.

The target set is then cleaned. Data cleaning removes observations containing noise and those with missing data.
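As a minimal illustration of this cleaning step, the sketch below drops observations with missing values and applies a simple range check for keying errors, assuming pandas is available; the field names, values, and the 0-120 age bound are illustrative assumptions.

```python
import pandas as pd

# Hypothetical target set extracted from a data warehouse.
target = pd.DataFrame({
    "age":   [34, None, 41, 25, 230],      # 230 looks like a keying error
    "spend": [80.0, 12.0, None, 5.0, 60.0],
})

# Remove observations with missing data.
cleaned = target.dropna()

# Remove noisy observations with a simple domain range check:
# ages outside 0-120 are treated as data entry errors.
cleaned = cleaned[cleaned["age"].between(0, 120)]
print(cleaned)  # rows with age 34 and 25 remain
```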

Data mining


Data mining commonly involves four classes of tasks:[17] 

- Association rule learning – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
- Clustering – is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
- Classification – is the task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision tree learning, nearest neighbor, naive Bayesian classification, neural networks and support vector machines.
- Regression – Attempts to find a function which models the data with the least error (a minimal sketch follows this list).
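The regression item above points to the following sketch: a least-squares fit of a straight line with numpy. The data points are hypothetical and the linear model is an illustrative choice.

```python
import numpy as np

# Hypothetical observations: advertising spend (x) vs. unit sales (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Fit y ~ a*x + b by minimizing the squared error.
A = np.vstack([x, np.ones_like(x)]).T
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"model: y = {a:.2f}x + {b:.2f}")  # approx. y = 1.93x + 0.27

# Predict sales at a new spend level.
print(a * 6.0 + b)
```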

Results validation

The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by the data mining algorithms are necessarily valid. It is common for data mining algorithms to find patterns in the training set which are not present in the general data set. This is called overfitting. To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish spam from legitimate emails would be trained on a training set of sample emails. Once trained, the learned patterns would be applied to the test set of emails on which it had not been trained. The accuracy of these patterns can then be measured from how many emails they correctly classify. A number of statistical methods may be used to evaluate the algorithm, such as ROC curves.

If the learned patterns do not meet the desired standards, then it is necessary to reevaluate and change the pre-processing and data mining. If the learned patterns do meet the desired standards, then the final step is to interpret the learned patterns and turn them into knowledge.
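Since the passage mentions ROC curves, here is a minimal sketch of how points on an ROC curve are computed by sweeping a decision threshold over held-out model scores; the scores and labels are hypothetical.

```python
# Hypothetical spam scores from a trained model on the test set,
# paired with the true labels (1 = spam, 0 = legitimate).
scores = [0.95, 0.80, 0.70, 0.45, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    0]

positives = sum(labels)
negatives = len(labels) - positives

# Sweep thresholds to trace out points on the ROC curve.
for threshold in (0.9, 0.6, 0.2):
    predicted = [s >= threshold for s in scores]
    tpr = sum(p and l for p, l in zip(predicted, labels)) / positives
    fpr = sum(p and not l for p, l in zip(predicted, labels)) / negatives
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```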

Notable uses


Games

Since the early 1960s, with the availability of oracles for certain combinatorial games, also called tablebases (e.g. for 3x3-chess with any beginning configuration, small-board dots-and-boxes, small-board hex, and certain endgames in chess, dots-and-boxes, and hex), a new area for data mining has been opened. This is the extraction of human-usable strategies from these oracles. Current pattern recognition approaches do not seem to fully acquire the high level of abstraction required to be applied successfully. Instead, extensive experimentation with the tablebases, combined with an intensive study of tablebase answers to well designed problems and with knowledge of prior art (i.e. pre-tablebase knowledge), is used to yield insightful patterns. Berlekamp in dots-and-boxes and John Nunn in chess endgames are notable examples of researchers doing this work, though they were not and are not involved in tablebase generation.

Business

Data mining in customer relationship management applications can contribute significantly to the bottom line.[citation needed] Rather than randomly contacting a prospect or customer through a call center or sending mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood of responding to an offer. More sophisticated methods may be used to optimize resources across campaigns so that one may predict to which channel and to which offer an individual is most likely to respond, across all potential offers. Additionally, sophisticated applications could be used to automate the mailing. Once the results from data mining (potential prospect/customer and channel/offer) are determined, this "sophisticated application" can either automatically send an e-mail or regular mail. Finally, in cases where many people will take an action without an offer, uplift modeling can be used to determine which people will have the greatest increase in responding if given an offer. Data clustering can also be used to automatically discover the segments or groups within a customer data set.

Businesses employing data mining may see a return on investment, but they also recognize that the number of predictive models can quickly become very large. Rather than one model to predict how many customers will churn, a business could build a separate model for each region and customer type. Then, instead of sending an offer to all people that are likely to churn, it may want to send offers only to certain customers. Finally, it may want to determine which customers are going to be profitable over a window of time and only send the offers to those that are likely to be profitable. In order to maintain this quantity of models, they need to manage model versions and move to automated data mining.

Data mining can also be helpful to human-resources departments in identifying the characteristics of their most successful employees. Information obtained, such as universities attended by highly successful employees, can help HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.[18]

Another example of data mining, often called market basket analysis, relates to its use in retail sales. If a clothing store records the purchases of customers, a data-mining system could identify those customers who favor silk shirts over cotton ones. Although some explanations of relationships may be difficult, taking advantage of them is easier. The example deals with association rules within transaction-based data. Not all data are transaction based, and logical or inexact rules may also be present within a database.

Market basket analysis has also been used to identify the purchase patterns of the Alpha consumer. Alpha consumers are people that play a key role in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. Analyzing the data collected on this type of user has allowed companies to predict future buying trends and forecast supply demands.[citation needed]

Data mining is a highly effective tool in the catalog marketing industry.[citation needed] Catalogers have a rich history of customer transactions on millions of customers dating back several years. Data mining tools can identify patterns among customers and help identify the most likely customers to respond to upcoming mailing campaigns.

Data mining for business applications is a component which needs to be integrated into a complex modelling and decision making process. Reactive Business Intelligence (RBI) advocates a holistic approach that integrates data mining, modeling, and interactive visualization into an end-to-end discovery and continuous innovation process powered by human and automated learning.[19] In the area of decision making, the RBI approach has been used to mine knowledge that is progressively acquired from the decision maker and then self-tune the decision method accordingly.[20]


Related to an integrated-circuit production line, an example of data mining is described in the paper "Mining IC Test Data to Optimize VLSI Testing."[21] In this paper, the application of data mining and decision analysis to the problem of die-level functional test is described. Experiments mentioned in this paper demonstrate the ability of applying a system of mining historical die-test data to create a probabilistic model of patterns of die failure. These patterns are then utilized to decide, in real time, which die to test next and when to stop testing. This system has been shown, based on experiments with historical test data, to have the potential to improve profits on mature IC products.

Science and engineering

In recent years, data mining has been used widely in the areas of science and engineering, such as bioinformatics, genetics, medicine, education and electrical power engineering.

In the study of human genetics, an important goal is to understand the mapping relationship between the inter-individual variation in human DNA sequences and variability in disease susceptibility. In lay terms, it is to find out how the changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer. This is very important to help improve the diagnosis, prevention and treatment of the diseases. The data mining method that is used to perform this task is known as multifactor dimensionality reduction.[22]

In the area of electrical power engineering, data mining methods have been widely used for condition monitoring of high voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation. Data clustering techniques such as the self-organizing map (SOM) have been applied to vibration monitoring and analysis of transformer on-load tap-changers (OLTCs). Using vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms. Obviously, different tap positions will generate different signals. However, there was considerable variability amongst normal condition signals for exactly the same tap position. SOM has been applied to detect abnormal conditions and to estimate the nature of the abnormalities.[23]

Data mining methods have also been applied to dissolved gas analysis (DGA) on power transformers. DGA, as a diagnostics for power transformers, has been available for many years. Methods such as SOM have been applied to analyze the data and to determine trends which are not obvious to standard DGA ratio methods such as the Duval Triangle.[23]

A fourth area of application for data mining in science/engineering is within educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors which reduce their learning[24] and to understand the factors influencing university student retention.[25] A similar example of the social application of data mining is its use in expertise finding systems, whereby descriptors of human expertise are extracted, normalized and classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this way, data mining can facilitate institutional memory.

Other examples of data mining applications include biomedical data facilitated by domain ontologies,[26] mining clinical trial data,[27] and traffic analysis using SOM.[28]

In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in the WHO global database of 4.6 million suspected adverse drug reaction incidents.[29] Recently, similar methodology has been developed to mine large collections of electronic health records for temporal patterns associating drug prescriptions to medical diagnoses.[30]

Spatial data mining

Spatial data mining is the application of data mining methods to spatial data. Spatial data mining follows the same functions as data mining, with the end objective of finding patterns in geography. So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions and approaches to visualization and data analysis. In particular, most contemporary GIS have only very basic spatial analysis functionality. The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasises the importance of developing data-driven inductive approaches to geographical analysis and modeling.

Data mining, which is the partially automated search for hidden patterns in large databases, offers great potential benefits for applied GIS-based decision-making. Recently, the task of integrating these two technologies has become critical, especially as various public and private sector organizations possessing huge databases with thematic and geographically referenced data begin to realise the huge potential of the information hidden there. Among those organizations are:

- offices requiring analysis or dissemination of geo-referenced statistical data
- public health services searching for explanations of disease clusters
- environmental agencies assessing the impact of changing land-use patterns on climate change
- geo-marketing companies doing customer segmentation based on spatial location.

Challenges

Geospatial data repositories tend to be very large. Moreover, existing GIS datasets are often splintered into feature and attribute components that are conventionally archived in hybrid data management systems. Algorithmic requirements differ substantially for relational (attribute) data management and for topological (feature) data management.[31] Related to this is the range and diversity of geographic data formats, which also present unique challenges. The digital geographic data revolution is creating new types of data formats beyond the traditional "vector" and "raster" formats. Geographic data repositories increasingly include ill-structured data such as imagery and geo-referenced multi-media.[32]

There are several critical research challenges in geographic knowledge discovery and data mining. Miller and Han[33] offer the following list of emerging research topics in the field:

- Developing and supporting geographic data warehouses – Spatial properties are often reduced to simple aspatial attributes in mainstream data warehouses. Creating an integrated GDW requires solving issues in spatial and temporal data interoperability, including differences in semantics, referencing systems, geometry, accuracy and position.
- Better spatio-temporal representations in geographic knowledge discovery – Current geographic knowledge discovery (GKD) methods generally use very simple representations of geographic objects and spatial relationships. Geographic data mining methods should recognize more complex geographic objects (lines and polygons) and relationships (non-Euclidean distances, direction, connectivity and interaction through attributed geographic space such as terrain). Time needs to be more fully integrated into these geographic representations and relationships.
- Geographic knowledge discovery using diverse data types – GKD methods should be developed that can handle diverse data types beyond the traditional raster and vector models, including imagery and geo-referenced multimedia, as well as dynamic data types (video streams, animation).

In four annual surveys of data miners (2007-2010),[34][35][36][37] data mining practitioners consistently identified three key challenges they faced more than any others:

- Dirty Data
- Explaining Data Mining to Others
- Unavailability of Data / Difficult Access to Data

In the 2010 survey, data miners also shared their experiences in overcoming these challenges.[38]

Surveillance

Prior data mining programs under the U.S. government aimed at stopping terrorism include the Total Information Awareness (TIA) program, Secure Flight (formerly known as Computer-Assisted Passenger Prescreening System (CAPPS II)), Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE),[39] and the Multi-state Anti-Terrorism Information Exchange (MATRIX).[40] These programs have been discontinued due to controversy over whether they violate the US Constitution's 4th Amendment, although many programs that were formed under them continue to be funded by different organizations, or under different names.[41]

Two plausible data mining methods in the context of combating terrorism include"pattern mining" and "subject-based data mining".

Pattern mining

"Pattern mining" is a data mining method that involves finding existing  patterns in

data. In this context patterns often means association rules. The original motivationfor searching association rules came from the desire to analyze supermarkettransaction data, that is, to examine customer behavior in terms of the purchased

 products. For example, an association rule "beer ⇒ potato chips (80%)" states thatfour out of five customers that bought beer also bought potato chips.
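In the standard association-rule formulation (the terminology is made explicit here; it is not spelled out in the original text), the 80% figure is the rule's confidence: the share of transactions containing the antecedent that also contain the consequent. For the example above:

\[ \mathrm{confidence}(\mathrm{beer} \Rightarrow \mathrm{chips}) = \frac{\mathrm{support}(\mathrm{beer} \wedge \mathrm{chips})}{\mathrm{support}(\mathrm{beer})} = \frac{4}{5} = 0.80 \]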

In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise."[42][43][44] Pattern mining includes new areas such as Music Information Retrieval (MIR), where patterns seen both in the temporal and non-temporal domains are imported to classical knowledge discovery search methods.

Subject-based data mining

"Subject-based data mining" is a data mining method involving the search for associations between individuals in data. In the context of combating terrorism, the

 National Research Council  provides the following definition: "Subject-based datamining uses an initiating individual or other datum that is considered, based on

other information, to be of high interest, and the goal is to determine what other  persons or financial transactions or movements, etc., are related to that initiatingdatum."[43] 

Privacy concerns and ethics

Some people believe that data mining itself is ethically neutral.[45] The term data mining itself has no ethical implications, although it is often associated with the mining of information in relation to people's behavior. Data mining, however, is a statistical method that is applied to a set of information, or a data set. Associating these data sets with people is an extreme narrowing of the types of data that are available in today's technological society. Examples could range from a set of crash test data for passenger vehicles to the performance of a group of stocks. These types of data sets make up a great proportion of the information available to be acted on by data mining methods, and rarely have ethical concerns associated with them. However, the ways in which data mining can be used can raise questions regarding privacy, legality, and ethics.[46] In particular, data mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE, has raised privacy concerns.[47][48]

Data mining requires data preparation, which can uncover information or patterns which may compromise confidentiality and privacy obligations. A common way for this to occur is through data aggregation. Data aggregation is when the data are accrued, possibly from various sources, and put together so that they can be analyzed.[49] This is not data mining per se, but a result of the preparation of data

before and for the purposes of the analysis. The threat to an individual's privacy comes into play when the data, once compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to identify specific individuals, especially when originally the data were anonymous.

It is recommended that an individual is made aware of the following before data are collected:

  the purpose of the data collection and any data mining projects,
  how the data will be used,
  who will be able to mine the data and use them,
  the security surrounding access to the data, and in addition,
  how collected data can be updated.[49]

In the United States, privacy concerns have been somewhat addressed by the U.S. Congress via the passage of regulatory controls such as the Health Insurance Portability and Accountability Act (HIPAA). HIPAA requires individuals to be given "informed consent" regarding any information that they provide and its intended future uses by the facility receiving that information. According to an article in Biotech Business Week, "In practice, HIPAA may not offer any greater protection than the longstanding regulations in the research arena, says the AAHC. More importantly, the rule's goal of protection through informed consent is undermined by the complexity of consent forms that are required of patients and participants, which approach a level of incomprehensibility to average individuals."[50] This underscores the necessity for data anonymity in data aggregation practices.

One may additionally modify the data so that they are anonymous, so that individuals may not be readily identified.[49] However, even de-identified data sets can contain enough information to identify individuals, as occurred when journalists were able to find several individuals based on a set of search histories that were inadvertently released by AOL.[51]

Data Mining: What is Data Mining?

Overview

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Continuous Innovation

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

Example

For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.

Data, Information, and Knowledge

Data

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

  operational or transactional data such as sales, cost, inventory, payroll, and accounting

  nonoperational data, such as industry sales, forecast data, and macroeconomic data

  meta data - data about the data itself, such as logical database design or data dictionary definitions

Information

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.

Knowledge

Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

Data Warehouses

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

What can data mining do?

Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detail transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

WalMart is pioneering massive data mining to transform its supplier relationships.

WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.

The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.

By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.

How does data mining work?

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

  Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

  Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

  Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.

  Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.

Data mining consists of five major elements:

  Extract, transform, and load transaction data onto the data warehouse system.

  Store and manage the data in a multidimensional database system.

  Provide data access to business analysts and information technology professionals.

  Analyze the data by application software.

  Present the data in a useful format, such as a graph or table.

Different levels of analysis are available:

  Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

  Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

  Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.

  Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique; a minimal sketch follows this list.

  Rule induction: The extraction of useful if-then rules from data based on statistical significance.

  Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
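To make the nearest neighbor method concrete, here is a minimal sketch written in COBOL (the only programming language that appears elsewhere in this document). It classifies a single query record against a small hard-coded historical table using squared Euclidean distance with k = 1; the program name, data names, and sample values are all invented for illustration:

IDENTIFICATION DIVISION.
PROGRAM-ID. KNN-SKETCH.
DATA DIVISION.
WORKING-STORAGE SECTION.
*> Four historical records: two numeric features plus a class label.
01 HISTORY-CONSTANTS.
   05 FILLER PIC X(11) VALUE "00100020LOW".
   05 FILLER PIC X(11) VALUE "09000080HIG".
   05 FILLER PIC X(11) VALUE "00200010LOW".
   05 FILLER PIC X(11) VALUE "08000070HIG".
01 HISTORY-TABLE REDEFINES HISTORY-CONSTANTS.
   05 HIST-ROW OCCURS 4 TIMES INDEXED BY I.
      10 HIST-X     PIC 9(4).
      10 HIST-Y     PIC 9(4).
      10 HIST-CLASS PIC X(3).
01 QUERY-X     PIC 9(4) VALUE 850.
01 QUERY-Y     PIC 9(4) VALUE 75.
01 DIST        PIC 9(9).
01 BEST-DIST   PIC 9(9) VALUE 999999999.
01 BEST-CLASS  PIC X(3).
PROCEDURE DIVISION.
    PERFORM VARYING I FROM 1 BY 1 UNTIL I > 4
*>      Squared distance is enough for ranking; no square root needed.
        COMPUTE DIST = (HIST-X(I) - QUERY-X) ** 2
                     + (HIST-Y(I) - QUERY-Y) ** 2
        IF DIST < BEST-DIST
            MOVE DIST TO BEST-DIST
            MOVE HIST-CLASS(I) TO BEST-CLASS
        END-IF
    END-PERFORM
    DISPLAY "PREDICTED CLASS: " BEST-CLASS
    STOP RUN.

With k = 1 the query record simply takes the class of its single closest neighbor; a larger k would take a vote among the k closest records.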

What technological infrastructure is required?

Today, data mining applications are available on systems of all sizes, from mainframe and client/server to PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:

  Size of the database: the more data being processed and maintained, the more powerful the system required.

  Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.

Relational database storage and management technology is adequate for many data mining applications less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.

Memory

From Wikipedia, the free encyclopedia
For other uses, see Memory (disambiguation).


Overview of the forms and functions of memory in the sciences

In psychology, memory is an organism's ability to store, retain, and recall information and experiences. Traditional studies of memory began in the fields of philosophy, including techniques of artificially enhancing memory. During the late nineteenth and early twentieth century, scientists put memory within the paradigm of cognitive psychology. In recent decades, it has become one of the principal pillars of a branch of science called cognitive neuroscience, an interdisciplinary link between cognitive psychology and neuroscience.

Contents

  1 Processes
    o 1.1 Sensory memory
    o 1.2 Short-term
    o 1.3 Long-term
  2 Models
    o 2.1 Atkinson-Shiffrin model
    o 2.2 Working memory
    o 2.3 Levels of processing
  3 Classification by information type
  4 Classification by temporal direction
  5 Physiology
  6 Genetics
  7 Disorders
  8 Methods
  9 Memory and aging
  10 Improving memory
  11 Memory tasks
  12 See also
  13 Footnotes
  14 References
  15 External links

Processes

From an information processing perspective there are three main stages in the formation and retrieval of memory:

  Encoding or registration (receiving, processing and combining of received information)
  Storage (creation of a permanent record of the encoded information)
  Retrieval, recall or recollection (calling back the stored information in response to some cue for use in a process or activity)

Sensory memory

Main article: Sensory memory 

Sensory memory corresponds approximately to the initial 200-500 milliseconds after an item is perceived. The ability to look at an item, and remember what it looked like with just a second of observation, or memorisation, is an example of sensory memory. With very short presentations, participants often report that they seem to "see" more than they can actually report. The first experiments exploring this form of sensory memory were conducted by George Sperling (1960) using the "partial report paradigm". Subjects were presented with a grid of 12 letters, arranged into three rows of four. After a brief presentation, subjects were then played either a high, medium or low tone, cuing them which of the rows to report. Based on these partial report experiments, Sperling was able to show that the capacity of sensory memory was approximately 12 items, but that it degraded very quickly (within a few hundred milliseconds). Because this form of memory degrades so quickly, participants would see the display, but be unable to report all of the items (12 in the "whole report" procedure) before they decayed. This type of memory cannot be prolonged via rehearsal.

Short-term

Main article: Short-term memory 

Short-term memory allows recall for a period of several seconds to a minute without rehearsal. Its capacity is also very limited: George A. Miller (1956), when working at Bell Laboratories, conducted experiments showing that the store of short-term memory was 7±2 items (hence the title of his famous paper, "The Magical Number Seven, Plus or Minus Two"). Modern estimates of the capacity of short-term memory are lower, typically on the order of 4-5 items;[1] however, memory capacity can be increased through a process called chunking.[2] For example, in recalling a ten-digit telephone number, a person could chunk the digits into three groups: first, the area code (such as 215), then a three-digit chunk (123) and lastly a four-digit chunk (4567). This method of remembering telephone numbers is far more effective than attempting to remember a string of 10 digits; this is because we are able to chunk the information into meaningful groups of numbers. Herbert Simon showed that the ideal size for chunking letters and numbers, meaningful or not, was three.[citation needed] This may be reflected in some countries in the tendency to remember telephone numbers as several chunks of three numbers, with the final four-number group generally broken down into two groups of two.
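As a purely illustrative aside, the 3-3-4 grouping is mechanical enough to express in a few lines of COBOL (the language covered later in this document); the program and data names are invented, and reference modification (start:length) does the carving:

IDENTIFICATION DIVISION.
PROGRAM-ID. CHUNK-DEMO.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 PHONE-NUMBER PIC X(10) VALUE "2151234567".
PROCEDURE DIVISION.
*>  Split the 10 digits into the familiar (area code) 3-3-4 chunks.
    DISPLAY "(" PHONE-NUMBER (1:3) ") "
            PHONE-NUMBER (4:3) "-" PHONE-NUMBER (7:4).
    STOP RUN.

Running it displays "(215) 123-4567", the chunked form described above.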

Short-term memory is believed to rely mostly on an acoustic code for storing information, and to a lesser extent a visual code. Conrad (1964)[3] found that test subjects had more difficulty recalling collections of letters that were acoustically similar (e.g. E, P, D). Confusion with recalling acoustically similar letters rather than visually similar letters implies that the letters were encoded acoustically. Conrad's (1964) study, however, deals with the encoding of written text; thus, while memory of written language may rely on acoustic components, generalisations to all forms of memory cannot be made.

However, some individuals have been reported to be able to remember large amounts of information quickly, and to recall that information in seconds.[citation needed]

Long-term

Olin Levi Warner, Memory (1896). Library of Congress Thomas Jefferson Building, Washington, D.C.
Main article: Long-term memory

The storage in sensory memory and short-term memory generally has a strictly limited capacity and duration, which means that information is available only for a certain period of time, but is not retained indefinitely. By contrast, long-term memory can store much larger quantities of information for potentially unlimited duration (sometimes a whole life span). Its capacity is immeasurably large. For example, given a random seven-digit number we may remember it for only a few seconds before forgetting, suggesting it was stored in our short-term memory. On the other hand, we can remember telephone numbers for many years through repetition; this information is said to be stored in long-term memory.

While short-term memory encodes information acoustically, long-term memory encodes it semantically: Baddeley (1966)[4] discovered that after 20 minutes, test subjects had the most difficulty recalling a collection of words that had similar meanings (e.g. big, large, great, huge).

Short-term memory is supported by transient patterns of neuronal communication, dependent on regions of the frontal lobe (especially dorsolateral prefrontal cortex) and the parietal lobe. Long-term memories, on the other hand, are maintained by more stable and permanent changes in neural connections widely spread throughout the brain. The hippocampus is essential (for learning new information) to the consolidation of information from short-term to long-term memory, although it does not seem to store information itself. Without the hippocampus, new memories are unable to be stored into long-term memory, and there will be a very short attention span. Furthermore, it may be involved in changing neural connections for a period of three months or more after the initial learning. One of the primary functions of sleep is thought to be improving consolidation of information, as several studies have demonstrated that memory depends on getting sufficient sleep between training and test. Additionally, data obtained from neuroimaging studies have shown activation patterns in the sleeping brain which mirror those recorded during the learning of tasks from the previous day, suggesting that new memories may be solidified through such rehearsal.

Models

Models of memory provide abstract representations of how memory is believed to work. Below are several models proposed over the years by various psychologists. Note that there is some controversy as to whether there are several memory structures; for example, Tarnow (2005) finds that it is likely that there is only one memory structure between 6 and 600 seconds.

Atkinson-Shiffrin model

See also: Memory consolidation 

The multi-store model (also known as the Atkinson-Shiffrin memory model) was first proposed in 1968 by Atkinson and Shiffrin.

The multi-store model has been criticised for being too simplistic. For instance, long-term memory is believed to be actually made up of multiple subcomponents, such as episodic and procedural memory. It also proposes that rehearsal is the only mechanism by which information eventually reaches long-term storage, but evidence shows us capable of remembering things without rehearsal.

The model also shows all the memory stores as being a single unit, whereas research into this shows differently. For example, short-term memory can be broken up into different units such as visual information and acoustic information. Patient KF demonstrates this: he was brain damaged and had problems with his short-term memory. He had problems with things such as spoken numbers, letters and words and with significant sounds (such as doorbells and cats meowing). Other parts of short-term memory were unaffected, such as visual (pictures).[5]

It also shows the sensory store as a single unit whilst we know that the sensory store is split up into several different parts such as taste, vision, and hearing.

Working memory

The working memory model.
Main article: working memory

In 1974 Baddeley and Hitch proposed a working memory model which replaced the concept of general short-term memory with specific, active components. In this model, working memory consists of three basic stores: the central executive, the phonological loop and the visuo-spatial sketchpad. In 2000 this model was expanded with the multimodal episodic buffer.[6]

The central executive essentially acts as attention. It channels information to the three component processes: the phonological loop, the visuo-spatial sketchpad, and the episodic buffer.

The phonological loop stores auditory information by silently rehearsing sounds or words in a continuous loop: the articulatory process (for example, the repetition of a telephone number over and over again). As a result, a short list of data is easier to remember.

The visuospatial sketchpad stores visual and spatial information. It is engaged when performing spatial tasks (such as judging distances) or visual ones (such as counting the windows on a house or imagining images).

The episodic buffer is dedicated to linking information across domains to form integrated units of visual, spatial, and verbal information and chronological ordering (e.g., the memory of a story or a movie scene). The episodic buffer is also assumed to have links to long-term memory and semantical meaning.

The working memory model explains many practical observations, such as why it is easier to do two different tasks (one verbal and one visual) than two similar tasks (e.g., two visual), and the word-length effect. However, the concept of a central executive as noted here has been criticised as inadequate and vague.[citation needed]

Levels of processing

Main article: Levels-of-processing effect 

Craik and Lockhart (1972) proposed that it is the method and depth of processing that affects how an experience is stored in memory, rather than rehearsal.

  Organization - Mandler (1967) gave participants a pack of word cards and asked them to sort them into any number of piles using any system of categorisation they liked. When they were later asked to recall as many of the words as they could, those who used more categories remembered more words. This study suggested that the act of organising information makes it more memorable.

  Distinctiveness - Eysenck and Eysenck (1980) asked participants to say words in a distinctive way, e.g. spell the words out loud. Such participants recalled the words better than those who simply read them off a list.

  Effort - Tyler et al. (1979) had participants solve a series of anagrams, some easy (FAHTER) and some difficult (HREFAT). The participants recalled the difficult anagrams better, presumably because they put more effort into them.

  Elaboration - Palmere et al. (1983) gave participants descriptive paragraphs of a fictitious African nation. There were some short paragraphs and some with extra sentences elaborating the main idea. Recall was higher for the ideas in the elaborated paragraphs.

Classification by information type

Anderson (1976)[7] divides long-term memory into declarative (explicit) and procedural (implicit) memories.

Declarative memory requires conscious recall, in that some conscious process must call back the information. It is sometimes called explicit memory, since it consists of information that is explicitly stored and retrieved.

Declarative memory can be further sub-divided into semantic memory, which concerns facts taken independent of context; and episodic memory, which concerns information specific to a particular context, such as a time and place. Semantic memory allows the encoding of abstract knowledge about the world, such as "Paris is the capital of France". Episodic memory, on the other hand, is used for more personal memories, such as the sensations, emotions, and personal associations of a particular place or time. Autobiographical memory - memory for particular events within one's own life - is generally viewed as either equivalent to, or a subset of, episodic memory. Visual memory is part of memory preserving some characteristics of our senses pertaining to visual experience. One is able to place in memory information that resembles objects, places, animals or people as a sort of mental image. Visual memory can result in priming and it is assumed some kind of perceptual representational system underlies this phenomenon.[2]

In contrast, procedural memory (or implicit memory) is not based on the conscious recall of information, but on implicit learning. Procedural memory is primarily employed in learning motor skills and should be considered a subset of implicit memory. It is revealed when one does better in a given task due only to repetition - no new explicit memories have been formed, but one is unconsciously accessing aspects of those previous experiences. Procedural memory involved in motor learning depends on the cerebellum and basal ganglia.

Topographic memory is the ability to orient oneself in space, to recognize and follow an itinerary, or to recognize familiar places.[8] Getting lost when traveling alone is an example of the failure of topographic memory. This is often reported among elderly patients who are evaluated for dementia. The disorder could be caused by multiple impairments, including difficulties with perception, orientation, and memory.[9]

Classification by temporal direction

A further major way to distinguish different memory functions is whether the content to be remembered is in the past, retrospective memory, or whether the content is to be remembered in the future, prospective memory. Thus, retrospective memory as a category includes semantic, episodic and autobiographical memory. In contrast, prospective memory is memory for future intentions, or remembering to remember (Winograd, 1988). Prospective memory can be further broken down into event- and time-based prospective remembering. Time-based prospective memories are triggered by a time-cue, such as going to the doctor (action) at 4pm (cue). Event-based prospective memories are intentions triggered by cues, such as remembering to post a letter (action) after seeing a mailbox (cue). Cues do not need to be related to the action (as the mailbox example is), and lists, sticky-notes, knotted handkerchiefs, or string around the finger are all examples of cues that are produced by people as a strategy to enhance prospective memory.

Physiology

Brain areas involved in the neuroanatomy of memory such as the hippocampus, the amygdala, the striatum, or the mammillary bodies are thought to be involved in specific types of memory. For example, the hippocampus is believed to be involved in spatial learning and declarative learning, while the amygdala is thought to be involved in emotional memory. Damage to certain areas in patients and animal models and subsequent memory deficits is a primary source of information. However, rather than implicating a specific area, it could be that damage to adjacent areas, or to a pathway traveling through the area, is actually responsible for the observed deficit. Further, it is not sufficient to describe memory, and its counterpart, learning, as solely dependent on specific brain regions. Learning and memory are attributed to changes in neuronal synapses, thought to be mediated by long-term potentiation and long-term depression.

Hebb distinguished between short-term and long-term memory. He postulated that any memory that stayed in short-term storage for a long enough time would be consolidated into a long-term memory. Later research showed this to be false. Research has shown that direct injections of cortisol or epinephrine help the storage of recent experiences. This is also true for stimulation of the amygdala. This suggests that excitement enhances memory by the stimulation of hormones that affect the amygdala. Excessive or prolonged stress (with prolonged cortisol) may hurt memory storage. Patients with amygdalar damage are no more likely to remember emotionally charged words than nonemotionally charged ones. The hippocampus is important for explicit memory. The hippocampus is also important for memory consolidation. The hippocampus receives input from different parts of the cortex and sends its output out to different parts of the brain also. The input comes from secondary and tertiary sensory areas that have processed the information a lot already. Hippocampal damage may also cause memory loss and problems with memory storage.[10]

Genetics

Study of the genetics of human memory is in its infancy. A notable initial success was the association of APOE with memory dysfunction in Alzheimer's Disease. The search for genes associated with normally-varying memory continues. One of the first candidates for normal variation in memory is the gene KIBRA,[11] which appears to be associated with the rate at which material is forgotten over a delay period.

Disorders

Much of the current knowledge of memory has come from studying memory disorders. Loss of memory is known as amnesia. There are many sorts of amnesia, and by studying their different forms, it has become possible to observe apparent defects in individual sub-systems of the brain's memory systems, and thus hypothesize their function in the normally working brain. Other neurological disorders such as Alzheimer's disease and Parkinson's disease[12] can also affect memory and cognition. Hyperthymesia, or hyperthymesic syndrome, is a disorder which affects an individual's autobiographical memory, essentially meaning that they cannot forget small details that otherwise would not be stored.[13] Korsakoff's syndrome, also known as Korsakoff's psychosis or amnesic-confabulatory syndrome, is an organic brain disease that adversely affects memory.

While not a disorder, a common temporary failure of word retrieval from memory is the tip-of-the-tongue phenomenon. Sufferers of Nominal Aphasia (also called Anomia), however, do experience the tip-of-the-tongue phenomenon on an ongoing basis, due to damage to the frontal and parietal lobes of the brain.

Methods

Methods to optimize memorization

Memorization is a method of learning that allows an individual to recall information verbatim. Rote learning is the method most often used. Methods of memorizing things have been the subject of much discussion over the years, with some writers, such as Cosmos Rossellius, using visual alphabets. The spacing effect shows that an individual is more likely to remember a list of items when rehearsal is spaced over an extended period of time. In contrast to this is cramming, which is intensive memorisation in a short period of time. Also relevant is the Zeigarnik effect, which states that people remember uncompleted or interrupted tasks better than completed ones. The so-called Method of loci uses spatial memory to memorize non-spatial information.

Interference from previous knowledge

At the Center for Cognitive Science at Ohio State University, researchers have found that memory accuracy of adults is hurt by the fact that they know more than children and tend to apply this knowledge when learning new information. The findings appeared in the August 2004 edition of the journal Psychological Science.

Interference can hamper memorisation and retrieval. There is retroactive interference, when learning new information causes forgetting of old information, and proactive interference, where learning one piece of information makes it harder to learn similar new information.[14]

Influence of odors and emotions

In March 2007, German researchers found they could use odors to re-activate new memories in the brains of people while they slept, and the volunteers remembered better later.[15] Emotion can have a powerful impact on memory. Numerous studies have shown that the most vivid autobiographical memories tend to be of emotional events, which are likely to be recalled more often and with more clarity and detail than neutral events.[16]

Memory and aging

Main article: Memory and aging 

One of the key concerns of older adults is the experience of memory loss, especially as it is one of the hallmark symptoms of Alzheimer's disease. However, memory loss is qualitatively different in normal aging from the kind of memory loss associated with a diagnosis of Alzheimer's (Budson & Price, 2005).

Improving memory

Main article: Improving memory

A UCLA research study published in the June 2006 issue of the American Journal of Geriatric Psychiatry found that people can improve cognitive function and brain efficiency through simple lifestyle changes such as incorporating memory exercises, healthy eating, physical fitness and stress reduction into their daily lives. This study examined 17 subjects (average age 53) with normal memory performance. Eight subjects were asked to follow a "brain healthy" diet, relaxation, physical, and mental exercise (brain teasers and verbal memory training techniques). After 14 days, they showed greater word fluency (not memory) compared to their baseline performance. No long-term follow-up was conducted; it is therefore unclear if this intervention has lasting effects on memory.[17]

There is a loosely associated group of mnemonic principles and techniques that can be used to vastly improve memory, known as the Art of memory.

The International Longevity Center released in 2001 a report[18] which includes, on pages 14-16, recommendations for keeping the mind in good functionality until advanced age. Some of the recommendations are to stay intellectually active through learning, training or reading, to keep physically active so as to promote blood circulation to the brain, to socialize, to reduce stress, to keep sleep time regular, to avoid depression or emotional instability and to observe good nutrition.

Memory tasks

  Paired associate learning - when one learns to associate one specific word with another. For example, when given a word such as "safe" one must learn to say another specific word, such as "green". This is stimulus and response.[19]

  Free recall - during this task a subject is asked to study a list of words and then, sometime later, asked to recall or write down as many words as they can remember.[20]

  Recognition - subjects are asked to remember a list of words or pictures, after which point they are asked to identify the previously presented words or pictures from among a list of alternatives that were not presented in the original list.[21]

COBOL

From Wikipedia, the free encyclopedia
For other uses, see COBOL (disambiguation).

COBOL

Paradigm: procedural, object-oriented
Appeared in: 1959
Designed by: Grace Hopper, William Selden, Gertrude Tierney, Howard Bromberg, Howard Discount, Vernon Reeves, Jean E. Sammet
Stable release: COBOL 2002 (2002)
Typing discipline: strong, static
Major implementations: OpenCOBOL, Micro Focus International (e.g. the Eclipse plug-in Micro Focus Net Express)
Dialects: HP3000 COBOL/II, COBOL/2, IBM OS/VS COBOL, IBM COBOL/II, IBM COBOL SAA, IBM Enterprise COBOL, IBM COBOL/400, IBM ILE COBOL, Unix COBOL X/Open, Micro Focus COBOL, Microsoft COBOL, Ryan McFarland RM/COBOL, Ryan McFarland RM/COBOL-85, DOSVS COBOL, UNIVAC COBOL, Realia COBOL, Fujitsu COBOL, ICL COBOL, ACUCOBOL-GT, COBOL-IT, DEC COBOL-10, DEC VAX COBOL, Wang VS COBOL, Visual COBOL, Tandem (NonStop) COBOL85, Tandem (NonStop) SCOBOL (a COBOL74 variant for creating screens on text-based terminals)
Influenced by: FLOW-MATIC, COMTRAN, FACT
Influenced: PL/I, CobolScript, ABAP

COBOL at Wikibooks 

COBOL (/ˈkoʊbɒl/) is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments.

The COBOL 2002 standard includes support for object-oriented programming and other modern language features.[1]

Contents

  1 History and specification
    o 1.1 ANS COBOL 1968
    o 1.2 COBOL 1974
    o 1.3 COBOL 1985
    o 1.4 COBOL 2002 and object-oriented COBOL
    o 1.5 History of COBOL standards
    o 1.6 Legacy
  2 Features
    o 2.1 Self-modifying code
    o 2.2 Syntactic features
    o 2.3 Data types
    o 2.4 Hello, world
  3 Criticism and defense
    o 3.1 Lack of structurability
    o 3.2 Verbose syntax
    o 3.3 Other defenses
  4 See also
  5 References
  6 Sources
  7 External links

History and specification

The COBOL specification was created by a committee of researchers from private industry, universities, and government during the second half of 1959. The specifications were to a great extent inspired by the FLOW-MATIC language invented by Grace Hopper, commonly referred to as "the mother of the COBOL language." The IBM COMTRAN language invented by Bob Bemer was also drawn upon, but the FACT language specification from Honeywell was not distributed to committee members until late in the process and had relatively little impact.

FLOW-MATIC's status as the only language of the bunch to have actually been implemented made it particularly attractive to the committee.[2]

The scene was set on April 8, 1959 at a meeting of computer manufacturers, users, and university people at the University of Pennsylvania Computing Center. The United States Department of Defense subsequently agreed to sponsor and oversee the next activities. A meeting chaired by Charles A. Phillips was held at the Pentagon on May 28 and 29 of 1959 (exactly one year after the Zürich ALGOL 58 meeting); there it was decided to set up three committees: short, intermediate and long range (the last one was never actually formed). It was the Short Range Committee, chaired by Joseph Wegstein of the US National Bureau of Standards, that during the following months created a description of the first version of COBOL.[3] The committee was formed to recommend a short range approach to a common business language. The committee was made up of members representing six computer manufacturers and three government agencies. The six computer manufacturers were Burroughs Corporation, IBM, Minneapolis-Honeywell (Honeywell Labs), RCA, Sperry Rand, and Sylvania Electric Products. The three government agencies were the US Air Force, the Navy's David Taylor Model Basin, and the National Bureau of Standards (now National Institute of Standards and Technology). The intermediate-range committee was formed but never became operational. In the end a sub-committee of the Short Range Committee developed the specifications of the COBOL language. This sub-committee was made up of six individuals:

  William Selden and Gertrude Tierney of IBM
  Howard Bromberg and Howard Discount of RCA
  Vernon Reeves and Jean E. Sammet of Sylvania Electric Products[4]

The decision to use the name "COBOL" was made at a meeting of the committee held on 18 September 1959. The subcommittee completed the specifications for COBOL in December 1959.

The first compilers for COBOL were subsequently implemented in 1960, and on December 6 and 7, essentially the same COBOL program ran on two different computer makes, an RCA computer and a Remington-Rand Univac computer, demonstrating that compatibility could be achieved.

ANS COBOL 1968

After 1959 COBOL underwent several modifications and improvements. In an attempt to overcome the problem of incompatibility between different versions of COBOL, the American National Standards Institute (ANSI) developed a standard form of the language in 1968. This version was known as American National Standard (ANS) COBOL.

COBOL 1974

In 1974, ANSI published a revised version of (ANS) COBOL, containing a number of features that were not in the 1968 version.

COBOL 1985

In 1985, ANSI published still another revised version that had new features not in the 1974 standard, most notably structured language constructs ("scope terminators"), including END-IF, END-PERFORM, END-READ, etc.

COBOL 2002 and object-oriented COBOL

The language continues to evolve today. In the early 1990s it was decided to add object-orientation in the next full revision of COBOL. The initial estimate was to have this revision completed by 1997, and an ISO CD (Committee Draft) was available by 1997. Some implementers (including Micro Focus, Fujitsu, Veryant, and IBM) introduced object-oriented syntax based on the 1997 or other drafts of the full revision. The final approved ISO Standard (adopted as an ANSI standard by INCITS) was approved and made available in 2002.

As with the C++ and Java programming languages, object-oriented COBOL compilers became available even as the language was moving toward standardization. Fujitsu and Micro Focus currently support object-oriented COBOL compilers targeting the .NET framework.[5]

The 2002 edition (4th revision) of COBOL included many other features beyond object-orientation. These included (but are not limited to):

  National Language support (including but not limited to Unicode support)
  Locale-based processing
  User-defined functions
  CALL (and function) prototypes (for compile-time parameter checking)
  Pointers and syntax for getting and freeing storage
  Calling conventions to and from non-COBOL languages such as C
  Support for execution within framework environments such as Microsoft's .NET and Java (including COBOL instantiated as Enterprise JavaBeans)
  Bit and Boolean support
  "True" binary support (up until this enhancement, binary items were truncated based on the (base-10) specification within the Data Division)
  Floating-point support
  Standard (or portable) arithmetic results
  XML generation and parsing

History of COBOL standards

The specifications approved by the full Short Range Committee were approved by the Executive Committee on January 3, 1960, and sent to the government printing office, which edited and printed these specifications as Cobol 60.

The American National Standards Institute (ANSI) produced several revisions of the COBOLstandard, including:

  COBOL-68
  COBOL-74
  COBOL-85
  Intrinsic Functions Amendment - 1989
  Corrections Amendment - 1991

After the Amendments to the 1985 ANSI Standard (which were adopted by ISO), primary development and ownership was taken over by ISO. The following editions and TRs (Technical Reports) have been issued by ISO (and adopted as ANSI) Standards:

  COBOL 2002
  Finalizer Technical Report - 2003
  Native XML syntax Technical Report - 2006
  Object Oriented Collection Class Libraries - pending final approval...

From 2002, the ISO standard is also available to the public coded as ISO/IEC 1989. 

Work progresses on the next full revision of the COBOL Standard. Approval and availability were expected in the early 2010s. For information on this revision, to see the latest draft of this revision, or to see what other work is happening with the COBOL Standard, see the COBOL Standards Website.

Legacy

COBOL programs are in use globally in governmental and military agencies and in commercial enterprises, and are running on operating systems such as IBM's z/OS, the POSIX families (Unix/Linux etc.), and Microsoft's Windows as well as ICL's VME operating system and Unisys' OS 2200. In 1997, the Gartner Group reported that 80% of the world's business ran on COBOL with over 200 billion lines of code in existence and with an estimated 5 billion lines of new code annually.[6]

Near the end of the twentieth century the year 2000 problem was the focus of significant COBOL programming effort, sometimes by the same programmers who had designed the systems decades before. The particular level of effort required for COBOL code has been attributed both to the large amount of business-oriented COBOL, as COBOL is by design a business language and business applications use dates heavily, and to constructs of the COBOL language such as the PICTURE clause, which can be used to define fixed-length numeric fields, including two-digit fields for years.[citation needed] Because of the clean-up effort put into these COBOL programs for Y2K, many of them have been kept in use for years since then.[citation needed]

Features

COBOL as defined in the original specification included a PICTURE clause for detailed field specification. It did not support local variables, recursion, dynamic memory allocation, or structured programming constructs. Support for some or all of these features has been added in later editions of the COBOL standard. COBOL has many reserved words (over 400), called keywords.

Self-modifying code

The original COBOL specification supported self-modifying code via the infamous "ALTER X TO PROCEED TO Y" statement. X and Y are paragraph labels, and any "GOTO X" statements executed after such an ALTER statement have the meaning "GOTO Y" instead. Most compilers still support it,[citation needed] but it should not be used in new programs.
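To make the effect concrete, here is a minimal, self-contained sketch (the program and paragraph names are invented for illustration, and, per the warning above, the style should not be imitated):

IDENTIFICATION DIVISION.
PROGRAM-ID. ALTER-DEMO.
PROCEDURE DIVISION.
MAIN-PARA.
*>  Rewrite SWITCH-PARA's GO TO target before it is ever executed.
    ALTER SWITCH-PARA TO PROCEED TO PATH-B.
    GO TO SWITCH-PARA.
SWITCH-PARA.
*>  Written as a jump to PATH-A, but the ALTER above redirects it.
    GO TO PATH-A.
PATH-A.
    DISPLAY "PATH A".
    STOP RUN.
PATH-B.
    DISPLAY "PATH B".
    STOP RUN.

Running this displays "PATH B" even though SWITCH-PARA reads as a jump to PATH-A; that invisible run-time redirection is why the statement earned its reputation.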

Syntactic features

COBOL provides an update-in-place syntax, for example

ADD YEARS TO AGE

The equivalent construct in many procedural languages would be

age = age + years

This syntax is similar to the compound assignment operator later adopted by C:

age += years

The abbreviated conditional expression

IF SALARY > 8000 OR SUPERVISOR-SALARY OR = PREV-SALARY

is equivalent to

IF SALARY > 8000

OR SALARY > SUPERVISOR-SALARY

OR SALARY = PREV-SALARY

COBOL provides "named conditions" (so-called 88-levels). These are declared as sub-items of another item (the conditional variable). The named condition can be used in an IF statement, and tests whether the conditional variable is equal to any of the values given in the named condition's VALUE clause. The SET statement can be used to make a named condition TRUE (by assigning the first of its values to the conditional variable).
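A minimal sketch of the mechanics just described, using invented data names (the VALUE range mirrors the IS-RETIRED-AGE entry in the data type table below):

01 CUSTOMER-AGE PIC 9(3).
   88 IS-RETIRED-AGE VALUE 65 THRU 150.

*>  The named condition tests CUSTOMER-AGE without restating the range:
IF IS-RETIRED-AGE
    DISPLAY "CUSTOMER HAS REACHED RETIREMENT AGE"
END-IF

*>  SET makes the condition true by moving 65, the first listed value,
*>  into CUSTOMER-AGE:
SET IS-RETIRED-AGE TO TRUE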

COBOL allows identifiers up to 30 characters long. When COBOL was introduced, much shorter lengths (e.g., 6 characters for FORTRAN) were prevalent.

COBOL introduced the concept of copybooks, chunks of code that can be inserted into a larger program. COBOL does this with the COPY statement, which also allows other code to replace parts of the copybook's code with other code (using the REPLACING ... BY ... clause).
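As a sketch of the mechanism, suppose a copybook named CUSTREC (a hypothetical member name) holds a record layout with a replaceable :TAG: prefix:

*>  Contents of the copybook CUSTREC, stored outside the program:
*>    01 :TAG:-RECORD.
*>       05 :TAG:-NAME    PIC X(20).
*>       05 :TAG:-BALANCE PIC S9(7)V99.

*>  In the program, COPY inserts that text and REPLACING rewrites the tag:
COPY CUSTREC REPLACING ==:TAG:== BY ==CUST==.

After expansion the program behaves as if 01 CUST-RECORD, with CUST-NAME and CUST-BALANCE, had been typed in place.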

Data types

Standard COBOL provides the following data types: 

Data type | Sample declaration | Notes
Character | PIC X(20), PIC A(4)9(5)X(7) | Alphanumeric and alphabetic-only; single-byte character set (SBCS)
Edited character | PIC X99BAXX | Formatted and inserted characters
Numeric fixed-point binary | PIC S999V99 [USAGE] COMPUTATIONAL or BINARY | Binary 16, 32, or 64 bits (2, 4, or 8 bytes); signed or unsigned. Conforming compilers limit the maximum value of variables based on the picture clause and not the number of bits reserved for storage.
Numeric fixed-point packed decimal | PIC S999V99 PACKED-DECIMAL | 1 to 18 decimal digits (1 to 10 bytes); signed or unsigned
Numeric fixed-point zoned decimal | PIC S999V99 [USAGE DISPLAY] | 1 to 18 decimal digits (1 to 18 bytes); signed or unsigned; leading or trailing sign, overpunch or separate
Numeric floating-point | PIC S9V999ES99 | Binary floating-point
Edited numeric | PIC +Z,ZZ9.99, PIC $***,**9.99CR | Formatted characters and digits
Group (record) | 01 CUST-NAME. 05 CUST-LAST PIC X(20). 05 CUST-FIRST PIC X(20). | Aggregated elements
Table (array) | OCCURS 12 TIMES | Fixed-size array, row-major order; up to 7 dimensions
Variable-length table | OCCURS 0 TO 12 TIMES DEPENDING ON CUST-COUNT | Variable-sized array, row-major order; up to 7 dimensions
Renames (variant or union data) | 66 RAW-RECORD RENAMES CUST-RECORD | Character data overlaying other variables
Condition name | 88 IS-RETIRED-AGE VALUES 65 THRU 150 | Boolean value dependent upon another variable
Array index | [USAGE] INDEX | Array subscript

Most vendors provide additional types, such as:

Page 52: Data Mining,Cobol,Memory

7/29/2019 Data Mining,Cobol,Memory

http://slidepdf.com/reader/full/data-miningcobolmemory 52/54

Data type | Sample declaration | Notes
Numeric floating-point single precision | PIC S9V9999999ES99 [USAGE] COMPUTATIONAL-1 | Binary floating-point (32 bits, 7+ digits) (IBM extension)
Numeric floating-point double precision | PIC S9V999ES99 [USAGE] COMPUTATIONAL-2 | Binary floating-point (64 bits, 16+ digits) (IBM extension)
Numeric fixed-point packed decimal | PIC S9V999 [USAGE] COMPUTATIONAL-3 | Same as PACKED-DECIMAL (IBM extension)
Numeric fixed-point binary | PIC S999V99 [USAGE] COMPUTATIONAL-4 | Same as COMPUTATIONAL or BINARY (IBM extension)
Numeric fixed-point binary (native binary) | PIC S999V99 [USAGE] COMPUTATIONAL-5 | Binary 16, 32, or 64 bits (2, 4, or 8 bytes); signed or unsigned. The maximum value of variables is based on the number of bits reserved for storage and not on the picture clause. (IBM extension)
Numeric fixed-point binary in native byte order | PIC S999V99 [USAGE] COMPUTATIONAL-4 | Binary 16, 32, or 64 bits (2, 4, or 8 bytes); signed or unsigned
Numeric fixed-point binary in big-endian byte order | PIC S999V99 [USAGE] COMPUTATIONAL-5 | Binary 16, 32, or 64 bits (2, 4, or 8 bytes); signed or unsigned
Wide character | PIC G(20) | Alphanumeric; double-byte character set (DBCS)
Edited wide character | PIC G99BGGG | Formatted and inserted wide characters
Edited floating-point | PIC +9.9(6)E+99 | Formatted characters, decimal digits, and exponent
Data pointer | [USAGE] POINTER | Data memory address
Code pointer | [USAGE] PROCEDURE-POINTER | Code memory address
Bit field | PIC 1(n) [USAGE] COMPUTATIONAL-5 | n can be from 1 to 64, defining an n-bit integer; signed or unsigned
Index | [USAGE] INDEX | Binary value corresponding to an occurrence of a table element; may be linked to a specific table using INDEXED BY
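As a rough sketch of how a few of these extensions might look on an IBM-style compiler (data names are invented; note that COMPUTATIONAL-1 and COMPUTATIONAL-2 items take no PICTURE clause there):

01  WS-RATE      COMPUTATIONAL-1.               *> 32-bit binary floating-point
01  WS-PRECISE   COMPUTATIONAL-2.               *> 64-bit binary floating-point
01  WS-AMOUNT    PIC S9(7)V99 COMPUTATIONAL-3.  *> packed decimal
01  WS-COUNT     PIC 9(9) COMPUTATIONAL-5.      *> native binary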


Hello, world

An example of the "Hello, world" program in COBOL:

IDENTIFICATION DIVISION.

PROGRAM-ID. HELLO-WORLD.

PROCEDURE DIVISION.

DISPLAY 'Hello, world'.

STOP RUN.

Criticism and defense

Lack of structurability

In his 1975 letter to an editor titled "How do we tell truths that might hurt?", which was critical of several programming languages contemporaneous with COBOL, computer scientist and Turing Award recipient Edsger Dijkstra remarked that "The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense."[7]

In his dissenting response to Dijkstra's article and the above "offensive statement," computer scientist Howard E. Tompkins defended structured COBOL: "COBOL programs with convoluted control flow indeed tend to 'cripple the mind'," but this was because "There are too many such business application programs written by programmers that have never had the benefit of structured COBOL taught well..."[8]

Additionally, the introduction of OO-COBOL has added support for object-oriented code as well as user-defined functions and user-defined data types to COBOL's repertoire.

Verbose syntax

COBOL 85 was not fully compatible with earlier versions, resulting in the "cesarean birth" of COBOL 85. Joseph T. Brophy, CIO, Travelers Insurance, spearheaded an effort to inform users of COBOL of the heavy reprogramming costs of implementing the new standard. As a result, the ANSI COBOL Committee received more than 3,200 letters from the public, mostly negative, requiring the committee to make changes. On the other hand, conversion to COBOL 85 was thought to increase productivity in future years, thus justifying the conversion costs.[9]

COBOL syntax has often been criticized for its verbosity. However, proponents note that this was intentional in the language design, and many consider it one of COBOL's strengths. One of the design goals of COBOL was that non-programmers — managers, supervisors, and users — could read and understand the code. This is why COBOL has an English-like syntax and structural elements, including nouns, verbs, clauses, sentences, sections, and divisions. Consequently, COBOL is considered by at least one source to be "The most readable, understandable and self-documenting programming language in use today. [...] Not only does this readability generally assist the maintenance process but the older a program gets the more valuable this readability becomes."[10] On the other hand, the mere ability to read and understand a few lines of COBOL code does not grant to an executive or end user the experience and knowledge needed to design, build, and maintain large software systems.[citation needed]

Other defenses

Additionally, traditional COBOL is a simple language with a limited scope of function (with no pointers, no user-defined types, and no user-defined functions), encouraging a straightforward coding style. This has made it well-suited to its primary domain of business computing, where the program complexity lies in the business rules that need to be encoded rather than in sophisticated algorithms or data structures. Because the standard does not belong to any particular vendor, programs written in COBOL are highly portable, and the language can be used on a wide variety of hardware platforms and operating systems. Furthermore, the rigid hierarchical structure restricts the definition of external references to the Environment Division, which simplifies platform changes.[10]