chapter 7 data, text, and web mining. learning objectives define data mining and list its objectives...

40
Chapter 7 DATA, TEXT, AND WEB MINING

Post on 20-Dec-2015

222 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Chapter 7

DATA, TEXT,

AND WEB MINING

Page 2: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Learning Objectives

• Define data mining and list its objectives and benefits

• Understand different purposes and applications of data mining

• Understand different methods of data mining, especially clustering and decision tree models

• Build expertise in use of some data mining software

Page 3: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Learning Objectives

• Learn the process of data mining projects• Understand data mining pitfalls and

myths• Define text mining and its objectives and

benefits• Appreciate use of text mining in business

applications• Define Web mining and its objectives and

benefits

Page 4: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Six factors behind the sudden rise in popularity of

data mining1. General recognition of the untapped value in

large databases;2. Consolidation of database records tending

toward a single customer view;3. Consolidation of databases, including the

concept of an information warehouse;4. Reduction in the cost of data storage and

processing, providing for the ability to collect and accumulate data;

5. Intense competition for a customer’s attention in an increasingly saturated marketplace; and

6. The movement toward the de-massification of business practices

Page 5: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Data mining (DM)

A process that uses statistical, mathematical, artificial intelligence and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases

Page 6: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Major characteristics and objectives of data

mining – Data are often buried deep within very large

databases, which sometimes contain data from several years; sometimes the data are cleansed and consolidated in a data warehouse

– The data mining environment is usually client/server architecture or a Web-based architecture

Page 7: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Major characteristics and objectives of data

mining – Sophisticated new tools help to remove the

information ore buried in corporate files or archival public records; finding it involves massaging and synchronizing the data to get the right results.

– The miner is often an end user, empowered by data drills and other power query tools to ask ad hoc questions and obtain answers quickly, with little or no programming skill

Page 8: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Major characteristics and objectives of data

mining – Striking it rich often involves finding an

unexpected result and requires end users to think creatively

– Data mining tools are readily combined with spreadsheets and other software development tools; the mined data can be analyzed and processed quickly and easily

– Parallel processing is sometimes used because of the large amounts of data and massive search efforts

Page 9: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • How data mining works

– Data mining tools find patterns in data and may even infer rules from them

– Three methods are used to identify patterns in data:1. Simple models

2. Intermediate models

3. Complex models

Page 10: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Classification

Supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behavior

• Common tools used for classification are:– Neural networks– Decision trees– If-then-else rules

Page 11: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Clustering

Partitioning a database into segments in which the members of a segment share similar qualities

• Association

A category of data mining algorithm that establishes relationships about items that occur together in a given record

Page 12: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Sequence discovery

The identification of associations over time

• Visualization can be used in conjunction with data mining to gain a clearer understanding of many underlying relationships

Page 13: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Regression is a well-known statistical

technique that is used to map data to a prediction value

• Forecasting estimates future values based on patterns within large sets of data

Page 14: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications • Hypothesis-driven data mining

Begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition

• Discovery-driven data mining Finds patterns, associations, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by an organization

Page 15: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Concepts and Applications

– Marketing– Banking– Retailing and sales– Manufacturing and

production– Brokerage and

securities trading– Insurance

– Computer hardware and software

– Government and defense

– Airlines– Health care– Broadcasting – Police– Homeland security

Data mining applications

Page 16: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • Data mining tools and techniques can be

classified based on the structure of the data and the algorithms used:

– Statistical methods– Decision trees

Defined as a root followed by internal nodes. Each node (including root) is labeled with a question and arcs associated with each node cover all possible responses

Page 17: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • Data mining tools and techniques can be

classified based on the structure of the data and the algorithms used:

– Case-based reasoning – Neural computing– Intelligent agents– Genetic algorithms – Other tools

• Rule induction• Data visualization

Page 18: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • A general algorithm for building a decision

tree:1. Create a root node and select a splitting

attribute.

2. Add a branch to the root node for each split candidate value and label

3. Take the following iterative steps:a. Classify data by applying the split value.

b. If a stopping point is reached, then create leaf node and label it. Otherwise, build another subtree

Page 19: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • Gini index

Used in economics to measure the diversity of the population. The same concept can be used to determine the ‘purity’ of a specific class as a result of a decision to branch along a particular attribute/variable

Page 20: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools

Page 21: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • The ID3 algorithm decision tree approach

– Entropy

Measures the extent of uncertainty or randomness in a data set. If all the data in a subset belong to just one class, then there is no uncertainty or randomness in that dataset, therefore the entropy is zero

Page 22: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • Cluster analysis for data mining

– Cluster analysis is an exploratory data analysis tool for solving classification problems

– The object is to sort cases into groups so that the degree of association is strong between members of the same cluster and weak between members of different clusters

Page 23: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • Cluster analysis results may be used to:

– Help identify a classification scheme – Suggest statistical models to describe

populations– Indicate rules for assigning new cases to

classes for identification, targeting, and diagnostic purposes

– Provide measures of definition, size, and change in what were previously broad concepts

– Find typical cases to represent classes

Page 24: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • Cluster analysis methods

– Statistical methods– Optimal methods– Neural networks– Fuzzy logic– Genetic algorithms

• Each of these methods generally works with one of two general method classes:

– Divisive– Agglomerative

Page 25: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • Hierarchical clustering method and example

1. Decide which data to record from the items

2. Calculate the distances between all initial clusters. Store the results in a distance matrix

3. Search through the distance matrix and find the two most similar clusters

4. Fuse those two clusters together to produce a cluster that has at least two items

5. Calculate the distances between this new cluster and all the other clusters

6. Repeat steps 3 to 5 until you have reached the prespecified maximum number of clusters

Page 26: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Techniques and Tools • Classes of data mining tools and techniques

as they relate to information and business intelligence (BI) technologies

– Mathematical and statistical analysis packages– Personalization tools for Web-based marketing– Analytics built into marketing platforms– Advanced CRM tools– Analytics added to other vertical industry-specific

platforms– Analytics added to database tools (e.g., OLAP)– Standalone data mining tools

Page 27: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Project Processes

Page 28: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Project Processes

Page 29: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Project Processes

• Knowledge discovery in databases (KDD)

A comprehensive process of using data mining methods to find useful information and patterns in data

Page 30: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Project Processes

• KDD process 1. Selection

2. Preprocessing

3. Transformation

4. Data mining

5. Interpretation/evaluation

Page 31: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Text Mining

• Text mining

Application of data mining to nonstructured or less structured text files. It entails the generation of meaningful numerical indices from the unstructured text and then processing these indices using various data mining algorithms

Page 32: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Text Mining

• Text mining helps organizations:– Find the “hidden” content of documents,

including additional useful relationships– Relate documents across previous unnoticed

divisions – Group documents by common themes

Page 33: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Text Mining

• Applications of text mining – Automatic detection of e-mail spam or

phishing through analysis of the document content

– Automatic processing of messages or e-mails to route a message to the most appropriate party to process that message

– Analysis of warranty claims, help desk calls/reports, and so on to identify the most common problems and relevant responses

Page 34: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Text Mining

• Applications of text mining – Analysis of related scientific publications in

journals to create an automated summary view of a particular discipline

– Creation of a “relationship view” of a document collection

– Qualitative analysis of documents to detect deception

Page 35: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Text Mining

• How to mine text 1. Eliminate commonly used words (stop-words)

2. Replace words with their stems or roots (stemming algorithms)

3. Consider synonyms and phrases

4. Calculate the weights of the remaining terms

Page 36: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Web Mining

• Web mining

The discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools

Page 37: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Project Processes

Page 38: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Web Mining

• Web content mining

The extraction of useful information from Web pages

• Web structure mining

The development of useful information from the links included in the Web documents

• Web usage mining

The extraction of useful information from the data being generated through webpage visits, transaction, etc.

Page 39: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Web Mining

• Uses for Web mining:– Determine the lifetime value of clients– Design cross-marketing strategies across

products– Evaluate promotional campaigns– Target electronic ads and coupons at user

groups– Predict user behavior– Present dynamic information to users

Page 40: Chapter 7 DATA, TEXT, AND WEB MINING. Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications

Data Mining Project Processes