G54DMT – Data Mining Techniques and Applications

http://www.cs.nott.ac.uk/~jqb/G54DMT

Dr. Jaume Bacardit

Topic 3: Data Mining
Lecture 5: Regression and Association Rules

Some slides from chapter 5 of Data Mining. Concepts and Techniques by Han & Kamber

Outline of the lecture

• Regression
  – Definition
  – Representations

• Association rules
  – Definition
  – Methods

• Resources

Regression

• Regression problems are supervised problems where the output variable is continuous
• Many techniques with different names are included in this category
  – Regression
  – Function approximation
  – Modelling
  – Curve-fitting
• Given an input vector X and a corresponding output y, we want to find a function f such that y’ = f(X) is as close as possible to the true y

Evaluating regression

• Supervised learning: we know the true outputs, so we check how much they differ from the predicted ones
  – Mean Absolute Error
  – Mean Squared Error
  – Root Mean Squared Error
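
A minimal sketch of the three error measures listed above. The function names and the toy values are illustrative assumptions, not part of the lecture material.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average of |y - y'| over all examples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of (y - y')^2; large errors are penalised more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as y."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Toy data (assumed for illustration): true outputs and predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
```

All three measures are zero only when every prediction is exact; they differ in how strongly they weight large individual errors.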

Linear Regression

• Most classic (and widespread in statistics) type of regression
• f(X) is modelled as
  – y’ = w_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n

http://upload.wikimedia.org/wikipedia/en/thumb/1/13/Linear_regression.png/400px-Linear_regression.png

Linear regression

• Simple but limited in expressive power
  – The same linear model fits all four of these datasets

http://en.wikipedia.org/wiki/Anscombe%27s_quartet

Linear regression

• How to find W?
  – Many mathematical methods available
    • Least squares
    • Ridge regression
    • Lasso
    • Etc.
  – We can also use some kind of metaheuristic (e.g. a Genetic Algorithm)
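
As a rough illustration of two of the methods above, the sketch below fits y’ = w_0 + w_1 x for a single input variable using the closed-form least-squares solution. Passing a positive `ridge_lambda` adds a ridge penalty on the slope. The function name, the parameter name and the toy data are assumptions made for this example; the general multi-variable case needs a linear system solver.

```python
def fit_line(xs, ys, ridge_lambda=0.0):
    """Fit y' = w0 + w1*x by least squares.

    With ridge_lambda > 0 the slope w1 is additionally penalised
    (ridge regression); the intercept w0 is left unpenalised.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centred sums: covariance-like and variance-like terms.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w1 = s_xy / (s_xx + ridge_lambda)
    w0 = y_mean - w1 * x_mean
    return w0, w1

# Toy data (assumed): points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w0, w1 = fit_line(xs, ys)                        # plain least squares
w0_r, w1_r = fit_line(xs, ys, ridge_lambda=5.0)  # ridge shrinks the slope
```

The ridge fit trades some training error for a smaller slope, which is the usual motivation for regularised variants when data are noisy.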

Polynomial regression

• More complex and sophisticated functions
  – y = w_0 + w_1 x + w_2 x^2 + …
  – y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + …

• Now the job is twofold
  – Choosing the correct function (human inspection may help)
  – Adjusting the weights of the model

• Still, would a single mathematical function fit any type of data?
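
Once the form of the polynomial has been chosen, adjusting the weights is still a linear least-squares problem, because the model is linear in the weights. The sketch below assumes a single input variable and solves the normal equations with a small hand-written Gaussian elimination; all names and the toy data are illustrative assumptions.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w

def fit_polynomial(xs, ys, degree):
    """Least-squares weights [w0, w1, ..., w_degree] via the normal equations."""
    k = degree + 1
    rows = [[x ** p for p in range(k)] for x in xs]  # one row of powers per example
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(ata, aty)

# Toy data (assumed): samples of y = 1 - 2x + 0.5x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 - 2.0 * x + 0.5 * x ** 2 for x in xs]
weights = fit_polynomial(xs, ys, degree=2)
```

Choosing `degree` is the "correct function" half of the job; the code only handles the weight-adjustment half.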

Piece-wise regression

• Input space is partitioned into regions
• A local regression model is generated from the training examples that fall inside each region
  – Example: approximating a sine function with linear regressions

(Butz, 2010)
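
The sine example can be sketched as follows. This is only an illustration under a strong simplifying assumption: the input space is split into fixed, equal-width intervals, whereas the methods discussed in this lecture learn the partition. All function names are hypothetical.

```python
import math

def fit_line(points):
    """Least-squares line (w0, w1) through a list of (x, y) points."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    s_xx = sum((x - x_mean) ** 2 for x, _ in points)
    w1 = s_xy / s_xx
    return y_mean - w1 * x_mean, w1

def fit_piecewise(points, lo, hi, n_regions):
    """Fit one local linear model per equal-width region of [lo, hi]."""
    width = (hi - lo) / n_regions
    buckets = [[] for _ in range(n_regions)]
    for x, y in points:
        idx = min(int((x - lo) / width), n_regions - 1)
        buckets[idx].append((x, y))
    models = [fit_line(b) for b in buckets]

    def predict(x):
        # Pick the region that x falls in, then apply its local model.
        idx = min(max(int((x - lo) / width), 0), n_regions - 1)
        w0, w1 = models[idx]
        return w0 + w1 * x

    return predict

lo, hi = 0.0, 2.0 * math.pi
xs = [lo + i * (hi - lo) / 199 for i in range(200)]
train = [(x, math.sin(x)) for x in xs]

def max_error(predict):
    """Largest absolute error over the training points."""
    return max(abs(predict(x) - y) for x, y in train)

single = fit_piecewise(train, lo, hi, n_regions=1)  # one global line
pieces = fit_piecewise(train, lo, hi, n_regions=8)  # eight local lines
```

Comparing `max_error(single)` with `max_error(pieces)` shows how much the local models gain over a single global line on this function.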

Piece-wise regression

• How to partition the input space
  – Using a series of rules
    • With a (hyper)rectangular condition (XCSF)
    • With a (hyper)ellipsoidal condition (XCSF, LWPR)
    • With a neural condition (XCSF)
  – Using a tree-like structure (CART, M5)

• How to perform the regression process for each local approximation
  – Pick any of the functions discussed before
  – Plus some truly non-linear methods (SVR)

Piece-wise approximation with hyperellipsoids

• Using XCSF (Wilson, 02) with hyperellipsoid conditions (Butz et al, 08)

[Figure panels: Test function; XCSF’s population]

(Stalph et al, 2010)

Other regression methods

• Neural networks
  – An MLP is natively a regression method
    • Classification is done by discretising the output of the network
  – It is proven that an MLP with enough hidden nodes can approximate any continuous function to arbitrary accuracy

• Support Vector Regression
  – As in SVM, depending on the kernel we get linear or non-linear regression
  – The margin specifies a tube around the approximated function. All points inside the tube have their errors ignored
  – Support Vectors are the points that lie on or outside the tube
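
The tube idea corresponds to what is usually called the epsilon-insensitive loss. The sketch below shows only that loss, not a full SVR training procedure; the function name, the value of `epsilon` and the toy data are assumptions for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Error is zero inside the tube of half-width epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Toy data (assumed): true outputs and the outputs of some fitted function.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.05, 2.5, 2.9, 3.0]

losses = [epsilon_insensitive_loss(t, p, epsilon=0.2)
          for t, p in zip(y_true, y_pred)]

# Indices of the points that fall outside the tube and therefore contribute error.
outside_tube = [i for i, loss in enumerate(losses) if loss > 0.0]
```

Points with zero loss have no influence on the fitted function, which is why only the remaining points end up as support vectors.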

Association Rules

• Association rules try to find frequent patterns in the dataset that appear together
• They can use the class label but do not have to, so we can consider this an unsupervised learning paradigm

• Two types of elements are generated
  – Association rules: they have an antecedent and a consequent
  – Frequent itemsets: they just have an antecedent

• Both antecedent and consequent are logic predicates (generally of conjunctive form)

Association rules mining

Witten and Frank, 2005 (http://www.cs.waikato.ac.nz/~eibe/Slides2edRev2.zip)

Origin of Association Rules

• These methods were originally employed to analyse shopping carts

• Database is specified as a set of transactions. Each of them includes one or more of a set of items

• A frequent itemset is a set of items that appears in many transactions

• These databases are extremely sparse

Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

Beers and diapers

• An urban myth about association rules says that, when they were applied to a very large volume of shopping carts, they discovered a very simple pattern
  – “Customers that buy beer also tend to buy diapers”

• This story has changed through time. You can find an article about it here

• It is a good example of data mining, as it was able to find an unexpected pattern

Why Is Freq. Pattern Mining Important?

• Discloses an intrinsic and important property of data sets
• Forms the foundation for many essential data mining tasks
  – Association, correlation, and causality analysis
  – Sequential, structural (e.g., sub-graph) patterns
  – Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  – Classification: associative classification
  – Cluster analysis: frequent pattern-based clustering
  – Data warehousing: iceberg cube and cube-gradient
  – Semantic data compression: fascicles
  – Broad applications

Evaluation of association rules

• Support
  – Percentage of examples covered by the predicate in the antecedent
  – Applies to both association rules and frequent itemsets

• Confidence
  – Percentage of the examples matched by the antecedent that also match the consequent
  – Only applies to association rules

• Typically, the user specifies a minimum support and a minimum confidence, and the algorithm finds all rules above both thresholds
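
A small sketch of both measures, computed on the four-transaction database shown earlier in the lecture. The function names are assumptions for illustration; support is returned as a fraction rather than a percentage.

```python
# The four transactions of the example database (Tid 10, 20, 30, 40).
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    covered = sum(1 for t in transactions if itemset <= t)
    return covered / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions matching the antecedent, the fraction that also
    match the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

sup_be = support({"B", "E"}, transactions)            # itemset {B, E}
conf_b_e = confidence({"B"}, {"E"}, transactions)     # rule B -> E
conf_c_e = confidence({"C"}, {"E"}, transactions)     # rule C -> E
```

In this database every transaction containing B also contains E, so the rule B -> E has the highest possible confidence, while C -> E holds in only two of the three transactions that contain C.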

Scalable Methods for Mining Frequent Patterns

• The downward closure property of frequent patterns
  – Any subset of a frequent itemset must be frequent
  – If {beer, diaper, nuts} is frequent, so is {beer, diaper}
  – i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}

• Scalable mining methods: Three major approaches
  – Apriori (Agrawal & Srikant @VLDB’94)
  – Freq. pattern growth (FPgrowth: Han, Pei & Yin @SIGMOD’00)
  – Vertical data format approach (Charm: Zaki & Hsiao @SDM’02)

Apriori: A Candidate Generation-and-Test Approach

• Apriori pruning principle: If there is any itemset which is infrequent, its superset should not be generated/tested! (Agrawal & Srikant @VLDB’94, Mannila, et al. @ KDD’ 94)

• Method:

– Initially, scan DB once to get the frequent 1-itemsets

– Generate length (k+1) candidate itemsets from length k frequent itemsets

– Test the candidates against DB

– Terminate when no frequent or candidate set can be generated

The Apriori Algorithm—An Example

Database TDB (Supmin = 2)

Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan: C1

Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1

Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (generated from L1)

Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan: C2 with counts

Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2

Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3 (generated from L2)

Itemset
{B, C, E}

3rd scan: L3

Itemset     sup
{B, C, E}   2

The Apriori Algorithm

• Pseudo-code:

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
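
The pseudo-code can be turned into a short runnable sketch. It is a straightforward, unoptimised reading of the pseudo-code rather than the implementation from the original paper; the function name, the use of an absolute support count, and the join/prune details are assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frozenset(itemset): count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        """One scan of the database; keep the candidates that reach min support."""
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support_count}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent_k = count(items)                 # L1
    all_frequent = dict(frequent_k)
    k = 1
    while frequent_k:
        # Join step: unions of two frequent k-itemsets that have size k+1.
        candidates = {a | b for a in frequent_k for b in frequent_k
                      if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent_k
                             for s in combinations(c, k))}
        frequent_k = count(candidates)        # L(k+1)
        all_frequent.update(frequent_k)
        k += 1
    return all_frequent

# Database TDB from the worked example, with Supmin = 2.
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
frequent = apriori(tdb, min_support_count=2)
```

On the example database the prune step discards {A, B, C} and {A, C, E} before the third scan, because {A, B} and {A, E} are not frequent, which leaves {B, C, E} as the only size-3 candidate.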

Resources

• “The Elements of Statistical Learning” by Hastie et al. contains a lot of detail about statistical regression
• List of regression and association rules methods in KEEL
• Weka also contains both kinds of methods
• Chapter 5 of the Han and Kamber book is all about association rules (Han created the FPgrowth method)
• Review of evolutionary algorithms for association rule mining

Questions?
