Our New Progress on Frequent/Sequential Pattern Mining

Post on 26-Jan-2016

We have developed new frequent and sequential pattern mining methods.

Performance studies on both synthetic and real data sets show that our methods outperform conventional ones by wide margins.

Our new methods compared with conventional methods:

Frequent pattern mining: FP-growth (new); Apriori, TreeProjection (conventional)

Sequential pattern mining: PrefixSpan, FreeSpan (new); GSP (conventional)

Frequent closed pattern mining: CLOSET (new); A-close, CHARM (conventional)

Mining Complete Set of Frequent Patterns on T10I4D100k

[Figure: runtime (seconds, 0 to 140) versus support threshold (0.00% to 0.15%) for Apriori, TreeProjection, and FP-growth.]

Mining Complete Set of Frequent Patterns on T25I20D100k

[Figure: runtime (seconds, 0 to 200) versus support threshold (0.00% to 1.50%) for Apriori, TreeProjection, and FP-growth.]

Mining Complete Set of Frequent Patterns on Connect-4

[Figure: runtime (seconds, 0 to 400) versus support threshold (70% to 95%) for Apriori, TreeProjection, and FP-growth.]

Mining Sequential Patterns on C10T4S16I4

[Figure: runtime (seconds, 0 to 800) versus support threshold (0.00% to 2.00%) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]

Mining Sequential Patterns on C10T8S8I8

[Figure: runtime (seconds, 0 to 200) versus support threshold (0.00% to 2.00%) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]

Scalability of Mining Sequential Patterns on C10-100T8S8I8

[Figure: runtime (seconds, 0 to 800) versus number of sequences (0 to 100000) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]

Scalability of Mining Sequential Patterns on C10-100T4S16I4

[Figure: runtime (seconds, 0 to 1600) versus number of sequences (0 to 100000) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2.]

Why Is PrefixSpan Faster Than GSP?

[Figure: two panels, one for dataset C10T4S16I4 and one for dataset C10T8S8I8. Each plots two series, "# cand/pattern in GSP" and "Runtime/proj. db in PrefixSpan", on a logarithmic vertical axis (0.001 to 100) against support threshold (0.00% to 2.00%).]
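The comparison above contrasts the candidates that GSP must generate and test with the projected databases that PrefixSpan scans. As a rough illustration of prefix-projected pattern growth, here is a minimal Python sketch. It is not the authors' implementation: it assumes each sequence element is a single item (not an itemset), stores projections as offsets into the original sequences, and uses the hypothetical names `prefixspan`, `mine`, and `projected`.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for sequences whose elements are single items.

    Simplifying assumption: every element is one item, so a pattern is a
    tuple of items and no itemset extension step is needed.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs. The suffix
        # of each sequence from that offset is its projection under `prefix`.
        counts = defaultdict(int)
        for sid, start in projected:
            # Count each item once per projected sequence.
            for item in set(sequences[sid][start:]):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue  # infrequent locally, so never extended
            pattern = prefix + (item,)
            results[pattern] = support

            # Build the projected database for the longer prefix by moving
            # each offset just past the first occurrence of `item`.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue  # this sequence does not contain the item
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = prefixspan(db, min_support=2)
```

Each recursive call only counts items inside a projected database that shrinks as the prefix grows, so no candidate sequence is ever generated without first being observed in the data.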

Mining Frequent Closed Itemsets on T25I20D100k

[Figure: runtime (seconds, 0 to 100) versus support threshold (0.7% to 1.5%) for A-CLOSE, CLOSET, and ChARM.]

Mining Frequent Closed Itemsets on Connect-4

[Figure: runtime (seconds, logarithmic axis from 1 to 10000) versus support threshold (40% to 100%) for A-CLOSE, CLOSET, and ChARM.]

Mining Frequent Closed Itemsets on Pumsb

[Figure: runtime (seconds, 0 to 300) versus support threshold (75% to 95%) for A-CLOSE, CLOSET, and ChARM.]
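For readers unfamiliar with the term used in the closed-itemset experiments above: a frequent itemset is closed when no proper superset has the same support. The brute-force Python sketch below only illustrates that definition on a toy database. It is not CLOSET, A-CLOSE, or ChARM, and the function names `frequent_itemsets` and `closed_itemsets` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate all frequent itemsets level by level (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            # Support = number of transactions containing the whole itemset.
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none are larger
    return result


def closed_itemsets(frequent):
    """Keep itemsets with no proper superset of equal support.

    Checking only frequent supersets is enough: a superset with the same
    support as a frequent itemset is itself frequent.
    """
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
closed = closed_itemsets(frequent_itemsets(transactions, min_support=2))
```

With a minimum support of 2, the itemsets {b} and {c} are frequent but not closed, because {a, b} and {a, c} occur in exactly the same transactions. The closed set is therefore smaller than the complete set of frequent itemsets while still determining every support value.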

References

R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), to appear, 2000.

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases, pages 487--499, Santiago, Chile, September 1994.

J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. KDD'2000, Boston, August 2000.

J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD'2000, Dallas, TX, May 2000.

J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Submitted for publication.

R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT), pages 3--17, Avignon, France, March 1996.

N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. ICDT'99, Israel, January 1999.

M.J. Zaki and C. Hsiao. ChARM: An efficient algorithm for closed association rule mining. In Proc. KDD'2000, Boston, August 2000.

DBMiner Version 2.5 (Beta)

DBMiner Technology Inc., B.C., Canada

What we had for DBMiner 2.0…

Association module on data cubes

Classification module on data cubes

Clustering module on data cubes

OLAP browser

3D Cube browser

What we will do in DBMiner 2.5…

Keep the existing association module and classification module from version 2.0

Change the existing clustering module

Add a new visual classification module, both on SQL Server and OLAP

Add new sequential pattern modules on SQL Server using the FP algorithm

What we have done…

We have incorporated the existing association module and added the OLAP browser module

We have added the visual classification module

We have changed the existing clustering module

We have added the sequential pattern module

We are still in the development stage

Association module on data cubes

New sequential pattern module on SQL Server

New visual classification module on data cubes

New clustering module on data cubes
