
Page 1: Bryan Lahartinger. “The Apriori algorithm is a fundamental correlation-based data mining [technique]” “Software implementations of the Apriori algorithm

An Architecture for Efficient Hardware Data Mining using Reconfigurable Computing Systems

Bryan Lahartinger

Objective Investigation

“The Apriori algorithm is a fundamental correlation-based data mining [technique]”

“Software implementations of the Apriori algorithm utilize…methods...for the support and candidate generation operations”

“This paper demonstrates an efficient structure for computing the support of a set of candidates.”

“…through the combination of Content-Addressable Memories (CAM)”

“As far as we know, the Apriori algorithm has not been studied in any significant way for hardware implementation.”

Objective

To exploit parallelism in hardware to accelerate a bottleneck in the Apriori algorithm, with applications specifically to data mining.

What is the Apriori algorithm?

What is the bottleneck?

How does hardware acceleration fit into the picture?

Paper Overview

• Background
  • Apriori Algorithm
  • Apriori bottleneck
  • Bitmapped CAM

• Implementing Bitmapped CAM

• Analysis of the Approach

• Results of software comparisons

• Conclusions

Apriori

Given transactions consisting of sets:

 {1,2,3,4}, {2,3,4}, {2,3}, {1,2,4}, {1,2,3,4}, and {2,4}

Item      Support
1         3
2         6
3         4
4         5

Itemset   Support
{1,2}     3
{1,3}     2
{1,4}     3
{2,3}     4
{2,4}     5
{3,4}     3

Itemset   Support
{1,2,4}   3
{2,3,4}   3
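The level-wise procedure behind the tables above can be sketched in Python. This is a minimal, generic Apriori, not the authors' code; the helper name `apriori` is hypothetical, and the minimum support of 3 is an assumption (the slide states no threshold, but 3 is consistent with {1,3} dropping out so that no 3-itemset containing both 1 and 3 appears).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Minimal level-wise Apriori (hypothetical helper, not the paper's code).

    Returns one dict per level mapping each candidate itemset to its support.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]  # level 1: single items
    levels = []
    k = 1
    while candidates:
        # Support counting: the operation the paper's CAM structure targets.
        support = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        levels.append(support)
        frequent = {c for c, s in support.items() if s >= min_support}
        k += 1
        # Candidate generation: join frequent itemsets into size-k sets, then
        # prune any candidate that has an infrequent (k-1)-subset.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return levels


# Usage: the six transactions from the slide, assumed minimum support 3.
transactions = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {2, 4}]
for k, level in enumerate(apriori(transactions, min_support=3), start=1):
    for itemset, s in sorted(level.items(), key=lambda kv: sorted(kv[0])):
        print(k, sorted(itemset), s)
```

Each returned level holds the same supports as the corresponding table. In this naive form, support counting tests every candidate against every transaction, which is why a parallel hardware structure for that step is attractive.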

Bitmapped CAM

• Each candidate can be addressed to a row of bits

• Each column represents whether a candidate is included in the CAM entry as a candidate

• Column bits can be summed to form the number of matching candidates
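One way to read the bitmapped CAM bullets, sketched in Python under stated assumptions: each CAM entry holds one distinct item and addresses a row of bits with one column per candidate (bit set if that candidate contains the item). For a transaction, the items that hit in the CAM select their rows; summing a candidate's column over the selected rows counts how many of its items are present, and the candidate matches when that count equals its size. This mapping and the names `build_bitmap` and `count_support` are hypothetical, not taken from the paper, and hardware would evaluate all columns in parallel rather than in a loop.

```python
def build_bitmap(candidates):
    """Hypothetical bitmap: one row per distinct item (CAM entry), one column per candidate."""
    cam_items = sorted({item for cand in candidates for item in cand})
    row_of = {item: r for r, item in enumerate(cam_items)}  # models the CAM lookup
    bitmap = [[0] * len(candidates) for _ in cam_items]
    for col, cand in enumerate(candidates):
        for item in cand:
            bitmap[row_of[item]][col] = 1
    return row_of, bitmap


def count_support(candidates, transactions):
    """Count, for each candidate, how many transactions contain it."""
    row_of, bitmap = build_bitmap(candidates)
    support = [0] * len(candidates)
    for t in transactions:
        # Each transaction item that matches a CAM entry activates that entry's row.
        active = [row_of[item] for item in set(t) if item in row_of]
        for col, cand in enumerate(candidates):
            # Column sum over active rows = number of this candidate's items present.
            matched = sum(bitmap[r][col] for r in active)
            if matched == len(cand):
                support[col] += 1
    return support


# Usage: the 2-itemset candidates from the Apriori example.
pairs = [{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}]
transactions = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {2, 4}]
print(count_support(pairs, transactions))  # [3, 2, 3, 4, 5, 3]
```

Because an item shared by several candidates occupies a single CAM row, overlapping candidates cost fewer CAM entries under this reading, which fits the later point that similarities between candidates can be exploited.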

Implemented CAM Bitmap

• Large lookup table (LUT) in memory

Candidate 249 is frequently associated with candidates 1-11 but not 12…

Analysis of the Approach

• They varied the number of CAM elements relative to the number of candidates

• Max CAM blocks of 32

• 32 blocks fit most cases

• When they didn’t…
  • Solution: stop adding candidates to the block when full [why?]
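The block-filling rule can be sketched as a greedy packer. Assumptions, not from the paper: a block's capacity is counted in distinct CAM items (32 by default, taking "CAM blocks of 32" at face value), candidates are packed in the order given, and a candidate that would overflow the current block closes it. `pack_candidates` is a hypothetical name.

```python
def pack_candidates(candidates, capacity=32):
    """Greedily pack candidates into CAM blocks holding at most `capacity` distinct items.

    Hypothetical sketch: when the next candidate would overflow the current block,
    stop adding to that block and start a new one. A single candidate larger than
    `capacity` still gets a block of its own.
    """
    blocks = []
    current, items = [], set()
    for cand in candidates:
        needed = items | set(cand)  # items shared with earlier candidates cost nothing extra
        if current and len(needed) > capacity:
            blocks.append(current)  # block is full: close it
            current, needed = [], set(cand)
        current.append(cand)
        items = needed
    if current:
        blocks.append(current)
    return blocks


# Usage: with a toy capacity of 3 items, {4, 5} no longer fits after {1, 2} and {1, 3}.
print(pack_candidates([{1, 2}, {1, 3}, {4, 5}], capacity=3))  # [[{1, 2}, {1, 3}], [{4, 5}]]
```

Closing a block on the first overflow keeps the logic simple but can leave free entries unused when a later, smaller candidate would have fit; that trade-off may be part of the answer to the "[why?]" above.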

Results

• VHDL architecture requires only 10 cycles per CAM stage (Xilinx 7.2 on Virtex-II)

• Max clock rate 120 MHz

• Used standard datasets

• Compared against software on only one hardware platform

• Used half the logic cells per candidate compared to the USC FCCM05 design (half the FPGA area?)
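Taking the two reported figures at face value, the time per CAM stage and the stage rate follow directly. This assumes stages do not overlap; if the design is pipelined, 10 cycles would be latency and throughput could be higher. What one CAM stage processes per pass is not stated, so this does not convert into transactions per second.

```latex
t_{\text{stage}} = \frac{10\ \text{cycles}}{120 \times 10^{6}\ \text{cycles/s}} \approx 83.3\ \text{ns},
\qquad
\frac{120 \times 10^{6}\ \text{cycles/s}}{10\ \text{cycles/stage}} = 12 \times 10^{6}\ \text{stages/s}
```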

Conclusions

• CAM = awesome vs. software = sucks

• Allows similarities between candidates to be exploited

• Their previous paper's systolic-array architecture for the Apriori algorithm in hardware would work even better with this improvement

• An ideal architecture combining both architectures will be constructed and tested

Criticisms

Pros

• Intro was unclear at first, i.e., NOT about Apriori but about more general applications

• Reasonable explanation of Apriori and CAM

Cons

• No VHDL implementation details – “highly pipelined”, that’s it… for real

• Software only tested on one hardware platform – 2.8 GHz Xeon, 3 GB RAM

• Bad analysis of their methodology
  • Hard to follow
  • Unclear how to reproduce

• Unclear results
  • Questionable standard datasets
  • 120 MHz??? 10 cycles/CAM stage?????