

MAPLD2005/P249

An FPGA Co-Processor for Statistical Pattern Recognition Applications

Jason Isaacs and Simon Y. Foo

Machine Intelligence Laboratory

FAMU-FSU College of Engineering

Department of Electrical and Computer Engineering


Pattern Recognition Overview

Pattern Recognition: “the act of taking raw data and taking an action based on the category of the pattern.”

Common Applications: speech recognition, fingerprint identification (biometrics), DNA sequence identification

Related Terminology:

• Machine Learning: the ability of a machine to improve its performance based on previous results.
• Machine Understanding: acting on the intentions of the user generating the data.

Related Fields: artificial intelligence, signal processing, and discipline-specific research (e.g., target recognition, speech recognition, natural language processing).


Design Flow

Start → Collect Data → Choose Features → Choose Model → Train Classifier → Evaluate Classifier → End

Key issues:

• “There is no data like more data.”
• Perceptually-meaningful features?
• How do we find the best model?
• How do we estimate parameters?
• How do we evaluate performance?


Common Misconceptions

I got 100% accuracy on...

• Almost any algorithm works some of the time, but few real-world problems have ever been completely solved.
• Training on the evaluation data is forbidden.
• Once you use evaluation data, you should discard it.

My algorithm is better because...

• Statistical significance and experimental design play a big role in determining the validity of a result.
• There is always some probability a random choice of an algorithm will produce a better result.
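One way to check a "my algorithm is better" claim is a two-proportion z-test on the two accuracies. The sketch below is illustrative only and is not taken from the presentation: the function name, the counts, and the assumption that each classifier was scored on its own independent evaluation set of n samples are all hypothetical.

```python
import math

def two_proportion_z(correct_a, correct_b, n):
    """z statistic and two-sided p-value for the difference between two
    accuracies, each measured on n independent evaluation samples.
    Assumes the pooled accuracy is strictly between 0 and 1."""
    p_a, p_b = correct_a / n, correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    standard_error = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (p_a - p_b) / standard_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = 1 - math.erf(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 92% vs 88% accuracy on 100 samples each.
z_small, p_small = two_proportion_z(92, 88, 100)
# The same accuracies on 10,000 samples each.
z_large, p_large = two_proportion_z(9200, 8800, 10000)
```

With only 100 samples the four-point gap is well within chance variation; with 10,000 samples it is not. When both classifiers are scored on the same evaluation set, a paired test such as McNemar's is usually the better choice.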


System Layout

[Figure: system layout. Block labels: URL; Spider; View Source (<…jpg>); Dual P4 - XP; Gigabit Ethernet; 32/64-bit PCI; Analyze and Classify; Store Original Image and Class Vector.]


Classification System

[Figure: classification system. Block labels: URL List; Spider (Webbot); WEB; HTML Download; Text Search; Hyperlinks; Image; Video; Audio; Text Content Classifier; Image Classifier; Video Classifier; Audio Classifier; URL Feature Vector; URL.]

Current research is focused on the RED path.


Image Database: Web-Mining for Images

• Images are an important class of data.
• The Web is presently regarded as the largest global multimedia data repository, encompassing different types of images in addition to other multimedia data types.
• To search the web for images, a crawler (also called a spider, mobile agent, or bot) is utilized.

src="home_page/images/rover_spin.jpg" alt="&quot; width="124" height="70"></a><a

href="images/home_page/pgt_in_use.jpg"><img src="images/home_page/pgt_in_use_small.jpg"

• The agent searches HTML documents for strings of type jpg, gif, and tif, then stores each image together with its URL.
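The search step can be sketched with Python's standard HTML parser. This is a minimal illustration, not the authors' crawler: the class and function names are hypothetical, the base URL is borrowed from the example on the next slide, and the sketch only collects links without downloading anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

IMAGE_EXTENSIONS = (".jpg", ".gif", ".tif")

class ImageLinkParser(HTMLParser):
    """Collects src/href attribute values that end in an image extension."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.lower().endswith(IMAGE_EXTENSIONS):
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, value))

def find_image_urls(html_text, base_url):
    parser = ImageLinkParser(base_url)
    parser.feed(html_text)
    return parser.image_urls

sample = ('<a href="images/home_page/pgt_in_use.jpg">'
          '<img src="images/home_page/pgt_in_use_small.jpg"></a>')
urls = find_image_urls(sample, "http://www.eng.fsu.edu/")
```

Parsing tags rather than matching raw strings avoids picking up extensions that appear in visible text or comments.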


Web Mining Example: Software Process

[root@Nebula getURL]# ./getImages
Enter URL: eng.fsu.edu
./getURL http://www.eng.fsu.edu > out.txt
images/index_01.jpg
images/index_02_new_2.jpg
images/index_03.jpg
images/index_04.jpg
images/index_05.jpg
images/index_06.jpg
images/index_07.jpg
images/index_08_new.jpg
images/index_01.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_01.jpg > images/engA.jpg
images/index_02_new_2.jpg length: 25
./getURL http://www.eng.fsu.edu/images/index_02_new_2.jpg > images/engB.jpg
images/index_03.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_03.jpg > images/engC.jpg
images/index_04.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_04.jpg > images/engD.jpg
images/index_05.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_05.jpg > images/engE.jpg
images/index_06.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_06.jpg > images/engF.jpg
images/index_07.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_07.jpg > images/engG.jpg
images/index_08_new.jpg length: 23
./getURL http://www.eng.fsu.edu/images/index_08_new.jpg > images/engH.jpg


Web Mining Example Images

• Example results from our “getImages” software are shown to the right.
• These are from the news.bbc.co.uk website (more interesting than the ones from our engineering site).
• This can prove useful when looking for faces or particular objects, such as the space shuttle.
• We are able to search a particular group of sites, randomly search all known sites (not limited to the US or Western Europe), or search all pages within a certain domain, say nytimes.com.


Example Image Objects

These are sample objects that could be the target objects of a specific search. These particular objects are from the COIL database.

They are used to train the analysis system


Image Analysis: Implementation Model for Image Recognition

[Figure: block diagram. Observed input, RGB image X → SIGNAL PREPROCESSING → X* → FEATURE EXTRACTION → Y → PATTERN RECOGNITION (with Stored Patterns W) → MATCHED VECTOR W* → Recognized Image.]

Feature Extraction is the process of determining a vector Y that represents an observed input X and enables accurate implementation of pattern recognition schemes. For this process, a mapping takes place such that X* is mapped to a vector Y:

$$Y = (y_1, y_2, \ldots, y_n)$$


5x5 Scaled Spatial Filters Used for Feature Extraction

% Gabor Filter 1 (integer-scaled 5x5 coefficients)
gabor1 = [-16 -19 -20 -19 -16;...
          -36 -43 -46 -43 -36;...
            0   0   0   0   0;...
           36  43  46  43  36;...
           16  19  20  19  16];
gaborDiv = 1/1000;   % scale factor applied after the integer multiply-accumulate

mask = zeros(5,5,1);
mask(:,:,1) = gabor1;
maskDiv = [gaborDiv];
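To show how such a scaled integer mask might be applied, here is a minimal pure-Python sketch. The coefficients and the 1/1000 divisor come from the slide; the function name, the test images, and the choice to compute only positions where the mask fits fully inside the image are assumptions made for illustration. The sketch computes a correlation (the mask is not flipped); a true convolution would flip the sign of the response for this antisymmetric mask.

```python
GABOR1 = [
    [-16, -19, -20, -19, -16],
    [-36, -43, -46, -43, -36],
    [0, 0, 0, 0, 0],
    [36, 43, 46, 43, 36],
    [16, 19, 20, 19, 16],
]
GABOR_DIV = 1 / 1000

def filter_valid(image, mask, scale):
    """Slide `mask` over `image` (no flipping, no padding) and scale each sum."""
    k = len(mask)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            acc = 0  # integer multiply-accumulate, as in a hardware MAC
            for i in range(k):
                for j in range(k):
                    acc += mask[i][j] * image[r + i][c + j]
            out_row.append(acc * scale)
        out.append(out_row)
    return out

# A flat 8x8 image and an 8x8 image with a horizontal edge (dark top, bright bottom).
flat = [[100] * 8 for _ in range(8)]
edge = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]
flat_response = filter_valid(flat, GABOR1, GABOR_DIV)
edge_response = filter_valid(edge, GABOR1, GABOR_DIV)
```

Because the mask coefficients sum to zero, the flat image gives no response, while the horizontal edge gives a strong positive one. Keeping the coefficients as integers and applying the divisor once at the end mirrors how the work could be split between integer multipliers and a final scaling step.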


Wavelet Review

Wavelet Transform:

$$W(a,b) = \int f(t)\,\psi_{ab}(t)\,dt$$

where $\psi$ is the mother wavelet.

The Wavelet Transform has variable window lengths that allow it greater flexibility when analyzing signals, which makes it an attractive tool for signal analysis.

[Figure: wavelet shown at Scale 1 and Scale 2.]


The Spectral Histogram Representation

Properties

• A spectral histogram is translation invariant.
• A spectral histogram is a nonlinear operator.
• With sufficient filters, a spectral histogram can uniquely represent any image up to a translation.
• All the images sharing a spectral histogram define an equivalence class.

Preprocessing step in classification

• Choose N image filter kernels to convolve with the image.
• Perform the convolutions, generating N resultant responses.
• For each response, generate a response image histogram.
• Concatenate the histograms and send the result to the classifier.
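The preprocessing steps can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the authors' implementation: the three small kernels (intensity, horizontal difference, vertical difference), the four bins, the fixed [-255, 255] range, and all function names are hypothetical.

```python
def correlate_valid(image, kernel):
    """Filter response at every position where the kernel fits inside the image."""
    kr, kc = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kr) for j in range(kc))
            for c in range(len(image[0]) - kc + 1)
        ]
        for r in range(len(image) - kr + 1)
    ]

def histogram(values, bins, lo, hi):
    """Normalized histogram of `values` over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        index = min(bins - 1, max(0, int((v - lo) / width)))
        counts[index] += 1
    return [c / len(values) for c in counts]

def spectral_histogram(image, kernels, bins=4, lo=-255.0, hi=255.0):
    """Concatenate one response histogram per kernel into a single feature vector."""
    feature = []
    for kernel in kernels:
        response = correlate_valid(image, kernel)
        flat = [v for row in response for v in row]
        feature.extend(histogram(flat, bins, lo, hi))
    return feature

KERNELS = [
    [[1]],          # intensity (delta) filter
    [[-1, 1]],      # horizontal difference
    [[-1], [1]],    # vertical difference
]
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
feature = spectral_histogram(image, KERNELS)
```

The feature length is N times the number of bins, so it does not depend on the image size, which is convenient when the images come from arbitrary web pages.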


The Spectral Histogram Representation

1st step: choose N image filter kernels to convolve with the image.

• Filter kernels are chosen carefully from several image filter banks, including the intensity filter δ(x,y), differencing or gradient filters, Laplacian of Gaussian filters (where t determines the scale of the filter), and finally the Gabor filter, defined by sine and cosine components.

2nd step: perform the convolutions, generating N resultant responses.

• To calculate each response pixel value, roughly m x n multiplies and adds must be performed, where m x n is the dimension of the chosen kernel. Here m = n.
• Thus, for an M x N image, a total of roughly $\sum_k M \cdot N \cdot n_k^2$ multiplies and adds must be performed, where the subscript k denotes the kth filter.


Feature Vector

Our feature vector comprises the spectral histograms of the images resulting from filtering.

The feature vector is laid out as follows:

Gabor Features | Haar Features | LoG Features | Wavelet Features


Pattern Recognition: Neural Decision Tree

After the feature vectors have been created they are sent back to the host PC and tested against a Neural Decision Tree to determine the presence of selected objects or textures, e.g. faces, cars, or brick.


Artificial Neural Network Model

Each node in the tree is an artificial neural network trained to separate its input into k classes. Traversing the tree ends at a leaf node, and the leaf nodes represent objects or textures of interest.

[Figure: Feedforward Neural Network Model. Input layer (index i) takes the feature vector x0 ... x80; hidden layer (index j) has units Y0 ... Yk; output layer (index k) has units S0 ... S7, one per branch at node n.]
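A tree of per-node networks can be sketched as follows. The 81 inputs (x0 to x80) and 8 outputs (S0 to S7) follow the figure labels; everything else is an assumption made for illustration: the hidden-layer size, the sigmoid activation, the random untrained weights, the two-level tree shape, and the leaf labels.

```python
import math
import random

class NodeNetwork:
    """One-hidden-layer feedforward network with random (untrained) weights."""

    def __init__(self, n_inputs, n_hidden, n_branches, rng):
        # Each row holds one unit's weights; the last entry is the bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_output = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
                         for _ in range(n_branches)]

    @staticmethod
    def _layer(weights, inputs):
        return [
            1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + row[-1])))
            for row in weights
        ]

    def branch(self, features):
        """Index of the output unit with the largest activation."""
        hidden = self._layer(self.w_hidden, features)
        scores = self._layer(self.w_output, hidden)
        return max(range(len(scores)), key=scores.__getitem__)

def classify(node, features):
    """Walk from the root until a leaf (a string label) is reached."""
    while not isinstance(node, str):
        network, children = node
        node = children[network.branch(features)]
    return node

rng = random.Random(0)
sub_node = (NodeNetwork(81, 10, 2, rng), {0: "car", 1: "brick"})
children = {b: "other" for b in range(8)}
children[0] = sub_node
children[1] = "face"
root = (NodeNetwork(81, 10, 8, rng), children)

features = [rng.random() for _ in range(81)]
label = classify(root, features)
```

With untrained weights the label is arbitrary; the point is the control flow, in which each node's network only has to solve a small k-way problem rather than the full classification task.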


Other Pattern Recognition Techniques

• Density Estimation
  • Histogram Approach
  • Parzen-window method
  • Kn-Nearest-Neighbor Estimation
• Principal Components Analysis
• Fisher Linear Discriminant
• MDA

Our future work aims at creating a library of generic modules implementing all of these discrimination techniques. These methods were supposed to have been completed prior to this submission but have been delayed.


Summary of These Techniques

Kn-Nearest-Neighbor Estimation

• To estimate p(x) from n training samples, we center a cell about x and let it grow until it captures kn samples, where kn is some specified function of n.
• These samples are the kn nearest neighbors of x.
• If the density is high near x, the cell will be relatively small, which gives good resolution.
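The growing-cell idea can be written down directly for one-dimensional data, where the estimate is p(x) ≈ kn / (n · V) and V is the length of the cell. The sketch below is illustrative: the function name, the sample values, and the default choice kn = √n are assumptions, and it presumes the kn-th neighbor does not coincide with x (otherwise the cell has zero length).

```python
import math

def knn_density(x, samples, k=None):
    """1-D k_n-nearest-neighbor density estimate: k / (n * cell_length)."""
    n = len(samples)
    if k is None:
        k = max(1, round(math.sqrt(n)))   # assumed choice: k_n = sqrt(n)
    distances = sorted(abs(s - x) for s in samples)
    radius = distances[k - 1]             # distance to the k-th nearest sample
    cell_length = 2 * radius              # cell centered on x
    return k / (n * cell_length)

# Five tightly packed samples near 0 and four spread-out samples near 5.5.
samples = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.0, 5.0, 6.0, 7.0]
dense = knn_density(0.05, samples)
sparse = knn_density(5.5, samples)
```

Near 0 the cell only has to grow to a radius of 0.15 to capture three samples, so the estimate is about ten times higher than near 5.5, where the radius reaches 1.5.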

Component Analysis and Discriminants

• How do we reduce excessive dimensionality? Answer: combine features.
• Linear methods project high-dimensional data onto a lower-dimensional space.
• Principal Components Analysis (PCA) seeks the projection that best represents the data in a least-squares sense.
• Fisher Linear Discriminant seeks the projection that best separates the data in a least-squares sense.
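As a small illustration of the PCA projection, the sketch below finds the first principal direction by power iteration on the covariance matrix. Power iteration is a stand-in chosen to keep the example dependency-free; the function name and sample points are hypothetical, and the method assumes the data are not all identical and that the starting vector is not orthogonal to the leading direction.

```python
def first_principal_component(points, iterations=100):
    """Leading eigenvector of the covariance matrix and the 1-D projections."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    cov = [[sum(row[i] * row[j] for row in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    projections = [sum(c * vi for c, vi in zip(row, v)) for row in centered]
    return v, projections

# Points lying exactly on the line y = 2x.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, projections = first_principal_component(points)
```

For these points the direction is (1, 2) normalized to unit length, and the single projected coordinate preserves all of the variance, so nothing is lost by dropping from two dimensions to one.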


Summary of These Techniques Continued

Generalized Linear Discriminant Functions

The linear discriminant function g(x) can be written as

$$g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i$$

By adding d(d+1)/2 additional terms involving the products of pairs of components of x, we obtain the quadratic discriminant function

$$g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w_{ij} x_i x_j .$$

The separating surface defined by g(x)=0 is a second-degree or hyperquadric surface.

By continuing to add terms such as $w_{ijk} x_i x_j x_k$, we can obtain the class of polynomial discriminant functions.
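The count of d(d+1)/2 extra terms can be checked with a short sketch. The function names and weights below are hypothetical. Because $x_i x_j = x_j x_i$, the sketch keeps one weight per unordered pair (i ≤ j), which is equivalent to merging $w_{ij}$ and $w_{ji}$ in the double sum.

```python
from itertools import combinations_with_replacement

def quadratic_terms(x):
    """All products x_i * x_j with i <= j: d(d+1)/2 terms for d components."""
    return [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]

def quadratic_discriminant(x, w0, w, w_pairs):
    """g(x) = w0 + sum_i w_i x_i + sum_{i<=j} w_pairs * x_i x_j."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return linear + sum(wp * t for wp, t in zip(w_pairs, quadratic_terms(x)))

terms = quadratic_terms([1, 2, 3])            # d = 3 gives 6 pairwise terms

# g(x) = x1^2 + x2^2 - 1: the surface g(x) = 0 is the unit circle.
inside = quadratic_discriminant([0.0, 0.0], -1.0, [0.0, 0.0], [1.0, 0.0, 1.0])
outside = quadratic_discriminant([2.0, 0.0], -1.0, [0.0, 0.0], [1.0, 0.0, 1.0])
```

The circle example shows a decision boundary that no linear discriminant in the original two components could produce, at the cost of three extra weights.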


So, Why Move to Hardware?

• Speed of classification is limited in software, and with such a large database (the Web), the faster the better.
• For example, given a 128x128 8-bit grayscale image, generating the spectral histogram with 5x5 filters requires roughly 410k multiplies and 410k adds per filter (128 × 128 × 25 = 409,600), or about 4.1M of each for 10 filters.
• This is the main computational bottleneck.
• A general-purpose microprocessor can only perform one or two multiply/adds simultaneously (depending on the processor).
• Some FPGAs allow for up to 88 simultaneous multiply operations and many adds to be performed in one or two clock cycles.
• The filtering algorithm is inherently parallelizable and therefore well suited for a pipelined hardware implementation.
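The operation count can be worked through explicitly. In the sketch below the function names are hypothetical, border pixels are counted like interior ones (the slide says "roughly"), and the cycle figure is an idealized lower bound that assumes all 88 multipliers are busy on every clock cycle, which a real design is unlikely to achieve. Note that 128 × 128 × 25 = 409,600, so the 410k figure corresponds to a single 5x5 filter.

```python
def mac_count(image_rows, image_cols, kernel_size, num_filters=1):
    """Multiply-accumulate operations for square kernels over every pixel."""
    return num_filters * image_rows * image_cols * kernel_size * kernel_size

def ideal_cycles(operations, parallel_multipliers):
    """Ceiling of operations / multipliers: a best-case cycle count."""
    return -(-operations // parallel_multipliers)

per_filter = mac_count(128, 128, 5)             # one 5x5 filter
ten_filters = mac_count(128, 128, 5, 10)        # bank of ten 5x5 filters
serial_cycles = ideal_cycles(ten_filters, 1)    # one multiply per cycle
parallel_cycles = ideal_cycles(ten_filters, 88) # 88 multiplies per cycle
```

Even as a best case, the ratio between the two cycle counts shows why the multiply-heavy filtering stage is the natural candidate to move into the FPGA.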


Target Hardware: Avnet’s Virtex II Pro Board

• Uses Virtex II Pro XC2VP20.
• Many options for I/O.
• 32-bit PCI bus has data throughput of over 100 MB per second.


Hardware vs. Software Tradeoffs

• Not all tasks have such a drastic speedup in hardware.
• Memory accesses:
  • Only one address per clock cycle can be read in SDRAM, Flash, or SRAM.
  • We require more than 32 bits per action, so we waste time reading data.
  • It is possible to store more data in BRAM to create an initial data stack that would overcome future read times.
• Combine hardware and software for optimal ease of design and speed of execution. We need to determine the optimal compromise.


11x11 Filter Model Top Level

This bank of four 11x11 filters was the first test design. We felt that an 11x11 kernel would allow for the best representation of our filter bank set.


Filter Model: Filter MAC System

A down sampler reduces the capture register sample period to the output sample period. The block is configured with latency to obtain the most efficient hardware implementation. The down sampling rate is equal to the coefficient array length.

An addressable shift register (ASR) implements the input delay buffer. The address port runs n times faster than the data port, where n is the number of filter taps. The filter coefficients are stored in a ROM configured to use block memory.

A comparator generates the reset and enable pulse for the accumulator and capture register. The pulse is asserted when the address is 0 and is delayed to account for pipeline stages.
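The time-multiplexed MAC structure described above can be modeled in software to see how the pieces interact. This is a behavioral sketch only: the function name and test data are hypothetical, pipeline latency is ignored, and a one-dimensional tap list stands in for the flattened 2-D kernel.

```python
def mac_fir(samples, coefficients):
    """Behavioral model of a single-multiplier FIR filter.

    For every input sample (slow clock) the address counter sweeps all taps
    (fast clock, n_taps cycles), one multiply-accumulate per fast cycle.
    """
    n_taps = len(coefficients)
    shift_register = [0] * n_taps            # addressable shift register (delay buffer)
    outputs = []
    for sample in samples:
        shift_register = [sample] + shift_register[:-1]
        accumulator = 0
        for address in range(n_taps):
            product = coefficients[address] * shift_register[address]  # ROM x ASR
            if address == 0:
                accumulator = product        # comparator pulse: reset and reload
            else:
                accumulator += product
        outputs.append(accumulator)          # capture register, down-sampled by n_taps
    return outputs

impulse_response = mac_fir([1, 0, 0], [1, 0, -1])
ramp_response = mac_fir([1, 2, 3, 4], [1, 0, -1])
```

The model makes the trade-off visible: one multiplier serves all taps, so the fast clock must run n times faster than the data rate, which is why the down-sampling rate equals the coefficient array length.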


Device Utilization Summary: Four 11x11 Image Filters

Selected Device : 2vp20ff896-6

 Number of Slices:              7913 out of   9280   85%
 Number of Slice Flip Flops:   10644 out of  18560   57%
 Number of 4 input LUTs:        8770 out of  18560   47%
 Number of bonded IOBs:           67 out of    556   12%
 Number of GCLKs:                  1 out of     16    6%

=============================================
TIMING REPORT

Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 15322 |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

 Minimum period: 4.542ns (Maximum Frequency: 220.192MHz)
 Minimum input arrival time before clock: 3.006ns
 Maximum output required time after clock: 3.615ns
 Maximum combinational path delay: No path found

The device utilization of the four-filter 11x11 design left little room for other logic on our target device. Since we felt that an 11x11 kernel would allow for the best representation of our filter bank set, we decided to target additional devices to leave our options open.


Device Utilization Summary: Six 11x11 Image Filters with New Target

Selected Device : 4vsx55ff1148-11

 Number of Slices:              9543 out of  24576   38%
 Number of Slice Flip Flops:   11616 out of  49152   23%
 Number of 4 input LUTs:        9816 out of  49152   19%
 Number of bonded IOBs:           99 out of    642   15%
 Number of GCLKs:                  1 out of     32    3%
 Number of DSP48s:                66 out of    512   12%

==============================================
TIMING REPORT

Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 18732 |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -11

 Minimum period: 6.632ns (Maximum Frequency: 150.790MHz)
 Minimum input arrival time before clock: 3.217ns
 Maximum output required time after clock: 3.546ns
 Maximum combinational path delay: No path found

The device utilization of the six-filter 11x11 design left more room for other logic on our new target device. However, we did not possess this device and therefore had to consider our in-house options. Thus, we moved toward a more V2P20-friendly design.


Device Utilization Summary: 5x5 with 10 Histograms

Selected Device : 2vp20ff896-6

 Number of Slices:              8775 out of   9280   94%
 Number of Slice Flip Flops:   10768 out of  18560   58%
 Number of 4 input LUTs:       10274 out of  18560   55%
 Number of bonded IOBs:          343 out of    556   61%
 Number of MULT18X18s:            50 out of     88   56%
 Number of GCLKs:                  1 out of     16    6%

===============================================
TIMING REPORT

Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 16755 |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

 Minimum period: 4.758ns (Maximum Frequency: 210.172MHz)
 Minimum input arrival time before clock: 2.987ns
 Maximum output required time after clock: 6.322ns
 Maximum combinational path delay: No path found

Note that a pipelined implementation without explicit use of the embedded multipliers exceeds the available slices, at 108% utilization.


Mcode Block for Histogram Bin-Sorter

function [bin10,bin9,bin8,bin7,bin6,bin5,bin4,bin3,bin2,bin1] = xhist(input1)
  % One-hot bin sorter: exactly one output is set to 1 for each input sample.
  bin10 = 0; bin9 = 0; bin8 = 0; bin7 = 0; bin6 = 0;
  bin5 = 0; bin4 = 0; bin3 = 0; bin2 = 0; bin1 = 0;

  % Thresholds are tested from the highest bin down.
  % Note: bin9 covers 180-223, twice the 22-count width of bins 2-8.
  if input1 >= 224
    bin10 = 1;
  elseif input1 >= 180
    bin9 = 1;
  elseif input1 >= 158
    bin8 = 1;
  elseif input1 >= 136
    bin7 = 1;
  elseif input1 >= 114
    bin6 = 1;
  elseif input1 >= 92
    bin5 = 1;
  elseif input1 >= 70
    bin4 = 1;
  elseif input1 >= 48
    bin3 = 1;
  elseif input1 >= 26
    bin2 = 1;
  else
    bin1 = 1;
  end;
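For checking the hardware output against a software reference, the same bin boundaries can be expressed compactly in Python. The thresholds are copied from xhist; the function names and the accumulation step (summing the one-hot outputs into ten counters) are assumptions about how the bin sorter is used downstream.

```python
# Lower bounds of bins 2..10, copied from the xhist thresholds.
# Bin 1 catches every value below 26.
THRESHOLDS = [26, 48, 70, 92, 114, 136, 158, 180, 224]

def bin_index(value):
    """Return the 1-based bin number (1..10) for an 8-bit filter response."""
    index = 1
    for lower in THRESHOLDS:
        if value >= lower:
            index += 1
    return index

def accumulate(values):
    """Histogram counts for bins 1..10 over a stream of response values."""
    counts = [0] * 10
    for v in values:
        counts[bin_index(v) - 1] += 1
    return counts

counts = accumulate([0, 30, 200, 230, 255])
```

Counting how many thresholds a value meets gives the same result as the if/elseif chain because the thresholds are strictly increasing.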


ModelSim Waveform Snapshot

Histogram results for Gabor Filter 2 with Bin Ranges shown on the previous slide. Also, note that there is a 16 clock cycle delay before the bin sort result is posted.


Conclusions/Future Work

In addition to the other pattern recognition techniques mentioned above, we intend to optimize the PC/FPGA interfacing to create our own low-cost integrated system. Our current problems lie in the PCI interface design shipped with the Avnet Development Board. We are working hard to resolve this issue, but in the end we may have to consider another board.

We also wish to time the results (how many images can we process per second); is it real-time?

Possibly move to a board with better interfacing tools, as well as faster interfacing via PCI-X or PCI express, or DMA capabilities.

Finally, optimize calculating efficiency of the image analysis algorithm, i.e., consider a multi-stage pipeline with more efficient memory access algorithms.

The ultimate goal is to do real time search and recognition utilizing FPGAs as co-processors.