MIMILL
MAPLD2005/P249
An FPGA Co-Processor for Statistical Pattern Recognition Applications
Jason Isaacs and Simon Y. Foo
Machine Intelligence Laboratory
FAMU-FSU College of Engineering
Department of Electrical and Computer Engineering
Project Goal
To develop and implement a real-time image content analysis system using an FPGA Co-processor.
Outline
• Pattern Recognition
• Image Database
• System Layout
• Image Content Analysis
• Hardware Implementation
• Conclusions
• Future Work
Pattern Recognition Overview
Pattern Recognition: “the act of taking raw data and taking an action based on the category of the pattern.”
Common Applications: speech recognition, fingerprint identification (biometrics), DNA sequence identification
Related Terminology:
• Machine Learning: the ability of a machine to improve its performance based on previous results.
• Machine Understanding: acting on the intentions of the user generating the data.
Related Fields: artificial intelligence, signal processing, and discipline-specific research (e.g., target recognition, speech recognition, natural language processing).
Design Flow
[Figure: design cycle flowchart. Start → Collect Data → Choose Features → Choose Model → Train Classifier → Evaluate Classifier → End]
Key issues:
• "There is no data like more data."
• Perceptually-meaningful features?
• How do we find the best model?
• How do we estimate parameters?
• How do we evaluate performance?
Common Misconceptions
"I got 100% accuracy on..."
• Almost any algorithm works some of the time, but few real-world problems have ever been completely solved.
• Training on the evaluation data is forbidden.
• Once you use evaluation data, you should discard it.
"My algorithm is better because..."
• Statistical significance and experimental design play a big role in determining the validity of a result.
• There is always some probability a random choice of an algorithm will produce a better result.
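The point about statistical significance can be made concrete with a small sketch. It assumes two classifiers scored on the same held-out evaluation set and uses an exact McNemar (sign) test, which is one common choice and is not named in the slides. The function and argument names are hypothetical.

```python
from math import comb


def mcnemar_exact_p(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the discordant counts.

    only_a_correct: samples classifier A got right and B got wrong.
    only_b_correct: samples classifier B got right and A got wrong.
    Under the null hypothesis (no real difference) each discordant
    sample favors A or B with probability 1/2.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # the classifiers never disagree
    k = min(only_a_correct, only_b_correct)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

With ten disagreements, a 10 to 0 split gives p of about 0.002, while a 6 to 4 split gives p of about 0.75, which is entirely consistent with chance.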
System Layout
[Figure: system layout diagram. Labels: URL; Spider; <…jpg> View Source; Dual P4 - XP; Gigabit Ethernet; 32/64 bit PCI; Analyze and Classify; Store Original Image and Class Vector.]
Classification System
[Figure: classification system block diagram. Blocks: URL List, Spider (Webbot), WEB, HTML Download, Text Search, Hyperlinks, Image, Video, Audio, Text Content Classifier, Image Classifier, Video Classifier, Audio Classifier, URL Feature Vector, URL.]
Current research focused on RED path
Image Database: Web-Mining for Images
• Images are an important class of data.
• The Web is presently regarded as the largest global multimedia data repository, encompassing different types of images in addition to other multimedia data types.
• To search the web for images, a crawler (also called a spider, mobile agent, or bot) is utilized.
• The agent searches HTML documents for strings of type jpg, gif, and tif, and stores the image and URL.
Example HTML source:
src="home_page/images/rover_spin.jpg" alt="" width="124" height="70"></a><a
href="images/home_page/pgt_in_use.jpg"><img src="images/home_page/pgt_in_use_small.jpg"
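The search step can be sketched in a few lines. This is an illustration using Python's standard `html.parser`, not the authors' getImages tool; the class name, function name, and example base URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

IMAGE_EXTENSIONS = (".jpg", ".gif", ".tif")


class ImageLinkParser(HTMLParser):
    """Collects src/href attribute values that end in an image extension."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                if value.lower().endswith(IMAGE_EXTENSIONS):
                    # Resolve relative paths against the page URL.
                    url = urljoin(self.base_url, value)
                    if url not in self.image_urls:
                        self.image_urls.append(url)


def find_image_urls(html, base_url):
    """Return the absolute image URLs referenced in an HTML string."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

Each returned URL could then be downloaded and stored alongside the page URL, as the last bullet describes.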
Web Mining Example: Software Process
[root@Nebula getURL]# ./getImages
Enter URL: eng.fsu.edu
./getURL http://www.eng.fsu.edu > out.txt
images/index_01.jpg
images/index_02_new_2.jpg
images/index_03.jpg
images/index_04.jpg
images/index_05.jpg
images/index_06.jpg
images/index_07.jpg
images/index_08_new.jpg
images/index_01.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_01.jpg > images/engA.jpg
images/index_02_new_2.jpg length: 25
./getURL http://www.eng.fsu.edu/images/index_02_new_2.jpg > images/engB.jpg
images/index_03.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_03.jpg > images/engC.jpg
images/index_04.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_04.jpg > images/engD.jpg
images/index_05.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_05.jpg > images/engE.jpg
images/index_06.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_06.jpg > images/engF.jpg
images/index_07.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_07.jpg > images/engG.jpg
images/index_08_new.jpg length: 23
./getURL http://www.eng.fsu.edu/images/index_08_new.jpg > images/engH.jpg
Web Mining Example Images
• Example results from our "getImages" software are shown to the right.
• These are from the news.bbc.co.uk website (more interesting than the ones from our engineering site).
• This can prove useful when looking for faces or particular objects, such as the space shuttle.
• We are able to search a particular group of sites, randomly search all known sites (not limited to the US or Western Europe), or search all pages within a certain domain, say nytimes.com.
Example Image Objects
These are sample objects that could be the target objects of a specific search. These particular objects are from the COIL database.
They are used to train the analysis system
Image Analysis
Implementation Model for Image Recognition
[Figure: block diagram of the recognition pipeline. Observed input, RGB image X → SIGNAL PREPROCESSING → X* → FEATURE EXTRACTION → Y → PATTERN RECOGNITION → Recognized Image. Other labels: Stored Patterns, MATCHED VECTOR, W*, W.]
Feature Extraction is the process of determining a vector Y that represents an observed input X and enables accurate implementation of pattern recognition schemes. For this process, a mapping takes place such that X* is mapped to a vector Y:
$Y = (y_1, y_2, \ldots, y_n)$
5x5 Scaled Spatial Filters Used for Feature Extraction
% Gabor Filter 1
gabor1 = [-16 -19 -20 -19 -16;...
          -36 -43 -46 -43 -36;...
            0   0   0   0   0;...
           36  43  46  43  36;...
           16  19  20  19  16];
gaborDiv = 1/1000;

mask = zeros(5,5,1);
mask(:,:,1) = gabor1;
maskDiv = [gaborDiv];
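As a rough illustration of how such an integer mask and its divisor might be applied, the sketch below convolves the Gabor mask above with a small image. It assumes a plain 2-D convolution over the region where the kernel fits entirely inside the image; the function names are hypothetical and border handling in the actual design may differ.

```python
GABOR1 = [
    [-16, -19, -20, -19, -16],
    [-36, -43, -46, -43, -36],
    [0, 0, 0, 0, 0],
    [36, 43, 46, 43, 36],
    [16, 19, 20, 19, 16],
]
GABOR_DIV = 1 / 1000


def convolve2d_valid(image, kernel):
    """Integer 2-D convolution, keeping only fully overlapped positions."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for a in range(kh):
                for b in range(kw):
                    # Convolution flips the kernel relative to the window.
                    acc += kernel[a][b] * image[i + kh - 1 - a][j + kw - 1 - b]
            row.append(acc)
        out.append(row)
    return out


def apply_mask(image, kernel, divisor):
    """Scale the integer accumulations by the mask divisor."""
    return [[acc * divisor for acc in row]
            for row in convolve2d_valid(image, kernel)]
```

Keeping the accumulation in integers and applying the divisor last mirrors the split between `gabor1` and `gaborDiv`. Because the mask coefficients sum to zero, a constant image produces a zero response.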
Wavelet Review
Wavelet Transform:
$W(a,b) = \int f(t)\,\psi\!\left(\frac{t-b}{a}\right)\,dt$, where $\psi$ is the mother wavelet.
The Wavelet Transform has variable window lengths that allow it greater flexibility when analyzing signals. Therefore, it becomes an attractive tool for signal analysis.
[Figure: example analysis windows at two scales, labeled Scale1 and Scale2]
Wavelet Review
Given a basis function $\psi(t)$:
The dilation operation is indicated by $\psi(t/a)$.
Then, a mother wavelet is defined by $\psi\!\left(\frac{t-b}{a}\right)$.
FIR Coefficients for Daubechies “7”
g(n) : high pass filter
h(n) : low pass filter
FIR Implementation
Approximation is down-sampled and input to next level.
Detail is stored as coefficients.
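The level-by-level structure can be sketched as follows. Since the Daubechies coefficients are not reproduced in this transcript, the sketch substitutes the simplest possible filter pair, an unnormalized Haar average (low pass) and difference (high pass); the function names are hypothetical.

```python
def haar_level(signal):
    """One analysis level: filter, then down-sample by two.

    Returns (approximation, detail), each half the input length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / 2 for i in pairs]  # low pass
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]  # high pass
    return approx, detail


def haar_decompose(signal, levels):
    """Feed each approximation into the next level; keep every detail."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_level(approx)
        details.append(detail)
    return approx, details
```

Swapping in longer filters such as h(n) and g(n) changes only the per-level filtering; the cascade of approximation into the next level stays the same.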
The Spectral Histogram Representation
Properties
• A spectral histogram is translation invariant.
• A spectral histogram is a nonlinear operator.
• With sufficient filters, a spectral histogram can uniquely represent any image up to a translation.
• All the images sharing a spectral histogram define an equivalence class.
Preprocessing step in classification
• Choose N image filter kernels to convolve with the image.
• Perform the convolutions, generating N resultant responses.
• For each response, generate a response image histogram.
• Concatenate the histograms and send them to the classifier.
The Spectral Histogram Representation
1st step – choose N image filter kernels to convolve with the image.
• Filter kernels are chosen carefully from several image filter banks, including the intensity filter δ(x,y), differencing or gradient filters, Laplacian of Gaussian filters (where t determines the scale of the filter), and finally Gabor filters defined by sine and cosine components.
2nd step – perform the convolutions, generating N resultant responses.
• To calculate each response pixel value, roughly m x n multiplies and adds must be performed, where m x n is the dimension of the chosen kernel. Here m = n.
• Thus for an M x N image a total of $\sum_k M \cdot N \cdot n_k^2$ multiplies and adds must be performed, where subscript k denotes the kth filter.
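The operation count is easy to check numerically. The helper below is a hypothetical illustration that assumes one output per image pixel and square kernels.

```python
def multiply_add_count(image_rows, image_cols, kernel_sizes):
    """Multiply-accumulate operations needed to filter an image.

    kernel_sizes holds the side length n of each square n x n kernel;
    every kernel produces one output value per image pixel.
    """
    return sum(image_rows * image_cols * n * n for n in kernel_sizes)


# One 5x5 filter on a 128x128 image, then a bank of ten.
single = multiply_add_count(128, 128, [5])
bank = multiply_add_count(128, 128, [5] * 10)
```

Here `single` is 409,600 and `bank` is 4,096,000, so the cost grows linearly with the number of filters and quadratically with the kernel side length.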
Feature Vector
Our feature vector is comprised of the spectral histograms of the images resulting from filtering.
The feature vector is laid out as follows:
Gabor Features | Haar Features | LoG Features | Wavelet Features
Pattern Recognition: Neural Decision Tree
After the feature vectors have been created they are sent back to the host PC and tested against a Neural Decision Tree to determine the presence of selected objects or textures, e.g. faces, cars, or brick.
Artificial Neural Network Model
Each node in the tree is comprised of an artificial neural network that is trained to separate the input into k classes. As the tree is traversed the leaf nodes represent objects or textures of interest.
Feedforward Neural Network Model
[Figure: feedforward neural network with input, hidden, and output layers indexed i, j, and k. Labels: inputs x0 ... x80 (Feature Vector); Y0 ... Yk; S0 ... S7; Number of Branches at Node n.]
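A single tree node can be sketched as a small feedforward pass followed by picking the strongest output as the branch to follow. The sigmoid activation, the argmax routing rule, and all names are assumptions made for illustration; the slides do not specify them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer; weights[j][i] links input i to unit j."""
    return [sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def node_branch(features, hidden_w, hidden_b, out_w, out_b):
    """Run the node's network and return (branch index, output activations)."""
    hidden = layer(features, hidden_w, hidden_b)
    outputs = layer(hidden, out_w, out_b)
    branch = max(range(len(outputs)), key=outputs.__getitem__)
    return branch, outputs
```

Traversing the tree would then amount to calling `node_branch` at each node with that node's trained weights until a leaf is reached.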
Other Pattern Recognition Techniques
• Density Estimation: Histogram Approach, Parzen-window method, Kn-Nearest-Neighbor Estimation
• Principal Components Analysis
• Fisher Linear Discriminant
• MDA
Our future work aims at creating a library of generic modules implementing all of these discrimination techniques. These methods were supposed to have been completed prior to this submission but have been delayed.
Summary of These Techniques
Kn-Nearest-Neighbor Estimation
• To estimate p(x) from n training samples, we center a cell about x and let it grow until it captures kn samples, where kn is some specified function of n.
• These samples are the kn nearest neighbors of x.
• If the density is high near x, the cell will be relatively small, which gives good resolution.
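For one-dimensional data the growing cell is just an interval, which makes the estimate easy to sketch. The function below is a hypothetical illustration of the estimate p(x) ≈ (k/n)/V, where V is the length of the smallest interval centered on x that contains k samples.

```python
def knn_density(x, samples, k):
    """k-nearest-neighbor density estimate at x for 1-D samples."""
    n = len(samples)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    distances = sorted(abs(s - x) for s in samples)
    radius = distances[k - 1]  # distance to the k-th nearest sample
    if radius == 0:
        raise ValueError("k-th neighbor coincides with x; estimate is unbounded")
    volume = 2 * radius        # length of the interval centered on x
    return (k / n) / volume
```

Where samples are dense the k-th neighbor is close, the interval is short, and the estimate is high, which is the resolution argument made above.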
Component Analysis and Discriminants
• How to reduce excessive dimensionality? Answer: combine features.
• Linear methods project high-dimensional data onto a lower-dimensional space.
• Principal Components Analysis (PCA) seeks the projection which best represents the data in a least-squares sense.
• Fisher Linear Discriminant seeks the projection that best separates the data in a least-squares sense.
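To make the projection idea concrete, here is a minimal PCA sketch restricted to two-dimensional data, where the principal axis has a closed form. It illustrates the general technique and is not the authors' implementation; the function name is hypothetical.

```python
import math


def pca_project_2d(points):
    """Project 2-D points onto their first principal axis.

    Returns (axis, coordinates): the unit vector of the principal axis
    and the 1-D coordinate of each mean-centered point along it.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(cx * cx for cx, _ in centered) / n
    syy = sum(cy * cy for _, cy in centered) / n
    sxy = sum(cx * cy for cx, cy in centered) / n
    # Closed-form orientation of the dominant eigenvector of a 2x2 covariance.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    axis = (math.cos(angle), math.sin(angle))
    coordinates = [cx * axis[0] + cy * axis[1] for cx, cy in centered]
    return axis, coordinates
```

For higher-dimensional feature vectors the same idea requires an eigendecomposition of the full covariance matrix, typically done with a numerical library.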
Summary of These Techniques Continued
Generalized Linear Discriminant Functions
The linear discriminant function g(x) can be written as
$g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i$
By adding d(d+1)/2 additional terms involving the products of pairs of components of x, we obtain the quadratic discriminant function
$g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w_{ij} x_i x_j.$
The separating surface defined by g(x)=0 is a second-degree or hyperquadric surface.
By continuing to add terms such as $w_{ijk} x_i x_j x_k$ we can obtain the class of polynomial discriminant functions.
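Evaluating the quadratic discriminant is a direct transcription of the sums. The function and parameter names below are hypothetical, with `w` holding the linear weights and `W` the d x d matrix of pairwise weights.

```python
def quadratic_discriminant(x, w0, w, W):
    """g(x) = w0 + sum_i w[i]*x[i] + sum_i sum_j W[i][j]*x[i]*x[j]."""
    d = len(x)
    linear = sum(w[i] * x[i] for i in range(d))
    quadratic = sum(W[i][j] * x[i] * x[j]
                    for i in range(d) for j in range(d))
    return w0 + linear + quadratic
```

Setting every entry of `W` to zero recovers the linear discriminant, and the sign of the returned value indicates which side of the separating surface x falls on.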
So, Why Move to Hardware?
• Speed of classification is limited in software, and with such a large database (the Web), the faster the better.
• For example, given a 128x128 8-bit gray scale image, each 5x5 filter requires roughly 410k multiplies and 410k adds (128 × 128 × 25 = 409,600), so generating the spectral histogram for 10 5x5 filters takes roughly 4.1M of each.
• This is the main computational bottleneck.
• A general-purpose microprocessor can only perform one or two multiply/adds simultaneously (depending on the processor).
• Some FPGAs allow for up to 88 simultaneous multiply operations and many adds to be performed in one or two clock cycles.
• The filtering algorithm is inherently parallelizable and is therefore well suited for a pipelined hardware implementation.
Target Hardware:
Avnet’s Virtex II Pro Board
• Uses Virtex II Pro XC2VP20.
• Many options for I/O.
• 32-bit PCI bus has data throughput of over 100 MB per second.
Hardware vs. Software Tradeoffs
• Not all tasks have such a drastic speedup in hardware.
• Memory accesses:
  - Only one address per clock cycle can be read in SDRAM, Flash, or SRAM.
  - We require more than 32 bits per action, so we waste time reading data.
  - It is possible to store more data in BRAM to create an initial data stack that would overcome future read times.
• Combine hardware and software for optimal ease of design and speed of execution. We need to determine the optimal compromise.
Hardware Designs: Preliminary Test Designs and Final Implementations
11x11 Filter Model Top Level
This bank of four 11x11 filters was the first test design. We felt that an 11x11 kernel would allow for the best representation of our filter bank set.
Filter Model: One Filter Bank
11x11 Filter MAC System
Filter Model: Filter MAC System
A down sampler reduces the capture register sample period to the output sample period. The block is configured with latency to obtain the most efficient hardware implementation. The down sampling rate is equal to the coefficient array length.
An addressable shift register (ASR) implements the input delay buffer. The address port runs n times faster than the data port, where n is the number of filter taps. The filter coefficients are stored in a ROM configured to use block memory.
A comparator generates the reset and enable pulse for the accumulator and capture register. The pulse is asserted when the address is 0 and is delayed to account for pipeline stages.
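The behavior of this time-multiplexed multiply-accumulate (MAC) structure can be modeled in software. The sketch below is a behavioral illustration only, assuming a one-dimensional FIR filter; it ignores pipeline latency, and the names are hypothetical.

```python
def mac_fir(samples, coefficients):
    """Behavioral model of a single-MAC FIR filter.

    For every input sample the address counter sweeps all taps, so the
    MAC runs len(coefficients) times faster than the data port.
    """
    taps = len(coefficients)
    delay_line = [0] * taps                      # addressable shift register
    outputs = []
    for sample in samples:
        delay_line = [sample] + delay_line[:-1]  # shift in the newest sample
        accumulator = 0                          # reset when the address is 0
        for address in range(taps):              # coefficient ROM lookup
            accumulator += coefficients[address] * delay_line[address]
        outputs.append(accumulator)              # capture one result per sample
    return outputs
```

Feeding in a unit impulse returns the coefficient list itself, which is a convenient check when comparing against a hardware simulation.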
Device Utilization Summary: Four 11x11 Image Filters
Selected Device : 2vp20ff896-6
Number of Slices: 7913 out of 9280 85%
Number of Slice Flip Flops: 10644 out of 18560 57%
Number of 4 input LUTs: 8770 out of 18560 47%
Number of bonded IOBs: 67 out of 556 12%
Number of GCLKs: 1 out of 16 6%
=============================================
TIMING REPORT
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 15322 |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
Minimum period: 4.542ns (Maximum Frequency: 220.192MHz)
Minimum input arrival time before clock: 3.006ns
Maximum output required time after clock: 3.615ns
Maximum combinational path delay: No path found
The design with four 11x11 filters left little room for other logic on our target device. Since we felt that an 11x11 kernel would allow for the best representation of our filter bank set, we decided to target additional devices to leave our options open.
Device Utilization Summary: Six 11x11 Image Filters with New Target
Selected Device : 4vsx55ff1148-11
Number of Slices: 9543 out of 24576 38%
Number of Slice Flip Flops: 11616 out of 49152 23%
Number of 4 input LUTs: 9816 out of 49152 19%
Number of bonded IOBs: 99 out of 642 15%
Number of GCLKs: 1 out of 32 3%
Number of DSP48s: 66 out of 512 12%
==============================================
TIMING REPORT
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 18732 |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -11
Minimum period: 6.632ns (Maximum Frequency: 150.790MHz)
Minimum input arrival time before clock: 3.217ns
Maximum output required time after clock: 3.546ns
Maximum combinational path delay: No path found
This design, with six 11x11 filters, left more room for other logic on our new target device. However, we did not possess this device and therefore had to consider our in-house options. Thus, we moved toward a more V2P20-friendly design.
5x5 Spectral Histogram System Top: The Best Fit Option
Device Utilization Summary: 5x5 with 10 Histograms
Selected Device : 2vp20ff896-6
Number of Slices: 8775 out of 9280 94%
Number of Slice Flip Flops: 10768 out of 18560 58%
Number of 4 input LUTs: 10274 out of 18560 55%
Number of bonded IOBs: 343 out of 556 61%
Number of MULT18X18s: 50 out of 88 56%
Number of GCLKs: 1 out of 16 6%
===============================================
TIMING REPORT
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 16755 |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
Minimum period: 4.758ns (Maximum Frequency: 210.172MHz)
Minimum input arrival time before clock: 2.987ns
Maximum output required time after clock: 6.322ns
Maximum combinational path delay: No path found
Note that a pipelined implementation without explicit use of the embedded multipliers does not fit: it requires 108% of the available slices.
5x5 Filter Systems for Spectral Histogram
5x5 Gabor Filter Subsystem
Histogram Subsystem
Mcode Block for Histogram Bin-Sorter
function [bin10,bin9,bin8,bin7,bin6,bin5,bin4,bin3,bin2,bin1] = xhist(input1)
% Clear all bin flags, then set exactly one based on the input value.
bin10 = 0; bin9 = 0; bin8 = 0; bin7 = 0; bin6 = 0;
bin5 = 0; bin4 = 0; bin3 = 0; bin2 = 0; bin1 = 0;
if input1 >= 224
  bin10 = 1;
elseif input1 >= 180
  bin9 = 1;
elseif input1 >= 158
  bin8 = 1;
elseif input1 >= 136
  bin7 = 1;
elseif input1 >= 114
  bin6 = 1;
elseif input1 >= 92
  bin5 = 1;
elseif input1 >= 70
  bin4 = 1;
elseif input1 >= 48
  bin3 = 1;
elseif input1 >= 26
  bin2 = 1;
else
  bin1 = 1;
end;
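For checking the bin logic in software, the same thresholds can be ported to a short Python model that also accumulates counts over a whole response. This is a hypothetical companion sketch, not part of the hardware design; the thresholds are copied from the Mcode block.

```python
# Lower thresholds for bins 2..10; bin 1 catches everything below 26.
THRESHOLDS = [26, 48, 70, 92, 114, 136, 158, 180, 224]


def bin_index(value):
    """Return the bin number (1..10) selected for a response value."""
    index = 1
    for threshold in THRESHOLDS:
        if value >= threshold:
            index += 1
    return index


def accumulate_histogram(values):
    """Count how many values fall into each of the ten bins."""
    counts = [0] * 10
    for v in values:
        counts[bin_index(v) - 1] += 1
    return counts
```

Because the thresholds are in ascending order, counting how many of them a value meets gives the same bin as the if/elseif chain.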
ModelSim Waveform Snapshot
Histogram results for Gabor Filter 2 with Bin Ranges shown on the previous slide. Also, note that there is a 16 clock cycle delay before the bin sort result is posted.
Simulation Filter Results
Simulation Histogram Results
Conclusions/Future Work
• In addition to implementing the other pattern recognition techniques mentioned above, we intend to optimize the PC/FPGA interfacing to create our own low-cost integrated system.
• Our problems currently lie in the PCI interface design shipped with the Avnet Development Board. We are working hard to resolve this issue, but in the end we may have to consider another board.
• We also wish to time the results (how many images can we process per second?) to determine whether the system is real-time.
• Possibly move to a board with better interfacing tools, as well as faster interfacing via PCI-X, PCI Express, or DMA capabilities.
• Finally, optimize the computational efficiency of the image analysis algorithm, i.e., consider a multi-stage pipeline with more efficient memory access algorithms.
• The ultimate goal is to do real-time search and recognition utilizing FPGAs as co-processors.